By: Jason Lamb

Machine learning has made huge strides in the last few years. Teaching machines to learn and perform tasks autonomously seemed far-fetched not too long ago, yet we have recently seen a boom in Artificial Intelligence (AI) systems outperforming humans at specific tasks. For example, AlphaGo, an AI system, bested the world's top player of Go (an ancient Chinese strategy board game), a game long thought too complicated for machines to learn, and the human baseline on GLUE (a benchmark for evaluating the performance of machine models on natural language tasks) has been surpassed by 12 different AI models (as of this writing). A major factor contributing to many of these recent accomplishments is advances in neural networks and deep learning.


What are Neural Networks? 

Neural networks are a form of machine learning that aims to mimic the neurons in the human brain. In reality, a simple neural network is nothing more than a series of linear transformations (y = mx + b, but with matrices). These models typically learn through a method called “supervised learning”: we feed the model the correct answer, and if its prediction was wrong, the weights inside the neural layers are adjusted to correct the output. In the past few years, researchers have realized that the more layers you stack end-to-end, the better the model’s predictions tend to be, which led to the pioneering of deep learning.
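The ideas above can be sketched in a few lines of Python (a toy illustration, not a real training setup): a single “layer” is just a weight matrix and a bias, and supervised learning nudges those weights toward the correct answer.

```python
import numpy as np

# A single "neural layer" is just a linear transformation: y = Wx + b.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 2))   # weights
b = np.zeros(1)               # bias

def predict(x):
    return W @ x + b

# Supervised learning: show the model the correct answer and nudge the
# weights to shrink the error (plain gradient descent on squared error).
x = np.array([1.0, 2.0])
target = np.array([5.0])
lr = 0.1
for _ in range(100):
    error = predict(x) - target
    W -= lr * np.outer(error, x)  # gradient of 0.5 * error^2 w.r.t. W
    b -= lr * error               # gradient w.r.t. b
```

After the loop, the model's prediction on this example is essentially equal to the target it was shown.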

What is Deep Learning?

Deep learning is typically what we call models with a “deep” architecture, that is, many layers of neural networks. Deep learning aims to stack multiple layers that each generalize patterns, building up a representation of the input sample.
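One way to picture a “deep” architecture is as the same layer operation applied end-to-end. A minimal sketch in Python (the layer sizes are made up for illustration; a ReLU nonlinearity between layers is what lets stacked layers represent more than a single linear map):

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(x, W, b):
    # One layer: a linear transformation followed by a ReLU nonlinearity.
    return np.maximum(0.0, W @ x + b)

# A "deep" model is just many layers stacked end-to-end.
sizes = [4, 8, 8, 2]  # input -> two hidden layers -> output
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def deep_forward(x):
    for W, b in params:
        x = layer(x, W, b)  # each layer transforms the previous layer's output
    return x

out = deep_forward(rng.normal(size=4))  # a 2-dimensional representation
```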

How is Deep Learning Applied to the Real World?

In facial recognition, a task that often uses deep learning, the first few layers of the network might identify light and dark pixels, while the following layers might identify a cluster of dark pixels as an edge. The middle layers of the network might group certain edges together to form shapes and objects, while the last few layers learn which combinations of shapes and objects make up a particular human face.

As we add more layers and complexity to a model, creating deep models, we often notice that the model’s ability to predict improves. While larger models tend to perform better on downstream tasks, they suffer the pitfall of requiring massive amounts of field-specific data to generalize a task adequately. To overcome this, many areas of deep learning have turned to pre-training models on an adjacent domain. In natural language processing (the analysis of text), researchers have adopted the method of training large models (e.g. BERT) on large corpora of Wikipedia articles to garner a generalized knowledge of the English language prior to learning to predict on English tasks. In the case of legal prediction tasks, we would be learning how to read prior to learning how to read the law!
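The pre-train-then-fine-tune idea can be illustrated with a toy numpy sketch (the “pre-trained” weights below are random stand-ins, not anything actually learned from Wikipedia): a general-purpose feature extractor is kept frozen, and only a small task-specific head is fit on the scarce field-specific data.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Pre-trained" feature extractor (a random stand-in for weights a model
# like BERT would learn from a large general corpus). It stays frozen.
W_pretrained = rng.normal(size=(16, 8))

def features(x):
    return np.tanh(W_pretrained @ x)  # fixed, general-purpose representation

# "Fine-tuning": only the small task head is trained, reusing the frozen
# features, so far less field-specific data is needed.
W_head = np.zeros((1, 16))
x_task = rng.normal(size=8)   # one field-specific example (toy)
y_task = np.array([1.0])
for _ in range(50):
    h = features(x_task)
    err = W_head @ h - y_task
    W_head -= np.outer(err / (h @ h), h)  # normalized gradient step
```

The design choice mirrors real transfer learning: the expensive, data-hungry part (the feature extractor) is learned once on plentiful general data, while the cheap task head adapts to the new domain.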

How can Deep Learning be Applied to Law?

The field of law is filled with data, whether from case law or statutes. As one of the requirements for training accurate models is a large amount of data, the legal field poses many interesting avenues for deep learning. By taking a model like BERT, which already has a generalized understanding of the English language, we could, in theory, further pre-train it to gain a generalized understanding of legal language. A model that understands legal language could help predict legal issues, such as what the outcome of a case should have been, or inform everyday people of the kind of outcome they can expect without going to court. If we could train models that understand legal language, we would have the potential to provide better access to justice for the general public and to help identify inconsistencies in our legal system!