Center for Text Analytic Methods in Legal Studies

The Center for Text Analytic Methods in Legal Studies is a research collaboration of experts from the University of Pittsburgh’s School of Law and School of Computing and Information, the RAND Corporation, Duquesne Law School, and Worcester Polytechnic Institute.

The Center's goal is to apply newly developed machine learning (ML) and natural language processing (NLP) techniques to newly available sources of legal text data. It uses these techniques to evaluate legal and social questions involving racism, gender equality, immigration, public health, crime, or education. These questions have real-world policy implications and traditionally could not be evaluated to the same extent without text analytic tools.

Initial Investigations

Initially, the Center is investigating drug-interdiction automobile-stop cases concerning the constitutionality of police decisions to search. Such cases are a persistent cause of racial friction and have led to thousands of court decisions at the state and federal levels. The team seeks to identify the factors on which courts rely in assessing whether police have “reasonable suspicion” to detain a motorist for further investigation (e.g., a police dog drug sniff), to assign statistical weights to these factors across thousands of cases, to identify explicit or implicit racial bias in the cases and explore its relationship to factor weights and case outcomes, and to draw out the social and legal policy implications of its findings.

Increasing awareness of social issues in the legal domain calls for deeper investigation into court decisions and other legal texts. The Center develops and applies NLP/ML tools to evaluate hypotheses about court decisions concerning social issues involving bias, racism, gender equality, immigration, public health, crime, and education.

New Tools Enable Better Analytics

New developments in NLP and ML and the availability of large text corpora, such as the Harvard Law School Caselaw Access Project’s data comprising 6.7 million federal and state court decisions, make it possible to analyze legal texts as never before. The new tools make it possible to collect data-supported evidence on the existence of entities, patterns, and relationships in legal data, so that one can learn about the law through new, empirically assessed, hypothesis-based arguments.

The Center will focus on developing and applying NLP/ML tools to evaluate hypotheses about systemic aspects of court decisions involving social issues. The Center engages legal domain experts at RAND, Pitt Law, and Duquesne Law to apply new techniques and text corpora to investigate hypotheses in their specialty areas. It will also explore the pedagogical potential of engaging law and pre-law students in annotating legal cases, both to improve their case-reading skills and to train machine learning models.