Collection of Datasets for Legal Text Processing
A curated list of resources dedicated to legal data. The collection contains datasets, tools, and other links related to the legal domain. Most resources are openly available.
- Caselaw Access Project by Harvard Law School
- CourtListener - Search millions of opinions by case name, topic, or citation. 403 jurisdictions. Sponsored by the non-profit Free Law Project.
- H2O Open Casebook
- Open Legal Data
- A Dataset of German Legal Documents for Named Entity Recognition (Lynx Project)
- GerDaLIR: A German Dataset for Legal Information Retrieval (Paper)
- MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer
- Mining Legal Arguments in Court Decisions - Data and software (European Court of Human Rights (ECHR))
- Blackstone - A spaCy pipeline and model for NLP on unstructured legal text.
- Pseudo-anonymization of French legal cases
- Scripts to crawl English legal corpora
- LEGAL-BERT: The Muppets straight out of Law School