LexNLP is a library for working with real, unstructured legal text, including contracts, plans, policies, procedures, and other material.
- Segmentation and tokenization, such as
- A sentence parser that is aware of common legal abbreviations like LLC. or F.3d.
- Pre-trained segmentation models for legal concepts such as pages or sections.
- Pre-trained word embedding and topic models, broadly and for specific practice areas
- Pre-trained classifiers for document type and clause type
- Broad range of fact extraction, such as:
- Monetary amounts, non-monetary amounts, percentages, ratios
- Conditional statements and constraints, like "less than" or "later than"
- Dates, recurring dates, and durations
- Courts, regulations, and citations
- Tools for building new clustering and classification methods
- Hundreds of unit tests from real legal documents
- ContraxSuite: https://contraxsuite.com/
- LexPredict: https://lexpredict.com/
- Official Website: https://lexnlp.com/
- Documentation: http://lexpredict-lexnlp.readthedocs.io/en/latest/ (in progress)
- Contact: support@contraxsuite.com
- ContraxSuite web application: https://github.com/LexPredict/lexpredict-contraxsuite
- LexNLP library for extraction: https://github.com/LexPredict/lexpredict-lexnlp
- ContraxSuite pre-trained models and "knowledge sets": https://github.com/LexPredict/lexpredict-legal-dictionary
- ContraxSuite agreement samples: https://github.com/LexPredict/lexpredict-contraxsuite-samples
- ContraxSuite deployment automation: https://github.com/LexPredict/lexpredict-contraxsuite-deploy Please note that ContraxSuite installations generally require trained models or knowledge sets for usage.
LexNLP is available under a dual-licensing model. By default, this library can be used under AGPLv3 terms as detailed in the repository LICENSE file; however, organizations can request a release from the AGPL terms or a non-GPL evaluation license by contacting ContraxSuite Licensing at <license@contraxsuite.com>.
- Python 3.13 (minimum; supported range
>=3.13,<3.15is declared inpyproject.toml) uv
cd /path/to/LexNLP
uv python install 3.13
uv venv --python 3.13 .venv
uv pip install --python .venv/bin/python -e ".[dev,test]"
./.venv/bin/python scripts/bootstrap_assets.py --nltk --contract-modelThe project now uses Astral's native uv_build backend — the [build-system] in pyproject.toml declares requires = ["uv_build>=0.9,<0.10"] and build-backend = "uv_build". This drops setuptools/wheel from the build toolchain and keeps the build, resolve and lint toolchain in a single vendor. Build with:
uv build # sdist + wheel
uv build --wheel # wheel onlyConcurrent and Arrow-native extraction helpers that exercise the Python 3.13 feature set declared in pyproject.toml:
from lexnlp.extract.batch import extract_batch, annotations_to_dataframe, find_fuzzy_dates
from lexnlp.extract.en.amounts import get_amount_annotations
# Concurrent batch extraction via ``asyncio.TaskGroup``:
results = extract_batch(get_amount_annotations, docs, max_workers=8)
# Convert any iterable of annotations to a PyArrow-backed pandas DataFrame:
df = annotations_to_dataframe(ann for r in results for ann in r.annotations)
# Fuzzy ISO-date matcher built on the ``regex`` 2024+ engine:
matches = list(find_fuzzy_dates("Shipped 2O24-01-15", max_edits=1))See MODERNIZATION_ROADMAP.md §4.0 for the full design.
python-requirements.txt and python-requirements-dev.txt are deprecated and kept only for legacy reproduction. The Pipfile / Pipfile.lock pair has been removed — ci/check_dist_contents.py continues to ban both from built artifacts. Use uv with pyproject.toml for all local setup and CI workflows.
See MIGRATION_RUNBOOK.md for complete migration/triage/quality-gate procedures.
- Do not add/remove/modify
skip,skipif, orxfailmarkers to bypass failing tests. - Target is 100% pass.
- If Stanford assets are enabled, 100% pass includes both base and Stanford-only suites.
# Base suite
./.venv/bin/pytest lexnlp
# Stanford-only suite (run when Stanford assets are installed)
PATH=/opt/homebrew/opt/openjdk/bin:$PATH \
LEXNLP_USE_STANFORD=true \
./.venv/bin/pytest \
lexnlp/nlp/en/tests/test_stanford.py \
lexnlp/extract/en/entities/tests/test_stanford_ner.py- 2.3.0: November 30, 2022 - Twenty sixth scheduled public release; code
- 2.2.1.0: August 10, 2022 - Twenty fifth scheduled public release; code
- 2.2.0: July 7, 2022 - Twenty fourth scheduled public release; code
- 2.1.0: September 16, 2021 - Twenty third scheduled public release; code
- 2.0.0: May 10, 2021 - Twenty second scheduled public release; code
- 1.8.0: December 2, 2020 - Twenty first scheduled public release; code
- 1.7.0: August 27, 2020 - Twentieth scheduled public release; code
- 1.6.0: May 27, 2020 - Nineteenth scheduled public release; code
- 1.4.0: December 20, 2019 - Eighteenth scheduled public release; code
- 1.3.0: November 1, 2019 - Seventeenth scheduled public release; code
- 0.2.7: August 1, 2019 - Sixteenth scheduled public release; code
- 0.2.6: June 12, 2019 - Fifteenth scheduled public release; code
- 0.2.5: March 1, 2019 - Fourteenth scheduled public release; code
- 0.2.4: February 1, 2019 - Thirteenth scheduled public release; code
- 0.2.3: Junuary 10, 2019 - Twelfth scheduled public release; code
- 0.2.2: September 30, 2018 - Eleventh scheduled public release; code
- 0.2.1: August 24, 2018 - Tenth scheduled public release; code
- 0.2.0: August 1, 2018 - Ninth scheduled public release; code
- 0.1.9: July 1, 2018 - Ninth scheduled public release; code
- 0.1.8: May 1, 2018 - Eighth scheduled public release; code
- 0.1.7: April 1, 2018 - Seventh scheduled public release; code
- 0.1.6: March 1, 2018 - Sixth scheduled public release; code
- 0.1.5: February 1, 2018 - Fifth scheduled public release; code
- 0.1.4: January 1, 2018 - Fourth scheduled public release; code
- 0.1.3: December 1, 2017 - Third scheduled public release; code
- 0.1.2: November 1, 2017 - Second scheduled public release; code
- 0.1.1: October 2, 2017 - Bug fix release for 0.1.0; code
- 0.1.0: September 30, 2017 - First public release; code