LexNLP by LexPredict

Information retrieval and extraction for real, unstructured legal text

LexNLP is a library for working with real, unstructured legal text, including contracts, plans, policies, procedures, and other material.

LexNLP provides functionality such as:

Segmentation and tokenization, such as
- A sentence parser that is aware of common legal abbreviations like LLC. or F.3d.
- Pre-trained segmentation models for legal concepts such as pages or sections.
Pre-trained word embedding and topic models, broadly and for specific practice areas
Pre-trained classifiers for document type and clause type
Broad range of fact extraction, such as:
- Monetary amounts, non-monetary amounts, percentages, ratios
- Conditional statements and constraints, like "less than" or "later than"
- Dates, recurring dates, and durations
- Courts, regulations, and citations
Tools for building new clustering and classification methods
Hundreds of unit tests from real legal documents
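
To illustrate why abbreviation awareness matters for sentence segmentation, here is a minimal sketch in plain Python. It is not LexNLP's parser: the abbreviation set and the `split_sentences` helper are hypothetical and exist only for this example.

```python
import re

# Hypothetical, deliberately tiny abbreviation list for illustration only;
# a real legal sentence parser needs a far larger one.
LEGAL_ABBREVIATIONS = {"llc.", "inc.", "corp.", "f.3d.", "v.", "no.", "u.s."}


def split_sentences(text: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace, unless the token
    ending at that point is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        # Text from the current sentence start up to and including the punctuation.
        candidate = text[start:match.start() + 1]
        last_token = candidate.split()[-1].lower()
        if last_token in LEGAL_ABBREVIATIONS:
            continue  # abbreviation, not a sentence boundary
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "Acme LLC. filed suit in 2019. See 123 F.3d. 456 for details. The court agreed."
for sentence in split_sentences(sample):
    print(sentence)
```

A naive split on every period would cut this sample after "LLC." and "F.3d."; the abbreviation check keeps those citations inside their sentences.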

Information

ContraxSuite: https://contraxsuite.com/
LexPredict: https://lexpredict.com/
Official Website: https://lexnlp.com/
Documentation: http://lexpredict-lexnlp.readthedocs.io/en/latest/ (in progress)
Contact: support@contraxsuite.com

Structure

ContraxSuite web application: https://github.com/LexPredict/lexpredict-contraxsuite
LexNLP library for extraction: https://github.com/LexPredict/lexpredict-lexnlp
ContraxSuite pre-trained models and "knowledge sets": https://github.com/LexPredict/lexpredict-legal-dictionary
ContraxSuite agreement samples: https://github.com/LexPredict/lexpredict-contraxsuite-samples
ContraxSuite deployment automation: https://github.com/LexPredict/lexpredict-contraxsuite-deploy

Please note that ContraxSuite installations generally require trained models or knowledge sets for usage.

Licensing

LexNLP is available under a dual-licensing model. By default, this library can be used under AGPLv3 terms as detailed in the repository LICENSE file; however, organizations can request a release from the AGPL terms or a non-GPL evaluation license by contacting ContraxSuite Licensing at <license@contraxsuite.com>.

Requirements

Python 3.13 (minimum; supported range >=3.13,<3.15 is declared in pyproject.toml)
uv
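
The supported range above is half-open: 3.13 and 3.14 qualify, 3.15 does not. A minimal sketch of that check follows; the `is_supported` helper is hypothetical and not part of LexNLP.

```python
import sys

# Supported range as stated above: >=3.13,<3.15
MIN_VERSION = (3, 13)
MAX_EXCLUSIVE = (3, 15)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls inside the half-open range."""
    return MIN_VERSION <= version < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "unsupported"
    print(f"Python {current[0]}.{current[1]} is {status} for this checkout")
```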

Quick Setup (uv + pyproject)

cd /path/to/LexNLP
uv python install 3.13
uv venv --python 3.13 .venv
uv pip install --python .venv/bin/python -e ".[dev,test]"
./.venv/bin/python scripts/bootstrap_assets.py --nltk --contract-model

Build system

The project now uses Astral's native uv_build backend — the [build-system] in pyproject.toml declares requires = ["uv_build>=0.9,<0.10"] and build-backend = "uv_build". This drops setuptools/wheel from the build toolchain and keeps the build, resolve and lint toolchain in a single vendor. Build with:

uv build           # sdist + wheel
uv build --wheel   # wheel only

New in this branch: `lexnlp.extract.batch`

Concurrent and Arrow-native extraction helpers that exercise the Python 3.13 feature set declared in pyproject.toml:

from lexnlp.extract.batch import extract_batch, annotations_to_dataframe, find_fuzzy_dates
from lexnlp.extract.en.amounts import get_amount_annotations

# Concurrent batch extraction via ``asyncio.TaskGroup``:
results = extract_batch(get_amount_annotations, docs, max_workers=8)

# Convert any iterable of annotations to a PyArrow-backed pandas DataFrame:
df = annotations_to_dataframe(ann for r in results for ann in r.annotations)

# Fuzzy ISO-date matcher built on the ``regex`` 2024+ engine:
matches = list(find_fuzzy_dates("Shipped 2O24-01-15", max_edits=1))

See MODERNIZATION_ROADMAP.md §4.0 for the full design.

Deprecated Setup Variants

python-requirements.txt and python-requirements-dev.txt are deprecated and kept only for legacy reproduction. The Pipfile / Pipfile.lock pair has been removed — ci/check_dist_contents.py continues to ban both from built artifacts. Use uv with pyproject.toml for all local setup and CI workflows.
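
The kind of artifact check mentioned above can be sketched as follows. This is an illustration only and makes no claim about what ci/check_dist_contents.py actually does: `find_banned_members` and `make_demo_wheel` are hypothetical helpers, and the sketch handles wheel (zip) archives only, not sdists.

```python
import io
import zipfile
from pathlib import PurePosixPath

# File names that must not appear anywhere inside a built wheel.
BANNED_NAMES = {"Pipfile", "Pipfile.lock"}


def find_banned_members(archive_bytes: bytes) -> list[str]:
    """Return the sorted archive paths whose file name is banned."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if PurePosixPath(name).name in BANNED_NAMES
        )


def make_demo_wheel(names: list[str]) -> bytes:
    """Build an in-memory zip with empty members, for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "")
    return buffer.getvalue()


offenders = find_banned_members(
    make_demo_wheel(["lexnlp/__init__.py", "Pipfile", "pkg/Pipfile.lock"])
)
print(offenders)
```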

Migration Runbook

See MIGRATION_RUNBOOK.md for complete migration/triage/quality-gate procedures.

Test Integrity and Full Validation

Do not add/remove/modify skip, skipif, or xfail markers to bypass failing tests.
Target is 100% pass.
If Stanford assets are enabled, 100% pass includes both base and Stanford-only suites.

# Base suite
./.venv/bin/pytest lexnlp

# Stanford-only suite (run when Stanford assets are installed)
PATH=/opt/homebrew/opt/openjdk/bin:$PATH \
LEXNLP_USE_STANFORD=true \
./.venv/bin/pytest \
  lexnlp/nlp/en/tests/test_stanford.py \
  lexnlp/extract/en/entities/tests/test_stanford_ner.py
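
The rule that a full pass includes the Stanford-only suite whenever Stanford assets are enabled can be expressed as a small planning helper. This is a sketch: `planned_pytest_runs` is hypothetical, and treating the literal value `true` as the enabling value of `LEXNLP_USE_STANFORD` is an assumption based on the command above.

```python
import os

BASE_TARGETS = ["lexnlp"]
STANFORD_TARGETS = [
    "lexnlp/nlp/en/tests/test_stanford.py",
    "lexnlp/extract/en/entities/tests/test_stanford_ner.py",
]


def planned_pytest_runs(env: dict[str, str]) -> list[list[str]]:
    """Return the pytest command lines required for a full validation."""
    runs = [["./.venv/bin/pytest", *BASE_TARGETS]]
    # Assumption: the flag is considered enabled when set to "true".
    if env.get("LEXNLP_USE_STANFORD", "").lower() == "true":
        runs.append(["./.venv/bin/pytest", *STANFORD_TARGETS])
    return runs


if __name__ == "__main__":
    for command in planned_pytest_runs(dict(os.environ)):
        print(" ".join(command))
```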

Releases

2.3.0: November 30, 2022 - Twenty sixth scheduled public release; code
2.2.1.0: August 10, 2022 - Twenty fifth scheduled public release; code
2.2.0: July 7, 2022 - Twenty fourth scheduled public release; code
2.1.0: September 16, 2021 - Twenty third scheduled public release; code
2.0.0: May 10, 2021 - Twenty second scheduled public release; code
1.8.0: December 2, 2020 - Twenty first scheduled public release; code
1.7.0: August 27, 2020 - Twentieth scheduled public release; code
1.6.0: May 27, 2020 - Nineteenth scheduled public release; code
1.4.0: December 20, 2019 - Eighteenth scheduled public release; code
1.3.0: November 1, 2019 - Seventeenth scheduled public release; code
0.2.7: August 1, 2019 - Sixteenth scheduled public release; code
0.2.6: June 12, 2019 - Fifteenth scheduled public release; code
0.2.5: March 1, 2019 - Fourteenth scheduled public release; code
0.2.4: February 1, 2019 - Thirteenth scheduled public release; code
0.2.3: January 10, 2019 - Twelfth scheduled public release; code
0.2.2: September 30, 2018 - Eleventh scheduled public release; code
0.2.1: August 24, 2018 - Tenth scheduled public release; code
0.2.0: August 1, 2018 - Ninth scheduled public release; code
0.1.9: July 1, 2018 - Ninth scheduled public release; code
0.1.8: May 1, 2018 - Eighth scheduled public release; code
0.1.7: April 1, 2018 - Seventh scheduled public release; code
0.1.6: March 1, 2018 - Sixth scheduled public release; code
0.1.5: February 1, 2018 - Fifth scheduled public release; code
0.1.4: January 1, 2018 - Fourth scheduled public release; code
0.1.3: December 1, 2017 - Third scheduled public release; code
0.1.2: November 1, 2017 - Second scheduled public release; code
0.1.1: October 2, 2017 - Bug fix release for 0.1.0; code
0.1.0: September 30, 2017 - First public release; code
