The tdda package provides Python support for test-driven data analysis (1-page summary, blog, book).

- Reference Testing (tdda.referencetest): extensions to unittest and pytest for testing data analysis pipelines. Supports file-based comparisons, semantic equivalence, automatic rewriting of reference results, and test tagging.
- Automatic Test Generation (tdda gentest): generates reference tests for any command-line script or program (Python, R, shell, Makefile, ...). *"Gentest writes tests, so you don't have to."*™
- Constraints (tdda.constraints): discovers constraints from Pandas DataFrames, Parquet files, flat files, and relational databases; verifies new data against those constraints; detects failing records.
- Regular Expression Inference (tdda.rexpy): automatically infers regular expressions from a column of string data.
- Data Diff (tdda diff): compares data frames in Parquet or flat files and reports differences in a visual format.
- Serial Format (tdda.serial): documents CSV and flat-file formats in .serial metadata files for accurate, portable reading and writing. Supports conversion to/from CSVW and Frictionless metadata.
- Utility Functions (tdda.utils): Unicode normalization (Normal Form TK), glyph counting, and RFC 9839 support.
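
To give a feel for the discover-then-verify workflow described under Constraints, here is a minimal conceptual sketch in plain Python. It is not the tdda API and not how tdda is implemented: `discover` and `verify` are hypothetical helpers written for this illustration, and they handle only type, minimum, maximum, and nullability on a list of dictionaries. See the full documentation for the real interfaces.

```python
# Conceptual sketch only: NOT the tdda API or its implementation.
# discover() and verify() are hypothetical helpers for illustration.

def discover(records):
    """Infer simple per-field constraints from example records."""
    constraints = {}
    fields = {key for rec in records for key in rec}
    for field in sorted(fields):
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None]
        constraints[field] = {
            "type": type(present[0]).__name__ if present else None,
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "allow_null": len(present) < len(values),
        }
    return constraints


def verify(records, constraints):
    """Return (row_index, field, reason) for every constraint violation."""
    failures = []
    for i, rec in enumerate(records):
        for field, c in constraints.items():
            value = rec.get(field)
            if value is None:
                if not c["allow_null"]:
                    failures.append((i, field, "null not allowed"))
            elif type(value).__name__ != c["type"]:
                failures.append((i, field, "wrong type"))
            elif value < c["min"]:
                failures.append((i, field, "below min"))
            elif value > c["max"]:
                failures.append((i, field, "above max"))
    return failures


# Constraints are discovered from data believed to be good...
training = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 27}]
constraints = discover(training)

# ...and later used to check new data and locate the failing records.
new_data = [{"id": 1, "age": 40}, {"id": 2, "age": 130}, {"id": 3, "age": None}]
for failure in verify(new_data, constraints):
    print(failure)
```

The point of the sketch is the shape of the workflow: constraints are derived from trusted data once, stored, and then applied to each new batch, so that a failure points at a specific record and field rather than just signalling that something is wrong.
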
Full documentation: tdda.readthedocs.io

To install from PyPI:

    pip install tdda

To upgrade an existing installation:

    pip install -U tdda

To install from source:

    git clone https://github.com/tdda/tdda.git
    cd tdda
    pip install .

Optional database drivers:

    pip install pygresql                 # PostgreSQL
    pip install mysql-connector-python   # MySQL/MariaDB
    pip install pymongo                  # MongoDB

To check the installation by running the package's tests:

    tdda test

- TDDA Blog
- Book
- Quick Reference Guide
- 1-page summary
- Full documentation
- PyCon UK talk (video)
- Mastodon
- Nick Radcliffe
- Simon Brown