Test-Driven Data Analysis (TDDA)

The tdda package provides Python support for test-driven data analysis (1-page summary, blog, book).

Features

  • Reference Testing (tdda.referencetest): extensions to unittest and pytest for testing data analysis pipelines. Supports file-based comparisons, semantic equivalence, automatic rewriting of reference results, and test tagging.

  • Automatic Test Generation (tdda gentest): generates reference tests for any command-line script or program (Python, R, shell, Makefile, ...). *"Gentest writes tests, so you don't have to."*™

  • Constraints (tdda.constraints): discovers constraints from Pandas DataFrames, Parquet files, flat files, and relational databases; verifies new data against those constraints; detects failing records.

  • Regular Expression Inference (tdda.rexpy): automatically infers regular expressions from a column of string data.

  • Data Diff (tdda diff): compares data frames in Parquet or flat files and reports differences in a visual format.

  • Serial Format (tdda.serial): documents CSV and flat-file formats in .serial metadata files for accurate, portable reading and writing. Supports conversion to/from CSVW and Frictionless metadata.

  • Utility Functions (tdda.utils): Unicode normalization (Normal Form TK), glyph counting, and RFC 9839 support.
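The constraints workflow described above has three steps: discover constraints from reference data, verify new data against them, and report the failing records. The toy sketch below illustrates those steps using only the standard library. It is not the tdda.constraints API or its algorithm. tdda's own entry points, such as discover_df and verify_df in tdda.constraints, are covered in the full documentation. The helper names discover_constraints and violations are hypothetical, and the sketch infers only three kinds of constraint: nullability, numeric min/max, and allowed string values.

```python
from typing import Any, Dict, List

# Hypothetical types and helpers for illustration; NOT the tdda.constraints API.
Row = Dict[str, Any]
Constraints = Dict[str, Dict[str, Any]]


def _is_number(v: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def discover_constraints(rows: List[Row]) -> Constraints:
    """Infer simple per-column constraints from reference records."""
    columns = sorted({key for row in rows for key in row})
    constraints: Constraints = {}
    for col in columns:
        # A column missing from a record is treated as null.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        c: Dict[str, Any] = {"nullable": len(present) < len(values)}
        if present and all(_is_number(v) for v in present):
            c["min"], c["max"] = min(present), max(present)
        elif present and all(isinstance(v, str) for v in present):
            c["allowed"] = sorted(set(present))
        # Mixed-type columns only get a nullability constraint.
        constraints[col] = c
    return constraints


def violations(row: Row, constraints: Constraints) -> List[str]:
    """Return the names of columns in one record that break their constraints."""
    bad = []
    for col, c in constraints.items():
        v = row.get(col)
        if v is None:
            if not c["nullable"]:
                bad.append(col)
        elif "min" in c:
            if not _is_number(v) or not (c["min"] <= v <= c["max"]):
                bad.append(col)
        elif "allowed" in c and v not in c["allowed"]:
            bad.append(col)
    return bad


reference = [
    {"age": 34, "sex": "F"},
    {"age": 51, "sex": "M"},
    {"age": 27, "sex": None},
]
new_data = [
    {"age": 40, "sex": "M"},  # passes
    {"age": -3, "sex": "F"},  # age below the observed minimum
    {"age": 45, "sex": "X"},  # category never seen in the reference data
]
constraints = discover_constraints(reference)
print([i for i, row in enumerate(new_data) if violations(row, constraints)])  # [1, 2]
```

Inferring min/max from a sample is deliberately strict. Any later value outside the observed range is flagged, which helps with spotting drift, but it also means the reference data needs to be representative.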

Documentation

Full documentation: tdda.readthedocs.io

Installation

pip install tdda

To upgrade an existing installation:

pip install -U tdda

Source installation

git clone https://github.com/tdda/tdda.git
cd tdda
pip install .
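After installing or upgrading, you can check which version of tdda Python actually sees by querying the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch. installed_version is a hypothetical helper name and is not part of tdda.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "tdda") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"tdda {v}" if v else "tdda is not installed")
```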

Optional database support

pip install pygresql                  # PostgreSQL
pip install mysql-connector-python   # MySQL/MariaDB
pip install pymongo                  # MongoDB
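To see which of these optional drivers are importable in the current environment, here is a small standard-library sketch. The import names (pgdb for PyGreSQL, mysql.connector for mysql-connector-python, pymongo for pymongo) are assumptions based on each package's usual module name, not something this README specifies. DRIVERS and module_available are hypothetical names.

```python
import importlib.util

# Assumed import names for the optional drivers listed above.
DRIVERS = {
    "PostgreSQL": "pgdb",
    "MySQL/MariaDB": "mysql.connector",
    "MongoDB": "pymongo",
}


def module_available(name: str) -> bool:
    """True if `name` can be found on the import path (the module itself is not imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # find_spec imports the parent package of a dotted name, so a missing
        # parent (e.g. no "mysql" package at all) raises instead of returning None.
        return False


def available_drivers() -> dict:
    """Map each database label to whether its assumed driver module is importable."""
    return {db: module_available(mod) for db, mod in DRIVERS.items()}


if __name__ == "__main__":
    for db, ok in available_drivers().items():
        print(f"{db:15} {'available' if ok else 'not installed'}")
```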

Testing

tdda test

Authors

  • Nick Radcliffe
  • Simon Brown
