A robust and extensible Data Quality as a Service (DQaaS) platform built with FastAPI and Pandas. This project enables you to upload and validate datasets against configurable rules, version them intelligently, and generate profiling and validation reports — all accessible via a clean, documented API.
- 📤 Upload and validate CSV datasets via API
- 🔐 API key authentication for all endpoints
- 📐 Rule-based validation (`range`, `not_null`, `regex`, `unique`, etc.)
- 📊 Data profiling using `ydata-profiling`
- 🧠 Intelligent versioning using file content hashes
- 📝 Auto-generation of reports and metadata
- 🧪 Full unit + integration test suite
- ✅ CI-ready with GitHub Actions and pre-commit hooks
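The content-hash versioning mentioned above could be sketched as follows (a minimal illustration; `content_version` is a hypothetical helper, not the project's actual API):

```python
import hashlib
from pathlib import Path

def content_version(path: str, length: int = 12) -> str:
    """Derive a short, deterministic version id from a file's bytes.

    Identical uploads map to the same version, so re-uploading an
    unchanged dataset never creates a duplicate entry.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest[:length]
```

Because the id is derived purely from content, renaming a file does not change its version.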
```
validator-starter/
├── app/              # Backend core (FastAPI, logic, validation)
├── frontend/         # Streamlit frontend UI
├── datasets/         # Validated CSVs (versioned)
├── metadatas/        # JSON validation metadata
├── reports/          # Validation HTML reports
├── profilers/        # Profiling HTML reports
├── tests/            # Unit + integration tests
├── Makefile          # Dev workflow commands
├── .env.example      # Environment config sample
├── requirements.txt  # Global/shared dependencies
```
```bash
make venv         # Create both backend/frontend virtualenvs
make install      # Install all backend/frontend dependencies
make run-backend  # Run the FastAPI backend
make run-ui       # Run the Streamlit frontend
```

All endpoints require a header:

```
x-api-key: your_api_key_here
```
The key is defined via `.env` or directly in `config.py`.
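Conceptually, the header check boils down to a comparison like the sketch below (names are illustrative; the project wires this up through FastAPI and `config.py`):

```python
import hmac
import os

# The expected key would normally come from .env / config.py
API_KEY = os.environ.get("API_KEY", "change-me")

def is_authorized(headers: dict) -> bool:
    """Return True if the request carries a valid x-api-key header."""
    supplied = headers.get("x-api-key", "")
    # Constant-time comparison avoids leaking the key via timing differences
    return hmac.compare_digest(supplied, API_KEY)
```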
```
POST /validate?rules_file=your_rules.json
```

- `file`: CSV (multipart/form-data)
- `rules_file`: name of a JSON file in `validation_rules/`
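Applying a rules file of this shape to a DataFrame could look roughly like the following sketch (`apply_rules` and the error format are illustrative, not the project's actual internals):

```python
import pandas as pd

def apply_rules(df: pd.DataFrame, rules: list[dict]) -> list[str]:
    """Evaluate each rule against df and return a list of violation messages."""
    errors = []
    for r in rules:
        col, rule = r["column"], r["rule"]
        s = df[col]
        if rule == "not_null" and s.isna().any():
            errors.append(f"{col}: contains nulls")
        elif rule == "range":
            # between() is inclusive on both bounds by default
            if not s.between(r["min"], r["max"]).all():
                errors.append(f"{col}: values outside [{r['min']}, {r['max']}]")
        elif rule == "regex":
            if not s.astype(str).str.fullmatch(r["pattern"]).all():
                errors.append(f"{col}: values not matching pattern")
        elif rule == "unique" and s.duplicated().any():
            errors.append(f"{col}: duplicate values")
    return errors
```

An empty result list means the dataset passed every rule.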
```json
[
  { "column": "id", "rule": "not_null" },
  { "column": "email", "rule": "regex", "pattern": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$" },
  { "column": "age", "rule": "range", "min": 18, "max": 99 },
  { "column": "name", "rule": "unique" }
]
```

```bash
make test-unit   # Run unit tests
make test-api    # Run API integration tests
make coverage    # Run tests with a coverage report
```

- Add Docker support
- Deploy to Render/Railway
- CLI usage:

```bash
python -m validator file.csv
```
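Since the CLI is still on the roadmap, the entry point below is purely a hypothetical sketch of how it might start out (the real flag set and checks are undecided):

```python
import argparse
import csv
import sys

def main(argv=None) -> int:
    # python -m validator would route here via a __main__.py calling sys.exit(main())
    parser = argparse.ArgumentParser(prog="validator",
                                     description="Validate a CSV file.")
    parser.add_argument("csv_file", help="Path to the CSV file to validate")
    args = parser.parse_args(argv)
    with open(args.csv_file, newline="") as fh:
        rows = list(csv.DictReader(fh))
    # Placeholder check: a dataset with no data rows is considered invalid
    if not rows:
        print("FAIL: no data rows", file=sys.stderr)
        return 1
    print(f"OK: {len(rows)} rows read")
    return 0
```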