Find and fix data quality issues in your Last.fm scrobble history
Updated Dec 11, 2025 - HTML
Just what you expect from your electricity grid data
Agentic Data Engineering Platform is an open-source, production-ready ETL solution that combines the Medallion Architecture with AI-powered agents that autonomously profile, clean, and optimize your data—so you can focus on insights, not infrastructure.
The goal of this project is to provide documentation for the Lakehouse Engine framework.
Data Trust Engineering (DTE) is a vendor-neutral, engineering-first approach to building trusted data systems that are ready for data, analytics, and AI workloads. This repo hosts the Manifesto, Patterns, and the Trust Dashboard MVP.
A complete data mining project proposal and methodological blueprint for predicting the link between hypertension and psychopathologies in geriatric care, following the CRISP-DM framework.
Data Migration Quality Framework - A robust ETL pipeline with advanced anomaly detection for ensuring data quality during migrations
End-to-End Data Engineering Pipeline for E-commerce Analytics.
A web application for displaying automation test reports.
Detecting errors and anomalies in structured data using automation
Comprehensive data governance pipeline for SSH honeypot logs—covering data profiling, cleansing, quality assurance, encryption, classification, and GDPR/CCPA/HIPAA compliance. Built with Pandas, Pandera, YData Profiling, and cryptography, with simulated Caesar cipher attacks to demonstrate practical data-security techniques.
Data file examples and user guides for VerityPy and VerityDotNet libraries
R package for delineating temporal dataset shifts in Electronic Health Records
re_data - fix data issues before your users & CEO discover them 😊
Collection of R scripts to test packages for conducting data quality assessments
Metrics Observability & Troubleshooting
A comprehensive repository housing a collection of insightful blog posts, in-depth documentation, and resources exploring various facets of data engineering, from ETL processes and database management to orchestration tools, data quality, monitoring, and deployment strategies.
This GitHub repository provides a comprehensive set of tools and algorithms for detecting fraud anomalies in various data sources. Fraudulent activities can have severe consequences, impacting businesses and individuals alike. With this repository, we aim to empower researchers with effective techniques to identify and prevent fraudulent behavior.