re_data - fix data issues before your users & CEO would discover them 😊
Updated Apr 30, 2024 - HTML
Metrics Observability & Troubleshooting
R package for delineating temporal dataset shifts in Electronic Health Records
Find and fix data quality issues in your Last.fm scrobble history
The goal of this project is to provide documentation for the Lakehouse Engine framework.
Collection of R scripts to test packages in conducting data quality assessments
R code for the discovery of COVID-19 subgroups by symptoms and comorbidities.
A web application for displaying automation test reports.
Just what you expect from your electricity grid data
TellMeQuality is a tool for measuring Data Quality according to ISO/IEC 25024.
This GitHub repository provides a comprehensive set of tools and algorithms for detecting fraud anomalies in various data sources. Fraudulent activities can have severe consequences, impacting businesses and individuals alike. With this repository, we aim to empower researchers with effective techniques to identify and prevent fraudulent behavior.
To describe age-gender unbiased COVID-19 subphenotypes regarding severity patterns through a two-stage clustering approach using patient phenotypes and demographic features. Additional source and temporal variability assessments are included as part of data quality analyses.
Comprehensive data governance pipeline for SSH honeypot logs—covering data profiling, cleansing, quality assurance, encryption, classification, and GDPR/CCPA/HIPAA compliance. Built with Pandas, Pandera, YData Profiling, and cryptography, with simulated Caesar cipher attacks to demonstrate practical data-security techniques.
Data Trust Engineering (DTE) is a vendor-neutral, engineering-first approach to building trusted data systems that are ready for data, analytics, and AI. This repo hosts the Manifesto, Patterns, and the Trust Dashboard MVP.
A comprehensive repository housing a collection of insightful blog posts, in-depth documentation, and resources exploring various facets of data engineering. Topics range from ETL processes and database management to orchestration tools, data quality, monitoring, and deployment strategies.
Data file examples and user guides for VerityPy and VerityDotNet libraries
Agentic Data Engineering Platform is an open-source, production-ready ETL solution that combines the Medallion Architecture with AI-powered agents that autonomously profile, clean, and optimize your data—so you can focus on insights, not infrastructure.
Detecting errors and anomalies in structured data using automation