DsFeatFreqComp – Dataset Feature-Frequency Comparison R Package
Updated Dec 29, 2020 - R
Data Quality control framework for dataframes in R
The Hub is the solution responsible for centralizing data consolidation in BigQuery, the tool chosen to serve as the data warehouse for raft-suite.
Repository containing tutorials and use cases for the NFDInspector
Data quality monitoring library designed for time series data and built for the modern data stack
Data Lakehouse project using the Brazilian E-Commerce Public Dataset by Olist
This tool, developed in Python, assists with the data governance process, particularly during the Mainframe>MDM>PIC migration project. The team checks the integrity of the data and evaluates whether business rules are being fulfilled by synchronizing the data between the MDM platform and the current item information on the Mainframe. This tool's purpose…
Building Data Pipelines for a data warehouse with Airflow and AWS
Modular pipeline for monitoring quality, latency, and anomalies in enterprise data. Includes validation with Pandera, technical tracking, visualizations, and an interactive dashboard built with Streamlit.
Mini data warehouse fed by Apache Hop ETL pipelines (dimensions + fact). Star-schema modeling, SQL, and data quality management.
An R package for assessing LC-MS data quality using total ion current.
SQL practice focused on QA Automation: data validation, data-quality checks, and realistic back-end testing scenarios using PostgreSQL.
Fun personal projects for learning geospatial and asset management principles
Project of the Systems and Methods for Big and Unstructured Data course at Politecnico di Milano, A.Y. 2023/2024
Prevent potential breaking changes from SQL migrations that might impact downstream tables, charts, etc. Helps to maintain consistency wherever your data is going.
A Set of Metrics and Tools for Data Quality Assessment and Reporting on Rare Diseases Data
🛍️ Modern E-commerce Data Warehouse built with dbt, PostgreSQL & Python. Features dimensional modeling, automated testing, CI/CD pipeline, and comprehensive analytics for customer insights, product performance, and marketing ROI. 📊✨
MySQL data-cleaning project on global layoffs data. Cleaned a raw table (layoffs) by removing duplicates, standardizing fields, filling missing values, and dropping irrelevant columns/rows to produce a cleaned staging table (layoffs_staging2).
Machine learning pipeline to detect fraudulent credit card transactions, reducing losses and saving ~$48M annually.
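The cleaning steps named in the layoffs entry above (standardizing fields, filling missing values, removing duplicates, dropping unusable rows) can be sketched in a few lines. This is a minimal illustration in plain Python, not that repository's MySQL code; the function name `clean_layoffs` and the column names `company`, `industry`, and `laid_off` are hypothetical.

```python
def clean_layoffs(rows: list[dict]) -> list[dict]:
    """Standardize, fill, deduplicate, and filter rows (hypothetical columns)."""
    # 1. Standardize text fields: trim whitespace and normalize capitalization.
    staged = []
    for row in rows:
        company = (row.get("company") or "").strip()
        industry = (row.get("industry") or "").strip().title() or None
        staged.append(
            {"company": company, "industry": industry, "laid_off": row.get("laid_off")}
        )

    # 2. Fill a missing industry from another row of the same company, if any.
    known = {r["company"]: r["industry"] for r in staged if r["industry"]}
    for r in staged:
        if r["industry"] is None:
            r["industry"] = known.get(r["company"])

    # 3. Remove exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for r in staged:
        key = (r["company"], r["industry"], r["laid_off"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 4. Drop rows that carry no layoff count at all.
    return [r for r in unique if r["laid_off"] is not None]
```

Standardizing and filling before deduplicating matters here: two rows that differ only by stray whitespace or a missing value become identical first, so the duplicate check can catch them.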