data-centric-ai

Here are 87 public repositories matching this topic...

FEROsites / Machine-Learning-AI-Library

📚 Explore a curated library for mastering Machine Learning, Deep Learning, and AI through free resources, courses, and tools for all levels.

nlp opencv machine-learning text-classification tensorflow prediction kaggle weak-supervision dataops outlier-detection labeling data-quality data-curation dataquality explainable-artificial-intelligence xai noisy-labels data-centric-ai

Updated Dec 18, 2025

rajatrajan07 / Machine-Learning-AI-Library

📚 Explore a growing library of free resources for learning Machine Learning, Deep Learning, and AI, covering essential concepts to advanced techniques.

python bot opencv machine-learning deep-learning text-classification data-validation chatbot prediction kaggle weak-supervision dataops labeling datasets data-cleaning data-curation data-labeling data-centric-ai

Updated Dec 18, 2025

voxel51 / fiftyone

Refine high-quality datasets and visual AI models

visualization python data-science machine-learning computer-vision deep-learning artificial-intelligence developer-tools image-classification object-detection data-cleaning active-learning data-quality data-curation unstructured-data vector-search data-centric-ai

Updated Dec 18, 2025
Python

cleanlab / cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Updated Dec 17, 2025
Python

Renumics / spotlight

Interactively explore unstructured datasets from your dataframe.

audio machine-learning video computer-vision timeseries images exploratory-data-analysis data-visualization hacktoberfest meshes data-curation unstructured-data data-centric-ai

Updated Dec 17, 2025
TypeScript

cleanlab / cleanvision

Automatically find issues in image datasets and practice data-centric computer vision.

data-science computer-vision deep-learning data-validation exploratory-data-analysis image-classification image-generation image-segmentation image-analysis data-exploration image-quality data-quality data-profiling data-centric-ai

Updated Dec 17, 2025
Python

cleanlab / examples

Notebooks demonstrating example applications of the cleanlab library

data-science notebook robust-machine-learning data-centric-ai

Updated Dec 16, 2025
Jupyter Notebook

KibromBerihu / ai4elife

This data-centric AI repository implements a robust deep learning method (LFBNet) for fully automated tumor segmentation in whole-body [18]F-FDG PET/CT images.

ai deep-learning medical-imaging pet data-analysis survival-analysis image-segmentation biomarkers fdg-pet lymphoma data-centric-ai whole-body-segmentation automated-pet-segmentation pet-segmentation pet-ct-segmentation

Updated Dec 12, 2025
Python

Digital-Dermatology / SelfClean-Revised-Benchmarks

[ML4H 2023] 🧼🔎 SelfClean revised versions of benchmark datasets for more reliable performance estimation.

benchmarks data-cleaning data-quality data-quality-assessment data-centric-ai

Updated Dec 3, 2025

SreeTetali / robust-nli-analysis

robust-nli-analysis robust-NLP-analysis

nlp machine-learning natural-language-processing deep-learning transformers snli electra huggingface bias-mitigation snli-dataset data-centric-ai

Updated Nov 29, 2025
Python

3lc-ai / 3lc-ultralytics

3LC Integration with Ultralytics YOLO

computer-vision image-classification image-segmentation image-detection data-centric-ai

Updated Dec 5, 2025
Python

shw7007 / Box-Embedding-Unboxed

Unboxing the Geometry of Knowledge Graphs: Analyzing training dynamics and solving topological traps via a Data-Centric approach

visualization pytorch knowledge-graph explainable-ai box-embedding data-centric-ai

Updated Nov 23, 2025
Python

rmovva / wimhf

What's In My Human Feedback? Explaining preferences in human feedback using interpretability + LLMs. https://arxiv.org/abs/2510.26202

human-centered-ai data-centric-ai rlhf

Updated Nov 5, 2025
Python

ArlesZhang / FuelGenius-The-Training-Data-AI-Agent

A training data pipeline + intelligent flywheel system designed specifically for AI data engineers.

data-validation data-version-control data-quality synthetic-data mlflow data-centric-ai mlops-platform

Updated Nov 5, 2025

sfarrukhm / data-centric-mlops-pipeline

This project focuses on the data side of MLOps: building a simple, reliable pipeline around the NYC Green Taxi dataset. It covers data ingestion, validation, and versioning, with automation through FastAPI, Docker, and GitHub Actions. The goal is to learn how to make data workflows cleaner, more reproducible, and easier to extend toward full ML pipelines.

data-science machine-learning data-engineering reproducibility data-pipeline nyc-taxi-dataset dvc mlops github-actions fastapi data-centric-ai

Updated Oct 22, 2025
Jupyter Notebook

SJTU-DMTai / awesome-ml-data-quality-papers

Papers about training data quality management for ML models.

machine-learning data-management data-quality data-profiling data-debugging data-valuation data-centric-ai ai4db db4ai

Updated Oct 15, 2025

Digital-Dermatology / SelfClean

[NeurIPS 2024] 🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.

machine-learning deep-learning data-cleaning data-curation self-supervised-learning data-centric-ai neurips-2024

Updated Oct 14, 2025
Python

gszfwsb / NCFM

Official PyTorch implementation of the paper "Dataset Distillation with Neural Characteristic Function: A Minmax Perspective" (NCFM) in CVPR 2025 (Full Score, Highlight).

computer-vision synthetic-data data-centric-ai dataset-distillation

Updated Oct 10, 2025
Python

ArthurMangussi / AdvML

Adversarial Machine Learning Applied to Missing Data Imputation

cybersecurity adversarial-machine-learning adversarial-attacks missing-data-imputation data-centric-ai

Updated Sep 10, 2025
Python

aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation.

machine-learning game-theory data-cleaning data-quality banzhaf-index influence-functions robust-machine-learning shapley-value data-valuation data-centric-ai transferlab least-core data-pruning

Updated Sep 7, 2025
Python
