Stars
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
Free MLOps course from DataTalks.Club
Python code for common machine learning algorithms
Projects and resources to help you become a better AI developer.
Code, Notebooks and Examples from Practical Business Python
TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
A lightweight package that queries popular search engines and scrapes result titles, links, and descriptions.
This repository contains advanced LLM-based Q&A chatbots built with LLM agents and Retrieval-Augmented Generation (RAG) over different data sources (VectorDB, GraphDB, SQLite, CSV, XLSX, etc.).
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All comp…
Suite of tools containing an in-memory vector datastore and AI proxy
Multiclass image classification using a convolutional neural network
YAML-based tool for monitoring metrics across multiple hosts
A complete real-time Change Data Capture (CDC) pipeline using Apache Flink, MariaDB, and Docker Compose. This project demonstrates how to build a modern streaming analytics system that processes da…
Solved example questions from the book "Data Analytics and Decision Making, 4th Edition" by Albright, Winston and Zappe, in R
A collection of all 774 local government areas (LGAs) in Nigeria, each with its state name, latitude, longitude and Wikidata identifier.
Expert knowledge skills for Claude Code to help with specific technologies and tools.
An implementation of an ingestion framework for lakehouses using dagster, dlthub, slingdata, trino, dremio and minio
A simple web-scraping program for Jumia Deals → Real Estate.
Tuberculosis classification in chest radiographs using a convolutional neural network and the fastai Python library
Simulate PostgreSQL Change Data Capture with realistic inserts, updates, deletes, and optional schema evolution. Ideal for testing CDC pipelines, Debezium connectors, and streaming systems.
This project provisions a modular AWS data pipeline using Terraform. Each AWS service lives in its own directory under infrastructure/services, so you can provision and manage them independently.
A tutorial on building a visualization dashboard with Plotly Dash
A repository of Python for data science resources in Jupyter notebooks.
Modern e-commerce analytics stack: MySQL → S3 → Snowflake → dbt → Dagster. Implements incremental ingestion, SCD handling, data quality checks, and enterprise-grade governance.
Practical examples of Slowly Changing Dimensions (SCD Type 0, 1, and 2) using dbt and MySQL.