Stars
Modern e-commerce analytics stack: MySQL → S3 → Snowflake → dbt → Dagster. Implements incremental ingestion, SCD handling, data quality checks, and enterprise-grade governance.
Practical examples of Slowly Changing Dimensions (SCD Type 0, 1, and 2) using dbt and MySQL.
This repository contains advanced LLM-based chatbots for Q&A using LLM agents and Retrieval Augmented Generation (RAG) with different databases (VectorDB, GraphDB, SQLite, CSV, XLSX, etc.).
Expert knowledge skills for Claude Code to help with specific technologies and tools.
A real-time change data capture (CDC) pipeline using Debezium, Kafka, and PostgreSQL to stream database changes and send alerts via Telegram. The project also includes a PostgreSQL trigger-based au…
This project provisions a modular AWS data pipeline using Terraform. Each AWS service lives in its own directory under infrastructure/services, so you can provision and manage them independently.
A complete real-time Change Data Capture (CDC) pipeline using Apache Flink, MariaDB, and Docker Compose. This project demonstrates how to build a modern streaming analytics system that processes da…
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All comp…
Free MLOps course from DataTalks.Club
A collection of all 774 local government areas (LGAs) in Nigeria, each with state name, latitude, longitude, and Wikidata identifier.
An implementation of an ingestion framework for lakehouses using Dagster, dltHub, slingdata, Trino, Dremio, and MinIO.
Simulate PostgreSQL Change Data Capture with realistic inserts, updates, deletes, and optional schema evolution. Ideal for testing CDC pipelines, Debezium connectors, and streaming systems.
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
Projects & Resources to help you become a better AI Developer.
Suite of tools containing an in-memory vector datastore and AI proxy
A simple web scraping program for real estate listings on Jumia Deals.
Tuberculosis classification in chest radiographs using a convolutional neural network and the fastai Python library.
A tutorial on how to build a visualization dashboard using Plotly Dash.
A repository of Python for data science resources in Jupyter notebooks.
YAML-based tool for monitoring metrics across multiple hosts.
Solved example questions from the book "Data Analytics and Decision Making", 4th Edition, by Albright, Winston, and Zappe, in R.
Code, Notebooks and Examples from Practical Business Python
Python code for common Machine Learning Algorithms
Multiclass image classification using a convolutional neural network.