Gagan-KM
💭 in preparation to become a data engineer


DigitalPlat FreeDomain: Free Domain For Everyone

HTML · 178,141 stars · 3,606 forks · Updated Apr 24, 2026

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

Jupyter Notebook · 42,481 stars · 8,410 forks · Updated Jun 10, 2026

PlotLLM is an AI-powered Matplotlib plot generator built with Streamlit. Describe the plot you need in natural language, and an LLM reasoning model writes the Python code for you. Run it locally wit…

Python · 8 stars · Updated Feb 10, 2025

This repository implements a real-time credit card fraud detection pipeline using Kafka, Spark and Cassandra. Kafka continuously produces credit card transactions that will be analyzed by the Spark…

Scala · 24 stars · 11 forks · Updated Feb 3, 2021

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook · 41,688 stars · 7,876 forks · Updated Apr 2, 2026

Real-time data pipeline using Kafka + Spark + AWS S3 (Terraform) + Snowflake

Python · 2 stars · 1 fork · Updated Feb 26, 2024

Scrape tech articles, transform them, run sentiment analysis, and push the results to a MongoDB Atlas database; then build an interactive Streamlit dashboard hosted on Streamlit Community Cloud and automated with Gi…

Jupyter Notebook · 5 stars · Updated Jan 4, 2025

Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.

Java · 2,304 stars · 1,743 forks · Updated Jun 15, 2026

Apache Kafka - A distributed event streaming platform

Java · 32,826 stars · 15,278 forks · Updated Jun 16, 2026

Azure Data Engineer Project

10 stars · 4 forks · Updated Jan 15, 2024

Design data models, build data warehouses and data lakes, automate data pipelines, and work with massive datasets

Jupyter Notebook · 7 stars · 2 forks · Updated May 19, 2020

In this project, we will build an ETL (Extract, Transform, Load) pipeline using the Spotify API on AWS. The pipeline will retrieve data from the Spotify API, transform it into the desired format, and load it…

Jupyter Notebook · 25 stars · 4 forks · Updated May 6, 2023

Big Data Engineering Course and project work.

Python · 7 stars · 5 forks · Updated Mar 9, 2024

Cool DE Projects

Jupyter Notebook · 75 stars · 8 forks · Updated Mar 22, 2026

Repository containing projects and summaries of my studies in the field of Data Engineering.

HTML · 55 stars · 21 forks · Updated May 22, 2026

Example end-to-end data engineering project.

Python · 1,411 stars · 278 forks · Updated Dec 8, 2022

Personal Data Engineering Projects

Jupyter Notebook · 1,017 stars · 209 forks · Updated Feb 8, 2023

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on the cloud, Data Warehousing and Data Lake development.

Python · 1,930 stars · 585 forks · Updated Aug 26, 2022

Roadmap for Data Engineering

Java · 245 stars · 31 forks · Updated Jun 20, 2024

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python · 45,827 stars · 17,247 forks · Updated Jun 16, 2026

Apache Superset is a Data Visualization and Data Exploration Platform

TypeScript · 73,313 stars · 17,620 forks · Updated Jun 16, 2026

Learn how to develop, deploy and iterate on production-grade ML applications.

Jupyter Notebook · 48,138 stars · 7,576 forks · Updated Mar 4, 2026

AWS-native chatbot using Bedrock

TypeScript · 1,306 stars · 531 forks · Updated Jun 16, 2026

This repository helps you learn Python and Machine Learning from scratch.

Jupyter Notebook · 2,170 stars · 918 forks · Updated Jun 16, 2026