etl-pipeline

Here are 2,661 public repositories matching this topic...

apache / streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.

streaming apache easy-to-use etl-pipeline development-framework streampark operation-platform

Updated Nov 5, 2025
Java

AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

python data-science spark etl pyspark data-engineering etl-pipeline etl-job

Updated Jan 1, 2023
Python

risingwavelabs / risingwave

Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.

rust database kafka postgresql stream-processing data-engineering materialized-view etl-pipeline apache-iceberg

Updated Dec 18, 2025
Rust

Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

unstructured-data etl-pipeline llm-platform

Updated Dec 18, 2025
Python

san089 / Udacity-Data-Engineering-Projects

A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.

Updated Aug 26, 2022
Python

DataWithBaraa / sql-data-warehouse-project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

Updated Apr 23, 2025
TSQL

orchest / orchest

Build data pipelines, the easy way 🛠️

python docker kubernetes data-science machine-learning airflow cloud deployment jupyter etl ide pipelines self-hosted jupyterlab notebooks data-pipelines dag etl-pipeline orchest

Updated Jun 6, 2023
TypeScript

san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Updated Mar 9, 2020
Python

apache / hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

python data-science machine-learning etl pandas orchestration data-engineering data-analysis software-engineering feature-engineering dataframe hacktoberfest dag lineage etl-framework etl-pipeline rag mlops llmops

Updated Dec 6, 2025
Jupyter Notebook

YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

scala sql big-data spark etl distributed-computing etl-framework etl-pipeline

Updated Jan 24, 2024
Scala

airscholar / e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

docker big-data cassandra apache-spark data-storage postgresql data-engineering apache-kafka data-processing data-pipeline real-time-analytics containerization apache-zookeeper apache-airflow etl-pipeline

Updated Feb 14, 2025
Python

Open-Source-Legal / OpenContracts

Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

agent etl unstructured-data etl-pipeline vector-database llm prompt-engineering agentic-ai

Updated Dec 17, 2025
Python

Noobzik / ATL-Datamart

Practical lab on decision-support (business intelligence) architecture for students at EPSI and DC Paris. The goal is to deploy a data architecture from data retrieval through to presentation as data visualizations, by way of a data lake, a data warehouse, and a data mart.

data-warehouse minio datalake etl-pipeline data-infrastructure

Updated Apr 16, 2025
Python

martandsingh / ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer encounters in practice, using PySpark and Spark SQL for development. The course ends with a few case studies.

sql database spark hive hadoop etl pyspark data-engineering spark-streaming data-analysis databricks datalake spark-sql timetravel apachespark etl-pipeline deltalake

Updated Sep 26, 2025
Python

Mmodarre / AzureDataFactoryHOL

Azure Data Factory Hands On Lab - Step by Step - A Comprehensive Azure Data Factory and Mapping Data Flow step by step tutorial

azure azure-data-factory hands-on-lab azure-key-vault etl-pipeline adf-pipeline filter-activity lookup-activity foreach-activity metadata-activity mapping-dataflows hands-on-azure-data-factory azure-data-factory-tutorial azure-modern-data-warehous web-activity foreach-loop-activity

Updated Apr 27, 2021

SorellaLabs / brontes

A blazingly fast general purpose blockchain analytics engine specialized in systematic mev detection

rust ethereum evm etl-pipeline mev

Updated Jul 28, 2025
Rust

josephmachado / bitcoinMonitor

Near real time ETL to populate a dashboard.

docker postgres cron docker-compose python3 metabase pytest etl-pipeline nearrealtime

Updated Sep 9, 2025
Python

imsanjoykb / Data-Science-Regular-Bootcamp

Regular practice in data science, machine learning, and deep learning, solving ML project problems and analytical issues, to steadily build up my knowledge. The goal is to help learners with learning resources in the data science field.

Updated Jan 29, 2023
Jupyter Notebook

restarone / violet_rails

An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum, and email functionality. Developer-friendly and easily extendable for your next SaaS/XaaS project. Built with Rails 6, Devise, Sidekiq, and PostgreSQL.

Updated Dec 13, 2025
Ruby

data-engineering-community / data-engineering-project-template

This is a template you can use for your next data engineering portfolio project.

python data sql etl data-warehouse data-engineering etl-pipeline

Updated Sep 10, 2021
