Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Apr 4, 2026 - Python
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
A system for agentic LLM-powered data processing and ETL
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Scalable and efficient data transformation framework - backwards compatible with dbt.
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Implementing best practices for PySpark ETL jobs and applications.
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
A Python stream processing engine modeled after Yahoo! Pipes
Superlinked Inference Engine is an open-source inference server and production cluster for embeddings, reranking, and extraction.
Postgres to Elasticsearch/OpenSearch sync
Humans and AI agents, building knowledge bases together. Self-hosted document annotation, version control, semantic search, and MCP.
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
📄 ⚙️ ETL processes for medical and scientific papers