- KTH Royal Institute of Technology
- Stockholm
- http://www.karamel.io/
Stars
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache Spark, Apache Flink, ...
A command-line tool for launching Apache Spark clusters.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Ephemeral Hadoop clusters using Google Compute Platform
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.