
deltalake

Here are 84 public repositories matching this topic...

An end-to-end Netflix data engineering pipeline built on Microsoft Azure. This project ingests raw Netflix data, applies PySpark transformations, enforces data quality with Delta Live Tables, and orchestrates workflows with Azure Data Factory and Databricks.

  • Updated Jul 8, 2025
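Delta Live Tables expresses data-quality rules declaratively as expectations (for example `expect_or_drop`, which drops rows that fail a condition). As a rough, library-free sketch of that idea, the Python below applies named rules to a batch of records, keeps only the rows that pass every rule, and counts violations per rule. The rule names, columns, and thresholds are hypothetical; they are not taken from the project or from the DLT API.

```python
# Plain-Python sketch of "expect or drop" style data-quality rules.
# Rule names and columns are hypothetical, for illustration only.
from typing import Any, Callable

Record = dict[str, Any]
Rule = Callable[[Record], bool]

# Hypothetical expectations for a Netflix titles feed.
EXPECTATIONS: dict[str, Rule] = {
    "show_id_not_null": lambda r: r.get("show_id") is not None,
    "valid_type": lambda r: r.get("type") in {"Movie", "TV Show"},
    "release_year_plausible": lambda r: isinstance(r.get("release_year"), int)
    and r["release_year"] >= 1900,
}


def apply_expectations(
    records: list[Record], rules: dict[str, Rule]
) -> tuple[list[Record], dict[str, int]]:
    """Keep records that pass every rule and count failures per rule."""
    kept: list[Record] = []
    failures = {name: 0 for name in rules}
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            failures[name] += 1
        if not failed:
            kept.append(record)
    return kept, failures


rows = [
    {"show_id": "s1", "type": "Movie", "release_year": 2019},
    {"show_id": None, "type": "Movie", "release_year": 2020},
    {"show_id": "s3", "type": "Podcast", "release_year": 1850},
]
kept, failures = apply_expectations(rows, EXPECTATIONS)
```

Here only the first row survives; the second fails the null check, and the third fails both the type and release-year rules. Counting failures per rule, rather than just discarding rows, mirrors how expectation metrics make dropped data visible instead of silently lost.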

An end-to-end data engineering pipeline for NYC Green Taxi trip records, built on Microsoft Azure. This project ingests the January to December 2024 Parquet files from the NYC Taxi API into a Bronze Delta Lake layer and cleans and enriches the data in a Silver layer with PySpark on Azure Databricks. It then writes the transformed data to a Gold layer in Delta format.

  • Updated Jul 8, 2025
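The Bronze and Silver stages of the taxi pipeline above can be sketched in plain Python, assuming rows arrive as dictionaries. Bronze keeps each row as ingested and only tags its source file; Silver drops invalid trips and adds a derived column. The column names follow the green-taxi `lpep_*` naming, but the cleaning rules, the `trip_duration_min` column, and the file name are hypothetical. Writing the Gold layer in Delta format needs Spark and Delta Lake and is not modeled here.

```python
# Plain-Python sketch of the Bronze -> Silver steps of a medallion pipeline.
# Cleaning rules and the derived column are assumptions for illustration.
from datetime import datetime
from typing import Any

Row = dict[str, Any]


def to_bronze(raw_rows: list[Row], source_file: str) -> list[Row]:
    """Bronze: keep every field as ingested and record where it came from."""
    return [{**row, "_source_file": source_file} for row in raw_rows]


def to_silver(bronze_rows: list[Row]) -> list[Row]:
    """Silver: drop invalid trips and add a derived trip duration."""
    silver: list[Row] = []
    for row in bronze_rows:
        pickup: datetime = row["lpep_pickup_datetime"]
        dropoff: datetime = row["lpep_dropoff_datetime"]
        # Hypothetical rules: distance must be positive, dropoff after pickup.
        if row["trip_distance"] <= 0 or dropoff <= pickup:
            continue
        minutes = (dropoff - pickup).total_seconds() / 60
        silver.append({**row, "trip_duration_min": round(minutes, 2)})
    return silver


raw = [
    {
        "lpep_pickup_datetime": datetime(2024, 1, 1, 8, 0),
        "lpep_dropoff_datetime": datetime(2024, 1, 1, 8, 15),
        "trip_distance": 2.5,
    },
    {
        "lpep_pickup_datetime": datetime(2024, 1, 1, 9, 0),
        "lpep_dropoff_datetime": datetime(2024, 1, 1, 8, 50),
        "trip_distance": 1.0,
    },
]
bronze = to_bronze(raw, "green_tripdata_2024-01.parquet")
silver = to_silver(bronze)
```

Keeping Bronze untouched means a bad cleaning rule in Silver can be fixed and replayed from the raw layer without re-downloading the source files.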

This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage. Airflow is used for orchestrating the data workflows.

  • Updated Apr 23, 2025
  • Python
