Small data pipeline with Airflow scheduling
Updated May 5, 2023 - Jupyter Notebook
Databricks provides a unified, open platform for all your data. It empowers data scientists, data engineers and data analysts with a simple collaborative environment to run interactive and scheduled data analysis workloads.
A data stack playground; includes Spark, Kafka, Airbyte, etc.
Open-source Delta Lake data quality and management tool. Go-first, dbt-compatible, CLI-friendly. Supports profiling, validation, lineage, and alerts.
An end-to-end Netflix data engineering pipeline built on Microsoft Azure. This project ingests raw Netflix data, applies PySpark transformations, enforces data quality with Delta Live Tables, and orchestrates workflows via Azure Data Factory and Databricks.
A summary of learning data science with Databricks.
An end-to-end data engineering pipeline for NYC Green Taxi trip records, built on Microsoft Azure. This project ingests Jan–Dec 2024 Parquet files from the NYC Taxi API into a Bronze Delta Lake layer, cleans and enriches the data in a Silver layer with PySpark on Azure Databricks, then saves the transformed data to the Gold layer in Delta format.
AI-driven AML investigation pipeline using Databricks, Delta Lake, and Azure OpenAI
Kionas is a data warehouse system!
Data engineering project for acquiring data, building a Delta Lake with Python, and running analyses with Apache Spark.
This project implements a real-time data pipeline using Apache Airflow, Kafka, Apache Spark, and Delta Lake. It supports both batch (Coldpath) and real-time (Hotpath) data ingestion, processing, and storage. Airflow is used for orchestrating the data workflows.
A personal sandbox for interacting with lake storage formats such as Delta Lake, Iceberg, and Hudi.
This repository includes all files that make up the design and unification of the AdventureWorks and WideWorldAdventure databases.