The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
A modern data marketplace that eases collaboration among diverse users (business users, analysts, and engineers), increasing efficiency and agility in data projects on AWS.
Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principles on Adventure Works. Features programmatic model generation, event-enhanced Puppini bridges, and temporal resolution across DAS/DAB/DAR layers.
A metadata-driven framework for Databricks Lakeflow Declarative Pipelines (formerly Delta Live Tables) that generates production-ready PySpark code.
Databricks DLT Apparel Pipeline Project: Learn medallion architecture, streaming, and data engineering with Delta Live Tables. Includes synthetic data, step-by-step guide, and certification prep.
Building a Data Lakehouse with open-source technologies. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, plus visualization and a recommendation app.
This project implements a Lakehouse Medallion Architecture using modern data stack tools such as Fivetran, Snowflake, and dbt. The fictitious organization is an e-commerce company.
Leverage the Databricks Solution Accelerator for DNS analytics to accelerate time to detection and response across petabytes of data. Tap into DNS traffic logs, enrich them with streaming threat intelligence, and apply advanced analytics to detect DNS abnormalities and prevent malicious attacks.