Data standard and archive storage for structured FollowTheMoney data, leaked data, private and public document collections.
Updated Feb 11, 2026 - Python
Smart Automation Tool for building modern Data Lakes and Data Pipelines
A Python ETL library for creating declarative data pipelines.
Databricks Platform - Architecture, Security, Automation and much more!!
Don't Panic. This guide will help you when it feels like the end of the world.
This repository contains curated code snippets, notebooks, and examples featured in my articles published on Substack/Medium/Towards Data Science.
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
A Flight SQL proxy for Delta Lake. Query Delta tables via Apache Arrow Flight with efficient streaming and predicate pushdown.
db2ixf is a Python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
Python ELT pipeline with Airflow orchestration that gathers and transforms API data on asteroid approach dates, metrics, and descriptive attributes from two endpoints, using both batch and incremental approaches, with loading tasks using Deltalake.
AI-driven AML investigation pipeline using Databricks, Delta Lake, and Azure OpenAI
End-to-end Azure data engineering pipeline using Spotify dummy data with Medallion Architecture, Databricks, Delta Live Tables, and incremental loading.
Development scaffold for test-driven PySpark Structured Streaming with fast local testing
Deltalake examples designed to be run on AWS Elastic MapReduce (EMR) via EMR Studio or EMR Notebooks
Spark-free Python utilities for Microsoft Fabric focused on Data Engineering using Polars and delta-rs