Stars
The open-source version of MetricFlow allows you to define, build, and maintain metrics in code.
DataKit is a browser-based data analysis platform that processes multi-gigabyte files locally. All processing happens in your browser - no data is sent to external servers.
Substack API client is a modern TypeScript library that provides a clean, entity-based interface to interact with Substack publications, posts, comments, and user profiles.
This repo is for the LinkedIn Learning course: Data Quality: Transactions, Ingestions, and Storage
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
Production-Grade Container Scheduling and Management
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
New file format for storage of large columnar datasets.
Open-source vector similarity search for Postgres
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…
📙 Awesome Data Catalogs and Observability Platforms.
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing, and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
This repo is for the LinkedIn Learning course: Data Pipeline Automation with GitHub Actions
This repo is for the LinkedIn Learning course: Data Quality: Analytics and Serving
✨ Build dashboards with end-to-end version control. 🔋 CLI w/ batteries included, no infra required. Develop on your laptop for instant results, deploy changes safely (with automated checks), and ke…
MetricFlow allows you to define, build, and maintain metrics in code.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊
SQL databases in Python, designed for simplicity, compatibility, and robustness.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
🟣 NoSQL interview questions and answers to help you prepare for your next software architecture and design patterns interview in 2025.
An orchestration platform for the development, production, and observation of data assets.
In this repository we store all materials for dlt workshops, courses, etc.
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data…
Open, Multi-modal Catalog for Data & AI
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Simple, unified interface to multiple Generative AI providers