View HopeFan's full-sized avatar
🎯
Focusing

Open Source version of MetricFlow allows you to define, build, and maintain metrics in code.

Python 4 Updated Sep 23, 2023

DataKit is a browser-based data analysis platform that processes multi-gigabyte files locally. All processing happens in your browser - no data is sent to external servers.

TypeScript 264 14 Updated Dec 8, 2025

Substack API client is a modern TypeScript library that provides a clean, entity-based interface to interact with Substack publications, posts, comments, and user profiles.

TypeScript 37 6 Updated Nov 18, 2025

This repo is for the LinkedIn Learning course: Data Quality: Transactions, Ingestions, and Storage

Jupyter Notebook 4 3 Updated Nov 10, 2025

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

Jupyter Notebook 34,051 7,210 Updated Dec 19, 2025

Production-Grade Container Scheduling and Management

Go 119,429 42,024 Updated Dec 24, 2025

Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…

Rust 5,851 503 Updated Dec 24, 2025

New file format for storage of large columnar datasets.

C++ 660 59 Updated Dec 22, 2025

Open-source vector similarity search for Postgres

C 18,939 1,005 Updated Dec 13, 2025

Ollama Python library

Python 9,052 876 Updated Dec 11, 2025

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Java 1,370 132 Updated Dec 16, 2025

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…

TypeScript 8,284 1,570 Updated Dec 24, 2025

📙 Awesome Data Catalogs and Observability Platforms.

955 68 Updated Aug 14, 2025

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…

Python 32,866 5,122 Updated Dec 24, 2025

This repo is for the LinkedIn Learning course: Data Pipeline Automation with GitHub Actions

HTML 65 79 Updated Dec 24, 2025

This repo is for the LinkedIn Learning course: Data Quality: Analytics and Serving

Jupyter Notebook 6 6 Updated Jul 7, 2025

✨ Build dashboards with end-to-end version control. 🔋 CLI w/ batteries included, no infra required. Develop on your laptop for instant results, deploy changes safely (with automated checks), and ke…

Python 89 10 Updated Dec 24, 2025

MetricFlow allows you to define, build, and maintain metrics in code.

Python 1,421 137 Updated Dec 23, 2025

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 12,330 3,425 Updated Dec 24, 2025

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

Clojure 45,287 6,122 Updated Dec 24, 2025

SQL databases in Python, designed for simplicity, compatibility, and robustness.

Python 17,386 792 Updated Dec 24, 2025

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Python 2,268 251 Updated Dec 23, 2025

🟣 NoSQL interview questions and answers to help you prepare for your next software architecture and design patterns interview in 2025.

28 9 Updated May 19, 2025

An orchestration platform for the development, production, and observation of data assets.

Python 14,639 1,915 Updated Dec 24, 2025

In this repository we store all materials for dlt workshops, courses, etc.

Python 245 43 Updated Dec 11, 2025

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data…

Python 4,912 2,073 Updated Dec 19, 2025

Open, Multi-modal Catalog for Data & AI

Java 3,231 557 Updated Dec 18, 2025

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Python 4,713 972 Updated Dec 18, 2025

Simple, unified interface to multiple Generative AI providers

Python 13,250 1,357 Updated Dec 15, 2025