Stars
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, a…
Rust SDK for Apache Avro - a data serialization system.
Rust crate for Substrait: Cross-Language Serialization for Relational Algebra
High Performance Inter-Thread Messaging Library
New generation of cloud-native and AI-native messaging infrastructure.
Apache Fluss is a streaming storage system built for real-time analytics.
Distributed stream processing engine in Rust
Empowering everyone to build reliable and efficient software.
Build Rust Cargo crates within a Java Maven Project
Source code for 'Modern Parallel Programming with C++ and Assembly' by Dan Kusswurm
Fast Static Symbol Table (FSST): efficient random-access string compression
A Comprehensive Survey and Experimental Comparison of Graph-based Approximate Nearest Neighbor Search
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Build ChatGPT over your data, all with natural language
Header-only C++/Python library for fast approximate nearest neighbors
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
A new, arguably faster implementation of Apache Spark from scratch in Rust
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
Most advanced key-value database written in Go, extremely fast, compatible with LSM tree and B+ tree.