Stars
ZeroFS - The Filesystem That Makes S3 your Primary Storage. ZeroFS is 9P/NFS/NBD on top of S3. Initially built for www.merklemap.com
High-performance distributed multi-tier cache system. Built in Rust.
A cloud native embedded storage engine built on object storage.
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.
Spark integrations for working with Lance datasets
Integration between Lance and Ray for distributed data processing
The observability platform for Iceberg lakehouses.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A lightweight data processing framework built on DuckDB and 3FS.
Perforator is a cluster-wide continuous profiling tool designed for large data centers
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
antgroup / ant-ray
Forked from ray-project/ray. Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay is forked from ray, offering incremental new features on top …
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
A collection of RBIR projects and posts for anyone interested in joining this journey.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
Eclipse Theia is a cloud & desktop IDE framework implemented in TypeScript.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
New file format for storage of large columnar datasets.
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Alluxio, data orchestration for analytics and machine learning in the cloud