Starred repositories
An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.
SQLStorm: Taking Database Benchmarking into the LLM Era
Official Repository of "LLM × DATA" Survey Paper
Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!
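An AI model aggregation, management, relay, and distribution system: one application to manage all your AI models. It converts calls to multiple large models into a unified format, supports the OpenAI, Claude, Gemini, and other formats, and can be used by individuals or within enterprises to manage and distribute channels. 🍥 The next-generation LLM gateway and AI asset management system supports multiple languages.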
Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
Data and tools for generating and inspecting OLMo pre-training data.
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
GPTuner is a manual-reading database tuning system that leverages domain knowledge automatically and extensively to enhance the knob tuning process.
BigVectorBench advances vector database benchmarking by defining and evaluating the embedding performance of heterogeneous data and abstracting compound queries, which can be multimodal or single-m…
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
Code for our ACL 2023 Paper "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models".
Open-source search and retrieval database for AI applications.
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
AlayaLite – A Fast, Flexible Vector Database for Everyone.
Vector search engine inside Milvus, integrating FAISS, HNSW, DiskANN.
A library for efficient similarity search and clustering of dense vectors.
Elegant reading of real-time and hottest news
InkFuse - An Experimental Database Runtime Unifying Vectorized and Compiled Query Execution.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream processing, observability, analytics and AI/ML
Global-Scale Sustainable Blockchain Fabric
OpenResume is a powerful open-source resume builder and resume parser. https://open-resume.com/
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Reference implementation of code generation projects from Facebook AI Research. General toolkit to apply machine learning to code, from dataset creation to model training and evaluation. Comes with…
OpenHuFu is an open-source data federation system that supports collaborative queries over multiple databases with security guarantees.