noahshpak
SF // NYC

Sponsoring @jonhoo


AI agents running research on single-GPU nanochat training automatically

Python · 86,538 stars · 12,534 forks · Updated Mar 26, 2026

Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory

C++ · 286 stars · 12 forks · Updated May 4, 2026

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

C++ · 4,157 stars · 323 forks · Updated May 28, 2026

A lightweight data processing framework built on DuckDB and 3FS.

Python · 4,960 stars · 444 forks · Updated Mar 5, 2025

🗻 Log-structured, embeddable key-value storage engine written in Rust

Rust · 2,113 stars · 100 forks · Updated Jun 13, 2026

🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines

Python · 4,141 stars · 278 forks · Updated Jun 12, 2026

Official Repo for InSTA: Towards Internet-Scale Training For Agents

Python · 56 stars · 4 forks · Updated Jul 11, 2025

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

Scala · 373 stars · 82 forks · Updated Jun 10, 2026

SeaweedFS is a distributed storage system for object storage (S3), file systems, and Iceberg tables, designed to handle billions of files with O(1) disk access and effortless horizontal scaling.

Go · 32,893 stars · 2,846 forks · Updated Jun 13, 2026

A Kubernetes operator to install and manage Dragonfly instances.

Go · 333 stars · 100 forks · Updated Jun 12, 2026

Scalable and efficient data transformation framework - backwards compatible with dbt.

Python · 3,134 stars · 401 forks · Updated Jun 13, 2026

Readings in Databases

8,104 stars · 926 forks · Updated Sep 9, 2024

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python · 8,847 stars · 1,303 forks · Updated Jun 13, 2026

A modern replacement for Redis and Memcached

C++ · 30,643 stars · 1,188 forks · Updated Jun 13, 2026

Recipes to scale inference-time compute of open models

Python · 1,132 stars · 132 forks · Updated May 26, 2026

Summarize existing representative LLM text datasets.

1,472 stars · 150 forks · Updated Mar 11, 2026

Train high-quality text-to-image diffusion models in a data & compute efficient manner

Python · 514 stars · 38 forks · Updated Mar 27, 2025

Streamlines and simplifies prompt design for both developers and non-technical users with a low-code approach.

Python · 1,149 stars · 95 forks · Updated Feb 12, 2026

Python insert-only client for River.

Python · 10 stars · 4 forks · Updated Oct 2, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 82,772 stars · 18,023 forks · Updated Jun 13, 2026

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

Rust · 5,562 stars · 488 forks · Updated Jun 12, 2026

AIStore: scalable storage for AI applications

Go · 1,880 stars · 264 forks · Updated Jun 12, 2026

Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…

Rust · 6,646 stars · 712 forks · Updated Jun 12, 2026

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java · 12,922 stars · 3,661 forks · Updated Jun 13, 2026

Curate better data for LLMs

Python · 1,071 stars · 105 forks · Updated Mar 19, 2024

Resources, examples & tutorials for multimodal AI, RAG and agents using vector search and LLMs

Jupyter Notebook · 961 stars · 169 forks · Updated Apr 24, 2026

Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

HTML · 10,594 stars · 907 forks · Updated Jun 12, 2026