Starred repositories
NextPlaid, ColGREP: Multi-vector search, from database to coding agents.
Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and m…
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
The fastest BM25 scoring engine: 2,300x faster than BM25S. 28K QPS on 8.8M docs. 5 BM25 variants (Robertson, Lucene, ATIRE, BM25L, BM25+). Memory-mapped persistence, BMW pruning, streaming indexing…
A missing piece of the Python multitask (both threads and processes) API: An extension that supports stateful worker pools & size-aware iterators.
SkyRL: A Modular Full-stack RL Library for LLMs
Code for "MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization"
The AILuminate v1.1 benchmark suite is an AI risk assessment benchmark developed with broad involvement from leading AI companies, academia, and civil society.
bloom - evaluate any behavior immediately 🌸🌱
Cypress bot to book restaurants on Tock and optionally report attempts to Slack
PromptMII: Meta-Learning Instruction Induction for LLMs
Classifiers for "Investigating Affective Use and Emotional Well-being in ChatGPT"
The conversational control layer for customer-facing AI agents - Parlant is a context-engineering framework optimized for controlling customer interactions.
Lightweight wrapper for generating and editing images from Gemini 2.5 Flash Image/Nano Banana
Format and normalize Chinese names into Western forms
Embeddable library or single binary for indexing and searching 1B vectors
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
One-page guides that I let my subscribed/customised agents consume to perform actions
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.5, GPT-OSS, Llama, and more!
A powerful coding agent toolkit providing semantic retrieval and editing capabilities (MCP server & other integrations)
Synthetic data curation for post-training and structured data extraction
Detect and redact PII locally with SOTA performance