Starred repositories
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
Tiny language detector + tokenizer for 50+ SE/South Asian languages (Burmese, Karen, Chin, Shan, Mon, Khmer, Lao, Thai, Tamil, Hindi, …). Revamp of pyidaungsu.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, a…
Calculate quality metrics with FFmpeg (SSIM, PSNR, VMAF, VIF)
MiMo-Audio: Audio Language Models are Few-Shot Learners
Open-source music recognition app with Concert Finder, Story Cards and Personal Stats
An open-source Shazam client for Linux, written in Rust.
A vector index built on TurboQuant, written in Rust with Python bindings
Free, no-nonsense, super fast blogging.
The AI coding agent that runs on stolen Chipotle compute 🌯 Fork of OpenCode with Pepper AI as default model. Community project to add providers from Home Depot, Lowes, Target, Starbucks & more.
A reasonably complete and well-tested Go port of httpbin, with zero dependencies outside the Go stdlib.
HTTP server for pytest to test HTTP clients
Creates diagrams from textual descriptions!
Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource Languages (ACL 2025)
Scalable data preprocessing and curation toolkit for LLMs
BirdNET analyzer for scientific audio data processing.
Identify bird sounds in real time with this Android version of BirdNET. Bird sound recognition for more than 6,000 species worldwide.
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing…
Next-generation Punkt sentence boundary detection with zero dependencies
Visualize Different Text Splitting Methods
Sample code for deep learning & neural networks