Starred repositories
Creates diagrams from textual descriptions!
Enhancing Cross-Lingual Transfer through Reversible Transliteration: A Huffman-Based Approach for Low-Resource Languages (ACL 2025)
Scalable data preprocessing and curation toolkit for LLMs
BirdNET analyzer for scientific audio data processing.
Identify bird sounds in real time with this Android version of BirdNET. Bird sound recognition for more than 6,000 species worldwide.
Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease of use, backed by research.
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing…
Next-generation Punkt sentence boundary detection with zero dependencies
Visualize Different Text Splitting Methods
Sample code for deep learning & neural networks
Financial data platform for analysts, quants and AI agents.
Fast BM25 search in Python, powered by Numpy and Numba
Convert news articles, blog posts (and more) into audio podcast episodes using natural-sounding AI text-to-speech models
An extremely fast Python linter and code formatter, written in Rust.
A bridge between Lichess bots and chess engines
Curated list of datasets and tools for post-training.
Sunfish: a Python Chess Engine in 111 lines of code
A chess library for Python, with move generation and validation, PGN parsing and writing, Polyglot opening book reading, Gaviota tablebase probing, Syzygy tablebase probing, and UCI/XBoard engine c…
WHATWG-compliant and fast URL parser written in modern C++, part of Internet Archive, Node.js, Clickhouse, Redpanda, Kong, Telegram, Adguard, Datadog and Cloudflare Workers.
List of libraries, tools and APIs for web scraping and data processing.
Build a RAG dataset for your domain in just a few lines of code, using your XML sitemap
Chatmail Rust Core library, used by Android/iOS/desktop chatmail apps, bindings and bots 📧
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages