Stars
LLM Council works together to answer your hardest questions
Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Cell2Sentence: Teaching Large Language Models the Language of Biology
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enabling layer-wise analysis of hidden states and predictions.
YSDA course in Natural Language Processing
A framework for evaluating Machine Translation models.
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Toolkit used to collect translations from various online providers and LLMs
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
A benchmark with locally sourced multilingual questions for 31 languages.
Example competitions for the CodaLab project.
Examples and guides for using the Gemini API
A curated list of research papers and resources on code-switching
Quantifying Language Confusion in LLMs.
[NeurIPS 2025 D&B Track] Evaluation Code Repo for Paper "PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts"
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Generate text images for training deep learning OCR models
Render documents on virtual paper with folds and other types of damage using Blender geometry nodes.
A Large-scale Dataset for training and evaluating a model's ability at Dense Text Image Generation