Stars
Probabilistic programming with large language models
The simplest, fastest repository for training/finetuning small-sized VLMs.
[ICCV2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Fully open reproduction of DeepSeek-R1
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Recipes to scale inference-time compute of open models
Python toolbox for sampling Determinantal Point Processes
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Everything about the SmolLM and SmolVLM family of models
Run the latest LLMs and VLMs across GPU, NPU, and CPU with PC (Python/C++) & mobile (Android & iOS) support, running quickly with OpenAI gpt-oss, Granite4, Qwen3VL, Gemma 3n and more.
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
A Framework of Small-scale Large Multimodal Models
An open-source implementation for training LLaVA-NeXT.
SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
DeepSeek-VL: Towards Real-World Vision-Language Understanding
A JAX research toolkit for building, editing, and visualizing neural networks.
Entropy Based Sampling and Parallel CoT Decoding
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.