Stars
Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.
Probabilistic programming with large language models
The simplest, fastest repository for training/finetuning small-sized VLMs.
[ICCV2025] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Recipes for shrinking, optimizing, and customizing cutting-edge vision models. 💜
Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge.
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
Fully open reproduction of DeepSeek-R1
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Recipes to scale inference-time compute of open models
Python toolbox for sampling Determinantal Point Processes
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
Everything about the SmolLM and SmolVLM family of models
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
A Framework of Small-scale Large Multimodal Models
An open-source implementation for training LLaVA-NeXT.
SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
DeepSeek-VL: Towards Real-World Vision-Language Understanding
A JAX research toolkit for building, editing, and visualizing neural networks.