
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Python 220 16 Updated Mar 31, 2025

DeepSeek-OCR as Vision Tower

Python 1 Updated Nov 21, 2025

You like pytorch? You like micrograd? You love tinygrad! ❤️

Python 31,373 3,898 Updated Feb 16, 2026

A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating the potential of cross-task information transfer in persona…

Python 128 3 Updated Dec 25, 2025

Research implementation to investigate methods of integrating the speech modality into pre-trained language models

Python 1 Updated Dec 26, 2025

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,867 205 Updated Jan 16, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,818 632 Updated Feb 13, 2026

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 927 931 Updated Jul 4, 2024

A paper list of recent work on token compression for ViT and VLMs

831 39 Updated Feb 10, 2026

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 3,360 211 Updated May 19, 2025

Reading list for research topics in multimodal machine learning

6,816 896 Updated Aug 20, 2024

PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

Python 305 63 Updated Feb 16, 2026

BLIP-2 implementation for training vision-language models. Q-Former + frozen encoders + any LLM. Colab-ready notebooks with MoE variant.

Jupyter Notebook 3 Updated Dec 19, 2025

An API-compatible, drop-in replacement for Apple's Foundation Models framework with support for custom language model providers.

Swift 771 56 Updated Feb 11, 2026

From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓

3,538 198 Updated May 7, 2025

A Framework of Small-scale Large Multimodal Models

Python 962 96 Updated Feb 7, 2026

Studying the effect of different connectors (linear, MLP, and cross-attention) to analyze which paradigms LLMs use, or to make a best guess

3 Updated Nov 26, 2025

A curated list of vision-and-language pre-training (VLP). :-)

62 7 Updated Jul 6, 2022

Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Python 225 4 Updated Feb 13, 2026

MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.

Python 394 21 Updated Aug 26, 2025

Turn Apple's CVPR-25 FastVLM encoder into a reproducible baseline for mobile apps. First complete implementation achieving <250ms multimodal inference on iPhone.

Python 11 2 Updated Oct 27, 2025

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025

Python 276 9 Updated May 26, 2025

Python 4,563 445 Updated Sep 14, 2025

Fully Open Framework for Democratized Multimodal Reinforcement Learning.

Python 40 3 Updated Dec 19, 2025

Famous Vision Language Models and Their Architectures

Markdown 1,178 54 Updated Jan 11, 2026

Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…

Kotlin 7,711 948 Updated Feb 12, 2026

The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Ins…

443 49 Updated Sep 25, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 725 32 Updated Feb 12, 2026

Open CoreUI - A rewritten Open WebUI in Rust, significantly reducing memory and resource usage, requiring no dependency services, no Docker, with both a server version and a Tauri-based desktop cli…

Svelte 1,493 105 Updated Dec 28, 2025

Unified LLM orchestration and gateway service for DGX Spark — dynamically manages vLLM, SGLang, and TensorRT-LLM backends under a single OpenAI-compatible API.

Python 3 1 Updated Feb 13, 2026