- Tencent
- Shanghai, People's Republic of China
Stars
A simple yet powerful agent framework that delivers with open-source models
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
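Build a large language model from scratch with only basic Python; construct GLM4, Llama3, and RWKV6 step by step from scratch to gain a deep understanding of how large models work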
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
📋 A list of open LLMs available for commercial use.
The official GitHub page for the survey paper "A Survey of Large Language Models".
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Align Anything: Training All-modality Model with Feedback
Solve Visual Understanding with Reinforced VLMs
Code for Finetune like you pretrain: Improved finetuning of zero-shot vision models
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)
This repository contains the official implementation of the research papers, "MobileCLIP" CVPR 2024 and "MobileCLIP2" TMLR August 2025
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
A concise but complete implementation of CLIP with various experimental improvements from recent papers
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Chinese and English multimodal conversational language model
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
EVA Series: Visual Representation Fantasies from BAAI
LAVIS - A One-stop Library for Language-Vision Intelligence