Stars
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
A state-of-the-art-level open visual language model | multimodal pre-trained model
Google AI 2018 BERT PyTorch implementation
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Mobile-Agent: The Powerful GUI Agent Family
OpenMMLab's next-generation platform for general 3D object detection.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Solve Visual Understanding with Reinforced VLMs
Chinese text classification: TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, Transformer. Based on PyTorch, ready to use out of the box.
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
Nightly release of ControlNet 1.1
Most popular metrics used to evaluate object detection algorithms.
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you as the human researcher toward implementing your research ideas
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Repo for BenTsao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge.