Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Making large AI models cheaper, faster and more accessible
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
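《开源大模型食用指南》 (A Guide to Using Open-Source Large Models): a Linux-based tutorial, tailored for Chinese users, on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large models (MLLMs)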
Fast, small, and fully autonomous AI personal assistant infrastructure, ANY OS, ANY PLATFORM — deploy anywhere, swap anything 🦀
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Janus-Series: Unified Multimodal Understanding and Generation Models
End-to-End Object Detection with Transformers
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
A state-of-the-art-level open visual language model | multimodal pretrained model
PyTorch implementations of deep reinforcement learning algorithms and environments
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
Segment Anything in Medical Images
3D ResNets for Action Recognition (CVPR 2018)
A collection of loss functions for medical image segmentation
A list of medical imaging datasets (An Index for Medical Imaging Datasets)
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
PyTorch framework for deep learning on point clouds.