Stars
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
A high-throughput and memory-efficient inference and serving engine for LLMs
Code release for ConvNeXt V2 model
RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO and designed for fine-tuning.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
An introductory LLM tutorial for developers: the Chinese edition of Andrew Ng's large language model course series
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Vary-tiny codebase built on LAVIS (for training from scratch) and a dataset of PDF image-text pairs (about 600k, including English and Chinese)
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
The unified repository for few-shot font generation methods. This repository includes FUNIT (ICCV'19), DM-Font (ECCV'20), LF-Font (AAAI'21) and MX-Font (ICCV'21).
Tools and instructions for importing custom models into a certain anime game
Code for the paper "Font Representation Learning via Paired-glyph Matching" (BMVC2022)
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
Create and modify Word documents with Python
Easy-to-use method for color detection. This method uses multiple ranges and can automatically determine them.
🎨 Color recognition, classification, and detection on a webcam stream, a video, or a single image, using a K-Nearest Neighbors (KNN) classifier trained on color histogram features with OpenCV.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Unicoder model for understanding and generation.