Starred repositories
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured tr…
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
A Notebook with Flexible Customization and Easy Integration.
🤖 GPT Vision, Open Source Vision components for GPTs, generative AI, and LLM projects. Not only UI Components.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
A lightweight library for portable low-level GPU computation using WebGPU.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Fine-tune SAM (Segment Anything Model) for computer vision tasks such as semantic segmentation, matting, detection, etc. in specific scenarios
The production-scale datacenter profiler (C/C++, Go, Rust, Python, Java, NodeJS, .NET, PHP, Ruby, Perl, ...)
This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.
Using Low-rank adaptation to quickly fine-tune diffusion models.
Pure JS implementation of the HTML Canvas 2D drawing API
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
ML records from the 1110 Lab at BUPT. More details are available at: https://mathpretty.com/10388.html