Stars
Model Context Protocol (MCP) Programming Crash Course
The official Java SDK for Model Context Protocol servers and clients. Maintained in collaboration with Spring AI
Official PyTorch Implementation of DenseDiffusion (ICCV 2023)
Code for "Semantic Object Accuracy for Generative Text-to-Image Synthesis" (TPAMI 2020)
Quick scripts to calculate CLIP text-image similarity
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
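🔥 Free public-service ChatGPT API (Free ChatGPT API, GPT4 API). Connects directly with no proxy required and accesses ChatGPT using the standard OpenAI API key format. Works with projects such as ChatGPT-next-web, ChatGPT-Midjourney, Lobe-chat, Botgem, FastGPT, and Immersive Translate (沉浸式翻译).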
[ICLR 2025] Benchmarking Agentic Workflow Generation
Repository containing all the code necessary to get started on the SoccerNet Dense Video Captioning challenge.
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
hyc2026 / sft-qwen2.5-omni-thinker
Forked from volcengine/verl
verl: Volcano Engine Reinforcement Learning for LLMs
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
[CVPR 2024] InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Directed Diffusion: Direct Control of Object Placement through Attention Guidance (AAAI 2024)
🚴 Call stack profiler for Python. Shows you why your code is slow!
Training-free Regional Prompting for Diffusion Transformers 🔥
[NeurIPS 2023] Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
This is an official repository for the paper, NoiseCollage, which is a revolutionary extension of text-to-image diffusion models for layout-aware image generation.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.