Stars
OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains
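AI-driven tool for in-depth analysis of academic papers: MinerU parsing + Claude-generated illustrated technical articles + code-level localization of innovations on GitHub, with results automatically saved to an Obsidian vault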
WeChat 4.0 database decryptor - extract keys from memory, decrypt SQLCipher 4 databases, real-time message monitor
Official implementation of "PyVision-RL: Forging Open Agentic Vision Models via RL."
Transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper.
Code2World: A GUI World Model via Renderable Code Generation
[ICLR 2026] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
A curated list of vibe coding references, collaborating with AI to write code.
[ICLR26] Official implementation of the paper "Urban Socio-Semantic Segmentation with Vision-Language Reasoning"
[ACL 2026 Findings] Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
[ICLR2026] There is No VAE: End-To-End Pixel-Space Generative Modeling Via Self-Supervised Pre-Training
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
RoxanneWAAANG / Qwen2.5-VL
Forked from QwenLM/Qwen3-VL
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
A Survey of Reinforcement Learning for Large Reasoning Models
🔎 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...