Stars
This repository contains the official implementation of the NeurIPS 2025 paper "Instance-Level Composed Image Retrieval".
Awesome Unified Multimodal Models
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A lightweight LMM-based Document Parsing Model
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
macOS on the Microsoft Surface Laptop 3 thanks to Acidanthera's OpenCore bootloader
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Detect and extract table regions from images in various scenarios, and correct perspective and rotation issues.
Collect the best currently open-source table recognition models, improve their pre-processing and post-processing, and convert the models to ONNX.
Use GitHub Actions to copy Docker images hosted outside China to a private Alibaba Cloud registry for use by servers in mainland China. Free and easy to use.
PyTorch code and models for V-JEPA self-supervised learning from video.
An educational resource to help anyone learn deep reinforcement learning.
Chinese translation of the book "Reinforcement Learning: An Introduction" (Second Edition)
Publish your Home-Assistant Instance using Matter.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
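"Mastering Bitcoin", Second Edition (Chinese translation), jointly produced by 区块链研习社 (Blockchain Study Society) and 云天明. The book has been retitled 《精通区块链编程》 ("Mastering Blockchain Programming"), Second Edition, and is published by China Machine Press and sold on JD.com.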
Xiaomi Home Integration for Home Assistant
Retrieval and Retrieval-augmented LLMs
A curated list of awesome LLM/VLM/VLA for Autonomous Driving (LLM4AD) resources (continually updated)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
Supercharge Your LLM Application Evaluations 🚀
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
✨✨Latest Advances on Multimodal Large Language Models
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Vuestic Admin is an open-source, ready-to-use admin template suite designed for rapid development, easy maintenance, and high accessibility. Built on Vuestic UI, Vue 3, Vite, Pinia, and Tailwind CS…
Infrared remote library for Arduino: send and receive infrared signals with multiple protocols