
Pioneering Automated GUI Interaction with Native Agents

Python 11,004 830 Updated Jan 27, 2026

[CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Python 85 6 Updated Jun 18, 2026

Embedding model geared toward multimodal RAG; ranked first both overall and on VisDoc on the MMEB benchmark

Python 36 1 Updated Jun 16, 2026

The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 10,613 1,595 Updated Jun 15, 2026
Python 9 Updated Jun 18, 2026
Python 52 8 Updated Oct 20, 2025

🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning

Python 397 23 Updated Apr 3, 2026

SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Foundation Models

Python 32 3 Updated Nov 7, 2025

A Chinese financial trading framework based on multi-agent LLMs - TradingAgents Chinese-enhanced edition

Python 28,666 6,067 Updated Apr 20, 2026

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

86 7 Updated Jun 6, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 657 58 Updated Mar 17, 2026

skip-vision: efficient and scalable acceleration of vision-language models via adaptive token skipping

Python 12 1 Updated Oct 31, 2025

Inference, Fine Tuning and many more recipes with Gemma family of models

Jupyter Notebook 305 47 Updated Apr 2, 2026

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,531 1,314 Updated Jul 9, 2025

[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding

Python 158 Updated Dec 9, 2025

[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

Python 70 1 Updated Jul 22, 2025

A new zero-shot framework for exploring and searching for language-described targets in unknown environments, based on a Large Vision Language Model.

Python 74 5 Updated Nov 28, 2024

[IROS 2025 Best Paper Award Finalist & IEEE TRO 2026] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

Python 3,065 207 Updated May 29, 2026

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python 10,061 783 Updated Sep 22, 2025
Python 54 6 Updated Dec 23, 2024

NVIDIA Isaac GR00T N1.7 - A Foundation Model for Generalist Robots.

Python 7,378 1,269 Updated Jun 17, 2026

Efficient Triton Kernels for LLM Training

Python 6,444 542 Updated Jun 17, 2026

A live-stream development of RL tuning for LLM agents

Python 4,103 578 Updated May 5, 2026

Fully local web research and report writing assistant

Python 9,218 966 Updated Jun 9, 2026

🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation

Python 19,865 2,288 Updated Jun 12, 2026

[NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"

Python 275 12 Updated Nov 5, 2025

(CVPR 2025 highlight✨) Official repository of paper "LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models"

Python 598 31 Updated Feb 4, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,705 1,062 Updated Apr 30, 2026