- Shanghai Jiao Tong University
- Shanghai
Stars
"🐈 nanobot: The Ultra-Lightweight Personal AI Agent"
🏛️ Three Departments and Six Ministries (三省六部制) · OpenClaw Multi-Agent Orchestration System: 9 specialized AI agents with real-time dashboard, model config, and full audit trails
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
Official inference repo for FLUX.1 models
Open-source multi-turn evaluation toolkit for LLMs. Under construction...
A benchmark evolving framework and a benchmark for LLMs' multi-turn instruction following evaluation.
Downloads videos and playlists from YouTube
MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation Model
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
The baselines of ARC-Challenge-Interspeech2026
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…
Shanghai Jiao Tong University LaTeX templates for thesis proposals and annual reports (unofficial)
A benchmark on visual perception in text strings for both LLMs and MLLMs.
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
A collection of AI for Science paper walkthroughs (continuously updated). Papers, datasets, and tutorials are available for download at hyper.ai
A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models
AcadHomepage: A Modern and Responsive Academic Personal Homepage
This is a repository for listing papers on scene graph generation and application.
Official repo for 'Large Multimodal Models Evaluation: A Survey'
This project introduces a novel, user-centric leaderboard for Large Language Models (LLMs) that moves beyond one-size-fits-all evaluations. Our framework empowers users to create personalized ranki…
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871