- Fudan University
- Shanghai
- UTC +08:00
- https://blog.csdn.net/weixin_50011798/article/details/135598566
- https://leetcode.com/u/lkdhy/
Starred repositories
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
Ongoing research training transformer models at scale
Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with…
SGLang is a fast serving framework for large language models and vision language models.
We introduce Reasoning via Video, a new paradigm that uses maze-solving video generation to probe multimodal reasoning; our VR-Bench shows that fine-tuned video models consistently outperform stron…
This is a collection of recent papers on reasoning in video generation models.
slime is an LLM post-training framework for RL Scaling.
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
Enjoy the magic of Diffusion models!
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give it a star 🌟 if you find it useful.
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…
LLM/VLM gaming agents and model evaluation through games.
GitHub Copilot CLI brings the power of Copilot coding agent directly to your terminal.
[NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat
A sleek dataset viewer built entirely by AI Agent. Supports streaming large files from WebDAV, S3, SSH, Local or Hugging Face.
A Survey of Reinforcement Learning for Large Reasoning Models
The absolute trainer to light up AI agents.
EvaLearn is a pioneering benchmark designed to evaluate large language models (LLMs) on their learning capability and efficiency in challenging tasks.
A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more inte…
Generate large-scale explorable 3D scenes with high-quality panorama videos from a single image or text prompt.
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a re…
Tongyi Deep Research, the Leading Open-source Deep Research Agent
verl: Volcano Engine Reinforcement Learning for LLMs
Train transformer language models with reinforcement learning.