DingYikang
  • Tsinghua University
  • Beijing


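🌟 This project automatically crawls and indexes article metadata from 科学空间 (Scientific Spaces) and classifies the articles by research topic using rules, so they can be browsed quickly on GitHub with links back to the originals.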

Python 245 7 Updated Jun 8, 2026
Python 526 12 Updated May 1, 2026
Python 798 83 Updated May 6, 2026

[ICML'26] Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis

Python 511 19 Updated May 23, 2026

Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals

Python 2,263 194 Updated Apr 19, 2026

GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.

Python 927 75 Updated Mar 20, 2026

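An AI-native slides (PPT) generator based on nano banana pro🍌, moving toward "Vibe PPT": upload any template image, upload arbitrary materials for intelligent parsing, generate a PPT automatically from a single sentence, an outline, or page descriptions, revise specified regions by spoken instruction, and export an editable PPT in one click.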

Python 14,936 1,743 Updated Jun 15, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,533 265 Updated Apr 15, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,072 140 Updated Jun 13, 2026

Terminal Velocity Matching

Python 87 1 Updated Feb 14, 2026

StreamDiffusion, Live Stream APP

Python 490 57 Updated May 19, 2026

Native Multimodal Models are World Learners

Python 1,524 66 Updated Dec 30, 2025

A minimal implementation of DeepMind's Genie world model

Python 1,311 102 Updated Apr 15, 2026

LongLive 2.0: Infra - Long Video Gen

Python 2,336 210 Updated Jun 13, 2026

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 3,127 166 Updated Feb 3, 2026

[ICLR 2026] Pyramidal Patchification Flow for Visual Generation (PPFlow)

Python 7 1 Updated Jul 20, 2025

Project Lyra: Open Generative 3D World Models

Python 2,089 224 Updated Jun 11, 2026

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,832 265 Updated Apr 23, 2026

Latent Bridge Matching for Fast Image-to-Image Translation (ICCV 2025 Highlight)

Python 839 57 Updated Jul 24, 2025

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Python 3,490 262 Updated Jun 3, 2026

[ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Python 482 8 Updated Apr 16, 2026

[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Python 570 20 Updated Apr 22, 2026

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 21,652 2,498 Updated May 25, 2026

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,514 181 Updated Mar 28, 2025

✨✨Latest Advances on Multimodal Large Language Models

17,885 1,128 Updated May 1, 2026

The official UniVerse-1 code.

Python 129 11 Updated Oct 13, 2025

ViPE: Video Pose Engine for Geometric 3D Perception

Python 1,981 161 Updated Jun 9, 2026

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 10,673 876 Updated Jun 12, 2026

High-resolution models for human tasks.

Python 5,385 319 Updated May 26, 2026

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

939 121 Updated Aug 27, 2025