- Beihang University
- Beijing, China
Stars
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
A simple screen parsing tool towards pure vision based GUI agent
Ideogram 4: Open image model at the forefront of design
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling
Evolve your language agent with Agentic Context Engineering (ACE)
[CVPR 2026] PersonaLive! : Expressive Portrait Image Animation for Live Streaming
ModelTC / LightX2V-Wan2.2-Lightning
Forked from Wan-Video/Wan2.2. Wan2.2-Lightning: Speed up the Wan2.2 model with distillation
OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw). OpenViking unifies the management of context (memory, resources, and skills) that Agents need th…
GoatWu / CausVid-Plus
Forked from tianweiy/CausVid. Unofficial extension implementation of CausVid
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
[ICML'26] Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis
Decoupled Weight Decay Regularization (ICLR 2019)
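🎓 A systematic course on building large language models | 🛠️ Covers pretraining data engineering, Tokenizer, Transformer, MoE, GPU programming (CUDA/Triton), distributed training, Scaling Laws, inference optimization, and alignment (SFT/RLHF/GRPO) | 🚀 6 progressive, code-driven assignments that build a full-stack understanding of LLMs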
[ICLR 2026] Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure?
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
[NeurIPS 2025] OmniSVG is the first family of end-to-end multimodal SVG generators that leverage pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs, from sim…
DiagramBank: A Dataset of Diagram Design Exemplars with Paper Metadata for Retrieval-Augmented Generation.
Official implementation of AnimateDiff.
[ICCV 2025 Highlight] OminiControl: Minimal and Universal Control for Diffusion Transformer
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer