- Harbin Institute of Technology, Shenzhen
- Shenzhen, China
- https://lizaijing.github.io/
Stars
Official Implementation for Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts
Awesome collection of resources and papers on Diffusion Models for Robotic Manipulation.
Awesome things towards foundation agents. Papers / Repos / Blogs / ...
[CVPR 2025] Official Implementation for Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
A compilation of the best multi-agent papers
lizaijing / Optimus-1
Forked from JiuTian-VL/Optimus-1. [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
[CVPR 2024 Workshop] The Champion Solution for Ego4D EgoSchema Challenge in CVPR 2024
[NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
[ACMMM 2022 Oral] Official Implementation for Bi-directional Heterogeneous Graph Hashing towards Efficient Outfit Recommendation
Official repository of the “Mask Again: Masked Knowledge Distillation for Masked Video Modeling” (ACM MM 2023)
Official repository of the "Fine-grained Key-Value Memory Enhanced Predictor for Video Representation Learning" (ACM MM 2023)
Open-Sora: Democratizing Efficient Video Production for All
The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
✨✨Latest Advances on Multimodal Large Language Models
😎 curated list of awesome LMM hallucinations papers, methods & resources.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Official repo for consistency models.
SAEval: A benchmark for sentiment analysis to evaluate the model's performance on various subtasks.
UniSA: Unified Generative Framework for Sentiment Analysis
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family