🎯
Focusing

Organizations

@hpcaitech

Stars

Multimodal

15 repositories

[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Python 2,993 359 Updated Apr 22, 2025

Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"

Python 4,823 307 Updated Mar 7, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,178 736 Updated May 31, 2024

Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*

Jupyter Notebook 131 11 Updated Oct 27, 2023

VideoSys: An easy and efficient system for video generation

Python 2,009 133 Updated Aug 27, 2025

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 12,096 1,071 Updated Oct 29, 2025

Jupyter Notebook 1,064 131 Updated Sep 18, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 28,136 2,815 Updated Apr 30, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,803 201 Updated Apr 9, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 66,614 9,529 Updated Dec 16, 2025

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,425 162 Updated Mar 3, 2025

[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.

Python 1,897 188 Updated Oct 30, 2025

Biomedical Generalist Video Generation Model

Python 175 1 Updated Oct 20, 2024

🍃 MINT-1T: A one trillion token multimodal interleaved dataset.

827 18 Updated Jul 31, 2024

[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

Python 411 17 Updated Apr 25, 2025