Colossal-AI Team @hpcaitech · Shanghai, China · https://www.linkedin.com/in/tongli3701/
Multimodal
[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation and Editing"
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*.
VideoSys: An easy and efficient system for video generation
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
Open-Sora: Democratizing Efficient Video Production for All
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
GPT4V-level open-source multi-modal model based on Llama3-8B
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
🍃 MINT-1T: A one trillion token multimodal interleaved dataset.
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation