Starred repositories
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A high-throughput and memory-efficient inference and serving engine for LLMs
Paper reading and discussion notes, covering AI frameworks, distributed systems, cluster management, etc.
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
Dynamic Memory Management for Serving LLMs without PagedAttention
A low-latency & high-throughput serving engine for LLMs
A multi-voice TTS system trained with an emphasis on quality
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Zero-Shot Detection via Vision and Language Knowledge Distillation
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition
An up-to-date list of works on Multi-Task Learning
awesome-autonomous-driving
Official implementation of CrossViT. https://arxiv.org/abs/2103.14899
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER.
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
Official PyTorch implementations for "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image