Stars
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
🔊 Text-Prompted Generative Audio Model
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A High-Quality Real Time Upscaler for Anime Video
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
A multi-voice TTS system trained with an emphasis on quality
PyTorch code and models for the DINOv2 self-supervised learning method.
Official inference library for Mistral models
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body
A unified framework for 3D content generation.
Official Code for Stable Cascade
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
DeepFashion2 Dataset https://arxiv.org/pdf/1901.07973.pdf
[ICLR24] Official PyTorch Implementation of Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Text To Video Synthesis Colab
[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
Codebase for Aria - an Open Multimodal Native MoE
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
152334H / tortoise-tts-fast
Forked from neonbjb/tortoise-tts
Fast TorToiSe inference (5x or your money back!)
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
Patent analysis using the Google Patents Public Datasets on BigQuery
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Official code for the CVPR 2025 paper "SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models."
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
Fuzzy Metaballs+ Renderer. With Optical Flow, Mesh Exporting and more.