Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
🔊 Text-Prompted Generative Audio Model
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A multi-voice TTS system trained with an emphasis on quality
High-Resolution Image Synthesis with Latent Diffusion Models
Foundational Models for State-of-the-Art Speech and Text Translation
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3x+ generation speedup on reasoning tasks
Flax is a neural network library for JAX that is designed for flexibility.
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark
🔊 Text-prompted Generative Audio Model - With the ability to clone voices
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Optimize prompts, code, and more with AI-powered Reflective Text Evolution
Fast parallel LLM inference for MLX
Bare-bones implementations of some generative models in JAX: diffusion, normalizing flows, consistency models, flow matching, (beta)-VAEs, etc.
Minimal, lightweight JAX implementations of popular models.
Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.
Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.
cgarciae / nanoGPT-jax
Forked from karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A port of the Mistral-7B model to JAX
A set of TFDS dataset builders for common datasets
A novel disfluency correction and machine translation dataset for English, Hindi, German, and French