Stars
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
TensorFlow code and pre-trained models for BERT
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
Code samples for my book "Neural Networks and Deep Learning"
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Transfer learning / domain adaptation / domain generalization / multi-task learning etc. Papers, code, datasets, applications, tutorials.
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Implementation of Denoising Diffusion Probabilistic Model in PyTorch
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
GLM-4 series: Open Multilingual Multimodal Chat LMs
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
(unofficial) Googletrans: Free and Unlimited Google translate API for Python. Translates totally free of charge.
Foundational model for human-like, expressive TTS
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks