Stars
Text-Prompted Generative Audio Model
A simple screen parsing tool towards pure vision based GUI agent
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Image restoration with neural networks but without learning.
Official Code for Stable Cascade
A course in reinforcement learning in the wild
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
A collection of infrastructure and tools for research in neural network interpretability.
OmniGen2: Exploration to Advanced Multimodal Generation.
Algorithms for outlier, adversarial and drift detection
2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
Google Colaboratory Notebooks and Repositories (by @firmai)
Tools to train a generative model on arbitrary audio samples
Utility functions for handling MIDI data in a nice/intuitive way.
A list of Machine Learning Art Colabs
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
Machine Learning applied to sound
Tegridy MIDI Dataset for precise and effective Music AI models creation.
A neural attention model for speech command recognition
Multiple notebooks which allow the use of various machine learning methods to generate or modify multimedia content
The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.
Symbolic Music NLP Artificial Intelligence Toolkit
Generate YouTube Shorts using Reddit posts scraped with PRAW, title and captions generated with GPT, images and thumbnails generated with Stable Diffusion and voiceover with 11Labs
Deploy a RAG use case on AWS by using Terraform and Amazon Bedrock