Stars
This collection of helper scripts and guides for AWS SageMaker HyperPod and ParallelCluster makes it easy to get started with large-scale distributed training on Slurm-based HPC clusters.
This is a plugin which lets EC2 developers use libfabric as a network provider while running NCCL applications.
Documenting my attempts to get the internal speakers of the Galaxy Book4 Pro working on Linux
Notes and utilities for running Linux on the Samsung Galaxy Book2 Pro
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
Open Source framework for voice and multimodal conversational AI
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Recurrent neural network for audio noise reduction
CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed)
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Tools for handling multimodal data in machine learning projects.
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
AcademiCodec: An Open Source Audio Codec Model for Academic Research
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Foundational Models for State-of-the-Art Speech and Text Translation
This repository provides a multi-mode and multi-speaker expressive speech synthesis framework, including multi-attentive Tacotron, DurIAN, Non-attentive Tacotron, GST, VAE, GMVAE, and X-vectors for…
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Password cracker for SQLCipher v2 using OpenCL
SQLCipher is a standalone fork of SQLite that adds 256-bit AES encryption of database files and other security features.
A PyTorch implementation of Google's Non-attentive Tacotron.
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Statistical model-based voice activity detection
Efficient voice activity detection algorithms using long-term speech information in C++
Hackable and optimized Transformers building blocks, supporting a composable construction.