Stars
Streaming ASR and TTS based on FastAPI + sherpa-onnx
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
Generate audiobooks from e-books, voice cloning & 1158+ languages!
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code inclu…
EMNLP 2023 Papers: Explore cutting-edge research from EMNLP 2023, the premier conference for advancing empirical methods in natural language processing. Stay updated on the latest in machine learni…
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processin…
The repository provides links to collections of influential and interesting research papers from top AI conferences, with open-source code to promote reproducibility and provide detailed implementa…
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal proce…
ICCV 2023-2025 Papers: Discover cutting-edge research from ICCV 2023-25, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included.…
Code and slides of my YouTube series called "Audio Signal Processing for Machine Learning"
I will update this repository with statistics content and materials for learning machine learning with Python
A day-to-day plan for this challenge. Covers both theoretical and practical aspects
My tutorial collection of must-read notebooks. It contains PyTorch, advanced pandas, ensemble learning, TensorFlow, genetic algorithms, Dask, and word embeddings
My notebook on using Python with Jupyter Notebook, PySpark, etc.
Text and code for the second edition of Think Bayes, by Allen Downey.
Path to a free self-taught education in Computer Science!
Detects lips, recognizes sentences, and shows subtitles.
Docker container with interesting tools to work with audio-visual data in PyTorch
A self-supervised learning framework for audio-visual speech
My notes / works on deep learning from Coursera
Hands-On Computer Vision with TensorFlow 2, published by Packt
60+ Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Visual Speech Recognition for Multiple Languages
A pipeline to read lips and generate speech for the read content, i.e., lip-to-speech synthesis.