jihoojung0106
467 stars written in Python

Robust Speech Recognition via Large-Scale Weak Supervision

Python 94,192 11,720 Updated Dec 15, 2025

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 65,637 6,606 Updated Jan 22, 2026

The world's simplest facial recognition api for Python and the command line

Python 56,090 13,716 Updated Aug 21, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 44,456 5,949 Updated Aug 16, 2024

Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!

Python 39,465 7,334 Updated Nov 27, 2022

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,394 4,777 Updated Jun 2, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 32,697 6,738 Updated Feb 5, 2026

Deezer source separation library including pretrained models.

Python 28,020 3,068 Updated Apr 2, 2025

SoftVC VITS Singing Voice Conversion

Python 27,976 5,085 Updated Nov 11, 2023

Download your Spotify playlists and songs along with album art and metadata (from YouTube if a match is found).

Python 23,820 2,055 Updated Nov 15, 2025

GUI for a Vocal Remover that uses Deep Neural Networks.

Python 23,452 1,757 Updated Mar 13, 2025

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Python 22,131 3,023 Updated Jan 25, 2026

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,984 2,133 Updated Jan 27, 2026

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 19,475 2,193 Updated Feb 4, 2026

Mamba SSM architecture

Python 17,142 1,580 Updated Jan 12, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,684 3,325 Updated Feb 5, 2026

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 14,816 1,563 Updated Feb 4, 2026

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 14,048 2,073 Updated Feb 2, 2026

An open source implementation of CLIP.

Python 13,345 1,234 Updated Nov 4, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,541 1,190 Updated Feb 5, 2026

A PyTorch-based Speech Toolkit

Python 11,179 1,646 Updated Feb 4, 2026

Spark-TTS Inference Code

Python 10,915 1,171 Updated Apr 9, 2025

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Python 10,443 1,266 Updated Aug 4, 2025

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Python 10,211 868 Updated Jul 6, 2024

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python 9,720 1,411 Updated Apr 24, 2024

End-to-End Speech Processing Toolkit

Python 9,717 2,379 Updated Feb 4, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,562 871 Updated Jan 19, 2026

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 8,090 726 Updated Dec 30, 2025

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,814 1,389 Updated Dec 6, 2023

Multilingual Voice Understanding Model

Python 7,462 695 Updated Dec 30, 2025