Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of generating speech in real time.

Jupyter Notebook 3,854 304 Updated Jun 12, 2025

jitsi / jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Python 834 108 Updated Feb 15, 2025

BYVoid / OpenCC

Conversion between Traditional and Simplified Chinese

C++ 9,392 1,035 Updated Dec 24, 2025

sarulab-speech / jtubespeech

Python 228 48 Updated Nov 13, 2023

laboroai / LaboroTVSpeech

Shell 89 11 Updated Mar 5, 2021

huggingface / datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Python 21,018 3,044 Updated Dec 19, 2025

Tencent-Hunyuan / HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,288 361 Updated Nov 27, 2025

QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,851 139 Updated Jul 5, 2024

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,353 3,243 Updated Dec 23, 2025

X-LANCE / SLAM-LLM

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 941 101 Updated Oct 24, 2025

wenet-e2e / wesep

Target Speaker Extraction Toolkit

Python 233 32 Updated Oct 4, 2025

speechbrain / benchmarks

This repository contains the SpeechBrain Benchmarks

Python 134 46 Updated Jul 15, 2025

gumblex / zhconv

Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki.

Python 560 40 Updated Apr 17, 2024

isi-nlp / uroman

Universal Romanizer that can convert any Unicode script to Roman (Latin) script

Perl 233 23 Updated Jul 26, 2024

lovemefan / paraformer-python

Paraformer (Chinese ASR) online ONNX Runtime for Python

Python 53 5 Updated Mar 27, 2024

state-spaces / mamba

Mamba SSM architecture

Python 16,797 1,548 Updated Dec 23, 2025

JusperLee / SPMamba

Python 196 25 Updated Dec 5, 2024

mpc001 / Lipreading_using_Temporal_Convolutional_Networks

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Python 428 102 Updated May 18, 2023

desh2608 / gss

A simple package for Guided source separation (GSS)

Python 132 16 Updated May 20, 2024

DataoceanAI / CNVSRC2023Baseline

Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)

Python 22 4 Updated Apr 27, 2024

fgnt / nara_wpe

Different implementations of "Weighted Prediction Error" for speech dereverberation

Python 547 166 Updated Mar 19, 2025

JahLee ALIVE321

Starred repositories

Python