ALIVE321
🌴 On vacation
  • SJTU
  • Shanghai, China

Starred repositories

A list of papers for child ASR

50 6 Updated Oct 8, 2024

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,465 1,999 Updated Nov 1, 2025

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Python 124 6 Updated Dec 9, 2024

Muon is an optimizer for hidden layers in neural networks

Python 2,123 100 Updated Nov 23, 2025

Chinese inverse text normalization (Chinese ITN), i.e., converting Chinese numerals in text to Arabic numerals.

Python 20 1 Updated Nov 28, 2023

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,984 130 Updated Dec 18, 2025

Scripts, dot files, and other things that make my programming life a happy one

Shell 33 6 Updated Dec 11, 2025

Convert Chinese characters to pinyin (pypinyin)

Python 5,226 628 Updated Nov 24, 2025

Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman.

Python 443 53 Updated Jul 22, 2022

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,854 304 Updated Jun 12, 2025

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Python 834 108 Updated Feb 15, 2025

Conversion between Traditional and Simplified Chinese

C++ 9,392 1,035 Updated Dec 24, 2025
Python 228 48 Updated Nov 13, 2023
Shell 89 11 Updated Mar 5, 2021

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Python 21,018 3,044 Updated Dec 19, 2025

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,288 361 Updated Nov 27, 2025

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,851 139 Updated Jul 5, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,353 3,243 Updated Dec 23, 2025

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 941 101 Updated Oct 24, 2025

Target Speaker Extraction Toolkit

Python 233 32 Updated Oct 4, 2025

This repository contains the SpeechBrain Benchmarks

Python 134 46 Updated Jul 15, 2025

Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki.

Python 560 40 Updated Apr 17, 2024

Universal Romanizer that can convert any Unicode script to Roman (Latin) script

Perl 233 23 Updated Jul 26, 2024

Paraformer (Chinese ASR) online ONNX runtime for Python

Python 53 5 Updated Mar 27, 2024

Mamba SSM architecture

Python 16,797 1,548 Updated Dec 23, 2025
Python 196 25 Updated Dec 5, 2024

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Python 428 102 Updated May 18, 2023

A simple package for Guided source separation (GSS)

Python 132 16 Updated May 20, 2024

Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)

Python 22 4 Updated Apr 27, 2024

Different implementations of "Weighted Prediction Error" for speech dereverberation

Python 547 166 Updated Mar 19, 2025