- SJTU
- Shanghai, China
Starred repositories
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Muon is an optimizer for hidden layers in neural networks
Chinese inverse text normalization (Chinese ITN): converts Chinese numerals in text to Arabic numerals.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Scripts, dot files, and other things that make my programming life a happy one
Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Conversion between Traditional and Simplified Chinese
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
This repository contains the SpeechBrain Benchmarks
Simple conversion and localization between simplified and traditional Chinese using tables from MediaWiki.
Universal Romanizer that can convert any Unicode script to Roman (Latin) script
Paraformer (Chinese ASR) online ONNX Runtime for Python
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
A simple package for Guided source separation (GSS)
Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)
Different implementations of "Weighted Prediction Error" for speech dereverberation