Stars
Python (re)implementation of several k-anonymization algorithms
[INTERSPEECH'24] Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
日本語LLMまとめ (Overview of Japanese LLMs)
⚡ Dynamically generated stats for your GitHub READMEs
Area-weighted Venn diagrams for Python/matplotlib
🚀 Beautiful highly customizable statusline for Claude Code CLI with powerline support, themes, and more.
EVAR ~ Evaluation package for Audio Representations
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
A simple tool for switching to a US (ANSI) keyboard layout, even in a JIS layout environment, without restarting the OS or editing the registry
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework