OLaPh (Optimal Language Phonemizer) is a multilingual phonemization framework that converts text into phonemes, surpassing the quality of comparable frameworks.

Python 17 2 Updated May 13, 2026

juice500ml / phonetic-arithmetic

Python 6 Updated Apr 7, 2026

bernardo-torres / audio-resampling-in-python

Forked from jonashaag/audio-resampling-in-python

Comparison of Python audio resampling implementations

Jupyter Notebook 2 Updated Dec 1, 2023

vectominist / spin

Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering"

Python 65 6 Updated May 19, 2023

tronghieuit / tiny-tts

The Smallest English TTS Model with only 1M parameters

Python 381 35 Updated Apr 10, 2026

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 3,049 263 Updated May 6, 2026

facebookresearch / HolisticTraceAnalysis

A library to analyze PyTorch traces.

Python 517 92 Updated May 13, 2026

frothywater / kanade-tokenizer

Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative modeling.

Python 98 12 Updated Apr 3, 2026

hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 764 45 Updated Nov 19, 2024

ZHZisZZ / dllm

dLLM: Simple Diffusion Language Modeling

Python 2,493 260 Updated Apr 15, 2026

k2-fsa / ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 976 143 Updated Dec 2, 2025

Ailln / cn2an

📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)

Python 760 83 Updated Apr 23, 2026

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python 5,974 857 Updated May 6, 2026

RWKV-Vibe / RWKV-LM-V7

RWKV-LM-V7 (https://github.com/BlinkDL/RWKV-LM) under the Lightning framework

HIP 59 13 Updated May 13, 2026

RWKV / rwkv.cpp

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

C++ 1,568 128 Updated Mar 23, 2025

esbatmop / MNBVC

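MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche cultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and more.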

4,193 289 Updated May 5, 2026

claude-code-best / claude-code

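The authentic Claude Code in a runnable, buildable, and debuggable version; production-grade engineering, enterprise-grade reliability; safe and malware-free, with memory leaks fixed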

TypeScript 18,263 15,810 Updated May 14, 2026

ChinaSiro / claude-code-sourcemap

TypeScript 9,193 14,625 Updated Mar 31, 2026

润心 alephpi
