  • Tencent
  • Chengdu, China

A lightweight library for normalizing speech transcripts before computing WER

Python 27 4 Updated May 29, 2026

A modern GUI client based on Tauri, designed to run in Windows, macOS and Linux for tailored proxy experience

TypeScript 126,500 9,199 Updated Jun 18, 2026

The project is associated with the recently-launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) to provide participants with baseline systems for speech recogniti…

Python 141 19 Updated Jun 10, 2022

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,352 131 Updated Mar 23, 2026

A 10000+ hours dataset for Chinese speech recognition

Shell 616 56 Updated Jan 9, 2026

Large, modern dataset for speech recognition

Shell 728 66 Updated Feb 26, 2024

Control Gmail, Google Calendar, Docs, Sheets, Slides, Chat, Forms, Tasks, Search & Drive with AI - Comprehensive Google Workspace / G Suite MCP Server & CLI Tool

Python 2,702 821 Updated Jun 18, 2026

26M function-calling model that runs on incredibly small devices

Python 2,616 176 Updated May 16, 2026

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,843 813 Updated Mar 25, 2026

A Conversational Speech Generation Model

Python 14,667 1,482 Updated May 27, 2025

Inference and training library for high-quality TTS models.

Python 5,580 590 Updated Dec 10, 2024

Python 400 68 Updated Sep 3, 2024

A toolkit for processing speech data and creating speech datasets

Python 213 45 Updated Mar 29, 2026

AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

SCSS 36 3 Updated Dec 31, 2023

SOTA Open Source TTS

Python 30,861 2,636 Updated Jun 9, 2026

A Kubernetes media gateway for WebRTC. Contact: info@l7mp.io

Go 1,025 91 Updated Jun 4, 2026

Production-grade engineering skills for AI coding agents.

Shell 62,836 6,824 Updated Jun 18, 2026

Agent Skills for Google products and technologies

Python 13,901 1,056 Updated Jun 18, 2026

A cross-platform desktop All-in-One assistant for Claude Code, Codex, OpenCode, OpenClaw, Gemini CLI & Hermes Agent. Only official website: ccswitch.io

Rust 104,152 6,886 Updated Jun 18, 2026

The official Lark/Feishu CLI tool, maintained by the larksuite team — built for humans and AI Agents. Covers core business domains including Messenger, Docs, Base, Sheets, Calendar, Mail, Tasks, Me…

Go 14,372 990 Updated Jun 18, 2026

ESC-50: Dataset for Environmental Sound Classification

Python 1,831 322 Updated Mar 20, 2024

The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.

710 154 Updated May 21, 2018

ACL 2026 - Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

Python 118 10 Updated Apr 11, 2026

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 22,559 2,308 Updated Jun 3, 2026

MeetEval - A meeting transcription evaluation toolkit

Python 164 18 Updated Jan 27, 2026

A blazing fast AI Gateway with integrated guardrails. Route to 1,600+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.

TypeScript 12,112 1,137 Updated May 25, 2026

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 1,337 198 Updated Jun 15, 2026

All parts of Claude Code's system prompt, 27 builtin tool descriptions, sub agent prompts (Plan/Explore/Task), utility prompts (CLAUDE.md, compact, statusline, magic docs, WebFetch, Bash cmd, secur…

JavaScript 11,205 1,946 Updated Jun 17, 2026

Faster Whisper transcription with CTranslate2

Python 23,716 1,943 Updated Nov 19, 2025