MXuer

🎯

Focusing

13 followers · 36 following

Achievements

Starred repositories

usememos / memos

Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.

Go 60,714 4,465 Updated Jun 12, 2026

rednote-hilab / dots.tts

Python 501 35 Updated Jun 12, 2026

cclank / cell-architecture-studio

Interactive 3D cell architecture gallery built with React and Three.js

HTML 1,307 270 Updated Jun 6, 2026

DayuanJiang / next-ai-draw-io

A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…

TypeScript 31,889 3,330 Updated Jun 12, 2026

MXuer / mms-alignment-tools

Using MMS for audio-transcript alignment

Python 10 Updated May 29, 2023

DataoceanAI / Dolphin

Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.

Python 756 65 Updated Jun 11, 2026

microsoft / markitdown

Python tool for converting files and office documents to Markdown.

Python 152,456 10,544 Updated May 26, 2026

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,384 1,156 Updated Jun 11, 2026

SWivid / Habibi-TTS

Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"

Python 326 34 Updated Mar 30, 2026

csteinmetz1 / pyloudnorm

Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm

Python 773 60 Updated Jan 4, 2026

ysharma3501 / LuxTTS

A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.

Python 4,177 537 Updated Jun 5, 2026

FunAudioLLM / Fun-ASR

End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.

Python 1,267 125 Updated Jun 12, 2026

penberg / awesome-low-latency

Patterns and resources of low latency programming.

1,228 67 Updated Jul 30, 2025

wenet-e2e / opencpop

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

235 11 Updated Dec 10, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,832 254 Updated Dec 30, 2025

wenet-e2e / west

We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 207 17 Updated Jun 11, 2026

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,435 440 Updated Dec 11, 2025

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,460 107 Updated Mar 16, 2026

xingchensong / CosyVoice-ttsfrd

Python 25 3 Updated Jun 19, 2025

abjadai / catt

The official implementation of CATT Arabic diacritization models.

Python 75 11 Updated Jul 18, 2025

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 21,108 2,608 Updated Jun 12, 2026

atcelen / IDesign

Python 97 21 Updated Jul 21, 2025

frankyoujian / Edge-Punct-Casing

Python 32 9 Updated Feb 4, 2025

datalab-to / marker

Convert PDF to markdown + JSON quickly with high accuracy

Python 36,055 2,491 Updated Jun 6, 2026

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

317 18 Updated Nov 28, 2024

zai-org / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 3,187 281 Updated Dec 5, 2024

resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning

Python 3,267 484 Updated Oct 12, 2023

voilet1996 / practice-demo

Front-end practice project

Vue 1 Updated Jan 18, 2024

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Jupyter Notebook 5,562 503 Updated Feb 23, 2026

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 39,441 4,241 Updated Apr 10, 2026

Starred topics

speaker-diarization

spell-check

audio-alignment

conformer

vad

rnn-transducer

audio-processing

ctc

crnn-tensorflow

Python
