View deyituo's full-sized avatar


Transforming cold farewells into warm skills? It's giving rebirth era. Welcome to Digital Life 1.0. 🫶

Python 13,040 1,203 Updated Apr 9, 2026

High-Quality Voice Cloning TTS for 600+ Languages

Python 3,012 457 Updated Apr 11, 2026

Codebase for 'ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining'

Python 12 1 Updated Apr 6, 2026

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 181,652 107,249 Updated Apr 12, 2026

Automatic evaluation of speech-to-speech models via TRACE.

Python 6 Updated Feb 10, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 1,015 103 Updated Mar 3, 2026

Diffusion-based singing voice pitch correction

Python 140 20 Updated Sep 20, 2024

Covo-Audio is a 7B-parameter end-to-end large audio language model that directly processes continuous audio inputs and generates audio outputs within a single unified architecture.

Python 135 14 Updated Mar 17, 2026

Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.

Python 178 16 Updated Mar 20, 2026

A CLI for Bilibili — browse videos, users, search, and feeds from the terminal

Python 654 68 Updated Mar 14, 2026

A CLI for Xiaohongshu (小红书) — search, read, interact via reverse-engineered API

Python 1,609 163 Updated Mar 21, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,370 232 Updated Jan 30, 2026

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Python 463 100 Updated Oct 23, 2023

This repository is maintained by the Speech Team at Alibaba’s Tongyi Lab, serving as an open-source platform for our cutting-edge research in speech, audio, and NLP technologies. We believe in accelera…

Python 32 4 Updated Mar 16, 2026
Python 372 29 Updated Mar 25, 2026

An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs

Python 29 Updated Mar 15, 2026

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 50 4 Updated Feb 17, 2026

Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.

Python 1,115 64 Updated Dec 17, 2025

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

Shell 112,663 18,824 Updated Apr 10, 2026

Official implementation of "DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment"

Python 15 Updated Sep 29, 2025

Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control

Python 220 16 Updated Feb 26, 2026

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 647 56 Updated Mar 17, 2026

A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

Python 236 26 Updated Mar 9, 2026
Jupyter Notebook 8 Updated Sep 8, 2025

Official implementation of YingMusic-SVC.

Python 126 12 Updated Dec 29, 2025

[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training

71 2 Updated Apr 6, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 10,558 1,372 Updated Mar 17, 2026

World's first open-source real-time end-to-end spoken dialogue model with personalized voice cloning.

Jupyter Notebook 542 58 Updated Jan 28, 2026

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python 5,819 507 Updated Mar 27, 2026