boji123

burkliu boji123

23 followers · 14 following

Stars

pengzhendong / audiolab

A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)

Python 38 3 Updated Mar 31, 2026

byteresearchcla / RealSI

RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios

Python 79 8 Updated Jul 4, 2025

Playmate111 / Playmate2

[AAAI 2026] Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback

Python 298 28 Updated Nov 21, 2025

RuoChoXio / ETrajEval

EtrajEval: Official framework for emotional support evaluation in language models, from the paper "Detecting Emotional Dynamic Trajectories: An Evaluation Framework for Emotional Support in Languag…

Python 16 6 Updated Nov 14, 2025

meituan-longcat / LongCat-Audio-Codec

LongCat Audio Tokenizer and Detokenizer

Python 299 23 Updated Apr 15, 2026

XiaomiMiMo / MiMo-Audio

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 1,019 102 Updated Mar 3, 2026

wenet-e2e / WeTextProcessing

Text Normalization & Inverse Text Normalization

Python 752 101 Updated Feb 27, 2026

ASLP-lab / OSUM

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

Python 486 31 Updated Nov 23, 2025

pengzhendong / streaming-sensevoice

Pseudo Streaming SenseVoice with Hotwords

Python 444 52 Updated Mar 13, 2025

yanghaha0908 / EmoVoice

Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"

Python 115 12 Updated Oct 16, 2025

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 7,986 731 Updated Dec 30, 2025

QuwanAI / MoodBench

An evaluation benchmark for emotional-companionship dialogue systems, built on the PQAEF (https://github.com/QuwanAI/PQAEF) framework

Python 41 14 Updated Sep 1, 2025

NVIDIA / NeMo-speech-data-processor

A toolkit for processing speech data and creating speech datasets

Python 207 43 Updated Mar 29, 2026

xingchensong / FlashCosyVoice

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python 247 25 Updated Feb 25, 2026

ScottishFold007 / Cosyvoice_DPO_NOTES

CosyVoice_DPO_NOTES: Supercharge Your Cosyvoice model with Cutting-Edge DPO Fine-Tuning!

Python 124 19 Updated Aug 8, 2025

antgroup / echomimic_v3

[AAAI 2026] EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Python 871 102 Updated Mar 18, 2026

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,393 101 Updated Mar 16, 2026

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 8,020 618 Updated Jan 18, 2026

OpenMOSS / MOSS-TTSD

MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flexible speaker control, and multilingual support, while enablin…

Python 1,296 125 Updated Mar 23, 2026

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 20,073 2,468 Updated Mar 16, 2026

NVIDIA / BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,206 145 Updated Sep 5, 2024

fmu2 / flow-VAE

Variational Autoencoder (VAE) with Normalizing Flows

Python 72 8 Updated Oct 10, 2024

QwenLM / Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 27,121 1,974 Updated Jan 9, 2026

FoundationAgents / OpenManus

No fortress, purely open ground. OpenManus is Coming.

Python 55,798 9,736 Updated Feb 11, 2026

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,568 345 Updated Jun 21, 2025

qi-hua / async_cosyvoice

Accelerates CosyVoice2 inference with vLLM

Jupyter Notebook 491 64 Updated Apr 26, 2025

xingchensong / TouchNet

A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.

Python 230 30 Updated Apr 8, 2026

AI-Hobbyist / Genshin_Datasets

Genshin Datasets For SVC/SVS/TTS

721 40 Updated Jan 11, 2026

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,767 808 Updated Mar 25, 2026

duixcom / Duix-Avatar

🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.

C 12,767 2,111 Updated Apr 17, 2026
