Zth9730

🥬

Ataraxy

TianHao Zhang Zth9730

University of Science and Technology Beijing

13 followers · 15 following

RAE Public
Forked from bytetriper/RAE

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python MIT License Updated Oct 14, 2025
transformer-vocos Public
Forked from Mddct/transformer-vocos

Python Updated Jul 6, 2025
usm-tokenizer Public
Forked from Mddct/usm-tokenizer

semantic tokenizer for speech and music

Python Updated Jun 27, 2025
sequence-vector-quantize Public
Forked from Mddct/sequence-vector-quantize

dh vq-q or vae exp

Python Updated May 19, 2025
wenet Public
Forked from wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 1 Apache License 2.0 Updated Apr 10, 2025
audio-pipeline Public
Forked from Mddct/audio-pipeline

Python Apache License 2.0 Updated Oct 17, 2024
blsp Public
Forked from cwang621/blsp

BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing

Python Apache License 2.0 Updated Oct 15, 2024
SpeechTokenizer Public
Forked from ZhangXInFD/SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python Apache License 2.0 Updated Aug 14, 2024
RepCodec Public
Forked from mct10/RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

Python Other Updated Jul 12, 2024
NeMo-text-processing Public
Forked from NVIDIA/NeMo-text-processing

NeMo text processing for ASR and TTS

Python Apache License 2.0 Updated Feb 29, 2024
latent-diffusion Public
Forked from CompVis/latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook MIT License Updated Feb 29, 2024
MS-SNSD Public
Forked from microsoft/MS-SNSD

The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) l…

Python MIT License Updated Jan 9, 2024
icefall Public
Forked from k2-fsa/icefall

Python Apache License 2.0 Updated Dec 21, 2023
Unconstrained-AVSR Public

1 Apache License 2.0 Updated Dec 20, 2023
MaTe3D Public
Forked from HumanAIGC/MaTe3D

MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing

Apache License 2.0 Updated Dec 13, 2023
Zth9730.github.io Public

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

JavaScript MIT License Updated Dec 12, 2023
awesome-source-free-test-time-adaptation Public
Forked from YuejiangLIU/awesome-source-free-test-time-adaptation

A curated list of papers in Test-time Adaptation, Test-time Training and Source-free Domain Adaptation

Updated Oct 2, 2023
fairseq2 Public
Forked from facebookresearch/fairseq2

FAIR Sequence Modeling Toolkit

Python MIT License Updated Aug 24, 2023
PromptingWhisper Public
Forked from jasonppy/PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Python Updated Aug 15, 2023
chirp Public
Forked from google-research/perch

Python Apache License 2.0 Updated Aug 10, 2023
MyArxiv Public template
Forked from MLNLP-World/MyArxiv

CSS GNU General Public License v2.0 Updated Jul 31, 2023
RetNet Public
Forked from Jamie-Stirling/RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"

Python MIT License Updated Jul 21, 2023
Whisper-Finetune Public
Forked from yeyupiaoling/Whisper-Finetune

Fine-tune the Whisper speech recognition model and accelerate inference, with support for Web and Android deployment

C Apache License 2.0 Updated Jul 18, 2023
unilm Public
Forked from microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python MIT License Updated Jul 18, 2023
Macaw-LLM Public
Forked from lyuchenyang/Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python Updated May 31, 2023
asteroid Public
Forked from asteroid-team/asteroid

The PyTorch-based audio source separation toolkit for researchers

Python MIT License Updated May 26, 2023
Pengi Public
Forked from microsoft/Pengi

An Audio Language model for Audio Tasks

MIT License Updated May 22, 2023
JaxSpeechX Public

Fast and Effortless Speech Recognition Deployment with JAX

Updated Apr 28, 2023
PaddleSpeech Public
Forked from PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NA…

Python Apache License 2.0 Updated Apr 26, 2023
bark Public
Forked from suno-ai/bark

🔊 Text-prompted Generative Audio Model

Python Other Updated Apr 15, 2023
