Dake Guo dukGuo

Qwen Team | ASLP@NPU

48 followers · 75 following

Northwestern Polytechnical University
China

Starred repositories

zgwl / chinese-buy-us-stock-guide

US Stock Guide

4,379 676 Updated Jun 11, 2026

vectominist / usad

Official implementation of "USAD: Universal Speech and Audio Representation via Distillation"

Python 8 1 Updated Jun 7, 2026

rednote-hilab / dots.tts

Python 645 46 Updated Jun 12, 2026

MiniMax-AI / VTP

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 491 14 Updated Apr 15, 2026

hanjq17 / W-Flow

Official code release for the paper "One-Step Generative Modeling via Wasserstein Gradient Flows"

Python 47 3 Updated Jun 9, 2026

CostaliyA / Flow-OPD

Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"

Python 236 2 Updated Jun 7, 2026

facebookresearch / WavFlow

MultiModal Audio Generation in Raw Waveform Space.

Python 154 10 Updated May 26, 2026

tang-bd / v-grpo

[CVPR 2026 Findings] V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

Python 55 2 Updated Apr 28, 2026

CompVis / patch-forcing

[CVPR 2026] Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation

Python 87 3 Updated Apr 26, 2026

tiantiaf0627 / voxlect

[KDD 2026] Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

Python 36 3 Updated Aug 10, 2025

ASLP-lab / MINT-Bench

Python 47 2 Updated May 2, 2026

maswang32 / latentfouriertransform

Python 26 2 Updated Apr 21, 2026

eladwf / topdown-semantic-vocoder

A dual-rate LLM architecture bridging DSP and NLP. Decouples semantic planning from lexical synthesis to solve O(N²) bottlenecks.

Python 7 Updated Apr 11, 2026

lucidrains / transfusion-pytorch

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 1,371 73 Updated Jan 27, 2026

LAION-AI / scaled-echo-tts

Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024 GPUs)

Python 24 Updated Mar 29, 2026

NousResearch / hermes-agent

The agent that grows with you

Python 194,046 33,976 Updated Jun 15, 2026

CVL-UESTC / RDVQ

[CVPR 2026 Oral] Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression

Python 43 Updated Jun 12, 2026

ShivamDuggal4 / UNITE-tokenization-generation

Single-stage End-to-End Training for Tokenization and Generation

Python 115 1 Updated Mar 24, 2026

AaltoML / DiVeQ

DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Python 11 1 Updated May 12, 2026

ASLP-lab / WenetSpeech-Wu-Repo

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 152 4 Updated Feb 6, 2026

QwenLM / Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,910 292 Updated Jan 30, 2026

openclaw / openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 378,799 79,249 Updated Jun 15, 2026

QwenLM / Qwen3-TTS

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 11,957 1,554 Updated Mar 17, 2026

k2-fsa / Flow2GAN

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 142 8 Updated Mar 8, 2026

disco-speech / DisCo-Speech

89 7 Updated Dec 31, 2025

jingzhunxue / FlowMirror_HydraVox

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

ASLP-lab / VoiceSculptor

An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.

Python 250 12 Updated Feb 26, 2026

facebookresearch / sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,531 319 Updated May 26, 2026

qiuzh20 / gated_attention

The official implementation for [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 961 60 Updated Dec 20, 2025

slidevjs / slidev

Presentation Slides for Developers

TypeScript 47,180 2,101 Updated Jun 3, 2026

Starred topics

Python

Machine learning

Linux

Awesome Lists