  • Northwestern Polytechnical University
  • China

Starred repositories

US Stock Market Guide

4,379 676 Updated Jun 11, 2026

Official implementation of "USAD: Universal Speech and Audio Representation via Distillation"

Python 8 1 Updated Jun 7, 2026
Python 645 46 Updated Jun 12, 2026

Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 491 14 Updated Apr 15, 2026

Official code release for the paper "One-Step Generative Modeling via Wasserstein Gradient Flows"

Python 47 3 Updated Jun 9, 2026

Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"

Python 236 2 Updated Jun 7, 2026

MultiModal Audio Generation in Raw Waveform Space.

Python 154 10 Updated May 26, 2026

[CVPR 2026 Findings] V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

Python 55 2 Updated Apr 28, 2026

[CVPR 2026] Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation

Python 87 3 Updated Apr 26, 2026

[KDD 2026] Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

Python 36 3 Updated Aug 10, 2025
Python 47 2 Updated May 2, 2026

A dual-rate LLM architecture bridging DSP and NLP. Decouples semantic planning from lexical synthesis to solve O(N²) bottlenecks.

Python 7 Updated Apr 11, 2026

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI

Python 1,371 73 Updated Jan 27, 2026

Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024 GPUs)

Python 24 Updated Mar 29, 2026

The agent that grows with you

Python 194,046 33,976 Updated Jun 15, 2026

CVPR 2026 (Oral)-Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression

Python 43 Updated Jun 12, 2026

Single-stage End-to-End Training for Tokenization and Generation

Python 115 1 Updated Mar 24, 2026

DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Python 11 1 Updated May 12, 2026

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 152 4 Updated Feb 6, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,910 292 Updated Jan 30, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 378,799 79,249 Updated Jun 15, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 11,957 1,554 Updated Mar 17, 2026

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 142 8 Updated Mar 8, 2026

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.

Python 250 12 Updated Feb 26, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,531 319 Updated May 26, 2026

The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 961 60 Updated Dec 20, 2025

Presentation Slides for Developers

TypeScript 47,180 2,101 Updated Jun 3, 2026