dukGuo
  • Northwestern Polytechnical University
  • China

Starred repositories

Python 2 2 Updated Jun 19, 2026

US Stock Market Guide

4,828 741 Updated Jun 20, 2026

Official implementation of "USAD: Universal Speech and Audio Representation via Distillation"

Python 10 1 Updated Jun 7, 2026
Python 758 61 Updated Jun 16, 2026

[ECCV 2026] Towards Scalable Pre-training of Visual Tokenizers for Generation

Python 492 14 Updated Apr 15, 2026

Official code release for the paper "One-Step Generative Modeling via Wasserstein Gradient Flows"

Python 59 4 Updated Jun 9, 2026

Official Repo of "Flow-OPD: On-Policy Distillation for Flow Matching Models"

Python 244 2 Updated Jun 7, 2026

MultiModal Audio Generation in Raw Waveform Space.

Python 154 10 Updated May 26, 2026

[CVPR 2026 Findings] V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think

Python 56 2 Updated Apr 28, 2026

[CVPR 2026] Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation

Python 89 3 Updated Apr 26, 2026

[KDD 2026] Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

Python 38 3 Updated Aug 10, 2025
Python 47 2 Updated May 2, 2026

A dual-rate LLM architecture bridging DSP and NLP. Decouples semantic planning from lexical synthesis to solve O(N²) bottlenecks.

Python 7 Updated Apr 11, 2026

Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 1,374 73 Updated Jan 27, 2026

Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024 GPUs)

Python 24 Updated Mar 29, 2026

The agent that grows with you

Python 199,994 35,598 Updated Jun 23, 2026

[CVPR 2026 Oral] Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression

Python 44 Updated Jun 16, 2026

Single-stage End-to-End Training for Tokenization and Generation

Python 115 1 Updated Mar 24, 2026

DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Python 11 1 Updated May 12, 2026

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 152 4 Updated Feb 6, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,956 297 Updated Jan 30, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,978 79,560 Updated Jun 23, 2026

Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…

Python 12,092 1,563 Updated Mar 17, 2026

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 142 8 Updated Mar 8, 2026

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.

Python 251 12 Updated Feb 26, 2026

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,533 319 Updated May 26, 2026

The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 965 61 Updated Dec 20, 2025