Stop-To-Ask-Questions-The-Stupid-Ways

1,261 302 Updated Sep 18, 2023
Python 5 2 Updated Jan 27, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 154,116 31,502 Updated Dec 21, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,854 303 Updated Jun 12, 2025
Jupyter Notebook 15 1 Updated Dec 17, 2025

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits for the end of the source utterance to start translating--- H…

Rust 1,345 106 Updated Apr 15, 2025

A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos.

Python 64 9 Updated Oct 28, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,663 165 Updated Dec 5, 2025
Python 7,541 445 Updated Dec 14, 2025

Speech self-supervised representations

Python 514 39 Updated Apr 27, 2023

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,831 108 Updated Dec 8, 2025

[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/

Python 2,907 313 Updated Feb 19, 2025

The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Jupyter Notebook 669 43 Updated Dec 20, 2025

(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation

Python 57 Updated Oct 14, 2025

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,499 302 Updated Nov 5, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python 731 42 Updated Nov 19, 2024

[Official Implementation] Acoustic Autoregressive Modeling 🔥

Python 73 6 Updated Aug 24, 2024

[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Jupyter Notebook 43 6 Updated Feb 9, 2025

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 18,148 2,016 Updated Dec 17, 2025

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Python 1,213 101 Updated Jun 29, 2025

[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation

Python 658 24 Updated Nov 27, 2025

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

Jupyter Notebook 71 4 Updated Mar 17, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,393 319 Updated Jun 21, 2025

DiFlow-TTS delivers low-latency zero-shot TTS via discrete flow matching and factorized speech tokens. A compact, open framework for fast voice synthesis.🐙

Python 49 5 Updated Dec 21, 2025

High-performance Image Tokenizers for VAR and AR

Python 300 6 Updated Apr 25, 2025

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,466 70 Updated Mar 16, 2025

Advanced GRAG implementation for ComfyUI with beginner-friendly and expert modes

Python 15 3 Updated Nov 6, 2025

https://little-misfit.github.io/GRAG-Image-Editing/

Python 114 3 Updated Nov 27, 2025