Stars
Stop-To-Ask-Questions-The-Stupid-Ways
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
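As a quick illustration of the library's high-level interface, here is a minimal sketch using its `pipeline` API; the task string and input text are illustrative assumptions, and the default checkpoint is whatever the library selects:

```python
# Minimal sketch: run inference through the transformers pipeline API.
# The task name and example input are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default checkpoint
print(classifier("Streaming speech translation is finally practical."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```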
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, Hibiki…
A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
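For context, the 90x figure is consistent with a back-of-the-envelope check, assuming 16-bit PCM mono at 44.1 kHz as the uncompressed baseline (my assumption, not stated above):

$$44.1\,\text{kHz} \times 16\,\text{bit} = 705.6\ \text{kbps}, \qquad 705.6\ \text{kbps} / 90 \approx 7.8\ \text{kbps}$$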
Self-supervised speech representations
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
The official implementation of [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time, end-to-end speech input and streaming audio output for conversation.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
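A minimal sketch of round-tripping audio through SNAC, assuming the `snac` Python package's documented `from_pretrained`/`encode`/`decode` interface; the checkpoint name and tensor shapes are assumptions:

```python
# Minimal sketch: encode audio to discrete codes and decode back with SNAC.
# Checkpoint name and shapes are assumptions based on the package docs.
import torch
from snac import SNAC

model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()
audio = torch.randn(1, 1, 24000)  # (batch, channels, samples): 1 s at 24 kHz
with torch.inference_mode():
    codes = model.encode(audio)      # list of code tensors, one per temporal scale
    audio_hat = model.decode(codes)  # waveform reconstructed from the codes
```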
[Official Implementation] Acoustic Autoregressive Modeling 🔥
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
DiFlow-TTS delivers low-latency zero-shot TTS via discrete flow matching and factorized speech tokens. A compact, open framework for fast voice synthesis. 🐙
High-performance Image Tokenizers for VAR and AR
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
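Based on my reading of the representation-alignment idea (pulling intermediate diffusion-transformer features toward those of a frozen pretrained visual encoder), here is a rough sketch of such an alignment term; the function name, projection module, and shapes are illustrative assumptions, not the authors' code:

```python
# Rough sketch of a representation-alignment loss in the spirit of the paper:
# align projected DiT hidden states with frozen pretrained encoder features.
# `proj` (a trainable MLP) and all shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def alignment_loss(hidden: torch.Tensor, target: torch.Tensor,
                   proj: torch.nn.Module) -> torch.Tensor:
    z = F.normalize(proj(hidden), dim=-1)  # project DiT features into the encoder's space
    t = F.normalize(target, dim=-1)        # frozen features from the pretrained encoder
    return -(z * t).sum(dim=-1).mean()     # negative cosine similarity over patch tokens

# This term would be added to the usual denoising objective with a weighting coefficient.
```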
Advanced GRAG implementation for ComfyUI with beginner-friendly and expert modes
https://little-misfit.github.io/GRAG-Image-Editing/