A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,436 11,108 Updated Nov 7, 2025

🧡 Everything is RSSible

TypeScript 39,708 8,702 Updated Nov 7, 2025

Mother of All BCI Benchmarks

Python 867 217 Updated Nov 7, 2025

Ongoing research training transformer models at scale

Python 14,126 3,252 Updated Nov 7, 2025

Immersive Dual Web Page Translation Extension: supports input-box translation, mouse-hover translation, and translation of PDF, Epub, subtitle, and TXT files

16,450 943 Updated Nov 7, 2025

Deep learning software to decode EEG, ECG or MEG signals

Python 1,067 233 Updated Nov 7, 2025

A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗Diffusers.

Python 527 20 Updated Nov 7, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 14,904 1,684 Updated Nov 7, 2025

Automatically crawl arXiv papers daily and summarize them using AI. Illustrating them using GitHub Pages.

JavaScript 2,034 654 Updated Nov 7, 2025

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deplo…

C 308 24 Updated Nov 7, 2025

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 12,903 1,347 Updated Nov 7, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 1,710 177 Updated Nov 6, 2025

SOTA Open Source TTS

Python 24,003 1,958 Updated Nov 6, 2025

A feature-rich command-line audio/video downloader

Python 134,136 10,773 Updated Nov 5, 2025

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 20,001 2,085 Updated Nov 5, 2025

End-to-End Speech Processing Toolkit

Python 9,568 2,343 Updated Nov 5, 2025

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 8,743 968 Updated Nov 5, 2025

This repository aims to collect Transformer-based sound event detection (SED) algorithms.

Jupyter Notebook 76 5 Updated Nov 4, 2025

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,014 87 Updated Nov 4, 2025

Translate the video from one language to another and add dubbing.

Python 15,125 1,765 Updated Nov 4, 2025

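"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLM) and multimodal large models (MLLM) in a Linux environment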

Jupyter Notebook 25,764 2,591 Updated Nov 4, 2025

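An open-source multi-character, multi-emotion AI voice-over generation platform that supports automatic dubbing and export for novels, scripts, videos, and other content.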

Python 194 25 Updated Nov 4, 2025

A summary of Prompt and LLM papers, open-source data and models, and AIGC applications

3,274 316 Updated Nov 3, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,077 827 Updated Nov 3, 2025

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

Python 450 29 Updated Oct 29, 2025

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Python 24,370 3,428 Updated Oct 28, 2025

VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 685 270 Updated Oct 27, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 692 91 Updated Oct 27, 2025

Official codebase for "Brain-JEPA: Brain Dynamics Foundation Model with Gradient Positioning and Spatiotemporal Masking" (NeurIPS 2024, Spotlight).

Python 140 34 Updated Oct 27, 2025

Long-form streaming TTS system for multi-speaker dialogue generation

Python 1,198 106 Updated Oct 26, 2025