RMSnow

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 9,165 571 Updated Oct 30, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 977 42 Updated Oct 27, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 19,981 2,199 Updated Aug 12, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 1,867 127 Updated Oct 30, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,035 669 Updated Oct 29, 2024
Python 17 1 Updated Sep 14, 2024

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,601 125 Updated Sep 19, 2023

Train transformer language models with reinforcement learning.

Python 9,885 1,250 Updated Oct 28, 2024

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

Python 2,363 231 Updated Oct 30, 2024

State-of-the-art zero-shot voice conversion & singing voice conversion with in-context learning

Python 488 53 Updated Oct 29, 2024

A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/

JavaScript 2,046 304 Updated Sep 10, 2024

The Emotional Voices Database: Towards Controlling the Emotional Expressiveness in Voice Generation Systems

Python 254 19 Updated Oct 10, 2023

Code for ICML2020 paper - CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information

Jupyter Notebook 308 39 Updated May 10, 2024
Python 6,556 498 Updated Oct 14, 2024

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Python 618 43 Updated Oct 27, 2024

Perceptual Quality Estimator for speech and audio

C++ 689 123 Updated Aug 2, 2024

Inference and training library for high-quality TTS models.

Python 4,514 457 Updated Oct 14, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,492 164 Updated Sep 24, 2024
Python 37 2 Updated Oct 28, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 5,931 635 Updated Oct 22, 2024

The official GitHub page for the survey paper "Foundation Models for Music: A Survey".

90 3 Updated Sep 4, 2024

A library for speech data augmentation in time-domain

Python 641 57 Updated Aug 30, 2021

Diffusion Model for Voice Conversion

Jupyter Notebook 36 7 Updated Mar 14, 2024

PolySinger: Singing-Voice to Singing-Voice Translation From English to Japanese

3 Updated Jul 8, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 446 38 Updated Jul 26, 2024
Jupyter Notebook 45 3 Updated Oct 18, 2024

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…

507 35 Updated Oct 13, 2024

Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793

Python 310 10 Updated Oct 19, 2024

This is the GitHub page for publicly available emotional speech data.

318 23 Updated Jan 6, 2022

Public Code for Neural Codec Language Models for Disentangled and Textless Voice Conversion (Interspeech 2024)

3 Updated Jun 6, 2024