Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
a.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.
Robust Speech Recognition via Large-Scale Weak Supervision
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
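Deep Learning 500 Questions: explains frequently asked questions on probability, linear algebra, machine learning, deep learning, computer vision, and other popular topics in question-and-answer form, to help the author and any interested readers. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any shortcomings. To be continued... For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be pursued. Tan 2018.06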
A generative speech model for daily dialogue.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are commi…
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
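A Bob plugin for text translation, text polishing, and grammar correction based on the OpenAI API. Let's welcome a new era that no longer needs a Tower of Babel! Licensed under CC BY-NC-SA 4.0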
PyTorch deep learning projects made easy.
A data augmentations library for audio, image, text, and video.
Scalable and user-friendly neural 🧠 forecasting algorithms.
A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX
This library provides common speech features for ASR including MFCCs and filterbank energies.
PyTorch implementation of adversarial attacks [torchattacks]
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
FinRL®-Meta: Dynamic datasets and market environments for FinRL.
A list of tools, papers and code related to Deepfake Detection.
SALMONN family: A suite of advanced multi-modal LLMs
SincNet is a neural architecture for efficiently processing raw audio samples.
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
In defence of metric learning for speaker recognition
Audio processing using PyTorch 1D convolutional networks
VideoX: a collection of video cross-modal models
A comprehensive benchmark of deepfake detection