Stars
Command-line program to download videos from YouTube.com and other video sites
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Robust Speech Recognition via Large-Scale Weak Supervision
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Image-to-Image Translation in PyTorch
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Faster Whisper transcription with CTranslate2
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Magenta: Music and Art Generation with Machine Intelligence
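Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow, using ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.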
State-of-the-Art Text Embeddings
Automatic headphone equalization from frequency responses
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
100+ Pre-trained Chinese Word Vectors
Python bindings for FFmpeg - with complex filtering support
Simultaneous speech-to-text models
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch
Python library for audio and music analysis
Multilingual Voice Understanding Model
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
pkuseg: a toolkit for multi-domain Chinese word segmentation
Official repo for consistency models.
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications