- The University of Sheffield
- Sheffield
- http://www.robertflynn.co.uk
- @RobFlynnHere
- https://huggingface.co/rjflynn2
- https://wandb.ai/wobrob101
Stars
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
[NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Sync papers from Zotero to a reMarkable tablet
A curated list of projects related to the reMarkable tablet
Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality
PyTorch implementation of the Mamba-3 architecture
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Interactive visualizations of the geometric intuition behind diffusion models.
🚀 Efficient implementations for emerging model architectures
A torch implementation of a recursion which turns out to be useful for RNN-T.
Open-source release accompanying Gao et al. 2025
Jax Codebase for Evolutionary Strategies at the Hyperscale
Python library for backtesting trading strategies & analyzing financial markets (formerly pythalesians)
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Text to speech alignment using CTC forced alignment
LongCat Audio Tokenizer and Detokenizer
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.