anton-jeran

Organizations

@GAMMA-UMD

SOTA Open Source TTS

Python 30,802 2,628 Updated Jun 9, 2026

Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.

Python 363 90 Updated May 23, 2023

Sound event localization, detection, and tracking of multiple overlapping and moving sources in 2D spherical space using a convolutional recurrent neural network

Python 401 72 Updated Nov 21, 2022

Code for voicing silent speech from EMG. Official repository for the papers "Digital Voicing of Silent Speech" at EMNLP 2020 and "An Improved Model for Voicing Silent Speech" at ACL 2021. Also incl…

Python 171 72 Updated Apr 30, 2024

Amazon Nova Act is an AWS service for building and deploying highly reliable AI agents that automate UI-based workflows at scale.

Python 909 147 Updated May 22, 2026

This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture.

Python 54 6 Updated Mar 17, 2025

A Python Room Spatial Impulse Response Ray-Tracing Toolkit

C++ 85 10 Updated Mar 4, 2026

When given different views of an object as input, it can tell us if that specific object is present in a larger picture or not.

Python 6 Updated Jan 20, 2019

SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios

Python 275 23 Updated Jan 22, 2025

Fully open reproduction of DeepSeek-R1

Python 26,309 2,437 Updated Apr 2, 2026

A framework for few-shot evaluation of language models.

Python 12,948 3,337 Updated Jun 2, 2026

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 4,225 604 Updated Jun 11, 2026

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 21,281 1,830 Updated Mar 5, 2026

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Python 131 5 Updated Dec 9, 2024

Training and evaluation pipeline for MEG and EEG brain signal encoding and decoding using deep learning. Code for our paper "Decoding speech perception from non-invasive brain recordings" published…

Python 477 75 Updated Mar 12, 2024

Official implementation of NeurIPS 2024 paper "DiffusionPDE: Generative PDE-Solving Under Partial Observation"

Python 170 24 Updated Apr 29, 2025

Impulse Response measurement tool for MATLAB

MATLAB 41 9 Updated Sep 27, 2020

This is the official implementation of our mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh.

Python 109 13 Updated Jul 24, 2024

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Python 180 32 Updated Mar 19, 2026

This is the official implementation of a reverberant-speech-to-room-impulse-response estimator.

Python 42 5 Updated Aug 7, 2024

Expressive Anechoic Recordings of Speech (EARS)

Python 218 13 Updated Jun 25, 2024

Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Python 141 10 Updated Mar 28, 2025

Official release of the Eyeful Tower dataset, a high-fidelity multi-view capture of 11 real-world scenes, from the paper “VR-NeRF: High-Fidelity Virtualized Walkable Spaces” (Xu et al., SIGGRAPH Asi…

Python 196 8 Updated May 17, 2026

This is the official implementation of our end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.

Python 5 4 Updated May 5, 2024

A Differentiable Room Acoustics Simulator

Python 4 1 Updated Feb 4, 2026

PyTorch Implementation of FastDiff (IJCAI'22)

Python 423 59 Updated Jun 20, 2024

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python 6,080 519 Updated Jul 1, 2025

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

63 1 Updated Aug 29, 2024
JavaScript 1 Updated Jul 22, 2024

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 22,467 2,302 Updated Jun 3, 2026