Microsoft
- Redmond
- https://soham97.github.io
- @sohamdesh_
Highlights
- Pro
Stars
Security hooks for AI coding agents: block dangerous commands, prevent secret leaks, and enforce runtime policies across Claude, OpenClaw, Antigravity, Codex, Cursor, and Windsurf
Make beautiful isometric infrastructure diagrams
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
A Conversational Speech Generation Model
Unified automatic quality assessment for speech, music, and sound.
Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Audio Entailment: Deductive Reasoning for Audio Understanding
Awesome speech/audio LLMs, representation learning, and codec models
A simple library for Fréchet Audio Distance (FAD) calculation
PAM is a no-reference audio quality metric for audio generation tasks
Repository for "Training Audio Captioning Models without Audio"
Tracking the state of the art and recent results (bibliography) on sound tasks.
Web-crawl for "Audio Retrieval with WavText5K and CLAP Training"
Learning audio concepts from natural language supervision
Code repo for "Multi-Task Learning for Interpretable Weakly Labelled Sound Event Detection"
Speech enhancement, speech separation, and sound source localization
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Reading list for research topics in Sound AI