m-bain


[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Python 26 1 Updated Jan 28, 2025

Multimodal language model benchmark, featuring challenging examples

Python 181 11 Updated Dec 18, 2024
Python 356 12 Updated Jan 27, 2024

Structured Outputs

Python 13,136 658 Updated Dec 12, 2025

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,837 583 Updated May 3, 2024

GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm

C 9,876 356 Updated Oct 25, 2025

LLM training code for Databricks foundation models

Python 4,371 578 Updated Oct 27, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 24,190 2,684 Updated Aug 12, 2024

A Data Streaming Library for Efficient Neural Network Training

Python 1,433 181 Updated Oct 27, 2025

Reference implementation for DPO (Direct Preference Optimization)

Python 2,814 233 Updated Aug 11, 2024

MeetEval - A meeting transcription evaluation toolkit

Python 124 16 Updated Oct 2, 2025

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processin…

683 42 Updated Dec 25, 2024
Python 13 2 Updated May 10, 2025

Tools for handling multimodal data in machine learning projects.

Python 1,095 257 Updated Dec 15, 2025

Easily create large video dataset from video urls

Python 642 72 Updated Jul 30, 2024

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets

Python 11 1 Updated May 25, 2023

String-to-String Algorithms for Natural Language Processing

Jupyter Notebook 563 30 Updated Jul 26, 2024

ImageBind One Embedding Space to Bind Them All

Python 8,907 835 Updated Nov 21, 2025

the subtitle editor :)

C# 11,752 1,139 Updated Dec 8, 2025

Simple diarization model

Python 53 3 Updated Jun 13, 2025
Python 16 1 Updated Sep 25, 2023

Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva

Python 92 22 Updated Feb 18, 2025

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

Python 532 22 Updated Nov 6, 2023

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 76,969 8,300 Updated May 27, 2025

[CVPR'23 Highlight] AutoAD: Movie Description in Context.

Python 101 4 Updated Nov 6, 2024

A database of movie scripts from several sources

Python 182 27 Updated May 3, 2024

Inference code for Llama models

Python 58,998 9,817 Updated Jan 26, 2025

GPU tester: detects broken and slow GPUs in a cluster

Python 72 6 Updated Feb 19, 2023

Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch

Python 74 16 Updated Sep 27, 2021

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,073 1,088 Updated Nov 18, 2024