TU-Darmstadt, Darmstadt
http://akshitac8.github.io/ - @akshitac8
Stars
[CVPR'25] Official implementation of the paper "Contextual AD Narration with Interleaved Multimodal Sequence"
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
[ICCV 2025, Oral] Token Activation Map to Visually Explain Multimodal LLMs
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
🔥🔥 First-ever hour-scale video understanding models
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[ICML 2025] Official PyTorch implementation of LongVU
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
Story-Based Retrieval with Contextual Embeddings. The largest freely available movie video dataset. [ACCV'20]
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
SpeechGPT Series: Speech Large Language Models
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"
[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
[ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, W…
[ICLR 2024, Spotlight] Sentence-level Prompts Benefit Composed Image Retrieval
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
A suite of methods for modeling video with Mamba
Official Implementation of LADS (Latent Augmentation using Domain descriptionS)
This repo contains the projects 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape', which address monocular depth estimation and 3D scene reconstruction from a single image.
Official PyTorch implementation of StyleGAN3
Official Code for DragGAN (SIGGRAPH 2023)
Code for the paper "Temporal Action Localization with Enhanced Instant Discriminability"