Deep learning utilities for multimodal research
Updated Jul 28, 2023 - Python
Code and Models for Binding Text, Images, Graphs, and Audio for Music Representation Learning
Learning a common representation space from speech and text for cross-modal retrieval given textual queries and speech files.
A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.
This repository implements temporal reasoning capabilities for vision-language models in simulated embodied environments, addressing the critical limitation of frame-by-frame processing in current multimodal AI systems.
MCD-UNet: A Multi-modal Conditional Diffusion UNet for 3D Medical Image Segmentation
A project for generating artistic images semantically related to music inputs.
Using a 3D Nearby Self-Attention Transformer to leverage the spatiotemporal nature of video for representation learning.
Semi-Supervised Learning (SSL)
A deep learning system for real-time emotion recognition from both text and images using transformers.
This repository contains the codebase for optimizing a vision-to-text model on a target RTX 3060 device using Apache TVM.
Findings ACL 23: Visual Coherence Loss for Coherent and Visually Grounded Story Generation
Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
DishVision AI is a multimodal food recognition app powered by Google Gemini AI and Streamlit. Upload or capture a dish image, and the AI will detect its name, ingredients, and recipe instantly! 🚀🔥
This code is part of the paper: "A Deep Dive Into Neural Synchrony Evaluation for Audio-visual Translation" published at ACM ICMI 2022.
Implementation for 3D bounding box prediction of objects using images, point clouds, and segmentation masks.
Human-aligned evaluation suite for LLMs, multimodal models, speech intent and jailbreak defense.
Repository for context-based emotion recognition.