multimodal-deep-learning

Here are 313 public repositories matching this topic...

WestCoastGod / Generate-Image-from-Music-Emotion

Generates images from music emotion: retrieves music information, predicts valence and arousal values, and then generates an image with a similar emotion.

music machine-learning deep-learning random-forest music-information-retrieval image-generation multimodal-deep-learning diffusion-models music-emotion-recognition image-emotion music-to-image-emotion

Updated Dec 7, 2025
Python

praevalis / demorph

Deepfake detection solution using a multimodal approach.

multimodal-deep-learning deepfake-detection

Updated Dec 14, 2025
Python

fork123aniket / Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

story-generation multimodal-learning multimodal multimodal-deep-learning multimodal-data vision-language vision-language-transformer generative-ai vision-language-model multimodal-large-language-models vision-language-learning generative-ai-model agentic-workflow agentic-rag agentic-ai internvl2

Updated Jan 29, 2025
Python

tudorhirtopanu / av-matchmaker

Multi-speaker diarization from video using SyncNet’s cross-modal embedding space to match multiple face tracks to corresponding audio tracks.

audio-visual-speech-recognition multimodal-deep-learning

Updated Oct 20, 2025
Python

kyegomez / MMCA-MGQA

Experiments with Multi-Modal Causal Attention combined with Multi-Grouped Query Attention

artificial-intelligence attention attention-mechanism multimodality attention-is-all-you-need multimodal multimodal-deep-learning gpt4

Updated Mar 11, 2024
Python

Demfier / pmup

App that uses deep learning to cheer you up with awesome quotes when you feel depressed

flask sentiment-analysis tensorflow pytorch node-js emotion-recognition capsnet multimodal-deep-learning

Updated Feb 25, 2019
Python

licesonw / deepmm

Multimodal deep learning package that uses both categorical and text-based features in a single deep architecture for regression and binary classification use cases.

deep-learning wide-and-deep factorization-machine neural-factorization-machines categorical-features deepfm multimodal multimodal-deep-learning deep-and-cross

Updated Jul 23, 2020
Python

mobled37 / utils

Deep learning utilities for multimodal research

finetuning multimodal-deep-learning

Updated Jul 28, 2023
Python

a-tabaza / binding_music

Code and Models for Binding Text, Images, Graphs, and Audio for Music Representation Learning

music-information-retrieval multimodal-deep-learning joint-embedding

Updated Jun 24, 2024
Python

thatAverageGuy / EarlyFusion-on-EasyVQA

Streamlit app demonstrating multimodal (vision + language) modeling in PyTorch.

transformers pytorch visual-question-answering vqa-dataset multimodal-deep-learning streamlit early-fusion

Updated Aug 22, 2022
Python

DistilledCode / mmrl

Multi-Modal Representational Learning for Social Media Popularity Prediction

neural-network embeddings data-pipeline multimodal-deep-learning praw-reddit airflow-dags chromadb multimodal-large-language-models

Updated Jun 30, 2024
Python

StavrosMitro / MISA

Enhanced fork of MISA integrating the MMLatch feedback mechanism for multimodal sentiment analysis.

ai sentiment-analysis artificial-intelligence neural-networks ntua multimodal-deep-learning

Updated Jul 29, 2025
Python

deepur71 / InstructPix2Pix

Implementation of InstructPix2Pix from scratch

multimodal-deep-learning

Updated Dec 13, 2024
Python

JHKim-snu / GVCCI

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

robotic-arm lifelong-learning vision-and-language multimodal-deep-learning robot-manipulation iros2023

Updated Apr 23, 2024
Python

Rishab27279 / MoodyAI

python nlp docker computer-vision deep-learning sentiment-analysis pytorch emotion-recognition video-analysis multimodal-deep-learning streamlit distillation-model wav2vec2 cross-attention openai-whisper dino-v2 multimodal-ai

Updated Oct 6, 2025
Python

AndreiMoraru123 / ContextCollector

Mixed vision-language Attention Model that gets better by making mistakes

Updated Feb 3, 2024
Python

kassy11 / daicwoz_voice

Preprocessing and feature extraction for raw voice data of DAIC-WOZ

multimodal multimodal-deep-learning depression-detection daic-woz mental-helath

Updated Dec 26, 2024
Python

dermatologist / kedro-tf-text

Kedro pipelines for preprocessing text and tabular data for multi-modal ML in TensorFlow.

medical healthcare gpt hacktoberfest nlp-machine-learning bert multimodal-deep-learning kedro

Updated Feb 9, 2023
Python

guyyariv / vLMIG

This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Generation

deep-learning language-model vision-and-language multimodal-deep-learning visual-commonsense-reasoning visual-commonsense

Updated Jul 1, 2024
Python

ahmdtaha / distributed_sigmoid_loss

Unofficial implementation of Sigmoid Loss for Language Image Pre-Training

python3 pytorch unsupervised-learning vision-and-language multimodal-deep-learning self-supervised-learning vision-language contrastive-learning distributed-data-parallel vision-transformer vision-language-pretraining

Updated Sep 26, 2023
Python
