Analyzing Hateful Memes (Resources: Hateful Memes Challenge)
Updated Feb 18, 2024 - Jupyter Notebook
Deep learning utils for multimodal research
Image classification and text assignment using convolutional neural networks and multimodal transformers
Code and Models for Binding Text, Images, Graphs, and Audio for Music Representation Learning
Example of a multimodal (end-to-end) deep learning model with transformers architecture
A repository of Video Language papers, code and datasets.
My implementation of research papers
Learning a common representation space from speech and text for cross-modal retrieval given textual queries and speech files.
A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.
Facial landmark detection and emotion classification using PyTorch, MediaPipe, and ResNet18 with attention mechanisms on the FER2013 dataset
Build AI Superpowers with One SDK: Agents, Models, Context, Orchestration. Anywhere. Framework for Python & JavaScript/TypeScript
The purpose of this project is to build an NLP model that makes medical abstracts easier to read.
A generalized research workbench for fusing visual (CNN) and textual (RNN/LSTM) features. It provides a clean, modular Keras framework, optimized for efficient training on Google Colab by leveraging local I/O and persistent asset saving to Google Drive. (A minimal sketch of this fusion pattern appears after the list.)
This repository implements temporal reasoning capabilities for vision-language models in simulated embodied environments, addressing the critical limitation of frame-by-frame processing in current multimodal AI systems.
Fine-tuning BLIP for pathological visual question answering.
Build a reliable and interpretable model that classifies extreme weather from images – enhancing early detection, situational awareness, and decision-making.
A novel multimodal approach for emotion recognition deploying early fusion based on graph-captured embeddings
This repository accompanies the Thesis "Automated Segmentation and Analysis of High-Speed Video Phase-Detection Data for Boiling Heat Transfer Characterization Using U-Net Convolutional Neural Networks and Uncertainty Quantification" published by MIT Libraries.
This repository focuses on the cutting-edge features of Llama 3.2, including multimodal capabilities, advanced tokenization, and tool calling for building next-gen AI applications. It highlights Llama's enhanced image reasoning, multilingual support, and the Llama Stack API for seamless customization and orchestration.
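Several of the repositories above follow the same basic pattern: encode each modality separately, then fuse the feature vectors for a downstream task. As a rough illustration of the CNN + RNN/LSTM feature fusion named in the Keras workbench entry, here is a minimal sketch; the input shapes, vocabulary size, and layer widths are illustrative assumptions, not taken from any listed repository.

```python
# Minimal sketch of CNN + LSTM feature fusion in Keras.
# Assumptions: 224x224 RGB images, integer-encoded text of length 50 with a
# 10,000-word vocabulary, and a binary label. All sizes are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, Model

# Visual branch: a small CNN that reduces the image to a feature vector.
image_in = layers.Input(shape=(224, 224, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
visual_feat = layers.Dense(128, activation="relu")(x)

# Textual branch: embedding + LSTM over token ids (zeros are treated as padding).
text_in = layers.Input(shape=(50,), dtype="int32", name="text")
t = layers.Embedding(input_dim=10_000, output_dim=128, mask_zero=True)(text_in)
text_feat = layers.LSTM(128)(t)

# Fusion: concatenate the two feature vectors and classify.
fused = layers.Concatenate()([visual_feat, text_feat])
fused = layers.Dense(64, activation="relu")(fused)
out = layers.Dense(1, activation="sigmoid", name="label")(fused)

model = Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

With this layout, model.fit can be called with a dictionary of inputs keyed by "image" and "text", which keeps the two modalities cleanly separated in the data pipeline.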