# multimodal-deep-learning

Here are 526 public repositories matching this topic...

A generalized research workbench for fusing visual (CNN) and textual (RNN/LSTM) features. It provides a clean, modular Keras framework, optimized for efficient training on Google Colab by leveraging local I/O and persistent asset saving to Google Drive.

  • Updated Oct 29, 2025
  • Jupyter Notebook
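The fusion pattern this workbench describes can be sketched without the full Keras pipeline. The snippet below is a minimal NumPy illustration of late fusion — concatenating pre-extracted visual and textual feature vectors and feeding the joint representation to a linear classifier head. The feature dimensions, batch size, and class count are arbitrary stand-ins, not values from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features: in the actual framework these would
# come from a CNN image branch and an RNN/LSTM text branch.
visual_feats = rng.standard_normal((4, 512))   # batch of 4, 512-dim image features
textual_feats = rng.standard_normal((4, 256))  # batch of 4, 256-dim text features

# Late fusion by concatenation: the joint vector feeds the classifier head.
fused = np.concatenate([visual_feats, textual_feats], axis=1)  # shape (4, 768)

# A single linear classification layer over the fused features (3 classes).
W = rng.standard_normal((768, 3)) * 0.01
b = np.zeros(3)
logits = fused @ W + b

# Softmax over classes, computed stably.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

print(fused.shape, probs.shape)  # (4, 768) (4, 3)
```

In a Keras implementation the same step is typically a `Concatenate` layer joining the two branch outputs before the dense head.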

This repository implements temporal reasoning capabilities for vision-language models in simulated embodied environments, addressing the critical limitation of frame-by-frame processing in current multimodal AI systems.

  • Updated Sep 24, 2025
  • Python

Build a reliable and interpretable model that classifies extreme weather from images, enhancing early detection, situational awareness, and decision-making.

  • Updated Jun 17, 2025
  • Jupyter Notebook

This repository focuses on the cutting-edge features of Llama 3.2, including multimodal capabilities, advanced tokenization, and tool calling for building next-gen AI applications. It highlights Llama's enhanced image reasoning, multilingual support, and the Llama Stack API for seamless customization and orchestration.

  • Updated Dec 18, 2024
  • Jupyter Notebook
