mutimodal
Here are 9 public repositories matching this topic...
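A multimodal large model built on the Qwen Agent framework, integrating a JAKA robotic arm, visual detection, speech recognition and synthesis, and an MCP database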
Updated May 26, 2025 - Python
MINH KHỎE TUỆ Y (明康慧醫, MKTY): source code for the design and implementation of a health management and assisted diagnosis system based on LLMs and multimodal artificial intelligence (Minh Khoe Tue Y Smart Healthcare System). The project was used as a 2025 graduation project in the Faculty of Computer Science at Qilu University of Technology (Shandong Academy of Sciences). Author: Du Yu (杜宇) @duyu09, email: qluduyu09@163.com
Updated Aug 19, 2025 - Vue
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
Updated Apr 15, 2025 - Python
A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. Processes images entirely offline, supporting formats like JPEG, PNG, and BMP.
Updated Nov 21, 2024 - Python
Gemini 2 Pro app for Image, Audio, and Document understanding + Code Execution.
Updated Feb 9, 2025 - Python
A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for text and image-based retrieval.
Updated Mar 20, 2025 - Jupyter Notebook
QD-RetNet: Efficient Retinal Disease Classification via Quantized Knowledge Distillation [MIUA-2025]
Updated Jul 20, 2025 - Python