LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.
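A minimal sketch of the RAG pattern this description refers to, not the framework's actual API: documents are embedded, the top matches for a question are retrieved by cosine similarity, and a prompt is assembled for whatever LLM answers. The document snippets and the `retrieve`/`build_prompt` names are illustrative assumptions.

```python
# Minimal RAG sketch (assumption: not this framework's real API).
# Retrieval uses sentence-transformers embeddings; the final LLM call is left abstract.
from sentence_transformers import SentenceTransformer, util

docs = [
    "The invoice total is due within 30 days of receipt.",
    "Refunds are processed through the original payment method.",
    "Support is available Monday through Friday, 9am to 5pm.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(docs, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top-k documents most similar to the question."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, doc_embeddings)[0]
    best = scores.topk(top_k).indices.tolist()
    return [docs[i] for i in best]

def build_prompt(question: str) -> str:
    """Assemble retrieved context and the question into an LLM prompt."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When are invoices due?"))
```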
DeepResearchAgent is a hierarchical multi-agent system designed not only for deep research tasks but also for general-purpose task solving. The framework leverages a top-level planning agent to coordinate multiple specialized lower-level agents, enabling automated task decomposition and efficient execution across diverse and complex domains.
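The planner/sub-agent pattern described here can be sketched in a few lines; the class and method names below are assumptions for illustration, not DeepResearchAgent's actual interfaces, and a real system would have the planning agent query an LLM rather than return a hard-coded plan.

```python
# Illustrative sketch of a top-level planner routing subtasks to specialists.
# Names are assumptions, not DeepResearchAgent's actual API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubTask:
    kind: str        # which specialist should handle it, e.g. "search" or "summarize"
    payload: str     # the concrete instruction for that specialist

class PlanningAgent:
    """Top-level agent: decomposes a task and dispatches subtasks to lower-level agents."""

    def __init__(self, specialists: dict[str, Callable[[str], str]]):
        self.specialists = specialists

    def plan(self, task: str) -> list[SubTask]:
        # A real planner would ask an LLM for this decomposition; hard-coded here.
        return [
            SubTask("search", f"Collect sources about: {task}"),
            SubTask("summarize", f"Summarize the findings about: {task}"),
        ]

    def run(self, task: str) -> list[str]:
        return [self.specialists[st.kind](st.payload) for st in self.plan(task)]

agents = {
    "search": lambda p: f"[search agent] results for '{p}'",
    "summarize": lambda p: f"[summarizer agent] summary of '{p}'",
}
print(PlanningAgent(agents).run("battery recycling methods"))
```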
RMDL: Random Multimodel Deep Learning for Classification
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
YOLOv5, YOLOv8, segmentation, face, pose, and keypoint models on DeepStream
🧘🏻♂️ KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
This is our solution for KDD Cup 2020. We implemented a very neat and simple neural ranking model based on siamese BERT which ranked first among the solo teams and ranked 12th among all teams on the final leaderboard.
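A simplified sketch of the siamese (bi-encoder) ranking idea: query and candidates pass through the same BERT encoder and candidates are ranked by cosine similarity. This is a generic stand-in, not the KDD Cup 2020 solution itself; the example query and candidates are made up.

```python
# Bi-encoder (siamese) ranking sketch; a stand-in, not the KDD Cup 2020 solution.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("bert-base-uncased")  # shared weights = the "siamese" part

query = "wireless noise cancelling headphones"
candidates = [
    "Bluetooth over-ear headphones with active noise cancellation",
    "Wired earbuds with inline microphone",
    "Portable speaker with deep bass",
]

# Encode both sides with the same encoder, then rank by cosine similarity.
q = encoder.encode(query, convert_to_tensor=True)
c = encoder.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(q, c)[0]

for idx in scores.argsort(descending=True).tolist():
    print(f"{scores[idx]:.3f}  {candidates[idx]}")
```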
OpenVINO+NCS2/NCS+MultiModel(FaceDetection, EmotionRecognition)+MultiStick+MultiProcess+MultiThread+USB Camera/PiCamera. RaspberryPi 3 compatible. Async.
End-to-End AI Voice Assistant pipeline with Whisper for Speech-to-Text, Hugging Face LLM for response generation, and Edge-TTS for Text-to-Speech. Features include Voice Activity Detection (VAD), tunable parameters for pitch, gender, and speed, and real-time response with latency optimization.
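A hedged sketch of that STT → LLM → TTS chain, with VAD and latency tuning omitted; the model names are placeholders, not necessarily the repository's defaults.

```python
# Sketch of a Whisper -> Hugging Face LLM -> Edge-TTS pipeline (VAD omitted).
import asyncio
import whisper                      # openai-whisper
import edge_tts
from transformers import pipeline

def speech_to_text(wav_path: str) -> str:
    """Transcribe a recorded utterance with Whisper."""
    return whisper.load_model("base").transcribe(wav_path)["text"]

def generate_reply(user_text: str) -> str:
    """Produce a response with a Hugging Face text-generation model."""
    llm = pipeline("text-generation", model="distilgpt2")  # placeholder model
    out = llm(user_text, max_new_tokens=60, do_sample=True)[0]["generated_text"]
    return out[len(user_text):].strip()

async def text_to_speech(reply: str, out_path: str = "reply.mp3") -> None:
    """Synthesize the reply with Edge-TTS."""
    await edge_tts.Communicate(reply, voice="en-US-AriaNeural").save(out_path)

if __name__ == "__main__":
    text = speech_to_text("utterance.wav")
    reply = generate_reply(text)
    asyncio.run(text_to_speech(reply))
    print(f"User said: {text!r}\nAssistant replied: {reply!r} (audio in reply.mp3)")
```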
Accepted by TMM 2022
ArangoGraph is the easiest way to run ArangoDB. Available on AWS and Google Cloud.
This project is a multi-modal system that combines several models to accept audio, images, and text as inputs and generate corresponding audio, image, and text outputs.
Robust particle filter based on dynamic averaging of multiple noise models
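A toy illustration of the general idea (a measurement likelihood formed as a weighted mixture of several noise models, with the mixture weights re-estimated from the data at each step); this is not the repository's actual algorithm, and the 1-D random-walk setup is invented for the example.

```python
# Toy particle filter with dynamically averaged measurement-noise models.
import numpy as np

rng = np.random.default_rng(0)
N = 500                                        # number of particles
sigmas = np.array([0.5, 3.0])                  # candidate measurement-noise std devs
model_w = np.ones(len(sigmas)) / len(sigmas)   # dynamic model weights

particles = rng.normal(0.0, 1.0, N)
true_x = 0.0
for t in range(50):
    # simulate a random-walk state and a (sometimes heavy-noise) measurement
    true_x += rng.normal(0.0, 0.3)
    z = true_x + rng.normal(0.0, sigmas[1] if t % 10 == 0 else sigmas[0])

    # propagate particles through the motion model
    particles += rng.normal(0.0, 0.3, N)

    # per-model Gaussian likelihoods of the measurement for every particle
    lik = np.exp(-0.5 * ((z - particles[None, :]) / sigmas[:, None]) ** 2) \
          / (sigmas[:, None] * np.sqrt(2 * np.pi))

    # dynamic averaging: update model weights by each model's average evidence
    model_w = model_w * lik.mean(axis=1)
    model_w /= model_w.sum()

    # mixture likelihood -> particle weights -> resample
    w = model_w @ lik
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]

    print(f"t={t:2d}  estimate={particles.mean():+.2f}  true={true_x:+.2f}")
```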
VyomAI: state-of-the-art NLP, LLM, vision, and multimodel transformer implementations in PyTorch
The Pictionary app uses LLaMA 3.1 to generate random drawing prompts and LLaMA 3.2 Vision to predict and judge user drawings based on these prompts. It provides an interactive and fun way to test your drawing skills within a set time limit.
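One possible wiring for that prompt-and-judge loop, using the ollama Python client as an assumption; the model tags, judging prompt, and file names are illustrative, not necessarily what the app itself uses.

```python
# Sketch of a prompt generator (text model) plus a drawing judge (vision model).
import ollama

def random_prompt() -> str:
    """Ask a text model for a single drawing prompt."""
    resp = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user",
                   "content": "Give me one simple object to draw, one or two words only."}],
    )
    return resp["message"]["content"].strip()

def judge_drawing(image_path: str, prompt: str) -> str:
    """Ask a vision model whether the sketch matches the prompt."""
    resp = ollama.chat(
        model="llama3.2-vision",
        messages=[{
            "role": "user",
            "content": f"The player was asked to draw '{prompt}'. "
                       "Does this sketch show it? Answer yes or no, then explain briefly.",
            "images": [image_path],
        }],
    )
    return resp["message"]["content"]

target = random_prompt()
print("Draw:", target)
print(judge_drawing("drawing.png", target))
```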
📄 SemEval 2024 Task 8: Artificial Intelligence Text Detection System using Natural Language Processing and Neural Network techniques.
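For context on the task framing, here is a minimal machine-generated-text detection baseline; the actual SemEval system uses neural techniques, and the tiny in-line dataset below is invented for illustration, not SemEval data.

```python
# Minimal baseline sketch: TF-IDF features + logistic regression for
# human vs. machine-generated text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "In conclusion, the aforementioned factors collectively demonstrate the outcome.",
    "honestly the movie was kinda slow but the ending wrecked me lol",
    "The results indicate a statistically significant improvement across all metrics.",
    "we got lost twice and still made the train, absolute chaos",
]
labels = [1, 0, 1, 0]               # 1 = machine-generated, 0 = human-written

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Overall, these considerations substantiate the proposed framework."]))
```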