- Islamabad, Pakistan
- UTC +05:00
- https://visionusecases.com/
- in/muhammadrizwanmunawar
- @muhammdrizwanmr
- @muhammadrizwanmunawar
- https://muhammadrizwanmunawar.medium.com/
Stars
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
An open-source AI agent that brings the power of Gemini directly into your terminal.
An extremely fast Python package and project manager, written in Rust.
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Label Studio is a multi-type data labeling and annotation tool with standardized output format
FoundationDB - the open source, distributed, transactional key-value store
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
Reference PyTorch implementation and models for DINOv3
🧙 Build, run, and manage data pipelines for integrating and transforming data.
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
NanoDet-Plus⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980 KB (int8) / 1.8 MB (fp16), running at 97 FPS on a cellphone 🔥
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Framework agnostic sliced/tiled inference + interactive ui + error analysis plots
Torchreid: Deep learning person re-identification in PyTorch.
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Images to inference with no labeling (use foundation models to train supervised models).
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Transform Web Content into LLM-Ready Data
Implementation of "YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception".
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Convert JSON annotations into YOLO format.