Stars
Paper list for Efficient Reasoning.
MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
Running Llama3.2-11B-Vision
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
[TPAMI 2025] Towards Visual Grounding: A Survey
Trained a classifier to recognize 3000 images across 15 categories using a Bag of Features model and the Spatial Pyramid Matching algorithm. Improved accuracy from ~50% to ~70%.
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning