Open-source unified multimodal model

Python 6,033 536 Updated May 4, 2026

📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for streaming video.

180 5 Updated Jun 10, 2026

Implementation of paper "Playful Agentic Robot Learning"

Python 65 1 Updated Jun 20, 2026

RoboBrain 2.5: Advanced version of RoboBrain. Depth in Sight, Time in Mind. 🎉🎉🎉

Python 1,106 109 Updated Feb 28, 2026

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 11,238 1,107 Updated Jun 2, 2026

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 6,467 764 Updated Mar 23, 2025

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 2,101 117 Updated Jul 29, 2024

SenseNova-U series: Native Unified Paradigm with NEO-unify from the First Principles

Python 3,335 292 Updated Jun 15, 2026

NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

Jupyter Notebook 10,494 702 Updated Jun 23, 2026

RoboTwin 2.0 Official Repo

Python 2,475 404 Updated May 23, 2026

Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals

Python 2,305 198 Updated Apr 19, 2026

Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 6,571 383 Updated Jun 23, 2026

A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.

180,643 18,483 Updated Apr 20, 2026

Fully Open Framework for Democratized Multimodal Training

Python 1,098 75 Updated Jun 23, 2026

Reinforcement Learning of Vision Language Models with Self Visual Perception Reward

Python 174 17 Updated Mar 14, 2026

A curated, continuously updated reading list, paper blogs, and resources for World Action Models (WAMs) in embodied AI.

HTML 914 23 Updated Jun 21, 2026

The official repository of Qwen-VLA

631 25 Updated May 29, 2026

An image-to-world skillset for Claude.

TypeScript 4,623 465 Updated May 15, 2026

Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Python 3,921 547 Updated May 26, 2026

Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"

Python 189 29 Updated May 28, 2026

Implementation of the paper "Counting Through Occlusion: Framework for Open World Amodal Counting"

Python 1 Updated Nov 16, 2025

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

Python 10 Updated Apr 23, 2025

[ICLR 2026 Oral] Visual Planning: Let's Think Only with Images

Python 362 12 Updated Apr 24, 2026

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 19,389 2,481 Updated May 30, 2026

Depth Anything 3

Python 5,610 621 Updated Mar 21, 2026

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 8,319 862 Updated Mar 24, 2026

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Python 8,124 613 Updated Jul 17, 2024

Official implementation of "RL Makes MLLMs See Better Than SFT"

Python 7 Updated Apr 10, 2026