TheDenk

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,084 141 Updated Dec 18, 2025

Fara-7B: An Efficient Agentic Model for Computer Use

Python 3,338 296 Updated Dec 15, 2025

GELab: GUI Exploration Lab. One of the best GUI agent solutions in the galaxy, built by the StepFun-GELab team and powered by Step’s research capabilities.

Python 1,710 142 Updated Dec 19, 2025
Python 126 18 Updated Sep 23, 2025

A powerful Python library for creating and managing isolated desktop environments using Docker containers.

Python 441 42 Updated Sep 8, 2025

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 4,807 578 Updated Mar 23, 2025

Lighter web automation with Python

Python 8,188 511 Updated Nov 10, 2025

Private, fast, and honest web browser

C++ 9,059 177 Updated Dec 20, 2025

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,608 124 Updated Oct 31, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,426 230 Updated Nov 12, 2025

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

Python 632 37 Updated Nov 20, 2025

Witness the aha moment of VLM with less than $3.

Python 4,012 289 Updated May 19, 2025

Native Multimodal Models are World Learners

Python 1,372 52 Updated Nov 28, 2025

A most Frontend Collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.

479 26 Updated Dec 15, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,763 2,885 Updated Dec 24, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,585 595 Updated Dec 23, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,437 464 Updated Dec 18, 2025

Display and control your Android device

C 132,992 12,422 Updated Dec 22, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 66,091 12,161 Updated Dec 24, 2025

Structured Outputs

Python 13,151 658 Updated Dec 12, 2025

🔥🔥 Open-sourced unified customization model

Python 1,196 73 Updated Sep 12, 2025

Open-Source Frontier Voice AI

Python 18,979 2,097 Updated Dec 17, 2025

Tips and resources to prepare for Behavioral interviews.

7,478 1,462 Updated Aug 19, 2025

12 Lessons to Get Started Building AI Agents

Jupyter Notebook 47,477 16,338 Updated Dec 21, 2025

Implementation of Phenaki Video, which uses MaskGIT to produce text-guided videos of up to 2 minutes in length, in PyTorch

Python 791 82 Updated Jul 29, 2024

LBM: Latent Bridge Matching for Fast Image-to-Image Translation ✨ (ICCV 2025 Highlight)

Python 803 49 Updated Jul 24, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,972 663 Updated Nov 20, 2025

Wan2.2-Lightning: Speed up wan2.2 model with distillation

Python 248 16 Updated Nov 7, 2025