GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,065 140 Updated Dec 18, 2025

Fara-7B: An Efficient Agentic Model for Computer Use

Python 3,307 289 Updated Dec 15, 2025

GELab: GUI Exploration Lab. One of the best GUI agent solutions in the galaxy, built by the StepFun-GELab team and powered by Step’s research capabilities.

Python 1,657 136 Updated Dec 19, 2025
Python 126 18 Updated Sep 23, 2025

A powerful Python library for creating and managing isolated desktop environments using Docker containers.

Python 441 42 Updated Sep 8, 2025

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 4,789 575 Updated Mar 23, 2025

Lighter web automation with Python

Python 8,183 511 Updated Nov 10, 2025

Private, fast, and honest web browser

C++ 8,808 175 Updated Dec 20, 2025

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,600 123 Updated Oct 31, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,414 230 Updated Nov 12, 2025

ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

Python 628 37 Updated Nov 20, 2025

Witness the aha moment of VLM with less than $3.

Python 4,009 289 Updated May 19, 2025

Native Multimodal Models are World Learners

Python 1,367 52 Updated Nov 28, 2025

A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.

478 26 Updated Dec 15, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,673 2,865 Updated Dec 21, 2025

Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks

Python 3,569 595 Updated Dec 22, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,409 462 Updated Dec 18, 2025

Display and control your Android device

C 132,853 12,407 Updated Dec 20, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,895 12,110 Updated Dec 22, 2025

Structured Outputs

Python 13,142 658 Updated Dec 12, 2025

🔥🔥 Open-sourced unified customization model

Python 1,196 73 Updated Sep 12, 2025

Open-Source Frontier Voice AI

Python 18,824 2,081 Updated Dec 17, 2025

Tips and resources to prepare for Behavioral interviews.

7,472 1,461 Updated Aug 19, 2025

12 Lessons to Get Started Building AI Agents

Jupyter Notebook 47,369 16,274 Updated Dec 21, 2025

Implementation of Phenaki Video, which uses Mask GIT to produce text guided videos of up to 2 minutes in length, in Pytorch

Python 791 82 Updated Jul 29, 2024

LBM: Latent Bridge Matching for Fast Image-to-Image Translation ✨ (ICCV 2025 Highlight)

Python 801 49 Updated Jul 24, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,881 655 Updated Nov 20, 2025

Wan2.2-Lightning: Speed up wan2.2 model with distillation

Python 246 16 Updated Nov 7, 2025