KidsXH

Highlights

  • Pro

Organizations

@ZJUVAI

Python 245 4 Updated Apr 19, 2026

CLI for common Playwright actions. Record and generate Playwright code, inspect selectors and take screenshots.

TypeScript 11,470 599 Updated Jun 10, 2026
Python 52 5 Updated Aug 31, 2025

Automatic solver for plane geometry problems.

Jupyter Notebook 29 6 Updated Jun 20, 2026

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 19,508 1,491 Updated Feb 27, 2026

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

1,484 47 Updated Mar 9, 2026

[AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework

HTML 57 3 Updated Jun 8, 2026

R1-onevision, a visual language model capable of deep CoT reasoning.

Python 581 16 Updated Apr 13, 2025
TypeScript 9 3 Updated Jun 20, 2024

Leveraging Multimodal Prompt for Visualization Authoring with LLMs

TypeScript 7 1 Updated Jan 29, 2026

Vega-Lite Chart Dataset and NL Generation Framework using LLMs

Python 136 17 Updated May 30, 2024

Code for BLT research paper

Python 2,044 193 Updated Nov 3, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,149 2,698 Updated Jan 23, 2026

PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Python 28 5 Updated Oct 10, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 1,373 73 Updated Jan 27, 2026

Awesome-Paper-list: Visualization meets LLM

83 5 Updated Mar 26, 2026

A benchmark designed to evaluate visualization generation methods.

Python 59 13 Updated Nov 4, 2025

Graph Diffusion Policy Optimization

Python 43 5 Updated Mar 17, 2024

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,328 8,852 Updated Jun 21, 2026

General technology for enabling AI capabilities w/ LLMs and MLLMs

Python 4,415 371 Updated Jun 17, 2026
TypeScript 25 1 Updated Apr 19, 2024

[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Python 279 16 Updated Apr 17, 2024

Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"

49 2 Updated Oct 21, 2023

Build AI Agents, Visually

TypeScript 53,884 24,572 Updated Jun 16, 2026
TypeScript 18 6 Updated Jul 19, 2023

Here is the official implementation of the model KD3A in paper "KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation".

Python 120 14 Updated Aug 30, 2022

OI / ACM-ICPC essays and learning materials

Rich Text Format 1,658 388 Updated Jun 1, 2025