Stars
CLI for common Playwright actions. Record and generate Playwright code, inspect selectors and take screenshots.
ZJUVAI / GenesisGeo
Forked from Newclid/Newclid
Automatic solver for plane geometry problems.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
R1-onevision, a visual language model capable of deep CoT reasoning.
Leveraging Multimodal Prompt for Visualization Authoring with LLMs
Vega-Lite Chart Dataset and NL Generation Framework using LLMs
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Awesome-Paper-list: Visualization meets LLM
A benchmark designed to evaluate visualization generation methods.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
General technology for enabling AI capabilities with LLMs and MLLMs
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"
Official implementation of the KD3A model from the paper "KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation".
OI / ACM-ICPC essays and learning materials