Weili-NLP
  • Baidu
  • Beijing

Python 179 19 Updated Apr 14, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 379,373 79,409 Updated Jun 18, 2026

[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

2,250 172 Updated May 16, 2026
Python 1,384 112 Updated Feb 12, 2026

Official Repo of paper "KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction". In the paper, we propose KnowCoder, the most powerful large language model so far for…

Python 107 12 Updated May 28, 2025

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Python 201 18 Updated Apr 9, 2026

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,250 107 Updated Oct 29, 2025

NVIDIA Isaac GR00T N1.7 - A Foundation Model for Generalist Robots.

Python 7,376 1,269 Updated Jun 17, 2026

[ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming.

Python 301 20 Updated May 20, 2024

AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reasoning models.

Python 132 9 Updated Mar 18, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,279 8,847 Updated Jun 17, 2026

Implementation code of the paper MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing

Python 72 4 Updated Jul 13, 2025

Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Python 588 84 Updated Feb 27, 2026

Repo of ACL 2025 Paper "Quantification of Large Language Model Distillation"

Python 103 9 Updated Mar 5, 2026

Towards Large Multimodal Models as Visual Foundation Agents

Python 268 11 Updated Apr 24, 2025

Simulation platform for general-purpose robotics & embodied AI learning.

Python 29,372 2,786 Updated Jun 17, 2026

✨✨Latest Advances on Multimodal Large Language Models

17,899 1,128 Updated Jun 18, 2026

[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models

Jupyter Notebook 3,998 385 Updated Feb 6, 2024

Official repo with the MM-PlanLLM code, from the paper Show and Guide: Instructional-Plan Grounded Vision and Language Model.

Python 2 Updated Nov 12, 2024

PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python 1,372 73 Updated Jan 27, 2026

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,518 182 Updated Mar 28, 2025

Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, with strong general video understanding capability.

Python 549 31 Updated Aug 14, 2025

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,493 251 Updated Dec 3, 2024

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 6,451 763 Updated Mar 23, 2025

A flexible and efficient codebase for training visually-conditioned language models (VLMs)

Python 993 1,125 Updated Jul 4, 2024

PyTorch implementation for Score-Based Generative Modeling through Stochastic Differential Equations (ICLR 2021, Oral)

Jupyter Notebook 2,117 358 Updated Jul 14, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Python 3,300 202 Updated Oct 31, 2024
89 Updated Jan 25, 2024

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python 6,782 751 Updated Mar 19, 2025