Stars
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
NVIDIA Isaac Sim™ is an open-source application on NVIDIA Omniverse for developing, simulating, and testing AI-driven robots in realistic virtual environments.
Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reache…
A PowerPoint add-in to insert LaTeX equations into PowerPoint presentations on Windows and Mac
🌟100+ original LLM / RL principle diagrams 📚, a major contribution from the author of 《大模型算法》 (Large Model Algorithms)! 💥 (100+ LLM/RL Algorithm Maps)
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
A curated list of large VLM-based VLA models for robotic manipulation.
The official code for paper "GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation"
CVPR2026 ANTS: Shaping the Adaptive Negative Textual Space by MLLM for OOD Detection
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Code, documentation, and discussion around the MIMIC-CXR database
Enhanced Generative Structure Prior for Text Image Super-Resolution [TPAMI]
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
A Multitask Conversational Vision-Language Model for Radiology
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
[ICCV 2025] FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models
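This is a simple popular-science tutorial project on technology, focused mainly on explaining interesting, cutting-edge technical concepts and principles. Each article aims to be readable in under 5 minutes.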
[ICCV2025] Official code for Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training
[CVPR2026] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding