Stars
An open framework for blind navigation based on ESP32
Real-time face swap and one-click video deepfake with only a single image
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
[SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
Multi-LiDAR-to-LiDAR calibration framework for ROS2 and non-ROS applications
🤖 Machine Learning Summer School Guide
Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"
LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping
[ICCV 2023] Official repo of "BEVBert: Multimodal Map Pre-training for Language-guided Navigation"
Official code release for ConceptGraphs
[ICRA 2024] Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
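Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-translation for Python, C++, and other projects, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…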
List of language agents based on paper "Cognitive Architectures for Language Agents"
A beautiful and customizable wallpapers manager for Linux
[ICLR 2024] Source codes for the paper "Building Cooperative Embodied Agents Modularly with Large Language Models"
Stable Diffusion web UI
rtl8812AU_8821AU Linux kernel driver for AC1200 (802.11ac) Wireless Dual-Band USB Adapter
A platform for executing RRT exploration in ROS Noetic and Ubuntu 20.04 LTS
A platform for executing RRT exploration in ROS Melodic and Ubuntu 18.04 LTS
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Surgical Visual Question Answering. A transformer-based surgical VQA model. Official implementation of "Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformers", MICCAI 2022.
[NeurIPS 2023 FMDM Workshop] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf