Xi'an Jiaotong University, Xi'an, China
Stars
QingYuanQu / insightface
Forked from deepinsight/insightface
State-of-the-art 2D and 3D Face Analysis Project: study notes
[RSS 2025] PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Open-Sora: Democratizing Efficient Video Production for All
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
[CVPR 2025] Official PyTorch implementation of "EdgeTAM: On-Device Track Anything Model"
🔥This is a curated list of "A survey on Efficient Vision-Language Action Models" research. We will continue to maintain and update the repository, so follow us to keep up with the latest developmen…
awesome grounding: A curated list of research papers in visual grounding
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Resources for Multiple Object Tracking (MOT)
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
A modular high-level library to train embodied AI agents across a variety of tasks and environments.
Real Time Head Pose Estimation: Accurate head pose estimation using ResNet 18/34/50 and MobileNet V2/V3 models. Evaluate yaw, pitch, and roll with pre-trained weights for quick integration.
Head Pose Estimation Based on 5D Rotation Representation
Official PyTorch implementation of "Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation" IEEE TIP 24
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
[ICCV2025] 3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Our Webapp to annotate multi-camera pedestrian detection datasets.
Human Trajectory Prediction Dataset Benchmark (ACCV 2020)
[ICCV 2023] ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking
Official PyTorch implementation of "6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry," ECCV 2024
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.
A repo of awesome papers about multi-target multi-camera tracking