Crowded place
jetson ai tutorial website
https://www.jetson-ai-lab.com/
booth 551
https://www.jetson-ai-lab.com/tutorials/gtc26/
ssh jetson@172.20.36.244
password: jetson
jetson@jat-4cbb4701c876:$ ^C
jetson@jat-4cbb4701c876:$ nvidia-smi
Wed Mar 18 17:26:55 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.00 Driver Version: 580.00 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA Thor Off | 00000000:01:00.0 Off | N/A |
| N/A 36C N/A 2W / N/A | Not Supported | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=========================================================================================| | 0 N/A N/A 2642 G /usr/lib/xorg/Xorg 31MiB | | 0 N/A N/A 2963 G /usr/bin/gnome-shell 6MiB | | 0 N/A N/A 3450 G /usr/libexec/gnome-initial-setup 11MiB | +-----------------------------------------------------------------------------------------+ jetson@jat-4cbb4701c876:~$
Why VLMs and Not CNN-Based Models for Physical AI? Traditional CNN-based object detection models like YOLO — they output bounding boxes or class labels. A CNN-based system knows “there’s a cup” but not “the cup is too close to the edge and might fall.”
VLMs think. They reason about context, spatial relationships, and consequences:
setup venv and then install vllm
vllm serve ~/models/cosmos-reason2-8b \
--served-model-name nvidia/cosmos-reason2-8b-fp8 \
--max-model-len 8192 \
--gpu-memory-utilization 0.7 \
--reasoning-parser qwen3 \
--media-io-kwargs '{"video": {"num_frames": -1}}' \
--enable-prefix-caching \
--port 8000
sudo docker run -it --rm --runtime=nvidia --network host
-v $MODEL_PATH:/models/cosmos-reason2-8b:ro
-v ${HOME}/.cache/vllm:/root/.cache/vllm
ghcr.io/nvidia-ai-iot/vllm:0.14.0-r38.3-arm64-sbsa-cu130-24.04
vllm serve /models/cosmos-reason2-8b
--served-model-name nvidia/cosmos-reason2-8b-fp8
--max-model-len 8192
--gpu-memory-utilization 0.7
--reasoning-parser qwen3
--media-io-kwargs '{"video": {"num_frames": -1}}'
--enable-prefix-caching
--port 8010