GPU graph does not show metrics #725

@Gakhramanzode

I have two GPUs on the node:

  • GPU-2cf4c3f6-25e7-7c24-d621-479c5d7150fd;
  • GPU-4969baef-c6e6-039b-5e1d-c72f494347b6.

Both are visible in the screenshot below:
Image

However, we see metrics from only one of them, GPU-4969baef-c6e6-039b-5e1d-c72f494347b6:
Image

There are no metrics from GPU-2cf4c3f6-25e7-7c24-d621-479c5d7150fd:
Image

We are sure that GPU-2cf4c3f6-25e7-7c24-d621-479c5d7150fd is under load:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:2A:00.0 Off |                    0 |
| N/A   48C    P0            194W /  300W |   74367MiB /  81920MiB |     91%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          On  |   00000000:3D:00.0 Off |                    0 |
| N/A   49C    P0            190W /  300W |   74367MiB /  81920MiB |     89%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         4184729      C   ...3) --multiprocessing-fork_TP0      74358MiB |
|    1   N/A  N/A         4184730      C   ...7) --multiprocessing-fork_TP1      74358MiB |
+-----------------------------------------------------------------------------------------+
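The nvidia-smi table above identifies the GPUs by index (0 and 1), not by UUID. To tie the load to a specific UUID, a query along the following lines could be used. This is only a minimal sketch: it assumes nvidia-smi is on PATH on the node, and the helper names parse_gpu_csv and query_gpus are hypothetical, introduced here for illustration.

```python
import csv
import io
import subprocess

# Fields understood by `nvidia-smi --query-gpu`; their order defines the CSV columns.
FIELDS = ["index", "uuid", "utilization.gpu", "memory.used"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [value.strip() for value in record]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi on the node and return one dict per GPU (hypothetical helper)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling query_gpus() on the node should return one entry per GPU, each with its index, UUID, utilization (percent) and used memory (MiB), which can then be compared against the UUIDs shown in the graphs.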

What could be the problem?
