@IlyasMoutawwakil

This PR adds support for AMD GPUs through amdsmi.
#445

@IlyasMoutawwakil
Author

Running this on an MI250 with ROCm 5.6.1 and amd-smi installed:

from codecarbon import EmissionsTracker
import torch
import time


def workload():
    matrix1 = torch.randn(1000, 1000, device="cuda")
    matrix2 = torch.randn(1000, 1000, device="cuda")

    return matrix1 @ matrix2


with EmissionsTracker(tracking_mode="process") as tracker:
    start = time.time()
    while time.time() - start < 10:
        workload()

print("total_energy:", tracker._total_energy.kWh)
print("total_co2:", tracker.final_emissions)

I get a coherent output:

[codecarbon INFO @ 09:47:24] [setup] RAM Tracking...
[codecarbon INFO @ 09:47:24] [setup] GPU Tracking...
[codecarbon INFO @ 09:47:25] Tracking AMD GPU via amdsmi
[codecarbon INFO @ 09:47:25] [setup] CPU Tracking...
[codecarbon WARNING @ 09:47:25] No CPU tracking mode found. Falling back on CPU constant mode.
[codecarbon INFO @ 09:47:26] CPU Model on constant consumption mode: AMD EPYC 7763 64-Core Processor
[codecarbon INFO @ 09:47:26] >>> Tracker's metadata:
[codecarbon INFO @ 09:47:26]   Platform system: Linux-5.15.0-84-generic-x86_64-with-glibc2.35
[codecarbon INFO @ 09:47:26]   Python version: 3.10.12
[codecarbon INFO @ 09:47:26]   CodeCarbon version: 2.3.2
[codecarbon INFO @ 09:47:26]   Available RAM : 1007.705 GB
[codecarbon INFO @ 09:47:26]   CPU count: 128
[codecarbon INFO @ 09:47:26]   CPU model: AMD EPYC 7763 64-Core Processor
[codecarbon INFO @ 09:47:26]   GPU count: 1
[codecarbon INFO @ 09:47:26]   GPU model: 1 x AMD INSTINCT MI250 (MCM) OAM AC MBA

[codecarbon INFO @ 09:47:40] Energy consumed for RAM : 0.000001 kWh. RAM Power : 0.3306770324707031 W
[codecarbon INFO @ 09:47:40] Energy consumed for all GPUs : 0.000848 kWh. Total GPU Power : 303.82411939507443 W
[codecarbon INFO @ 09:47:40] Energy consumed for all CPUs : 0.000391 kWh. Total CPU Power : 140.0 W
[codecarbon INFO @ 09:47:40] 0.001240 kWh of electricity used since the beginning.
total_energy: 0.001239593259251299
total_co2: 3.539906470443935e-05
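
A quick way to see why this output looks coherent: energy is power multiplied by time, so dividing each reported energy by the reported power should give roughly the 10 seconds the loop ran. The sketch below uses only the figures from the log above and assumes the reported power was roughly constant over the run.

```python
# Figures copied from the log above.
KWH_TO_JOULES = 3.6e6  # 1 kWh = 3.6 MJ

gpu_kwh, gpu_watts = 0.000848, 303.82411939507443
cpu_kwh, cpu_watts = 0.000391, 140.0
ram_kwh = 0.000001

# Implied duration = energy / power (assumes roughly constant power).
gpu_seconds = gpu_kwh * KWH_TO_JOULES / gpu_watts
cpu_seconds = cpu_kwh * KWH_TO_JOULES / cpu_watts

# The per-component energies should add up to the reported total.
total_kwh = gpu_kwh + cpu_kwh + ram_kwh

print(f"GPU implied duration: {gpu_seconds:.2f} s")
print(f"CPU implied duration: {cpu_seconds:.2f} s")
print(f"Sum of components:    {total_kwh:.6f} kWh")
```

Both implied durations come out close to 10 seconds, and the component sum matches the 0.001240 kWh total in the log.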

@IlyasMoutawwakil
Author

IlyasMoutawwakil commented Jan 17, 2024

@benoit-cty @SabAmine

@benoit-cty
Contributor

Thanks, it's really great!

Do you think it's possible to have a machine with both AMD and NVIDIA GPUs?

Before merging, this PR needs unit tests and documentation.

@IlyasMoutawwakil
Author

IlyasMoutawwakil commented Jan 22, 2024

Apparently it's possible to have both (though very rare), so I will update the code to account for it.
I'm not sure about unit tests: do you run NVIDIA GPU tests in a workflow?
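
One possible way to account for a mixed machine is to query every vendor independently and merge the results. This is only a sketch, not the code in this PR: `collect_gpu_details`, the probe callables, and the dict keys are all hypothetical names, and the lambdas stand in for the real pynvml and amdsmi queries.

```python
from typing import Callable, Dict, List


def collect_gpu_details(probes: Dict[str, Callable[[], List[dict]]]) -> List[dict]:
    """Merge the device lists reported by every vendor probe.

    `probes` maps a vendor name to a zero-argument callable returning one
    dict per device. A probe that raises is treated as "vendor not present".
    """
    devices: List[dict] = []
    for vendor, probe in probes.items():
        try:
            found = probe()
        except Exception:
            # Missing library or driver: skip this vendor, keep the others.
            continue
        for details in found:
            # Assign a global index so devices from both vendors stay distinct.
            devices.append({**details, "vendor": vendor, "gpu_index": len(devices)})
    return devices


def _no_amd_driver() -> List[dict]:
    # Hypothetical probe simulating a machine without AMD support.
    raise RuntimeError("amdsmi not initialised")


# Hypothetical probes standing in for the real pynvml / amdsmi queries.
mixed = collect_gpu_details({
    "nvidia": lambda: [{"name": "NVIDIA GPU (placeholder)"}],
    "amd": lambda: [{"name": "AMD GPU (placeholder)"}],
})
nvidia_only = collect_gpu_details({
    "nvidia": lambda: [{"name": "NVIDIA GPU (placeholder)"}],
    "amd": _no_amd_driver,
})
print(mixed)
print(nvidia_only)
```

Keeping the probes independent means a failure in one vendor library does not hide the devices reported by the other.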

@benoit-cty
Contributor

For the tests, we use mocks to check the function calls.
This way you can test that the code works with both NVIDIA and AMD GPUs, even if you don't have them.
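
As an illustration of the mocking approach, here is a minimal sketch using `unittest.mock` from the standard library. It is not the test suite of this PR: `count_amd_gpus` is a hypothetical helper, and the amdsmi function names (`amdsmi_init`, `amdsmi_get_processor_handles`, `amdsmi_shut_down`) are assumptions that may differ between ROCm releases. Because the module is replaced by a mock, the test runs on a machine with no AMD GPU and no amdsmi installed.

```python
import sys
from unittest import mock


def count_amd_gpus() -> int:
    """Hypothetical helper: count AMD GPUs through amdsmi."""
    import amdsmi  # resolved from sys.modules, so a patched fake is picked up

    amdsmi.amdsmi_init()
    try:
        return len(amdsmi.amdsmi_get_processor_handles())
    finally:
        amdsmi.amdsmi_shut_down()


def test_count_amd_gpus_with_mock():
    # Fake amdsmi module that pretends two GPUs are installed.
    fake_amdsmi = mock.MagicMock(name="amdsmi")
    fake_amdsmi.amdsmi_get_processor_handles.return_value = ["gpu0", "gpu1"]

    # Temporarily register the fake so `import amdsmi` returns it.
    with mock.patch.dict(sys.modules, {"amdsmi": fake_amdsmi}):
        assert count_amd_gpus() == 2

    # Check the function calls, as described above.
    fake_amdsmi.amdsmi_init.assert_called_once()
    fake_amdsmi.amdsmi_get_processor_handles.assert_called_once()
    fake_amdsmi.amdsmi_shut_down.assert_called_once()


test_count_amd_gpus_with_mock()
```

The same pattern applies to pynvml, so one test file can cover NVIDIA-only, AMD-only, and mixed setups by registering different fakes.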

@benoit-cty benoit-cty linked an issue Feb 9, 2024 that may be closed by this pull request
return []


def is_gpu_details_available():
Collaborator

Good! Just one comment: instead of removing this, maybe you can do

def is_gpu_details_available():
    return PYNVML_AVAILABLE or AMDSMI_AVAILABLE

@fxmarty

fxmarty commented Apr 16, 2024

@IlyasMoutawwakil any update?

@benoit-cty
Contributor

ROCm is not packaged for Python, but I don't think that blocks this PR; we already have this issue for macOS and Windows.

To merge this PR, we need tests and documentation.

@benoit-cty benoit-cty changed the title from "AMD GPUs support" to "[Version 3] AMD GPUs support" on Sep 11, 2024

Development

Successfully merging this pull request may close these issues.

RoCm support

4 participants