Skip to content

devenkulkarni/mig-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AWS Infrastructure Module for setting up GPU powered instance along with RKE2 and NVIDIA Drivers

This module provisions a GPU-accelerated EC2 instance running a custom-built AMI openSUSE Leap 15.6 and prepares it for the installation of GPU operator by installing NVIDIA drivers and RKE2.

Features

  • By default provisions G4dn instance types with NVIDIA GPUs.
  • Uses openSUSE Leap 15.x AMI.
  • Automated driver installation via startupscript.tftpl.
  • Configures Security Groups for SSH and Kubernetes API (6443).

Usage

Prerequisites

# Install AWS CLI and configure credentials
aws configure
# Verify identity
aws sts get-caller-identity

Usage

1. Clone the repository

git clone https://github.com/devenkulkarni/migdemo.git
cd migdemo

2. Project Structure

migdemo
├── demo
│   ├── gpu-workload.yaml
│   ├── mig-mixed-strategy-deploy-pod.sh
│   ├── mig-mixed-strategy-log-verify.sh
│   ├── mig-single-strategy-deploy-pod.sh
│   └── mig-single-strategy-log-verify.sh
├── gpu-operator
│   └── README.md
├── infra
│   ├── data.tf
│   ├── docs.md
│   ├── main.tf
│   ├── outputs.tf
│   ├── provider.tf
│   ├── scripts
│   │   ├── rke2-localpath-install.sh
│   │   └── startupscript.tftpl
│   ├── terraform.tfvars
│   ├── terraform.tfvars.example
│   └── variables.tf
└── README.md

3. Running the Terraform Code

  • Copy ./terraform.tfvars.example to ./terraform.tfvars

  • Edit ./terraform.tfvars

Update the required variables:

  • prefix to give the resources an identifiable name (e.g., your initials or first name)

  • region to specify the AWS region where resources will be created (e.g., us-west-2)

  • zone to specify the AWS zone where resources will be created (e.g., us-west-2c)

  • instance_type to specify the instance type (e.g, g4dn.xlarge)

  • rke2_version to specify the version of RKE2 cluster to be installed.

Then execute the terraform commands:

# Navigate to the AWS infra implementation directory
cd infra

# Initialize the working directory (downloads providers and modules)
terraform init -upgrade

# Preview the changes (highly recommended)
terraform plan 

# Apply the configuration
terraform apply --auto-approve

Cleanup:

To tear down the infrastructure and avoid costs:

terraform destroy --auto-approve

About

Set up your own GPU powered instance on AWS with RKE2 Kubernetes and then help you setup NVIDIA GPU operator for testing out AI workloads.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors