A Cloud Function that automatically recovers terminated spot VMs in Google Cloud Platform.
- Periodically checks spot VMs
- Automatically restarts terminated VMs
- Scheduled execution with Cloud Scheduler
- VM exclusion with labels:
    # Exclude a VM from recovery
    gcloud compute instances add-labels INSTANCE_NAME --labels=exclude_from_keeper=true
    # Include a VM back
    gcloud compute instances remove-labels INSTANCE_NAME --labels=exclude_from_keeper
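The selection rule implied by the features above can be sketched in Python. This is not the code in main.py; it is a minimal illustration that assumes each VM is available as a dict shaped like a Compute Engine instance resource (`status`, `scheduling`, `labels`) and that only the label value `true` excludes a VM. The helper names `is_spot` and `needs_restart` and the sample data are hypothetical.

```python
from typing import Any

# Label name taken from the gcloud commands above.
EXCLUDE_LABEL = "exclude_from_keeper"


def is_spot(instance: dict[str, Any]) -> bool:
    """Return True for a Spot VM or a legacy preemptible VM."""
    scheduling = instance.get("scheduling", {})
    return (
        scheduling.get("provisioningModel") == "SPOT"
        or bool(scheduling.get("preemptible"))
    )


def needs_restart(instance: dict[str, Any]) -> bool:
    """Return True for a terminated Spot VM that is not excluded by label."""
    labels = instance.get("labels", {})
    # Assumption: only the value "true" excludes a VM from recovery.
    if labels.get(EXCLUDE_LABEL) == "true":
        return False
    return is_spot(instance) and instance.get("status") == "TERMINATED"


# Hypothetical sample data, not real API output.
instances = [
    {"name": "vm-a", "status": "TERMINATED",
     "scheduling": {"provisioningModel": "SPOT"}},
    {"name": "vm-b", "status": "TERMINATED",
     "scheduling": {"provisioningModel": "SPOT"},
     "labels": {EXCLUDE_LABEL: "true"}},
    {"name": "vm-c", "status": "RUNNING",
     "scheduling": {"provisioningModel": "SPOT"}},
    {"name": "vm-d", "status": "TERMINATED",
     "scheduling": {"provisioningModel": "STANDARD"}},
]

to_restart = [vm["name"] for vm in instances if needs_restart(vm)]
print(to_restart)  # ['vm-a']
```

Keeping the decision in a pure function like this makes it easy to test without calling the Compute Engine API; the actual start call would be made separately for each selected VM.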
- Terraform >= 1.0
- Google Cloud Account
- Google Cloud CLI installed and configured
- Required IAM permissions:
  - Cloud Functions Admin
  - Cloud Scheduler Admin
  - Storage Admin
  - Service Account Admin
  - IAM Admin
keeper/
├── main.py                         # Cloud Function source code
├── requirements.txt                # Python dependencies
├── terraform/                      # Infrastructure as Code
│   ├── main.tf                     # Main Terraform configuration
│   ├── variables.tf                # Variable definitions
│   └── terraform.tfvars.example    # Example variable values
- Clone the repository
- Authenticate with Google Cloud:
    gcloud auth application-default login
- Create terraform.tfvars:
    cd keeper/terraform
    cp terraform.tfvars.example terraform.tfvars
- Edit terraform.tfvars:
    project_id = "your-project-id"
    region = "your-region"
    zone = "your-zone"
    schedule = "*/1 * * * *" # Desired cron schedule
- Initialize Terraform:
    terraform init
- Review the plan:
    terraform plan
- Apply the configuration:
    terraform apply
You can configure the following variables in terraform.tfvars:
- project_id: Your GCP project ID
- region: Region where resources will be created
- zone: Zone where spot VMs are located
- schedule: Cloud Scheduler cron expression
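Since the schedule value is a cron expression, a quick shape check before running Terraform can catch a dropped field. The sketch below assumes the five-field unix-cron form used in the example value (`*/1 * * * *` runs every minute); the function name `looks_like_unix_cron` is hypothetical, and the check only looks at the number of fields and their characters, not at value ranges.

```python
def looks_like_unix_cron(expr: str) -> bool:
    """Rough check: exactly five whitespace-separated cron fields."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    # Allow digits, names such as MON or JAN, and the usual cron operators.
    return all(
        all(ch.isalnum() or ch in "*/,-" for ch in field)
        for field in fields
    )


print(looks_like_unix_cron("*/1 * * * *"))  # True
print(looks_like_unix_cron("*/5 * * *"))    # False (only four fields)
```

This does not replace the validation Cloud Scheduler itself performs when the job is created; it is only a cheap local sanity check.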
To remove all created resources:
    terraform destroy