Currently, Ubicloud VMs can only be in "running" or "deleted" states from a user's perspective. There's no way to temporarily stop a VM to save costs while preserving its storage and configuration.
I noticed there's an internal stopped state and incr_stop semaphore in prog/vm/metal/nexus.rb, but it:
- Doesn't release CPU cores or hugepages back to the host
- Continues billing for the stopped VM
- Has no corresponding resume capability
- Isn't exposed via the public API
Proposed Solution
Implement full VM stop/resume functionality that:
- Stops the VM: shuts down the Cloud Hypervisor process via systemd
- Releases resources: frees CPU cores and hugepages on the host
- Finalizes billing: ends the current billing records
- Preserves state: keeps storage volumes, network config, and firewall rules
- Resumes on demand: reallocates resources and restarts the VM
- Handles edge cases: queues or migrates if the original host lacks capacity
Technical Approach
Based on my exploration of the codebase, here's a proposed implementation:
1. Extend stopped state to release resources
# prog/vm/metal/nexus.rb
label def stopped
  when_stop_set? do
    # Stop the VM
    host.sshable.cmd("sudo systemctl stop :vm_name", vm_name:)
    # Release CPU and memory back to host
    vm_host.update(
      used_cores: Sequel[:used_cores] - vm.cores,
      used_hugepages_1g: Sequel[:used_hugepages_1g] - vm.memory_gib
    )
    # For sliced VMs
    if vm.vm_host_slice
      vm.vm_host_slice.update(
        used_cpu_percent: Sequel[:used_cpu_percent] - vm.cpu_percent_limit,
        used_memory_gib: Sequel[:used_memory_gib] - vm.memory_gib
      )
    end
    # Finalize billing records
    active_billing_records.each(&:finalize)
    vm.update(display_state: "stopped")
  end
  decr_stop
  # Check for resume signal
  when_resume_set? do
    # Clear the semaphore so a later stop does not resume immediately
    decr_resume
    hop_resuming
  end
  nap 60
end
2. Add new resuming state
label def resuming
  # If the original host lacks capacity, leave this label before
  # touching its counters.
  # Option 1: Wait for capacity
  # Option 2: Migrate to a different host with capacity
  hop_find_new_host unless host_has_capacity?
  # Reallocate resources on the current host (done once, here)
  vm_host.update(
    used_cores: Sequel[:used_cores] + vm.cores,
    used_hugepages_1g: Sequel[:used_hugepages_1g] + vm.memory_gib
  )
  # Sliced VMs would also need the matching vm_host_slice update here
  # Restart systemd service
  host.sshable.cmd("sudo systemctl start :vm_name", vm_name:)
  # Create new billing records
  create_billing_records
  vm.update(display_state: "starting")
  hop_wait_sshable
end

def host_has_capacity?
  vm_host.used_cores + vm.cores <= vm_host.total_cores &&
    vm_host.used_hugepages_1g + vm.memory_gib <= vm_host.total_hugepages_1g
end
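To make the accounting concrete, here is a minimal standalone sketch of the release/reserve symmetry behind the capacity check. `Host`, `Vm`, `release`, and `reserve` are hypothetical stand-ins written for illustration, not Ubicloud's models or Sequel calls; only the field names follow the snippets above.

```ruby
# Hypothetical stand-ins for the VmHost and Vm models; only the fields
# used by the capacity check are modeled.
Host = Struct.new(:total_cores, :used_cores, :total_hugepages_1g, :used_hugepages_1g, keyword_init: true)
Vm = Struct.new(:cores, :memory_gib, keyword_init: true)

# True when the host can take the VM's cores and 1 GiB hugepages.
def host_has_capacity?(host, vm)
  host.used_cores + vm.cores <= host.total_cores &&
    host.used_hugepages_1g + vm.memory_gib <= host.total_hugepages_1g
end

# Release on stop; the counters never drop below zero.
def release(host, vm)
  host.used_cores = [host.used_cores - vm.cores, 0].max
  host.used_hugepages_1g = [host.used_hugepages_1g - vm.memory_gib, 0].max
end

# Reserve on resume; returns false and changes nothing when the host is full.
def reserve(host, vm)
  return false unless host_has_capacity?(host, vm)
  host.used_cores += vm.cores
  host.used_hugepages_1g += vm.memory_gib
  true
end

host = Host.new(total_cores: 8, used_cores: 8, total_hugepages_1g: 32, used_hugepages_1g: 32)
vm = Vm.new(cores: 4, memory_gib: 16)

release(host, vm)                                   # stop frees 4 cores and 16 GiB
too_big = reserve(host, Vm.new(cores: 6, memory_gib: 8)) # a 6-core VM does not fit
resumed = reserve(host, vm)                         # the stopped VM fits again
puts "too_big=#{too_big} resumed=#{resumed}"
```

In the real prog the check and the update would have to happen atomically (for example in a single conditional UPDATE), since another VM could be placed on the host between the two steps.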
3. Add resume semaphore
# model/vm.rb
semaphore :resume # Add to existing semaphores
4. Add API endpoints
# routes/project/location/vm.rb
# Stop VM - releases CPU/memory, stops billing
post "/:vm_name/stop" do
  authorize("Vm:stop", @location.id)
  vm = @project.vms_dataset.where(location: @location.name, name: params[:vm_name]).first
  raise ResourceNotFound, "VM not found" unless vm
  raise InvalidRequest, "VM is not running" unless vm.display_state == "running"
  vm.incr_stop
  serialize(vm, :detailed)
end

# Resume VM - reallocates resources, resumes billing
post "/:vm_name/resume" do
  authorize("Vm:resume", @location.id)
  vm = @project.vms_dataset.where(location: @location.name, name: params[:vm_name]).first
  raise ResourceNotFound, "VM not found" unless vm
  raise InvalidRequest, "VM is not stopped" unless vm.display_state == "stopped"
  vm.incr_resume
  serialize(vm, :detailed)
end
5. Update VM serializer
# serializers/vm.rb
def self.serialize_detailed(vm)
  {
    # ... existing fields ...
    can_stop: vm.display_state == "running",
    can_resume: vm.display_state == "stopped",
  }
end
State Diagram
                   User Actions
                         │
         ┌───────────────┴───────────────┐
         │                               │
         ▼                               ▼
    POST /stop                     POST /resume
         │                               │
         ▼                               ▼
┌─────────────────┐             ┌─────────────────┐
│     running     │             │     stopped     │
│     (wait)      │◄────────────│                 │
│                 │             │ • No CPU used   │
│ • CPU allocated │             │ • No RAM used   │
│ • RAM allocated │             │ • No billing    │
│ • Billing active│             │ • Storage kept  │
└────────┬────────┘             └────────┬────────┘
         │                               │
         │         POST /delete          │
         ▼                               ▼
┌─────────────────────────────────────────────────┐
│                    destroyed                    │
│            (all resources released)             │
└─────────────────────────────────────────────────┘
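The same diagram can be written as a small transition table, which is roughly the validation the API endpoints would perform. This is a sketch only: `TRANSITIONS` and `next_state` are hypothetical names, and intermediate states such as "starting" are left out for brevity.

```ruby
# Allowed user-triggered transitions, keyed by [current display_state, action].
TRANSITIONS = {
  ["running", :stop] => "stopped",
  ["stopped", :resume] => "running",
  ["running", :delete] => "destroyed",
  ["stopped", :delete] => "destroyed"
}.freeze

# Returns the state an action leads to, or raises if the action is not
# allowed from the current state (e.g. resuming a running VM).
def next_state(current, action)
  TRANSITIONS.fetch([current, action]) do
    raise ArgumentError, "cannot #{action} a VM that is #{current}"
  end
end

state = next_state("running", :stop)
state = next_state(state, :resume)
puts state
```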
Edge Cases to Handle
| Scenario | Proposed Behavior |
|---|---|
| Resume but host is full | Queue and retry, or offer migration |
| Host rebooted while VM stopped | VM stays stopped (no auto-start) |
| Stop during active SSH session | Warn user, proceed with stop |
| Network/IP address | Keep allocated (user expectation) |
| Attached volumes | Keep attached, unmount cleanly |
| Firewall rules | Preserve, reapply on resume |
| Load balancer membership | Remove on stop, re-add on resume |
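For the "resume but host is full" row, one possible decision order is: try the original host, then any other host with room, then queue. The sketch below illustrates that order only; `CandidateHost`, `fits?`, and `resume_placement` are hypothetical names, and a real migration would also have to move or reattach the VM's storage, which is ignored here.

```ruby
# Hypothetical view of a host: just its name and free capacity.
CandidateHost = Struct.new(:name, :free_cores, :free_memory_gib, keyword_init: true)

# True when the host has room for the requested cores and memory.
def fits?(host, cores, memory_gib)
  host.free_cores >= cores && host.free_memory_gib >= memory_gib
end

# Returns [:original, name], [:migrate, name], or [:queue, nil].
def resume_placement(original, others, cores:, memory_gib:)
  return [:original, original.name] if fits?(original, cores, memory_gib)

  target = others.find { |h| fits?(h, cores, memory_gib) }
  target ? [:migrate, target.name] : [:queue, nil]
end

original = CandidateHost.new(name: "host-a", free_cores: 2, free_memory_gib: 8)
others = [
  CandidateHost.new(name: "host-b", free_cores: 1, free_memory_gib: 64),
  CandidateHost.new(name: "host-c", free_cores: 8, free_memory_gib: 32)
]

decision = resume_placement(original, others, cores: 4, memory_gib: 16)
puts decision.inspect
```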
Billing Considerations
Proposed billing model for stopped VMs:
| Resource | While Running | While Stopped |
|---|---|---|
| vCPU | Billed | Not billed |
| Memory | Billed | Not billed |
| Storage | Billed | Billed (still allocated) |
| IPv4 address | Billed | Billed (still reserved) |
| IPv6 address | Free | Free |
This matches user expectations: you pay for what you're using, but reserved resources (storage, IP) still cost money.
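As a worked example of the billing table, the sketch below computes an hourly cost in both states. The rates are invented for illustration (integer micro-dollars per hour, to avoid float rounding) and are not Ubicloud's prices; `RATES`, `BILLED_WHILE_STOPPED`, and `hourly_cost` are hypothetical names.

```ruby
# Made-up hourly rates in micro-dollars per unit; NOT real Ubicloud pricing.
RATES = { vcpu: 20_000, memory_gib: 5_000, storage_gib: 100, ipv4: 4_000, ipv6: 0 }.freeze

# Resources that stay reserved, and therefore billed, while the VM is stopped.
BILLED_WHILE_STOPPED = %i[storage_gib ipv4 ipv6].freeze

# Sums rate * quantity over the resources billable in the given state.
def hourly_cost(resources, state)
  billable = (state == :stopped) ? resources.slice(*BILLED_WHILE_STOPPED) : resources
  billable.sum { |kind, qty| RATES.fetch(kind) * qty }
end

resources = { vcpu: 2, memory_gib: 8, storage_gib: 40, ipv4: 1, ipv6: 1 }

running_cost = hourly_cost(resources, :running)
stopped_cost = hourly_cost(resources, :stopped)
puts "running: #{running_cost}, stopped: #{stopped_cost} (micro-dollars/hour)"
```

With these invented numbers the stopped VM costs 8,000 instead of 88,000 micro-dollars per hour, since only storage and the IPv4 address remain billable.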
Benefits
- Cost savings for users: significant savings for dev/staging workloads
- Better resource utilization: stopped VMs free up host capacity for others
- CI/CD optimization: runners can be stopped between jobs
Alternatives Considered
- Snapshot and delete: more complex, longer resume time, loses ephemeral state
- Hibernate to disk: Cloud Hypervisor doesn't support this well
- Keep billing while stopped: poor user experience, not competitive
Willingness to Contribute
I'm happy to implement this feature and submit a PR. I've reviewed the codebase and believe the implementation is straightforward given the existing stopped state foundation.
Before starting, I'd like to:
- Confirm this aligns with Ubicloud's roadmap
- Discuss the billing model for stopped VMs
- Get guidance on handling the "no capacity on resume" edge case
Related Code
prog/vm/metal/nexus.rb - VM state machine (existing stopped state at line 241)
model/vm.rb - VM model with semaphores
model/vm_host.rb - Host resource tracking
model/billing_record.rb - Billing record management
routes/project/location/vm.rb - API routes
rhizome/host/lib/cloud_hypervisor.rb - Cloud Hypervisor integration