A shared Zig library providing GPU detection, capability querying, and intelligent backend selection for GPU-accelerated Unix utilities.
- GPU Detection: Query Metal and Vulkan device capabilities at runtime
- Performance Scoring: Classify GPUs into tiers (Entry/Mid/High/Ultra)
- Auto-Selection: Intelligent backend selection based on workload characteristics
- Swappable Backends: Abstract interface for pluggable compute backends
| Backend | Platform | Status |
|---|---|---|
| Metal | macOS | Supported |
| Vulkan | Linux, macOS (via MoltenVK) | Supported |
| CUDA | Linux, Windows | Planned |
| OpenCL | Cross-platform | Planned |
Add as a dependency using zig fetch:
zig fetch --save "git+https://github.com/e-jerk/gpu#v0.1.0"Import and use:
const gpu = @import("e_jerk_gpu");
// Detect GPU capabilities
const caps = gpu.metal.detectCapabilities() orelse gpu.GpuCapabilities.default;
// Create auto-selector with hardware capabilities
var selector = gpu.AutoSelector.initWithCapabilities(caps);
// Select backend for workload
const result = selector.select(.{
.data_size = file_size,
.compute_intensity = 0.7,
});
if (result.backend == .metal) {
// Use Metal backend
} else if (result.backend == .vulkan) {
// Use Vulkan backend
} else {
// Use CPU fallback
}Runtime-detected GPU attributes:
pub const GpuCapabilities = struct {
max_threads_per_group: u32, // Max threads per threadgroup/workgroup
max_buffer_size: u64, // Maximum buffer size in bytes
recommended_memory: u64, // Device local memory size
is_discrete: bool, // Discrete vs integrated GPU
device_type: DeviceType, // discrete/integrated/virtual/cpu/other
device_name: ?[]const u8, // Optional device name for logging
};Performance Scoring:
performanceScore(): Returns 0-120+ based on hardware attributes- Discrete GPU: +50 points
- Thread count: +10/+20/+30 for 256/512/1024+ threads
- Memory: +5/+10/+20/+30 for 2/4/8/16+ GB
- Buffer size: +5/+10 for 1/4+ GB max buffer
Tier Classification:
- Entry (0-39): Intel/AMD integrated, base Apple Silicon
- Mid (40-69): Apple Silicon Pro, mainstream discrete
- High (70-99): Apple Silicon Max, high-end discrete
- Ultra (100+): Apple Silicon Ultra, workstation GPUs
Intelligent backend selection based on workload characteristics:
pub const WorkloadInfo = struct {
data_size: usize, // Bytes to process
compute_intensity: f32, // 0.0 (trivial) to 1.0 (intensive)
parallelizable: bool, // Benefits from parallelism?
memory_bound: bool, // Memory-bound vs compute-bound
batch_size: usize, // Number of independent operations
custom_bias: i32, // Manual adjustment (-10 to +10)
};Selection Algorithm:
- Hard Limits: Reject GPU if data < min_gpu_size or > max_gpu_size
- Parallelism Score: +3 for parallelizable, -5 for sequential
- Data Size Score: +1 for ≥1MB, +2 for ≥4MB
- Compute Intensity: +4 for ≥0.8, +2 for ≥0.5, -2 for <0.2
- Memory Bound: ±1 based on data size
- Batch Size: +1 for ≥10, +3 for ≥100
- Hardware Bias: +4 (ultra) to -2 (entry) from GPU tier
Threshold Adjustments by Tier:
| Tier | Min GPU Size | GPU Bias |
|---|---|---|
| Ultra | 32KB | +4 |
| High | 64KB | +2 |
| Mid | 128KB | 0 |
| Entry | 256KB | -2 |
Metal (metal.zig):
- Queries
MTLDeviceproperties on macOS - Detects Apple Silicon vs Intel/AMD
- Reports recommendedWorkingSetSize, maxThreadsPerThreadgroup
Vulkan (vulkan.zig):
- Queries
VkPhysicalDevicePropertiesandVkPhysicalDeviceMemoryProperties - Detects device type (discrete, integrated, etc.)
- Reports maxComputeWorkGroupInvocations, maxStorageBufferRange
Abstract interface for compute backends:
pub const ComputeBackend = struct {
pub const VTable = struct {
init: *const fn(allocator: Allocator) anyerror!*anyopaque,
deinit: *const fn(self: *anyopaque) void,
createBuffer: *const fn(self: *anyopaque, size: usize) anyerror!BufferHandle,
execute: *const fn(self: *anyopaque, ...) anyerror!ComputeResult,
// ...
};
};
pub const BackendRegistry = struct {
backends: std.ArrayList(ComputeBackend),
pub fn register(self: *@This(), backend: ComputeBackend) !void;
pub fn getBestBackend(self: @This()) ?ComputeBackend;
};// Simple one-liner for common cases
const backend = gpu.quickSelect(
file_size, // Data size in bytes
0.7, // Compute intensity
detected_caps, // Optional capabilities
);var config = gpu.AutoSelectConfig{
.min_gpu_size = 64 * 1024, // 64KB minimum
.max_gpu_size = 32 * 1024 * 1024, // 32MB maximum
.gpu_bias = 2, // Prefer GPU
.preferred_gpu_backend = .metal, // Prefer Metal over Vulkan
};
var selector = gpu.AutoSelector.initWithConfig(config);
const result = selector.select(workload);pub const SelectionResult = struct {
backend: Backend, // .cpu, .metal, .vulkan, etc.
gpu_score: i32, // Positive = GPU better, negative = CPU better
reason: []const u8, // Human-readable explanation
};const gpu = @import("e_jerk_gpu");
pub fn processFile(path: []const u8, options: Options) !void {
const file_size = try getFileSize(path);
// Detect hardware
const caps = gpu.metal.detectCapabilities() orelse
gpu.vulkan.detectCapabilities() orelse
gpu.GpuCapabilities.default;
// Create selector
var selector = gpu.AutoSelector.initWithCapabilities(caps);
// Describe workload
const workload = gpu.WorkloadInfo{
.data_size = file_size,
.compute_intensity = if (options.case_insensitive) 0.8 else 0.5,
.parallelizable = true,
};
// Select backend
const result = selector.select(workload);
if (options.verbose) {
std.debug.print("Backend: {s} (score: {d}, reason: {s})\n",
.{@tagName(result.backend), result.gpu_score, result.reason});
}
// Dispatch to appropriate implementation
switch (result.backend) {
.metal => try processWithMetal(path, options),
.vulkan => try processWithVulkan(path, options),
.cpu => try processWithCpu(path, options),
else => try processWithCpu(path, options),
}
}Source code: Unlicense (public domain) Binaries: GPL-3.0-or-later