[HIGH] Implement health checks and graceful shutdown for production readiness
Priority
🔴 P1 (High)
Description
The application currently lacks health check endpoints and graceful shutdown handling, which are essential for production deployment with load balancers and orchestrators like Kubernetes.
Missing Features:
- Health check endpoint for load balancers
- Readiness probe for Kubernetes
- Liveness probe for Kubernetes
- Graceful shutdown on SIGTERM/SIGINT
- Active session completion before shutdown
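The three endpoints answer different questions, which is easy to blur. As a rough, framework-free sketch, each probe can be viewed as a function of the process lifecycle and dependency health that yields an HTTP status code. The `Lifecycle` and `Probe` enums and the `probe_status` function are hypothetical names used only for illustration, and `deps_ok` collapses the individual Redis/NATS/FairPlay checks into one flag:

```rust
/// Hypothetical lifecycle states, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lifecycle {
    Starting,
    Running,
    ShuttingDown,
}

/// The three endpoints proposed in this issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Probe {
    Health, // GET /health
    Ready,  // GET /health/ready
    Live,   // GET /health/live
}

/// Returns the HTTP status code a probe would answer with.
/// `deps_ok` is a simplification standing in for the dependency checks.
pub fn probe_status(probe: Probe, state: Lifecycle, deps_ok: bool) -> u16 {
    match probe {
        // Liveness only says "the process can still answer"; it ignores
        // dependencies so an outage elsewhere does not trigger restarts.
        Probe::Live => 200,
        // Readiness gates traffic: not ready while starting, while
        // shutting down, or when a required dependency is unavailable.
        Probe::Ready => {
            if state == Lifecycle::Running && deps_ok {
                200
            } else {
                503
            }
        }
        // Overall health reflects dependency state regardless of lifecycle.
        Probe::Health => {
            if deps_ok {
                200
            } else {
                503
            }
        }
    }
}
```

The point of the split is that a shutting-down or dependency-starved instance should be taken out of rotation (readiness fails) without being restarted (liveness still succeeds).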
Acceptance Criteria
- Add `/health` endpoint returning overall health status
- Add `/health/ready` endpoint for readiness checks
- Add `/health/live` endpoint for liveness checks
- Implement graceful shutdown with configurable timeout
- Wait for active sessions to complete before shutdown
- Add metrics for shutdown duration
- Document health check behavior
- Test with Kubernetes deployment
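Two of the criteria above (configurable timeout, shutdown duration metric) can be sketched with the standard library alone. The following is a minimal sketch under stated assumptions: the `SHUTDOWN_TIMEOUT_SECS` variable name and the helper functions are hypothetical, and how the measured duration is exported to a metrics backend is left open.

```rust
use std::time::{Duration, Instant};

/// Default used when no valid override is supplied (the 30 s used in this issue).
pub const DEFAULT_SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(30);

/// Parses a timeout given in whole seconds. Falls back to the default when
/// the value is missing, not a number, or zero.
pub fn parse_shutdown_timeout(raw: Option<&str>) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|secs| *secs > 0)
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_SHUTDOWN_TIMEOUT)
}

/// Reads the timeout from the environment.
/// SHUTDOWN_TIMEOUT_SECS is an assumed name, not an existing setting.
pub fn shutdown_timeout_from_env() -> Duration {
    let raw = std::env::var("SHUTDOWN_TIMEOUT_SECS").ok();
    parse_shutdown_timeout(raw.as_deref())
}

/// Runs one shutdown step and returns its result together with the elapsed
/// time, which the caller can record as the shutdown-duration metric.
pub fn timed<T>(step: impl FnOnce() -> T) -> (T, Duration) {
    let started = Instant::now();
    let out = step();
    (out, started.elapsed())
}
```

Falling back to the default on invalid input keeps a typo in configuration from turning into a zero-second drain window.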
Implementation
1. Health Check Endpoints
/// GET /health - Overall health status
async fn health_check(
    State(state): State<Arc<AppState>>,
) -> (StatusCode, Json<HealthStatus>) {
    let mut status = HealthStatus {
        status: "healthy".to_string(),
        version: env!("CARGO_PKG_VERSION").to_string(),
        uptime_seconds: state.start_time.elapsed().as_secs(),
        checks: HashMap::new(),
    };
    // Check Redis
    let redis_check = check_redis(&state.session_manager).await;
    if redis_check.status != "ok" {
        status.status = "degraded".to_string();
    }
    status.checks.insert("redis".to_string(), redis_check);
    // Check NATS
    let nats_check = check_nats(&state.media_metrics).await;
    if nats_check.status != "ok" {
        status.status = "degraded".to_string();
    }
    status.checks.insert("nats".to_string(), nats_check);
    // Check FairPlay SDK (only when the feature is compiled in).
    // Wrapped in a block because attributes are not accepted directly on
    // `if` expressions.
    #[cfg(feature = "fairplay")]
    {
        if let Some(handler) = &state.media_api_state.fairplay_handler {
            let fairplay_check = check_fairplay(handler).await;
            if fairplay_check.status != "ok" {
                status.status = "degraded".to_string();
            }
            status.checks.insert("fairplay".to_string(), fairplay_check);
        }
    }
    // Return the body in both cases so a failing response still shows which
    // dependency check failed; only the HTTP status code differs.
    let code = if status.status == "healthy" {
        StatusCode::OK
    } else {
        StatusCode::SERVICE_UNAVAILABLE
    };
    (code, Json(status))
}
/// GET /health/ready - Readiness probe
async fn readiness_check(
    State(state): State<Arc<AppState>>,
) -> (StatusCode, Json<ReadinessStatus>) {
    // Kubernetes treats any 2xx/3xx response as "ready", so a not-ready
    // result must carry a non-success status code, not just `ready: false`.
    // Check if shutting down (cheap, so test it first)
    if state.is_shutting_down.load(Ordering::Relaxed) {
        return (
            StatusCode::SERVICE_UNAVAILABLE,
            Json(ReadinessStatus {
                ready: false,
                reason: Some("Server is shutting down".to_string()),
            }),
        );
    }
    // Check Redis connection
    if state.session_manager.health_check().await.is_err() {
        return (
            StatusCode::SERVICE_UNAVAILABLE,
            Json(ReadinessStatus {
                ready: false,
                reason: Some("Redis not available".to_string()),
            }),
        );
    }
    (
        StatusCode::OK,
        Json(ReadinessStatus {
            ready: true,
            reason: None,
        }),
    )
}
/// GET /health/live - Liveness probe
async fn liveness_check() -> Json<LivenessStatus> {
    Json(LivenessStatus { alive: true })
}
2. Graceful Shutdown
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use tokio::signal;
async fn shutdown_signal(is_shutting_down: Arc<AtomicBool>) {
    let ctrl_c = async {
        signal::ctrl_c()
            .await
            .expect("failed to install Ctrl+C handler");
    };
    #[cfg(unix)]
    let terminate = async {
        signal::unix::signal(signal::unix::SignalKind::terminate())
            .expect("failed to install SIGTERM handler")
            .recv()
            .await;
    };
    // SIGTERM does not exist on non-Unix targets; use a future that never
    // resolves so the `select!` below still compiles there.
    #[cfg(not(unix))]
    let terminate = std::future::pending::<()>();
    tokio::select! {
        _ = ctrl_c => {
            log::info!("Received Ctrl+C signal");
        },
        _ = terminate => {
            log::info!("Received SIGTERM signal");
        },
    }
    // Flip the flag so /health/ready starts reporting not-ready.
    is_shutting_down.store(true, Ordering::Relaxed);
    log::info!("Shutdown signal received, starting graceful shutdown");
}
use std::net::SocketAddr;
use std::time::Duration;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ... initialization ...
    // This must be the same Arc that is stored in AppState; otherwise
    // readiness_check never observes the shutdown flag.
    let is_shutting_down = Arc::new(AtomicBool::new(false));
    // Start server with graceful shutdown.
    // (axum 0.6 API; axum 0.7+ replaces axum::Server with axum::serve.)
    let server = axum::Server::bind(&addr)
        .serve(app.into_make_service_with_connect_info::<SocketAddr>())
        .with_graceful_shutdown(shutdown_signal(is_shutting_down.clone()));
    log::info!("Server listening on {}", addr);
    // Resolves once the shutdown signal has fired and in-flight HTTP
    // connections have finished.
    if let Err(e) = server.await {
        log::error!("Server error: {}", e);
    }
    // Graceful shutdown sequence
    log::info!("Server stopped accepting new connections");
    // Wait for active sessions to complete (with timeout).
    // Hard-coded here; the acceptance criteria call for making this configurable.
    let shutdown_timeout = Duration::from_secs(30);
    log::info!("Waiting up to {:?} for active sessions to complete...", shutdown_timeout);
    tokio::select! {
        _ = session_manager.wait_for_completion() => {
            log::info!("All sessions completed gracefully");
        }
        _ = tokio::time::sleep(shutdown_timeout) => {
            let remaining = session_manager.get_active_session_count().await?;
            log::warn!("Shutdown timeout reached, {} sessions still active", remaining);
        }
    }
    log::info!("Graceful shutdown complete");
    Ok(())
}
3. Kubernetes Configuration
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arkavo-media-drm
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels;
  # the label key/value used here is an assumption.
  selector:
    matchLabels:
      app: arkavo-media-drm
  template:
    metadata:
      labels:
        app: arkavo-media-drm
    spec:
      containers:
        - name: arkavo
          image: arkavo/media-drm:latest
          ports:
            - containerPort: 9443
              name: http
          livenessProbe:
            httpGet:
              path: /health/live
              port: 9443
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 9443
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
      # Counted from the start of termination, so it includes the preStop sleep.
      terminationGracePeriodSeconds: 45
Testing Requirements
- Test health check endpoints return correct status
- Test readiness probe during startup
- Test readiness probe during shutdown
- Test liveness probe
- Test graceful shutdown with active sessions
- Test shutdown timeout behavior
- Test Kubernetes deployment with probes
- Load test to verify no dropped connections during shutdown
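The "graceful shutdown with active sessions" and "shutdown timeout behavior" cases can be exercised without a running server. The sketch below is a synchronous, std-only analogue of the drain step, not the project's session manager: `SessionTracker` and its methods are hypothetical, and the real code would use the async `wait_for_completion` shown in the implementation section.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for the session manager: counts active sessions
/// and lets a shutdown path wait until the count reaches zero.
#[derive(Default)]
pub struct SessionTracker {
    active: Mutex<usize>,
    drained: Condvar,
}

impl SessionTracker {
    pub fn new() -> Arc<Self> {
        Arc::new(Self::default())
    }

    /// Registers a new active session.
    pub fn begin(&self) {
        *self.active.lock().unwrap() += 1;
    }

    /// Marks one session as finished and wakes waiters when none remain.
    pub fn end(&self) {
        let mut active = self.active.lock().unwrap();
        *active = active.saturating_sub(1);
        if *active == 0 {
            self.drained.notify_all();
        }
    }

    pub fn active_count(&self) -> usize {
        *self.active.lock().unwrap()
    }

    /// Blocks until no sessions remain or `timeout` elapses.
    /// Returns true if all sessions completed, false on timeout.
    pub fn wait_for_completion(&self, timeout: Duration) -> bool {
        let guard = self.active.lock().unwrap();
        let (guard, _timed_out) = self
            .drained
            .wait_timeout_while(guard, timeout, |active| *active > 0)
            .unwrap();
        *guard == 0
    }
}
```

A test can then cover three cases: no sessions (returns immediately), a session that never ends (returns false after the timeout, with `active_count()` reporting what is left), and a session that ends on another thread before the timeout (returns true).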
Documentation Requirements
- Document health check endpoints
- Document shutdown behavior
- Document Kubernetes configuration
- Add operational runbook
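For the runbook, it may help to spell out how the Kubernetes settings and the in-process timeout relate. Kubernetes counts `terminationGracePeriodSeconds` from the start of pod termination, so the `preStop` sleep consumes part of it before SIGTERM is delivered. The helper names below are hypothetical; the arithmetic is a sketch of that relationship:

```rust
use std::time::Duration;

/// Time the process has between SIGTERM and SIGKILL, given that the
/// preStop hook runs inside the termination grace period.
pub fn time_after_sigterm(grace_period: Duration, pre_stop: Duration) -> Duration {
    grace_period.saturating_sub(pre_stop)
}

/// Headroom left once the in-process session timeout is subtracted.
/// Returns None when the timeout does not fit, meaning the pod could be
/// killed before the application finishes its own shutdown sequence.
pub fn shutdown_headroom(
    grace_period: Duration,
    pre_stop: Duration,
    session_timeout: Duration,
) -> Option<Duration> {
    time_after_sigterm(grace_period, pre_stop).checked_sub(session_timeout)
}
```

With the values proposed in this issue (45 s grace period, 15 s `preStop` sleep, 30 s session timeout) the headroom comes out to exactly zero, and draining in-flight HTTP connections also takes time before the session wait starts. It is probably worth documenting either a larger grace period or a shorter session timeout.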
Related
- PR FairPlay SDK 26 #25 - FairPlay SDK 26 Integration
- Issue [CRITICAL] Implement OpenTDF-based Key Management for Content Protection #26 - Content key management
- Issue [HIGH] Refactor vendor SDK management to external dependency #27 - Vendor SDK management