SLA

Monitoring

Definition

Service Level Agreement. A formal contract between a service provider and a customer that defines measurable performance guarantees such as uptime percentage (for example, 99.99%), response time, and remediation credits for violations.

What SLAs Define

A Service Level Agreement is a formal contract, internal or external, that specifies the performance and availability standards a service must meet. Typical network and infrastructure SLAs define uptime percentages (e.g., 99.9% allows roughly 8.76 hours of downtime per year, while 99.99%, or "four nines", allows about 52 minutes), maximum latency thresholds, packet loss limits, and response times for support incidents. Each metric usually includes a measurement methodology, a reporting period, and financial remedies for violations.
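The downtime figures quoted for each uptime percentage follow from simple arithmetic. The sketch below shows one way to compute them; the helper name `allowed_downtime_minutes` and the 365-day default period are assumptions made for illustration, and a real contract may define its reporting period differently.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime, in minutes, that an uptime target permits over a period.

    Hypothetical helper for illustration; the 365-day default is an assumption.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    period_minutes = period_days * 24 * 60
    # The permitted downtime is the fraction of the period not covered by the target.
    return period_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        per_year = allowed_downtime_minutes(target)
        per_month = allowed_downtime_minutes(target, period_days=30)
        print(f"{target}%: {per_year / 60:.2f} h/year, {per_month:.1f} min per 30 days")
```

Passing the SLA's actual reporting period (for example, 30 days) matters because a monthly SLA permits far less downtime in a single incident than the yearly figure suggests.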

SLOs and SLIs

SLAs are often built on two supporting concepts. Service Level Indicators (SLIs) are the actual measurements, such as request success rate, p99 latency, or DNS resolution time. Service Level Objectives (SLOs) are internal targets set stricter than the SLA, so that a missed objective serves as a warning before the contract itself is breached. The gap between an SLO and 100% is the error budget: the amount of unreliability the team is allowed to spend. Teams spend error budget on risky deployments and treat near-exhaustion as a signal to freeze changes and prioritize reliability work.
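As a rough illustration of how an error budget can be tracked, the sketch below derives the budget from a request-based SLO and reports how much of it remains. The function names and the 10% freeze threshold are assumptions chosen for the example, not a standard; teams pick their own policy.

```python
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative when overspent).

    slo is a fraction such as 0.999. Hypothetical helper for illustration.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures


def should_freeze_changes(remaining: float, threshold: float = 0.10) -> bool:
    """Signal a change freeze when the remaining budget drops below the threshold.

    The 10% default is an assumed policy value, not a recommendation.
    """
    return remaining < threshold


if __name__ == "__main__":
    remaining = error_budget_remaining(slo=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"budget remaining: {remaining:.1%}, freeze: {should_freeze_changes(remaining)}")
```

With a 99.9% SLO over one million requests, the budget is about 1,000 failed requests, so 250 failures leave roughly three quarters of it unspent.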

Measuring and Reporting SLA Compliance

Accurate SLA measurement requires continuous synthetic monitoring from locations that represent real user traffic, not just internal health checks. Observability tooling aggregates these signals to produce compliance reports. When an ISP or cloud provider offers an SLA, independent verification using tools like Ping Test gives customers ground-truth data to compare against provider claims. MTTR (Mean Time to Repair, or Recover) and MTTF (Mean Time to Failure) largely determine whether an infrastructure team can realistically honor a given SLA commitment: frequent failures and slow recovery both consume the permitted downtime.
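Two calculations sit behind this kind of compliance check, and the sketch below shows one simple form of each. The first computes measured availability as the share of successful synthetic probes; the second estimates steady-state availability from MTTF and MTTR using the common approximation MTTF / (MTTF + MTTR). The `ProbeResult` type, the function names, and the sample data are hypothetical, and a real SLA may define availability differently (for example, by minutes of outage rather than by probe count).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    """One synthetic check; a hypothetical record shape for illustration."""
    location: str
    ok: bool


def measured_availability(results: Sequence[ProbeResult]) -> float:
    """Return the fraction of probes that succeeded."""
    if not results:
        raise ValueError("at least one probe result is required")
    return sum(1 for r in results if r.ok) / len(results)


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Estimate long-run availability as MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mttf_hours must be positive and mttr_hours non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def meets_sla(availability: float, sla_target: float) -> bool:
    """Compare an availability fraction against an SLA target such as 0.999."""
    return availability >= sla_target


if __name__ == "__main__":
    # Made-up sample: 1,000 probes from two locations with 2 failures.
    probes = [ProbeResult("fra", True)] * 499 + [ProbeResult("fra", False)]
    probes += [ProbeResult("sin", True)] * 499 + [ProbeResult("sin", False)]
    observed = measured_availability(probes)
    print(f"observed: {observed:.4f}, meets 99.9%: {meets_sla(observed, 0.999)}")

    # A component that fails about every 999 hours and takes 1 hour to restore.
    estimate = steady_state_availability(mttf_hours=999, mttr_hours=1)
    print(f"estimated: {estimate:.4f}, meets 99.9%: {meets_sla(estimate, 0.999)}")
```

The estimate makes the trade-off concrete: halving MTTR improves availability about as much as doubling MTTF, which is why incident response speed features so heavily in SLA planning.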
