Description
What happened?
When the EventedPLEG feature is enabled, GenericPLEG also runs as a backup for EventedPLEG. To handle the case where both PLEGs update the same pod status in the cache at almost the same time, a Timestamp field was introduced to PodSandboxStatusResponse in the CRI API (KEP). However, this timestamp (PodSandboxStatusResponse.Timestamp) is recorded in seconds, while the timestamp in an event (ContainerEventResponse.CreatedAt) is recorded in nanoseconds, so the two timestamps cannot be compared accurately.
- Evented PLEG uses the timestamp in ContainerEventResponse.CreatedAt for the cache. This is set in nanoseconds by runtimes (CRI-O code), though there is a bug (EventedPLEG: Pass event created timestamp correctly to cache #124297) where kubelet treats it as seconds.
- Generic PLEG uses the timestamp in PodSandboxStatusResponse.Timestamp for the cache:
timestamp = time.Unix(resp.Timestamp, 0)
CRI-O sets this value in seconds, while containerd does not appear to set it at all (a separate issue covers the containerd case: [FG:InPlacePodVerticalScaling] resources in pod status are never updated if EventedPLEG is enabled #125624 (comment)).
What did you expect to happen?
PodSandboxStatusResponse.Timestamp should be recorded in nanoseconds.
How can we reproduce it (as minimally and precisely as possible)?
It is difficult to reproduce this race intentionally on Kubernetes.
The following sample code simulates the problem:
package main

import (
	"fmt"
	"time"
)

func main() {
	// Evented PLEG side: timestamp taken in nanoseconds.
	eventedTime := time.Now().UnixNano()
	time.Sleep(100 * time.Millisecond)
	// Generic PLEG side: timestamp taken later, but in whole seconds.
	genericTime := time.Now().Unix()

	cachedEventedTime := time.Unix(0, eventedTime)
	cachedGenericTime := time.Unix(genericTime, 0)

	// The evented timestamp was taken first, so it should compare as earlier.
	if cachedEventedTime.Before(cachedGenericTime) {
		fmt.Printf("Expected: evented:%v, generic:%v\n", cachedEventedTime, cachedGenericTime)
	} else {
		fmt.Printf("Unexpected: evented:%v, generic:%v\n", cachedEventedTime, cachedGenericTime)
	}
}
$ go run timestamp.go
Unexpected: evented:2024-07-28 13:57:26.329366032 +0200 CEST, generic:2024-07-28 13:57:26 +0200 CEST
Anything else we need to know?
No response
Kubernetes version
Current master (1.31)
Cloud provider
N/A
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
This issue can occur with CRI-O, which sets Timestamp in PodSandboxStatusResponse.
Related plugins (CNI, CSI, ...) and versions (if applicable)