enhancement(file source): skip redundant file fingerprinting for already-watched files #25602
vparfonov wants to merge 2 commits into
…iles

On each glob cycle, FileServer fingerprinted every file returned by the paths provider, even files already being actively watched. Each fingerprint involves syscalls (open, seek, read magic bytes, seek, read first line, EOF check). On clusters with 500+ pods this caused thousands of unnecessary read syscalls per minute, saturating disk I/O and disrupting etcd on control plane nodes.

Add a path-based reverse lookup before fingerprinting. If a file path is already tracked in fp_map and hasn't been truncated (file size >= read position), skip fingerprinting entirely. Truncated files still fall through to full fingerprinting to preserve correct behavior.

Measured impact (500 files, 35s trace):
- open: 1,503 → 5 (99.7% reduction)
- lseek: 3,000 → 0 (100% reduction)
- read: 4,500 → 2,500 (44% reduction, remaining are data reads)
- total: 12,033 → 2,555 (78.8% reduction)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
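The fast path described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `Watched` and `Stat` types, the `u64` fingerprint key, and the function names `build_reverse_lookup` and `can_skip` are all assumptions made for the example. The device/inode comparison mirrors the condition quoted in the review thread.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the state kept per watched file.
#[derive(Debug, Clone, PartialEq)]
pub struct Watched {
    pub path: PathBuf,
    pub dev: u64,
    pub inode: u64,
    pub read_position: u64,
}

/// Hypothetical snapshot of what a single stat() call reports.
#[derive(Debug, Clone, Copy)]
pub struct Stat {
    pub dev: u64,
    pub inode: u64,
    pub len: u64,
}

/// Build the path -> entry reverse lookup once, before the glob loop.
/// The fingerprint key is a plain u64 here purely for illustration.
pub fn build_reverse_lookup(fp_map: &HashMap<u64, Watched>) -> HashMap<&Path, &Watched> {
    fp_map.values().map(|w| (w.path.as_path(), w)).collect()
}

/// Decide whether a globbed path may skip fingerprinting.
/// `stat` is None when stat() failed (file deleted, permissions error),
/// in which case we fall through to full fingerprinting.
pub fn can_skip(
    watched_paths: &HashMap<&Path, &Watched>,
    path: &Path,
    stat: Option<Stat>,
) -> bool {
    match (watched_paths.get(path), stat) {
        // Same file identity and not shorter than our read offset: skip.
        (Some(w), Some(s)) => {
            s.dev == w.dev && s.inode == w.inode && s.len >= w.read_position
        }
        // Untracked path or failed stat(): fingerprint as before.
        _ => false,
    }
}
```

Any path for which `can_skip` returns `false` goes through the existing fingerprinting code unchanged, so new files, truncated files, and files whose inode changed are all still fingerprinted.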
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04bac4e09f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efaf039b73
    if let Some((file_id, file_position, devno, inode)) = watched_paths.get(&path)
        && let Ok(metadata) = fs::metadata(&path).await
        && metadata.portable_dev() == *devno
        && metadata.portable_ino() == *inode
        && metadata.len() >= *file_position
Keep fingerprinting files that may be copy-truncated
When checksum fingerprinting is configured, copytruncate-style rotation keeps the same dev/inode while replacing the file contents; if the file is refilled to at least the old read offset before the next discovery pass, this condition treats it as unchanged and skips fingerprinting. The previous path would detect the changed first-lines checksum and start a new watcher from the beginning, but the fast path leaves the existing watcher at file_position, causing the beginning of the new log file to be dropped in that rotation scenario.
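To make the scenario concrete, here is a small self-contained sketch of why a size comparison alone cannot see a copytruncate rotation once the file has been refilled, while a first-line checksum can. Everything here is illustrative: `first_line_checksum` uses FNV-1a as a stand-in for the real checksum, and the file contents are invented sample data.

```rust
/// Hypothetical first-line checksum. FNV-1a stands in for the real CRC;
/// only the "hash the first line" idea matters for this illustration.
pub fn first_line_checksum(contents: &[u8]) -> u64 {
    let first_line = contents.split(|b| *b == b'\n').next().unwrap_or(&[]);
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in first_line {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// The fast-path size test: the file is at least as long as our read offset.
pub fn size_check_passes(current_len: u64, read_position: u64) -> bool {
    current_len >= read_position
}

/// Returns (fast path would skip, checksum detects the change).
pub fn copytruncate_scenario() -> (bool, bool) {
    // Contents before rotation; the watcher has read all of it.
    let old = b"2024-01-01 boot\nline two\n";
    let read_position = old.len() as u64;

    // Same inode after rotation: truncated in place, then refilled past
    // the old read offset before the next discovery pass.
    let new = b"2024-01-02 restart\nmore lines after rotation\n";

    let fast_path_skips = size_check_passes(new.len() as u64, read_position);
    let checksum_changed = first_line_checksum(old) != first_line_checksum(new);
    (fast_path_skips, checksum_changed)
}
```

In this sample the size check passes even though the first line differs, which is exactly the window in which the existing watcher would keep reading from its old offset.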
This optimization skips fingerprinting for files that have the same path, inode, and size >= read position. There is one edge case it does not cover: copytruncate-style log rotation where the file is truncated and refilled past the old read position within a single glob cycle (60s). In that scenario, the fast path would not detect the content change and the beginning of the new log content could be silently missed.
This does NOT affect the kubernetes_logs source — kubelet always creates new numbered files (new inode) on rotation, which the inode check catches. It only affects the generic file source with logrotate copytruncate and very high write throughput.
Before this change, fingerprinting every file on every cycle would catch this case (by detecting the changed first-line CRC). The tradeoff is ~79% fewer disk I/O syscalls in exchange for this narrow edge case.
If needed, a periodic forced re-fingerprint (e.g., every Nth cycle) could be added as a follow-up to cover this scenario without losing the I/O improvement.
WDYT, Vector team?
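One possible shape for the follow-up mentioned above, a forced re-fingerprint every Nth cycle, is sketched here. This is only an illustration of the idea: `RefingerprintPolicy`, `begin_cycle`, and `should_fingerprint` are hypothetical names, and the value of N would need to be chosen or made configurable.

```rust
/// Hypothetical policy: bypass the fast path on every `every_n`-th
/// discovery cycle so that all files get a full fingerprint periodically.
pub struct RefingerprintPolicy {
    every_n: u64,
    cycle: u64,
}

impl RefingerprintPolicy {
    pub fn new(every_n: u64) -> Self {
        // Treat 0 as "every cycle" to avoid a modulo-by-zero.
        Self { every_n: every_n.max(1), cycle: 0 }
    }

    /// Call once per glob cycle. Returns true when this cycle must
    /// fingerprint every file regardless of the fast path.
    pub fn begin_cycle(&mut self) -> bool {
        self.cycle += 1;
        self.cycle % self.every_n == 0
    }
}

/// Combine the per-cycle decision with the per-file fast-path result.
pub fn should_fingerprint(force_this_cycle: bool, fast_path_ok: bool) -> bool {
    force_this_cycle || !fast_path_ok
}
```

With N = 10 and a 60s glob cycle, a copytruncate refill would be noticed within roughly ten minutes at the cost of one full fingerprint pass per ten cycles. The data written before detection would still need separate handling.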
Summary
Every glob cycle (default 60s), `FileServer` fingerprints every file returned by the paths provider, even files already tracked in `fp_map`. Each fingerprint involves ~6 syscalls (open, seek, etc.).

On large Kubernetes clusters with 500+ pods, this causes thousands of unnecessary read syscalls per minute, saturating disk I/O and disrupting other node services (e.g., etcd on control plane nodes).
Changes
Before the glob loop, build a reverse lookup (`path → fingerprint`) from `fp_map`. When glob returns a path, check the reverse map first. If the path is already tracked, skip fingerprinting entirely; no fingerprint reads are needed.

To handle file truncation (same path, different content), we do one `stat()` call to compare the current file size against our read position. If the file is smaller than where we last read (`metadata.len() < file_position`), it was truncated, so we fall through to full fingerprinting and Vector detects the change and re-reads from the beginning. If `stat()` fails (file deleted, permissions error), we also fall through to full fingerprinting.

Measured Impact (500 files, 35s trace, `glob_minimum_cooldown_ms = 10000` (10s))
Vector configuration
How did you test this PR?
Existing tests passed; no regressions.