fix: flush pcapNG packet batch before returning on chunk boundary#3857
Merged
Conversation
Contributor
|
thanks!
|
0a74355 to
9d226ef
Compare
9d226ef to
313d8a4
Compare
Contributor
Author
Thank you for the review! I went ahead and removed the testing files altogether. I agree addition of large PCAP files to repository probably isn't worthwhile - I just wanted to make sure I provided collateral to reproduce the behavior I was trying to fix. It's still in the PRs commit history if you want to verify fix. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Clearly describe the problem and solution
Problem: arkime_reader_scheme_processNG() silently drops packets when processing pcapNG files larger than the scheme reader's 1MB chunk buffer.
We discovered this while implementing Arkime command socket mode (--command-socket) for our SPI generation pipeline. After deploying socket mode (which uses --scheme), we observed that some pcapNG files produced
dramatically fewer sessions than the same files processed via legacy mode (--libpcap). For one representative file: --scheme produced 783 sessions vs --libpcap's 2,665 sessions. The recv counter showed all 16,571 packets
were received, but only 2,611 were processed — the rest vanished without any error or warning.
Debugging the state machine with instrumentation revealed that when a 1MB chunk boundary falls mid-EPB-header or mid-SHB-header, three code paths in processNG return 0 without jumping to the processNG: label. This skips
arkime_packet_batch_flush(), causing all packets accumulated in the current chunk's local batch to be silently lost. The next chunk starts with a fresh empty batch.
The three affected paths are in:
Other "need more data" paths in the same function correctly use goto processNG to flush the batch before returning. These three were introduced in #3335 ("Scheme pcapng") and appear to be an oversight — the original
non-pcapNG scheme reader (commit e84edcf) correctly used goto process for all mid-packet returns.
Fix: Change all three return 0 to goto processNG so the packet batch is always flushed before returning, matching the pattern used by all other chunk-boundary paths in the same function.
Test: Added tests/pcapng-chunk-boundary.t with a synthetic pcapNG file (12,115 packets across 10 flows) specifically constructed so the 1MB chunk boundary falls 15 bytes into an Enhanced Packet Block header. Without the
fix: 201 packets processed. With the fix: 12,115 packets processed.
Relevant issue number(s) if applicable
Related to #3335 (Scheme pcapng) which introduced the affected code paths.
Did you run npm run lint from the viewer or parliament directory (whichever you are making changes to) and correct any errors
N/A — changes are in capture C code only, no viewer/parliament changes.
Did you Ensure that all tests still pass by navigating to the tests directory and running ./tests.pl --viewer
Yes
License
I confirm that this contribution is made under an Apache 2.0 license and that I have the authority necessary to make this contribution on behalf of its copyright owner.