Multiline parsing breaks when Buffer_Chunk_Size boundary is crossed #4317

@starteleport

Bug Report

Describe the bug
I have a tail input with a multiline filter. Suppose two consecutive records, R1 and R2, must be concatenated into one by the multiline parser. If Buffer_Chunk_Size is such that R1 and R2 land in different chunks, the multiline parser emits two records instead of one.
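To make the suspected failure mode concrete, here is a minimal toy model in Python. It is not Fluent Bit code: the `concat` function, the `flush_per_chunk` switch, and the start-of-record rule are all hypothetical, and it only assumes that pending multiline state is flushed at the end of each chunk, which is what the observed output suggests.

```python
import re

# Hypothetical rule for this sketch: a line that starts with a non-space
# character begins a new record; indented lines continue the previous one.
START = re.compile(r"^\S")

def concat(chunks, flush_per_chunk):
    """Join continuation lines into records.

    chunks: list of chunks, each a list of lines.
    flush_per_chunk: if True, pending lines are emitted at every chunk end
    (the behavior assumed to cause the bug); if False, state is carried over.
    """
    records, pending = [], []
    for chunk in chunks:
        for line in chunk:
            if START.match(line) and pending:
                records.append("\n".join(pending))
                pending = []
            pending.append(line)
        if flush_per_chunk and pending:
            records.append("\n".join(pending))
            pending = []
    if pending:
        records.append("\n".join(pending))
    return records

r1 = "info: Microsoft.AspNetCore.Hosting.Diagnostics[1]"
r2 = "      Request starting HTTP/1.1 GET http://10.0.2.125:80/ - -"

same_chunk = concat([[r1, r2]], flush_per_chunk=True)   # R1 and R2 together
flushed = concat([[r1], [r2]], flush_per_chunk=True)    # boundary between them
carried = concat([[r1], [r2]], flush_per_chunk=False)   # state kept across chunks
```

In this model, `same_chunk` and `carried` each hold one joined record, while `flushed` holds two, matching the split seen in the report.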

To Reproduce

[3] kube.repro.log: [1636558784.789656241, {"log"=>"info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
", "stream"=>"stdout", "time"=>"2021-11-10T15:39:44.789656241Z"}]
[4] kube.repro.log: [1636558784.789656241, {"log"=>"      => SpanId:5018968b4ab3f342, TraceId:94b3b5b666fba84ca420f0a336559e7b, ParentId:0000000000000000 => ConnectionId:0HMD3I6TBKKDD => RequestPath:/ RequestId:0HMD3I6TBKKDD:00000002
      Request starting HTTP/1.1 GET http://10.0.2.125:80/ - -
"}]

Expected behavior
Lines [3] and [4] are concatenated and output as a single log message.

Screenshots
Not applicable

Your Environment

  • Version used: reproduced on 1.8.3+
  • Configuration: please see gist above
  • Environment name and version (e.g. Kubernetes? What version?): local
  • Server type and version:
  • Operating System and version: macOS Monterey
  • Filters and plugins: multiline, multiline_parser

Additional context
In production this leads to multiline log records being split at random when they shouldn't be. As I understand it, I could set a larger Buffer_Chunk_Size, but that is a duct-tape fix: it only reduces how often the problem occurs and does not eliminate it.
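A rough way to see why a larger chunk size only lowers the frequency is to count how many records straddle a chunk boundary. The sketch below is a toy model, not a measurement of Fluent Bit: `split_records`, the greedy packing of whole lines into chunks, and the line sizes are all assumptions made for illustration.

```python
def split_records(record_sizes, chunk_size):
    """Count records whose lines land in more than one chunk.

    record_sizes: list of records, each a list of line lengths in bytes.
    Lines are packed greedily into chunks of at most chunk_size bytes and
    a line is never split (an assumption of this toy model).
    """
    used = 0          # bytes used in the current chunk
    chunk_index = 0
    split = 0
    for lines in record_sizes:
        first_chunk = None
        crossed = False
        for size in lines:
            if used + size > chunk_size and used > 0:
                chunk_index += 1   # line does not fit: start a new chunk
                used = 0
            used += size
            if first_chunk is None:
                first_chunk = chunk_index
            elif chunk_index != first_chunk:
                crossed = True
        split += crossed
    return split

# 1000 hypothetical two-line records (60-byte first line, 240-byte second line).
records = [[60, 240]] * 1000
small = split_records(records, 1024)        # small chunks
large = split_records(records, 32 * 1024)   # much larger chunks
```

With these made-up sizes, `large` is much smaller than `small` but still greater than zero, which is the point: some record will eventually fall across a boundary regardless of the chunk size.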
