Skip to content

[Bug]: Python SDK sometimes crashes in streaming jobs running on 2.47.0+ SDK #27330

@tvalentyn

Description

@tvalentyn

What happened?

We suspect that an upgrade to protobuf==4.x.x in Beam SDK & worker containers (#24599) introduced a failure mode in Python streaming pipelines, where Python process sometimes crashes with AttributeError messages , segmentation faults and in some cases causes pipeline stuckness. We expect this to be resolved in Beam 2.53.0.

Batch pipelines should not be affected.

Mitigations:

  • Use apache-beam==2.53.0 or above (once released), OR

  • Use apache-beam==2.46.0 or below, OR

  • Install protobuf 3.x in the submission and runtime environment. For example, you can use a --requirements_file pipeline option with a file that includes:

     protobuf==3.20.3
     grpcio-status==1.48.2
    

    OR

  • If you must use protobuf 4.x, use a python implementation of protobuf by setting a PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python environment variable in the runtime environment. This might degrade the performance since python implementation is less efficient. For example, you could create a custom Beam SDK container from a Dockerfile that looks like the following:

     FROM apache/beam_python3.10_sdk:2.47.0
     ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
    

Example errors:

File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1655, in _create_pardo_operation
output_tags = list(transform_proto.outputs.keys())
AttributeError: 'tuple' object has no attribute 'keys'
File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1314, in get_output_coders
pcoll_id in transform_proto.outputs.items()
AttributeError: 'function' object has no attribute 'items'
File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 819, in wrapper
result = cache[args] = func(*args)
File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 979, in topological_height
for pcoll in descriptor.transforms[transform_id].outputs.values()
AttributeError: 'function' object has no attribute 'values'
File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 819, in wrapper
result = cache[args] = func(*args)
Default Python SDK image for environment is apache/beam_python3.10_sdk:2.47.0.dev
File "/usr/local/lib/python3.10/site-packages/apache_beam/runners/worker/bundle_processor.py", line 979, in topological_height
for pcoll in descriptor.transforms[transform_id].outputs.values()
AttributeError: 'traceback' object has no attribute 'values'

The pipelines usually recover after the process crash but may cause delays or pipeline stuckness.

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Assignees

Labels

P1bugdone & doneIssue has been reviewed after it was closed for verification, followups, etc.python

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions