java.io.NotSerializableException when using JavaSerializer in v5.1.0, v6.1.0 #274

@kevinwallimann

Description

With the new configurable schema converter feature (#268, #269), the class DefaultSchemaConverter is instantiated by default as the member variable schemaConverter in AvroDataToCatalyst. Although AvroDataToCatalyst, as a case class, is serializable by default, DefaultSchemaConverter itself is not, so serialization fails when using the JavaSerializer, with the following error message:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: za.co.absa.abris.avro.sql.DefaultSchemaConverter
Serialization stack:
	- object not serializable (class: za.co.absa.abris.avro.sql.DefaultSchemaConverter, value: za.co.absa.abris.avro.sql.DefaultSchemaConverter@1ce2ce83)
	- field (class: za.co.absa.abris.avro.sql.AvroDataToCatalyst, name: schemaConverter, type: interface za.co.absa.abris.avro.sql.SchemaConverter)
	- object (class za.co.absa.abris.avro.sql.AvroDataToCatalyst, from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]})))
	- field (class: org.apache.spark.sql.catalyst.expressions.IsNotNull, name: child, type: class org.apache.spark.sql.catalyst.expressions.Expression)
	- object (class org.apache.spark.sql.catalyst.expressions.IsNotNull, isnotnull(from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]}))))
	- field (class: org.apache.spark.sql.execution.FilterExec, name: condition, type: class org.apache.spark.sql.catalyst.expressions.Expression)
	- object (class org.apache.spark.sql.execution.FilterExec, Filter isnotnull(from_avro(value#647, (readerSchema,{"type":"record","name":"e2etest","fields":[{"name":"field1","type":"string"},{"name":"field2","type":"int"}]})))

How to fix

Add the Serializable trait to the SchemaConverter trait.
Make schemaConverter lazy.
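
The failure and both proposed fixes can be reproduced outside Spark with plain Java serialization. The following is a minimal sketch, not the ABRiS code: every type name in it (PlainConverter, EagerExpr, and so on) is a hypothetical stand-in for SchemaConverter, DefaultSchemaConverter, and AvroDataToCatalyst. The issue only says "lazy"; marking the member `@transient lazy val` is an assumption here, chosen because a plain `lazy val` would still be written out once it has been initialized.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-ins for the ABRiS types; names are illustrative only.
trait PlainConverter { def name: String } // does NOT extend Serializable
class PlainDefaultConverter extends PlainConverter { val name = "default" }

// Fix 1: the converter trait itself extends Serializable.
trait SerializableConverter extends Serializable { def name: String }
class SerializableDefaultConverter extends SerializableConverter { val name = "default" }

// Failing shape: a case class (serializable) holding a non-serializable member.
case class EagerExpr(schema: String) {
  val converter: PlainConverter = new PlainDefaultConverter
}

// Fix 1 applied: the member is now serializable along with its owner.
case class SerializableMemberExpr(schema: String) {
  val converter: SerializableConverter = new SerializableDefaultConverter
}

// Fix 2 (assumed form): the member is skipped during serialization
// and re-created on first access after deserialization.
case class LazyExpr(schema: String) {
  @transient lazy val converter: PlainConverter = new PlainDefaultConverter
}

object SerializationSketch {
  // Serializes with java.io, which is what Spark's JavaSerializer builds on.
  def javaSerialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }

  def canSerialize(obj: AnyRef): Boolean =
    try { javaSerialize(obj); true }
    catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val lazyExpr = LazyExpr("schema")
    lazyExpr.converter // force initialization before serializing

    println(canSerialize(EagerExpr("schema")))              // false
    println(canSerialize(SerializableMemberExpr("schema"))) // true
    println(canSerialize(lazyExpr))                         // true
  }
}
```

The two fixes address different things: extending Serializable makes any converter implementation safe to ship with the expression, provided its own fields are serializable, while the transient lazy member avoids shipping the converter at all and rebuilds it on the executor.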

Metadata

Labels

bug: Something isn't working
