This repository contains the AnyBlox plugin for Spark.
You need Java 11, Maven 3.9, SBT 1.10, and Scala 2.12. We recommend using SDKMAN! to install and manage them.
After that simply run sbt package. The .jar file will be produced in target/scala-2.12.
The plugin needs to be registered with Spark in spark-defaults.conf:
spark.plugins org.anyblox.spark.AnyBloxPlugin
You will also need the following Arrow jars on the classpath: arrow-c-data and arrow-vector (version 18.1.0 is used in the example below).
You can then start spark-shell, passing the required packages and jars:
/opt/spark/bin/spark-shell --packages org.scala-lang:toolkit_2.12:0.1.7 --jars "/anyblox/anyblox-spark_2.12-0.1.0-SNAPSHOT.jar,/arrow/arrow-c-data-18.1.0.jar,/arrow/arrow-vector-18.1.0.jar"
Open .any files as DataFrames using the standard Spark syntax:
val df = spark.read.format("anyblox").load("/path/to/data.any")
You can use the resulting DataFrame like any other Spark DataFrame, e.g. register it as a temporary view and query it with SQL:
df.createTempView("myview")
spark.sql("SELECT * FROM myview").show()
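Outside spark-shell, the same steps could be packaged as a standalone application and launched with spark-submit. The following is a minimal sketch under stated assumptions, not an official example: the object name AnyBloxExample and the application name are hypothetical, the AnyBlox and Arrow jars are assumed to be supplied with --jars as in the spark-shell command above, and setting spark.plugins on the session builder is used as a per-application alternative to the spark-defaults.conf entry. Because getOrCreate() reuses an existing SparkContext, that registration only takes effect when no context exists yet, so this code is not meant to be pasted into spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical standalone application mirroring the spark-shell steps above.
object AnyBloxExample {

  // Registers the plugin on the builder instead of in spark-defaults.conf.
  // Assumes the AnyBlox and Arrow jars are already on the classpath
  // (e.g. passed via spark-submit --jars).
  def session(appName: String): SparkSession =
    SparkSession.builder()
      .appName(appName)
      .config("spark.plugins", "org.anyblox.spark.AnyBloxPlugin")
      .getOrCreate()

  // Loads an .any file through the "anyblox" data source.
  def readAny(spark: SparkSession, path: String): DataFrame =
    spark.read.format("anyblox").load(path)

  def main(args: Array[String]): Unit = {
    // Path of the .any file; falls back to the placeholder path used above.
    val path = args.headOption.getOrElse("/path/to/data.any")
    val spark = session("anyblox-example") // hypothetical application name
    try {
      val df = readAny(spark, path)
      // Print the columns and types exposed by the AnyBlox reader.
      df.printSchema()
      // Unlike createTempView, this does not fail if "myview" already exists.
      df.createOrReplaceTempView("myview")
      spark.sql("SELECT COUNT(*) AS row_count FROM myview").show()
    } finally {
      spark.stop()
    }
  }
}
```

It could be launched with spark-submit --class AnyBloxExample, using the same --packages and --jars values as the spark-shell command above, followed by the application jar (a hypothetical anyblox-example.jar) and the path of the .any file.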