To build the application, the Scala Build Tool (sbt) must be installed. Check the installed version of sbt with:
sbt --version
Then go to the project's main directory and run:
sbt assembly
To be done.
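The assembly task is provided by the sbt-assembly plugin. If it is missing from the build, a minimal sketch of project/plugins.sbt could look as follows (the plugin version here is an assumption; use the one pinned by the project):
// project/plugins.sbt - enables the `sbt assembly` task
// (the version below is an assumption; adjust it to the project's build)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")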
To create a standalone cluster on Linux, use:
# Start master node
./start-master.sh
# Start worker node
./start-slave.sh <master-URL>
According to the Apache Spark documentation:
"The launch scripts do not currently support Windows.
To run a Spark cluster on Windows, start the master and workers by hand."
To create a standalone cluster on Windows, use:
# Start master node
spark-class org.apache.spark.deploy.master.Master
# Start worker node
spark-class org.apache.spark.deploy.worker.Worker <master-URL>
To run an application with Apache Spark, use the spark-submit script:
spark-submit --class "app.BioApp" --master <master-URL> <path-to-JAR>
Additional flags that may be useful when running an application:
--verbose, -v - enable debug output
--total-executor-cores [NUM] - total number of cores for all executors
--executor-cores [NUM] - number of cores used by each executor
For other options, see spark-submit --help.
To be done.
Information about running nodes is available in a browser at <master-URL>, which was displayed when the master node was started.
localhost:<master-port>
By default, it is bound to port 8080 (localhost:8080).
To access Spark Web UI and display information about jobs, stages, storage, etc., open a browser and go to:
localhost:4040
If more than one application is running at the same time, they are bound to the subsequent ports: localhost:4041, localhost:4042, and so on.
Note: this UI is available only while the application is running. To restore the UI of already finished applications, see the Monitoring and Instrumentation page.
The web UI for HDFS management is accessible at:
localhost:9870
The following problems may occur while submitting the application to Apache Spark:
Ensure that the correct class and package names are given as arguments to spark-submit and that the chosen class has a main function implemented.
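For reference, a minimal sketch of what spark-submit expects the class given to --class to look like (the app.BioApp name follows the example above; the actual project code may differ):
package app

object BioApp {
  // spark-submit --class "app.BioApp" resolves to this object;
  // the fully qualified name must match and main must have exactly this signature
  def main(args: Array[String]): Unit = {
    // ... application logic ...
  }
}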
The most common reason for this issue is an IP address mismatch between the value given in SparkController.scala while building the SparkSession and the value of the environment variable SPARK_LOCAL_IP set in ${SPARK_HOME}/conf/spark-env.sh.
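As an illustration, a sketch of how the SparkSession master address is typically set (the exact builder options used in SparkController.scala are an assumption); the host part must agree with SPARK_LOCAL_IP:
import org.apache.spark.sql.SparkSession

// The host below must match SPARK_LOCAL_IP from ${SPARK_HOME}/conf/spark-env.sh;
// 192.168.0.10 is only an example value
val spark = SparkSession.builder()
  .appName("BioApp")
  .master("spark://192.168.0.10:7077")
  .getOrCreate()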
Step 1. Download the winutils.exe file for the Apache Spark version in use from this website.
Step 2. Create a directory C:\Program Files\Hadoop\bin and place the winutils.exe file inside.
Issue 4 - java.lang.NoSuchMethodError: com.google.common.hash.Hasher.putUnencodedChars while running MHAP
This issue occurs when your Apache Spark installation uses a version of the Guava library that is too old.
This can be checked in the Spark Web UI, under the Environment > Classpath Entries tab.
The method putUnencodedChars was added to Guava in release 15.
The most straightforward solution is to download the JAR of the latest Guava version
and replace the old one in the ${SPARK_HOME}/jars directory.
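An alternative that does not modify the Spark installation is to shade Guava inside the application JAR with sbt-assembly; a sketch of the relevant build.sbt fragment, assuming sbt-assembly 1.x is in use (setting syntax differs slightly in older plugin versions):
// build.sbt - relocate Guava classes bundled in the application JAR
// so they do not clash with the older Guava shipped with Spark
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)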
To run one of the provided examples, build the application according to the instructions above and use:
spark-submit --class "examples.<example-name>" --master <master-URL> <path-to-JAR>
Apache Spark documentation
sbt Reference Manual
Scala documentation