Getting Started

This Pig branch adds a Spark execution mode (Spork!).

Dependencies

Spark version 0.9.0
Hadoop 1.0.4
Java 7
Git client
Apache Ant
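Before building, it can help to confirm that the tools listed above are on your PATH. Below is a minimal sketch that uses a hypothetical `check_tools` helper, which is not part of Spork. It only checks that each command exists. It does not check that the versions match the ones listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Spork: print each tool that is not on
# PATH and return non-zero if any are missing. Versions are NOT checked.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Commands corresponding to the dependency list (Java, Git, Ant, Hadoop).
if check_tools java git ant hadoop; then
  java -version   # for Java 7 this should report a 1.7.x JVM
fi
```

You still need to check the Spark and Hadoop versions by hand. For Hadoop, you can use `hadoop version`.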

Building spork

Clone the spork-0.9 branch and build Spork with Ant:

$ git clone https://github.com/sigmoidanalytics/spork.git -b spork-0.9
$ ant jar-all

Configuring spork

Export the following variables in your shell, or add them to your bash profile:

export SPARK_HOME=/path/to/spark
export HADOOP_HOME=/path/to/hadoop
export HADOOP_CONF_DIR=/path/to/hadoop/conf
export BROADCAST_MASTER_IP="<same value as SPARK_MASTER_IP>"   # e.g. localhost
export BROADCAST_PORT=6000
export SPARK_MASTER="<Spark master URL>"     # e.g. local or spark://localhost:7077
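Here is a minimal sketch of a sanity check for the configuration step. It uses a hypothetical `check_spork_env` helper, which is not part of Spork. It assumes that the SPARK_HOME, HADOOP_HOME and HADOOP_CONF_DIR paths point to local directories and that BROADCAST_PORT should be numeric.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Spork: check that the variables from
# the configuration step are set, that the path variables name existing
# directories, and that BROADCAST_PORT is numeric. Returns non-zero on failure.
check_spork_env() {
  local status=0 var
  for var in SPARK_HOME HADOOP_HOME HADOOP_CONF_DIR \
             BROADCAST_MASTER_IP BROADCAST_PORT SPARK_MASTER; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var}" ]; then
      echo "not set: $var" >&2
      status=1
    fi
  done
  for var in SPARK_HOME HADOOP_HOME HADOOP_CONF_DIR; do
    if [ -n "${!var}" ] && [ ! -d "${!var}" ]; then
      echo "not a directory: $var=${!var}" >&2
      status=1
    fi
  done
  if [ -n "$BROADCAST_PORT" ] && ! [[ "$BROADCAST_PORT" =~ ^[0-9]+$ ]]; then
    echo "BROADCAST_PORT is not a number: $BROADCAST_PORT" >&2
    status=1
  fi
  return "$status"
}

check_spork_env || echo "fix the settings above before running ./pig-spark" >&2
```

The check only confirms that values are present and plausible. It cannot tell whether SPARK_MASTER actually points at a running master.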

Run sample script

Copy the sample data into HDFS:

$ hadoop fs -mkdir /pig-test/input/
$ hadoop fs -put ./tutorial/data/excite-small.log /pig-test/input/
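The LOAD statement in the sample script expects three tab-separated fields per line (user, time, query). The sketch below uses a hypothetical `field_histogram` helper, which is not part of Spork or Pig. It counts how many lines of the local copy have each number of tab-separated fields, so you can spot malformed rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Spork: for a tab-separated file, print
# "<number of fields> <number of lines>" pairs, sorted by field count.
field_histogram() {
  awk -F'\t' '{ count[NF]++ } END { for (n in count) print n, count[n] }' "$1" \
    | sort -n
}

if [ -f ./tutorial/data/excite-small.log ]; then
  field_histogram ./tutorial/data/excite-small.log
fi
```

If every line has exactly three fields, the output should contain only one line, starting with `3`.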

Start Pig with the pig-spark launcher and paste the following script at the prompt:

$ ./pig-spark
raw = LOAD '/pig-test/input/excite-small.log' USING PigStorage('\t') AS (user:chararray, time:chararray, query:chararray);
queries = FOREACH raw GENERATE query;
distinct_queries = DISTINCT queries;
STORE distinct_queries INTO '/pig-test/output/';
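To sanity-check the job's output, you can compute a rough local equivalent of the script from the same file. The sketch below uses a hypothetical `distinct_queries` helper, which is not part of Spork. It extracts the third tab-separated field and de-duplicates it. Edge cases such as null handling may not match Pig exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Spork: print the distinct values of the
# third tab-separated field (the "query" column in the LOAD schema), sorted.
# Only a rough local counterpart of the FOREACH ... GENERATE query / DISTINCT
# steps; nulls and malformed rows may not behave exactly as in Pig.
distinct_queries() {
  # -F'\t' splits on tabs, like PigStorage('\t'); a line with fewer than three
  # fields yields an empty string here.
  awk -F'\t' '{ print $3 }' "$1" | LC_ALL=C sort -u
}

# Example: preview the first few distinct queries from the sample data.
if [ -f ./tutorial/data/excite-small.log ]; then
  distinct_queries ./tutorial/data/excite-small.log | head -n 5
fi
```

This assumes the job writes Pig-style `part*` files under /pig-test/output/. Under that assumption, you could compare the two results with `diff <(hadoop fs -cat '/pig-test/output/part*' | LC_ALL=C sort -u) <(distinct_queries ./tutorial/data/excite-small.log)`. Both sides are sorted, so the order in which Spark writes the records does not matter.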

TODO

Migrate to Spark 1.0
Create a Spark planner instead of reusing the MapReduce planner
Get the e2e tests working with Spork and create a benchmark report

Please feel free to file issues on our GitHub repo (https://github.com/sigmoidanalytics/spork) or email us at spark@sigmoidanalytics.com.
