Pig
For online Hadoop training, send mail to neeraj.ymca.2k6@gmail.com
Agenda
Download Pig tar.gz file
Extract the content of Pig tar.gz
Configure pig-env.sh file
Configure pig.properties file
Start your Hadoop cluster
Start Pig shell
Input file for Pig query
Access HDFS from Pig shell
Execute Pig commands
Store Pig query's output into HDFS
Check the output
Comparison of HBase/Hive/Pig
Download Pig from the Apache website: www.apache.org/dyn/closer.cgi/pig
Select a stable version of Pig
Click on pig-0.11.0.tar.gz
Save the pig-0.11.0.tar.gz file
Untar the pig-0.11.0.tar.gz file
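The untar step above can also be done from a terminal. The helper below is a minimal sketch; the name extract_pig and the example paths are illustrative assumptions, with the destination chosen to match the PIG_HOME used later in pig-env.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: unpack a Pig release tarball into a target directory.
# The tarball is assumed to contain a top-level pig-<version>/ directory,
# which then serves as PIG_HOME.
extract_pig() {
  local tarball="$1" dest="$2"
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -x extract, -z gunzip, -f read from file, -C change into dest first
  tar -xzf "$tarball" -C "$dest"
}
```

For example, extract_pig ~/Downloads/pig-0.11.0.tar.gz /home/neeraj/local_cluster_home should leave the release in /home/neeraj/local_cluster_home/pig-0.11.0, assuming the tarball unpacks into a pig-0.11.0 directory.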
Configure pig-env.sh file
Create a pig-env.sh file in PIG_HOME/conf
Add the following entries to the PIG_HOME/conf/pig-env.sh file
export JAVA_HOME=/usr
export PIG_HOME=/home/neeraj/local_cluster_home/pig-0.11.0
export HADOOP_HOME=/home/neeraj/local_cluster_home/hadoop-1.0.3
export PIG_CLASSPATH=$HADOOP_HOME/conf/
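Rather than editing the file by hand, the same four entries can be written with a heredoc. A minimal sketch; write_pig_env is a hypothetical helper, and the paths are the example paths from this slide, which you would adjust for your own machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: (re)create pig-env.sh with the entries from this slide.
# The paths are the example paths used in this walkthrough; change them to
# match your own installation.
write_pig_env() {
  local conf_dir="$1"   # normally $PIG_HOME/conf
  mkdir -p "$conf_dir"
  # Quoted 'EOF' writes $HADOOP_HOME literally; it is expanded when the file is sourced.
  cat > "$conf_dir/pig-env.sh" <<'EOF'
export JAVA_HOME=/usr
export PIG_HOME=/home/neeraj/local_cluster_home/pig-0.11.0
export HADOOP_HOME=/home/neeraj/local_cluster_home/hadoop-1.0.3
export PIG_CLASSPATH=$HADOOP_HOME/conf/
EOF
}
```

Call it as write_pig_env "$PIG_HOME/conf". Note that it overwrites any existing pig-env.sh in that directory.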
Configure pig.properties file
Add the following entries to the PIG_HOME/conf/pig.properties file
fs.default.name=hdfs://localhost:9000
mapred.job.tracker=localhost:9001
Copy the core-site.xml, hdfs-site.xml and mapred-site.xml files from
HADOOP_HOME/conf to PIG_HOME/conf
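Both configuration steps on this slide can be scripted together. A minimal sketch, assuming the single-node addresses shown above and a Hadoop 1.x layout where the site files live in HADOOP_HOME/conf; configure_pig_cluster is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the two cluster properties to pig.properties
# and copy Hadoop's site files into Pig's conf directory.
configure_pig_cluster() {
  local pig_home="$1" hadoop_home="$2"
  local pig_conf="$pig_home/conf"
  mkdir -p "$pig_conf"
  # NameNode (HDFS) and JobTracker addresses of a single-node setup
  cat >> "$pig_conf/pig.properties" <<'EOF'
fs.default.name=hdfs://localhost:9000
mapred.job.tracker=localhost:9001
EOF
  local f
  for f in core-site.xml hdfs-site.xml mapred-site.xml; do
    cp "$hadoop_home/conf/$f" "$pig_conf/" || return 1
  done
}
```

It appends to pig.properties, so run it once; a second run would add duplicate entries.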
Start your Hadoop cluster
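On Hadoop 1.x, all daemons of a single-node cluster can be started with the start-all.sh script in HADOOP_HOME/bin. A small wrapper sketch; start_hadoop is a hypothetical name, and it falls back to $HADOOP_HOME when no directory is passed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start the HDFS and MapReduce daemons of a Hadoop 1.x
# installation via its bundled bin/start-all.sh script.
start_hadoop() {
  local hadoop_home="${1:-$HADOOP_HOME}"
  local script="$hadoop_home/bin/start-all.sh"
  if [ ! -x "$script" ]; then
    echo "cannot find executable $script" >&2
    return 1
  fi
  "$script"
}
```

For example: start_hadoop /home/neeraj/local_cluster_home/hadoop-1.0.3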
Check Hadoop processes & safe mode
Make sure that safe mode is off before you start Pig
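Both checks on this slide can be scripted. The sketch below assumes a Hadoop 1.x single-node cluster, where jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker, and where hadoop dfsadmin -safemode get prints "Safe mode is OFF" once HDFS is writable. The two function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking a Hadoop 1.x single-node cluster.
# Both read command output on stdin so they can sit at the end of a pipeline.

# Succeeds if the jps listing on stdin contains all five Hadoop 1.x daemons.
hadoop_daemons_running() {
  local listing daemon missing=0
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # jps prints "<pid> <MainClass>"; -w stops NameNode matching SecondaryNameNode
    if ! printf '%s\n' "$listing" | grep -qw "$daemon"; then
      echo "missing: $daemon" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds if the dfsadmin output on stdin reports that safe mode is off.
safemode_is_off() {
  grep -q 'Safe mode is OFF'
}
```

Typical use: jps | hadoop_daemons_running, then hadoop dfsadmin -safemode get | safemode_is_off. If safe mode is still on, hadoop dfsadmin -safemode wait blocks until the NameNode leaves it, and hadoop dfsadmin -safemode leave forces it off.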
Start Pig shell
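The Grunt shell is started with the pig script in PIG_HOME/bin. Without an -x option Pig runs in MapReduce mode against the cluster configured above; -x local runs against the local file system instead, which is handy for trying a query on a small file. A minimal wrapper sketch; start_grunt is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: launch Pig (the Grunt shell when no script is given),
# passing any extra options such as "-x local" straight through.
start_grunt() {
  if [ -z "${PIG_HOME:-}" ] || [ ! -x "$PIG_HOME/bin/pig" ]; then
    echo "PIG_HOME is not set or \$PIG_HOME/bin/pig is missing" >&2
    return 1
  fi
  "$PIG_HOME/bin/pig" "$@"
}
```

start_grunt opens the Grunt prompt on the cluster; start_grunt -x local opens it in local mode.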
Input file for Pig
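The query on the following slides reads the file with LOAD's default loader (PigStorage), which expects one tab-separated record per line, here a year and a temperature. If you do not have the original input at hand, a small stand-in can be generated. The values below are made up for illustration, 9999 is treated as a placeholder that the query filters out, and make_sample_input is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a small, made-up input file in the shape the
# query expects: year<TAB>temperature, one record per line.
make_sample_input() {
  local out="$1"
  # printf reuses its format for each pair of arguments
  printf '%s\t%s\n' \
    1949 111 \
    1949 78 \
    1950 0 \
    1950 22 \
    1950 -11 \
    1950 9999 > "$out"
}
```

For example, make_sample_input temprature.txt creates the file locally, ready to be copied into HDFS.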
Access HDFS from Pig shell
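The input file has to be in HDFS at the path the query loads from. From the operating-system shell this can be done with the Hadoop 1.x hadoop fs commands; inside Grunt, the same commands are available through its fs command (for example fs -ls /pig_input_files). A minimal sketch; upload_input is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a local file into the HDFS directory that the
# query reads from, then list that directory.
upload_input() {
  local local_file="$1" hdfs_dir="${2:-/pig_input_files}"
  # Fails harmlessly (and quietly) if the directory already exists
  hadoop fs -mkdir "$hdfs_dir" 2>/dev/null
  hadoop fs -put "$local_file" "$hdfs_dir/" || return 1
  hadoop fs -ls "$hdfs_dir"
}
```

For example, upload_input temprature.txt copies the local file to /pig_input_files/temprature.txt.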
Execute Pig query
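-- Load the tab-separated input file from HDFS with a two-field schema
records = LOAD '/pig_input_files/temprature.txt' AS (year:chararray, temperature:int);
-- Drop records whose temperature is 9999
filtered_records = FILTER records BY temperature != 9999;
-- Build one group (a bag of records) per year
grouped_records = GROUP filtered_records BY year;
-- For each year, emit the year and its maximum temperature
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
-- Print the result to the console
DUMP max_temp;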
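To see what these four statements compute, here is the same filter, group and max pipeline written as a single awk pass over a tab-separated file. It is only a cross-check for small inputs, not how Pig executes the query (in MapReduce mode Pig compiles it into MapReduce jobs); max_temp_by_year is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emulate the Pig query with awk for a local
# year<TAB>temperature file and print year<TAB>max, sorted by year.
max_temp_by_year() {
  local input="$1"
  awk -F'\t' '
    # FILTER records BY temperature != 9999
    $2 != 9999 {
      t = $2 + 0
      # GROUP BY year + MAX(temperature): keep a running maximum per year
      if (!($1 in max) || t > max[$1]) max[$1] = t
    }
    END { for (year in max) print year "\t" max[year] }
  ' "$input" | sort
}
```

A year whose only readings are 9999 produces no output row, just as in Pig, because the filter runs before the grouping. DUMP itself prints each result as a tuple such as (year,max). Running the Pig statements in local mode (pig -x local) against a local copy of the file is another way to test them before using the cluster.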
Store Pig query's output into HDFS
records = LOAD '/pig_input_files/temprature.txt' AS (year:chararray, temperature:int);
filtered_records = FILTER records BY temperature != 9999;
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
-- Write the result to HDFS instead of printing it; the output directory must not already exist
STORE max_temp INTO '/pig_output_files';
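The same statements can also be run non-interactively by saving them to a file and passing it to pig, which is convenient when a query is rerun often. A minimal sketch; run_max_temp_script and the default script name max_temp.pig are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save the query as a Pig script and run it in batch
# mode with "pig <script>" instead of typing it into Grunt.
run_max_temp_script() {
  local script="${1:-max_temp.pig}"
  cat > "$script" <<'EOF'
records = LOAD '/pig_input_files/temprature.txt' AS (year:chararray, temperature:int);
filtered_records = FILTER records BY temperature != 9999;
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
STORE max_temp INTO '/pig_output_files';
EOF
  pig "$script"
}
```

STORE fails if /pig_output_files already exists, so before rerunning remove it with hadoop fs -rmr /pig_output_files (the Hadoop 1.x form of a recursive delete).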
Pig job details
Output of Pig query
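After STORE completes, the result sits in HDFS as one or more part files under the output directory. A sketch for inspecting them from the operating-system shell, assuming the Hadoop 1.x hadoop fs commands; show_output is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the STORE output directory and print its part files.
show_output() {
  local hdfs_dir="${1:-/pig_output_files}"
  hadoop fs -ls "$hdfs_dir" || return 1
  # Double quotes stop the local shell from expanding the glob; HDFS expands it
  hadoop fs -cat "$hdfs_dir/part-*"
}
```

With the default PigStorage the printed records are tab-separated, one year per line.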
Exit from Pig shell
HBase/Hive/Pig
HBase/Hive/Pig suitability
HBase is suitable when...
    you need to handle unstructured data
    you need to edit (update) individual records
    you need versioned data
Hive is suitable when...
    you need to handle structured data
    you don't need to edit the data
    you are comfortable with SQL syntax
Pig is suitable when...
    you need to handle structured data
    you don't need to edit the data
    you are comfortable with scripting
…Thanks…