1. Stop all Hadoop 1.x processes
Run the 1.x stop script from the master:
$ stop-all.sh
2. Remove and repoint the symlink (as the ubuntu user)
sudo rm /opt/hadoop
sudo ln -s /usr/local/hadoop-2.6.0 /opt/hadoop
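A quick way to confirm the repoint took effect is to resolve the link; on the cluster this is simply `readlink /opt/hadoop`. The sketch below demonstrates the same pattern on a scratch directory so it runs anywhere without root:

```shell
# Sketch: confirm the symlink resolves to the 2.6.0 install.
# On the cluster: readlink /opt/hadoop
# Demonstrated on a scratch directory so it runs without root.
tmp=$(mktemp -d)
mkdir -p "$tmp/usr/local/hadoop-2.6.0"
ln -s "$tmp/usr/local/hadoop-2.6.0" "$tmp/hadoop"   # stands in for /opt/hadoop
readlink "$tmp/hadoop"
```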
3. Edit .profile and confirm it contains the following entries:
cat /home/hadoop/.profile
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
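Before proceeding, it helps to confirm the variables actually took effect in the current shell (run `source ~/.profile` first). A minimal check, with the values from above supplied as fallback defaults purely so the snippet runs anywhere:

```shell
# Sketch: report any required variable that is unset or empty.
# The defaults below mirror the .profile entries; they are demo assumptions.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java-7-openjdk-amd64}"
export HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
for v in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR; do
  eval "val=\"\${$v}\""
  if [ -n "$val" ]; then echo "$v=$val"; else echo "MISSING: $v"; fi
done
```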
4. Create a temp folder in HADOOP_HOME
$ mkdir -p $HADOOP_HOME/tmp
5. Make the following changes on all machines:
$HADOOP_CONF_DIR/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
</configuration>
$HADOOP_CONF_DIR/hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>4194304</value>
</property>
</configuration>
$HADOOP_CONF_DIR/mapred-site.xml:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
$HADOOP_CONF_DIR/yarn-site.xml:
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
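Since step 5 applies to all machines, the four edited files have to reach every node. One hedged way to do that is an scp loop (worker hostnames slave1–slave3 assumed from the slaves file below); it is shown as a dry run, so remove the echo to actually copy:

```shell
# Sketch: push the edited conf dir to each worker.
# Dry run: each scp command is printed, not executed.
for host in slave1 slave2 slave3; do
  echo scp -r /opt/hadoop/etc/hadoop/ "$host:/opt/hadoop/etc/"
done
```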
6. Add slaves
Add the slave entries in $HADOOP_CONF_DIR/slaves on all machines:
slave1
slave2
slave3
7. Format the namenode on master
$ bin/hdfs namenode -format
8. Start Hadoop Daemons on master
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemons.sh start datanode
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemons.sh start nodemanager
$ sbin/mr-jobhistory-daemon.sh start historyserver
Alternatively, start the worker daemons on each slave individually:
$ sbin/yarn-daemon.sh start nodemanager
$ sbin/hadoop-daemon.sh start datanode
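For reference, Hadoop 2.x also ships combined start scripts that launch the same daemons cluster-wide using the slaves file; shown as a dry run here since they need a live cluster:

```shell
# Sketch (dry run): the bundled scripts cover the daemons above in two calls.
# start-dfs.sh  -> NameNode + all DataNodes (and SecondaryNameNode)
# start-yarn.sh -> ResourceManager + all NodeManagers
echo sbin/start-dfs.sh
echo sbin/start-yarn.sh
# The JobHistoryServer is still started separately:
echo sbin/mr-jobhistory-daemon.sh start historyserver
```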
9. Check for jps output on slaves and master.
For master:
$ jps
6539 ResourceManager
6451 DataNode
8701 Jps
6895 JobHistoryServer
6234 NameNode
6765 NodeManager
For slaves:
$ jps
8014 NodeManager
7858 DataNode
9868 Jps
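The check in step 9 can be scripted; the sketch below runs against the sample master listing copied from above (on a live node, substitute `jps_out=$(jps)`):

```shell
# Sketch: flag any expected master daemon missing from a jps listing.
# Sample listing copied from the output above; use jps_out=$(jps) for real.
jps_out="6539 ResourceManager
6451 DataNode
6895 JobHistoryServer
6234 NameNode
6765 NodeManager"
for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
  echo "$jps_out" | grep -qw "$d" && echo "$d up" || echo "$d MISSING"
done
```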
10. Create a sample file to test
$ mkdir input
$ cat > input/file
This is one line
This is another one
(press Ctrl-D to end the input)
11. Add this directory to HDFS:
$ bin/hdfs dfs -copyFromLocal input /input
12. Run the sample program
$ bin/hadoop jar \
    share/hadoop/mapreduce/hadoop-mapreduce-examples-2.*.jar \
    wordcount /input /output
13. Verify output
bin/hdfs dfs -cat /output/*
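As a sanity check, the counts the job should produce for the two-line sample can be computed locally with standard tools (this is only a comparison aid, not part of the Hadoop job):

```shell
# Sketch: local equivalent of wordcount on the sample file from step 10.
printf 'This is one line\nThis is another one\n' > /tmp/sample
tr -s ' ' '\n' < /tmp/sample | LC_ALL=C sort | uniq -c | awk '{print $2, $1}'
# Expected counts: This 2, another 1, is 2, line 1, one 2
```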
14. Check the URLs for the cluster health, HDFS, and job history
1. http://master:50070/ (NameNode web UI)
2. http://master:8088/cluster
3. http://master:19888/jobhistory (for Job History Server)
15. Verify that the output is generated
16. Pull down the output to the local machine
$ hdfs dfs -copyToLocal /output/part-r-00000
17. Stop the hadoop daemons
$ sbin/mr-jobhistory-daemon.sh stop historyserver
$ sbin/yarn-daemons.sh stop nodemanager
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/hadoop-daemons.sh stop datanode
$ sbin/hadoop-daemon.sh stop namenode
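After step 17, jps on every machine should list only itself; a small check in the same spirit as step 9 (sample input shown; use `jps_out=$(jps)` on a live node):

```shell
# Sketch: count leftover daemons in a jps listing (only Jps itself expected).
jps_out="9999 Jps"
leftover=$(echo "$jps_out" | grep -cv " Jps$" || true)
if [ "$leftover" -eq 0 ]; then
  echo "all daemons stopped"
else
  echo "$leftover daemon(s) still running"
fi
```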