Installing Hadoop in Pseudo-Distributed Mode
Follow the steps given below to install Hadoop 2.4.1 in pseudo-distributed
mode.
Step 1 − Setting Up Hadoop
You can set the Hadoop environment variables by appending the following
commands to the ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
Now apply the changes to the current shell session.
$ source ~/.bashrc
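As a quick sanity check after sourcing the file, you may want to confirm
that the Hadoop directories really ended up on your PATH. The following is
a minimal sketch; on_path is a hypothetical helper written for this
example, and /usr/local/hadoop is only the fallback used when HADOOP_HOME
is not set.

```shell
# Hypothetical helper: prints "yes" if directory $1 is an entry in PATH,
# otherwise "no". Uses only shell built-ins.
on_path() {
  case ":$PATH:" in
    *":$1:"*) echo "yes" ;;
    *)        echo "no"  ;;
  esac
}

# Fall back to the tutorial's install location if the variable is unset.
hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"
echo "HADOOP_HOME  = $hadoop_home"
echo "bin on PATH  : $(on_path "$hadoop_home/bin")"
echo "sbin on PATH : $(on_path "$hadoop_home/sbin")"
```

If either line reports "no", the export lines were probably not appended
correctly or the file has not been sourced in this shell.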
Step 2 − Hadoop Configuration
You can find all the Hadoop configuration files in the directory
$HADOOP_HOME/etc/hadoop. You need to edit these configuration files
according to your Hadoop infrastructure.
$ cd $HADOOP_HOME/etc/hadoop
In order to develop Hadoop programs in Java, you have to set the Java
environment variable in the hadoop-env.sh file by replacing the JAVA_HOME
value with the location of Java on your system.
export JAVA_HOME=/usr/local/jdk1.7.0_71
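Before moving on, it can help to confirm that the directory you put in
JAVA_HOME actually contains a Java executable. The sketch below assumes
the usual JDK layout with the launcher at bin/java; check_java_home is a
hypothetical helper, and the fallback path is just the example value from
this tutorial.

```shell
# Hypothetical helper: prints "ok" if $1/bin/java exists and is
# executable, otherwise "missing".
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# The fallback path is the example value used in this tutorial.
java_home="${JAVA_HOME:-/usr/local/jdk1.7.0_71}"
echo "JAVA_HOME = $java_home ($(check_java_home "$java_home"))"
```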
The following is the list of files that you have to edit to configure
Hadoop.
core-site.xml
The core-site.xml file contains information such as the port number used
for the Hadoop instance, the memory allocated for the file system, the
memory limit for storing data, and the size of the read/write buffers.
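Open core-site.xml and add the following property between the
<configuration> and </configuration> tags. (In Hadoop 2.x the name
fs.default.name is deprecated in favor of fs.defaultFS, but it is still
accepted.)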
<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>
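To confirm that the property was saved as intended, you can read it back
from the file. The following is a rough sketch rather than a real XML
parser: get_prop is a hypothetical helper that assumes the <name> and
<value> elements are each written on their own line, as in the listing
above.

```shell
# Hypothetical helper: print the value of property $2 in config file $1.
# Assumes <name>...</name> and <value>...</value> sit on separate lines.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")    # drop indentation and the opening tag
      sub(/<\/value>.*/, "")  # drop the closing tag
      print
      exit
    }
  ' "$1"
}

# Only read the file if it exists at the assumed location.
conf="${HADOOP_HOME:-/usr/local/hadoop}/etc/hadoop/core-site.xml"
if [ -f "$conf" ]; then
  get_prop "$conf" fs.default.name
fi
```

On a working installation, the command hdfs getconf -confKey
fs.default.name should report the same value as Hadoop itself resolves
it.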
hdfs-site.xml
The hdfs-site.xml file contains information such as the replication
factor and the namenode and datanode paths on your local file system,
that is, the locations where you want to store the Hadoop
infrastructure.
Let us assume the following values.
dfs.replication (data replication value) = 1
(In the path given below, hadoop is the user name.
hadoopinfra/hdfs/namenode is the directory created by the HDFS file
system.)
namenode path = /home/hadoop/hadoopinfra/hdfs/namenode
(hadoopinfra/hdfs/datanode is the directory created by the HDFS file
system.)
datanode path = /home/hadoop/hadoopinfra/hdfs/datanode
Open this file and add the following properties between the
<configuration> and </configuration> tags.
<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.name.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.data.dir</name>
      <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
</configuration>
Note − In the above file, all the property values are user-defined and you
can make changes according to your Hadoop infrastructure.
yarn-site.xml
This file is used to configure YARN for Hadoop. Open the yarn-site.xml
file and add the following property between the <configuration> and
</configuration> tags.
<configuration>
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
</configuration>
mapred-site.xml
This file is used to specify which MapReduce framework we are using. By
default, Hadoop ships only a template of this file, named
mapred-site.xml.template. First, copy mapred-site.xml.template to
mapred-site.xml using the following command.
$ cp mapred-site.xml.template mapred-site.xml
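If you rerun these steps, a plain cp would overwrite any edits you have
already made to mapred-site.xml. One way to guard against that is
sketched below; copy_template is a hypothetical helper, and the directory
passed to it assumes the configuration location used in this tutorial.

```shell
# Hypothetical helper: copy the template in directory $1 only if
# mapred-site.xml does not exist there yet.
copy_template() {
  dir="$1"
  if [ -f "$dir/mapred-site.xml" ]; then
    echo "exists"
  elif [ -f "$dir/mapred-site.xml.template" ]; then
    cp "$dir/mapred-site.xml.template" "$dir/mapred-site.xml" && echo "copied"
  else
    echo "no-template"
  fi
}

copy_template "${HADOOP_HOME:-/usr/local/hadoop}/etc/hadoop"
```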
Open the mapred-site.xml file and add the following property between
the <configuration> and </configuration> tags.
<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>
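Once all four files have been edited, a short check that each of them is
present in the configuration directory can catch a forgotten copy or a
misspelled file name. This is only a sketch: missing_conf_files is a
hypothetical helper, and it checks that the files exist, not that their
contents are valid.

```shell
# Hypothetical helper: print the name of each of the four files edited
# in this step that is missing from directory $1.
missing_conf_files() {
  for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
    [ -f "$1/$f" ] || echo "$f"
  done
}

missing=$(missing_conf_files "${HADOOP_HOME:-/usr/local/hadoop}/etc/hadoop")
if [ -z "$missing" ]; then
  echo "all four configuration files are present"
else
  echo "missing:" $missing
fi
```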