AMC ENGINEERING COLLEGE
Dept. Of Computer Science and Engineering
Big Data Analytics [21CS71] Assignment-2
Topic: Installation of Hadoop Kalyan G V (1AM21CS077)
Installation of Hadoop
Steps for Hadoop Installation:
1. Install Java Development Kit (JDK):
Hadoop requires Java to be installed on your system.
• To check if Java is installed:
java -version
• If Java is not installed, install it using:
sudo apt update
sudo apt install openjdk-8-jdk
• Verify the installation:
java -version
2. Install SSH:
Hadoop uses SSH to communicate between its nodes.
• Install SSH if it is not already present:
sudo apt install openssh-server
• Ensure SSH is running:
sudo systemctl start ssh
sudo systemctl enable ssh
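Hadoop's start scripts log in to localhost over SSH, so a single-node setup also needs passwordless SSH to the local machine. A minimal sketch, assuming the default ssh-keygen key path ~/.ssh/id_rsa:

```shell
# Generate an RSA key pair with an empty passphrase (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the new key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should log in and exit without prompting for a password
ssh localhost exit
```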
3. Download Hadoop:
• Download Hadoop from the official Apache website:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
• Extract the downloaded tar file:
tar -xzvf hadoop-3.3.1.tar.gz
• Move the extracted folder to /usr/local/hadoop:
sudo mv hadoop-3.3.1 /usr/local/hadoop
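Before extracting, the archive can optionally be verified against the checksum Apache publishes alongside each release (the .sha512 filename follows the mirror's convention; depending on the checksum file's format, a manual comparison of the hashes may be needed instead of -c):

```shell
# Fetch the published SHA-512 checksum for this release
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
# Compute the local hash; compare it with the value in the .sha512 file
sha512sum hadoop-3.3.1.tar.gz
cat hadoop-3.3.1.tar.gz.sha512
```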
4. Configure Hadoop Environment Variables:
• Open the .bashrc file to add Hadoop-related environment variables:
nano ~/.bashrc
• Add the following lines at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=$HADOOP_HOME
• Save and close the file. Then, apply the changes:
source ~/.bashrc
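Hadoop's daemon scripts also read JAVA_HOME from hadoop-env.sh; if it is not set there, startup fails with an "ERROR: JAVA_HOME is not set" message. A minimal sketch, assuming the OpenJDK 8 path that apt uses on Ubuntu amd64 (adjust the path for your system):

```shell
# Point Hadoop at the JDK installed in step 1 (path is the Ubuntu amd64 default)
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Confirm the hadoop command is on PATH and reports its version
hadoop version
```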
5. Configure Hadoop Files:
Hadoop requires several configuration files to be set up for proper functioning.
• core-site.xml: Navigate to $HADOOP_HOME/etc/hadoop and open core-site.xml:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following configuration:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
• hdfs-site.xml: Edit hdfs-site.xml:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following configuration:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
• mapred-site.xml: Edit mapred-site.xml. In Hadoop 3.x this file already exists; in older Hadoop 2.x releases it must first be copied from the template:
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following configuration:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
• yarn-site.xml: Edit yarn-site.xml:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following configuration (the mapreduce_shuffle auxiliary service is required for MapReduce jobs to run on YARN):
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
</configuration>
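Once the configuration files are edited, the effective values can be read back with hdfs getconf, which parses the same configuration directory the daemons will use:

```shell
# Should print hdfs://localhost:9000 from core-site.xml
hdfs getconf -confKey fs.defaultFS
# Should print 1 from hdfs-site.xml
hdfs getconf -confKey dfs.replication
```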
6. Format the Hadoop File System:
• Format the Hadoop Distributed File System (HDFS) before first use. Do this only once: reformatting erases existing HDFS metadata.
hdfs namenode -format
7. Start Hadoop Services:
• Start the HDFS daemons (Namenode, Datanode):
start-dfs.sh
• Start YARN daemons (ResourceManager, NodeManager):
start-yarn.sh
8. Verify Hadoop Installation:
• Check if Hadoop is running by opening the ResourceManager and Namenode web UIs:
o ResourceManager UI: http://localhost:8088
o Namenode UI: http://localhost:9870
• Check the status of HDFS:
hdfs dfsadmin -report
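A quick way to confirm HDFS accepts reads and writes is to create a directory and copy a file into it (the /user/$USER path and test file name below are illustrative):

```shell
# Create a home directory in HDFS for the current user
hdfs dfs -mkdir -p /user/$USER
# Copy a local file into HDFS and list it back
echo "hello hadoop" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /user/$USER/
hdfs dfs -ls /user/$USER
```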
• You can also run a simple Hadoop job:
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 2 5
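The running daemons can be listed with jps (shipped with the JDK), and the cluster is shut down with the stop scripts matching those used in step 7:

```shell
# Should list NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager
jps
# Stop YARN first, then HDFS
stop-yarn.sh
stop-dfs.sh
```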