Unit 3 PART 2

This document provides a comprehensive guide for installing Hadoop, detailing system requirements, installation steps, and configuration settings for both Hadoop and YARN. It covers prerequisites like Java installation, environment variable setup, and configuration of essential XML files for Hadoop's operation. Additionally, it outlines how to start and monitor Hadoop services, emphasizing the importance of YARN in managing resources and scheduling jobs in a Hadoop cluster.

Uploaded by

Abdul Samad

Pre-requisites for Installing Hadoop

Before installing Hadoop, make sure the following software and system requirements are
met:

System Requirements:

• Operating System: Linux (Ubuntu or CentOS) is the most commonly used for
Hadoop installations. It can also be installed on Windows using Cygwin, but Linux is
preferred for production environments.
• Memory: At least 4GB of RAM.
• Disk Space: At least 10GB of free disk space.
• Java: Hadoop 3.3.x supports Java 8 and Java 11 at runtime. Ensure a supported JDK is installed on your system.
• SSH: Hadoop's start and stop scripts use SSH to launch the daemons on each node, so
passwordless SSH to localhost is needed even in a single-node setup.
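The requirements above can be checked from a terminal before going further. The following is a minimal sketch, assuming a Linux machine with /proc/meminfo; the helper names have_cmd and at_least are hypothetical, and the 3800 MB threshold is an assumption that allows for a 4GB machine reporting slightly less than 4096 MB.

```shell
# Report whether a command is on PATH.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# Compare a measured value against a minimum.
# usage: at_least LABEL ACTUAL MINIMUM
at_least() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2 >= $3)"
    else
        echo "$1: too low ($2 < $3)"
    fi
}

have_cmd java
have_cmd ssh

# Total RAM in MB (Linux only; threshold of 3800 MB is an assumption).
if [ -r /proc/meminfo ]; then
    mem_mb=$(awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo)
    at_least memory_mb "$mem_mb" 3800
fi

# Free space in GB on the filesystem holding the current directory.
free_gb=$(df -Pk . | awk 'NR==2 {print int($4/1024/1024)}')
at_least free_disk_gb "$free_gb" 10
```

A MISSING or "too low" line points at the requirement to fix before starting Step 1.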

2. Step-by-Step Guide to Install Hadoop

Step 1: Install Java

Hadoop requires Java to be installed on the system. You can check whether Java is installed
by typing:

bash
java -version

If Java is not installed, you can install it as follows:

• For Ubuntu:

bash
sudo apt update
sudo apt install openjdk-8-jdk

After installation, set the JAVA_HOME environment variable. For Ubuntu, you can do this by editing
the .bashrc file:

bash
nano ~/.bashrc

Add the following lines:

bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

Then, source the .bashrc file to apply the changes:

bash
source ~/.bashrc
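The JAVA_HOME path above is specific to OpenJDK 8 on 64-bit Ubuntu. On other systems, one way to find the right value is to resolve the java binary's symbolic links and strip the trailing bin/java. The sketch below assumes GNU readlink -f is available; java_home_from_bin is a hypothetical helper name, not part of Java or Hadoop.

```shell
# Strip the trailing /bin/java (or /jre/bin/java, as in Java 8 JDK layouts)
# from a fully resolved path to the java binary.
java_home_from_bin() {
    case "$1" in
        */jre/bin/java) echo "${1%/jre/bin/java}" ;;
        */bin/java)     echo "${1%/bin/java}" ;;
        *)              return 1 ;;
    esac
}

# Print a candidate JAVA_HOME for the java found on PATH, if any.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")" \
        || echo "unrecognized Java layout"
fi
```

The printed directory is only a candidate; confirm that it contains bin/java before exporting it.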

Step 2: Download Hadoop

Go to the official Apache Hadoop website (https://hadoop.apache.org/) and download the
latest stable version of Hadoop. Alternatively, you can download a specific release using wget
(this guide uses version 3.3.1):

bash
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

Once downloaded, extract the tarball:

bash
tar -xvzf hadoop-3.3.1.tar.gz

Move the extracted files to a directory of your choice:

bash
sudo mv hadoop-3.3.1 /usr/local/hadoop

Step 3: Set Up Environment Variables

Add the Hadoop environment variables in the .bashrc file:

bash
nano ~/.bashrc

Add the following lines at the end of the file:

bash
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then, apply the changes:

bash
source ~/.bashrc
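To confirm that the new variables took effect, the shell's PATH can be inspected for the two Hadoop directories. This is a small sketch; path_contains is a hypothetical helper, and the fallback to /usr/local/hadoop assumes the install location chosen in Step 2.

```shell
# Succeed if directory $1 is a component of the colon-separated list $2
# (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the location used in this guide if HADOOP_HOME is unset.
hadoop_home=${HADOOP_HOME:-/usr/local/hadoop}
for dir in "$hadoop_home/bin" "$hadoop_home/sbin"; do
    if path_contains "$dir"; then
        echo "$dir: on PATH"
    else
        echo "$dir: not on PATH"
    fi
done
```

If either directory is reported as not on PATH, re-check the export lines in .bashrc and source the file again.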

Step 4: Configure Hadoop

Before starting Hadoop, several configuration files need to be modified. These files are
located in the $HADOOP_HOME/etc/hadoop directory.

1. hadoop-env.sh

Edit the hadoop-env.sh file to specify the JAVA_HOME path:

bash
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Find the line with # export JAVA_HOME, uncomment it, and set it as follows:

bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

2. core-site.xml

The core-site.xml file contains the configuration for Hadoop's core settings, including the
file system URI. Edit the file as follows:

bash
nano $HADOOP_HOME/etc/hadoop/core-site.xml

Add the following configuration inside the <configuration> tags:

xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
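For reference, a complete core-site.xml with the fs.defaultFS property in place might look like the file written by the sketch below. The function name write_core_site is hypothetical, and the demonstration writes to a scratch directory so that no real configuration is overwritten.

```shell
# Write a minimal core-site.xml into the directory given as $1.
write_core_site() {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}

# Demonstration against a scratch directory, not $HADOOP_HOME/etc/hadoop.
demo_dir=$(mktemp -d)
write_core_site "$demo_dir"
cat "$demo_dir/core-site.xml"
```

The point to note is that every property element sits inside the single configuration element that the stock file already contains.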

3. hdfs-site.xml

The hdfs-site.xml file contains the configuration for Hadoop's HDFS. Edit the file to
configure the directories for the NameNode and DataNode:

bash
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

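Add the following configuration inside the <configuration> tags. Hadoop 3 uses the property
names dfs.namenode.name.dir and dfs.datanode.data.dir; the older names dfs.name.dir and
dfs.data.dir are deprecated aliases:

xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/hadoop/hdfs/datanode</value>
</property>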

4. yarn-site.xml

The yarn-site.xml file configures Hadoop YARN (Yet Another Resource Negotiator). Edit
it to configure the ResourceManager and NodeManager settings:

bash
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Add the following configuration inside the <configuration> tags:

xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

3. Format the Hadoop Filesystem (HDFS)

Before you start HDFS, you must format the filesystem. Run the following command:

bash
hdfs namenode -format

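This initializes the NameNode's metadata directory. Format only once, on a new installation:
reformatting an existing NameNode erases the HDFS metadata.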

4. Start Hadoop Daemons

After configuring Hadoop, you can start the necessary daemons to launch Hadoop:

1. Start HDFS:

bash
start-dfs.sh

This will start the NameNode and DataNode services.

2. Start YARN:

bash
start-yarn.sh

This will start the ResourceManager and NodeManager services.

3. Check the Status:

To check if the Hadoop daemons are running, use the following command:

bash
jps

This will list the running Java processes. Look for the following processes to ensure that
Hadoop is running:

• NameNode
• DataNode
• ResourceManager
• NodeManager
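Checking the jps listing by eye can be replaced with a small filter. The sketch below reads jps-style lines (process ID followed by class name) on standard input and prints each of the four daemons that is absent; missing_daemons is a hypothetical helper, and the sample input is made up for the demonstration.

```shell
# Read jps-style output ("<pid> <ClassName>") on stdin and print every
# expected daemon that does not appear in it.
missing_daemons() {
    running=$(awk '{print $2}')
    for d in NameNode DataNode ResourceManager NodeManager; do
        # -x matches whole lines, so SecondaryNameNode does not count as NameNode.
        echo "$running" | grep -qx "$d" || echo "$d"
    done
}

# Demonstration with sample input; on a real system run: jps | missing_daemons
printf '%s\n' '4210 NameNode' '4391 DataNode' '5002 Jps' | missing_daemons
```

An empty result means all four daemons are running; any name printed identifies the daemon whose log file should be inspected.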

5. Access the Hadoop Web Interfaces

Hadoop provides web interfaces for monitoring and managing HDFS and YARN:

• HDFS NameNode UI: http://localhost:9870
• YARN ResourceManager UI: http://localhost:8088

You can open these URLs in your web browser to check the status and health of your Hadoop
services.

6. Stop Hadoop Services

Once you are done with your work, you can stop the Hadoop daemons with the following
commands:

1. Stop HDFS:

bash
stop-dfs.sh

2. Stop YARN:

bash
stop-yarn.sh

YARN Configuration in Hadoop

YARN (Yet Another Resource Negotiator) is a key component of the Hadoop ecosystem that
manages resources and schedules jobs across the cluster. It separates resource
management and job scheduling from data processing; in Hadoop 1 both were handled inside
MapReduce by the JobTracker. YARN allows multiple applications to share resources in the
Hadoop cluster efficiently.

This lecture will cover the key configuration steps involved in setting up YARN on a Hadoop
cluster.

1. Understanding YARN Architecture

YARN consists of the following components:

• ResourceManager (RM): Manages resources in the cluster and schedules applications.
• NodeManager (NM): Manages the resources on individual nodes and reports to the
ResourceManager.
• ApplicationMaster (AM): Manages the lifecycle of an application, including job
scheduling, monitoring, and resource negotiation.
• Container: A resource allocation for running an application on a node.

2. Key Configuration Files for YARN

The configuration of YARN is primarily done through XML configuration files located in the
etc/hadoop directory. The main configuration files for YARN include:

• yarn-site.xml: The primary configuration file for YARN.
• mapred-site.xml: Contains the configuration for MapReduce-related tasks in YARN.

3. Configuration for YARN in yarn-site.xml

The yarn-site.xml file contains important configuration parameters for YARN. Below are
the key configurations to set up YARN:

Edit yarn-site.xml

1. ResourceManager Address: Configure the ResourceManager’s address. This is the
central point that clients and NodeManagers will connect to.

xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>

2. ResourceManager Web UI: This configuration specifies the web UI of the
ResourceManager, which allows you to monitor YARN resource usage.

xml
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>localhost:8088</value>
</property>

3. NodeManager Local Directory: Defines the directory where the NodeManager
stores temporary data.

xml
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/tmp/nm-local-dir</value>
</property>

4. NodeManager Log Directory: Defines the directory where the NodeManager stores
log files.

xml
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/tmp/logs</value>
</property>

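5. NodeManager Resource Memory: Configures the amount of memory, in MB, that the
NodeManager can allocate to containers on each node. The value should not exceed the
node's physical RAM; 8192 assumes a machine with more than 8GB, so lower it on a machine
close to the 4GB minimum.

xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>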

6. NodeManager Virtual Cores: This parameter controls the number of virtual cores
(CPU) available for containers on the NodeManager.

xml
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

7. YARN ResourceManager Scheduler: You can configure the YARN scheduler
(default is CapacityScheduler). The property name is yarn.resourcemanager.scheduler.class.

xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

8. ResourceManager Admin Address: Sets the address of the ResourceManager's admin
interface, which administrative commands such as yarn rmadmin connect to. The list of users
allowed to administer the cluster is controlled separately by the yarn.admin.acl property.

xml
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>localhost:8050</value>
</property>
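Because the memory setting has to fit the machine, it can help to read the configured value back and compare it with physical RAM. The following is a rough sketch: get_property is a hypothetical helper that assumes one XML element per line, as in the snippets in this document, and the demonstration uses a scratch file rather than the real yarn-site.xml.

```shell
# Print the <value> that follows <name>NAME</name> in a Hadoop *-site.xml file.
# usage: get_property FILE NAME
get_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            gsub(/.*<value>|<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Scratch file holding the memory setting shown above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
EOF

yarn_mb=$(get_property "$conf" yarn.nodemanager.resource.memory-mb)
echo "configured container memory: ${yarn_mb} MB"

# Linux-only comparison against physical RAM.
if [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo)
    if [ "$yarn_mb" -gt "$total_mb" ]; then
        echo "warning: exceeds physical memory (${total_mb} MB)"
    fi
fi
```

To check a real installation, point get_property at $HADOOP_HOME/etc/hadoop/yarn-site.xml instead of the scratch file.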

4. Configuration for MapReduce in mapred-site.xml

The mapred-site.xml file is used for configuring MapReduce, but since YARN runs
MapReduce tasks, certain configurations here are also necessary.

Edit mapred-site.xml

1. MapReduce Framework: In YARN, MapReduce tasks run in containers, so you
need to set the framework to YARN.

xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

2. JobHistory Server: Configure the JobHistory server to keep track of job history in
the YARN environment.

xml
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>localhost:10020</value>
</property>

3. JobHistory Web UI: Configure the JobHistory web UI so that you can monitor jobs.

xml
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>localhost:19888</value>
</property>

4. ResourceManager for MapReduce: MapReduce clients locate the ResourceManager through
the yarn.resourcemanager.address property in yarn-site.xml, so no separate ResourceManager
property is needed in mapred-site.xml.
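Put together, the distinct properties listed above form a short mapred-site.xml. The sketch below prints such a file to standard output; property_xml and mapred_site_xml are hypothetical helper names, and values are inserted as-is without XML escaping, which is adequate for the plain host:port values used here.

```shell
# Print one <property> element for NAME ($1) and VALUE ($2).
property_xml() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print a complete mapred-site.xml built from the settings in this section.
mapred_site_xml() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<configuration>'
    property_xml mapreduce.framework.name yarn
    property_xml mapreduce.jobhistory.address localhost:10020
    property_xml mapreduce.jobhistory.webapp.address localhost:19888
    echo '</configuration>'
}

mapred_site_xml
```

Redirecting the output to a scratch file first and comparing it with the existing mapred-site.xml avoids overwriting settings that are already in place.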

5. Starting YARN Daemons

Once the configuration files are properly set, start the necessary YARN daemons:

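1. Start ResourceManager (Hadoop 3 syntax; Hadoop 2 used yarn-daemon.sh start resourcemanager):

bash
yarn --daemon start resourcemanager

2. Start NodeManager:

bash
yarn --daemon start nodemanager

3. Alternatively, start both YARN daemons with a single script (HDFS services should already be running):

bash
start-yarn.sh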

4. Check YARN Status:

Use the jps command to check if YARN processes are running, such as ResourceManager,
NodeManager, etc.

bash
jps

6. Monitoring YARN

YARN provides web interfaces for monitoring resources and running applications:

• ResourceManager Web UI: http://localhost:8088/
• NodeManager Web UI: http://localhost:8042/

These interfaces allow you to view job statuses, resource allocation, and overall cluster
health.

7. YARN Logs

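To access logs for a specific YARN application, pass its application ID (shown in the
ResourceManager Web UI or by yarn application -list) to the following command. The ID below
is a placeholder:

bash
yarn logs -applicationId application_123456789_0001

This command prints the container logs collected for the given application. For finished
applications it relies on log aggregation being enabled (yarn.log-aggregation-enable in
yarn-site.xml).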

YARN is a powerful resource management system that allows multiple applications to run
concurrently and efficiently share resources across the Hadoop cluster. Proper configuration
of YARN, including ResourceManager, NodeManager, and MapReduce settings, is crucial to
optimize resource usage and improve the overall performance of the Hadoop cluster. By
understanding and configuring YARN correctly, organizations can handle large-scale data
processing with ease, supporting diverse applications and workloads.

Sample Map Reduce program Application

Sample input

output
