UNIT – III
Map Reduce Technique
What is MapReduce?
MapReduce is a data processing tool used to process
data in parallel in a distributed fashion. It was
introduced in 2004 in the paper
"MapReduce: Simplified Data Processing on Large Clusters,"
published by Google.
MapReduce is a paradigm with two phases: the
mapper phase and the reducer phase. In the mapper, the
input is given in the form of key-value pairs. The output of the
mapper is fed to the reducer as input, and the reducer runs only
after the mapper is over. The reducer also takes its input in key-
value format, and the output of the reducer is the final output.
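For example, in a word-count job the data flows through the two phases as follows (a simplified trace over made-up input):

Input records:   "deer bear river"  "car car river"
Mapper output:   (deer,1) (bear,1) (river,1) (car,1) (car,1) (river,1)
Reducer input:   (bear,[1]) (car,[1,1]) (deer,[1]) (river,[1,1])
Reducer output:  (bear,1) (car,2) (deer,1) (river,2)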
MapReduce
MapReduce facilitates concurrent processing by
splitting petabytes of data into smaller chunks and
processing them in parallel on Hadoop commodity
servers. In the end, it aggregates the data from the
multiple servers and returns a consolidated output to
the application.
How Does MapReduce Work?
The MapReduce algorithm contains two important tasks, namely Map
and Reduce.
The Map task takes a set of data and converts it into another set of
data, where individual elements are broken down into tuples (key-value
pairs).
The Reduce task takes the output from the Map as an input and
combines those data tuples (key-value pairs) into a smaller set of tuples.
The reduce task is always performed after the map job.
Input Phase − Here we have a Record Reader that translates each record in an
input file and sends the parsed data to the mapper in the form of key-value pairs.
Map − Map is a user-defined function, which takes a series of key-value pairs
and processes each one of them to generate zero or more key-value pairs.
Intermediate Keys − The key-value pairs generated by the mapper are known
as intermediate keys.
Combiner − A combiner is a type of local Reducer that groups similar data from
the map phase into identifiable sets. It takes the intermediate keys from the
mapper as input and applies user-defined code to aggregate the values within the
small scope of one mapper. It is not part of the main MapReduce algorithm; it
is optional.
Shuffle and Sort − The Reducer task starts with the Shuffle and Sort step. It
downloads the grouped key-value pairs onto the local machine, where the
Reducer is running. The individual key-value pairs are sorted by key into a larger
data list. The data list groups the equivalent keys together so that their values can
be iterated easily in the Reducer task.
Reducer − The Reducer takes the grouped key-value paired data as input and
runs a Reducer function on each group. Here, the data can be aggregated,
filtered, and combined in a number of ways, which may require a wide range of
processing. Once the execution is over, it gives zero or more key-value pairs to the
final step.
Output Phase − In the output phase, we have an output formatter that
translates the final key-value pairs from the Reducer function and writes them
onto a file using a record writer.
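All of these phases can be seen together in the classic word-count job. The following is a minimal sketch using the Hadoop Java API; it follows the standard WordCount example that ships with Hadoop, with the reducer doubling as the optional combiner.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for every word in the input line, emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts for each word; also usable as a combiner.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // optional local Reducer
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}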
Anatomy of a Map Reduce Job Run
The MapReduce job run process mainly consists of:
JOB SUBMISSION
JOB INITIALIZATION
TASK ASSIGNMENT
TASK EXECUTION
PROGRESS AND STATUS UPDATES
JOB COMPLETION
FAILURES
A MapReduce job can be run with a single method call: submit() on
a Job object (you can also call waitForCompletion(), which submits
the job if it has not been submitted already and then waits for it
to finish).
Let’s understand the components –
Client: Submits the MapReduce job.
Yarn node manager: Launches and monitors the
compute containers on machines in the cluster.
Yarn resource manager: Coordinates the allocation of compute
resources on the cluster.
MapReduce application master: Coordinates the tasks running
the MapReduce job.
Distributed Filesystem: Shares job files with other entities.
How is a Job Submitted?
The submit() method creates an internal JobSubmitter instance and
calls submitJobInternal() on it. Having submitted the job,
waitForCompletion() polls the job's progress once per second. The
submission process then proceeds as follows:
The resource manager is asked for a new application ID, which is
used for the MapReduce job ID.
The output specification of the job is checked. For example, if the
output directory has not been specified or it already exists, the job
is not submitted and an error is thrown to the MapReduce program.
The input splits for the job are computed. If the splits cannot be
computed (for example, because the input paths do not exist), the job
is not submitted and an error is thrown to the MapReduce program.
The resources needed to run the job are copied to the shared
filesystem, in a directory named after the job ID. These include the
job JAR file, the configuration file, and the computed input splits.
The job JAR is copied with a high replication factor, controlled by
the mapreduce.client.submit.file.replication property, so that there
are a number of copies across the cluster for the node managers to access.
Finally, the job is submitted to the resource manager by calling
submitApplication().
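As a small sketch, the client side of this process looks like the following (the job configuration is elided and the job name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitDemo {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "submission demo");
    // ... mapper, reducer, and input/output setup elided ...
    // Either submit asynchronously and return at once:
    job.submit();
    // ...or submit (if not already submitted) and block until completion,
    // reporting progress roughly once per second:
    // boolean ok = job.waitForCompletion(true);
  }
}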
Failures
In the real world, user code can crash, can be full of bugs, or the
machine itself can fail. Hadoop's ability to handle such failures and
still let the job complete successfully is one of its biggest
benefits. Any of the following components can fail:
Application master
Node manager
Resource manager
Task
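For task failures specifically, the framework retries a failed task a configurable number of times before failing the whole job. A minimal sketch of the relevant knobs (these are the standard MRv2 property names; the values shown are illustrative):

import org.apache.hadoop.conf.Configuration;

public class FailureTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A map or reduce task is re-attempted up to this many times
    // (4 is the usual default) before it is marked as failed.
    conf.setInt("mapreduce.map.maxattempts", 4);
    conf.setInt("mapreduce.reduce.maxattempts", 4);
    // Optionally let the job succeed even if up to 5% of map tasks fail.
    conf.setInt("mapreduce.map.failures.maxpercent", 5);
  }
}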
Shuffle and Sorting
In this lesson, we will learn about MapReduce
shuffling and sorting. Here we offer a detailed description
of the Hadoop shuffle and sort phase. First we will discuss
what MapReduce shuffling is, then MapReduce sorting, and
then the secondary sorting technique.
Shuffling is the process of transferring the mapper's
intermediate output to the reducers. Each reducer gets one or more
keys and their associated values, depending on the number of
reducers. The intermediate key-value pairs generated by the mapper
are sorted automatically by key. In the sort phase, merging and
sorting of the map output takes place.
Shuffling in MapReduce
The process of moving data from the mappers to the reducers is
shuffling. Shuffling is also the process by which the system
performs the sort and then moves the map output to the
reducer as input. This is why the shuffle phase is
required for the reducers; without it, they would not have any
input. Shuffling can begin even before the map phase has
finished, which saves some time and completes the job sooner.
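Which reducer a given key is shuffled to is decided by the job's partitioner (Hadoop's default is HashPartitioner). As an illustration of the idea, a custom partitioner can be sketched as follows (the class name and routing rule are made up for this example):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative rule: keys with the same first character always go to
// the same reducer. The default HashPartitioner instead computes
// (key.hashCode() & Integer.MAX_VALUE) % numPartitions.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}

It is plugged into a job with job.setPartitionerClass(FirstLetterPartitioner.class).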
Sorting in MapReduce
The MapReduce framework automatically sorts the keys
generated by the mapper. Therefore, before the reducer
starts, all intermediate key-value pairs are sorted by key
and not by value. The values transferred to each reducer are
not sorted; they can be in any order.
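If the value order does matter, the usual remedy is a secondary sort: part of the value is moved into a composite key, and the job is told how to partition, sort, and group that key. In outline (the three setter methods are the real Job API; the partitioner and comparator classes are assumed to be written by the user):

// Sketch of the three knobs a secondary sort wires together:
job.setPartitionerClass(NaturalKeyPartitioner.class);      // partition on the natural key only
job.setSortComparatorClass(CompositeKeyComparator.class);  // order by natural key, then by the value part
job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class); // one reduce() call per natural key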
Map Reduce Types and Formats
MapReduce is the processing unit of Hadoop; it is used to
process the data stored in Hadoop.
The MapReduce task works on <Key, Value> pair.
Two main features of MapReduce are its parallel programming
model and its large-scale distributed model.
MapReduce allows for the distributed processing of the map
and reduction operations.
○ Map procedure (transform): Performs a filtering and sorting
operation.
○ Reduce procedure (aggregate): Performs a summary
operation.
MapReduce Workflow:
The Mapper class's KEYIN must be consistent with the input format (inputformat.class).
The Mapper class's KEYOUT must be consistent with the map output key class (mapreduce.map.output.key.class), and so on.
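For example, with the default TextInputFormat the mapper's input types must be LongWritable and Text, and its output types must agree with the declared map output classes (a minimal sketch):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// TextInputFormat delivers (byte offset, line) records, so
// KEYIN = LongWritable and VALUEIN = Text here.
public class TypedMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  // KEYOUT = Text and VALUEOUT = IntWritable must agree with:
  //   job.setMapOutputKeyClass(Text.class);
  //   job.setMapOutputValueClass(IntWritable.class);
}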
Formats
MapReduce formats are basically classified into two types; these are:
Input formats
Text input format
Binary input format
Multiple input formats
DB input format
Output formats
Text output format
Binary output format
Multiple output formats
Lazy output format
DB output format
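Choosing among these formats is a one-line call for each side of the job. A minimal sketch using the text formats (the paths are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FormatDemo {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "format demo");
    job.setInputFormatClass(TextInputFormat.class);   // text input: one (offset, line) pair per line
    job.setOutputFormatClass(TextOutputFormat.class); // text output: tab-separated key/value lines
    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));
  }
}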