Hadoop

Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It uses MapReduce as a programming model and HDFS as a distributed file system. HDFS stores large files across clusters and replicates data for reliability, while MapReduce allows parallel processing of datasets in a fault-tolerant manner. A typical Hadoop cluster integrates these components, with a master node running the JobTracker and NameNode and slave nodes running TaskTrackers and DataNodes.

Uploaded by

jefferyleclerc

Hadoop/MapReduce

Object-oriented framework presentation

CSCI 5448
Casey McTaggart
What is Apache Hadoop?
• Large-scale, open-source software framework
▫ Yahoo! has been the largest contributor to date
• Dedicated to scalable, distributed, data-intensive computing
• Handles thousands of nodes and petabytes of data
• Supports applications under a free license
• 3 Hadoop subprojects:
▫ Hadoop Common: common utilities package
▫ HDFS: Hadoop Distributed File System with high-throughput access to application data
▫ MapReduce: a software framework for distributed processing of large data sets on computer clusters
Hadoop MapReduce
• MapReduce is a programming model and software framework first developed by Google (Google’s MapReduce paper was published in 2004)
• Intended to facilitate and simplify the processing of vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner
▫ Petabytes of data
▫ Thousands of nodes
• Computational processing occurs on both:
▫ Unstructured data: filesystem
▫ Structured data: database
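To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using word count. It is plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real job would run the map and reduce calls in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, group by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, the framework is free to spread both kinds of call over many machines and to rerun any call that fails.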
Hadoop Distributed File System (HDFS)
• Inspired by the Google File System
• Scalable, distributed, portable filesystem written in Java for the Hadoop framework
▫ Primary distributed storage used by Hadoop applications
• HDFS can be part of a Hadoop cluster or can be a stand-alone general-purpose distributed file system
• An HDFS cluster primarily consists of
▫ A NameNode that manages file system metadata
▫ DataNodes that store the actual data
• Stores very large files in blocks across machines in a large cluster
▫ Reliability and fault tolerance ensured by replicating data across multiple hosts
• Has data awareness between nodes: work can be scheduled on or near the nodes that hold the data
• Designed to be deployed on low-cost hardware
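The block and replication ideas above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS code: the block size is a handful of bytes, the round-robin placement policy is a simplification chosen for illustration (real HDFS placement takes racks into account), and all function and node names are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, datanodes, replication):
    """Build NameNode-style metadata: block index -> DataNodes holding a copy.

    Assumption: simple round-robin placement, used here only for illustration.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """A file stays readable if every block has a replica on a surviving node."""
    return all(
        any(node != failed_node for node in nodes) for nodes in placement.values()
    )


blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
print(placement)
print(readable_after_failure(placement, "dn2"))
```

The metadata dictionary plays the role of the NameNode, while the byte blocks themselves would live on the DataNodes. Losing any single DataNode leaves at least two copies of every block.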
More on Hadoop file systems
• Hadoop can work directly with any distributed file system which can be mounted by the underlying OS
• However, doing this means a loss of locality, as Hadoop needs to know which servers are closest to the data
• Hadoop-specific file systems like HDFS are developed for locality, speed, fault tolerance, integration with Hadoop, and reliability
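Why locality matters can be shown with a small scheduling sketch. It assumes the scheduler knows which hosts hold a block and which rack each node sits in; the pick_node function, the node names, and the three-tier preference (same node, then same rack, then anywhere) are illustrative assumptions rather than Hadoop's actual scheduler code.

```python
def pick_node(block_hosts, free_nodes, rack_of):
    """Choose a free node for a task, preferring nodes closest to the data.

    block_hosts: set of nodes that store a replica of the input block
    free_nodes:  nodes with spare capacity, in arbitrary order
    rack_of:     mapping from node name to rack name
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    # Best case: the task runs on a node that already stores the block.
    for node in free_nodes:
        if node in block_hosts:
            return node, "node-local"
    # Next best: a node in the same rack as some replica.
    replica_racks = {rack_of[host] for host in block_hosts}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # Otherwise the block must be copied across racks.
    return free_nodes[0], "off-rack"


rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
print(pick_node({"n1"}, ["n3", "n1"], rack_of))
print(pick_node({"n1"}, ["n3", "n2"], rack_of))
print(pick_node({"n1"}, ["n3", "n4"], rack_of))
```

A file system mounted through the OS cannot supply the block_hosts information, so every task would effectively fall into the last case.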
Typical Hadoop cluster integrates MapReduce and HDFS
• Master/slave architecture
• Master node contains
▫ Job tracker node (MapReduce layer)
▫ Task tracker node (MapReduce layer)
▫ Name node (HDFS layer)
▫ Data node (HDFS layer)
• Multiple slave nodes contain
▫ Task tracker node (MapReduce layer)
▫ Data node (HDFS layer)
• MapReduce layer has job and task tracker nodes
• HDFS layer has name and data nodes
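Hadoop simple cluster graphic
[Figure: the master node runs a JobTracker and a TaskTracker (MapReduce layer) plus a NameNode and a DataNode (HDFS layer). Each slave node (1..*) runs a TaskTracker (MapReduce layer) and a DataNode (HDFS layer).]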
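The layout in the cluster graphic can also be written down as data. The sketch below simply encodes which daemons the slides place on the master and on each slave; build_cluster and count_daemon are hypothetical helper names, and nothing here starts real Hadoop processes.

```python
# Daemons per node type, grouped by layer, as described in the slides.
MASTER_DAEMONS = {
    "MapReduce": ["JobTracker", "TaskTracker"],
    "HDFS": ["NameNode", "DataNode"],
}
SLAVE_DAEMONS = {
    "MapReduce": ["TaskTracker"],
    "HDFS": ["DataNode"],
}


def build_cluster(num_slaves):
    """Return node name -> {layer -> daemons} for one master and 1..* slaves."""
    if num_slaves < 1:
        raise ValueError("a cluster needs at least one slave node")
    cluster = {"master": MASTER_DAEMONS}
    for i in range(1, num_slaves + 1):
        cluster[f"slave{i}"] = SLAVE_DAEMONS
    return cluster


def count_daemon(cluster, name):
    """Count how many nodes in the cluster run the named daemon."""
    return sum(
        name in daemons
        for layers in cluster.values()
        for daemons in layers.values()
    )


cluster = build_cluster(3)
print(count_daemon(cluster, "JobTracker"), count_daemon(cluster, "TaskTracker"))
```

With three slaves there is still exactly one JobTracker and one NameNode, while TaskTrackers and DataNodes grow with the number of nodes.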
