Hadoop Distributed File System (HDFS)
An Overview
What is HDFS?
• HDFS is a distributed file system designed for large-scale data storage.
• It is a core component of the Hadoop ecosystem.
• Primary Use: Manage and store vast amounts of data across multiple machines.
Key Features
• Fault Tolerance: Automatically replicates data to ensure reliability.
• Scalability: Capable of handling large datasets with ease.
• High Throughput: Optimized for large files and streaming data access.
Architecture of HDFS
• NameNode: The master node managing metadata and access.
• DataNodes: Worker nodes that store and manage the actual data.
• Block Storage: Data is split into blocks and distributed across nodes.
HDFS Architecture
• Hadoop Distributed File System follows the master-slave
architecture.
• Each cluster comprises a single master node and multiple
slave nodes.
• Internally, files are divided into one or more blocks, and each block is stored on different slave machines according to the replication factor.
• The master node stores and manages the file system namespace, that is, metadata about files and their blocks, such as block locations and permissions. The slave nodes store the data blocks of files.
• The master node is the NameNode, and the DataNodes are the slave nodes.
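The master/slave split above can be sketched as a toy model: a NameNode-style table maps each block to the DataNodes holding its replicas. The node names, block IDs, and round-robin placement below are purely illustrative, not the real HDFS placement policy (which is rack-aware):

```python
# Toy model of the NameNode's block -> DataNode mapping.
# Node names, block IDs, and round-robin placement are illustrative only;
# real HDFS uses a rack-aware placement policy.

datanodes = ["dn1", "dn2", "dn3", "dn4"]   # slave nodes
replication_factor = 3

def place_block(block_id, nodes, rf):
    """Pick rf distinct DataNodes for one block (simple round-robin)."""
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

# NameNode-style namespace metadata: block -> replica locations
namespace = {f"blk_{i}": place_block(i, datanodes, replication_factor)
             for i in range(5)}
for blk, locations in namespace.items():
    print(blk, locations)
```

The sketch assumes the replication factor never exceeds the number of DataNodes; the real NameNode also tracks per-node capacity and rack topology when choosing targets.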
Functions of NameNode
• It executes file system namespace operations such as opening, renaming, and closing files and directories.
• NameNode manages and maintains the DataNodes.
• It determines the mapping of blocks of a file to DataNodes.
• NameNode records each change made to the file system namespace.
• It keeps track of the locations of each block of a file.
• NameNode takes care of the replication factor of all the blocks.
• NameNode receives heartbeats and block reports from all DataNodes, which confirm that each DataNode is alive.
• If a DataNode fails, the NameNode chooses new DataNodes for new replicas.
Functions of DataNode
• DataNode is responsible for serving client read/write requests.
• Based on instructions from the NameNode, DataNodes perform block creation, replication, and deletion.
• DataNodes send heartbeats to the NameNode to report the health of HDFS.
• DataNodes also send block reports to the NameNode listing the blocks they contain.
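The heartbeat check the NameNode performs can be sketched in a few lines. The timeout and node names below are made up for illustration; real HDFS uses its own (much longer) dead-node timeout:

```python
import time

HEARTBEAT_TIMEOUT_S = 10  # illustrative; real HDFS waits much longer before marking a node dead

def dead_datanodes(heartbeats, now=None, timeout=HEARTBEAT_TIMEOUT_S):
    """NameNode-side check: which DataNodes have missed their heartbeat?"""
    now = time.time() if now is None else now
    return [dn for dn, last in heartbeats.items() if now - last > timeout]

# Hypothetical state: dn2's last heartbeat was 60 seconds ago
last_heartbeat = {"dn1": time.time(), "dn2": time.time() - 60}
print(dead_datanodes(last_heartbeat))  # dn2 has missed its heartbeat
```

Once a node lands in this "dead" list, the NameNode schedules new replicas of its blocks on healthy DataNodes, as described above.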
Blocks in HDFS Architecture
• HDFS splits a file into block-sized chunks called blocks. The block size is 128 MB by default and can be configured as required.
• For example, for a file of size 612 MB, HDFS will create four blocks of 128 MB and one block of 100 MB.
• A file smaller than the block size does not occupy the full block's space on disk.
• For example, a file of size 2 MB will occupy only 2 MB of disk space.
• The user does not have any control over the location of the blocks.
Replication Management
• In a distributed system, the data must be replicated in multiple places so that if one machine fails, the data remains accessible from other machines.
• In Hadoop, HDFS stores replicas of a block on multiple
DataNodes based on the replication factor.
• The replication factor is the number of copies to be created for
blocks of a file in HDFS architecture.
• If the replication factor is 3, then three copies of a block get
stored on different DataNodes.
• If one DataNode containing the data block fails, then the block
is accessible from the other DataNode containing a replica of
the block.
• If we store a file of 128 MB and the replication factor is 3, then (3 × 128 = 384) 384 MB of disk space is occupied, as three copies of each block are stored.
• This replication mechanism makes HDFS fault-tolerant.
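The storage arithmetic above is simple enough to check in code (the function name is illustrative):

```python
def replicated_size_mb(file_size_mb, replication_factor=3):
    """Cluster-wide disk space consumed once every block of a file is replicated."""
    return file_size_mb * replication_factor

# The example from the text: a 128 MB file with replication factor 3
print(replicated_size_mb(128))  # -> 384 MB of disk space occupied
```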
Data Replication Mechanism
• HDFS replicates each file block across multiple DataNodes.
• Default replication factor: 3 (can be configured).
• Ensures data availability in case of node failures.
Advantages
• Efficient handling of large datasets.
• Built-in redundancy for high availability.
• Flexible scalability, allowing more nodes to be added as needed.
Challenges
• Small File Problem: Not optimized for managing many small files.
• Single Point of Failure: The NameNode is a critical component; its failure can impact the entire system.
Conclusion
• HDFS is a robust solution for managing distributed
storage.
• It plays a vital role in modern big data infrastructures.
• While highly scalable and efficient, certain challenges,
like small file handling, must be managed.