
HDFS

Hadoop Distributed File System

Thuong-Cang PHAN, PhD.

College of Information & Communication Technology - CTU


Apache Hadoop
• Open Source:
http://hadoop.apache.org/

• Wide acceptance:
- http://wiki.apache.org/hadoop/PoweredBy
- Amazon.com, Apple, AOL, eBay, IBM, Google, LinkedIn,
Last.fm, Microsoft, SAP, Twitter, …

CIT - Can Tho University


Hadoop Architecture



Storage - HDFS
• HDFS - Hadoop Distributed File System
• The primary distributed file system used by Hadoop applications; it runs on large clusters of commodity machines.

• An HDFS cluster consists of:
• A NameNode: manages the file system metadata and monitors the DataNodes.
• A Secondary NameNode: periodically merges the namespace image with the edit log and keeps a copy of the result (a checkpoint). If the NameNode crashes, the namespace image stored on the Secondary NameNode can be used to restart it.
• DataNodes: store, read, and write the actual data blocks.
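The checkpoint performed by the Secondary NameNode can be pictured with a toy model. The sketch below is an illustration under stated assumptions, not HDFS code: the namespace image is a plain dict mapping paths to block counts, the edit log is a list of hypothetical `(operation, path, blocks)` tuples, and `checkpoint` is a helper name invented here.

```python
# Toy model of a Secondary NameNode checkpoint (illustrative only, not HDFS code).
# Assumption: the namespace image is a dict {path: number_of_blocks} and the
# edit log is a list of (operation, path, blocks) tuples.

def checkpoint(fsimage, edit_log):
    """Replay the edit log on a copy of the image and return the merged image."""
    merged = dict(fsimage)  # copy, so the old image stays intact
    for op, path, blocks in edit_log:
        if op == "create":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit log operation: {op}")
    return merged


fsimage = {"/customers.csv": 2}
edit_log = [
    ("create", "/webdata/access.log", 5),
    ("delete", "/customers.csv", 0),
    ("create", "/reports/marketing/q1.txt", 1),
]

new_image = checkpoint(fsimage, edit_log)
edit_log = []  # after a checkpoint the log can start empty again
print(new_image)
```

Restarting from `new_image` instead of replaying a long edit log is what makes the stored checkpoint useful after a crash.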



HDFS - Basics
• A file is split into fixed-size blocks (e.g., 128 MB)

• The blocks are then assigned to (different) DataNodes
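The arithmetic behind block splitting can be sketched in a few lines. This is a minimal illustration, assuming a 128 MB block size as in the example above; the round-robin placement and the node names `dn1` to `dn4` are hypothetical and do not reflect the actual HDFS placement policy.

```python
# Sketch: how a file of a given size is cut into fixed-size blocks.
# Assumption: 128 MB block size; the last block may be smaller than the others.

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB in bytes


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs covering the whole file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_round_robin(blocks, nodes):
    """Hypothetical placement: block i goes to node i mod len(nodes)."""
    return {i: nodes[i % len(nodes)] for i in range(len(blocks))}


file_size = 300 * 1024 * 1024  # a 300 MB file
blocks = split_into_blocks(file_size)
placement = assign_round_robin(blocks, ["dn1", "dn2", "dn3", "dn4"])

for i, (offset, length) in enumerate(blocks):
    print(f"block {i}: offset={offset} length={length} -> {placement[i]}")
```

A 300 MB file yields two full 128 MB blocks and one 44 MB block; only the final block is allowed to be shorter.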



HDFS Architecture

Resource:
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html


HDFS - Replication
• The replication factor is configurable (the default is three).

• Replication is pipelined:
• When a block is full, the NameNode is asked for DataNodes that can hold a replica
• A DataNode is contacted and receives the data
• It forwards the data to the DataNode holding the next replica (the third), and so on
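The pipelined hand-off described above can be simulated with a toy model. This is a sketch under stated assumptions, not HDFS code: each DataNode is a dict, the stand-in for the NameNode simply picks the first nodes in the list, and the names `choose_datanodes`, `write_block`, and `blk_0001` are invented for illustration.

```python
# Toy simulation of pipelined replication (illustrative only, not HDFS code).
# Assumption: each DataNode is a dict with a name and a list of stored blocks;
# the "NameNode" simply picks the first `replication` nodes.

def choose_datanodes(datanodes, replication=3):
    """Hypothetical NameNode decision: pick `replication` target nodes."""
    if len(datanodes) < replication:
        raise ValueError("not enough DataNodes for the requested replication")
    return datanodes[:replication]


def write_block(block, pipeline):
    """Store the block on the first node, then forward it down the pipeline."""
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    head["blocks"].append(block)  # this node stores its replica
    return [head["name"]] + write_block(block, rest)  # and forwards the data


datanodes = [{"name": f"dn{i}", "blocks": []} for i in range(1, 6)]
pipeline = choose_datanodes(datanodes, replication=3)
path = write_block("blk_0001", pipeline)
print(" -> ".join(path))
```

The writer sends the data only once; each node in the chain passes it on, so the cost of additional replicas is spread across the DataNodes.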



HDFS - Replication
• Benefits of replication
• Availability: data isn’t lost when a node fails
• Reliability: HDFS detects corrupted blocks (via checksums) and restores them from a healthy replica
• Performance: allows for data locality



HDFS - Writing Data


HDFS - Reading Data


Accessing HDFS via Command Line

• Users typically access HDFS via the command:

hadoop fs -subcommand

• Each subcommand resembles the corresponding UNIX command

• Examples
• $ hadoop fs -ls /user/tomwheeler
• $ hadoop fs -cat /customers.csv
• $ hadoop fs -rm /webdata/access.log
• $ hadoop fs -mkdir /reports/marketing
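When these commands are issued from a script rather than typed by hand, they can be assembled as argument lists. The sketch below assumes the `hadoop` executable is on the PATH of the machine where `run()` is eventually called; `hdfs_cmd` and `run` are helper names invented for this example, and the loop only prints the commands.

```python
# Sketch: building `hadoop fs` command lines from Python.
# Assumption: the `hadoop` executable is on PATH when run() is called;
# hdfs_cmd() and run() are helper names invented for this example.
import subprocess


def hdfs_cmd(subcommand, *args):
    """Return the argument list for `hadoop fs -<subcommand> <args...>`."""
    return ["hadoop", "fs", f"-{subcommand}", *args]


def run(cmd):
    """Execute a command built by hdfs_cmd() and return its standard output."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# The four examples above, expressed as argument lists.
commands = [
    hdfs_cmd("ls", "/user/tomwheeler"),
    hdfs_cmd("cat", "/customers.csv"),
    hdfs_cmd("rm", "/webdata/access.log"),
    hdfs_cmd("mkdir", "/reports/marketing"),
]

for cmd in commands:
    print(" ".join(cmd))  # print only; call run(cmd) on a machine with Hadoop
```

Passing an argument list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.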



Copying Local Data To/From HDFS
• HDFS is distinct from your local filesystem
• hadoop fs -put copies local files to HDFS
• hadoop fs -get fetches a local copy of a file from HDFS



hadoop fs -mkdir /user/hadoop/hadoopdemo

HDFS – hadoop fs Shell Commands

• hadoop fs -ls
• $ hadoop fs -ls /
• $ hadoop fs -ls /home/hadoop/hadoop
• $ hadoop fs -lsr /home/hadoop/hadoop (recursive listing; deprecated in Hadoop 2 in favor of -ls -R)

• hadoop fs -mkdir
• $ hadoop fs -mkdir /input/test
• $ hadoop fs -mkdir /input/test1/test2

• hadoop fs -rm
• $ hadoop fs -rm /input/test1
• $ hadoop fs -rmr /input/test1 (recursive delete; deprecated in Hadoop 2 in favor of -rm -r)



HDFS – hadoop fs Shell Commands
• hadoop fs -put
• $ hadoop fs -put localFileName /input/test
• $ hadoop fs -put localfile1 localfile2 /input/test
• $ hadoop fs -put WordCount.java hdfs://localhost:9000/input/test

• hadoop fs -get
• $ hadoop fs -get /input/test/hdfsFileName localFileName

• hadoop fs -copyFromLocal <localsrc> <URI>
• Copies a file from the local file system to HDFS

• hadoop fs -copyToLocal <URI> <localdst>
• Copies a file from HDFS to the local file system



HDFS – hadoop fs Shell Commands
• hadoop fs -getmerge <src> <localdst> [addnl]
• The optional addnl argument adds a newline character at the end of each file.
• hadoop fs -cp
• hadoop fs -cp <SrcFile> <TgtFile>
• hadoop fs -cp <SrcFile1> <SrcFile2> hdfs://namenodehost/<TgtDirectory>
• hadoop fs -du
• hadoop fs -du hdfs://namenodehost/<TgtDirectory>
• hadoop fs -dus
• hadoop fs -dus hdfs://namenodehost/<TgtDirectory>
• hadoop fs -expunge
• Empties the trash
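The effect of -getmerge with the addnl option can be imitated on local files. This is a rough sketch, not the Hadoop implementation: it assumes the source files sit in a local directory and are merged in sorted name order, and `get_merge` and the `part-0000x` file names are invented for the example.

```python
# Local-file sketch of what -getmerge does (illustrative only, not HDFS code).
# Assumption: the source files live in a local directory and are merged in
# sorted name order; get_merge() is a helper name invented for this example.
import os
import tempfile


def get_merge(src_dir, local_dst, add_newline=False):
    """Concatenate every file in src_dir into local_dst."""
    with open(local_dst, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as part:
                out.write(part.read())
            if add_newline:
                out.write(b"\n")  # mirrors the addnl option


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "parts")
    os.mkdir(src)
    for name, data in [("part-00000", b"a,1"), ("part-00001", b"b,2")]:
        with open(os.path.join(src, name), "wb") as f:
            f.write(data)

    dst = os.path.join(tmp, "merged.csv")
    get_merge(src, dst, add_newline=True)
    with open(dst, "rb") as f:
        merged = f.read()

print(merged)
```

Without the newline option, the last record of one file and the first record of the next would run together on a single line whenever the source files lack a trailing newline.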



HDFS - Exercises
1. How to create a directory in HDFS
2. How to copy a local file to HDFS
3. How to display the contents of a file in HDFS
4. How to remove a file from HDFS



HDFS – Web Interface
