Practical BDA (1-7)

BDA PRACTICAL

Enrollment No.: 211230107015
[ Big data analysis (3170722) ]

PRACTICAL: 1
AIM: To demonstrate the installation and configuration of a MongoDB client and server.
STEP: 1 — Download the MongoDB MSI Installer Package
Download the current version of MongoDB. Make sure you select MSI as the
package.


STEP: 2 — Install MongoDB with the Installation Wizard

A. Navigate to your downloads folder and double click on the .msi package you just
downloaded. This will launch the installation wizard.

B. Click Next to start installation.


C. Accept the license agreement, then click Next.


D. Select the Complete setup.

E. Select “Run service as Network Service user” and make a note of the data directory; we’ll need it later.

F. We won’t need MongoDB Compass, so deselect it and click Next.


G. Click Install to begin installation.

H. Hit Finish to complete installation.


STEP: 3 After installation, we have to set the paths of the MongoDB server and the mongo shell in the environment variables.
A. Open Environment Variables, then edit the Path variable and add the paths of the MongoDB server and mongo shell binaries.
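For example, with a default installation the Path entry would look like the following (the version folder is illustrative; use the one your installer created, and likewise add whichever folder contains mongosh.exe):

C:\Program Files\MongoDB\Server\6.0\bin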


B. After that, open Command Prompt, check the MongoDB version, and run MongoDB.
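For example (mongosh is the shell shipped with current releases; older installations use mongo instead):

mongod --version
mongosh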


PRACTICAL: 2
AIM: Write the MongoDB queries for creating a database and collection, inserting documents, updating documents, and deleting documents.
Code: -

A. Creating database and collection
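For example, in the mongo shell (the database name is illustrative; the collection name matches the trupti collection used in the later practicals):

use studentdb
db.createCollection("trupti")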

B. Inserting documents
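A minimal sketch; the sample documents are illustrative, chosen so that the queries in Practicals 3 to 5 have matching data:

db.trupti.insertMany([
  { _id: 1, name: "Neha" },
  { _id: 2, name: "Mike" },
  { _id: 3, name: "Shah" }
])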


C. Updating Documents
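For example, setting a field on one document (the city field is illustrative):

db.trupti.updateOne({ _id: 1 }, { $set: { city: "Navsari" } })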

D. Deleting Documents
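For example, deleting one document by _id, or several by a filter:

db.trupti.deleteOne({ _id: 3 })
db.trupti.deleteMany({ city: "Navsari" })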


PRACTICAL: 3
AIM: Write the MongoDB queries for the given collection.
 Collection and Inserted Documents:

A. Find the documents where the name field has the value ‘Neha’.

B. Display the names of the students from the trupti collection.

C. Display the name and id of the student having id value 3.

D. Display the documents having student id in the range 1 to 2.
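Sketches of the corresponding mongo shell queries, assuming documents of the form { _id: <number>, name: <string> } as inserted in Practical 2:

A. db.trupti.find({ name: "Neha" })
B. db.trupti.find({}, { name: 1, _id: 0 })
C. db.trupti.find({ _id: 3 }, { name: 1 })
D. db.trupti.find({ _id: { $gte: 1, $lte: 2 } })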


PRACTICAL: 4
AIM: Write MongoDB queries for aggregate methods such as Count, Limit,
Sort etc.

A. Display documents in the ascending order of _id

B. Display documents in the descending order of name.


C. Display documents first in the ascending order of _id and then descending order of name.

D. Display all documents except the first two from the trupti collection.

E. Display the 2nd and 3rd documents from the trupti collection.


F. Display the total number of documents in the trupti collection.

G. Display the last two documents from the trupti collection.
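Sketches of the corresponding queries on the trupti collection:

A. db.trupti.find().sort({ _id: 1 })
B. db.trupti.find().sort({ name: -1 })
C. db.trupti.find().sort({ _id: 1, name: -1 })
D. db.trupti.find().skip(2)
E. db.trupti.find().skip(1).limit(2)
F. db.trupti.countDocuments({})
G. db.trupti.find().sort({ _id: -1 }).limit(2)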


PRACTICAL: 5
AIM: Write MongoDB queries similar to the LIKE predicate in SQL.

A. Find the ids of students whose name begins with the letter “N”.

B. Display all documents in which the student name ends with the letter ‘e’.

C. Find all documents in which the student name contains ‘h’ in any position.
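Sketches using regular expressions, MongoDB's counterpart of SQL's LIKE:

A. db.trupti.find({ name: /^N/ }, { _id: 1 })
B. db.trupti.find({ name: /e$/ })
C. db.trupti.find({ name: /h/ })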


PRACTICAL: 6
AIM: To demonstrate the installation and configuration of single-node Hadoop.
1. Download Hadoop binaries

The first step is to download Hadoop binaries from the official website.
The binary package size is about 342 MB.

After finishing the file download, we should unpack the package in two steps. First, we
extract the hadoop-3.2.1.tar.gz archive, and then we unpack the extracted tar file.

The tar file extraction may take some minutes to finish. At the end, you may see some
warnings about symbolic link creation. Just ignore these warnings, since they are not
relevant on Windows.

Since we are installing Hadoop 3.2.1, we should download the files located in
https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and copy them
into the “hadoop-3.2.1\bin” directory.


After unpacking the package, we should add the Hadoop native IO libraries, which
can be found in the following GitHub repository:
https://github.com/cdarlint/winutils.
2. Setting up environment variables.
After installing Hadoop and its prerequisites, we should configure the environment variables
to define Hadoop and Java default paths.

To edit environment variables, go to Control Panel > System and Security > System (or right-
click > properties on My Computer icon) and click on the “Advanced system settings” link.

There are two variables to define:

• JAVA_HOME: JDK installation folder path


• HADOOP_HOME: Hadoop installation folder path
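For example (the JDK path is illustrative; the Hadoop path matches the directories created in the next step):

JAVA_HOME = C:\Java\jdk1.8.0_221
HADOOP_HOME = C:\hadoop-env\hadoop-3.2.1

Also append %JAVA_HOME%\bin and %HADOOP_HOME%\bin to the Path variable.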


3. Configuring Hadoop cluster.


There are four files we should alter to configure the Hadoop cluster:

%HADOOP_HOME%\etc\hadoop\hdfs-site.xml
%HADOOP_HOME%\etc\hadoop\core-site.xml
%HADOOP_HOME%\etc\hadoop\mapred-site.xml
%HADOOP_HOME%\etc\hadoop\yarn-site.xml

As we know, Hadoop is built using a master-slave paradigm. Before altering the HDFS
configuration file, we should create a directory to store all master node (name node) data
and another to store data node data. In this example, we created the following directories:

• C:\hadoop-env\hadoop-3.2.1\data\dfs\namenode
• C:\hadoop-env\hadoop-3.2.1\data\dfs\datanode

Now we can edit our hdfs-site.xml file for further configuration. Open the file and edit it as
below:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///C:/hadoop-env/hadoop-3.2.1/data/dfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///C:/hadoop-env/hadoop-3.2.1/data/dfs/datanode</value>
</property>

Core site configuration (core-site.xml):

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9820</value>
</property>

Map Reduce site configuration (mapred-site.xml):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>MapReduce framework name</description>
</property>

Yarn site configuration (yarn-site.xml):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>Yarn Node Manager Aux Service</description>
</property>

Formatting the name node:

hdfs namenode -format

This command may give you some errors; those must be fixed first. If you have done
everything well, you will get a message like the one below:


Let's start the Hadoop services and check whether they are working.

Navigate to the “%HADOOP_HOME%\sbin” directory, then run the following command to
start the HDFS daemons:

.\start-dfs.cmd

Two command prompt windows will open (one for the name node and one for the data
node). Next, start the YARN daemons:

.\start-yarn.cmd


To make sure that all services started successfully, we can run the following command:

jps

14560 DataNode
4960 ResourceManager
5936 NameNode
768 NodeManager
14636 Jps

It will show the above services running, and that is all for the single-node Hadoop setup.


PRACTICAL: 7
AIM: To demonstrate the configuration of a multi-node Hadoop cluster.

Data, data and data. Across every sector, people are dealing with huge and colossal amounts
of data, also termed big data. Hadoop is a very well-known and widespread distributed
framework for big data processing. But when it comes to Hadoop installation, most of us feel
that it is quite a cumbersome job. This article will provide you some easy and quick steps for
a multi-node Hadoop cluster setup.

Multi-Node Cluster in Hadoop 3.x (3.1.3)

A multi-node cluster in Hadoop contains two or more data nodes in a distributed Hadoop
environment. This is used in organisations to store and analyse their massive amounts of data.
So knowing how to set up a multi-node Hadoop cluster is an important task.

Prerequisites

We will need the following software and hardware as prerequisites to perform the
activities:

● Ubuntu 18.04.3 LTS (Long Term Support)

● Hadoop-3.1.3

● Java 8

● SSH

● At least 2 laptops/desktops connected by LAN/Wi-Fi

Installation Steps

STEP: 1 Installation of Ubuntu/OS on the machines

This step is very self-explanatory: as a first step we need to install Ubuntu, or any other flavor
of Linux you have chosen, on both of the nodes (laptop/desktop, referred to as nodes from
here on). You can also install a lighter version of Ubuntu, Lubuntu (lightweight Ubuntu), if
you are using old hardware where you are having difficulty installing Ubuntu.


In my case I was using an old laptop of mine as the slave node; I had to install Lubuntu and
it worked without any issues.

Please create an admin user on both nodes, preferably with the same username.

STEP: 2 Configuring host names

Once the OS is installed, as a next step we should set the hostname for both nodes. In my
case I named the nodes as:

● masternode
● slave

Command:

sudo vi /etc/hostname

A reboot of the node is required after the hostname is updated.

* This step is optional if you have already put the hostnames during OS installation

STEP: 3 Configuring IP address in the hosts file of the nodes

Next, we need to add the IPs of the masternode and the slave node to the /etc/hosts file on
both nodes.

Command:

sudo vi /etc/hosts

Comment out all other entries you have in the hosts file on both nodes.

Command to see the IP of the node:

ip addr show
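With both IPs known, the /etc/hosts entries on the two nodes would look like this sketch (the slave IP is illustrative; the masternode IP is the one used later in this practical):

192.168.1.4   masternode
192.168.1.5   slave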


STEP: 4 Restart the sshd service on both nodes

Command:

service sshd restart

STEP: 5 Create the SSH Key in the master node and publish it in the slave node.

For this activity follow the below steps:

● Command to generate SSH key in masternode: ssh-keygen

● It will ask for the folder location where it will copy the keys; I entered
/home/username/.ssh/id_rsa

● It will ask for a passphrase; keep it empty for simplicity.

● Next, copy the newly generated public key to the authorized_keys file in your user's
home/.ssh directory. Command: cat $HOME/.ssh/id_rsa.pub >>
$HOME/.ssh/authorized_keys

● Next, execute ssh localhost to check if the key is working.

● Next, we need to publish the key to the slave node. Command:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub <username>@slave

● The first time, it will prompt you to enter the password and publish the key.

● Execute ssh <username>@slave again to check if you are able to log in without a
password. This is very important: without the public key working, the slave node
cannot be added to the cluster later.

STEP: 6 Download and install Java

Download and install OpenJDK 8 and set the JAVA_HOME path in the .bashrc
file of the user under which you are installing Hadoop.
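On Ubuntu, a typical way to do this is via the standard repositories:

sudo apt-get update
sudo apt-get install openjdk-8-jdk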

STEP: 7 Download the Hadoop 3.1.3 package on all nodes.

Log in to each node, then download and untar the Hadoop package.


wget http://apache.cs.utah.edu/hadoop/common/current/hadoop-3.1.3.tar.gz
tar -xzf hadoop-3.1.3.tar.gz

STEP: 8 Add the Hadoop and Java paths in the bash file (.bashrc) on all nodes.

Command: sudo vi .bashrc

Environment Variables to Set in .bashrc
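A sketch of typical entries, assuming the extracted hadoop-3.1.3 directory sits at ~/hadoop (matching the paths used in the later steps) and OpenJDK 8 is at its default Ubuntu location:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin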

STEP: 9 Set NameNode Location

Update your ~/hadoop/etc/hadoop/core-site.xml file to set the NameNode location to
masternode (the master hostname chosen in STEP 2) on port 9000:
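A minimal sketch (fs.defaultFS is the current name of this property in Hadoop 3.x):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://masternode:9000</value>
</property>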

STEP: 10 Set path for HDFS

Edit ~/hadoop/etc/hadoop/hdfs-site.xml to add the following for the masternode:
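A sketch for the masternode (the storage path and the replication factor of 2 are illustrative):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/username/data/nameNode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>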

For the data node, please put the following:
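Again a sketch; note that the data node sets dfs.datanode.data.dir instead of the name node property:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/username/data/dataNode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>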


Please note the difference between the configuration properties of the masternode and the
slave.

STEP: 11 Set YARN as Job Scheduler
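The job scheduler is set in ~/hadoop/etc/hadoop/mapred-site.xml. A minimal sketch, using the same framework setting as in the single-node setup:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>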


STEP: 12 Configure YARN

Edit ~/hadoop/etc/hadoop/yarn-site.xml, which contains the configuration options for YARN.
In the value field for yarn.resourcemanager.hostname, replace 192.168.1.4 with the IP
address of your masternode:
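A sketch of the relevant properties (the aux-services entry mirrors the single-node setup):

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>192.168.1.4</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>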

STEP: 13 Configure Workers

The workers file is used by the startup scripts to start the required daemons on all nodes.
Edit ~/hadoop/etc/hadoop/workers on the masternode to include the hostnames of both
nodes.
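With the hostnames chosen in STEP 2, the workers file contains:

masternode
slave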

STEP: 14 Update the JAVA_HOME in hadoop-env.sh

Edit ~/hadoop/etc/hadoop/hadoop-env.sh and update the value of JAVA_HOME with that
of your installation, on both nodes.
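For example, with the OpenJDK 8 package installed in STEP 6:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64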


STEP: 15 Format HDFS namenode
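Run the same format command as in the single-node setup, on the masternode:

hdfs namenode -format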


STEP: 16 Start and Stop HDFS

OK, so now you are almost there. The only thing left is starting the daemons. To start all
the daemons and bring up your Hadoop cluster, use the below command:

Command: start-all.sh

Once the command prompt is back, check the running daemons with the following
command:

Command: jps

This is what you will see on the masternode and on the slave node:
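A sketch of the expected daemons (process IDs will differ; the masternode also runs a DataNode and NodeManager because it is listed in the workers file):

masternode: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, Jps
slave: DataNode, NodeManager, Jps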


If you are not seeing the above daemons running, then something has gone wrong in your
configuration, so you need to check the previous steps again.
You can also open the NameNode web UI at the following URL, after replacing the IP with
that of your masternode:
http://192.168.1.4:9870/dfshealth.html#tab-overview


STEP: 17 Put and Get Data to HDFS


To start with, you have to create a user directory in your HDFS cluster. This user directory
should use the same username as the one under which you installed and are running the
cluster. Use the following command:

Command: hdfs dfs -mkdir /user/username

Once the user directory is created, you can use any of the hdfs dfs commands and start using
your HDFS cluster.
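For example (the file name is illustrative):

hdfs dfs -put sample.txt /user/username
hdfs dfs -ls /user/username
hdfs dfs -get /user/username/sample.txt sample_copy.txt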
