Enrollment No.: 211230107015
[ Big data analysis (3170722) ]
                                  PRACTICAL: 1
AIM: To demonstrate the installation and configuration of the MongoDB client and server.
STEP: 1 — Download the MongoDB MSI Installer Package
Download the current version of MongoDB. Make sure you select MSI as the
package.
STEP: 2 — Install MongoDB with the Installation Wizard
   A. Navigate to your downloads folder and double click on the .msi package you just
      downloaded. This will launch the installation wizard.
   B. Click Next to start installation.
   C. Accept the license agreement, then click Next.
   D. Select the Complete setup.
   E. Select “Run service as Network Service user” and make a note of the data directory;
      we’ll need it later.
   F. We won’t need MongoDB Compass, so deselect it and click Next.
   G. Click Install to begin installation.
   H. Hit Finish to complete installation.
STEP: 3 — Add MongoDB and the mongo shell to the Path environment variable.
   A. Open the Environment Variables dialog, edit the Path variable, and add the paths of the
      MongoDB server and the mongo shell.
   B. After that, open Command Prompt, check the MongoDB version, and run the
      MongoDB server.
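A quick way to verify the setup from Command Prompt; this is a sketch that assumes the default data directory C:\data\db and the classic mongo shell (newer installs ship mongosh instead):

   :: Check the installed server and shell versions
   mongod --version
   mongo --version

   :: Create the default data directory if it does not exist, then start the server
   md C:\data\db
   mongod

   :: In a second Command Prompt window, connect to the running server
   mongo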
                                   PRACTICAL: 2
AIM: Write the MongoDB queries for creating a database and collection,
inserting documents, updating documents, and deleting documents.
Code:
   A. Creating database and collection
   B. Inserting documents
   C. Updating Documents
   D. Deleting Documents
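The following mongo shell queries sketch steps A to D. The database name studentdb and the sample student names are assumptions; the collection name trupti and the _id values 1 to 3 follow the later practicals:

   // A. Switch to (and implicitly create) a database, then create a collection
   use studentdb
   db.createCollection("trupti")

   // B. Insert documents
   db.trupti.insertOne({ _id: 1, name: "Neha" })
   db.trupti.insertMany([
       { _id: 2, name: "Mahek" },
       { _id: 3, name: "Nishe" }
   ])

   // C. Update a document: change the name of the student with _id 2
   db.trupti.updateOne({ _id: 2 }, { $set: { name: "Mahi" } })

   // D. Delete documents
   db.trupti.deleteOne({ _id: 3 })        // delete one matching document
   db.trupti.deleteMany({ name: "Mahi" }) // delete every matching document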
                                      PRACTICAL: 3
AIM: Write the MongoDB queries for the given collection.
     Collection and Inserted Documents:
A. Find the document in which the name field has the value ‘Neha’.
B. Display the names of the students from the trupti collection.
C. Display the name and id of the student whose id is 3.
D. Display the documents with student ids from 1 to 2.
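A sketch of the four queries in the mongo shell, assuming documents with _id and name fields as inserted in Practical 2:

   // A. Find the document where name is 'Neha'
   db.trupti.find({ name: "Neha" })

   // B. Display only the name field for every student (suppress _id)
   db.trupti.find({}, { name: 1, _id: 0 })

   // C. Display the name and _id of the student whose _id is 3
   db.trupti.find({ _id: 3 }, { name: 1 })

   // D. Display documents whose _id lies in the range 1 to 2
   db.trupti.find({ _id: { $gte: 1, $lte: 2 } })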
                                   PRACTICAL: 4
AIM: Write MongoDB queries for aggregate methods such as count, limit,
sort, etc.
   A. Display documents in the ascending order of _id
   B. Display documents in the descending order of name.
   C. Display documents first in the ascending order of _id and then in the descending order of name.
   D. Display all documents except the first two from the trupti collection.
   E. Display the 2nd and 3rd documents from the trupti collection.
   F. Display total number of documents in trupti collection.
   G. Display last two documents from the trupti collection.
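A sketch of queries A to G in the mongo shell (countDocuments() assumes a reasonably recent shell; older shells use find().count() instead):

   // A. Ascending order of _id
   db.trupti.find().sort({ _id: 1 })

   // B. Descending order of name
   db.trupti.find().sort({ name: -1 })

   // C. Ascending _id, then descending name
   db.trupti.find().sort({ _id: 1, name: -1 })

   // D. All documents except the first two
   db.trupti.find().skip(2)

   // E. 2nd and 3rd documents: skip the first one, then take two
   db.trupti.find().skip(1).limit(2)

   // F. Total number of documents in the collection
   db.trupti.countDocuments()

   // G. Last two documents: sort descending by _id and take two
   db.trupti.find().sort({ _id: -1 }).limit(2)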
                                    PRACTICAL: 5
AIM: Write MongoDB queries similar to LIKE predicate in SQL.
   A. Find the ids of students whose name begins with the letter ‘N’.
   B. Display all documents in which the student name ends with the letter ‘e’.
   C. Find all documents in which the student name contains ‘h’ in any position.
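MongoDB expresses the SQL LIKE predicate with regular expressions; a sketch of the three queries:

   // A. ids of students whose name begins with 'N'   (LIKE 'N%')
   db.trupti.find({ name: /^N/ }, { _id: 1 })

   // B. Documents in which the name ends with 'e'    (LIKE '%e')
   db.trupti.find({ name: /e$/ })

   // C. Documents in which the name contains 'h'     (LIKE '%h%')
   db.trupti.find({ name: /h/ })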
                                     PRACTICAL: 6
AIM: To demonstrate the installation and configuration of single-node
Hadoop.
   1. Download Hadoop binaries
The first step is to download Hadoop binaries from the official website.
The binary package size is about 342 MB.
After finishing the file download, we should unpack the package using two steps. First, we
should extract the hadoop-3.2.1.tar.gz library, and then, we should unpack the extracted tar
file.
The tar file extraction may take some minutes to finish. In the end, you may see some
warnings about symbolic link creation; just ignore them, since they are not related to
Windows.
Since we are installing Hadoop 3.2.1, we should download the files located in
https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin and copy them
into the “hadoop-3.2.1\bin” directory.
After unpacking the package, we should add the Hadoop native IO libraries, which can be
found in the following GitHub repository: https://github.com/cdarlint/winutils.
   2. Setting up environment variables.
After installing Hadoop and its prerequisites, we should configure the environment variables
to define the Hadoop and Java default paths.
To edit environment variables, go to Control Panel > System and Security > System (or right-
click on the My Computer icon and select Properties) and click on the “Advanced system
settings” link.
There are two variables to define:
   •   JAVA_HOME: JDK installation folder path
   •   HADOOP_HOME: Hadoop installation folder path
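As an illustration, the same variables can be set from Command Prompt; the folder paths below are assumptions, so substitute your actual installation folders:

   :: Set JAVA_HOME and HADOOP_HOME (example paths)
   setx JAVA_HOME "C:\Java\jdk1.8.0_201"
   setx HADOOP_HOME "C:\hadoop-env\hadoop-3.2.1"

   :: Add the bin directories to the user PATH
   setx PATH "%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin"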
   3. Configuring Hadoop cluster.
There are four files we should alter to configure the Hadoop cluster:
   •   %HADOOP_HOME%\etc\hadoop\hdfs-site.xml
   •   %HADOOP_HOME%\etc\hadoop\core-site.xml
   •   %HADOOP_HOME%\etc\hadoop\mapred-site.xml
   •   %HADOOP_HOME%\etc\hadoop\yarn-site.xml
As we know, Hadoop is built using a master-slave paradigm. Before altering the HDFS
configuration file, we should create a directory to store all master node (name node) data and
another one to store data (data node). In this example, we created the following directories:
   •   C:\hadoop-env\hadoop-3.2.1\data\dfs\namenode
   •   C:\hadoop-env\hadoop-3.2.1\data\dfs\datanode
Now we can edit our hdfs-site.xml file for further configuration. Open the file and edit it as
below; note that the directory paths must match the directories created above:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoop-env/hadoop-3.2.1/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoop-env/hadoop-3.2.1/data/dfs/datanode</value>
</property>
Core site configuration
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9820</value>
</property>
Map Reduce site configuration:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>MapReduce framework name</description>
</property>
Yarn site configuration:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Yarn Node Manager Aux Service</description>
</property>
Formatting the name node:
hdfs namenode -format
This command may give you some errors; we must fix those before proceeding. If everything
is configured correctly, you will get a message like the one below:
Let’s start the Hadoop services and check whether they are working.
Just navigate to the “%HADOOP_HOME%\sbin” directory. Then we will run the
following command to start the HDFS nodes:
.\start-dfs.cmd
Two command prompt windows will open (one for the name node and one for the
data node). Next, run the following command to start the YARN services:
.\start-yarn.cmd
To make sure that all services started successfully, we can run the jps command:
jps
14560 DataNode
4960 ResourceManager
5936 NameNode
768 NodeManager
14636 Jps
If it shows the above services running, the single-node Hadoop setup is complete.
                                      PRACTICAL: 7
AIM: To demonstrate the configuration of a multi-node Hadoop
cluster.
Data, data and data. Across every sector, people are dealing with huge amounts of data,
also termed big data. Hadoop is a well-known and widespread distributed framework for
big data processing, but when it comes to Hadoop installation, most of us feel that it is
quite a cumbersome job. This article provides some easy and quick steps for a multi-node
Hadoop cluster setup.
Multi-Node Cluster in Hadoop 3.x (3.1.3)
A multi-node cluster in Hadoop contains two or more data nodes in a distributed Hadoop
environment. It is used by organisations to store and analyse their massive amounts of data,
so knowing how to set up a multi-node Hadoop cluster is an important skill.
Prerequisites
We will need the following software and hardware as prerequisites to perform the
activities:
           ● Ubuntu 18.04.3 LTS (Long Term Support)
           ● Hadoop-3.1.3
           ● JAVA 8
           ● SSH
           ● At least 2 laptops/desktops connected by LAN/Wi-Fi
Installation Steps
STEP: 1 Installation of Ubuntu/OS in the machines
This step is self-explanatory: as a first step, install Ubuntu (or any other flavor of Linux you
have chosen) on both the nodes (laptop/desktop, referred to as nodes from here on). You can
also install a lighter version of Ubuntu, Lubuntu (Lightweight Ubuntu), if you are using old
hardware where you are having difficulty installing Ubuntu.
In my case I was using an old laptop of mine as the slave node and I had to install Lubuntu
and it worked without any issues.
Please create an admin user on both the nodes, preferably with the same username.
STEP: 2 Configuring host names
Once the OS is installed, as a next step we should set the hostname for both the nodes. In my
case I named the nodes as:
                ●     masternode
                ●     slave
Command: sudo vi /etc/hostname
A reboot of the node is required after the hostname is updated.
* This step is optional if you have already put the hostnames during OS installation
STEP: 3 Configuring IP address in the hosts file of the nodes
Next, we need to add the IPs of masternode and slave node in the /etc/hosts file in both the
nodes.
Command: sudo vi /etc/hosts
Comment out all other entries you have in the hosts file in both the nodes.
Command to see the IP of the node:
ip addr show
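For example, the hosts file on both nodes could look like this (192.168.1.4 is the masternode IP used later in this practical; the slave IP is an assumption):

   # /etc/hosts on both nodes
   192.168.1.4   masternode
   192.168.1.5   slave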
STEP: 4 Restart the sshd service in both the nodes
Command: service sshd restart
STEP: 5 Create the SSH Key in the master node and publish it in the slave node.
For this activity follow the below steps:
           ● Command to generate SSH key in masternode: ssh-keygen
           ● It will ask for folder location where it will copy the keys, I entered
             /home/username/.ssh/id_rsa
           ● It will ask for pass phrase, keep it empty for simplicity.
           ● Next copy the newly generated public key to auth file in your users
             home/.ssh directory. Command: cat $HOME/.ssh/id_rsa.pub >>
             $HOME/.ssh/authorized_keys
           ● Next execute — ssh localhost to check if the key is working.
           ● Next, we need to publish the key to the slave node. Command: ssh-
             copy-id -i $HOME/.ssh/id_rsa.pub <username>@slave
           ● First time it will prompt you to enter the password and publish the
             key.
           ● Execute ssh <username>@slave again to check if you are able to
             log in without a password. This is very important: without the public
             key working, the slave node cannot be added to the cluster later.
STEP: 6 Download and install Java
Download and install OpenJDK 8 and set the JAVA_HOME path in the .bashrc
file of the user under which you are installing Hadoop.
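A sketch of this step on Ubuntu; the JAVA_HOME path below is the usual OpenJDK 8 location, but verify it on your system:

   # Install OpenJDK 8
   sudo apt-get update
   sudo apt-get install -y openjdk-8-jdk

   # Append to ~/.bashrc, then run: source ~/.bashrc
   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
   export PATH=$PATH:$JAVA_HOME/bin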
STEP: 7 Download the Hadoop 3.1.3 package in all nodes.
Login to each node and download and untar the Hadoop package.
wget http://apache.cs.utah.edu/hadoop/common/current/hadoop-3.1.3.tar.gz
tar -xzf hadoop-3.1.3.tar.gz
STEP: 8 Add the Hadoop and Java paths in the bash file (.bashrc) on all nodes.
  Command: sudo vi .bashrc
  Environment Variables to Set in .bashrc
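A sketch of the variables to add, assuming the tarball was extracted in the home directory (later steps refer to this folder as ~/hadoop, so rename or adjust accordingly):

   # Environment variables to set in ~/.bashrc on all nodes
   export HADOOP_HOME=$HOME/hadoop-3.1.3
   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
   export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin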
STEP: 9 Set NameNode Location
  Update your ~/hadoop/etc/hadoop/core-site.xml file to set the NameNode
  location to masternode on port 9000:
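A minimal sketch of the property to add (the property name matches the one used in Practical 6):

   <configuration>
       <property>
           <name>fs.default.name</name>
           <value>hdfs://masternode:9000</value>
       </property>
   </configuration>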
STEP: 10 Set path for HDFS
  Edit ~/hadoop/etc/hadoop/hdfs-site.xml to add the following for the masternode:
For the data node please put the following:
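A sketch of both variants; the storage directories are assumptions (create them beforehand), and dfs.replication is set to 2 for the two-node cluster:

   <!-- hdfs-site.xml on the masternode (name node) -->
   <configuration>
       <property>
           <name>dfs.namenode.name.dir</name>
           <value>/home/username/data/nameNode</value>
       </property>
       <property>
           <name>dfs.replication</name>
           <value>2</value>
       </property>
   </configuration>

   <!-- hdfs-site.xml on the slave (data node) -->
   <configuration>
       <property>
           <name>dfs.datanode.data.dir</name>
           <value>/home/username/data/dataNode</value>
       </property>
       <property>
           <name>dfs.replication</name>
           <value>2</value>
       </property>
   </configuration>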
Please note the difference between the configuration properties of masternode and
slave.
STEP: 11 Set YARN as Job Scheduler
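Edit ~/hadoop/etc/hadoop/mapred-site.xml to set YARN as the framework for MapReduce jobs; a minimal sketch:

   <configuration>
       <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
       </property>
   </configuration>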
STEP: 12 Configure YARN
Edit ~/hadoop/etc/hadoop/yarn-site.xml, which contains the configuration options for YARN.
In the value field for yarn.resourcemanager.hostname, replace 192.168.1.4 with the IP
address of your masternode:
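A sketch, assuming 192.168.1.4 is the masternode IP:

   <configuration>
       <property>
           <name>yarn.resourcemanager.hostname</name>
           <value>192.168.1.4</value>
       </property>
       <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
       </property>
   </configuration>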
STEP: 13 Configure Workers
The workers file is used by startup scripts to start the required daemons on all nodes.
Edit ~/hadoop/etc/hadoop/workers on the masternode to include the hostnames of both of the
nodes.
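With the hostnames used in this practical, the workers file would contain:

   masternode
   slave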
STEP: 14 Update the JAVA_HOME in hadoop-env.sh
Edit ~/hadoop/etc/hadoop/hadoop-env.sh and update the value of JAVA_HOME with
your installation path on both the nodes.
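For example (the path is the usual OpenJDK 8 location; verify yours):

   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64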
STEP: 15 Format HDFS namenode
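Run the following once on the masternode; as in Practical 6, it initialises the HDFS metadata and erases anything already stored in the name node directory:
Command: hdfs namenode -format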
STEP: 16 Start and Stop HDFS
OK, so now you are almost there. The only thing left is starting the daemons. To start all
the daemons and bring up your Hadoop cluster, use the command below:
Command: start-all.sh
Once the command prompt is back, check the running daemons with the following
command:
Command: jps
This is what you will see in the masternode:
This is what you will see in the slave node:
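Assuming the configuration above, the daemon lists typically look like this (process ids omitted and will differ):

   # masternode
   NameNode
   SecondaryNameNode
   ResourceManager
   Jps

   # slave node
   DataNode
   NodeManager
   Jps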
If you are not seeing the above daemons running, then something has gone wrong in your
configuration, so you need to check the previous steps again.
You can also open the HDFS web UI at the following URL after replacing the IP with that of
your masternode:
http://192.168.1.4:9870/dfshealth.html#tab-overview
STEP: 17 Put and Get Data to HDFS
To start with, you have to create the user directory in your HDFS cluster. This user directory
should have the same username as the one under which you installed and are running the
cluster. Use the following command:
Command: hdfs dfs -mkdir /user/username
Once the user directory is created, you can use any of the hdfs dfs commands and start using
your HDFS cluster.
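For example (file names are illustrative):

   # Create the user directory
   hdfs dfs -mkdir -p /user/username

   # Put a local file into HDFS, read it back, and list the directory
   hdfs dfs -put localfile.txt /user/username/
   hdfs dfs -cat /user/username/localfile.txt
   hdfs dfs -ls /user/username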