Faculty of Engineering & Technology
Big Data Analytics (203105348)
B.Tech CSE 4th Year 7th Semester
FACULTY OF ENGINEERING AND TECHNOLOGY
Big Data Analytics (203105348)
7th SEMESTER
7A9 (CSE)
Name: Aashutosh.S.yadav
Year/Sem: 4th
Enrolment No. 2203051057108
Course: B.Tech (CSE)
CERTIFICATE
This is to certify that
Mr./Ms..............................................................................................................
with enrolment no. ................................................................ has
successfully completed his/her laboratory experiments in the Big Data
Analytics (203105348) from the Department of
...................................................................................................
during the academic year ............................................
Date of Submission: -........................... Staff In charge: -...........................
INDEX
Sr. No. / Practical / Date of Start / Date of Completion
1. To understand the overall programming architecture using Map Reduce API. Start: 15-06-2024. Completion: 15-06-2024.
2. Write a program of Word Count in Map Reduce over HDFS.
3. Basic CRUD operations in MongoDB.
4. Store the basic information about students such as roll no, name, date of birth, and address of student using various collection types such as List, Set and Map.
5. Basic commands available for the Hadoop Distributed File System.
6. Basic commands available for HIVE Query Language.
7. Basic commands of HBASE Shell.
8. Creating the HDFS tables and loading them in Hive and learn joining of tables in Hive.
Practical:1
AIM: To understand the overall programming architecture using Map Reduce API.
A MapReduce task is divided into two main phases: the Map phase and the Reduce phase.
1. Python provides analogous functions: the built-ins map() and filter(), and reduce() in the functools module.
2. These functions are most commonly used with lambda functions.
1. map():
A map function executes the instructions or functionality provided to it on every item of an
iterable, which could be a list, tuple, set, etc. In Python 3 it returns a lazy iterator, which is
why the example wraps the result in list().
SYNTAX:
map(function, iterable)
EXAMPLE:
items = [1, 2, 3, 4, 5]
a = list(map(lambda x: x ** 3, items))
print(a)  # prints [1, 8, 27, 64, 125]
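As a further illustration of map(), the sketch below shows that it also accepts a named function and more than one iterable. The names cube, prices and quantities are hypothetical and chosen only for this example.

```python
# Hypothetical helper and sample data, used only for illustration.
def cube(x):
    return x ** 3

items = [1, 2, 3, 4, 5]

# map() returns a lazy iterator; list() forces evaluation.
cubes = list(map(cube, items))

# With two iterables, the function receives one item from each.
prices = [10, 20, 30]
quantities = [1, 2, 3]
totals = list(map(lambda p, q: p * q, prices, quantities))

print(cubes)   # [1, 8, 27, 64, 125]
print(totals)  # [10, 40, 90]
```

When the iterables have different lengths, map() stops at the shortest one.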
2. filter():
A filter function in Python tests each item of an iterable against a user-defined condition and
returns an iterator over the items that satisfy the condition, in other words, the items for
which the function returns True.
SYNTAX:
filter(function, iterable)
EXAMPLE:
a = [1, 2, 3, 4, 5]
b = [2, 5, 0, 7, 3]
c = list(filter(lambda x: x in a, b))
print(c)  # prints [2, 5, 3]
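For comparison, the same selection can be written as a list comprehension. The sketch below reuses the lists a and b from the example and adds a second, hypothetical predicate (keeping even numbers) purely for illustration.

```python
a = [1, 2, 3, 4, 5]
b = [2, 5, 0, 7, 3]

# Keep the items of b that also occur in a.
common = list(filter(lambda x: x in a, b))

# The same selection written as a list comprehension.
common_lc = [x for x in b if x in a]

# A second predicate: keep only the even numbers of a.
evens = list(filter(lambda x: x % 2 == 0, a))

print(common)     # [2, 5, 3]
print(common_lc)  # [2, 5, 3]
print(evens)      # [2, 4]
```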
3. reduce():
The reduce function applies a two-argument function cumulatively to the items of an iterable,
from left to right, and gives back a single value as the result.
We have to import the reduce function from the functools module, using the import statement
shown in the example.
SYNTAX:
reduce(function, iterable)
EXAMPLE:
from functools import reduce
a = reduce(lambda x, y: x * y, [1, 2, 3, 4])
print(a)  # prints 24
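The three functions can be chained to mimic the shape of a MapReduce job on a single machine. The following is a minimal sketch, not a distributed implementation. The sample list numbers and the optional initial value 0 passed to reduce() are assumptions made for the example.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Map step: square every number.
squares = map(lambda x: x * x, numbers)

# Filter step: keep only the even squares.
even_squares = filter(lambda x: x % 2 == 0, squares)

# Reduce step: add the remaining values. 0 is the initial value,
# so an empty input would give 0 instead of raising TypeError.
total = reduce(lambda acc, x: acc + x, even_squares, 0)

print(total)  # 4 + 16 + 36 = 56
```

Because map() and filter() are lazy, no intermediate list is built. Each value flows through all three steps as reduce() consumes it.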
Practical-2
Aim: Write a program of Word Count in Map Reduce over HDFS.
Description:
MapReduce is a framework for processing large datasets using a large number of computers
(nodes), collectively referred to as a cluster. Processing can occur on data stored in a file
system such as HDFS. It is a method for distributing computation across multiple nodes, in
which each node processes the data that is stored at that node.
It consists of two main phases:
Mapper phase
Reduce phase
[Figure: Input file -> Map tasks -> Reduce -> HDFS]
The input data set is split into independent blocks, which are processed in parallel. Each input
split is converted into key-value pairs. The mapper logic processes each key-value pair and
produces intermediate key-value pairs based on the implementation logic. The resulting key-value
pairs can be of a different type from the input key-value pairs. The output of the mapper is the
input for the reducer. The reducer sorts the intermediate key-value pairs, applies the reducer
logic to them, and produces the output in the desired format. The output is stored in HDFS.
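The flow described above can be sketched in plain Python on a single machine. This is only an in-memory illustration, not Hadoop code. The function names map_phase, shuffle_phase and reduce_phase and the sample splits are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    # Emit one (word, 1) pair per word in the input split.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Group the values by key and return the groups sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    # Sum the counts collected for one word.
    return (key, sum(values))

# Each string stands in for one input split.
splits = ["Deer Bear River", "Car Car River", "Deer Car Bear"]

intermediate = [pair for line in splits for pair in map_phase(line)]
grouped = shuffle_phase(intermediate)
result = [reduce_phase(key, values) for key, values in grouped]

print(result)
# [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

In a real cluster the map calls run on different nodes and the grouping step moves data across the network. Here all three steps run in one process.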
[Figure: The overall MapReduce word count process]
Python Code
import urllib.request

# URL of the text file whose words are counted
story = 'http://sixty-north.com/c/t.txt'

request = urllib.request.Request(story)
response = urllib.request.urlopen(request)

each_word = []   # every word in the file, in order
same_words = {}  # lowercased word -> number of occurrences

# Loop over the entire file and collect all the words into a list
for line in response:
    # Each line arrives as bytes, so decode it before splitting
    line_words = line.decode('utf-8').split()
    for word in line_words:
        each_word.append(word)

# Count the occurrences of each word, ignoring case
for word in each_word:
    key = word.lower()
    if key not in same_words:
        same_words[key] = 1
    else:
        same_words[key] += 1

for each in same_words:
    print("word = ", each, "count = ", same_words[each])
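The program above counts words in a single process. To relate it to MapReduce over HDFS, the sketch below splits the same logic into a mapper and a reducer in the style used by Hadoop Streaming, which exchanges tab-separated key/value lines. It is an assumption-laden illustration: the function names mapper and reducer and the sample lines are hypothetical, and sorted() stands in for the shuffle and sort that Hadoop performs between the two phases.

```python
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word in the input lines.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    # Input must already be sorted by key so that equal words are adjacent.
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Hypothetical sample input standing in for a file stored in HDFS.
sample = ["the quick brown fox", "the lazy dog", "The end"]

mapped = sorted(mapper(sample))  # sorted() replaces Hadoop's shuffle/sort
counts = list(reducer(mapped))

for line in counts:
    print(line)
```

In an actual streaming job the mapper and reducer would be two separate scripts reading from standard input and writing to standard output. The job submission command is not shown here.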
Output:-