Practical 1-2com

Faculty of Engineering & Technology

BIG-DATA ANALYSIS (203105348)
B.Tech CSE 4th Year 7th Semester

FACULTY OF ENGINEERING AND TECHNOLOGY

Big Data Analytics (203105348)

7th SEMESTER

7A9 (CSE)

Name: Aashutosh.S.yadav

Year/Sem: 4th

Enrolment No. 2203051057108

Course: B.Tech (CSE)


CERTIFICATE

This is to certify that

Mr./Ms..............................................................................................................
with enrolment no. ................................................................ has
successfully completed his/her laboratory experiments in the Big Data
Analytics (203105348) course from the Department of

...................................................................................................

during the academic year ............................................

Date of Submission: -........................... Staff In charge: -...........................


INDEX

SR NO | PRACTICAL LIST | DATE OF START | DATE OF COMPLETION | MARKS | SIGN
1 | To understand the overall programming architecture using the MapReduce API. | 15-06-2024 | 15-06-2024 | |
2 | Write a program of Word Count in MapReduce over HDFS. | | | |
3 | Basic CRUD operations in MongoDB. | | | |
4 | Store basic information about students, such as roll no., name, date of birth, and address, using various collection types such as List, Set and Map. | | | |
5 | Basic commands available for the Hadoop Distributed File System. | | | |
6 | Basic commands available for Hive Query Language. | | | |
7 | Basic commands of the HBase Shell. | | | |
8 | Create HDFS tables, load them into Hive, and learn joining of tables in Hive. | | | |


Practical:1
AIM: To understand the overall programming architecture using the MapReduce API.

A MapReduce task is divided into two main phases: the Map phase and the Reduce phase.
1. Python provides map(), filter(), and reduce() functions that follow the same functional idea on a single machine.
2. These functions are most commonly used with lambda functions.
1. map():
The map() function applies a given function to every item of an iterable, such as a list, tuple, or set. In Python 3 it returns a lazy iterator, so it is wrapped in list() to see the results.
SYNTAX:
map(function, iterable)
EXAMPLE:
items = [1, 2, 3, 4, 5]
a = list(map(lambda x: x ** 3, items))  # cube every item
print(a)  # prints [1, 8, 27, 64, 125]

2. filter():
The filter() function tests each element of an iterable against a user-defined condition and returns an iterator over the elements for which the condition is true.

SYNTAX:
filter(function, iterable)

EXAMPLE:
a = [1, 2, 3, 4, 5]
b = [2, 5, 0, 7, 3]
c = list(filter(lambda x: x in a, b))  # keep the elements of b that also appear in a
print(c)  # prints [2, 5, 3]

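3. reduce():
The reduce() function applies a two-argument function cumulatively to the items of an iterable, from left to right, and returns a single value as the result.
In Python 3, reduce() is not a built-in; it must be imported from the functools module with the import statement shown in the example.
SYNTAX:
reduce(function, iterable[, initializer])
EXAMPLE:
from functools import reduce
a = reduce(lambda x, y: x * y, [1, 2, 3, 4])  # computes ((1 * 2) * 3) * 4
print(a)  # prints 24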
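To connect these functions back to MapReduce, the sketch below counts words using map() and reduce(): map() turns each word into a (word, 1) pair, and reduce() folds all of the pairs into one dictionary of totals. It is a single-machine illustration only; the sample lines and the helper add_pair are hypothetical and not part of the original practical.

```python
from functools import reduce

# Hypothetical sample input: each string stands in for one line of a text file
lines = ["the cat sat", "the dog sat"]

# Map step: map() turns each word of each line into a (word, 1) pair
pairs = [pair for line in lines for pair in map(lambda w: (w, 1), line.split())]

# Reduce step: fold every pair into a running dict of word -> count
def add_pair(counts, pair):
    word, one = pair
    counts[word] = counts.get(word, 0) + one
    return counts

word_counts = reduce(add_pair, pairs, {})
print(word_counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The empty dict passed as the initializer gives reduce() a starting accumulator, so the first call receives a dict rather than two pairs.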


Practical-2

Aim: Write a program of Word Count in MapReduce over HDFS.

Description:
MapReduce is a framework for processing large datasets using a large number of computers (nodes), collectively referred to as a cluster. Processing can occur on data stored in a distributed file system such as HDFS. MapReduce distributes computation across multiple nodes, and each node processes the data that is stored locally at that node.
It consists of two main phases:
1. Map phase
2. Reduce phase

[Figure: Map tasks read the input (I/P) file; their output is passed to Reduce, which writes the result to HDFS]

The input data set is split into independent blocks (input splits) that are processed in parallel. Each input split is converted into key-value pairs. The mapper logic processes each key-value pair and produces intermediate key-value pairs based on the implementation logic; these intermediate pairs can be of a different type from the input key-value pairs. The output of the mapper is the input to the reducer. Before the reducer logic runs, the intermediate key-value pairs are sorted and grouped by key. The reducer then applies its logic to the grouped key-value pairs and produces the output in the desired format. The output is stored in HDFS.
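The flow described above (split, map, sort by key, reduce) can be simulated on a single machine. The sketch below is a minimal in-memory illustration, not Hadoop code: the function names mapper, reducer and run_word_count and the sample input are hypothetical, and the framework's shuffle and sort step is imitated with list.sort() followed by itertools.groupby().

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit an intermediate (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce phase: sum all the 1s emitted for one word."""
    return (word, sum(counts))

def run_word_count(input_splits):
    # Map phase: on a real cluster each split would be handled by a separate map task
    intermediate = [pair for split in input_splits for line in split for pair in mapper(line)]
    # Shuffle and sort: order the intermediate pairs by key so equal words are adjacent
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key
    return [reducer(word, (count for _, count in group))
            for word, group in groupby(intermediate, key=itemgetter(0))]

# Hypothetical input: two splits, each a list of lines
splits = [["Deer Bear River", "Car Car River"], ["Deer Car Bear"]]
print(run_word_count(splits))  # [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

Sorting before grouping matters: groupby() only merges adjacent equal keys, which is the same reason the real framework sorts intermediate pairs before handing them to the reducer.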

[Figure: The overall MapReduce word count process]

Python Code
import urllib.request

story = 'http://sixty-north.com/c/t.txt'

each_word = []   # every word in the file, in order
same_words = {}  # lower-cased word -> number of occurrences

# Loop over the entire file and collect all the words into a list
request = urllib.request.Request(story)
with urllib.request.urlopen(request) as response:
    for line in response:
        # urlopen yields bytes, so decode each line before splitting it into words
        line_words = line.decode('utf-8').split()
        for word in line_words:
            each_word.append(word)

# Count each word, ignoring case
for words in each_word:
    key = words.lower()
    if key not in same_words:
        same_words[key] = 1
    else:
        same_words[key] += 1

# Print every distinct word with its count
for each in same_words:
    print("word =", each, "count =", same_words[each])
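The script above runs on one machine and reads its input over HTTP rather than from HDFS. One common way to run the same word count as an actual MapReduce job over HDFS is Hadoop Streaming, which passes data through any executable via stdin and stdout. The sketch below assumes that approach; the file name wordcount_streaming.py and its map/reduce command-line modes are hypothetical choices, not part of the original practical.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (hypothetical file name: wordcount_streaming.py).

Run as 'wordcount_streaming.py map' for the mapper or
'wordcount_streaming.py reduce' for the reducer.
"""
import sys

def map_lines(lines):
    """Mapper: emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"

def reduce_lines(lines):
    """Reducer: input arrives sorted by key, so all counts for a word are contiguous."""
    current_word, current_count = None, 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, _, count = line.rstrip("\n").partition("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield f"{current_word}\t{current_count}"
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield f"{current_word}\t{current_count}"

# Only read stdin when a mode ("map" or "reduce") is given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

To test locally, the shell sort command can stand in for the shuffle step: cat input.txt | python3 wordcount_streaming.py map | sort | python3 wordcount_streaming.py reduce. On a cluster, a command of the form hadoop jar <path-to>/hadoop-streaming-*.jar -files wordcount_streaming.py -mapper "python3 wordcount_streaming.py map" -reducer "python3 wordcount_streaming.py reduce" -input <hdfs-input-dir> -output <hdfs-output-dir> is the usual pattern; the jar location depends on the Hadoop installation, and the HDFS output directory must not already exist.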


Output:-
