A
MINI PROJECT REPORT
on
HEALTH CARE ANALYTICS USING BIG DATA
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
Submitted by
CH. Likhil Kumar Goud
(197Y1A0521)
Y. Navadeep Reddy
(197Y1A0526)
Under the Guidance of
Mrs. K. Jaysri (Assistant Professor)
DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING
MARRI LAXMAN REDDY
INSTITUTE OF TECHNOLOGY AND MANAGEMENT
(AUTONOMOUS)
(Affiliated to JNTU-H, Approved by AICTE New Delhi and Accredited by
NBA & NAAC with ‘A’ Grade)
CERTIFICATE
This is to certify that the project report titled “Health Care Analytics using Big Data”, being submitted by CH. Likhil Kumar Goud (197Y1A0521) in IV B.Tech I Semester, Computer Science & Engineering, is a record of bonafide work carried out by him. The results embodied in this report have not been submitted to any other university for the award of any degree.

Internal Guide                                HOD

Principal                                     External Examiner
DECLARATION
We hereby declare that the Minor Project Report entitled “Health Care Analytics using Big Data”, submitted for the B.Tech degree, is entirely our own work, and all ideas and references have been duly acknowledged. It does not contain any work submitted for the award of any other degree.
Date:
CH. Likhil Kumar Goud
(197Y1A0521)
Y. Navadeep Reddy
(197Y1A0526)
ACKNOWLEDGEMENT
I am happy to express my deep sense of gratitude to the principal of the college, Dr. K. Venkateswara Reddy, Professor, Department of Computer Science and Engineering, Marri Laxman Reddy Institute of Technology & Management, for having provided me with adequate facilities to pursue my project.
I would like to thank Mr. Abdul Basith Khateeb, Associate Professor and Head, Department of Computer Science and Engineering, Marri Laxman Reddy Institute of Technology & Management, for having provided the freedom to use all the facilities available in the department, especially the laboratories and the library.
I am very grateful to my project guide, Mrs. K. Jaysri, Assistant Professor, Department of Computer Science and Engineering, Marri Laxman Reddy Institute of Technology & Management, for her extensive patience and guidance throughout my project work.
I sincerely thank my seniors and all the teaching and non-teaching staff of the Department of
Computer Science for their timely suggestions, healthy criticism and motivation during the
course of this work.
I would also like to thank my classmates for always being there whenever I needed help or
moral support. With great respect and obedience, I thank my parents and brother who were the
backbone behind my deeds.
Finally, I express my immense gratitude to the other individuals who have directly or indirectly contributed, at the right time, to the development and success of this work.
CONTENTS

Certificate
Declaration
Acknowledgement
Abstract

1. INTRODUCTION
   1.1 The 3 Vs of Big Data
   1.2 Ecosystem
       HDFS
       MapReduce
       Pig
       Hive
       Sqoop
       Impala
   1.3 Applications of Big Data
   1.4 Cloudera
   1.5 Hue
2. LITERATURE SURVEY
   2.1 Existing system
   2.2 Proposed system
3. REQUIREMENT ANALYSIS
   3.1 Hardware requirements
   3.2 Software requirements
4. IMPLEMENTATION
   4.1 Problem Definition
   4.2 System Architecture
       Get to the Source
       Ingestion Strategy and Acquisition
       Storage
       Data Processing
       Export Data Sets
       Reporting and Visualization
       Data Exploration
       Ad hoc Querying
5. METHODOLOGY
   5.1 How HDFS is used in our project
   5.2 How Hive is used
   5.3 How Cloudera is used
   5.4 How Hue is used
   5.5 How Sqoop is used
6. SCREENSHOTS
   To create a database
   To create a table
   To display fields
   Loading data into MySQL
   Importing data from MySQL to HDFS
   Compilation Time
LIST OF FIGURES
4.2 System Architecture
5.1 How HDFS is used in our project
LIST OF TABLES
6. Screenshots
ABSTRACT
In today's modern world, healthcare also needs to be modernized. This means that healthcare data should be properly analyzed so that it can be categorized into groups by gender, disease, city, symptoms, and treatment.

Big data is used to predict epidemics, cure diseases, improve quality of life, and avoid preventable deaths. With the world's population increasing and everyone living longer, models of treatment delivery are rapidly changing, and many of the decisions behind those changes are being driven by data.

The drive now is to understand as much about a patient as possible, as early in their life as possible, hopefully picking up warning signs of serious illness at an early enough stage that treatment is far simpler and less expensive than if it had not been spotted until later. Analytics of this gigantic size needs large-scale computation, which can be done with the help of the distributed processing framework Hadoop.

The framework used will provide multipurpose, beneficial outputs, which include presenting the healthcare data analysis in various forms. The groups made by the system would be symptom-wise, age-wise, gender-wise, season-wise, disease-wise, and so on. As the system displays the data group-wise, it is helpful for getting a clear idea about diseases and their rates of spread, so that appropriate treatment can be given at the proper time.
1. INTRODUCTION

1.1 The 3 Vs of Big Data:
The 3 Vs that define Big Data are Variety, Velocity, and Volume.
Volume
We currently see exponential growth in data storage, as data is now much more than text. We can find data in the format of videos, music, and large images on our social media channels. It is very common for enterprises to have storage systems of terabytes and even petabytes. As the data grows, the applications and architecture built to support it need to be re-evaluated quite often. Sometimes the same data is re-evaluated from multiple angles, and even though the original data is the same, the newly found intelligence creates an explosion of data. This big volume indeed represents Big Data.
Velocity
Data growth and the social media explosion have changed how we look at data. There was a time when we used to believe that yesterday's data was recent; as a matter of fact, newspapers still follow that logic. However, news channels and radio have changed how fast we receive news. Today, people rely on social media to keep them updated with the latest happenings. They often discard old messages and pay attention to recent updates. Data movement is now almost real time, and the update window has reduced to fractions of a second. This high-velocity data represents Big Data.
Variety
Data can be stored in multiple formats: for example, in a database, Excel, CSV, or Access, or, for that matter, in a simple text file. Sometimes the data is not even in a traditional format as we assume; it may be in the form of video, SMS, PDF, or something we have not thought about yet. It is the organization's need to arrange the data and make it meaningful. This would be easy if all the data were in the same format, but that is not the case most of the time. The real world has data in many different formats, and that is the challenge we need to overcome with Big Data. This variety of data represents Big Data.
1.2 Ecosystem

HDFS:
HDFS is built to support applications with large data sets, including individual files that reach into the terabytes. It uses a master/slave architecture, with each cluster consisting of a single NameNode that manages file system operations and supporting DataNodes that manage data storage on individual compute nodes.
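A minimal sketch of inspecting this architecture from the command line (the path below is an assumption for illustration):

    # List the DataNodes registered with the NameNode and basic capacity figures
    hdfs dfsadmin -report

    # Check the health and replication of the blocks under a directory
    hdfs fsck /user/cloudera -files -blocks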
MapReduce:
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. The MapReduce model consists of two separate routines, namely the Map function and the Reduce function. The computation on an input in the MapReduce model occurs in three stages:

In the map stage, the mapper takes a single (key, value) pair as input and produces any number of (key, value) pairs as output.

The shuffle stage is handled automatically by the MapReduce framework. The underlying system implementing MapReduce routes all of the values that are associated with an individual key to the same reducer.

In the reduce stage, the reducer takes all of the values associated with a single key k and outputs any number of (key, value) pairs.
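As a small, hedged illustration of the three stages, Hadoop Streaming lets ordinary shell commands play the mapper and reducer roles. The sketch below counts records per disease, assuming a tab-separated data set whose third column is the disease (the paths and column position are assumptions for illustration):

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -input  /user/cloudera/health_data \
        -output /user/cloudera/disease_counts \
        -mapper 'cut -f 3' \
        -reducer 'uniq -c'
    # Map stage:     'cut -f 3' emits the disease column as the key.
    # Shuffle stage: the framework sorts and groups identical keys together.
    # Reduce stage:  'uniq -c' counts the now-adjacent identical keys.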
Pig:
Pig is a high-level platform for creating programs that run
on Apache Hadoop. The language for this platform is called Pig
Latin. Pig Latin abstracts the programming from
the Java MapReduce idiom into a notation which makes
MapReduce programming high level, similar to that
of SQL for RDBMS.
Hive:
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and analysis easy. Hive gives an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop. Without it, traditional SQL queries would have to be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data.
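A hedged sketch of what this SQL-like interface looks like in practice (the table name, columns, and HDFS path are assumptions for illustration):

    -- Define a table over comma-delimited files already in HDFS
    CREATE EXTERNAL TABLE patients (
        id INT, name STRING, gender STRING, city STRING, disease STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/cloudera/health_data';

    -- An ordinary SQL-style query; Hive compiles it into MapReduce jobs
    SELECT disease, COUNT(*) FROM patients GROUP BY disease;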
Sqoop:
Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It is a big data tool that offers the capability to extract data from non-Hadoop data stores, transform the data into a form usable by Hadoop, and then load the data into HDFS. This process is called ETL, for Extract, Transform, and Load.
Impala:
Cloudera Impala is Cloudera's open source massively
parallel processing (MPP) SQL query engine for data stored in
a computer cluster running Apache Hadoop. Impala brings
scalable parallel database technology to Hadoop, enabling users
to issue low-latency SQL queries to data stored
in HDFS and Apache HBase without requiring data movement
or transformation. Impala is integrated with Hadoop to use the
same file and data formats, metadata, security and resource
management frameworks used by MapReduce, Apache
Hive, Apache Pig and other Hadoop software.
1.3 Applications of Big Data:
Healthcare contributions
Banking sectors and fraud detection
The private sector uses big data in traffic management, route planning, intelligent transportation systems, and congestion management.
The private sector also uses big data in revenue management, manufacturing improvements, logistics, and for competitive advantage.
1.4 Cloudera:
Cloudera's open-source Apache Hadoop distribution, CDH
(Cloudera Distribution Including Apache Hadoop), targets
enterprise-class deployments of that technology. Cloudera says
that more than 50% of its engineering output is donated
upstream to the various Apache-licensed open source projects
(Apache Hive, Apache Avro, Apache HBase, and so on) that
combine to form the Hadoop platform.
1.5 Hue:
Hue is an open-source web interface for analyzing data with Apache Hadoop. Hue allows technical and non-technical users to take advantage of Hive, Pig, and many of the other tools that are part of the Hadoop ecosystem.
You can load your data, run interactive Hive queries, develop and run Pig scripts, work with HDFS, check on the status of your jobs, and more. Hue's File Browser allows you to browse Amazon Simple Storage Service (S3) buckets, and you can use the Hive editor to run queries against data stored in S3.
2. LITERATURE SURVEY
2.1 Existing system:
The existing systems are built using an RDBMS, which stores data in the form of tables. An RDBMS allows only structured data to be stored.

When users want basic information about a disease, they have to contact the concerned hospital, and if they want an appointment they have to go to the hospital in person to fix it. If a user is unable to reach the hospital at the particular time, he or she is unable to take an appointment instantly.
2.2 Proposed system:
The proposed system will group together the disease and symptom data and analyze it to provide cumulative information. After the analysis, algorithms can be applied to the result, and groupings can be made to show a clear picture of the analysis.
3. REQUIREMENT ANALYSIS
3.1 Hardware requirements
Processor
16 GB Memory
4 TB Disk
3.2 Software requirements
VMware
Linux OS
4. IMPLEMENTATION

4.1 Problem Definition:
Health care analytics using big data and Hadoop.
4.2 System Architecture

Get to the Source!
Source profiling is one of the most important steps in deciding the architecture. It involves identifying the different source systems and categorizing them based on their nature and type.

Points to be considered while profiling the data sources:
Identify the internal and external source systems
Make a high-level estimate of the amount of data ingested from each source
Identify the mechanism used to get the data: push or pull
Determine the type of data source: database, file, web service, streams, etc.
Determine the type of data: structured, semi-structured, or unstructured
Ingestion Strategy and Acquisition
Data ingestion in the Hadoop world means ELT (Extract, Load and Transform), as opposed to ETL (Extract, Transform and Load) in traditional warehouses.

Points to be considered:
Determine the frequency at which data would be ingested from each source
Is there a need to change the semantics of the data (append, replace, etc.)?
Is there any data validation or transformation required before ingestion (pre-processing)?
Segregate the data sources based on the mode of ingestion: batch or real-time
Storage:
The Hadoop Distributed File System is the most commonly used storage framework in the Big Data world; others are NoSQL data stores such as MongoDB, HBase, and Cassandra. One of the salient features of Hadoop storage is its capability to scale, self-manage, and self-heal.

Things to consider while planning the storage methodology (a small sketch follows this list):
Type of data (historical or incremental)
Format of data (structured, semi-structured, or unstructured)
Compression requirements
Frequency of incoming data
Query pattern on the data
Consumers of the data
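As a hedged sketch of how several of these considerations translate into a storage definition, the HiveQL below declares a compressed, columnar, date-partitioned table (the table, column, and partition names are assumptions for illustration):

    -- ORC is a columnar format with built-in compression; partitioning by
    -- admission date suits incremental loads and date-filtered query patterns
    CREATE TABLE health_records (
        patient_id INT, gender STRING, city STRING, disease STRING
    )
    PARTITIONED BY (admit_date STRING)
    STORED AS ORC;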
Data processing:
Earlier, frequently accessed data was stored in dynamic RAM, but now, due to the sheer volume, it is stored on multiple disks on a number of machines connected via the network. Instead of bringing the data to the processing, in the new way the processing is taken closer to the data, which significantly reduces network I/O. The processing methodology is driven by business requirements and can be categorized into batch, real-time, or hybrid based on the SLA.

Batch processing – Batch processing collects the input for a specified interval of time and runs transformations on it in a scheduled way. A historical data load is a typical batch operation.
Technology used: MapReduce, Hive, Pig

Real-time processing – Real-time processing involves running transformations as and when data is acquired.
Technology used: Impala, Spark, Spark SQL

Hybrid processing – A combination of both batch and real-time processing needs.
Data consumption:
Different users, such as administrators, business users, vendors, and partners, can consume the data in different formats. The output of the analysis can be consumed by a recommendation engine, or business processes can be triggered based on the analysis.

Different forms of data consumption are:
Export data sets – There can be requirements for third-party data set generation. Data sets can be generated using a Hive export or directly from HDFS.
Reporting and visualization – Different reporting and visualization tools can connect to Hadoop using JDBC/ODBC connectivity to Hive.
Data exploration – Data scientists can build models and perform deep exploration in a sandbox environment. The sandbox can be a separate cluster (the recommended approach) or a separate schema within the same cluster that contains a subset of the actual data.
Ad hoc querying – Ad hoc or interactive querying can be supported using Hive, Impala, or Spark SQL, as in the sketch below.
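A hedged sketch of an interactive query issued through Impala's shell (the host, table, and column names are assumptions for illustration):

    # -i names the impalad host to connect to; -q runs a single query and exits
    impala-shell -i localhost -q 'SELECT city, COUNT(*) FROM health_records GROUP BY city;'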
5. METHODOLOGY
5.1 How HDFS is used in our project:
HDFS holds a very large amount of data and provides easy access to it. To store such huge data, the files are stored across multiple machines. These files are stored in a redundant fashion to rescue the system from possible data loss in case of failure.

HDFS also makes the data available for parallel processing. HDFS mainly consists of two kinds of nodes:
NameNode
DataNode
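A hedged sketch of the HDFS commands this involves (the local file name and HDFS paths are assumptions for illustration):

    # Create a project directory in HDFS and copy the data set into it
    hdfs dfs -mkdir -p /user/cloudera/health_data
    hdfs dfs -put healthcare.csv /user/cloudera/health_data/

    # Verify the upload
    hdfs dfs -ls /user/cloudera/health_data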
5.2 How Hive is used:
Hive gives an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop.
Hive supports easy portability of SQL-based applications to Hadoop.
SQL statements are broken down by the Hive service into MapReduce jobs and executed across a Hadoop cluster.
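As a hedged sketch, the group-wise analysis described in the abstract could be expressed in HiveQL as follows (the table and column names are assumptions carried over from the earlier sketches):

    -- Count patients per disease and gender; Hive turns this into MapReduce jobs
    SELECT disease, gender, COUNT(*) AS patient_count
    FROM patients
    GROUP BY disease, gender;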
5.3 How Cloudera is used:

5.4 How Hue is used:

5.5 How Sqoop is used:
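A hedged sketch of the kind of Sqoop import described in the screenshots section, moving a table from MySQL into HDFS (the connection string, credentials, and table name are assumptions for illustration):

    sqoop import \
        --connect jdbc:mysql://localhost/healthcare \
        --username cloudera --password cloudera \
        --table patients \
        --target-dir /user/cloudera/health_data/patients \
        -m 1    # use a single mapper, suitable for a small table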
6. SCREENSHOTS
To create a database
To create a table
To display fields
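A hedged sketch of the MySQL statements the first three screenshots illustrate (the database, table, and column names are assumptions for illustration):

    CREATE DATABASE healthcare;
    USE healthcare;
    CREATE TABLE patients (
        id INT, name VARCHAR(50), gender VARCHAR(10),
        city VARCHAR(50), disease VARCHAR(50)
    );
    -- Display the table's fields
    DESCRIBE patients;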
Loading data into MySQL
Importing data from MySQL to HDFS
COMPILATION TIME