Unit 1
BIG DATA ANALYTICS
• Big data is data so large and complex that none of the traditional data
management tools can store or process it efficiently.
• Big data analytics refers to the systematic processing and analysis of large
amounts of data and complex data sets, known as big data, to extract valuable
insights.
• Big data analytics allows for the uncovering of trends, patterns and correlations
in large amounts of raw data to help analysts make data-informed decisions.
• This process allows organizations to leverage the exponentially growing data
generated from diverse sources, including internet-of-things (IoT) sensors, social
media, financial transactions and smart devices to derive actionable intelligence
through advanced analytic techniques.
IMPORTANCE OF BIG DATA
• The importance of big data does not revolve around how much data a company has,
but around how the company uses the data it collects.
• Every company uses data in its own way; the more efficiently a company uses its
data, the more potential it has to grow.
• A company can take data from any source and analyse it to find answers that
enable:
• Cost savings
• Time reductions
• Understanding market conditions
• Controlling online reputation
BIG DATA CAN BE USED IN
TYPES OF DIGITAL DATA
• Structured Data
• Semi-Structured Data
• Unstructured Data
Gartner estimates that 80% of data generated in any enterprise today is
unstructured data. Roughly 10% of data is in the structured and semi-structured
category.
• Structured Data
• Data that is in an organized form (rows and columns) and can be easily used
by a computer program.
• Conforms to a data model.
• E.g., data stored in databases.
• Sources of Structured Data
• Databases such as Oracle, MySQL, DB2, Teradata, …
• Spreadsheets
• OLTP systems
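To make "easily used by a computer program" concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `sales` table and its columns are hypothetical; the point is that a fixed schema lets a declarative query aggregate the data directly.

```python
import sqlite3

# Structured data: every row follows the same schema (the data model).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Laptop", 1200.0), ("South", "Phone", 650.0), ("North", "Phone", 700.0)],
)

# Because the columns are known in advance, SQL can group and sum them directly.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 1900.0), ('South', 650.0)]
conn.close()
```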
• Semi-Structured Data
• Data that does not conform to a formal data model but still has some structure,
such as tags or keys.
• It is not in a form that can be easily used by a computer program.
• E.g., XML, HTML, e-mails, …
• Sources of Semi-structured Data
• XML (eXtensible Markup Language)
• JSON (JavaScript Object Notation)
• Used to transmit data between a server and a web application.
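A small illustration of why JSON counts as semi-structured: each record carries its own keys, but records need not share the same fields. This sketch uses Python's standard `json` module; the payload and its field names are hypothetical.

```python
import json

# Semi-structured data: keys act as tags, but fields vary between records
# and there is no fixed table schema.
payload = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0101", "555-0102"]}
]
"""

records = json.loads(payload)
for record in records:
    # .get() supplies a default for fields that some records do not have.
    email = record.get("email", "<none>")
    phones = record.get("phones", [])
    print(record["id"], record["name"], email, len(phones))
```

A program can still process this data, but it has to handle optional or nested fields explicitly instead of relying on a fixed set of columns.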
• Unstructured Data
• Data that does not conform to a data model or is not in a form that can be
used easily by a computer program.
• E.g., memos, images, audio, video, letters, etc.
• Sources of Unstructured Data
• Web pages
• Images
• Audio
• Videos
• Body of e-mail
• Text messages
• Chat
• Social media data
• Word documents
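Because unstructured text has no data model, any structure must be extracted by the program itself. The sketch below uses Python's `re` module to pull dates and e-mail addresses out of a hypothetical memo; the patterns are deliberately simple and would need hardening for real-world text.

```python
import re

# Unstructured data: free text with no schema. Dates and e-mail addresses
# have to be located by pattern matching before they can be analysed.
memo = (
    "Hi team, the review meeting moved to 2024-03-15. "
    "Send slides to priya@example.com or raj@example.org by 2024-03-12."
)

# Simple illustrative patterns (ISO-style dates, basic e-mail shape).
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", memo)
emails = re.findall(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", memo)

print(dates)   # ['2024-03-15', '2024-03-12']
print(emails)  # ['priya@example.com', 'raj@example.org']
```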
TYPES OF DIGITAL DATA
• Weather forecasting
• NoSQL
THE MODEL OF GENERATING/CONSUMING DATA HAS CHANGED
Old Model: A few companies generate data; all others consume it.
DATA SIZE
HOW TO DEAL WITH UNSTRUCTURED DATA?
• Data Mining
• Process of discovering knowledge hidden in large volumes of data.
• Text Analytics (or) Text Mining
• Process of gleaning high-quality, meaningful information from text.
• Includes tasks such as text categorization, text clustering, sentiment analysis, …
• Natural Language Processing (NLP)
• Related to the area of human-computer interaction.
• Enables computers to understand human (natural) language input.
• Noisy Text Analytics
• Process of extracting structured (or) semi-structured information from noisy
unstructured data such as chats, blogs, emails, text messages, etc.
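Sentiment analysis and noisy text analytics can be illustrated with a very small lexicon-based classifier. This is a sketch under stated assumptions: the positive/negative word lists and the abbreviation map are hypothetical, and production systems typically use trained models rather than hand-made lexicons.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "hate", "broken", "poor"}
# Noisy-text normalisation: expand a few chat abbreviations (hypothetical mapping).
ABBREVIATIONS = {"gr8": "great", "luv": "love"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into word tokens and expand abbreviations."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def sentiment(text: str) -> str:
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for msg in ["Luv this phone, gr8 battery!!",
            "Delivery was slow and the box was broken",
            "Order #123 arrived"]:
    print(sentiment(msg), "|", msg)
```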
DATA GROWS
• “Every day, we create 2.5 quintillion bytes of data — so much that 90%
of the data in the world today has been created in the last few years
alone.”
• How can we manage very large amounts of data and extract value and knowledge
from them?
WHERE DO WE GET DATA
• Definition of big data: a collection of datasets so large and complex that
they cannot be processed by traditional data processing applications.
• Big data comprises both structured and unstructured data that grow so large,
so fast, that they are not manageable by traditional RDBMS tools or conventional
statistical tools.
BIG DATA
THE Vs OF BIG DATA
• Volume
• Bits -> Bytes -> Kilobytes -> Megabytes -> Gigabytes -> Terabytes -> Petabytes
-> Exabytes -> Zettabytes -> Yottabytes
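In the unit ladder above, 8 bits make 1 byte, and with decimal (SI) prefixes each further step is a factor of 1,000 (binary prefixes such as KiB and MiB use 1,024 instead). The sketch below, which assumes decimal units, converts a raw byte count into the largest fitting unit; applied to the "2.5 quintillion bytes" per day quoted on the Data Grows slide, it gives 2.5 exabytes.

```python
# Decimal (SI) units: each step up the ladder is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    # Stop at the last unit (YB) even if the value is still >= 1000.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.2f} {UNITS[unit_index]}"


print(human_readable(2.5e18))     # 2.50 EB  (2.5 quintillion bytes)
print(human_readable(1_000_000))  # 1.00 MB
```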
• Velocity
• Refers to the increasing speed at which this data is created, and the increasing speed at
which the data can be processed, stored and analysed by relational databases.
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions mean missed opportunities
• Examples
• E-Promotions: based on your current location, your purchase history and what you like,
send promotions right now for the store next to you.
• Healthcare monitoring: sensors monitor your activities and body, and any abnormal
measurement requires an immediate reaction.
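The healthcare-monitoring example can be sketched as stream processing: each reading is checked the moment it arrives against a short sliding window of recent values, instead of waiting for a batch job. The window size and jump threshold below are hypothetical illustration values, not clinical thresholds.

```python
from collections import deque
from statistics import mean


def monitor(readings, window=5, max_jump=25):
    """Yield (index, value) for readings far from the recent average.

    Only the last `window` values are kept in memory, so the function can
    run over an unbounded stream. `window` and `max_jump` are assumed values.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window and abs(value - mean(recent)) > max_jump:
            yield i, value  # react immediately to the abnormal measurement
        recent.append(value)


heart_rate = [72, 74, 71, 73, 75, 74, 128, 76, 73]
alerts = list(monitor(heart_rate))
print(alerts)  # [(6, 128)]
```

For simplicity the abnormal reading is still added to the window; a more careful design might exclude flagged values so they do not distort the baseline.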
SUMMARY OF THE Vs
• Veracity
• Refers to biases, noise and abnormality in data.
• Validity
• Refers to the accuracy and correctness of data.
• Volatility
• Deals with how long the data remains valid and how long it should be
stored.
• Variability
• Data whose meaning is constantly changing.
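Veracity and validity problems are often caught with simple rule-based checks before analysis. The sketch below validates hypothetical temperature-sensor records; the field names and the plausible range are assumptions made for illustration.

```python
# Assumed plausible range for outdoor temperatures, in degrees Celsius.
VALID_RANGE = (-40.0, 60.0)


def check_record(record: dict) -> list[str]:
    """Return the problems found in one record (an empty list means it looks valid)."""
    problems = []
    temp = record.get("temp_c")
    if temp is None:
        problems.append("missing temp_c")
    elif not VALID_RANGE[0] <= temp <= VALID_RANGE[1]:
        problems.append(f"temp_c out of range: {temp}")  # noise or abnormality
    if not record.get("sensor_id"):
        problems.append("missing sensor_id")
    return problems


records = [
    {"sensor_id": "S1", "temp_c": 21.5},
    {"sensor_id": "S2", "temp_c": 999.0},
    {"sensor_id": "", "temp_c": 19.0},
    {"sensor_id": "S4"},
]
for r in records:
    print(r, "->", check_record(r) or "ok")
```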
WHO IS GENERATING BIG DATA
BIG DATA ANALYTICS
“Big Data Analytics is the process of examining big data to uncover patterns,
unearth trends and find unknown correlations and other useful information to
make faster and better decisions”
“Process of collecting, organizing and analyzing large sets of data (big data)
to discover patterns and other useful information”
CLASSIFICATION OF ANALYTICS
FIRST SCHOOL OF THOUGHT
• Basic Analytics
• Slicing and dicing of data to help with basic business insights.
• Reporting on historical data, basic visualization, etc.
• Operationalized analytics
• Analytics that is woven into the enterprise’s business processes.
• Advanced analytics
• Forecasting for the future by way of predictive and prescriptive modeling.
• Monetized analytics
• Analytics used to derive direct business revenue.
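Basic analytics such as slicing and dicing can be sketched in plain Python over a few hypothetical sales records. In OLAP terms, a slice fixes one dimension to a single value, while a dice selects a sub-cube by restricting two or more dimensions; the report at the end is a simple historical summary.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, product, amount)
sales = [
    ("North", "Q1", "Laptop", 1200),
    ("North", "Q1", "Phone", 700),
    ("North", "Q2", "Laptop", 900),
    ("South", "Q1", "Phone", 650),
    ("South", "Q2", "Phone", 800),
]

# Slice: fix one dimension to a single value (quarter = Q1) and total it.
q1_total = sum(amount for _region, quarter, _product, amount in sales if quarter == "Q1")

# Dice: restrict several dimensions at once (region = North, product = Laptop).
north_laptops = [row for row in sales if row[0] == "North" and row[2] == "Laptop"]

# Basic report on historical data: total per (region, quarter).
totals = defaultdict(int)
for region, quarter, _product, amount in sales:
    totals[(region, quarter)] += amount

print("Q1 total:", q1_total)
print("North laptop rows:", north_laptops)
for key in sorted(totals):
    print(key, totals[key])
```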
SECOND SCHOOL OF THOUGHT
• Analytics 1.0
• Mid-1950s to 2009
• Descriptive statistics (and Diagnostic)
• Report on events, occurrences, etc., of the past.
• What happened?
• Why did it happen?
• Analytics 2.0
• 2005 to 2012
• Descriptive statistics + Predictive statistics
• Use data from the past to make predictions for the future
• What will happen?
• Why will it happen?
• Analytics 3.0
• 2012 to present
• Descriptive + Predictive + Prescriptive statistics
• Use data from the past to make predictions for the future and, at the same
time, make recommendations to leverage the situation to one’s advantage.
• What will happen?
• When will it happen?
• Why will it happen?
• What should be the action taken to take advantage of what will happen?
ANALYTICS 1.0, 2.0, 3.0
• Descriptive Analytics
• which use data aggregation and data mining to provide insight into the past and
answer: “What has happened?”
• Insight into the past
• Use Descriptive Analytics when you need to understand at an aggregate level what is
going on in your company, and when you want to summarize and describe different
aspects of your business.
• Predictive Analytics
• which use statistical models and forecasting techniques to understand the future
and answer: “What could happen?”
• Understanding the future
• Use Predictive Analytics any time you need to know something about the future, or
fill in the information that you do not have.
• Prescriptive Analytics
• which use optimization and simulation algorithms to advise on possible outcomes
and answer: “What should we do?”
• Advise on possible outcomes
• Use Prescriptive Analytics any time you need to provide users with advice on what
action to take.
TRADITIONAL BI VS. BIG DATA
READING DATA WITH A SINGLE MACHINE
PARALLEL PROCESSING
BIG DATA CHALLENGES
THE EVOLUTION OF DATA MANAGEMENT
DATA MANAGEMENT-WAVE1
Creating manageable data structures
• The relational model added a level of abstraction (the structured query
language [SQL], report generators, and data management tools) so that it
was easier for programmers to satisfy the growing business demands to extract
value from data.
• The relational model offered an ecosystem of tools from a large number of
emerging software companies.
• Problem:
• Storing this growing volume of data was expensive and accessing it was slow.
• Lots of data duplication existed.
• The actual business value of that data was hard to measure.
• Solution:
• When the volume of data that organizations needed to manage grew out of
control, the data warehouse provided a solution.
• The data warehouse enabled the IT organization to select a subset of the data
being stored so that it would be easier for the business to try to gain insights.
• Data warehouses and data marts solved many problems for companies needing a
consistent way to manage massive transactional data.
• Problem:
• The warehouse was not able to evolve enough to meet the changing demands of
managing huge volumes of unstructured or semi-structured data.
• It was too slow for increasingly real-time business and consumer environments.
DATA MANAGEMENT-WAVE2
Web and content management-Wave 2
• Enterprise Content Management systems evolved in the 1980s to provide
businesses with the capability to better manage unstructured data, mostly
documents.
• Beyond documents, these systems came to store and manage web content, images,
audio, and video.
• This led to a platform that incorporated business process management, version
control, information recognition, text management, and collaboration. This new
generation of systems added metadata.
DATA MANAGEMENT-WAVE3
Managing big data-Wave 3
• With big data, it is now possible to virtualize data so that it can be stored
efficiently and, utilizing cloud-based storage, more cost-effectively as well.
• Technologies at the heart of big data include virtualization, parallel
processing, distributed file systems, and in-memory databases.
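Parallel processing over a distributed file system is commonly organized in the MapReduce style popularized by Hadoop. The sketch below only simulates that pattern in a single Python process (it is not Hadoop's API): the input is split into hypothetical partitions, each partition is mapped independently, and the partial results are merged by a reduce step.

```python
from collections import Counter
from functools import reduce

# Input split into partitions, standing in for file blocks that a distributed
# file system would store on different machines.
partitions = [
    "big data needs parallel processing",
    "parallel processing splits big jobs",
    "data is stored across many machines",
]


def map_phase(chunk: str) -> Counter:
    """Count words in one partition; needs no shared state, so it could run anywhere."""
    return Counter(chunk.split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Merge two partial counts; addition is associative, so merge order does not matter."""
    return left + right


partial_counts = map(map_phase, partitions)   # on a cluster, one task per partition
word_counts = reduce(reduce_phase, partial_counts, Counter())
print(word_counts.most_common(3))
```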
BEGINNING WITH CAPTURE, ORGANIZE,
INTEGRATE, ANALYZE, AND ACT
• Data must first be captured, and then organized
and integrated.
• After this phase is successfully implemented, data
can be analyzed based on the problem being
addressed.
• Finally, management takes action based on the
outcome of that analysis.
• For example, Amazon.com might recommend a
book based on a past purchase, or a customer
might receive a coupon for a discount on a future
purchase of a product related to one that was just
purchased.
Figure: The cycle of Big Data Management
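The recommendation example can be sketched as a tiny pass through the cycle: captured purchase histories are organized into baskets, analyzed for items bought together, and the result drives an action (a suggestion). This co-purchase counting approach and the sample baskets are hypothetical; real recommenders use far richer models.

```python
from collections import Counter
from itertools import combinations

# Capture/organize: hypothetical purchase histories, one set of items per customer.
baskets = [
    {"python book", "sql book", "laptop stand"},
    {"python book", "sql book"},
    {"python book", "data viz book"},
    {"sql book", "laptop stand"},
]

# Integrate/analyze: count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1


def recommend(item: str, top_n: int = 2) -> list[str]:
    """Act: suggest the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]


print(recommend("python book"))
```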
WHAT SHOULD AN ORGANIZATION THINK ABOUT?
• How much data will my organization need to manage today and in the
future?
• How often will my organization need to manage data in real time or near
real time?
• How much risk can my organization afford? Is my industry subject to strict
security, compliance, and governance requirements?
• How important is speed to managing my data?
• How certain or precise does the data need to be?
REQUIREMENTS OF BIG DATA
• Interfaces:
A defining feature of big data is that it relies on picking up lots of data from
lots of sources.
Therefore, open application programming interfaces (APIs) will be core to any
big data architecture.
• Physical Infrastructure
Without the availability of robust physical infrastructures, big data would
probably not have emerged.
Data may be physically stored in many different locations and linked
together through networks, a distributed file system, and various
big data analytic tools and applications.
• The more important big data analysis becomes to companies, the more important it will
be to secure that data.
• For example, if you are a healthcare company, you will probably want to use big data
applications to determine changes in demographics or shifts in patient needs.
• There are also new, emerging approaches to data management in the big data world,
including document, graph, columnar, and geospatial database architectures.
• Data is organized into tables with rows and columns. It is intended to store
huge volumes of data across commodity servers.
• Hadoop
BENEFITS OF BIG DATA
• Data accumulation from multiple sources, including the Internet, social media platforms, online shopping
sites, company databases, external third-party sources, etc.
• Real-time forecasting and monitoring of business as well as the market.
• Identify crucial points hidden within large datasets to influence business decisions.
• Promptly mitigate risks by optimizing complex decisions for unforeseen events and potential threats.
• Identify issues in systems and business processes in real-time.
• Unlock the true potential of data-driven marketing.
• Dig into customer data to create tailor-made products, services, offers, discounts, etc.
• Facilitate speedy delivery of products/services that meet and exceed client expectations.
• Diversify revenue streams to boost company profits and ROI.
• Respond to customer requests, grievances, and queries in real-time.
• Foster innovation of new business strategies, products, and services.
CHALLENGES OF BIG DATA
• One of the issues with big data is the exponential growth of raw data. Data
centres and databases store huge amounts of data, and the volume is still growing
rapidly. With this exponential growth, organizations often find it difficult to
store the data properly.
• The next challenge is choosing the right big data tool. There are various big
data tools; however, choosing the wrong one can waste effort, time and money.
• Another challenge of big data is securing it. Organizations are often so busy
understanding and analyzing the data that they leave data security for a
later stage, and unprotected data ultimately becomes a breeding ground for
hackers.
ADVANTAGES OF USING BIG DATA IN BUSINESS
APPLICATIONS OF BIG DATA
• Education Sector:
• Organizations that run online educational courses use big data to find
candidates interested in a course. If someone searches for a YouTube tutorial
video on a subject, online or offline course providers for that subject send
that person online ads about their courses.
• Media and Entertainment Sector:
• Media and entertainment service providers such as Netflix, Amazon Prime and
Spotify analyze data collected from their users. Data such as which types of
video and music users watch and listen to most, how long users spend on the
site, etc., is collected and analyzed to set the next business strategy.
• Healthcare:
• The healthcare sector has access to huge amounts of data but has been
plagued by failures in using that data to curb rising healthcare costs, and by
inefficient systems that stifle faster and better healthcare benefits across
the board.
MYTHS IN BIG DATA
Myth 3: Big data can predict everything about the future of the business
• Fact: Analytics can predict trends using big data, but it is not the data that drives the business. A
business stands on many factors, such as the economy, human resources, technology and many more. Hence,
data alone cannot predict the future of a business with certainty.
Myth 4: Big Data means big budget and it is for big companies
• Fact: It’s true that organizations such as multinational corporations and government bodies have
invested huge amounts to set up large-scale data centers and high-end technologies for implementing
big data. Employing skilled big data professionals and data scientists is also a costly affair, as
their demand is high due to a shortage of talent in the market.