BigData Techniques and Technologies

Introduction to Big Data

NGUYỄN Ngọc Hoá

Department of Information Systems
VNU University of Engineering and Technology

Hoa.Nguyen@vnu.edu.vn
Outline
1. Definitions

2. SOTA Data Models

3. Data Analytics

4. Big Data Stack

5. Big Data Potential Applications & Landscape

6. Big Data Jobs

2 Big Data @ DIS

1. Definitions


Big Data
• "Big" in Big Data refers to:
– Big size: this is the primary definition.
– Big complexity rather than big volume: big data can be small, and not all large datasets are big data.
– Size matters, but so do accessibility, interoperability, and reusability.
• Big Data is commonly defined using the 3Vs:
– Volume, Variety, Velocity

Big Data: A buzzword?

Where Does Big Data Come From?
• Big Data is not a specific application type, but rather a trend, or even a collection of trends, spanning multiple application types
• Data is growing in multiple ways:
– More data (volume of data)
– More types of data (variety of data)
– Faster ingest of data (velocity of data)
– More accessibility of data (internet, instruments, …)
– Data growth and availability exceed organizations' ability to make intelligent decisions based on it

Who is generating Big Data?

How much data?
• Processes 40 EB a day (2023)
• Search index: 100 EB (2023)
• Performs 8.5B searches/day (2023)
• Crawls 20B web pages a day (2023)
• 19 Hadoop clusters: 600 PB, 40k servers (9/2015)
• 550 PB on 50k+ servers running 15k apps (2024)
• Hadoop: 10K nodes, 150K cores, 150 PB (4/2014)
• 1,000 PB of data in Hive + 4 PB/day (2024)
• S3: 2T objects, 1.1M requests/second (4/2013)
• LHC: ~15 PB a year
• LSST: 6–10 PB a year (~2020)
• SKA: 0.3–1.5 EB per year (~2020)
• "640K ought to be enough for anybody."

From http://www.umiacs.umd.edu/~jimmylin/
How much data?
• Airbus A350: equipped with ~6,000 sensors, generating ~2.5 TB of data per day.
• Autonomous vehicles: generate 4–6 TB of data per day.
• Smart factories: produce ~500 TB of data per day.
• IoT data is projected to reach 73.1 ZB by 2025 (Source: IDC).
[Figure: automation pyramid with Management, Planning, Supervision, Control, and Field Levels; "Batch – More Compute" toward the upper levels, "Stream – More Data" toward the lower levels]

Volume, Variety, and Velocity
• Aggregate data volumes that used to be measured in petabytes (PB) are now referenced in zettabytes (ZB).
– A zettabyte is a trillion gigabytes (GB),
– or a billion terabytes (TB).
• In 2010, we crossed the 1 ZB mark; by the end of 2025, that number was estimated to reach 175 ZB (source: IDC).
– Google: 100 PB/day processed, 15,000 PB storage
– eBay: 100 TB/day, 90 PB storage
– Baidu: 10–100 TB/day, 2,000 PB storage
– Facebook: 600 TB/day, 300 PB storage
– Spotify: 2.2 TB/day, 100 PB storage
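
The unit arithmetic and the growth it implies can be checked in a few lines of Python. This is a sketch using decimal SI prefixes (1 ZB = 10^21 bytes); it treats the 1 ZB (2010) and 175 ZB (2025) figures above as exact endpoints, which is an assumption, since both are estimates.

```python
# Decimal (SI) byte units: each prefix step is a factor of 1000.
GB = 10**9
TB = 10**12
ZB = 10**21

# A zettabyte expressed in gigabytes and terabytes.
zb_in_gb = ZB // GB   # 10**12, i.e. a trillion GB
zb_in_tb = ZB // TB   # 10**9, i.e. a billion TB

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

# Implied yearly growth from ~1 ZB (2010) to ~175 ZB (2025),
# assuming both estimates are taken at face value.
growth = cagr(1.0, 175.0, 2025 - 2010)

if __name__ == "__main__":
    print(f"1 ZB = {zb_in_gb:,} GB = {zb_in_tb:,} TB")
    print(f"Implied growth: {growth:.1%} per year")
```

Under these assumptions the growth rate comes out at roughly 41% per year, meaning the total volume doubles about every two years.
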

Volume, Variety, and Velocity
• Different types: a single application can generate or collect many types of data
– Relational data (relations/transactions/legacy data)
– Text data (Web)
– Semi-structured data (XML)
– Graph data: social networks, semantic web, …
– Streaming data, …
• Different sources:
– User shopping behaviors from Shopee/Lazada/Tiki/Amazon …
– Product reviews from different provider websites
• To extract knowledge, all these types of data need to be linked together
– Trying to capture all of the data that pertains to our decision-making process.
– Making sense of unstructured data, such as opinions, or analysing images.
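
Linking data of different types usually starts with a shared key. The sketch below joins relational-style purchase rows with semi-structured JSON product reviews on a product ID. The field names (`product_id`, `qty`, `rating`) and all records are hypothetical, made up for illustration.

```python
import json
from collections import defaultdict

# Relational-style rows (e.g., exported from a transactions table).
purchases = [
    {"user": "u1", "product_id": "p1", "qty": 2},
    {"user": "u2", "product_id": "p2", "qty": 1},
    {"user": "u3", "product_id": "p1", "qty": 1},
]

# Semi-structured reviews from provider websites: fields vary per record.
reviews_json = """
[
  {"product_id": "p1", "rating": 5, "text": "Great"},
  {"product_id": "p1", "rating": 3},
  {"product_id": "p2", "rating": 4, "tags": ["fast", "cheap"]}
]
"""

def link_purchases_with_reviews(purchases, reviews):
    """Return units sold and average review rating per product."""
    units = defaultdict(int)
    for row in purchases:
        units[row["product_id"]] += row["qty"]

    ratings = defaultdict(list)
    for review in reviews:
        # Reviews may lack fields; only use those that carry a rating.
        if "rating" in review:
            ratings[review["product_id"]].append(review["rating"])

    linked = {}
    for pid in units.keys() | ratings.keys():  # union of keys from both sources
        scores = ratings.get(pid, [])
        linked[pid] = {
            "units_sold": units.get(pid, 0),
            "avg_rating": sum(scores) / len(scores) if scores else None,
        }
    return linked

linked = link_purchases_with_reviews(purchases, json.loads(reviews_json))
print(linked)
```

Taking the union of keys keeps products that appear in only one source, which is often exactly where inconsistencies between sources show up.
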

A global view of linked big data

Volume, Variety, and Velocity
• Velocity: the rate at which data arrives at the enterprise and is processed or understood
• In other terms: "How long does it take you to do something about it, or even know it has arrived?"
• Data is generated fast and needs to be processed fast
• Late decisions → missed opportunities
• Examples:
– e-Promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
– Healthcare monitoring: sensors monitor your activities and body → any abnormal measurement requires an immediate reaction
– Disaster management and response
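
A minimal sketch of reacting to data as it arrives, in the spirit of the healthcare-monitoring example: each reading is compared with the rolling mean of the previous few readings and flagged immediately if it deviates too much. The window size, the threshold, and the heart-rate values are assumptions for illustration, not clinical parameters.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

def flag_abnormal(readings: Iterable[float], window: int = 5,
                  max_deviation: float = 20.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings that deviate from the rolling mean.

    Processes one reading at a time and keeps only the last `window`
    values in memory, so it also works on an unbounded stream.
    """
    recent = deque(maxlen=window)  # oldest value drops out automatically
    for i, value in enumerate(readings):
        if len(recent) == window:
            mean = sum(recent) / window
            if abs(value - mean) > max_deviation:
                yield i, value  # react now, not in tomorrow's batch job
        recent.append(value)

heart_rate = [72, 75, 74, 73, 76, 74, 118, 75, 73]
alerts = list(flag_abnormal(heart_rate))
```

Here `alerts` holds the single spike at index 6. Appending the abnormal value to the window lets it skew the mean for the next few readings; a real system might exclude flagged values or use a more robust statistic such as the median.
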

Realtime Analytics/Decision Requirements
• Today, real-time analytics makes it possible to optimize Like buttons both on external websites and on Facebook itself.
• Facebook uses anonymised data to show the number of times people:
– saw Like buttons,
– clicked Like buttons,
– saw Like stories on Facebook,
– and clicked Like stories to visit a given website.
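
The Like-button metrics above boil down to counting events by type as they arrive and deriving ratios from the counts. The sketch assumes a hypothetical event format of `(event_type, site)` pairs with event names chosen here for illustration; Facebook's actual pipeline is not described in this document.

```python
from collections import Counter, defaultdict

def like_metrics(events):
    """Count anonymised events per site and derive click-through rates."""
    counts = defaultdict(Counter)
    for event_type, site in events:
        counts[site][event_type] += 1

    metrics = {}
    for site, c in counts.items():
        # A Counter returns 0 for unseen event types, so no KeyError here.
        button_ctr = c["click_button"] / c["see_button"] if c["see_button"] else 0.0
        story_ctr = c["click_story"] / c["see_story"] if c["see_story"] else 0.0
        metrics[site] = {"button_ctr": button_ctr, "story_ctr": story_ctr, **c}
    return metrics

events = [
    ("see_button", "news.example"), ("see_button", "news.example"),
    ("click_button", "news.example"), ("see_story", "news.example"),
    ("see_story", "news.example"), ("see_story", "news.example"),
    ("see_story", "news.example"), ("click_story", "news.example"),
]
m = like_metrics(events)
```

Because the per-site counts are simple sums, they can be updated incrementally as each event arrives, which is what makes this kind of metric feasible in real time.
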

Extensions: 6Vs of Big Data
• Volume: the amount of data being generated is growing rapidly and becoming increasingly large, making it difficult to store and process using traditional methods.
• Variety: big data comes in many different formats, including structured, semi-structured, and unstructured data, such as text, images, videos, and sensor data.
• Velocity: big data is generated and processed at high speed, requiring real-time processing capabilities.
• Veracity: the quality of big data can be uncertain, making it important to validate and clean the data before using it for analysis.
• Value: despite its challenges, big data holds the potential to deliver valuable insights and drive business results, making it an important asset for organizations.
• Variability: refers to how often the data changes. Managing these data drifts helps organizations come up with the latest products.

Other V’s
• Visibility/Visualization: after big data has been processed, we need a way to present it in a manner that is readable and accessible.
• Viscosity: describes the latency or lag time in the data relative to the event being described. We found that this is just as easily understood as an element of Velocity.
• Virality: defined by some users as the rate at which the data spreads, i.e., how often it is picked up and repeated by other users or events.
• Volatility: refers to how long data is valid and how long it should be stored. You need to determine at what point data is no longer relevant to the current analysis.
• …

Who cares about Big Data?
• Government
• Finance, Banking
• Manufacturing
• Education
• Health
• Traffic
• IoT
• …

How to deal with Big Data?
Advice from Jim Gray (Turing Award 1998):
1. Analysing big data requires scale-out solutions, not scale-up solutions.
2. Move the analysis to the data.
3. Work with scientists to find the most common “20 queries” and make them fast.
4. Go from “working to working.”
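
Gray's first two points can be illustrated together: instead of shipping every raw record to one big machine (scale-up), each partition computes a small partial result where the data lives, and only those partials are combined (scale-out). In this sketch the "nodes" are just lists in one process and the partial result is a (sum, count) pair; a real deployment would run the local step on separate machines through a framework such as Hadoop or Spark.

```python
from functools import reduce
from typing import List, Tuple

Partial = Tuple[float, int]  # (sum, count)

def local_aggregate(partition: List[float]) -> Partial:
    """Runs next to the data: reduce a whole partition to (sum, count)."""
    return (sum(partition), len(partition))

def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial results; associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(partitions: List[List[float]]) -> float:
    partials = [local_aggregate(p) for p in partitions]  # one per node
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count

# Hypothetical sensor temperatures stored on three nodes.
nodes = [[21.0, 22.5, 23.0], [19.5, 20.0], [24.0, 25.5, 26.0, 22.0]]
mean_temp = distributed_mean(nodes)
```

Only two numbers per node cross the network, regardless of partition size. The mean itself cannot be combined directly (averaging per-node averages is wrong when partitions differ in size), which is why the partial result is (sum, count).
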

20 Common Queries
1. Data retrieval & search
• Find a specific record by ID (e.g., retrieve a user, product, or experiment result).
• Retrieve a list of recent records (e.g., latest 100 transactions, newest publications).
• Full-text search on large text fields (e.g., search scientific papers for a keyword).
• Find records with partial matches (e.g., autocomplete suggestions for search terms).
2. Aggregation & Statistics
• Compute total count (e.g., how many users made purchases this month?).
• Find the sum, average, min, max of a field (e.g., average sensor temperature, highest sales).
• Group by and aggregate (e.g., total sales per region, number of cases per category).
3. Time-Based Analysis
• Find records in a given time range (e.g., transactions from Jan 1–Feb 1).
• Compare trends over time (e.g., sales growth from last year to this year).
• Compute moving averages over time (e.g., rolling 7-day average of website visits).
4. Join & Relationships
• Join two or more datasets (e.g., link customer data with purchase history).
• Find co-occurring items (e.g., products frequently bought together).
• Detect missing or mismatched records (e.g., students registered for a course but missing from attendance logs).
5. Ranking & Sorting
• Find the top N results (e.g., top 10 best-selling books).
• Find anomalies/outliers (e.g., detect unusually high spending in credit card transactions).
6. Geospatial Processing
• Find records within a location radius (e.g., all hospitals within 10 km of a given point).
• Find the nearest neighbors (e.g., closest weather stations to a given location).
7. Machine Learning & Anomaly Detection
• Find similar records (e.g., customers with similar purchasing behavior).
• Detect duplicate records (e.g., merge duplicate user profiles).
• Find sudden spikes or drops (e.g., unusual traffic surge on a website).
• Detect missing expected values (e.g., expected daily report missing for a date).

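
Several of these query patterns can be sketched over an in-memory list of records, which makes them concrete before expressing them in SQL or Spark. The records, field names, and dates below are hypothetical, and treating "Jan 1–Feb 1" as end-exclusive is an assumption; at big-data scale the same logic would run inside a query engine rather than in plain Python.

```python
from collections import defaultdict
from datetime import date
import heapq

sales = [
    {"id": 1, "region": "North", "book": "A", "amount": 120, "day": date(2024, 1, 3)},
    {"id": 2, "region": "South", "book": "B", "amount": 80,  "day": date(2024, 1, 15)},
    {"id": 3, "region": "North", "book": "B", "amount": 200, "day": date(2024, 2, 2)},
    {"id": 4, "region": "East",  "book": "C", "amount": 50,  "day": date(2024, 1, 20)},
    {"id": 5, "region": "South", "book": "A", "amount": 90,  "day": date(2024, 1, 28)},
]

# Retrieval: find a specific record by ID (an index makes lookups O(1)).
by_id = {r["id"]: r for r in sales}

# Aggregation: group by region and sum amounts.
total_by_region = defaultdict(int)
for r in sales:
    total_by_region[r["region"]] += r["amount"]

# Time range: transactions from Jan 1 up to (not including) Feb 1.
in_january = [r for r in sales if date(2024, 1, 1) <= r["day"] < date(2024, 2, 1)]

# Join / missing records: students registered but absent from attendance logs.
registered = {"s1", "s2", "s3"}
attended = {"s1", "s3"}
missing_from_attendance = registered - attended

# Ranking: top N records by amount without sorting the whole list.
top2 = heapq.nlargest(2, sales, key=lambda r: r["amount"])

# Time-based: moving average over a fixed-size window of ordered values.
def moving_average(values, window):
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]
```

`heapq.nlargest` keeps only N candidates in memory, which mirrors how query engines evaluate top-N without a full sort.
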
Why study big data technologies?
• Hot topic in both research and industry
• In high demand in the real world
• A promising future career
– Research and development of big data systems: distributed systems (e.g., Hadoop), visualization tools, data warehouses/lakes, OLAP, data integration, data quality control, …
– Big data applications: social marketing, healthcare, …
– Data analysis: getting value out of big data, such as discovering and applying patterns, predictive analysis, business intelligence, privacy and security, …

2. SOTA Data Models

Data Is Driving Everything
• "Big data"
• "Deep learning"
• "Data science"
• "Statistical analysis"
• "Data lakes"
• "Biomedical informatics"
• "Visual analytics"
• "Business analytics"
Lots of trends in pursuit of the same goals!
Discovery, models, decision-making, …
Data Needs to Be Modeled, Cleaned, and Linked!

Most Scenarios: Lots of “Medium Data” that Isn’t Ready for Analytics
• Other than in Web and monitoring scenarios, we typically don't have all of the data in one place:
– It is spread across different systems
– We bring in public datasets
– We require access to Twitter APIs, etc.
• Also, it's often not in a form where:
– It's clean and regular (e.g., we may have missing values, spurious values, etc.)
– The features we want to use to make predictions are immediately available to us
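
The "not clean and regular" problem can be made concrete: a cleaning pass typically filters out spurious values and imputes missing ones. This sketch assumes hypothetical sensor records and a plausible-range rule (−50 to 60 °C) chosen only for illustration; real validation rules come from domain experts.

```python
from statistics import median
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

def clean_temperatures(records: List[Record], low: float = -50.0,
                       high: float = 60.0) -> List[Record]:
    """Drop spurious readings and impute missing ones with the median."""
    # Spurious: present but outside the assumed plausible range.
    valid = [r for r in records
             if r["temp"] is None or low <= r["temp"] <= high]
    # Impute missing values with the median of the remaining readings.
    known = [r["temp"] for r in valid if r["temp"] is not None]
    fill = median(known) if known else None
    return [{**r, "temp": r["temp"] if r["temp"] is not None else fill}
            for r in valid]

raw = [
    {"sensor": 1, "temp": 21.5},
    {"sensor": 2, "temp": None},     # missing value
    {"sensor": 3, "temp": 999.0},    # spurious value (e.g., a sensor glitch)
    {"sensor": 4, "temp": 23.5},
]
cleaned = clean_temperatures(raw)
```

Filtering spurious values before computing the median matters: otherwise the 999.0 reading would distort the value used for imputation.
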

Data vs Structured Data
[Figure: images, genes, and text go through data + feature extraction and wrangling; structural relationships are sometimes important features]
• Goal: raw data → structured data
– Fields, entities, objects, machine learning features
– May be very regular or semi-structured
Ultimately, the goal is data → information → knowledge

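
Turning raw data into structured data often means extracting fields from text. A minimal sketch with a regular expression over web-server-style log lines; the log format and the field names are assumptions, since the slide does not specify a source format.

```python
import re
from typing import Dict, List

# Hypothetical log format: "<ip> [<timestamp>] <method> <path> <status>"
LOG_PATTERN = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \[(?P<ts>[^\]]+)\] "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})"
)

def to_structured(lines: List[str]) -> List[Dict[str, object]]:
    """Extract named fields from raw lines; non-matching lines are skipped."""
    rows = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue  # unparseable line; could be set aside for review
        row = m.groupdict()
        row["status"] = int(row["status"])  # typed feature, not just text
        rows.append(row)
    return rows

raw_lines = [
    "10.0.0.1 [2024-03-01T10:00:00] GET /index.html 200",
    "garbage line without structure",
    "10.0.0.2 [2024-03-01T10:00:05] POST /login 401",
]
rows = to_structured(raw_lines)
```

Each output row is a record with named, typed fields, ready to be loaded into a table or used as machine learning features.
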
Linked Data: Find Patterns in Connectivity
(Clusters, Paths, …)


Knowledge Graphs
Classes, subclasses, instances, and properties


Dynamic Data: Track over Time,
Forecast the Future


Tabular (Relational) Data and Joins / Lookups (e.g., to Web Services)
[Figure: New York Taxi Data linked with Reverse Geocode Data and Street View]

SOTA Data Models
• RDBMS, NoSQL, NewSQL, Graph DB, Realtime DB, Vector DB, GPU DB, AI-driven DB, and multi-model DB.

Data Model     | Best For                           | Examples
RDBMS          | Structured data, transactions      | MySQL, PostgreSQL
NoSQL          | Scalability, semi-structured data  | MongoDB, Cassandra
NewSQL         | High-performance transactions      | Google Spanner, CockroachDB
Graph DB       | Relationship-heavy data            | Neo4j, TigerGraph
Realtime DB    | Instant data updates               | Firebase, Apache Ignite
Vector DB      | AI, similarity search              | Pinecone, FAISS
GPU DB         | Big data, real-time analytics      | Kinetica, OmniSci
AI-Driven DB   | Automated insights, optimization   | Google BigQuery ML
Multi-Model DB | Handling diverse workloads         | ArangoDB, MarkLogic

Traditional Relational Databases (RDBMS)
• Structured, tabular format (rows & columns)
• Uses SQL for data manipulation
• ACID-compliant (Atomicity, Consistency, Isolation, Durability)
• Examples: MySQL, PostgreSQL, Oracle, SQL Server
• Best suited for structured data and transactional applications
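
Atomicity, the A in ACID, can be seen with Python's built-in `sqlite3` module: a transfer between two accounts either commits both updates or neither. The table, the accounts, and the `CHECK` constraint are hypothetical, and SQLite stands in here for MySQL or PostgreSQL, which provide comparable transactional semantics.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically; return False if the transaction was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = make_bank()
ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates CHECK, rolled back
```

Crediting the destination before debiting the source is deliberate: when the debit fails the `CHECK` constraint, the earlier credit is undone as well, so the balances stay at 70 and 80.
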

NoSQL vs NewSQL Databases
NoSQL
• Designed for scalability and flexibility
• Four main types:
– Key-Value Stores (Redis, DynamoDB)
– Document Stores (MongoDB, CouchDB)
– Column-Family Stores (Cassandra, HBase)
– Graph Databases (Neo4j, ArangoDB)
• Suitable for semi-structured and unstructured data
NewSQL
• Combines the consistency of RDBMS with the scalability of NoSQL
• ACID compliance with high performance
• Examples: Google Spanner, CockroachDB, TiDB
• Best for large-scale transactional applications

Graph & Vector Databases
GraphDB
• Optimized for handling relationships and connected data
• Uses nodes and edges to represent entities and relationships
• Examples: Neo4j, ArangoDB, TigerGraph
• Ideal for social networks, fraud detection, recommendation engines
VectorDB
• Stores high-dimensional vector data
• Used in AI, machine learning, recommendation systems
• Examples: Pinecone, FAISS, Weaviate
• Best for similarity search, NLP, and computer vision
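
The two access patterns on this slide differ: a graph database answers traversal questions (who is connected to whom, and how), while a vector database answers nearest-neighbour questions over embeddings. The sketch below shows both in plain Python with hypothetical data. Brute-force cosine similarity is used only because it is the simplest exact method; products like Neo4j or Pinecone add indexing, persistence, and (for vectors) approximate search on top of these ideas.

```python
from collections import deque
from math import sqrt
from typing import Dict, List, Optional, Sequence

# --- Graph: entities as nodes, relationships as edges (adjacency list) ---
friends: Dict[str, List[str]] = {
    "An": ["Binh"], "Binh": ["An", "Chi"], "Chi": ["Binh", "Dung"], "Dung": ["Chi"],
}

def shortest_path(graph: Dict[str, List[str]], start: str,
                  goal: str) -> Optional[List[str]]:
    """Breadth-first search: path with the fewest hops, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through recorded parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

# --- Vector: similarity search over embeddings ---
def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k(query: Sequence[float], items: Dict[str, Sequence[float]],
          k: int) -> List[str]:
    """Exact (brute-force) k nearest neighbours by cosine similarity."""
    return sorted(items, key=lambda name: cosine(query, items[name]),
                  reverse=True)[:k]

embeddings = {"cat": [0.9, 0.1, 0.0], "dog": [0.8, 0.2, 0.1], "car": [0.0, 0.1, 0.9]}
path = shortest_path(friends, "An", "Dung")
nearest = top_k([1.0, 0.1, 0.0], embeddings, k=2)
```

Brute-force search compares the query with every stored vector, which is exactly the cost that vector databases avoid with approximate indexes once collections reach millions of items.
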

Realtime & GPU Databases
Realtime DB
• Supports low-latency and high-throughput data processing
• Often used in financial trading, gaming, real-time analytics
• Examples: Firebase, Apache Ignite, SingleStore
• Best for applications requiring instant data access and updates
GPU DB
• Uses GPUs for parallel processing of massive datasets
• Accelerates analytics and AI workloads
• Examples: Kinetica, BlazingDB, OmniSci
• Suitable for real-time big data processing and deep learning

AI-Driven & Multi-Model Databases
AI-Driven DB
• Integrates AI and machine learning for automation and optimization
• Features:
– Automated indexing and query optimization
– Predictive analytics and anomaly detection
– Self-healing and auto-tuning capabilities
• Examples: Google BigQuery ML, Oracle Autonomous Database
• Best for enterprises leveraging AI for smarter decision-making
Multi-Model DB
• Supports multiple data models within a single system
• Benefits:
– Flexibility to handle diverse data types
– Reduces data silos and simplifies architecture
• Examples: ArangoDB, MarkLogic, OrientDB
• Best for applications requiring diverse workloads

3. Data Analytics

The process of examining large and varied data sets to uncover hidden patterns, correlations, market trends, and customer preferences.

The Goal of Data Analytics:
From Data to “Knowledge” or Action
• Definition: the process of examining large and varied data sets to uncover hidden patterns, correlations, market trends, and customer preferences.
• Pattern detection: raw data → patterns → partial understanding
– “Show me sales by region by product category”
– “Show me clusters of documents by concept”
• Given an observation: hypothesis → experiment over sample → significance
– “Behavioral factor F leads to higher risk of outcome O”
– Do a statistical test, measure significance vs. the null hypothesis
• CERBA: Collect → Extrapolate → Recognize → Build → Apply

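
The hypothesis → experiment → significance step can be sketched with a permutation test, which needs no distributional assumptions: shuffle the group labels many times and count how often a difference at least as large as the observed one appears by chance. The outcome data (1 = outcome O occurred) for people with and without behavioural factor F are invented for illustration, and a permutation test is only one valid choice; a chi-squared or Fisher exact test would also fit.

```python
import random
from typing import List

def permutation_test(group_a: List[int], group_b: List[int],
                     n_permutations: int = 10_000, seed: int = 0) -> float:
    """One-sided p-value for 'group A has a higher outcome rate than group B'.

    Null hypothesis: group labels are exchangeable (the factor has no effect).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign labels at random
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            extreme += 1
    # The +1 correction avoids reporting an impossible p-value of exactly 0.
    return (extreme + 1) / (n_permutations + 1)

with_factor = [1] * 18 + [0] * 22     # 18 of 40 had outcome O
without_factor = [1] * 8 + [0] * 32   # 8 of 40 had outcome O
p_value = permutation_test(with_factor, without_factor)
```

With these invented counts the p-value comes out below 0.05, so the null hypothesis would be rejected at the 5% level. The ethical obligation listed on the next slide applies here too: a significant result on a sample is evidence, not proof of causation.
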
What Does Big Data Analytics Involve?
• Acquisition, access – data may exist without being accessible (C)
• Wrangling – data may be in the wrong form (CE)
• Integration, representation – data relationships may not be captured (ER)
• Cleaning, filtering – data may have variable quality (ER)
• Hypothesizing, querying, analyzing, modeling – from data to info (ERB)
• Understanding, iterating, exploring – helping build knowledge (A)
And: ethical obligations – the need to protect data, follow good statistical practices, and present results in a non-misleading way (CERBA)
Examples: Netflix movie, Amazon product, and Expedia hotel recommendations, …

Big Data Analytics: From Data to Action

Data Science / Data Analytics:
Beware Over-Hyped Expectations!
Data science myth:
• We’ll learn everything “bottom up” using fancy statistics and machine learning
• Basically we “turn the crank” and out pop insights!
Data + algorithms → knowledge
Data science reality:
• We’ll typically rely on human expertise to impose models over the data, the features, etc.
• Deep learning can do feature selection, but why throw away what we know?
Data + human insight + algorithms + iteration → information → knowledge

Data Science Application Process
• What question are you answering?
• What is the right scope of the project?
• What data will you use?
• What techniques are you going to try?
• How will you evaluate your results?
• What maintenance will be required?
Before we even get to machine learning, at least 80–90% of data science work in companies involves:
• Working with experts to understand the domain, assumptions, questions, etc.
• Trying to catalog and make sense of the data sources
• Wrangling, extracting, and integrating the data
• Cleaning the wrangled data

4. Big Data Stack

Big Data platform: six key imperatives
1. Discover, Explore, and Navigate Big Data Sources
• Federated discovery, search, and navigation.
• Ability to access structured and unstructured data from various sources.
• Metadata management and indexing for efficient retrieval.
2. Extreme Performance – Run Analytics Closer to Data
• Massively parallel processing (MPP) analytic appliances.
• Distributed computing frameworks like Apache Spark.
• Optimized query engines for real-time and batch analytics.
3. Manage and Analyze Unstructured Data
• Hadoop ecosystem, including HDFS, MapReduce, and text analytics.
• Natural Language Processing (NLP) and sentiment analysis.
• Scalable storage solutions for images, videos, and logs.
4. Analyze Data in Motion
• Stream computing technologies such as Apache Kafka and Flink.
• Real-time event processing for IoT and transaction monitoring.
• Low-latency decision-making capabilities.
5. Rich Library of Analytical Functions and Tools
• In-database analytics libraries for machine learning and deep learning.
• Data visualization and reporting tools.
• AI-driven insights and automation.
6. Integrate and Govern All Data Sources
• Data integration platforms ensuring seamless data flow across systems.
• Data quality management, security policies, and lifecycle governance.
• Master Data Management (MDM) to unify enterprise data.

Big Data Stack Components
1. Data Sources
• Structured Data (RDBMS, Data Warehouses)
• Semi-Structured Data (JSON, XML, Logs)
• Unstructured Data (Text, Images, Videos, Social Media)
2. Data Ingestion
• Batch Processing (ETL, Apache Sqoop, Apache Flume)
• Streaming Processing (Apache Kafka, Apache NiFi)
3. Data Storage
• Relational Databases (MySQL, PostgreSQL)
• NoSQL Databases (MongoDB, Cassandra, HBase)
• Data Lakes (Hadoop HDFS, Amazon S3)
4. Data Processing
• Batch Processing (Apache Spark, Hadoop MapReduce)
• Real-Time Processing (Apache Storm, Apache Flink)
5. Data Analytics
• Machine Learning (TensorFlow, Scikit-Learn)
• Business Intelligence (Tableau, Power BI)
• Search & Query (Elasticsearch, Apache Drill)
6. Data Visualization
• Reporting Tools (Tableau, Google Data Studio)
• Dashboards (Grafana, Kibana)
7. Data Security & Governance
• Data Encryption & Access Control
• Compliance & Auditing (GDPR, HIPAA)

Big Data Stack

Target stack for this course

Big Data Tools & Frameworks
• Apache Hadoop: distributed storage and processing framework.
• Apache Spark: fast data processing engine for large-scale data analytics.
• Cloudera Data Platform: comprehensive data management platform.
• Coalesce: data transformation platform for building data pipelines.
• Other tools:
– NoSQL databases (e.g., MongoDB, Cassandra).
– Data visualization tools (e.g., Tableau, Power BI).

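
Hadoop MapReduce (and, with a different API, Spark) structures a batch job as map, shuffle, and reduce phases. The classic word-count example can be simulated in a single Python process to show the data flow; this is not Hadoop's API, only the same three phases written as plain functions over a made-up input.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word; runs independently per input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a final count."""
    return key, sum(values)

splits = ["Big data is big", "data is everywhere"]  # one split per mapper
mapped = chain.from_iterable(map_phase(s) for s in splits)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread across many machines; the shuffle is the step where data actually moves over the network.
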
5. Big Data Potential Applications & Landscape

Potential Applications of Big Data
• Healthcare & Medicine
– Predictive Disease Analytics: early diagnosis of diseases like cancer and heart conditions.
– Medical Image Analysis: AI-powered interpretation of X-rays, MRIs, and CT scans.
– Genomic Data Processing: personalized medicine and drug discovery.
– Epidemiology & Pandemic Management: tracking and predicting disease outbreaks (e.g., COVID-19).
– Hospital & Resource Management: optimizing hospital bed occupancy and medical supply chains.
• Education & Research
– Personalized Learning: adaptive learning systems and AI-powered tutoring.
– Academic Performance Prediction: identifying at-risk students and improving teaching methods.
– Institutional Decision-Making: data-driven policymaking for schools and universities.
– Education Market Trends: analyzing enrollment patterns and future workforce needs.
– Research & Scientific Discovery: accelerating breakthroughs in various disciplines using data analytics.
• Economy & Business
– Market & Consumer Analytics: predicting trends and customer behavior.
– Financial Risk Management: fraud detection, credit scoring, and market analysis.
– Supply Chain Optimization: real-time tracking, demand forecasting, and inventory management.
– Personalized Marketing: targeted advertising and recommendation systems.
– Stock Market Predictions: analyzing financial data for investment strategies.

Potential Applications …
• Society & Public Services
– Smart Cities: traffic optimization, waste management, and public safety improvements.
– Crime Prediction & Prevention: identifying crime patterns and predicting high-risk areas.
– Social Media Analysis: tracking public sentiment and detecting misinformation.
– Disaster Management: real-time monitoring of natural disasters and emergency response planning.
– Employment & Labor Market Analysis: predicting job market trends and workforce planning.
• National Security & Defense
– Cybersecurity & Threat Intelligence: identifying cyber threats and anomalies in real time.
– Military Strategy & Operations: predictive analytics for tactical planning and logistics.
– Surveillance & Intelligence Gathering: analyzing satellite imagery and communications data.
– Border Security & Immigration Control: detecting illegal activities and managing migration patterns.
– Counterterrorism & Crime Prevention: analyzing global threat networks and suspicious transactions.
• Environment & Sustainability
– Climate Change Monitoring: analyzing temperature trends and carbon emissions.
– Natural Disaster Prediction: early warning systems for earthquakes, floods, and hurricanes.
– Agriculture & Precision Farming: optimizing crop yields and resource use.
– Wildlife & Biodiversity Conservation: tracking endangered species and deforestation patterns.
– Water & Air Quality Management: monitoring pollution levels and ensuring regulatory compliance.

When to consider a Big Data solution
• Data volume is growing rapidly: you're limited by your current platform or environment because you can't process the amount of data that you want to process.
– A retail business with millions of customer transactions daily.
• Need for real-time data processing: when real-time insights are critical for decision-making.
– Fraud detection in banking or real-time recommendations in e-commerce.
• Unstructured or multi-format data: you want to involve new sources of data in the analytics, but you can't, because they don't fit into schema-defined rows and columns without sacrificing fidelity or the richness of the data.
– Social media sentiment analysis or medical image processing.
• Performance issues: when existing relational databases struggle with query speed and performance.
– A financial firm processing massive stock market data streams.
• Advanced analytics and AI integration: when machine learning, predictive analytics, or deep learning is required.
– Personalized marketing campaigns based on user behavior.
• Need for scalability and flexibility: when data workloads fluctuate and require dynamic scaling.
– Cloud-based Big Data platforms for startups and enterprises.
• Data-driven decision making: when organizations want to leverage data to gain a competitive edge.
– Healthcare providers optimizing patient treatment plans using data analytics.

The 2017 Big Data Landscape

The 2024 MAD (ML, AI & Data) Landscape
https://mad.firstmark.com/

Hype Cycle for Data Management 2022

Hype Cycle for Data Management 2023

Hype Cycle for Data Management 2024

6. Big Data Jobs

Big Data Jobs
• Data scientists: collect, analyze, manage, structure, and interpret large volumes of data from a range of sources. Data scientists then use reporting tools to pinpoint patterns, trends, and interrelationships between the various data sets.
• Big data engineers & architects: create the underpinning software architecture; design, build, and manage the infrastructure and scalable data management systems that data scientists need to perform their analysis; outline business objectives and transform them into data-processing workflows. They can be found across industries.
• Big data developers: apply their deep understanding of technologies such as Hadoop and Apache Spark, with programming languages such as Java, Python, and Scala, to process data. By drawing on deep proficiency in functional programming paradigms, they can effectively ingest data into broader big data platform ecosystems.

Big Data Jobs…
• Big data analysts: detect and analyze actionable data, such as hidden trends and patterns. By fusing these findings with their in-depth knowledge of the market in which their organizations operate, they can help leaders make informed strategic business decisions.
• Big data specialists: interrogate, ingest, analyze, and transform complex sets of data. This ensures the necessary data is made available to the other team members who use it to uncover actionable insights and provide recommendations to improve business outcomes.
• …

Skills required for Big Data Analytics
• Store and process
– Large-scale databases
– Software Engineering
– System/network Engineering
• Analyse and model
– Reasoning
– Knowledge Representation
– Multimedia Retrieval
– Modelling and Simulation
– Machine Learning
– Information Retrieval
• Understand and design
– Decision theory
– Visual analytics
– Perception and Cognition
