1 Introduction

Principles of Distributed Database Systems
Dr. Phan Thị Hà
Faculty of Information Technology, PTIT

© 2020, M.T. Özsu & P. Valduriez
Outline
◼ Introduction
◼ Distributed and Parallel Database Design
◼ Distributed Data Control
◼ Distributed Query Processing
◼ Distributed Transaction Processing
◼ Data Replication
◼ Database Integration – Multidatabase Systems
◼ Parallel Database Systems
◼ Peer-to-Peer Data Management
◼ Big Data Processing
◼ NoSQL, NewSQL and Polystores
◼ Web Data Management
Outline
◼ Introduction
❑ What is a distributed DBMS
❑ History
❑ Distributed DBMS promises
❑ Design issues
❑ Distributed DBMS architecture

Distributed Computing

◼ A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
◼ What is being distributed?
❑ Processing logic
❑ Function
❑ Data
❑ Control

Current Distribution – Geographically Distributed Data Centers

What is a Distributed Database System?

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (Distributed DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

What is not a DDBS?

◼ A timesharing computer system
◼ A loosely or tightly coupled multiprocessor system
◼ A database system which resides at one of the nodes of a network of computers - this is a centralized database on a network node

Distributed DBMS Environment

Implicit Assumptions

◼ Data stored at a number of sites → each site logically consists of a single processor
◼ Processors at different sites are interconnected by a computer network → not a multiprocessor system
❑ Parallel database systems
◼ Distributed database is a database, not a collection of files → data logically related as exhibited in the users’ access patterns
❑ Relational data model
◼ Distributed DBMS is a full-fledged DBMS
❑ Not a remote file system, not a transaction processing (TP) system

Important Point

Logically integrated
but
Physically distributed

Outline
◼ Introduction

❑ History

History – File Systems

History – Database Management

History – Early Distribution
Peer-to-Peer (P2P)

History – Client/Server

History – Data Integration

History – Cloud Computing

On-demand, reliable services provided over the Internet in a cost-efficient manner
◼ Cost savings: no need to maintain dedicated compute power
◼ Elasticity: better adaptivity to changing workload

Data Delivery Alternatives

◼ Delivery modes
❑ Pull-only
❑ Push-only
❑ Hybrid
◼ Frequency
❑ Periodic
❑ Conditional
❑ Ad-hoc or irregular
◼ Communication Methods
❑ Unicast
❑ One-to-many
◼ Note: not all combinations make sense
Outline
◼ Introduction

❑ Distributed DBMS promises


Distributed DBMS Promises

◼ Transparent management of distributed, fragmented, and replicated data
◼ Improved reliability/availability through distributed transactions
◼ Improved performance
◼ Easier and more economical system expansion

Transparency

◼ Transparency is the separation of the higher-level semantics of a system from the lower-level implementation issues.
◼ Fundamental issue is to provide data independence in the distributed environment
❑ Network (distribution) transparency
❑ Replication transparency
❑ Fragmentation transparency
◼ horizontal fragmentation: selection
◼ vertical fragmentation: projection
◼ hybrid
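
The correspondence between fragmentation and the relational operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMP rows, the CITY attribute, and all helper names are hypothetical and do not come from the slides. Horizontal fragments are produced by selection and rebuilt by union; vertical fragments are produced by projection (each keeping the key ENO) and rebuilt by a join on that key.

```python
# Hypothetical EMP relation; rows and the CITY attribute are invented for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "A. Martin", "TITLE": "Engineer", "CITY": "Paris"},
    {"ENO": "E2", "ENAME": "B. Smith", "TITLE": "Analyst", "CITY": "Boston"},
    {"ENO": "E3", "ENAME": "C. Roy", "TITLE": "Engineer", "CITY": "Montreal"},
]


def select(relation, predicate):
    """Horizontal fragmentation: keep the whole rows that satisfy a predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Vertical fragmentation: keep only some columns of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


def union(*fragments):
    """Rebuild a horizontally fragmented relation."""
    return [row for fragment in fragments for row in fragment]


def join_on_key(left, right, key):
    """Rebuild a vertically fragmented relation by joining on the shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]


def same_rows(a, b):
    """Compare two relations while ignoring row and column order."""
    def canon(rel):
        return sorted(sorted(row.items()) for row in rel)
    return canon(a) == canon(b)


# Horizontal fragments: disjoint selections that together cover EMP.
emp_paris = select(EMP, lambda r: r["CITY"] == "Paris")
emp_other = select(EMP, lambda r: r["CITY"] != "Paris")

# Vertical fragments: both projections keep the key ENO.
emp_names = project(EMP, ["ENO", "ENAME"])
emp_jobs = project(EMP, ["ENO", "TITLE", "CITY"])

rebuilt_from_horizontal = union(emp_paris, emp_other)
rebuilt_from_vertical = join_on_key(emp_names, emp_jobs, "ENO")
```

A hybrid scheme would simply apply one operator to the fragments produced by the other, for example projecting `emp_paris` into two vertical fragments.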

Example

Transparent Access

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

[Figure: five sites (Tokyo, Boston, Paris, Montreal, New York) connected by a communication network; each site stores the employees, projects, and assignments of one or more cities, and one fragment holds New York projects with budget > 200000.]
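
The idea that the user writes the query against global relations, without naming fragments or sites, can be imitated in a single process with Python's standard `sqlite3` module. This is a rough sketch, not a distributed system: the fragment tables `EMP_PARIS` and `EMP_BOSTON`, the column lists, and all data values are assumptions made for the example, and a view stands in for the mapping that a distributed DBMS would maintain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two horizontal fragments of EMP (hypothetical names and schema).
    CREATE TABLE EMP_PARIS  (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE EMP_BOSTON (ENO TEXT, ENAME TEXT, TITLE TEXT);
    CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
    CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);

    -- The global relation EMP is the union of its fragments.
    CREATE VIEW EMP AS
        SELECT * FROM EMP_PARIS
        UNION ALL
        SELECT * FROM EMP_BOSTON;

    INSERT INTO EMP_PARIS  VALUES ('E1', 'A. Martin', 'Engineer');
    INSERT INTO EMP_BOSTON VALUES ('E2', 'B. Smith', 'Analyst');
    INSERT INTO ASG VALUES ('E1', 'P1', 18), ('E2', 'P2', 6);
    INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
    """
)

# The query from the slide, unchanged: it mentions neither fragments nor sites.
rows = conn.execute(
    """
    SELECT ENAME, SAL
    FROM EMP, ASG, PAY
    WHERE DUR > 12
      AND EMP.ENO = ASG.ENO
      AND PAY.TITLE = EMP.TITLE
    """
).fetchall()
```

Only the employee assigned for more than 12 months is returned; moving a row from one fragment table to the other would not require any change to the query text.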

Distributed Database - User View

Distributed Database

Distributed DBMS - Reality
[Figure: several sites, each running DBMS software that serves user queries and user applications, connected to one another through a communication subsystem.]

Types of Transparency

◼ Data independence
◼ Network transparency (or distribution transparency)
❑ Location transparency
❑ Fragmentation transparency
◼ Fragmentation transparency
◼ Replication transparency

Reliability Through Transactions

◼ Replicated components and data should make distributed DBMS more reliable.
◼ Distributed transactions provide
❑ Concurrency transparency
❑ Failure atomicity
◼ Distributed transaction support requires implementation of
❑ Distributed concurrency control protocols
❑ Commit protocols
◼ Data replication
❑ Great for read-intensive workloads, problematic for updates
❑ Replication protocols
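
As an illustration of what a commit protocol has to achieve, the sketch below follows the outline of two-phase commit, one well-known commit protocol. The slides do not name a specific protocol, so this choice is an assumption, and the class and function names are hypothetical. Logging, timeouts, and recovery from coordinator or participant failures are deliberately left out.

```python
class Participant:
    """One site taking part in a distributed transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere or abort everywhere."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)

    # Phase 2 (decision): broadcast the single global outcome.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "COMMIT" if decision else "ABORT"
```

Because every participant ends in the same state, the transaction is atomic across sites: a single negative vote is enough to abort it everywhere.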

Potentially Improved Performance

◼ Proximity of data to its points of use
❑ Requires some support for fragmentation and replication
◼ Parallelism in execution
❑ Inter-query parallelism
❑ Intra-query parallelism
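
The two forms of parallelism can be contrasted with a small sketch that uses a thread pool from Python's standard library as a stand-in for separate sites. The fragment contents, the site names, and the helper functions are hypothetical; a real system would run the local work on different machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of one numeric column, one fragment per site.
FRAGMENTS = {
    "site_a": [10, 20, 30],
    "site_b": [40, 50],
    "site_c": [60],
}


def local_sum(site):
    """Work that one site performs on its own fragment."""
    return sum(FRAGMENTS[site])


def local_count(site):
    """A different, independent piece of work on one fragment."""
    return len(FRAGMENTS[site])


with ThreadPoolExecutor(max_workers=3) as pool:
    # Intra-query parallelism: ONE query (a global sum) is split into
    # per-fragment subqueries whose partial results are combined.
    intra_result = sum(pool.map(local_sum, FRAGMENTS))

    # Inter-query parallelism: TWO independent queries are in flight
    # at the same time, each touching a different site.
    sum_future = pool.submit(local_sum, "site_a")
    count_future = pool.submit(local_count, "site_b")
    inter_results = (sum_future.result(), count_future.result())
```

Intra-query parallelism relies on the data being fragmented, which is why the slide ties performance to support for fragmentation and replication.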

Scalability

◼ Issue is database scaling and workload scaling
◼ Adding processing and storage power
❑ Scale-out: add more servers
❑ Scale-up: increase the capacity of one server → has limits

Outline
◼ Introduction

❑ Design issues

Distributed DBMS Issues

◼ Distributed database design
❑ How to distribute the database
❑ Replicated & non-replicated database distribution
❑ A related problem in directory management
◼ Distributed query processing
❑ Convert user transactions to data manipulation instructions
❑ Optimization problem
◼ min{cost = data transmission + local processing}
❑ General formulation is NP-hard
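
The objective min{cost = data transmission + local processing} can be made concrete with a toy cost model. Everything numeric below is an assumption chosen for illustration: the unit costs, the tuple counts, and the three candidate plans for joining two relations stored at different sites are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution strategy (hypothetical figures)."""
    name: str
    tuples_shipped: int    # tuples sent over the network
    tuples_processed: int  # tuples handled by local processing


# Assumed unit costs: shipping a tuple is taken to be ten times as
# expensive as processing it locally.
COST_PER_TUPLE_SHIPPED = 10.0
COST_PER_TUPLE_PROCESSED = 1.0


def cost(plan):
    """cost = data transmission + local processing."""
    transmission = plan.tuples_shipped * COST_PER_TUPLE_SHIPPED
    local = plan.tuples_processed * COST_PER_TUPLE_PROCESSED
    return transmission + local


plans = [
    Plan("ship EMP to the ASG site", tuples_shipped=400, tuples_processed=1400),
    Plan("ship ASG to the EMP site", tuples_shipped=1000, tuples_processed=1400),
    Plan("ship both to a third site", tuples_shipped=1400, tuples_processed=1400),
]

best = min(plans, key=cost)
```

With these figures the cheapest plan ships the smaller relation. Enumerating every plan is only feasible for tiny examples like this one, which is consistent with the slide's remark that the general formulation is NP-hard.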

Distributed DBMS Issues

◼ Distributed concurrency control
❑ Synchronization of concurrent accesses
❑ Consistency and isolation of transactions' effects
❑ Deadlock management

◼ Reliability
❑ How to make the system resilient to failures
❑ Atomicity and durability

Distributed DBMS Issues

◼ Replication
❑ Mutual consistency
❑ Freshness of copies
❑ Eager vs lazy
❑ Centralized vs distributed
◼ Parallel DBMS
❑ Objectives: high scalability and performance
❑ Not geo-distributed
❑ Cluster computing
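
The difference between eager and lazy replication listed above, and its effect on mutual consistency and freshness, can be sketched as follows. This is a simplified model under stated assumptions: it uses a centralized scheme in which copy 0 acts as the master, and the class and method names are hypothetical.

```python
class ReplicatedItem:
    """One data item with several copies; copy 0 is the master (assumed)."""

    def __init__(self, n_replicas, eager):
        self.copies = [0] * n_replicas
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def _propagate(self, value):
        for i in range(1, len(self.copies)):
            self.copies[i] = value

    def write(self, value):
        self.copies[0] = value
        if self.eager:
            # Eager: all copies are updated before the write completes.
            self._propagate(value)
        else:
            # Lazy: the write completes now; the other copies are refreshed later.
            self.pending.append(value)

    def refresh(self):
        """Lazy propagation step, run some time after the write."""
        for value in self.pending:
            self._propagate(value)
        self.pending.clear()

    def mutually_consistent(self):
        return len(set(self.copies)) == 1
```

With `eager=True` the copies are identical as soon as `write` returns, at the price of a slower update. With `eager=False` the secondary copies stay stale until `refresh` runs, which is the freshness issue mentioned in the list.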

Related Issues

◼ Alternative distribution approaches
❑ Modern P2P
❑ World Wide Web (WWW or Web)
◼ Big data processing
❑ 4V: volume, variety, velocity, veracity
❑ MapReduce & Spark
❑ Stream data
❑ Graph analytics
❑ NoSQL
❑ NewSQL
❑ Polystores

Outline
◼ Introduction

❑ Distributed DBMS architecture

DBMS Implementation Alternatives

Dimensions of the Problem

◼ Distribution
❑ Whether the components of the system are located on the same machine or not
◼ Heterogeneity
❑ Various levels (hardware, communications, operating system)
❑ The DBMS level is the important one
◼ data model, query language, transaction management algorithms
◼ Autonomy
❑ Not well understood and most troublesome
❑ Various versions
◼ Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
◼ Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
◼ Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.

Client/Server Architecture

Advantages of Client-Server Architectures
◼ More efficient division of labor
◼ Horizontal and vertical scaling of resources
◼ Better price/performance on client machines
◼ Ability to use familiar tools on client machines
◼ Client access to remote data (via standards)
◼ Full DBMS functionality provided to client workstations
◼ Overall better system price/performance

Database Server

Distributed Database Servers

Peer-to-Peer Component Architecture

MDBS Components & Execution

Mediator/Wrapper Architecture

Cloud Computing

On-demand, reliable services provided over the Internet in a cost-efficient manner
◼ IaaS – Infrastructure-as-a-Service
◼ PaaS – Platform-as-a-Service
◼ SaaS – Software-as-a-Service
◼ DaaS – Database-as-a-Service

Simplified Cloud Architecture
