Unit 3 (Distributed DBMS Architecture)
Architecture: The architecture of a system defines its structure:
The components of the system are identified;
The function of each component is specified;
The interrelationships and interactions among the
components are defined.
Motivation for Standardization of DDBMS Architecture
A DDBMS may be implemented as either a homogeneous or a heterogeneous
system.
Homogeneous DDBMS
All sites use the same DBMS product
It is much easier to design and manage
The approach provides incremental growth and allows increased
performance
Heterogeneous DDBMS
Sites may run different DBMS products, with possibly different
underlying data models
This occurs when sites have implemented their own databases first,
and integration is considered later
Translations are required to allow for different hardware and/or
different DBMS products
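The translation requirement can be illustrated with a minimal sketch. It assumes a hypothetical helper, limit_query, that renders one common request ("first n rows of a table") in the SQL dialect of the product running at each site. The product keys are illustrative labels chosen for this example, not part of any standard; only the row-limiting syntax of each product is taken as given.

```python
def limit_query(product: str, table: str, n: int) -> str:
    """Render 'first n rows of table' in the dialect of the given DBMS product.

    The product labels are hypothetical keys used only in this sketch.
    """
    if product in ("postgresql", "mysql", "sqlite"):
        return f"SELECT * FROM {table} LIMIT {n}"
    if product == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    if product == "oracle":  # row-limiting clause available from Oracle 12c
        return f"SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"no translator registered for {product!r}")


# The same global request, translated for three different sites.
for product in ("postgresql", "sqlserver", "oracle"):
    print(product, "->", limit_query(product, "emp", 5))
```

In a homogeneous DDBMS this translation layer is unnecessary, which is one reason such systems are easier to design and manage.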
Architectural Models for DDBMS: DDBMS architectures are classified
along three dimensions:
1. Autonomy
2. Distribution
3. Heterogeneity
1. Autonomy: Refers to the distribution of control (not of data) and
indicates the degree to which individual DBMSs can operate
independently.
Three types of autonomy:
1. Tight integration: a single image of the entire database is
available to any user who wants to share the information
(which may reside in multiple DBs); realized such that one
data manager is in control of the processing of each user
request. Ex.: homogeneous DDBMS
2. Semiautonomous systems: individual DBMSs can operate
independently, but have decided to participate in a
federation to make some of their local data sharable.
3. Total isolation: the individual systems are stand-alone
DBMSs, which know neither of the existence of other
DBMSs nor how to communicate with them; there is no
global control. Ex.: heterogeneous DDBMS
Autonomy has different dimensions
1. Design autonomy: each individual DBMS is free to use the data
models and transaction management techniques that it prefers.
2. Communication autonomy: each individual DBMS is free to
decide what information to provide to the other DBMSs
3. Execution autonomy: each individual DBMS can execute the
transactions that are submitted to it in any way that it wants to.
2. Distribution: Refers to the physical distribution of data over
multiple sites.
1. No distribution: the data are not distributed at all.
2. Client/Server distribution: data are concentrated on the
server, while clients provide the application environment/user
interface. This was the first attempt at distribution.
3. Peer-to-peer distribution (also called full distribution):
there is no distinction between client and server machines;
each machine has full DBMS functionality.
3. Heterogeneity: Refers to heterogeneity of the components at
various levels
Hardware
Communications
Operating system
DB components (e.g., data model, query language,
transaction management algorithms)
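The three-dimensional classification can be written down as a small data structure. The following is a minimal sketch: the class and member names (Autonomy, Distribution, Architecture) are chosen for this example, and heterogeneity is reduced to a single yes/no flag, which is a simplification of the levels listed above.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    TIGHT_INTEGRATION = 0
    SEMIAUTONOMOUS = 1
    TOTAL_ISOLATION = 2


class Distribution(Enum):
    NONE = 0
    CLIENT_SERVER = 1
    PEER_TO_PEER = 2


@dataclass(frozen=True)
class Architecture:
    autonomy: Autonomy
    distribution: Distribution
    heterogeneous: bool  # simplification: True if sites differ at any level

    def describe(self) -> str:
        kind = "heterogeneous" if self.heterogeneous else "homogeneous"
        return f"{kind}, {self.autonomy.name.lower()}, {self.distribution.name.lower()}"


# A homogeneous, tightly integrated, fully distributed system.
p2p = Architecture(Autonomy.TIGHT_INTEGRATION, Distribution.PEER_TO_PEER, heterogeneous=False)
print(p2p.describe())  # homogeneous, tight_integration, peer_to_peer
```

Each concrete system is one point in this space; the architectures discussed in the rest of the unit differ mainly in their position along the distribution dimension.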
Q. Elaborate Peer to Peer distributed architecture.
This architecture is used within the same organization.
In these systems, each peer acts both as a client and a server for
providing database services.
The peers share their resources with other peers and coordinate their
activities.
The physical data organization on each machine may be different.
This architecture generally has four levels of schemas:
Global Conceptual Schema –
Depicts the global logical view of data.
Describes the enterprise view of the data.
It is the union of the local conceptual schemas (LCSs).
The GCS is defined by integrating either the external
schemas of local autonomous databases or parts of
their local conceptual schemas
Local Conceptual Schema –
Depicts logical data organization at each site.
Required since the data are fragmented and
replicated
Local Internal Schema –
Depicts physical data organization at each site.
Describes the local physical data organization (which
might be different on each machine).
External Schema –
Depicts user view of data.
Describes the user/application view on the data
In this case, the ANSI/SPARC model is extended by the addition of a
global directory to permit the required global mappings.
The local mappings are still performed by the local directory/dictionary
(LD/D).
The local database management components are integrated by
means of global DBMS functions. Local conceptual schemas are
mappings of the global schema onto each site.
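The two levels of mapping can be sketched as two lookups: the global directory maps a global relation to its fragments and their sites, and each site's local directory maps a fragment to its physical storage. All relation names, site names, and file paths below are hypothetical and exist only for illustration.

```python
# Hypothetical global directory: global relation -> [(site, local fragment)].
GLOBAL_DIRECTORY = {
    "EMP": [("site1", "EMP_1"), ("site2", "EMP_2")],
    "PROJ": [("site2", "PROJ_1")],
}

# Hypothetical local directories: per site, local fragment -> physical location.
# The physical organization may differ from site to site.
LOCAL_DIRECTORIES = {
    "site1": {"EMP_1": "/data/emp_1.heap"},
    "site2": {"EMP_2": "/data/emp_2.btree", "PROJ_1": "/data/proj_1.heap"},
}


def resolve(global_relation):
    """Return (site, fragment, physical location) for each fragment of a global relation."""
    resolved = []
    for site, fragment in GLOBAL_DIRECTORY[global_relation]:    # global mapping
        location = LOCAL_DIRECTORIES[site][fragment]            # local mapping
        resolved.append((site, fragment, location))
    return resolved


print(resolve("EMP"))
```

A user who queries EMP sees a single relation; the fact that it is stored as two fragments with different physical organizations is hidden by the two mappings.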
The detailed components of a distributed DBMS.
• Two major components:
user processor
data processor
User processor
User interface handler - is responsible for interpreting user
commands as they come in, and formatting the result data as it is
sent to the user.
Semantic data controller - uses the integrity constraints and
authorizations that are defined as part of the global conceptual
schema to check if the user query can be processed.
Global query optimizer and decomposer - determines an
execution strategy to minimize a cost function, and translates the
global queries into local ones using the global and local conceptual
schemas as well as the global directory.
Distributed execution monitor - coordinates the distributed
execution of the user request.
Data processor
Local query optimizer - is responsible for choosing the best access
path to access any data item.
Local recovery manager - is responsible for making sure that the
local database remains consistent even when failures occur.
Run-time support processor - physically accesses the database
according to the physical commands in the schedule generated by
the query optimizer. This is the interface to the operating system and
contains the database buffer (or cache) manager, which is
responsible for maintaining the main memory buffers and managing
the data accesses.
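The division of labour between the two components can be sketched in a few lines. This is a toy model under strong assumptions: the class names mirror the component names, fragments are in-memory lists, a "query" is just a Python predicate, and the decomposer sends the same subquery to every site without any cost-based optimization or recovery handling.

```python
class DataProcessor:
    """Stand-in for one site's data processor, holding a local fragment in memory."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, predicate):
        # Local execution: scan the local fragment and keep the matching rows.
        return [row for row in self.rows if predicate(row)]


class UserProcessor:
    """Stand-in for the user processor at the site where the query is submitted."""

    def __init__(self, sites, authorized_users):
        self.sites = sites                        # site name -> DataProcessor
        self.authorized_users = authorized_users  # stand-in for GCS authorizations

    def run(self, user, predicate):
        # Semantic data controller: reject requests that are not authorized.
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not run this query")
        # Decomposer: the global query becomes one identical subquery per site.
        # Distributed execution monitor: run the subqueries and merge the results.
        merged = []
        for site_name in sorted(self.sites):
            merged.extend(self.sites[site_name].execute(predicate))
        return merged


sites = {
    "site1": DataProcessor([{"name": "Ada", "salary": 120}]),
    "site2": DataProcessor([{"name": "Grace", "salary": 90},
                            {"name": "Edsger", "salary": 150}]),
}
user_processor = UserProcessor(sites, authorized_users={"alice"})
high_paid = user_processor.run("alice", lambda row: row["salary"] > 100)
print([row["name"] for row in high_paid])  # ['Ada', 'Edsger']
```

The user processor never touches stored data directly; it only checks, decomposes, and coordinates, while each data processor works on its own local fragment.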
Q. Explain Client/Server Database architecture
This architecture is quite common in relational systems where
the communication between the clients and the server(s) is at
the level of SQL statements
It is based on the reference model of a DBMS.
It provides a two-level architecture, which makes it easier to
manage the complexity of modern DBMSs and the complexity
of distribution.
It gives a more efficient division of work.
The functionality is divided into two classes:
1. Server functions
2. Client functions
A client is defined as a requester of services.
A server is defined as a provider of services.
It is very easy to manage.
A client sends a query to one of the servers. The earliest
available server solves it and replies.
The server does most of the data management work (query
processing and optimization, transaction management, storage
management).
The client is the application and the user interface
(managing the data that is cached at the client and
managing the transaction locks).
Server functions
o Mainly data management, including query processing,
optimization, transaction management, etc.
o Query processing and optimization: the server executes the
query in such a way that the result is produced optimally
(choosing the best access path to execute the query).
o Transaction management: takes care of the ACID
properties while transactions are running.
o Local recovery manager: is responsible for making sure
that the local database remains consistent even when
failures occur.
Client functions
o Might also include some data management functions
(consistency checking, transaction management, etc.),
not just the user interface.
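The client/server split at the level of SQL statements can be sketched with Python's built-in sqlite3 module. This is an in-process illustration only: the DbServer and DbClient classes and the emp table are hypothetical, and a method call stands in for the network message a real client would send.

```python
import sqlite3


class DbServer:
    """Server side: owns the database and does all data management."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
        self._db.executemany(
            "INSERT INTO emp (id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
        )
        self._db.commit()

    def handle(self, sql, params=()):
        # Query processing, optimization, and transaction management happen
        # here; the connection context commits on success and rolls back on error.
        with self._db:
            return self._db.execute(sql, params).fetchall()


class DbClient:
    """Client side: application logic and user interface; it only ships SQL text."""

    def __init__(self, server):
        self._server = server

    def employee_names(self):
        rows = self._server.handle("SELECT name FROM emp ORDER BY id")
        return [name for (name,) in rows]


client = DbClient(DbServer())
print(client.employee_names())  # ['Ada', 'Grace']
```

The client knows nothing about how the table is stored or how the query is executed; it only formulates the request and presents the result.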
Different types of client/server architecture
o Multiple client/single server
o Multiple client/multiple server
Multiple client/single server:
This is not a true distributed DBMS environment.
It is a centralized DBMS environment: there is only one server
database, and the data is stored at one site.
From a data management perspective, this is not much different from
centralized databases since the database is stored on only one
machine (the server) which also hosts the software to manage it.
Multiple client/multiple server:
In this type of architecture there are two possibilities:
o One server at a time: a client can be connected to only one
server at a time. If the client wants to connect to another
server, it has to disconnect from the current server first and
can then connect to the other one.
o Many servers at a time: multiple clients can be connected to
multiple servers at the same time.
Data can be fetched from more than one server.
It is a true DDBMS environment.
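The difference between the two connection policies can be sketched as follows. The class names (SiteServer, OneAtATimeClient, ManyAtATimeClient) and the sample rows are hypothetical; the sketch only models which servers a client may hold open at once, not real networking.

```python
class SiteServer:
    """Stand-in for one database server holding some rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def query(self):
        return list(self.rows)


class OneAtATimeClient:
    """Client that may be connected to only one server at a time."""

    def __init__(self):
        self.current = None

    def connect(self, server):
        if self.current is not None:
            raise RuntimeError("disconnect from the current server first")
        self.current = server

    def disconnect(self):
        self.current = None

    def fetch(self):
        return self.current.query()


class ManyAtATimeClient:
    """Client that may hold connections to several servers simultaneously."""

    def __init__(self):
        self.servers = []

    def connect(self, server):
        self.servers.append(server)

    def fetch(self):
        # Data is fetched from every connected server and merged.
        merged = []
        for server in self.servers:
            merged.extend(server.query())
        return merged


s1 = SiteServer("s1", ["emp-1", "emp-2"])
s2 = SiteServer("s2", ["emp-3"])

multi = ManyAtATimeClient()
multi.connect(s1)
multi.connect(s2)
print(multi.fetch())  # ['emp-1', 'emp-2', 'emp-3']
```

With OneAtATimeClient, reading from both servers requires connect, fetch, disconnect, and then the same sequence against the second server, with the application merging the results itself.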