0% found this document useful (0 votes)
101 views36 pages

DDBMS Handbook (MSE 1)

This document provides an overview of distributed database management systems (DDBMS). It discusses that a distributed database is a set of interconnected databases distributed over a computer network. A DDBMS manages the distributed database and provides mechanisms to make the databases transparent to users. The document also describes some key aspects of DDBMS including types of DDBMS, distributed database architectures, and design alternatives for distributed databases.

Uploaded by

qwerty
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
101 views36 pages

DDBMS Handbook (MSE 1)

This document provides an overview of distributed database management systems (DDBMS). It discusses that a distributed database is a set of interconnected databases distributed over a computer network. A DDBMS manages the distributed database and provides mechanisms to make the databases transparent to users. The document also describes some key aspects of DDBMS including types of DDBMS, distributed database architectures, and design alternatives for distributed databases.

Uploaded by

qwerty
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 36

Distributed DBMS

(2170714)

Hand Book
Year: 2020-21

CE & IT Department

Prepared By: Prof. Bansari Thakkar, Asst. Prof. CE Dept, AIT


Index
1.0 Introduction to DBMS 3
2.0 : Database and Database management systems 3
3.0 : Database schema 4
4.0: Types of DBMS 4
5.0: Operations on DBMS 6
6.0. Distributed DBMS 8
7.0 Types of Distributed DBMS 12
8.0 Distributed DBMS Architectures 13
9.0 RDBMS 14
10.0 Data Intigrity 16
11.0 Database Normalization 16
12.0 Relational algebra 21
13.0 Translationg SQL querries into Relational algebra 24
14.0 Distributed DBMS Architecture- Models.
26
15.0 Design Alternatives 30
16.0 Replication 32
17.0 Fragmentation 33
18. 0 Transperancies. 35
Introduction to DBMS
What is DBMS?
This chapter gives an overview of databases and Database Management Systems (DBMS).
A database is an ordered collection of related data. A DBMS is a software package to work
upon a database. A detailed study of DBMS is available in our tutorial named “Learn
DBMS”. In this chapter, we revise the main concepts so that the study of DDBMS can be
done with ease. The three topics covered are database schemas, types of databases and
operations on databases

Database and Database Management System

A database is an ordered collection of related data that is built for a specific purpose. A
database may be organized as a collection of multiple tables, where a table represents a real
world element or entity. Each table has several different fields that represent the characteristic
features of the entity.

For example, a company database may include tables for projects, employees, departments,
products and financial records. The fields in the Employee table may be Name, Company_Id,
Date_of_Joining, and so forth.

A database management system is a collection of programs that enables creation and


maintenance of a database. DBMS is available as a software package that facilitates definition,
construction, manipulation and sharing of data in a database. Definition of a database includes
description of the structure of a database. Construction of a database involves actual storing of
the data in any storage medium. Manipulation refers to the retrieving information from the
database, updating the database and generating reports. Sharing of data facilitates data to be
accessed by different users or programs.
Examples of DBMS Application Areas

 Automatic Teller Machines


 Train Reservation System
 Employee Management System
 Student Information System

Prepared By: Bansari Thakkar


Page 3
Examples of DBMS Packages

 MySQL
 Oracle
 SQL Server
 dBASE
 FoxPro
 PostgreSQL, etc.

Database Schema
 A database schema is a description of the database which is specified during database
design and subject to infrequent alterations.
 It defines the organizationof the data, the relationships among them, and the constraints
associated with them.
 Databases are often represented through the three-schema architecture or ANSISPARC
architecture. The goal of this architecture is to separate the user application from the
physical database. The three levels are −
 Internal Level having Internal Schema − It describes the physical structure, details of
internal storage and access paths for the database.
 Conceptual Level having Conceptual Schema − It describes the structure of the whole
database while hiding the details of physical storage of data. This illustrates the entities,
attributes with their data types and constraints, user operations and relationships.
 External or View Level having External Schemas or Views − It describes the portion of a
database relevant to a particular user or a group of users while hiding the rest of database.

Types of DBMS
There are four types of DBMS.
Hierarchical DBMS
In hierarchical DBMS, the relationships among data in the database are established so that one
data element exists as a subordinate of another. The data elements have parent-child relationships
and are modelled using the “tree” data structure. These are very fast and simple.

Prepared By: Bansari Thakkar


Page 4
Network DBMS
Network DBMS in one where the relationships among data in the database are of type many-to-
many in the form of a network. The structure is generally complicated due to the existence of
numerous many-to-many relationships. Network DBMS is modelled using “graph” data
structure.

Relational DBMS
In relational databases, the database is represented in the form of relations. Each relation models
an entity and is represented as a table of values. In the relation or table, a row is called a tuple
and denotes a single record. A column is called a field or an attribute and denotes a characteristic
property of the entity. RDBMS is the most popular database management system.

For example − A Student Relation −

Prepared By: Bansari Thakkar


Page 5
Object Oriented DBMS
Object-oriented DBMS is derived from the model of the object-oriented programming paradigm.
They are helpful in representing both consistent data as stored in databases, as well as transient
data, as found in executing programs. They use small, reusable elements called objects. Each
object contains a data part and a set of operations which works upon the data. The object and its
attributes are accessed through pointers instead of being stored in relational table models.

For example − A simplified Bank Account object-oriented database −

Distributed DBMS
A distributed database is a set of interconnected databases that is distributed over the computer
network or internet. A Distributed Database Management System (DDBMS) manages the
distributed database and provides mechanisms so as to make the databases transparent to the
users. In these systems, data is intentionally distributed among multiple nodes so that all
computing resources of the organization can be optimally used.

M operations on a DBMS
The four basic operations on a database are Create, Retrieve, Update and Delete.

CREATE database structure and populate it with data − Creation of a database relation
involves specifying the data structures, data types and the constraints of the data to be
stored.

Example − SQL command to create a student table −

CREATE TABLE STUDENT (

ROLL INTEGER PRIMARY KEY,

NAME VARCHAR2(25),

YEAR INTEGER,

Prepared By: Bansari Thakkar


Page 6
STREAM VARCHAR2(10));

Once the data format is defined, the actual data is stored in accordance with the format in
some storage medium.

Example SQL command to insert a single tuple into the student table −

INSERT INTO STUDENT ( ROLL, NAME, YEAR, STREAM)

VALUES (1,'ANKIT JHA',1,'COMPUTER SCIENCE');

RETRIEVE information from the database – Retrieving information generally involves


selecting a subset of a table or displaying data from the table after some computations
have been done. It is done by querying upon the table.

Example − To retrieve the names of all students of the Computer Science stream, the
following SQL query needs to be executed −

SELECT NAME FROM STUDENT

WHERE STREAM ='COMPUTER SCIENCE';

UPDATE information stored and modify database structure – Updating a table involves
changing old values in the existing table’s rows with new values.

Example − SQL command to change stream from Electronics to Electronics and


Communications −

UPDATE STUDENT

SET STREAM ='ELECTRONICS AND COMMUNICATIONS'

WHERE STREAM ='ELECTRONICS';

Modifying database means to change the structure of the table. However, modification of
the table is subject to a number of restrictions.

Example − To add a new field or column, say address to the Student table, we use the
following SQL command −

ALTER TABLE STUDENT

ADD ( ADDRESS VARCHAR2(50));

Prepared By: Bansari Thakkar


Page 7
DELETE information stored or delete a table as a whole – Deletion of specific
information involves removal of selected rows from the table that satisfies certain
conditions.

Example − To delete all students who are in 4th year currently when they are passing
out, we use the SQL command −

DELETE FROM STUDENT

WHERE YEAR =4;

Alternatively, the whole table may be removed from the database.

Example − To remove the student table completely, the SQL command used is −

DROP TABLE STUDENT;

This chapter introduces the concept of DDBMS. In a distributed database, there are a number of
databases that may be geographically distributed all over the world. A distributed DBMS
manages the distributed database in a manner so that it appears as one single database to users.
In the later part of the chapter, we go on to study the factors that lead to distributed databases,
its advantages and disadvantages.

Distributed DBMS

A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations that communicate via a computer network.

1) Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.

2) Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.

3) The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.

4) A distributed database is not a loosely connected file system.

5) A distributed database incorporates transaction processing, but it is not synonymous with a


transaction processing system.

Prepared By: Bansari Thakkar


Page 8
Distributed database management system (DDBMS)

A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database in a manner as if it were all stored in a single location.
Features
1) It is used to create, retrieve, update and delete distributed databases.

2) It synchronizes the database periodically and provides access mechanisms by the virtue of
which the distribution becomes transparent to the users.

3) It ensures that the data modified at any site is universally updated.

4) It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.

5) It is designed for heterogeneous database platforms.

6) It maintains confidentiality and data integrity of the databases.

Factors Encouraging DDBMS


The following factors encourage moving over to DDBMS −

Distributed Nature of Organizational Units − Most organizations in the current times are
subdivided into multiple units that are physically distributed over the globe. Each unit requires
its own set of local data. Thus, the overall database of the organization becomes distributed.

Need for Sharing of Data − The multiple organizational units often need to communicate with
each other and share their data and resources. This demands common databases or replicated
databases that should be used in a synchronized manner.

Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work upon diversified systems which may have common data.
Distributed database systems aid both these processing by providing synchronized data.

Database Recovery − One of the common techniques used in DDBMS is replication of data
across different sites. Replication of data automatically helps in data recovery if database in any

Prepared By: Bansari Thakkar


Page 9
site is damaged. Users can access data from other sites while the damaged site is being
reconstructed. Thus, database failure may become almost inconspicuous to users.

Support for Multiple Application Software − Most organizations use a variety of application
software each with its specific database support. DDBMS provides a uniform functionality for
using the same data among different platforms.

Advantages of Distributed Databases

Following are the advantages of distributed databases over centralized databases.

Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers
and local data to the new site and finally connecting them to the distributed system, with no
interruption in current functions.

More Reliable − In case of database failures, the total system of centralized databases comes to a
halt. However, in distributed systems, when a component fails, the functioning of the system
continues may be at a reduced performance. Hence DDBMS is more reliable.

Better Response − If data is distributed in an efficient manner, then user requests can be met
from local data itself, thus providing faster response. On the other hand, in centralized systems,
all queries have to pass through the central computer for processing, which increases the
response time.

Lower Communication Cost − In distributed database systems, if data is located locally where it
is mostly used, then the communication costs for data manipulation can be minimized. This is
not feasible in centralized systems.

Prepared By: Bansari Thakkar


Page 10
Adversities of Distributed Databases

Following are some of the adversities associated with distributed databases.

Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several sites.

Processing overhead − Even simple operations may require a large number of


communications and additional calculations to provide uniformity in data across the sites.

Data integrity − The need for updating data in multiple sites pose problems of data
integrity.

Overheads for improper data distribution − Responsiveness of queries is largely


dependent upon proper data distribution. Improper data distribution often leads to very
slow response to user requests.

In this part of the tutorial, we will study the different aspects that aid in designing distributed
database environments. This chapter starts with the types of distributed databases. Distributed
databases can be classified into homogeneous and heterogeneous databases having further
divisions. The next section of this chapter discusses the distributed architectures namely client –
server, peer – to – peer and multi – DBMS. Finally, the different design alternatives like
replication and fragmentation are introduced.

Prepared By: Bansari Thakkar


Page 11
Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions, as shown in the following illustration.

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems.
Its properties are −

1) The sites use very similar software.

2) The sites use identical DBMS or DBMS from the same vendor.

3) Each site is aware of all other sites and cooperates with other sites to process user requests.

4) The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −

Autonomous − Each database is independent that functions on its own. They are integrated by a
controlling application and use message passing to share data updates.

Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.

Prepared By: Bansari Thakkar


Page 12
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are −

1) Different sites use dissimilar schemas and software.

2) The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.

3) Query processing is complex due to dissimilar schemas.

4) Transaction processing is complex due to dissimilar software.

5) A site may not be aware of other sites and so there is limited co-operation in processing user
requests.

Types of Heterogeneous Distributed Databases


Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.

Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.

Distributed DBMS Architectures


DDBMS architectures are generally developed depending on three parameters −

Distribution − It states the physical distribution of data across the different sites.

Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.

Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system


components and databases.

Prepared By: Bansari Thakkar


Page 13
RDBMS

RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and
for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft
Access.

A Relational database management system (RDBMS) is a database management system


(DBMS) that is based on the relational model as introduced by E. F. Codd.

The data in an RDBMS is stored in database objects which are called as tables. This table is
basically a collection of related data entries and it consists of numerous columns and rows.

Remember, a table is the most common and simplest form of data storage in a relational
database. The following program is an example of a EMPLOYEE table −

+----+----------+-----+-----------+----------+

| ID | NAME | AGE | ADDRESS | SALARY |

+----+----------+-----+-----------+----------+

| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |

| 2 | Khilan | 25 | Delhi | 1500.00 |

| 3 | kaushik | 23 | Kota | 2000.00 |

| 4 | Chaitali | 25 | Mumbai | 6500.00 |

| 5 | Hardik | 27 | Bhopal | 8500.00 |

| 6 | Komal | 22 | MP | 4500.00 |

| 7 | Muffy | 24 | Indore | 10000.00 |

+----+----------+-----+-----------+----------+

Every table is broken up into smaller entities called fields. The fields in the CUSTOMERS
table consist of ID, NAME, AGE, ADDRESS and SALARY.

A field is a column in a table that is designed to maintain specific information about every
record in the table.

Prepared By: Bansari Thakkar


Page 14
Record or a Row?
A record is also called as a row of data is each individual entry that exists in a table. For
example, there are 7 records in the above CUSTOMERS table.

A record is a horizontal entity in a table.


column
A column is a vertical entity in a table that contains all information associated with a specific
field in a table.

For example, a column in the CUSTOMERS table is ADDRESS, which represents location
description

NULL value
A NULL value in a table is a value in a field that appears to be blank, which means a field with a
NULL value is a field with no value.

It is very important to understand that a NULL value is different than a zero value or a field that
contains spaces. A field with a NULL value is the one that has been left blank during a record
creation.
Constraints
Constraints are the rules enforced on data columns on a table. These are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Constraints can either be column level or table level. Column level constraints are applied only
to one column whereas, table level constraints are applied to the entire table.

Following are some of the most commonly used constraints available in SQL

NOT NULL Constraint − Ensures that a column cannot have a NULL value.

DEFAULT Constraint − Provides a default value for a column when none is specified.

UNIQUE Constraint − Ensures that all the values in a column are different.

PRIMARY Key − Uniquely identifies each row/record in a database table.

FOREIGN Key − Uniquely identifies a row/record in any another database table.

CHECK Constraint − The CHECK constraint ensures that all values in a column satisfy
certain conditions.

INDEX − Used to create and retrieve data from the database very quickly.

Prepared By: Bansari Thakkar


Page 15
data integrity

The following categories of data integrity exist with each RDBMS −

Entity Integrity − There are no duplicate rows in a table.

Domain Integrity − Enforces valid entries for a given column by restricting the type, the format,
or the range of values.

Referential integrity − Rows cannot be deleted, which are used by other records.

User-Defined Integrity − Enforces some specific business rules that do not fall into entity,
domain or referential integrity.

Database normalization

Database normalization is the process of efficiently organizing data in a database. There are two
reasons of this normalization process −

Eliminating redundant data, for example, storing the same data in more than one table.

Ensuring data dependencies make sense.

Both these reasons are worthy goals as they reduce the amount of space a database consumes and
ensures that data is logically stored. Normalization consists of a series of guidelines that help
guide you in creating a good database structure.

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any
database administrator. Managing a database with anomalies is next to impossible.

anomalies
Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly while
a few others are left with old values. Such instances leave the database in an inconsistent
state.

Prepared By: Bansari Thakkar


Page 16
Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because
of unawareness, the data is also saved somewhere else.

Insert anomalies − We tried to insert data in a record that does not exist at all.

Normalization

It is a method to remove all these anomalies and bring the database to a consistent state.

Normalization guidelines are divided into normal forms; think of a form as the format or the way
a database structure is laid out. The aim of normal forms is to organize the database structure, so
that it complies with the rules of first normal form, then second normal form and finally the third
normal form.

It is your choice to take it further and go to the fourth normal form, fifth normal form and so on,
but in general, the third normal form is more than enough.

1) First Normal Form (1NF)


2) Second Normal Form (2NF)
3) Third Normal Form (3NF)
4) Boyce-Codd Normal Form (BCNF)

First Normal Form (1NF)

First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all
the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units.

Prepared By: Bansari Thakkar


Page 17
We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form (2NF)


Before we learn about the second normal form, we need to understand the following −

Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime


attribute.

Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a


non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on prime key attribute. That is, if X → A holds, then there should not be any proper
subset Y of X, for which Y → A also holds true.

Prepared By: Bansari Thakkar


Page 18
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon
both and not on any of the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.

We broke the relation in two as depicted in the above picture. So there exists no partial
dependency.

Third Normal Form (3NF)

For a relation to be in Third Normal Form, it must be in Second Normal form and the following
must satisfy −

 No non-prime attribute is transitively dependent on prime key attribute.

 For any non-trivial functional dependency, X → A, then either −

 X is a superkey or,
 A is prime attribute.

Prepared By: Bansari Thakkar


Page 19
We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a
superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.

To bring this relation into third normal form, we break the relation into two relations as follows −

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms.
BCNF states that -

 For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-
key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip

and

Zip → City

Which confirms that both the relations are in BCNF.

Prepared By: Bansari Thakkar


Page 20
Relational algebra

Relational algebra defines the basic set of operations of relational database model. A sequence
of relational algebra operations forms a relational algebra expression. The result of this
expression represents the result of a database query.

The basic operations are −

 Projection

 Selection

 Union

 Intersection

 Minus

 Join
Projection
Projection operation displays a subset of fields of a table. This gives a vertical partition of the
table.

Syntax in Relational Algebra

π<AttributeList>(<TableName>)
For example, let us consider the following Student database –

STUDENT

Roll_No Name Course Semester Gender

2 Amit Prasad BCA 1 Male

4 Varsha Tiwari BCA 1 Female

5 Asif Ali MCA 2 Male

6 Joe Wallace MCA 1 Male

8 Shivani Iyengar BCA 1 Female

Prepared By: Bansari Thakkar


Page 21
If we want to display the names and courses of all students, we will use the following relational
algebra expression −

πName,Course(STUDENT)
Selection
Selection operation displays a subset of tuples of a table that satisfies certain conditions. This
gives a horizontal partition of the table.

Syntax in Relational Algebra

σ<Conditions>(<TableName>)

For example, in the Student table, if we want to display the details of all students who have
opted for MCA course, we will use the following relational algebra expression −

σCourse="mCA"(STUDENT)
Combination of Projection and Selection Operations
For most queries, we need a combination of projection and selection operations. There are two
ways to write these expressions −

 Using sequence of projection and selection operations.

 Using rename operation to generate intermediate results.

For example, to display names of all female students of the BCA course −

 Relational algebra expression using sequence of projection and selection operations

πName(σGender="Female"ANDCourse="BCA"(STUDENT))

Relational algebra expression using rename operation to generate intermediate results

FemaleBCAStudent←σGender="Female"ANDCourse="BCA"(STUDENT)
Result←πName(FemaleBCAStudent)

Union
If P is a result of an operation and Q is a result of another operation, the union of P and Q
(p∪ Qp∪ Q) is the set of all tuples that is either in P or in Q or in both without duplicates.

Prepared By: Bansari Thakkar


Page 22
For example, to display all students who are either in Semester 1 or are in BCA course −

Sem1Student←σSemester=1(STUDENT)
BCAStudent←σCourse="BCA"(STUDENT)
Result←Sem1Student∪BCAStudent
Intersection
If P is a result of an operation and Q is a result of another operation, the intersection of P and Q
( p∩ Qp∩ Q ) is the set of all tuples that are in P and Q both.

For example, given the following two schemas −

EMPLOYEE

EmpID Name City Department Salary

PROJECT

Pid City Department Status

To display the names of all cities where a project is located and also an employee resides −

CityEmp←πCity(EMPLOYEE)
CityProject←πCity(PROJECT)
Result←CityEmp∩CityProject
Minus
If P is a result of an operation and Q is a result of another operation, P - Q is the set of all tuples
that are in P and not in Q.

For example, to list all the departments which do not have an ongoing project (projects with status
= ongoing) –

AllDept←πDepartment(EMPLOYEE)
ProjectDept←πDepartment(σStatus="ongoing"(PROJECT)) Result←AllDept−ProjectDept
Join
Join operation combines related tuples of two different tables (results of queries) into a single
table.

Prepared By: Bansari Thakkar


Page 23
For example, consider two schemas, Customer and Branch in a Bank database as follows −
CUSTOMER

CustID AccNo TypeOfAc BranchID DateOfOpening

BRANCH

BranchID BranchName IFSCcode Address

To list the employee details along with branch details −

Result←CUSTOMER⋈ Customerid.BranchID⋈ Branch

u Translating queries into Relational Algebra

SQL queries are translated into equivalent relational algebra expressions before optimization. A
query is at first decomposed into smaller query blocks. These blocks are translated to equivalent
relational algebra expressions. Optimization includes optimization of each block and then
optimization of the query as a whole.

Examples
Let us consider the following schemas −

EMPLOYEE

EmpID Name City Department Salary

PROJECT

PId City Department Status

WORKS

EmpID PID Hours

Example 1

To display the details of all employees who earn a salary LESS than the average salary, we write
the SQL query −

SELECT * FROM EMPLOYEE

WHERE SALARY <( SELECT AVERAGE(SALARY) FROM EMPLOYEE );

This query contains one nested sub-query. So, this can be broken down into two blocks.

Prepared By: Bansari Thakkar


Page 24
The inner block is −

SELECT AVERAGE(SALARY)FROM EMPLOYEE ;

If the result of this query is AvgSal, then outer block is −

SELECT * FROM EMPLOYEE WHERE SALARY <AvgSal;

Relational algebra expression for inner block −

AvgSal←IAVERAGE(Salary)EMPLOYEE

Relational algebra expression for outer block −

σSalary<AvgSal>EMPLOYEE

Example 2

To display the project ID and status of all projects of employee 'Arun Kumar', we write the SQL
query −

SELECT PID, STATUS FROM PROJECT

WHERE PID =( SELECT FROM WORKS WHERE EMPID =( SELECT EMPID FROM EMPLOYEE

WHERE NAME ='ARUN KUMAR'));

This query contains two nested sub-queries. Thus, can be broken down into three blocks, as f
follows −

SELECT EMPID FROM EMPLOYEE WHERE NAME ='ARUN KUMAR';

SELECT PID FROM WORKS WHERE EMPID =ArunEmpID;

SELECT PID, STATUS FROM PROJECT WHERE PID =ArunPID;

(Here ArunEmpID and ArunPID are the results of inner queries)

Relational algebra expressions for the three blocks are −

ArunEmpID←πEmpID(σName="ArunKumar"(EMPLOYEE))
ArunPID←πPID(σEmpID="ArunEmpID"(WORKS)) Result←πPID,Status(σPID="ArunPID"(PROJECT))

Distributed DBMS Architecture- Models.


Prepared By: Bansari Thakkar
Page 25
Some of the common architectural models are −

 Client - Server Architecture for DDBMS

 Peer - to - Peer Architecture for DDBMS

 Multi - DBMS Architecture

Client - Server Architecture for DDBMS


This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions include mainly user interface. However, they have some
functions like consistency checking and transaction management.

The two different client - server architecture are −

 Single Server Multiple Client

 Multiple Server Multiple Client (shown in the following diagram)

Prepared By: Bansari Thakkar


Page 26
Peer- to-Peer Architecture for DDBMS
In these systems, each peer acts both as a client and a server for imparting database services.
The peers share their resource with other peers and co-ordinate their activities.

This architecture generally has four levels of schemas −

Global Conceptual Schema − Depicts the global logical view of data.

Local Conceptual Schema − Depicts logical data organization at each site.

Local Internal Schema − Depicts physical data organization at each site.

External Schema − Depicts user view of data.

Prepared By: Bansari Thakkar


Page 27
Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or more autonomous database
systems.

Multi-DBMS can be expressed through six levels of schemas −

Multi-database View Level − Depicts multiple user views comprising of subsets of the
integrated distributed database.

Multi-database Conceptual Level − Depicts integrated multi-database that comprises of


global logical multi-database structure definitions.

Multi-database Internal Level − Depicts the data distribution across different sites and multi-
database to local data mapping.

Local database View Level − Depicts public view of local data.

Local database Conceptual Level − Depicts local data organization at each site.

Local database Internal Level − Depicts physical data organization at each site.

There are two design alternatives for multi-DBMS −

Prepared By: Bansari Thakkar


Page 28
 Model with multi-database conceptual level.

 Model without multi-database conceptual level.

Prepared By: Bansari Thakkar


Page 29
design alternatives

The distribution design alternatives for the tables in a DDBMS are as follows −

 Non-replicated and non-fragmented

 Fully replicated

 Partially replicated

 Fragmented

 Mixed

Prepared By: Bansari Thakkar


Page 30
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so that it is
at a close proximity to the site where it is used most. It is most suitable for database systems
where the percentage of queries needed to join information in tables placed at different sites is
low. If an appropriate distribution strategy is adopted, then this design alternative helps to
reduce the communication cost during data processing.

Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since, each
site has its own copy of the entire database, queries are very fast requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost during
update operations. Hence, this is suitable for systems where a large number of queries is
required to be handled whereas the number of database updates is low.

Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is
done in accordance to the frequency of access. This takes into consideration the fact that the
frequency of accessing the tables vary considerably from site to site. The number of copies of the
tables (or portions) depends on how frequently the access queries execute and the site which
generate the access queries.

Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of each
fragment in the system, i.e. no redundant data.

The three fragmentation techniques are −

 Vertical fragmentation

 Horizontal fragmentation

 Hybrid fragmentation

Prepared By: Bansari Thakkar


Page 31
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially replicated
across the different sites according to the frequency of accessing the fragments.

In the last chapter, we had introduced different design alternatives. In this chapter, we will study
the strategies that aid in adopting the designs. The strategies can be broadly divided into
replication and fragmentation. However, in most cases, a combination of the two is used.

Replication
Data replication is the process of storing separate copies of the database at two or more sites. It
is a popular fault tolerance technique of distributed databases.

Advantages of Data Replication


Reliability − In case of failure of any site, the database system continues to work since a copy is
available at another site(s).

Reduction in Network Load − Since local copies of data are available, query processing can be
done with reduced network usage, particularly during prime hours. Data updating can be done
at non-prime hours.

Quicker Response − Availability of local copies of data ensures quick query processing and
consequently quick response time.

Simpler Transactions − Transactions require less number of joins of tables located at different
sites and minimal coordination across the network. Thus, they become simpler in nature.

Disadvantages of Data Replication


Increased Storage Requirements − Maintaining multiple copies of data is associated with
increased storage costs. The storage space required is in multiples of the storage required for
a centralized system.

Increased Cost and Complexity of Data Updating − Each time a data item is updated, the
update needs to be reflected in all the copies of the data at the different sites. This requires
complex synchronization techniques and protocols.

Undesirable Application – Database coupling − If complex update mechanisms are not used,
removing data inconsistency requires complex co-ordination at application level. This results
in undesirable application – database coupling.

Prepared By: Bansari Thakkar


Page 32
Some commonly used replication techniques are −

 Snapshot replication

 Near-real-time replication

 Pull replication

Fragmentation

Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments.

Fragmentation can be of three types:

horizontal,

vertical, and

hybrid (combination of horizontal and vertical).

Horizontal fragmentation can further be classified into two techniques:

primary horizontal fragmentation and

derived horizontal fragmentation.

Fragmentation should be done in a way so that the original table can be reconstructed from the
fragments. This is needed so that the original table can be reconstructed from the fragments whenever
required. This requirement is called “reconstructiveness.”

Advantages of Fragmentation
Since data is stored close to the site of usage, efficiency of the database system is increased.

Local query optimization techniques are sufficient for most queries since data is locally available.

Since irrelevant data is not available at the sites, security and privacy of the database system
can be maintained.

Disadvantages of Fragmentation
When data from different fragments are required, the access speeds may be very high.

In case of recursive fragmentations, the job of reconstruction will need expensive techniques.

Lack of back-up copies of data in different sites may render the database ineffective in case of
failure of a site.

Prepared By: Bansari Thakkar


Page 33
vertical fragmentation

In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order
to maintain reconstructiveness, each fragment should contain the primary key field(s) of the
table. Vertical fragmentation can be used to enforce privacy of data.

For example, let us consider that a University database keeps records of all registered students
in a Student table having the following schema.

STUDENT

Regd_No Name Course Address Semester Fees Marks

Now, the fees details are maintained in the accounts section. In this case, the designer will
fragment the database as follows −

CREATE TABLE STD_FEES AS

SELECT Regd_No,Fees

FROM STUDENT;

Horizontal fragmentation

Horizontal fragmentation groups the tuples of a table in accordance to values of one or more
fields. Horizontal fragmentation should also confirm to the rule of reconstructiveness. Each
horizontal fragment must have all columns of the original base table.

For example, in the student schema, if the details of all students of Computer Science Course
needs to be maintained at the School of Computer Science, then the designer will horizontally
fragment the database as follows −

CREATE COMP_STD AS

SELECT * FROM STUDENT

WHERE COURSE ="Computer Science";

hybrid fragmentation

In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques are


used. This is the most flexible fragmentation technique since it generates fragments with
minimal extraneous information. However, reconstruction of the original table is often an
expensive task.

Prepared By: Bansari Thakkar


Page 34
Hybrid fragmentation can be done in two alternative ways −

At first, generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.

At first, generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.

Transparency

Distribution transparency is the property of distributed databases by the virtue of which the
internal details of the distribution are hidden from the users. The DDBMS designer may choose
to fragment tables, replicate the fragments and store them at different sites. However, since
users are oblivious of these details, they find the distributed database easy to use like any
centralized database.

The three dimensions of distribution transparency are −

 Location transparency

 Fragmentation transparency

 Replication transparency

Location transparency

Location transparency ensures that the user can query on any table(s) or fragment(s) of a table
as if they were stored locally in the user’s site. The fact that the table or its fragments are stored
at remote site in the distributed database system, should be completely oblivious to the end
user. The address of the remote site(s) and the access mechanisms are completely hidden.

In order to incorporate location transparency, DDBMS should have access to updated and
accurate data dictionary and DDBMS directory which contains the details of locations of data.

Prepared By: Bansari Thakkar


Page 35
Fragmentation transparency

Fragmentation transparency enables users to query upon any table as if it were unfragmented.
Thus, it hides the fact that the table the user is querying on is actually a fragment or union of
some fragments. It also conceals the fact that the fragments are located at diverse sites.

This is somewhat similar to users of SQL views, where the user may not know that they are
using a view of a table instead of the table itself.

Replication transparency
Replication transparency ensures that replication of databases are hidden from the users. It
enables users to query upon a table as if only a single copy of the table exists.

Replication transparency is associated with concurrency transparency and failure transparency.


Whenever a user updates a data item, the update is reflected in all the copies of the table.
However, this operation should not be known to the user. This is concurrency transparency.
Also, in case of failure of a site, the user can still proceed with his queries using replicated copies
without any knowledge of failure. This is failure transparency.

Prepared By: Bansari Thakkar


Page 36

You might also like