Database Management Systems Guide
Database Management Systems Guide
SYSTEMS
Structure
3.0 Objectives
3.1 Introduction
3.2 Definitions and Basic Concepts
3.2.1 Data and Information
3.2.2 Database and Database Management System (DBMS)
3.2.3 Data Hierarchy
3.2.4 Data Integrity
3.2.5 Data Independence
3.4 Evolution of DB MS
3.4.1 Chronology
3.4.2 Functions and Components of a DBMS
I
Database Design and
Management 3.0 OBJECTIVES
After reading this Unit, you will be able to:
• comprehend the need for a database management system (DBMS), its primary
objectives, database architecture and basic issues in the complex and expanding
environment of database management;
• distinguish between various data models and know the characteristics of each;
3.1 INTRODUCTION
Database systems have permeated all spheres of life. This Unit provides core information
in understanding database management systems. Starting with the need for a database
approach, the three-level database architecture has been explained. Modeling concepts
and the role E-R model plays in conceptual database design has been discussed. Classical
data models viz., hierarchical model, network model and relational model have been
explained with illustrations. Considering that the Relational Database Management Systems
(RDBMS) are still widely in use, relational database technology has been dealt in details.
The concepts of dependencies and normalisation have been elucidated with examples. A
step-by-step approach for designing databases has been given. The Unit also dwells on
some of the database systems in specific application areas.
To illustrate the difference between these two terms let us consider an example. The
digits 050643 as such have no meaning. But if we are told that the first two digits
represented a month, the next two digits, a day of the month and the last two, a year, then
the set 050643 may represent the date of birth of a person. Processed in another manner
the same digits written as 643050 may represent the telephone number of an individual.
I
User Database Management
.----User
DBMS Systems
User
Fig. 3.1: Data, DBMS and Users
3~2.3 Data Hierarchy
Hierarchy in the organisation of data in descending order of complexity is represented in
Fig. 3.2.
Database
~
File
~
Record
~
Field
~
Character
~
Byte
~
Bit
Fig. 3.2: Data Hierarchy
From this hierarchy it is clear that a database is made up of files. Files are composed of
records and each record consists of field~ data items. Each field is composed of
characters, which are made up of bytes. And lastly, bytes decompose into bits.
The degree of correctness, timeliness, and relevance of data stored in a database indicates
data integrity. Data integrity can be ensured by certain checks and validation procedures
carried out whenever an update operation is attempted and also by elimination of data
redundancy. Some database management systems have features, which support data
validation. For example, in ORACLE (a relational database management system) triggers
can be used for this purpose.
37
I
Database Design and
Management
3.2.5 Data Independence
Data independence is a property by which data files are insulated from application programs
that use those files. The close link between the data and the access program is weakened
and the database made more flexible to user requirements. With data independence, a
programmer can write programs in FORTRAN, PASCAL, or any other language of his
choice without bothering what language programs originally created the files that he wants
to access.
Data independence is an important concept which will be considered in more details while
discussing the architecture of database management systems.
1) What is data independence and what role does it play in database systems?
To illustrate the limitations of data management using master files, let us consider the
example of a database of an educational institution running professional courses. Let us
assume that the database consists of three master files of student, faculty and course
records. The student master file has been created using FORTRAN and has' such fields
as student identification number, name, address, gender, course, high-school grade and
examinations cleared.
The faculty master file uses CaBaL and consists of fields like faculty identification
number, name, gender, department, salary, qualifications and teaching hours. The course
master file is based on PASCAL and covers such data as course identification number,
38 course title, class number, section number and students attending the course.
Now, suppose there is a query to provide the names of all the female students being Database Management
Systems
taught by female faculty members. This query cannot be answered by the available
master files despite the fact that the data needed for the query exists in the database.
The difficulty lies in the fact that the needed data is available in two master files created
in different programming languages and having their own access techniques. To answer
the query a new master file with data items derived from the student and faculty master
files will have to be created and new programmes for accessing the data written. This
makes data retrieval cumbersome and time-consuming.
Take another situation when the accounts department of the institution in the example
also wants to use the database and needs the student master file with additional fields like
.
stipend paid, fees due, penalties charged, etc. To meet these requirements, another copy
of the student master file with new fields is created. Similarly, there may be copies of
faculty and course master files created to meet specific requirements. This results in
duplication of data i.e., data redundancy. In such circumstances it becomes difficult to
keep the master files identically updated i.e., propagate the updates in all the copies.
These limitation and drawbacks were at the core of development of database management
systems.
3 4.1 Chronology
i) Primitive (first generation) Systems
This stage is illustrated in Fig. 3.3a. Here, programming overheads are high as the user
has the responsibility for all the I/O (Input/Output) programming as well as file management.
Application programs (software that supports end user activity) have to be written to
access the data stored in secondary storage devices (Direct Access Storage Devices).
This phase represents the period of mid 1960s.
AppL Prg. 1 AppL Prg. 2
Mid 1960s
DASD
Late 1960s
DASD
The third generation systems (Fig. 3.3c) evolved in early 1970s. Here, a layer of DB MS
has been added over the operating system. The systems provide physical as well as
logical data independence. The advantage it gives is that query processing can be attempted
by offering higher levels of operation on the logical view of the data available from
DBMS to the application programs.
DBMS
Early 70s
1
Operating System
I
3.4.2 Functions and Components of a DB MS Database Management
Systems
Basically, there are only two operations that can be performed on data viz., retrieval and
maintenance. Retrieval refers to reading data from files to serve the information
requirements of a user and forms the most important function of a database management
system. Maintenance concerns changing of data in stored files.
Data maintenance involves three operations: addition, deletion and modification which
correspond to adding new records, deleting existing records and modify/updating values
in the existing records.
A database management system has two essential components: data definition part and
data manipulation part. Data definition part provides definition or description of database
objects and is written using data definition language (DDL). This part creates logical
structures of entities when a database is set up.
Programming language interface (PU) or host language interface provides access to the
database through some type of programming language (PASCAL, C, COBOL, etc.).
Query languages allow fast retrieval of data and some of them are considered fourth
generation languages (4GL). The 4GLs are non-procedural languages, which implies
that a user has to specify only what data is required and not how it should be retrieved.
The query languages can be grouped into two categories - command-driven query
languages and screen-oriented query languages. In the first case the commands are
specified in English-like text while in the second case the user enters commands through
a fill-in-the blank mechanism. SQL (structured query language), a 4GL and a standard
language for interfacing with relational DBMS, belongs to the first category while querying
data through SQL forms is an example of the second category.
Report writers represent programs, which are used to derive information from a database
and generate a report that presents this information in the desired fashion. And lastly,
system utilities, are programs which allow a system manager to take back-up of databases,
load data into a database, restore data in case of database crash and carry out other jobs
related to database administration.
3.5 ARCHITECTURE OF A DB MS
Database management systems have a three-level architecture (see Fig. 3.4). Schema,
level and view are used interchangeably to describe the architecture of a DBMS. The
uppermost level in the architecture is the external level, which refers to the way the users
view the data. The external level is also sometimes called subschema. A user would
generally be interested only in some portion of the total database, which forms his external
view. There may be several external views of the same database depending upon the
user requirements. External schema can be used for implementing data security by
restricting access to the database.
The second level is the conceptual schema, which represents the entire information
content of the database. It gives global or integrated view of the database - the view of
the database administrator. The conceptual schema is written using the conceptual DDL
(Data Definition Language). 41
I
Database Design and External Schema External Schema External Schema
Management
(User I) (User 2) (User 3)
DBMS
i oftware
Conceptual Schema
1------+ Second Level
~--------~
Mapping Statements
1 DBMS Software
Internal Schema
t------------i----~ Third Level
Mapping Statements
Access Method
Mapping statements (software programs) at the first level which establish correspondence
between the external and conceptual schemas provide data independence which implies
that one can define additional database objects or modify the structures of the existing
ones without requiring any existing user to change his application programs. With logical
data independence, a conceptual schema can grow and evolve with time without affecting
the external schemas.
The third level of the architecture is the internal schema. Internal view describes how the
data is actually stored and managed on the computer's secondary storage. It specifies
what indexes exist, how stored fields are represented, physical sequence of stored records,
etc. The internal schema is written using the internal DDL. Conceptual/internal mapping
statements ensure physical data independence - that is the way the data is actually stored
can be changed without requiring any change in the conceptual schema.
The mapping component between internal schema and secondary storage devices (direct
access storage devices) is called access method. The access method consists of a set of
routines whose function is to conceal all the device-dependent details from the DBMS
and present the DBMS with the stored record view, i.e., the internal view. In many cases
the underlying operating system performs this function.
42 ......................................................................................................................
I
Database Management
3.6 DATA MODELING Systems
Botanical
id
Standardising agency
i) 1: I (one-to-one)
iii) n: I (many-to-one)
Let us illustrate these relationship types one by one. In 1:1 relationship, one instance of an
entity of a given type is associated with only one member of another type. Let there be a
set of country names and a set of city names. Further, let us assume that each city in the
set is a capital. The relationship between these two sets which can be called "capital" is
] :] because for each country name there is only one city name and conversely, each city
name corresponds to only one country name.
In l:n relationship, one instance of a given type of entity is related to many instances of
another type. Let there be a set of departments and a set of faculty members employed
in the departments. Department-faculty relationship which can be called 'employed' is of
I:n type because each department employs several faculty members and each faculty
member works only in one department.
Many-to-one (n: 1) relationship has the same semantics as 1:n. In the above example if
we change the relationship to faculty-department (in place of department-faculty) we
will have n: 1 relationship type.
Lastly n:m relationship is one in which many instances of an entity type are associated
with many instances of another entity type. Consider a set of a faculty members teaching
a set of students. Faculty-student relationship ("teaching") is an example of n:m type
relationship because a faculty member can teach 'm' students and a student can be
taught by 'n' faculty members.
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................
I
• Hierarchical Database Managemenr
Systems
• Network
• Relational
Besides these three models, which are sometimes referred to as classical models, post-
relational research has resulted in a new data model called object-oriented data model.
The hierarchical model is the oldest of the three models. This model structures data so
that each element is subordinate to another in a strict hierarchical manner.
The hierarchical model represents l:n relationship between record types (see Fig. 3.6).
One record type (the I in I:n relationship) is designated as the "parent" record type. In
this parent child relationship, a child record type can have only one parent record type but
a parent record may have several child record types.
The hierarchical model is implemented by storing data in physical adjacency and using
pointers to the dependent child records. The model suffers from undesirable • data
redundancy.
An example of the hierarchical model is illustrated in Fig. 3.7. Here hierarchy has. been
shown between two record types-one having fields like name, address, profession,
account number and the other giving account number and balance. Redundancy has
been shown by an arrow.
Sari la Rohini
Retired Teacher Vasant Saket
Hajela Vihar Da, Scientist Advocate Naraina
Delhi Sinha Vihar Ray Teacher
Delhi Mathur Delhi
Delhi
Delhi
I
/
190 3000 229 3335 722 917 435 3900 I 229 3335
1 i
I 817
I 1725 I
Fig. 3.7: An Illustration of the Hierarchical Model 45
I
Database Design and Examples of commercial implementation ofthe hierarchical model are IBM's IMS database
Management
management system and CDS/ISIS - a popular database management system for
bibliographic applications.
The network model is implemented with various pointer schemes. Since a network is an
extension of hierarchy, the semantic properties of the network model are similar to those
of the hierarchical model.
The network model is illustrated in Fig.3.9. The example given makes use of the same
record types as in the hierarchical model with the difference that the first record type has
pointers to the Alc No.of the second record type. A pointer may itselfpoint to a number of
pointers (called arrays).
.
Hajela
-
Sarita Vihar
Delhi
Retired
-- •... 190 3000
Das
Rohini
Delhi
Teacher • 229 3335
Sinha
Vasant Virar
Scientist
0
-
....,.. 722 917
Delhi
Sa<et
I 817
I 1725
I
Roy
Delh
Advocate
f
I 435
I 3900
I
Mathur
Naraina
Delh Teacher
--
(An array)
46 Fig. 3.9: An Illustration of the Network Model
I
3.7.3 Relational Model Database Management
Systems
The relational model maintains data in tabular form, which users find comfortable and
familiar. The model is based on well-develoned mathematical theory of relations from
which it derives its name. The name has nothing to do with the fact that the data stored 'in
the relations is related (which usually is).
The relational model supports 1:1, 1:n, n: 1 and n:m relationships. A significant aspect of
the relational model is that the relationships between data items are not explicitly stated
by pointers. Instead, it is up to the DB MS to deduce the relationships from the existence
of matching data in different tables. The absence of physical links provides a more flexible
data manipulation environment.
An illustration of the relational model is given in Fig. 3.9. Here the relationship between
the two tables has been captured by repeating a column (Alc No.) in the first table.
Table name: Customer,
Table name: Account
T
Name Address '::;rof0,-c;ion Ale No. Ale No. Balance
~ ~
ORACLE, INGRESS, and SYBASE are some of the well-known commercial database
management systems based on the relational model.
Fig. 3.11 gives comparison between the conventional database approach and object-
oriented database approach.
Conventlonal pi ach Object. Orient d pproacb
User User
DBMS Operations
Message
Data
• Data '" Object
.'
User
• Objects: An object is an entity, real or abstract, that has state, behaviour and
identity. The state of an object is represented by its attributes and their values. The
behaviour of an object is represented by its operations or methods.
• Messages: Objects communicate with each other thr~ugh messages. A message
determines what operation is to be performed by an object. A message specifies an
operation name and a list of arguments.
• Classes: A class is a set of objects that share common attributes and behavior. Each
object is an instance of some class.
The object-oriented approach emphasises incremental software development. The
underlying principle of this approach is:
• Grow software, don't build it;
• Build components rather than a whole system; and
• Assemble a basic system and then enhance it.
Smalltalk, C++, Java and Object PascallDelphi are the object-oriented programming
languages used in this approach.
Self Check Exercise
4) Why have RDBMS found wider application than other data models? Give a few
examples of RDBMS.
48
Database Management
3.8 RELATIONAL DATABASE MANAGEMENT Systems
SYSTEMS (RDBMS)
Relational database management systems have been evolving since the relational data
model was first proposed in 1970 by Edgar F. Cod of IBM. They have become de facto
international standard in database management. Despite great advances in the object-
oriented database management systems, relational systems are likely to remain in vogue
for quite some time.
The degree of a relation determines the number of attributes in the relation. The number
oftuples in a relation is called its cardinality. For example, the customer table in Fig. 3.9
has four attributes and six tupies and hence the degree of this relation is four and cardinality
six.
ii) Each column of a relation has a unique name (although columns of different relations
may have the same name). .
vii) There may be more than one column in a relauon having th same domain
specifications.
viii) A column in a relation cannot have more than one domain specification.
be) Each column contains values about the same attributes and each table cell value
(intersection of a row and a column) must be single-valued.
The structure of a relation i.e., set of attributes without any values assigned to them IS
called the relation scheme. A tuple of a relation with values assigned to its attributes is
called a relation instance.
Database Design and A table is generally represented by its relation scheme which is denoted by table name
Management followed by attribute names given in brackets. The relation schemes of tables shown in
Fig. 3.10 are:
CUSTOMER (Name, Address, Profession, Nc No.)
i) Identification of tuples
The smallest superkey, which is also called minimal key, is a key such that no proper
subset of it is a superkey. One of the minimal keys is chosen as the primary key. The keys
in the set of minimal keys are called candidate or alternate keys. It is up to the database
designer to select one of the candidate keys as a primary key.
A primary key is an attribute or a combination of attributes that uniquely identifies a record
while a secondary key does not identify a recdfd uniquely. A secondary key identifies all
the re~ords corresponding to the key value.
Aforeign key 'is an attribute or a combination of attributes that is used to link tables. In
relational database management systems foreign keys are used as linking pins between
tables ..Foreign keys represent links to the primary keys.
Primary and foreign keys are critical in relational database management systems due to
their contribution in defining integrity rules. A pararriount guideline in relational syst~ms is
that a primary key or any attribute participating in a composite primary key of a relation
cannot have null value. This rule is called. entity integrity.
There is another integrity rule, which pertains to foreign keys. According to this rule an
attribute that is a foreign key in one table must be a primary key in another table. This rule
is known as referential integrity.
I
iii) The system should support relational algebra operators of SELECT, 'PROJECT and
JOIN.
Any system, which fulfills these three criteria, is called minimally relational. A system,
which satisfies only the first two conditions, is not a relational system and is called tabular
50 DBMS.
For a DBMS to be fully relational it should additionally support both entity and referential Database Management
Systems
integrity rules and implement all relational algebra operations.
The founder of relational database theory, E.F. Codd outlined 12 rules that define a fully
relational database system. These rules are based on the premise that a relational database
management system should be able to manage databases entirely through its relational
capabilities.
• Insertion anomalies
• Deletion anomalies
• Update' anomalies
An insertion anomaly occurs when we are unable to insert a .tuple into a table. Such a
situation can arise when the value of primary key is not known. As per-the entity integrity
rule, the primary key cannot have null value. Therefore, the value/s corresponding to
primary key attribute/s of the tuple must be assigned before inserting the tuple. If these
values are unknown, the tuple cannot be inserted into the table.
In case of a deletion anomaly, the deletion of a tuple causes problems in the database.
This call happen when we delete a tuple, which contains an important piece of information,
and thetuple being the last one in the table containing the information. With the deletion of
the tuple the important piece of information also gets removed from the database. '
3.9.1 Dependencies
A dependency refers to relationship amongst attributes. These attributes may belong to
the same relation or different relations. Dependencies can be of various types viz., functional
, dependencies, transitive dependencies, multi valued dependencies, join dependencies, etc.
We shall briefly examine some of these dependencies.
Book-id Subject
Book-id Price
Book
A B
B C
In this relation let us presume that each department has its own office building. Then
beside others we have the following functional dependencies:
Join Dependency - If we decompose a relation into smaller relations and the join of the
smaller relations does not give us tuples as in the parent relation, we say the relation has
join dependency. .
A B C ED: A-7B
aI bI cl
a2 b2 c3 C-7B
a3 bI c2
a4 b2 c4
Let us now decompose this relation into two relations SPLIT I (A, B) and SPLIT 2(B,
C). The functional dependencies will remain the same in the relation ·SAMPLE.
SPLIT I SPLIT 2
A B B C
aI bI bI cl
a2 b2 b2 c3
a3 bI bI c2
a4 b2 b2 c4
Now, if we join the relations SPLIT I and SPLIT 2 over common attribute B we get the
relation SAMPLE 1.
SAMPLE I
..,.
A B C
aI bI cl
aI bI c2*
a2 b2 c3
a2 b2 c4*
a3 bI c2
a3 bI cl*
a4 b2 c3*
a4 b2 c4
,. We can see from the relation SAMPLE I that it has four additional (~purious) tuples
(shown by asterisk), which were not present in the original relation SAMPLE. This type
of join is called lossy join because the information content of the original table is lost. This
has occurred since the join attribute was not the determinant in the original relation and
hence it should not have been decomposed the way it was done. ~e say a join dependency
exists in.this case. If we decompose a relation and join the constituent relations over the
determinant attribute, we get lossless join. 53
Database Design and For example, consider the relation SAMPLENEW and split it into SAMPLENEWI and
Ma~agement SAMPLENEW2 as given below:
SAMPLENEW
X Y Z
xl yl zl
x2 y2 72 F.D: X-7 Y
X-7Z'
x3 y2 zl
x4 yl 72
SAMPLENEWl SAMPLENEW2
X Y F.D: X -7 Y X Z F.D: X -7 Z
xl yl xl zl
x2 x2 72
y2
x3 y2 x3 zl
x4 72.
x4 yl
The join of SAMPLENEWI and SAMPLENEW2 gives the original table SAMPLENEW
without any spurious rows because the attribute X over which the table is decomposed is
the determinant attribute.
First normal form (lNF), second normal form (2NF), third normal form (3NF),
Boyce-Codd normal form .(BCNF), fourth normal form (4NF), fifth normal
.form (5NF) and the highest normal form which is called domainlkey normal form
(DKlNF).
E.F. Codd had outlined the first three normal forms, which we shall discuss in details.
Most of the commercially available DBMS are normalized up to 3NF or BCNF.
Normal forms build on each other i.e., if a relation is in 3NF, it is also in INF and 2NF.
The first normal form (lNF). A relation is in the first normal form if it can be
represented as a flat file i.e., the relation contains single values at the intersection of
each row and column. .
In this relation CName, Age .pertains to the attribute dependent child with multiple values
(name and age). To convert this relation into INF, it should be decorriposed into two
relations as foliows: .
" The second normal form (2NF). A relationis in the second normal form if it is in
INF and every non-key attribute is fully functionally dependent on the primary key.
The second normal form pertains only to relations with composite primary key. In
case a relation is in INF and has a single-attribute primary key, it is automatically In
the 2NF.
To explain the second normal form let us take the example of the following relation.
In this relation Course-id, Student-id and Faculty-id form a composite primary key.
Course-id ~. Course-name
This means that Course-name is not fully functionally dependent on the primary key.
Therefore the relation is not in the 2NF. The dependency diagram of the relation can be
represented as follows :
To convert this relation into the2NF, we decompose it into two relations as follows:
Letus examine these relations (normalised and unnormalised) on the basis o(;nomalies
described earlier. .
55
Database Design and COURSE (unnormalised)
Management
COURSE
COURSE- TITLE
Course-id Course-name
E500 Microeconomics
E501 Macroeconomics
Insertion: In the case of unnormalised relation, suppose a new course named 'Computer
networks' has to be introduced. We cannot enter the Course-name in the relation, since
we have no means to ascertain Student-id and Faculty-id at the time of introduction of
the course and these attributes, which along with Course-id form the primary key, cannot
be assigned null values. However, in the normalised relation COURSE-TITLE this problem
does not arise.
Deletion: Suppose in the unnormalised relation the student with Student-id SOOlleaves
56 the course. This student being the only student in the particular course, deletion of the
I
tuple would result in loss of information and we will have no way to know that the course Database Management
Systems
'Management Studies' exists. In normalised relation the deletion of this tuple does not
result in information loss because the information about the course name exists in the
second relation.
Updating: Let us assume that the name.of the course 'Library automation' changes to
'Library management'. In the unnormalised relation we will have to change the information
in two tuples while in the normalised relation only one tuple would require to be updated.
This may not appear to be a significant gain but if the database size is large the problem
can have serious repercussions.
Third normal form (3NF). A relation is in the third normal form if it is in second normal
form and contains no transitive dependencies. This normal form is considered to be the
most important normal form.
Let us also assume that each department has its office in one building.
This relation has the following functional dependencies by virtue of Faculty-id being the
primary key.
Faculty-id -7 Faculty-name
'Faculty-id -7 Department
Faculty-id -7 Gender
Faculty-id -7 Office
Department Office
I t t t t t
To convert this relation into the 3NF we shall have to decompose it into two smaller
relations. The new relation OFFICE-NAME has the attribute office which caused transitive
dependency and its determinant. The decomposed relations are :
With some tuple values in these relations let us examine them for anomalies, which are
used to test relations. 57
Database Design and FACULTY (unnormalised)
Management
Faculty-id Faculty-name Department Gender Salary Office
,"
FACULTY
OFFICE-NAME
Department Office
Insertion: Let us assume that a new department whose faculty has not yet been finalized
is created. We cannot enter this information in the unnormalised relation because we do
not know value of the primary key (Faculty-id). To assign values to Faculty-id corresponding
to new department we must know the values of other attributes (Faculty-name, Gender,
Salary, Office). This problem does not exist in the normalised relation.
Deletion: Suppose a faculty member Vohra of mathematics department leaves the faculty.
If we delete tuple corresponding to this Faculty-name in the unnormalised relation, the
information that mathematics department exists is also lost. However, this does not happen
in the normalised relations. ~ .
Update: If the office of the Computer Science department shifts from Birla Block to
Nehru Block, 3 tuples need to be updated in the unnormalised relation while only one tuple
would be modified in normalised relation.
Boyce-Codd normal form (BCNF). Originally, normal forms stopped at the 3NF. However,
research into dependencies led to higher normal forms. BCNF is an extension of the third
normal form. It states that if a relation is in 3NF and all determinants are candidate keys,
58 then it is in Boyce-Codd normal form.
I
The fourth normal form (4NF) refers to multivalued dependencies. A relation is said to Database Management
be in the fourth normal form if it has only one multi valued dependency. Systems
The fifth normal form (SNF) is also called project join normal form (PINF). If a relation
is in 5NF we should be able to join the projections (decompositions) of the relation and
reconstruct the original relation without any information loss. •.
The domain key normal form (DKlNF) is considered the highest normal form. A
relation is in DKINF if every constraint can be inferred by simply knowing the set of
attribute names and their underlying domain along with their set of keys. Thus in DKINF
only two types of constraints are allowed - domain "nstraints and key constraints. If
these constraints are fully enforced, other constraints (dependencies) are removed and
no anomalies can occur in the relation.
The process of normalisation from lower to higher normal forms has been illustrated in
Fig. 3.12.
RaW~Da,a
.~
~
DK/NF Enforce domain and key constraints
......................................................................................................................
... . . . ..~ .
......................................................................................................................
......................................................................................................................
.......................................................................................................................
...................................................................................................................... 59
Database Design and
Management 3.10 DESIGNING DATABASES
Designing a database is a highly complex operation. Though it is relatively easy to identify
a poorly designed database, there does not exist a unified approach which leads to the
best design.
The database design should be flexible enough to meet the requirements of the maximum
number of users to the fullest. Besides, the design should also anticipate, to a certain
extent, future requirements and make provisions for them. This calls for some intuitiveness
on the part of the database designer.
The process of designing a database is an interactive one, which means that initial database
structure, changes with usage. However, with time the design tends to get stabilized.
Usually, a person designated as database administrator (DBA) controls the design and
administration of a database.
A broad step-by-step procedure for designing a database has been summarized below:
1) First of all, data to be represented in the database is determined. For this, information
needs of the users are studied in detail. Based on the information requirements
anlysis, entities of interest are identified and their attributes examined.
2) . An E-R model of the database representing conceptual schema is drawn. This is the
most important stage in the database design process. E-R diagram, which depicts
entitites and their relationships, should be as comprehensive as possible.
3) The E-R model is mapped into a selected database structure (hierarchical, network,
relational or any other model). In c~se a relational model is chosen, tables '
corresponding to entities and their relations are finalised. The process of normalisation
is invoked to check the tables and reshape them if necessary.
5) The database is populated. This involves inserting data into the empty database. If
data to be inserted is available in machine-readable form, data loading utility of DB MS
can be utilised.
7) The feedback of the users on the database functionality is analysed and changes in
the structure made to optirnise the usage.
A distributed database is a single logical database, which is fragmented, and the fragments
spread across computers at different locations that are interlinked by a data communication
A
network to provide an integrated access to the data. distributed database environment
requires the data to be shared. A distributed database gives geographical data independence
i.e., a user requesting data need not know at which site the data is located. This property
60 is often referred to as location transparency and each local site is called a node.
Architecture and schematic representation of a distributed database is shown in Fig.3.13 Database Management
Systems
Distributed Distributed
DDID DDID
Distributed Distributed
DBMS DBMS
Communication
Application Network
Programs Application
Programs
Local Local
DBMS DBMS
Operating Operating
System System
Database
Database Site 2
Site 3
As is clear from Fig.3.13, each site has a local DB MS as well as a copy of the distributed
DBMS. Distributed data dictionary/directory (DDID) stores information on location of
data in the network as well as data definitions. Request for data is first checked from the
distributed data dictionary/directory for location of the required data. In case the data is
available at the local site, the distributed DB MS forwards the request to local DBMS for
processing. If the request involves data from other sites, the distributed DBMS routes
the request to these sites.
When different nodes in a distributed database have mixed DBMSs (i.e., node 1 may
have relational DBMS and node 2 network DBMS), then the distributed DB MS capable
of handling such environment is called heterogeneous distributed database management
system.
• By encouraging local control of data at different sites, data integrity improves and
data administration becomes easier. 61
Database Design and
Management
• Distribution of data can improve access time if local data is stored locally. By locating
data closer to the point of its use, communication cost can be reduced and query'
response time improved.
• Distributed system facilitates modular growth. New nodes hosting additional database
fragments can be added to the system.
There are a number of options available for distributing data in a distributed database.
These options include: i) data replication, ii) horizontal partitioning, iii) vertical partitioning
and iv) combination of the above.
In case of data replication a copy of the database is stored at a few or all sites (full
replication). Reliability, saving in telecommunication charges and faster response are
the advantages of this option. But additional storage requirements and difficulty in
propagating updates form the basic drawbacks. This option is suitable in case updates
are infrequent and database interaction is restricted to read-only. CD-ROM (compact
. I
disk read only memory) offers excellent medium for replicated databases.
A combination of the mentioned options of data distribution may be used depending upon
the needs of the distributed system. The basic principle, which one must keep in mind, is
that data should be stored at sites where it will be most frequently used.
Top Level
Flexible on-demand reports to make
decisions about unstructured queries
Middle Level
Management Information Summarised structured'[eports
System (MIS)
Lower Level
Processed transaction
Transaction Processing System (TPS)
All Levels
Expert System Office Automation System (OA S)
~n
Jud ements
Based on
Reasoning
,r ,r r
I Expert I
1
Knowledge Engineers <Ill
I I
,
Knowledge Base
*Facts
*Rules
I
~ ~
~1
___ In_fu_re_n_ce~E_n_g_in_e
__ ,~"'~-------------1.~ Control Mechanism
_1 U seT Interface
1_
Fig. 3.16: Components of an Expert System
On request for a piece of specific information from a user, the system looks for data and
rules in the knowledge base. The inference engine decides which rules to fire and how
they should be applied and also when to complete the querying process and provide
answer. User interface may prompt more input from a user to help the system respond
effectively to the query.
The use of expert systems is growing rapidly. However, human experts are likely to
remain in control using the expert systems as job aid.
Self Check Exercise
7) What is the difference between a knowledge base and a database?
8) What is artificial intelligence (l\I)? List some of the areas of its application.
Note: i) Write your answers in the space given below.
ii) Check your answers with the answers given at the end of the Unit.
l
Database Design and
Management 3.14 SUMMARY
This Unit discusses some of the basic issues in database management systems and
provides essential background to understand the functionality of these systems.
The database architecture (three-layer) is a convenient tool for the user to visualize the
schema levels in a database system. This architecture makes it easier to achieve data
independence - both physical and logical.
The E-R model plays an important role in the conceptual database design. Evolutionary
path of data models from hierarchical model to relational and then to object-oriented
model has brought about tremendous changes in database design techniques. Relational
database management systems are by far the most popular. Normalisation of relations is
.
an important aspect of database design aimed to remove anomalies in a database. It
improves integrity and consistency of data, though it slows retrieval speed. Designing
databases is a highly complex process. Usually a database stabilizes in design over a
period of time with feedback from users.
A subset or transformation of the logical view of the database schema that is required
by a particular user application program is called subschema. A subschema is an
individual view of the database. Each individual user may have a separate view of
the database depending upon the user requirements. Access to the entire database
i.e., conceptual schema is generally not given to all the users. A view helps to
provide ~ccess to only that"part of the database, which a user actually needs. The
purpose of the views is not only to rationalise database access but also to implement
66 security aspects.
3) E-R diagram is a tool that models the relationships among the entities in a database. Database Management
Systems
E-R diagram maps the conceptual schema and serves as a blne print for designing
databases.
RDBMS have advantages over other data models in the fact that the relational
model is based on well-developed mathematical theory of relations from which it
derives its name. Application of mathematics imparts great strength to the relational
model. The data in relational systems is represented in the form of tables which
users find easier to handle. Examples of RDBMS are: ORACLE, SYBASE and
INGRESS.
5) The advantages of normalisation of relations are that it enforces data integrity and
removes anomalies in a database. It also minimises data redundancy and in general
promotes accuracy and consistency of data in the database.
Since the process of normalization involves decomposing a table into two or more.
tables, which are joined while retrieving data, the retrieval speed gets adversely
affected.
• Modular growth
7) A database stores explicit information while a knowledge base with IF-THEN rules
can infer additional information not directly stored in the basic data. In a knowledge
base inference rules allow relationships to be derived from the data.
3.16 KEYWORDS
Access Method The method used to store, find and retrieve the data from
a database.
Data Independence Separates the data from the program, which often enables
data definition to be changed without altering the program.
Data Integrity Keeping accurate data which means few errors and
1
Courtney, James F. and Paradice, David B. (1988). Database Systems for Management.
Toronto: Times Mirror/Mosby College Publishing.
Codd, E.F. (1990). The Relational Modelfor Database Management; New York: Wesley
Publishing Company. Inc.
Curtin, Dennis P. [et al.] (1999). Information Technology: The Breaking Wave. New
Delhi: Tata McGraw-Hill Publishing Company Limited.
Date, C'J. (1989). Introduction to, Database Systems. New Delhi: Narosa Publishing
House.
Murray, Linda. A. and John Richardon, T.E. (1989). Intelligent Systems in a Human
Context. Oxford: Oxford University Press.
Post, Gerald Y.(200,9). Database Management Systems. New Delhi.: Tata McGraw-
Hill Publishing Company Limited.
Williams Brain K. [et al.] (l999). Using Information Technology. Singapore: Irwinl
McGraw-Hill.
69