0% found this document useful (0 votes)

35 views35 pages

Database Management Systems Guide

Uploaded by

Mansha Singad

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

35 views35 pages

Database Management Systems Guide

Uploaded by

Mansha Singad

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 35

UNIT 3 DATABASE MANAGEMENT

SYSTEMS
Structure
3.0 Objectives
3.1 Introduction
3.2 Definitions and Basic Concepts
3.2.1 Data and Information
3.2.2 Database and Database Management System (DBMS)
3.2.3 Data Hierarchy
3.2.4 Data Integrity
3.2.5 Data Independence

3.3' Objectives of Database Management Systems (DBMS)

3.3.1 The Need for DB MS
3.3.2 The Goals of DBMS

3.4 Evolution of DB MS
3.4.1 Chronology
3.4.2 Functions and Components of a DBMS

3.5 Architecture of a DBMS

3.6 Data Modeling
3.6.1 Entity-Relationship Model
3.6.2 Types of Relationships

3.7 Data Models

3.7.1 Hierarchical Model
3.7.2 Network Model
3.7.3 Relational Model
3.7.4 Object-oriented Model

3.8 Relational Database Management Systems (RDBMS)

3.8.1 Characteristics of a Relation
3.8.2 Keys and their Functions
3.8.3 Criteria for a DBMS to be Relational

3.9 Normalisation of Relations

3.9.1 Dependencies
3.9.2 Normal Forms

3.10 Designing Databases

3.11 Distributed Database Systems
3 11.1 Architecture of Distributed Databases

3.11.2 Justifications and Options for Distributing Data

3.12 Database Systems for Management Support

3.13 Artificial Intelligence and Expert Systems
3.14 Summary

3.15 Answers to Self Check Exercises

_ 3.16 Keywords

3.17 References and Further Reading 35

I
Database Design and
Management 3.0 OBJECTIVES
After reading this Unit, you will be able to:

• understand the evolutionary pattern of development of database approach;

• comprehend the need for a database management system (DBMS), its primary
objectives, database architecture and basic issues in the complex and expanding
environment of database management;

• distinguish between various data models and know the characteristics of each;

• become conversant with relational database technology; and

• gauge emerging trends in data management.

3.1 INTRODUCTION
Database systems have permeated all spheres of life. This Unit provides core information
in understanding database management systems. Starting with the need for a database
approach, the three-level database architecture has been explained. Modeling concepts
and the role E-R model plays in conceptual database design has been discussed. Classical
data models viz., hierarchical model, network model and relational model have been
explained with illustrations. Considering that the Relational Database Management Systems
(RDBMS) are still widely in use, relational database technology has been dealt in details.
The concepts of dependencies and normalisation have been elucidated with examples. A
step-by-step approach for designing databases has been given. The Unit also dwells on
some of the database systems in specific application areas.

3.2 DEFINITIONS AND BASIC CONCEPTS

Some commonly used database terms are defined in the following paragraphs to help you
understand the data architecture concepts.

3.2.1 Data and Information

Data is the raw material from which information is derived as the end product.
Data represents a set of characters that have no meaning on their own, i.e., it consists
of just symbols. On processing, meaning is attached to data, which transforms into
information.

To illustrate the difference between these two terms let us consider an example. The
digits 050643 as such have no meaning. But if we are told that the first two digits
represented a month, the next two digits, a day of the month and the last two, a year, then
the set 050643 may represent the date of birth of a person. Processed in another manner
the same digits written as 643050 may represent the telephone number of an individual.

3.2.2 Database and Database Management System (DBMS)

Database
A database is a collection of logically related data arranged in a structured form designed
to meet the information requirements of multiple users. It may also be defined as a
collection of non-redundant operational data sharable between different application
systems.

Database Management System

It is a collection of software that is used to store, delete, modify and retrieve data that
is stored in a database. DBMS acts as an interface between the user and the data
36 (Fig. 3.1).

I
User Database Management
.----User
DBMS Systems

User
Fig. 3.1: Data, DBMS and Users
3~2.3 Data Hierarchy
Hierarchy in the organisation of data in descending order of complexity is represented in
Fig. 3.2.

Database

~
File

~
Record

~
Field

~
Character

~
Byte

~
Bit
Fig. 3.2: Data Hierarchy

From this hierarchy it is clear that a database is made up of files. Files are composed of
records and each record consists of field~ data items. Each field is composed of
characters, which are made up of bytes. And lastly, bytes decompose into bits.

3.2.4 Data Integrity

The degree of correctness, timeliness, and relevance of data stored in a database indicates
data integrity. Data integrity can be ensured by certain checks and validation procedures
carried out whenever an update operation is attempted and also by elimination of data
redundancy. Some database management systems have features, which support data
validation. For example, in ORACLE (a relational database management system) triggers
can be used for this purpose.
37

I
Database Design and
Management
3.2.5 Data Independence
Data independence is a property by which data files are insulated from application programs
that use those files. The close link between the data and the access program is weakened
and the database made more flexible to user requirements. With data independence, a
programmer can write programs in FORTRAN, PASCAL, or any other language of his
choice without bothering what language programs originally created the files that he wants
to access.

Data independence can be logical, physical or geographical. By logical or physical data

independence we mean the ability to change the logical or physical structures of data
without requiring any changes in the application programs which manipulate that data.
Geographical data independence, a characteristic of distributed database management
systems, makes location of data transparent to the users.

Data independence is an important concept which will be considered in more details while
discussing the architecture of database management systems.

Self Check Exercise

1) What is data independence and what role does it play in database systems?

Note: i) Write your answer in the space given below.

ii) Check your answer with the answers given at the end of the Unit.

3.3 OBJECTIVES OF DATABASE MANAGEMENT

SYSTEMS (DBMS)
3.3.1 The Need for DBMS
Let us consider the scenario of data processing that existed before the advent of database
management systems. In the file-oriented system, which was used at that time, a master
file (the file which contains all the up-to-date data on a subject) is created using a
programming language. Access techniques based on the requisite queries on the data are
embedded in the file at the time of its creation. Any change in the master file i.e., addition
of a new field or change in the structure of an existing field has to be implemented by
creating the file all over again along with the modified access techniques.

To illustrate the limitations of data management using master files, let us consider the
example of a database of an educational institution running professional courses. Let us
assume that the database consists of three master files of student, faculty and course
records. The student master file has been created using FORTRAN and has' such fields
as student identification number, name, address, gender, course, high-school grade and
examinations cleared.

The faculty master file uses CaBaL and consists of fields like faculty identification
number, name, gender, department, salary, qualifications and teaching hours. The course
master file is based on PASCAL and covers such data as course identification number,
38 course title, class number, section number and students attending the course.
Now, suppose there is a query to provide the names of all the female students being Database Management
Systems
taught by female faculty members. This query cannot be answered by the available
master files despite the fact that the data needed for the query exists in the database.
The difficulty lies in the fact that the needed data is available in two master files created
in different programming languages and having their own access techniques. To answer
the query a new master file with data items derived from the student and faculty master
files will have to be created and new programmes for accessing the data written. This
makes data retrieval cumbersome and time-consuming.
Take another situation when the accounts department of the institution in the example
also wants to use the database and needs the student master file with additional fields like
.
stipend paid, fees due, penalties charged, etc. To meet these requirements, another copy
of the student master file with new fields is created. Similarly, there may be copies of
faculty and course master files created to meet specific requirements. This results in
duplication of data i.e., data redundancy. In such circumstances it becomes difficult to
keep the master files identically updated i.e., propagate the updates in all the copies.
These limitation and drawbacks were at the core of development of database management
systems.

3.3.2 The Goals of DBMS

The database management systems have the following goals:
i) To provide retrieval flexibility. It should be relatively easy to link data from different
files.
ii) To facilitate reduction of data duplication and elimination of multiple copies of a
master file. Data redundancy control helps in overcoming updating problems and
promotes data integrity.
iii) To ensure high level of data independence. The data is hidden from the programming
language, operating system and processing environment. It should be up to DBMS
to convert the stored data into a form that could be used in whatever language the
programmer desires to use.

3.4 EVOLUTION OF DBMS

The database management concepts have undergone tremendous changes over the years.
The chronological evolution of database management systems is described below:

3 4.1 Chronology
i) Primitive (first generation) Systems
This stage is illustrated in Fig. 3.3a. Here, programming overheads are high as the user
has the responsibility for all the I/O (Input/Output) programming as well as file management.
Application programs (software that supports end user activity) have to be written to
access the data stored in secondary storage devices (Direct Access Storage Devices).
This phase represents the period of mid 1960s.
AppL Prg. 1 AppL Prg. 2

Mid 1960s

DASD

Fig. 3:3(a): First Generation Systems 39

Database Design and ii) Second Generation Systems
Management
Appearing in late 1960s, this generation is represented in Fig. 3.3b. In this case, operating
system takes care of file management and other routines thereby relieving the programmer
from this effort. Though physical data independence is achieved, logical data independence
remains to be built into the system.

ppl Prg I Appl. Prg. 2

Late 1960s

DASD

Fig. 3.3(b): Second Generation Systems

iii) Third Generation Systems

The third generation systems (Fig. 3.3c) evolved in early 1970s. Here, a layer of DB MS
has been added over the operating system. The systems provide physical as well as
logical data independence. The advantage it gives is that query processing can be attempted
by offering higher levels of operation on the logical view of the data available from
DBMS to the application programs.

DBMS
Early 70s

1
Operating System

40 Fig. 3.3(c): Third Generation Systems

I
3.4.2 Functions and Components of a DB MS Database Management
Systems

Basically, there are only two operations that can be performed on data viz., retrieval and
maintenance. Retrieval refers to reading data from files to serve the information
requirements of a user and forms the most important function of a database management
system. Maintenance concerns changing of data in stored files.

Data maintenance involves three operations: addition, deletion and modification which
correspond to adding new records, deleting existing records and modify/updating values
in the existing records.

A database management system has two essential components: data definition part and
data manipulation part. Data definition part provides definition or description of database
objects and is written using data definition language (DDL). This part creates logical
structures of entities when a database is set up.

Data manipulation part refers to methods of manipulating data and is implemented

using data manipulation languages (DML). There are four basic methods of data
manipulations: programming language interface, query languages, report writers and
system utilities.

Programming language interface (PU) or host language interface provides access to the
database through some type of programming language (PASCAL, C, COBOL, etc.).
Query languages allow fast retrieval of data and some of them are considered fourth
generation languages (4GL). The 4GLs are non-procedural languages, which implies
that a user has to specify only what data is required and not how it should be retrieved.

The query languages can be grouped into two categories - command-driven query
languages and screen-oriented query languages. In the first case the commands are
specified in English-like text while in the second case the user enters commands through
a fill-in-the blank mechanism. SQL (structured query language), a 4GL and a standard
language for interfacing with relational DBMS, belongs to the first category while querying
data through SQL forms is an example of the second category.

Report writers represent programs, which are used to derive information from a database
and generate a report that presents this information in the desired fashion. And lastly,
system utilities, are programs which allow a system manager to take back-up of databases,
load data into a database, restore data in case of database crash and carry out other jobs
related to database administration.

3.5 ARCHITECTURE OF A DB MS

Database management systems have a three-level architecture (see Fig. 3.4). Schema,
level and view are used interchangeably to describe the architecture of a DBMS. The
uppermost level in the architecture is the external level, which refers to the way the users
view the data. The external level is also sometimes called subschema. A user would
generally be interested only in some portion of the total database, which forms his external
view. There may be several external views of the same database depending upon the
user requirements. External schema can be used for implementing data security by
restricting access to the database.

The second level is the conceptual schema, which represents the entire information
content of the database. It gives global or integrated view of the database - the view of
the database administrator. The conceptual schema is written using the conceptual DDL
(Data Definition Language). 41

I
Database Design and External Schema External Schema External Schema
Management
(User I) (User 2) (User 3)

Mapping Statements Mapping Statements First

Mapping Statements L- ----' Level

DBMS
i oftware

Conceptual Schema
1------+ Second Level
~--------~
Mapping Statements

1 DBMS Software

Internal Schema
t------------i----~ Third Level
Mapping Statements

Access Method

Fig 3.4: Architecture of a DBMS

Mapping statements (software programs) at the first level which establish correspondence
between the external and conceptual schemas provide data independence which implies
that one can define additional database objects or modify the structures of the existing
ones without requiring any existing user to change his application programs. With logical
data independence, a conceptual schema can grow and evolve with time without affecting
the external schemas.
The third level of the architecture is the internal schema. Internal view describes how the
data is actually stored and managed on the computer's secondary storage. It specifies
what indexes exist, how stored fields are represented, physical sequence of stored records,
etc. The internal schema is written using the internal DDL. Conceptual/internal mapping
statements ensure physical data independence - that is the way the data is actually stored
can be changed without requiring any change in the conceptual schema.
The mapping component between internal schema and secondary storage devices (direct
access storage devices) is called access method. The access method consists of a set of
routines whose function is to conceal all the device-dependent details from the DBMS
and present the DBMS with the stored record view, i.e., the internal view. In many cases
the underlying operating system performs this function.

Self Check Exercise

2) What is a schema and a sub-schema in database system? Briefliexplain the
purpose of the views in database architecture.

Note: i) Write your answer in the space given below.

ii) Check your answer with the answers given at the end of the Unit.
......................................................................................................................
......................................................................................................................
......................................................................................................................
......................................................................................................................

42 ......................................................................................................................

I
Database Management
3.6 DATA MODELING Systems

Data modeling is a process by which the data requirements of an organisation or an

application area are represented. Conceptual modeling is an important phase in designing
a successful database application. The traditional approach is of using entity-relationship
(ER) model for this purpose. Enhanced ER or EER model is used for modeling object-
oriented databases.

3.6.1 Entity-Relationship Model

Entity-relationship model developed by Chen in 1976-77 serves as an excellent tool in the
database design process. It provides graphic representation of entities, attributes and
relationships.

Requirements analysis of the designed database helps in collecting the necessary

information in the form of entities and attributes to be included in the database. Based on
enterprise rules, the relationships between the entities get identified and the nature of use
of the database determined. E-R model describes the conceptual schema and is considered
as a blueprint of the database under design. After finalisation, E-R diagram (as entity-
relationship model is generally called) is mapped into one of the selected database models
(discussed later in the text) and the system-dependent procedure of database creation
started.

An example of E-R diagram is illustrated in Fig. 3.5. It depicts a database on marketing

of drugs from medicinal and aromatic plants. The plants or their parts serve as crude
drugs, which are traded in the market. The standardising agency certifies the quality of
drugs while certifying agency approves the drugs for export. Before supply to the customer
the crude drugs are sometimes processed.

In an ER diagram rectangles represent entities, ellipses-attributes, diamonds-

relationships, attributes with underscore-primary keys; attributes with double
underscore-foreign keys and 1, n, m-relationship types.

Family Belongs to Plant

Botanical
id

Standardising agency

Fig. 3.5: Illustration of an E-R Diagram

3.6.2 Types of Relationships

A relationship is an association between two or more entities. Entities correspond to
record types in a database, which are sets. Thus a relationship represents correspondence
between the elements of n sets. Relationship over 2 sets is called binary relationship;
relationship over 3 sets-ternary and over n sets-n-ary relationship. 43
Database Design and Relationships can themselves be treated as entities and assigned attributes (see Fig. 3.5).
Management
Relationships can be grouped into the following types:

i) 1: I (one-to-one)

ii) I:n (one-to-many)

iii) n: I (many-to-one)

iv) n:m (many-to-many) ";"

Let us illustrate these relationship types one by one. In 1:1 relationship, one instance of an
entity of a given type is associated with only one member of another type. Let there be a
set of country names and a set of city names. Further, let us assume that each city in the
set is a capital. The relationship between these two sets which can be called "capital" is
] :] because for each country name there is only one city name and conversely, each city
name corresponds to only one country name.

In l:n relationship, one instance of a given type of entity is related to many instances of
another type. Let there be a set of departments and a set of faculty members employed
in the departments. Department-faculty relationship which can be called 'employed' is of
I:n type because each department employs several faculty members and each faculty
member works only in one department.

Many-to-one (n: 1) relationship has the same semantics as 1:n. In the above example if
we change the relationship to faculty-department (in place of department-faculty) we
will have n: 1 relationship type.

Lastly n:m relationship is one in which many instances of an entity type are associated
with many instances of another entity type. Consider a set of a faculty members teaching
a set of students. Faculty-student relationship ("teaching") is an example of n:m type
relationship because a faculty member can teach 'm' students and a student can be
taught by 'n' faculty members.

Self Check Exercise

3) What is the significance of an entity-relationship (ER) diagram?

Note: i) Write your answer in the space given below.

ii) Check your answer with the answers given at the end of the Unit.

......................................................................................................................

3.7 DATA MODELS

The data models are methods by which data is structured to represent the real world and
the manner in which the data is accessed. These models provide alternative ways of
picturing the relationships and serve as frameworks for mapping the conceptual schema
of a database. There are a number of data models in use today but the following three
44
have been most widely implemented:

I
• Hierarchical Database Managemenr
Systems
• Network
• Relational

Besides these three models, which are sometimes referred to as classical models, post-
relational research has resulted in a new data model called object-oriented data model.

3.7.1 Hierarchical Model

The hierarchical model is the oldest of the three models. This model structures data so
that each element is subordinate to another in a strict hierarchical manner.

The hierarchical model represents l:n relationship between record types (see Fig. 3.6).
One record type (the I in I:n relationship) is designated as the "parent" record type. In
this parent child relationship, a child record type can have only one parent record type but
a parent record may have several child record types.

Fig. 3.6: A Hierarchy (l:n relationship)

The hierarchical model is implemented by storing data in physical adjacency and using
pointers to the dependent child records. The model suffers from undesirable • data
redundancy.

An example of the hierarchical model is illustrated in Fig. 3.7. Here hierarchy has. been
shown between two record types-one having fields like name, address, profession,
account number and the other giving account number and balance. Redundancy has
been shown by an arrow.

Record type #1 '_N_a_m_e_--,I,-A_d_d_re_ss_.:...I_P_ro_fe_s_si_o_n I

Record type #2. I Alc No. I BaJance I

Sari la Rohini
Retired Teacher Vasant Saket
Hajela Vihar Da, Scientist Advocate Naraina
Delhi Sinha Vihar Ray Teacher
Delhi Mathur Delhi
Delhi
Delhi

I
/
190 3000 229 3335 722 917 435 3900 I 229 3335

1 i
I 817
I 1725 I
Fig. 3.7: An Illustration of the Hierarchical Model 45

I
Database Design and Examples of commercial implementation ofthe hierarchical model are IBM's IMS database
Management
management system and CDS/ISIS - a popular database management system for
bibliographic applications.

3.7.2 Network Model

In the network model the restriction that a child can have only one parent record is
removed. The network supports n:m relationship (see Fig.3.8). A network can be hierarchy
(1:n being a special case of n:m) but a hierarchy cannot be a network.

Fig. 3.8: A Network (n:m relationship)

The network model is implemented with various pointer schemes. Since a network is an
extension of hierarchy, the semantic properties of the network model are similar to those
of the hierarchical model.

The network model is illustrated in Fig.3.9. The example given makes use of the same
record types as in the hierarchical model with the difference that the first record type has
pointers to the Alc No.of the second record type. A pointer may itselfpoint to a number of
pointers (called arrays).

Cullinet's IDMS is an example of commercial database management system based on

the network model.
I File 1 File 2

AlCNo. AIC No. Balance

.
Hajela

-
Sarita Vihar
Delhi
Retired
-- •... 190 3000

Das
Rohini
Delhi
Teacher • 229 3335

Sinha
Vasant Virar
Scientist
0
-
....,.. 722 917
Delhi

Sa<et
I 817
I 1725
I
Roy
Delh
Advocate
f
I 435
I 3900
I
Mathur
Naraina
Delh Teacher
--
(An array)
46 Fig. 3.9: An Illustration of the Network Model

I
3.7.3 Relational Model Database Management
Systems
The relational model maintains data in tabular form, which users find comfortable and
familiar. The model is based on well-develoned mathematical theory of relations from
which it derives its name. The name has nothing to do with the fact that the data stored 'in
the relations is related (which usually is).

The relational model supports 1:1, 1:n, n: 1 and n:m relationships. A significant aspect of
the relational model is that the relationships between data items are not explicitly stated
by pointers. Instead, it is up to the DB MS to deduce the relationships from the existence
of matching data in different tables. The absence of physical links provides a more flexible
data manipulation environment.

An illustration of the relational model is given in Fig. 3.9. Here the relationship between
the two tables has been captured by repeating a column (Alc No.) in the first table.
Table name: Customer,
Table name: Account

T
Name Address '::;rof0,-c;ion Ale No. Ale No. Balance

Hajela Sarita Vihar Retired 190 190 3000

Delhi

Das Rohini Teacher 229 229 3335

Delhi

Sinha Vasant Vihar Scientist 817 817 1725

Delhi

Sinha Vasant Vihar Scientist 722 722 917

- Delhi

Roy Saket Advocate 435 435 3900

Delhi

Mathur Naraina Teacher 229

Delhi

~ ~

Column repeated to establish relationship

Fig. 3.10: An Illustration of the Relational Model

ORACLE, INGRESS, and SYBASE are some of the well-known commercial database
management systems based on the relational model.

3.7.4 Object-oriented Model

The object-oriented data model facilitates handling of objects rather than records. In an
object-oriented model an entity is represented as an instance (object) of a class that has
a set of properties and operations (methods) applied to the objects. A class represents an
abstract data type and is a shell from which one can generate as many copies (called
instances) as one wants. In object-oriented approach, the behaviour of an object is a part
of its definition. The behaviour is described by a set of methods. The set of methods
offered by an object to the others defines the object interface. A class and hence an
object may inherit properties and methods from related classes. Objects and classes are
dynamic and can be created at any time. 47
Database Design and Viewing the data as objects instead of as records provides more flexibility and removes
Management
the need to normalise data.

Fig. 3.11 gives comparison between the conventional database approach and object-
oriented database approach.
Conventlonal pi ach Object. Orient d pproacb

User User
DBMS Operations

Message
Data
• Data '" Object

User

Fig. 3.11: Comparison between Conventional and Object-oriented Database Approaches

Some of the object building blocks are defined below:

• Objects: An object is an entity, real or abstract, that has state, behaviour and
identity. The state of an object is represented by its attributes and their values. The
behaviour of an object is represented by its operations or methods.
• Messages: Objects communicate with each other thr~ugh messages. A message
determines what operation is to be performed by an object. A message specifies an
operation name and a list of arguments.
• Classes: A class is a set of objects that share common attributes and behavior. Each
object is an instance of some class.
The object-oriented approach emphasises incremental software development. The
underlying principle of this approach is:
• Grow software, don't build it;
• Build components rather than a whole system; and
• Assemble a basic system and then enhance it.
Smalltalk, C++, Java and Object PascallDelphi are the object-oriented programming
languages used in this approach.
Self Check Exercise
4) Why have RDBMS found wider application than other data models? Give a few
examples of RDBMS.

Note: i) Write your answer in the space given below.

ii) Check your answer with the answers given at the end of the Unit.

•••••••••••••••••••••••••••••••••• 1.., •••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

48
Database Management
3.8 RELATIONAL DATABASE MANAGEMENT Systems

SYSTEMS (RDBMS)
Relational database management systems have been evolving since the relational data
model was first proposed in 1970 by Edgar F. Cod of IBM. They have become de facto
international standard in database management. Despite great advances in the object-
oriented database management systems, relational systems are likely to remain in vogue
for quite some time.

Let us define some of the terms used in relational technology.

A relational database is collection of data belonging to a system spread over a number

of relations.

As pointed out earlier, relation is a mathematical concept. Mathematically, a relation is a

subset of the Cartesian product of a list of domains. A domain represents a set of values
an attribute can assume (e.g., numeric, character, etc.). .

The degree of a relation determines the number of attributes in the relation. The number
oftuples in a relation is called its cardinality. For example, the customer table in Fig. 3.9
has four attributes and six tupies and hence the degree of this relation is four and cardinality
six.

A relational database management system (RDBMS) is a collection of programs that

help to organise and manipulate data in a relational database.

The following terms are used interchangeably in RDBMS jargon:

• relation, table, database, file

• attribute, column, field

• tupIe, row, record

• domain, range, type

3.8.1 Characteristics of a Relation

A relation exhibits the following characteristics:

i) A relation is always named.

ii) Each column of a relation has a unique name (although columns of different relations
may have the same name). .

iii) The order of the columns in a relation i of no consequence.

iv) There cannot be two identical tupies in a relation.

v) The order of the tuples in a relation is of no consequence.

vi) Each column of a relation has a domain specification.

vii) There may be more than one column in a relauon having th same domain
specifications.

viii) A column in a relation cannot have more than one domain specification.

be) Each column contains values about the same attributes and each table cell value
(intersection of a row and a column) must be single-valued.

The structure of a relation i.e., set of attributes without any values assigned to them IS
called the relation scheme. A tuple of a relation with values assigned to its attributes is
called a relation instance.
Database Design and A table is generally represented by its relation scheme which is denoted by table name
Management followed by attribute names given in brackets. The relation schemes of tables shown in
Fig. 3.10 are:
CUSTOMER (Name, Address, Profession, Nc No.)

ACCOUNT (Nc No., Balance).

The primary key attribute is underlined in the relation scheme.

3.8.2 Keys and their Functions

Keys play an important role in relational database management systems. They serve two
basic purposes:

i) Identification of tuples

ii) Establishing relationship between tables

Keys (data items) can be made up of single attributes called single key attributes or
multiplekey attributes called multikey attributes. Keys formed by multiple attributes are
also called composite or concatenated keys.
A superkey is a set of attributes which taken collectively allows us to identify uniquely an
entity in an entity set. If any key is a superkey, a superset of superkey is also a superkey.
(If Xl is a subset of a set X, the X is called superset of Xl).

The smallest superkey, which is also called minimal key, is a key such that no proper
subset of it is a superkey. One of the minimal keys is chosen as the primary key. The keys
in the set of minimal keys are called candidate or alternate keys. It is up to the database
designer to select one of the candidate keys as a primary key.
A primary key is an attribute or a combination of attributes that uniquely identifies a record
while a secondary key does not identify a recdfd uniquely. A secondary key identifies all
the re~ords corresponding to the key value.
Aforeign key 'is an attribute or a combination of attributes that is used to link tables. In
relational database management systems foreign keys are used as linking pins between
tables ..Foreign keys represent links to the primary keys.
Primary and foreign keys are critical in relational database management systems due to
their contribution in defining integrity rules. A pararriount guideline in relational syst~ms is
that a primary key or any attribute participating in a composite primary key of a relation
cannot have null value. This rule is called. entity integrity.

There is another integrity rule, which pertains to foreign keys. According to this rule an
attribute that is a foreign key in one table must be a primary key in another table. This rule
is known as referential integrity.
I

3.8.3 Criteria for a DBMS to be Relational

Distinguishing a truly relational DB MS from relational-like systems assumes importance
in view of the fact that many new DBMS packages are being labelled as "relational". The
minimum conditions for a system to be called relational are:

i) The data should be represented in the form of tables.

ii) Any pointer mechanism should be transparent to the DBMS users.

iii) The system should support relational algebra operators of SELECT, 'PROJECT and
JOIN.
Any system, which fulfills these three criteria, is called minimally relational. A system,
which satisfies only the first two conditions, is not a relational system and is called tabular
50 DBMS.
For a DBMS to be fully relational it should additionally support both entity and referential Database Management
Systems
integrity rules and implement all relational algebra operations.

The founder of relational database theory, E.F. Codd outlined 12 rules that define a fully
relational database system. These rules are based on the premise that a relational database
management system should be able to manage databases entirely through its relational
capabilities.

3.9. NORMALISATION OF RELATIONS

Normalisation cif relations is an important aspect in database design, which deals with
semantics of the data. It is a technique that structures data in such a manner that there
are no anomalies in the database. Anomalies refer to undesirable side effects which can
occurin relations resulting in poor database design. The process of normalisation usually
involves decomposing a relation into two or more relations with fewer attributes i.e.,
taking a vertical subset of the parent relation. The criteria used to split relations that
determine the levels of normalisation are called normal forms.

It ought to be noted that normalisation is primarily aimed at preventing or reducing data

maintenance problems rather than improving the retrieval efficiency. Converting relations
to normal forms and' utilising the join of the decomposed relations to retrieve data that
could have been retrieved from one original table does not augur well for retrieval speed.
Thus, while normalising relations we are sacrificing retrieval speed to improve the integrity,
consistency and overall maintenance of data stored in the database.

As indicated earlier, normalisation of relations removes anomalies in the database. The

, anomalies can be categorised as:

• Insertion anomalies

• Deletion anomalies

• Update' anomalies
An insertion anomaly occurs when we are unable to insert a .tuple into a table. Such a
situation can arise when the value of primary key is not known. As per-the entity integrity
rule, the primary key cannot have null value. Therefore, the value/s corresponding to
primary key attribute/s of the tuple must be assigned before inserting the tuple. If these
values are unknown, the tuple cannot be inserted into the table.

In case of a deletion anomaly, the deletion of a tuple causes problems in the database.
This call happen when we delete a tuple, which contains an important piece of information,
and thetuple being the last one in the table containing the information. With the deletion of
the tuple the important piece of information also gets removed from the database. '

An update anomaly occurs ~hen we have a lot of redundancy in ourdata. Due to

redundancy, data updating becomes cumbersome. If we have to update one attribute
value, which is occurring a number of times, we have to search for every, occurrence of
that value and then change it. The anomalies will be further elaborated when we discuss
the normal forms Before we proceed with discussion on normalisation it would be useful
to understand the concept of dependencies.

3.9.1 Dependencies
A dependency refers to relationship amongst attributes. These attributes may belong to
the same relation or different relations. Dependencies can be of various types viz., functional
, dependencies, transitive dependencies, multi valued dependencies, join dependencies, etc.
We shall briefly examine some of these dependencies.

Functional Dependency (F.D) - Functional dependency represents semantic association

between attributes. If a value of an attribute A determines the value of another attribute
B, ~e say B is functionally dependent on A. This is denoted by: 51
Database Design and A 7 B and read as "A determines B" and A is called the determinant.
Management
It should be noted that when a data item (or collection of data items) determines another
data item, the second data item does not necessarily determine the first. An attribute is
said to be fully functionally dependent on a combination of attributes if it is dependent on
the combination of attributes and not functionally dependent on any proper subset of the
combination. To illustrate functional dependency let us consider a relation with the following
relation scheme : '

BOOK (Book-id, Subject, Year of publication, Price). Book-id (Book identifier or

identification number) is the primary key of the relation and hence determines subject,
year of publication and price.

We can represent it as:

Book-id Subject

Book-id Year of publication

Book-id Price

This can also be shown graphically in the form of a dependency diagram :

Book

Transitive Dependency - Transitive dependency is a form ofintermediate dependency.

For example, if we have attributes or groups of attributes A, Band C such that A determines
Band B determines C i.e., .

A B

B C

Then we say a transitive dependency represented by A 7 B 7 C exists between A

and C.

Let us consider a relation FACULTY.

FACULTY (Faculty-id, Name, Department, Office, Salary)

In this relation let us presume that each department has its own office building. Then
beside others we have the following functional dependencies:

Faculty-id Department (by virtue of Faculty-id being the primary key)

Department Office (by virtue of the enterprise rule or constraint imposed by

us on the relation)

Thus, we have the following transitive dependency in the relation:

Faculty-id Department Office

Multivalued Dependency-Multivalued dependency refers to m:n (many-to-many)

relationships. We say multivalued dependency exists between two data items when one
value of the first data item gives a collection of values of the second data item i.e., it
multidetermines the second data items.

Join Dependency - If we decompose a relation into smaller relations and the join of the
smaller relations does not give us tuples as in the parent relation, we say the relation has
join dependency. .

52 Let us consider a relation SAMPLE with the functional depen.u.ncics as shown:

SAMPLE Database Management
Systems

A B C ED: A-7B

aI bI cl

a2 b2 c3 C-7B

a3 bI c2

a4 b2 c4

Let us now decompose this relation into two relations SPLIT I (A, B) and SPLIT 2(B,
C). The functional dependencies will remain the same in the relation ·SAMPLE.

SPLIT I SPLIT 2

A B B C

aI bI bI cl

ED.: A-7 ED.: C -7 B

a2 b2 b2 c3

a3 bI bI c2

a4 b2 b2 c4

Now, if we join the relations SPLIT I and SPLIT 2 over common attribute B we get the
relation SAMPLE 1.

SAMPLE I
..,.
A B C

aI bI cl

aI bI c2*

a2 b2 c3

a2 b2 c4*

a3 bI c2

a3 bI cl*

a4 b2 c3*

a4 b2 c4

,. We can see from the relation SAMPLE I that it has four additional (~purious) tuples
(shown by asterisk), which were not present in the original relation SAMPLE. This type
of join is called lossy join because the information content of the original table is lost. This
has occurred since the join attribute was not the determinant in the original relation and
hence it should not have been decomposed the way it was done. ~e say a join dependency
exists in.this case. If we decompose a relation and join the constituent relations over the
determinant attribute, we get lossless join. 53
Database Design and For example, consider the relation SAMPLENEW and split it into SAMPLENEWI and
Ma~agement SAMPLENEW2 as given below:

SAMPLENEW

X Y Z

xl yl zl

x2 y2 72 F.D: X-7 Y

X-7Z'

x3 y2 zl

x4 yl 72

SAMPLENEWl SAMPLENEW2

X Y F.D: X -7 Y X Z F.D: X -7 Z

xl yl xl zl

x2 x2 72
y2

x3 y2 x3 zl

x4 72.
x4 yl

The join of SAMPLENEWI and SAMPLENEW2 gives the original table SAMPLENEW
without any spurious rows because the attribute X over which the table is decomposed is
the determinant attribute.

3.9.2 Normal Forms

The technique of normalization is based on the analysis of functional dependencies between
attributes. Taking into account the total scenario, enterprise rules and semantic constraints,
all the functional dependencies in the relations are captured and the relations transformed
into proper normal forms. A normal form represents the level of normalization of a relation.
Normalization proceeds by converting a relation from a lower normal form to higher
normal form.

The norm forms are:

First normal form (lNF), second normal form (2NF), third normal form (3NF),
Boyce-Codd normal form .(BCNF), fourth normal form (4NF), fifth normal
.form (5NF) and the highest normal form which is called domainlkey normal form
(DKlNF).

E.F. Codd had outlined the first three normal forms, which we shall discuss in details.
Most of the commercially available DBMS are normalized up to 3NF or BCNF.

Normal forms build on each other i.e., if a relation is in 3NF, it is also in INF and 2NF.

The first normal form (lNF). A relation is in the first normal form if it can be
represented as a flat file i.e., the relation contains single values at the intersection of
each row and column. .

In other words, each attribute value in a tuple is atomic i.e., non-decomposable.

54
Let us consider a relation PERSON that-is not in INF. Database Management
Systems

PERSON (Name, Age, Gender, CName, Age)

In this relation CName, Age .pertains to the attribute dependent child with multiple values
(name and age). To convert this relation into INF, it should be decorriposed into two
relations as foliows: .

PERSON (Name, Age, Gender)

DEPENDENT (Name, eName, CAge>

" The second normal form (2NF). A relationis in the second normal form if it is in
INF and every non-key attribute is fully functionally dependent on the primary key.
The second normal form pertains only to relations with composite primary key. In
case a relation is in INF and has a single-attribute primary key, it is automatically In
the 2NF.

To explain the second normal form let us take the example of the following relation.

COURSE (Course-id, Course-name, Class-number, Student-id, Faculty-id).

In this relation Course-id, Student-id and Faculty-id form a composite primary key.

The following functional dependencies can be noticed in this relation.

Course-id, Student-id, Faculty-id ~ Course-name

Course-id, Student-id, Faculty-id ~ Class-number

On examining the relation semantically we observe that Course-id uniquely determines

Course-name i.e.,

Course-id ~. Course-name

This means that Course-name is not fully functionally dependent on the primary key.
Therefore the relation is not in the 2NF. The dependency diagram of the relation can be
represented as follows :

Class-number Student-id Faculty-id

To convert this relation into the2NF, we decompose it into two relations as follows:

COURSE (Course-id, Class-number, Student-id, Faculty-id)

COURSE-TITLE (Course-id, Course-name)

I
The decomposition is done; by extracting the attribute that caused the problem from the
relation and creating a new relation with this attribute.

Letus examine these relations (normalised and unnormalised) on the basis o(;nomalies
described earlier. .
55
Database Design and COURSE (unnormalised)
Management

Course-id Course-name Class- num ber Student-id Faculty-id

C701 Computer science 7 S009 B03

C701 Computer science 8 S008 G04

C702 Library automation 7 S009 AOl

C702 Library automation 8 S006 G04

E500 Microeconomics 7 S009 P02

E501 Macroeconomics 7 SOOI P02

M200 Management studies 7 SOOI VOl

The normalised relations obtained from the above relation are :

COURSE

Course-id Class-number Student-id Faculty-id

C701 7 SOO9 B03

C70l 8 S008 G04

C702 7 S009 AOl

E500 8 S006 G04

E501 7 S007 P02

M200 7 SOOI VOl

COURSE- TITLE

Course-id Course-name

C70l Computer science

C702 Library automation

E500 Microeconomics

E501 Macroeconomics

M200 Management studies

Insertion: In the case of unnormalised relation, suppose a new course named 'Computer
networks' has to be introduced. We cannot enter the Course-name in the relation, since
we have no means to ascertain Student-id and Faculty-id at the time of introduction of
the course and these attributes, which along with Course-id form the primary key, cannot
be assigned null values. However, in the normalised relation COURSE-TITLE this problem
does not arise.

Deletion: Suppose in the unnormalised relation the student with Student-id SOOlleaves
56 the course. This student being the only student in the particular course, deletion of the

I
tuple would result in loss of information and we will have no way to know that the course Database Management
Systems
'Management Studies' exists. In normalised relation the deletion of this tuple does not
result in information loss because the information about the course name exists in the
second relation.

Updating: Let us assume that the name.of the course 'Library automation' changes to
'Library management'. In the unnormalised relation we will have to change the information
in two tuples while in the normalised relation only one tuple would require to be updated.
This may not appear to be a significant gain but if the database size is large the problem
can have serious repercussions.

Third normal form (3NF). A relation is in the third normal form if it is in second normal
form and contains no transitive dependencies. This normal form is considered to be the
most important normal form.

To illustrate 3NF, let us.consider the following relation:

FACULTY (Faculty-id, Faculty-name, Department, Gender, Salary, Office)

Let us also assume that each department has its office in one building.

This relation has the following functional dependencies by virtue of Faculty-id being the
primary key.

Faculty-id -7 Faculty-name

'Faculty-id -7 Department

Faculty-id -7 Gender

Faculty-id -7 Office

We also know from the semantics of this relation that

Department Office

Therefore, we have a transitive dependency as shown below :

Faculty-id -7 . Department Office

Thus dependency diagram of the relation can be represented as :

Faculty-id Faculty name Department Gender Salary Office

I t t t t t
To convert this relation into the 3NF we shall have to decompose it into two smaller
relations. The new relation OFFICE-NAME has the attribute office which caused transitive
dependency and its determinant. The decomposed relations are :

FACULTY (Faculty-id, Faculty-name, Department, Gender, Salary)

OFFICE-NAME (Department, Office)

With some tuple values in these relations let us examine them for anomalies, which are
used to test relations. 57
Database Design and FACULTY (unnormalised)
Management
Faculty-id Faculty-name Department Gender Salary Office

B03 Bindra Computer Science M 4000 BirlaBlock

G04 Ganguly Computer Science M 4500 BirlaBlock
AO] Arora Computer Science F 3450 BirlaBlock
P02 Pandey Economics F 3500 Kanishka Block
VOl Vohra Mathematics M 4500 Ashoka Block
BOI Bansal Physics M 3000 Ashoka Block

The normalized relations are:

,"
FACULTY

Faculty-id Faculty -name Department Gender Salary

B03 Bindra Computer Science M 4000

G04 Ganguly Computer Science M 4500
AOl Arora Computer Science F 3450
P02 Pandey Economics F 3500
VOl Vohra Mathematics M 4500
BOI Bansal Physics M 3000

OFFICE-NAME

Department Office

Computer Science BirlaBlock

Economics Kanishka Block
Mathematics Ashoka Block
Physics Ashoka Block

Insertion: Let us assume that a new department whose faculty has not yet been finalized
is created. We cannot enter this information in the unnormalised relation because we do
not know value of the primary key (Faculty-id). To assign values to Faculty-id corresponding
to new department we must know the values of other attributes (Faculty-name, Gender,
Salary, Office). This problem does not exist in the normalised relation.

Deletion: Suppose a faculty member Vohra of mathematics department leaves the faculty.
If we delete tuple corresponding to this Faculty-name in the unnormalised relation, the
information that mathematics department exists is also lost. However, this does not happen
in the normalised relations. ~ .

Update: If the office of the Computer Science department shifts from Birla Block to
Nehru Block, 3 tuples need to be updated in the unnormalised relation while only one tuple
would be modified in normalised relation.

Boyce-Codd normal form (BCNF). Originally, normal forms stopped at the 3NF. However,
research into dependencies led to higher normal forms. BCNF is an extension of the third
normal form. It states that if a relation is in 3NF and all determinants are candidate keys,
58 then it is in Boyce-Codd normal form.

I
The fourth normal form (4NF) refers to multivalued dependencies. A relation is said to Database Management
be in the fourth normal form if it has only one multi valued dependency. Systems

The fifth normal form (SNF) is also called project join normal form (PINF). If a relation
is in 5NF we should be able to join the projections (decompositions) of the relation and
reconstruct the original relation without any information loss. •.

The domain key normal form (DKlNF) is considered the highest normal form. A
relation is in DKINF if every constraint can be inferred by simply knowing the set of
attribute names and their underlying domain along with their set of keys. Thus in DKINF
only two types of constraints are allowed - domain "nstraints and key constraints. If
these constraints are fully enforced, other constraints (dependencies) are removed and
no anomalies can occur in the relation.

The process of normalisation from lower to higher normal forms has been illustrated in
Fig. 3.12.

RaW~Da,a

1 NF Ensure flat file structure

Remove elements of functional dependencies

2NF
Remove elements of transitive dependencies
~
3 NF BCNF
--__
i _
~
4NF Ensure single multi valued dependencies

5NP Remove join dependencies

~
DK/NF Enforce domain and key constraints

Fig. 3.12: Normalisation Process

Self Check. Exercise

5) What are the advantages and disadvantages of normalisation of relations?

Note: i) Write your answer in the space given below.

ii) Check your answer with the answers given at the end of the Unit.

......................................................................................................................

... . . . ..~ .

......................................................................................................................

.......................................................................................................................

...................................................................................................................... 59
Database Design and
Management 3.10 DESIGNING DATABASES
Designing a database is a highly complex operation. Though it is relatively easy to identify
a poorly designed database, there does not exist a unified approach which leads to the
best design.

The database design should be flexible enough to meet the requirements of the maximum
number of users to the fullest. Besides, the design should also anticipate, to a certain
extent, future requirements and make provisions for them. This calls for some intuitiveness
on the part of the database designer.

The process of designing a database is an interactive one, which means that initial database
structure, changes with usage. However, with time the design tends to get stabilized.
Usually, a person designated as database administrator (DBA) controls the design and
administration of a database.

A broad step-by-step procedure for designing a database has been summarized below:

1) First of all, data to be represented in the database is determined. For this, information
needs of the users are studied in detail. Based on the information requirements
anlysis, entities of interest are identified and their attributes examined.

2) . An E-R model of the database representing conceptual schema is drawn. This is the
most important stage in the database design process. E-R diagram, which depicts
entitites and their relationships, should be as comprehensive as possible.

3) The E-R model is mapped into a selected database structure (hierarchical, network,
relational or any other model). In c~se a relational model is chosen, tables '
corresponding to entities and their relations are finalised. The process of normalisation
is invoked to check the tables and reshape them if necessary.

4) An empty database is created using DBMS commands (create TABLE, INDEX,

etc.). A data dictionary, which defines data item names and their internal storage
format, is also created by DBMS. The database design process up to step 3 is
system-independent. The process becomes system-dependent after that.

5) The database is populated. This involves inserting data into the empty database. If
data to be inserted is available in machine-readable form, data loading utility of DB MS
can be utilised.

6) The performance of the database is closely monitored to ascertain whether any

tuning is required. Flexibility and speed of access are critically evaluated. The database
is also examined for datamaintenance problems.

7) The feedback of the users on the database functionality is analysed and changes in
the structure made to optirnise the usage.

3.11 DISTRIBUTED DATABASE SYSTEMS

3.11.1 Architecture of Distributed Databases

A distributed database is a single logical database, which is fragmented, and the fragments
spread across computers at different locations that are interlinked by a data communication
A
network to provide an integrated access to the data. distributed database environment
requires the data to be shared. A distributed database gives geographical data independence
i.e., a user requesting data need not know at which site the data is located. This property
60 is often referred to as location transparency and each local site is called a node.
Architecture and schematic representation of a distributed database is shown in Fig.3.13 Database Management
Systems
Distributed Distributed
DDID DDID

Distributed Distributed
DBMS DBMS
Communication
Application Network
Programs Application
Programs

Local Local
DBMS DBMS

Operating Operating
System System

Database
Database Site 2
Site 3

Fig. 3.13~ Architecture and Schematic Representation of a Distributed Database

As is clear from Fig.3.13, each site has a local DB MS as well as a copy of the distributed
DBMS. Distributed data dictionary/directory (DDID) stores information on location of
data in the network as well as data definitions. Request for data is first checked from the
distributed data dictionary/directory for location of the required data. In case the data is
available at the local site, the distributed DB MS forwards the request to local DBMS for
processing. If the request involves data from other sites, the distributed DBMS routes
the request to these sites.

When different nodes in a distributed database have mixed DBMSs (i.e., node 1 may
have relational DBMS and node 2 network DBMS), then the distributed DB MS capable
of handling such environment is called heterogeneous distributed database management
system.

Distributed database management exploits all advantages of centralised and decentralised

processing. A decentralized database, like a distributed database, is also stored on different
computers at multiple locations but in this case the computers are not interconnected and
hence the data cannot be shared.

3.11.2 Justifications and Options for Distributing Data.

The justifications for distributing data can be summed up as :

• A distributed database provides increased reliability and availability. Compared to

a centralised system which on failure becomes unavailable to all users, a
distributed system will continue to function, though at a reduced level, even when a
node fails.

• By encouraging local control of data at different sites, data integrity improves and
data administration becomes easier. 61
Database Design and
Management
• Distribution of data can improve access time if local data is stored locally. By locating
data closer to the point of its use, communication cost can be reduced and query'
response time improved.

• Distributed system facilitates modular growth. New nodes hosting additional database
fragments can be added to the system.

There are a number of options available for distributing data in a distributed database.
These options include: i) data replication, ii) horizontal partitioning, iii) vertical partitioning
and iv) combination of the above.

In case of data replication a copy of the database is stored at a few or all sites (full
replication). Reliability, saving in telecommunication charges and faster response are
the advantages of this option. But additional storage requirements and difficulty in
propagating updates form the basic drawbacks. This option is suitable in case updates
are infrequent and database interaction is restricted to read-only. CD-ROM (compact
. I

disk read only memory) offers excellent medium for replicated databases.

Horizontal partitioning of a database involves distributing rows of a relation to multiple

sites. New relations (partitions) with the requisite rows are created for this purpose. The
original relation can be reconstructed by taking the union of the new relations. Horizontal
partitioning can optimize performance by storing fragments of the database at the sites
where they are most used.

On vertical partitioning of a database, selected columns of a relation are projected into

new relations, which are stored at different sites. The main criterion for vertical partitioning
is specific data item requirer;nents at individual sites.

A combination of the mentioned options of data distribution may be used depending upon
the needs of the distributed system. The basic principle, which one must keep in mind, is
that data should be stored at sites where it will be most frequently used.

3.12 DATABASE SYSTEMS FOR MANAGEMENT

SUPPORT
Database systems for management support are broadly referred to as Management
Information Systems (MIS). However, for different levels of management database
systems have been categorised based on management functions and expected outputs.
Fig.3.14 illustrates a hierarchy of information systems corresponding to the three levels of
management.

Top Level
Flexible on-demand reports to make
decisions about unstructured queries

Middle Level
Management Information Summarised structured'[eports

System (MIS)

Lower Level
Processed transaction
Transaction Processing System (TPS)

All Levels
Expert System Office Automation System (OA S)

62 Fig. 3.14: Inf0"':la!ion Systems and Management Levels

As is clear from the figure, for lower level of managers, transaction-processing Database Management
Systems
systems (TPS) yielding processed transactions (bills, orders etc) suffice. Middle level
managers need management information systems providing summarised structured
reports. At the top-level decision support S; ~~.:.=~
(DSS) or executive information systems
(EIS) capable of providing brief on-demand reports about unstructured queries are
required. Office automation systems and expert systems are used by all levels including
non-management.

Self Check Exercise

6) What are the advantages of distributed databases?

Note: i) Write your answer in the space given below.

ii) Check your answer with the answers given at the end of the Unit.

3.13 ARTIFICIAL INTELLIGENCE AND EXPERT

SYSTEMS
Artificial intelligence (AI) is a group of related technologies that attempt to develop machines
to emulate human-like qualities such as learning, reasoning, communicating, seeing and
hearing.
In the early days of its development a computer wes called "electronic brain" and since
then efforts have been made to empower it with capabilities comparable with human
brain.
However, there is a basic difference in the functioning of a computer and human brain.
Whereas a computer processes numbers, human brain works on symbols. A conventional
computer program uses algorithms i.e., well-defined step-by-step procedure to solve a
problem" while human thought process is not algorithmic and is based on symbolic
processing. In symbolic processing information are represented using symbols rather than
numbers. Human intelligence or thought process relies on heuristic methods for
information processing. Heuristics of human thought process can be represented as shown
in Fig. 3.15..
Human decisions

~n
Jud ements

Based on

Reasoning

,r ,r r

Learned facts Experience Intuition

Fig.3.1S: Heuristics of Human Thought Process 63

Database Design and The main areas of AI are:
Management
• Robotics
• Perception systems
• Fuzzy logic
• Neural networks
• Genetic algorithm
• Natural language processing
• Expert systems
Robotics is a field that attempts to develop machines that can perform work normally
done by people. The machines themselves are called robots. Perception systems are
sensing devices that emulate the human capabilities of sight, hearing, touch and smell.
Clearly, perception systems are related to robotics, since robots need to have some sensing
capabilities. Fuzzy logic is a method of dealing with imprecise data and uncertainty, with
problems that have many answers rather than one. Unlike classical logic, fuzzy logic is
more like human reasoning. It deals with probability and credibility. That is, instead of
being simply true or false, a proposition is mostly true or mostly false or more true or .
more false.
Fuzzy logic principles are applied in neural networks. Neural networks use physical
electronic devices or software to mimic the neurological structure of human brain. Genetic
algorithms refer to programs that use Darwinian principles of random mutation to improve
themselves.
In natural language processing a user can interact with a computing system using everyday
language rather than a structural command language. Natural language systems have an
interface called natural language interface, which translates the natural language into the
application languages (such as SQL). For a language to be understood there are two
aspects namely, syiuax and semantics which play a vital role. Syntax represents the rules
for combining words to form phrases, clauses and sentences while semantics refers to
word meanings and the way the concepts are expressed. A natural language interface is
capable of analyzing syntax and semantics of a language. However, there are a number
of limitations with the natural language systems at present, which have hindered their
widespread use. With time the systems are likely to get more refined and popular.
An expert system can be defined as AI program designed to represent human expertise
in a specific domain. Typical expert system application areas include diagnosis, planning,
instruction and managment, monitoring and design.
At the heart of an expert system is a knowledge base, which contains facts (basic
data), rules and other information pertaining to a particular domain. Facts represent
accepted knowledge in a certain field (normally possessed by experts) and rules - a
collection of IF - THEN statements also called "rules of thumb" (heuristics) for drawing
inferences: IF such-and-such is true, THEN assume that so-and-so is true.
Unlike databases, which store explicit information, knowledge bases with their IF-THEN
rules can infer additional information not directly stored in the basic data. LISP (list
processing) and PROLOG (programming in logic) are the languages most commonly
utilised by knowledge bases. .
The overall process of developing an expert system is called Knowledge Engineering.
In order to map human knowledge to computer knowledge, one must understand the
nature of knowledge. Knowledge can be grouped into: procedural knowledge, declarative
knowledge, semantic knowledge and episodic knowledge. Procedural knowledge includes
skills an individual knows how to perform. It refers to knowing how to do something.
Declarative knowledge represents information that can be verbalised or expressed. It
states facts about the world. Semantic knowledge is a deep-level knowledge that reflects
cognitive structure, organisation and representation. Episodic knowledge represents
information that has been chunked or compiled episodically. It refers to experiential
64 information of an expert.
The expert systems are made up of three basic components: the knowledge base, an Database Management
Systems
inference engine and a user interface. A specialist called knowledge engineer interviews
the experts for the domain and encodes their knowledge (in the form of rules) into the
knowledge base. The inference engine includes programs that are used to control how
the rules in the knowledge base are used or processed. The user interface facilitates
communication or interaction between the expert system and end user. Components of
an expert system are represented schematic ally in Fig.3.16.

I Expert I

1
Knowledge Engineers <Ill
I I
,
Knowledge Base

*Facts
*Rules

I
~ ~
~1
___ In_fu_re_n_ce~E_n_g_in_e
__ ,~"'~-------------1.~ Control Mechanism

_1 U seT Interface
1_
Fig. 3.16: Components of an Expert System
On request for a piece of specific information from a user, the system looks for data and
rules in the knowledge base. The inference engine decides which rules to fire and how
they should be applied and also when to complete the querying process and provide
answer. User interface may prompt more input from a user to help the system respond
effectively to the query.
The use of expert systems is growing rapidly. However, human experts are likely to
remain in control using the expert systems as job aid.
Self Check Exercise
7) What is the difference between a knowledge base and a database?
8) What is artificial intelligence (l\I)? List some of the areas of its application.
Note: i) Write your answers in the space given below.
ii) Check your answers with the answers given at the end of the Unit.

l
Database Design and
Management 3.14 SUMMARY

This Unit discusses some of the basic issues in database management systems and
provides essential background to understand the functionality of these systems.

The database architecture (three-layer) is a convenient tool for the user to visualize the
schema levels in a database system. This architecture makes it easier to achieve data
independence - both physical and logical.

The E-R model plays an important role in the conceptual database design. Evolutionary
path of data models from hierarchical model to relational and then to object-oriented
model has brought about tremendous changes in database design techniques. Relational
database management systems are by far the most popular. Normalisation of relations is
.
an important aspect of database design aimed to remove anomalies in a database. It
improves integrity and consistency of data, though it slows retrieval speed. Designing
databases is a highly complex process. Usually a database stabilizes in design over a
period of time with feedback from users.

Database technology is making tremendous advances and a variety of database systems

have appeared on the scene in recent years. Distributed database systems, which provide
geographical data independence, permit large databases to be distributed over a number
of locations transparent to the. users. Knowledge engineering with expert systems and
knowledge bases is-providing new tools catalysing socio-economic developments. The
needs of the emerging information society critically depend on the advancement in database
technology.

3.15 ANSWERS TO SELF CHECK EXERCISES

1) In database systems; data independence is an important concept. It refers to
separation of data from the programs that use it. Data independence enables the
data definition to be changed without altering the programs.

Data independence can be physical, logical or geogiaPI:llcal (distributive). Physical

independence means that one can change the. way the data is actually stored or
accessed in the system without requiring changes in the application programs. Logical
independence means that the database can be reorganised i.e., changes can be
made in the conceptual level but still the same application programs can be used.
Geographical independence implies that the application programs are not affected
by the location of data i.e., whether the data is located on a local disk or on a remote
file server. Data independence imparts great flexibility in handling data in database
systems.

2) A schema is an overall conceptual or logical view of data in a database. Collection

of all the tables in a database. The schema contains a list of all the fields, the field
types, and the maximum and minimum acceptable values for each field, along with
information about the structure of every row in every table of the database.

A subset or transformation of the logical view of the database schema that is required
by a particular user application program is called subschema. A subschema is an
individual view of the database. Each individual user may have a separate view of
the database depending upon the user requirements. Access to the entire database
i.e., conceptual schema is generally not given to all the users. A view helps to
provide ~ccess to only that"part of the database, which a user actually needs. The
purpose of the views is not only to rationalise database access but also to implement
66 security aspects.
3) E-R diagram is a tool that models the relationships among the entities in a database. Database Management
Systems
E-R diagram maps the conceptual schema and serves as a blne print for designing
databases.

4) Relational database management systems (RDBMS) have become de facto

international standard in database management. Despite great advances in the object-
oriented systems arid other systems; relational systems retain their wide acceptance.

RDBMS have advantages over other data models in the fact that the relational
model is based on well-developed mathematical theory of relations from which it
derives its name. Application of mathematics imparts great strength to the relational
model. The data in relational systems is represented in the form of tables which
users find easier to handle. Examples of RDBMS are: ORACLE, SYBASE and
INGRESS.

5) The advantages of normalisation of relations are that it enforces data integrity and
removes anomalies in a database. It also minimises data redundancy and in general
promotes accuracy and consistency of data in the database.

Since the process of normalization involves decomposing a table into two or more.
tables, which are joined while retrieving data, the retrieval speed gets adversely
affected.

6) The advantages of distributed databases are as given below:

• Increased reliability and availability

• Improved data integrity and data administration

• Improved access time

• Modular growth

7) A database stores explicit information while a knowledge base with IF-THEN rules
can infer additional information not directly stored in the basic data. In a knowledge
base inference rules allow relationships to be derived from the data.

8) Artificial intelligence refers to computer programs, which emulate human intelligence

i.e., programs facilitating computers to perform tasks that require intelligence when
performed
~ by humans. Examples of such tasks are visual perception, understanding
natural language, game-playing, theorem-proving, medical diagnosis and engineering
design.

AI methods have been successfully applied in areas like: knowledge-based systems

(expert systems, knowledge bases, etc.), robotics, computer vision, machine
translation, neural networks and others.

3.16 KEYWORDS
Access Method The method used to store, find and retrieve the data from
a database.

Artificial Intelligence A branch of computer science that is attempting to develop

systems to emulate human-like qualities such as learning,
reasoning, communicating, seeing and hearing.

Data Independence Separates the data from the program, which often enables
data definition to be changed without altering the program.

Data Integrity Keeping accurate data which means few errors and
1

the data reflect the true state of a database. 67

Database Design and -, Dependency A dependency refers to relationship amongst attributes
Management
belonging to the relation or different relations.

E-R Diagram Entity-Relationship Diagram. A diagram that shows

associations (relationships) between entities.

Expert System A system with a knowledge base consisting of data and

rules that enables users to make decisions as effectively
as an expert.
•
Foreign Key A column in one table that is primary key in a second
table. It does not need to be a key in the first table.

Knowledge Base A knowledge base is an expert system's database of

knowledge about a particular subject. This includes relevant
facts, rules and procedures for solving problems. The basic
unit of knowledge is expressed as an IF-THEN- ELSE
rule.

Normalisation The process of creating a well-behaved set of tables to

efficiently store data, minimize redundancy and ensure
data integrity.

Primary Key A column or a set of columns that identify a particular

row in a table.

Relation A relation is a table.

Relationship An association between two or more entities.

Schema An overall conceptual or logical view of the relationships

between the data in a database.

Subschema A subset or transformation of the logical view of the

database schema that is required by a particular user
application program.

Transparent In computing, it pertains to a process or procedure involving

a user without the later being aware of its existence.

3.17 REFERENCES AND FURTHER READING

CISMOD 93. Proceedings of the International Conference on Information Systems
and-Management of Data (1993). Indian National Scientific Documentation Centre
(INSDOC): New Delhi.

Courtney, James F. and Paradice, David B. (1988). Database Systems for Management.
Toronto: Times Mirror/Mosby College Publishing.

Codd, E.F. (1990). The Relational Modelfor Database Management; New York: Wesley
Publishing Company. Inc.

Curtin, Dennis P. [et al.] (1999). Information Technology: The Breaking Wave. New
Delhi: Tata McGraw-Hill Publishing Company Limited.

Date, C'J. (1989). Introduction to, Database Systems. New Delhi: Narosa Publishing
House.

Delobel, Claudeand Adiba, Michel (1985). Relational Database Systems. Amsterdam:

Elsevier Science Publishers RV.

Elmasri, Ramaz and Navathe, Shaukan B. (2000). Fundamentals of Database Systems.

Asia.Pearson Education Asia.
68
MeFadden, Fred. R. and Hoffer, Jeffrey A. (1988). Database Management, California Database Management
: The BenjaminlCummings Publishing Company. Inc. " Systems

Martin, James (1988). Principles of Database Management. New Delhi: Prentice-

Hall of India Private Limited.

McNurlin, Barbara C. and Spragu, Ralph H. (1989). Information Systems Management

in Practice. New Jersey: Prentice-Hall.Inc.

McGraw, Karn, L. and Karan, Harbison-Briggs (1989). Knowledge Acquisition:

Principles and Guidelines. New Jersey: Prentice-Hall. Inc.

Murray, Linda. A. and John Richardon, T.E. (1989). Intelligent Systems in a Human
Context. Oxford: Oxford University Press.

O'Brien, James A (1997). Introduction to Information Systems. Singapore: Post, Gerld

y. (2000). Database Management System. Delhi: Tata McGraw-Hill Publishing Company
Ltd. IrwinlMcGraw Hill.

Post, Gerald Y.(200,9). Database Management Systems. New Delhi.: Tata McGraw-
Hill Publishing Company Limited.

Shepherd, Robert D. (2001). Introduction to Computers and Technology. New Delhi:

Crest Publishing House.

Su Stanley, Y.W (1988). Database Computers: Principles, Architectures and

Techniques. Singapore: McGraw-Hill Book Company.

Ullman, Jeffrey, D. (1991). Principles of Database Systems. Ne.w Delhi: Galgotia.

Williams Brain K. [et al.] (l999). Using Information Technology. Singapore: Irwinl
McGraw-Hill.

Introduction To Databases Concepts
No ratings yet
Introduction To Databases Concepts
14 pages
Learning Guide Unit 5
No ratings yet
Learning Guide Unit 5
6 pages
II BSC Dbms Volume1
No ratings yet
II BSC Dbms Volume1
35 pages
LSM (Unit 2)
No ratings yet
LSM (Unit 2)
49 pages
DBMS
No ratings yet
DBMS
95 pages
Chapter 3
No ratings yet
Chapter 3
27 pages
Com 312 Dbms
No ratings yet
Com 312 Dbms
25 pages
Chapter 1 DBMS
100% (1)
Chapter 1 DBMS
32 pages
DBMS Module 1
No ratings yet
DBMS Module 1
55 pages
Ism Unit2 PDF
No ratings yet
Ism Unit2 PDF
24 pages
DBMS Basics for Students
No ratings yet
DBMS Basics for Students
185 pages
MCA 05 Unlocked
No ratings yet
MCA 05 Unlocked
156 pages
DBMS 3 Module
No ratings yet
DBMS 3 Module
55 pages
CH - 1 Dbms CLG
No ratings yet
CH - 1 Dbms CLG
18 pages
DBMS UNIT I Part I
No ratings yet
DBMS UNIT I Part I
10 pages
Introduction 1
No ratings yet
Introduction 1
70 pages
DBMS Notes for B.Tech Students
No ratings yet
DBMS Notes for B.Tech Students
135 pages
Intro to Database Management Systems
No ratings yet
Intro to Database Management Systems
25 pages
Unit 2 Part 1
No ratings yet
Unit 2 Part 1
35 pages
Database Concepts for Students
No ratings yet
Database Concepts for Students
7 pages
All Unit Notes
No ratings yet
All Unit Notes
109 pages
DBMS (R20) Unit - 1
No ratings yet
DBMS (R20) Unit - 1
14 pages
Multi-User Relational Database - UA
No ratings yet
Multi-User Relational Database - UA
384 pages
PDF Unit1
No ratings yet
PDF Unit1
109 pages
CSC 220 Dbase
No ratings yet
CSC 220 Dbase
27 pages
Database Management System
No ratings yet
Database Management System
37 pages
3.BIT2103 - 3database Systems - Architecture and Components
No ratings yet
3.BIT2103 - 3database Systems - Architecture and Components
32 pages
Chapter Three
No ratings yet
Chapter Three
22 pages
Unit 1
No ratings yet
Unit 1
26 pages
DB Lecture 1
No ratings yet
DB Lecture 1
43 pages
Database Management System
No ratings yet
Database Management System
18 pages
DBMS Module 1
No ratings yet
DBMS Module 1
34 pages
Database Management Systems Guide
No ratings yet
Database Management Systems Guide
111 pages
DBMS Course Syllabus
No ratings yet
DBMS Course Syllabus
219 pages
Database Management System Guide
No ratings yet
Database Management System Guide
42 pages
Sem 3 DBMS UNIT-1
No ratings yet
Sem 3 DBMS UNIT-1
19 pages
RDBMS
No ratings yet
RDBMS
197 pages
Chapter1 RKS
No ratings yet
Chapter1 RKS
38 pages
Introduction of DBMS
No ratings yet
Introduction of DBMS
83 pages
Introduction Notes On Database
No ratings yet
Introduction Notes On Database
61 pages
SQL Commands UNIT - 5
No ratings yet
SQL Commands UNIT - 5
15 pages
Database Systems Overview
No ratings yet
Database Systems Overview
89 pages
DBMS Lecture
No ratings yet
DBMS Lecture
24 pages
Adobe Scan 16 Mar 2024
No ratings yet
Adobe Scan 16 Mar 2024
15 pages
Intro To DBMS
No ratings yet
Intro To DBMS
29 pages
Unit 1
No ratings yet
Unit 1
23 pages
Database Management System Guide
No ratings yet
Database Management System Guide
113 pages
DBMS Unit 1
No ratings yet
DBMS Unit 1
42 pages
Dbms
No ratings yet
Dbms
42 pages
DBMS Class Notes
No ratings yet
DBMS Class Notes
74 pages
DBMS Unit-1
No ratings yet
DBMS Unit-1
18 pages
Lecture 0 INT306
No ratings yet
Lecture 0 INT306
38 pages
Fundamentals of Database Systems: Lesson 1: Introduction To Databases
No ratings yet
Fundamentals of Database Systems: Lesson 1: Introduction To Databases
22 pages
Unit 1
No ratings yet
Unit 1
63 pages
Data Baselecture 1
No ratings yet
Data Baselecture 1
51 pages
Filtered DBMS Syllabu
No ratings yet
Filtered DBMS Syllabu
111 pages
Lecture 1
No ratings yet
Lecture 1
61 pages
BCA2 Dbmsnotes
No ratings yet
BCA2 Dbmsnotes
96 pages
Oss Full Notes 2021 Reg
No ratings yet
Oss Full Notes 2021 Reg
193 pages
Google Cloud Professional Database Engineer Exam Questions
No ratings yet
Google Cloud Professional Database Engineer Exam Questions
69 pages
CUAP DBMS Notes Unit11
No ratings yet
CUAP DBMS Notes Unit11
31 pages
Class 10 IT 402 As Per Latest Syllabus 2024-25
100% (11)
Class 10 IT 402 As Per Latest Syllabus 2024-25
36 pages
CS6005 Advanced Database System UNIT II
No ratings yet
CS6005 Advanced Database System UNIT II
95 pages
IT & Multimedia Courses 2010 List
No ratings yet
IT & Multimedia Courses 2010 List
14 pages
Instant Download (Ebook PDF) Human Resource Information Systems: Basics, Applications, and Future Directions 4th Edition PDF All Chapter
100% (4)
Instant Download (Ebook PDF) Human Resource Information Systems: Basics, Applications, and Future Directions 4th Edition PDF All Chapter
51 pages
Database Concepts and SQL Basics
No ratings yet
Database Concepts and SQL Basics
6 pages
Tena GG
No ratings yet
Tena GG
25 pages
Ii-Ii-Dbms-Courses File
No ratings yet
Ii-Ii-Dbms-Courses File
207 pages
True-Or-False - Chapter10 To Chapter20
No ratings yet
True-Or-False - Chapter10 To Chapter20
11 pages
Lookup and Lookup Caches
No ratings yet
Lookup and Lookup Caches
17 pages
SQL Server 2008 Exam Prep Guide
No ratings yet
SQL Server 2008 Exam Prep Guide
299 pages
Database Development (Basic)
No ratings yet
Database Development (Basic)
5 pages
TITLE OF THE PROJECT School Management S
No ratings yet
TITLE OF THE PROJECT School Management S
19 pages
Database
100% (1)
Database
40 pages
Relational Model
No ratings yet
Relational Model
16 pages
Relational Database Design by ER and EER To Relational Mapping
No ratings yet
Relational Database Design by ER and EER To Relational Mapping
19 pages
Here Is A Complete Set of MySQL Notes
No ratings yet
Here Is A Complete Set of MySQL Notes
6 pages
Case Studies in Systematic Software Development
No ratings yet
Case Studies in Systematic Software Development
405 pages
Course Outline: ECDL Module 5
No ratings yet
Course Outline: ECDL Module 5
2 pages
BSSE Curriculum 08-01-2024 Updated
No ratings yet
BSSE Curriculum 08-01-2024 Updated
60 pages
Mapping of ER Diagram To Relational Model
No ratings yet
Mapping of ER Diagram To Relational Model
36 pages
Part 1-Textbook Exercise-Unit 3 Database Management System Using LibreOffice Base
100% (1)
Part 1-Textbook Exercise-Unit 3 Database Management System Using LibreOffice Base
12 pages
Testblancbi
No ratings yet
Testblancbi
29 pages
MySQL-Full Notes
100% (3)
MySQL-Full Notes
51 pages
Unit 3 - DBMS - LibreOffice Base - NCERT
No ratings yet
Unit 3 - DBMS - LibreOffice Base - NCERT
25 pages
Oil & Gas Data Warehouse Modeling
No ratings yet
Oil & Gas Data Warehouse Modeling
7 pages
Question Set 1 On Rdbms
No ratings yet
Question Set 1 On Rdbms
6 pages
SQL Practical File for BBA Students
No ratings yet
SQL Practical File for BBA Students
28 pages
Unit 1 1
No ratings yet
Unit 1 1
27 pages