DATABASE MANAGEMENT SYSTEM
(BCA 501)
Syllabus
UNIT-I
Introduction: Characteristics of database approach, data models, DBMS
architecture and data independence.
UNIT-II
E-R Modeling: Entity types, Entity set, attribute and key, relationships, relation
types, roles and structural constraints, weak entities, enhanced E-R and object
modeling, Sub classes; Super classes, inheritance, specialization and generalization.
UNIT-III
File Organization: Indexed sequential access files; implementation using B & B++
trees, hashing, hashing functions, collision resolution, extendible hashing, dynamic
hashing approach implementation and performance.
UNIT-IV
Relational Data Model: Relational model concepts, relational constraints, relational
alzebra SQL: SQL queries, programming using SQL.
UNIT-V
EER and ER to relational mapping: Data base design using EER to relational
language.
UNIT-VI
Data Normalization: Functional Dependencies, Normal form up to 3rd normal form.
Concurrency Control: Transaction processing, locking techniques and associated,
database recovery, security and authorization. Recovery Techniques, Database
Security
UNIT -1 Meaning & Definition
Databases and database systems have become an essential part of our
everyday life. We encounter several activities that involve some
interaction with a database almost daily.
The examples include deposit and/or withdrawal from a bank, hotel,
airline or railway reservation, accessing a computerized library, order a
magazine subscription from a publisher, purchase items from
supermarkets. In all the above cases a database is accessed. These may
be called Traditional Database Applications.
In these types of databases, the information stored and accessed is
textual or numeric. However, with advances in technology in the past
few years, different databases have been developed such as
Multimedia Databases that store pictures, video clips and sound
messages; Geographical Information Systems (GIS) that can
store maps, weather data and satellite images, etc., and Real time
databases that can control industrial and manufacturing processes.
Data: Data is a collection of raw facts and figures. Data is a
collection of unorganized facts but able to to organized into useful
information.
Database: Database is an organized collection of data.
Database Management System: DBMS is basically a
collection of programs that enables users to defined, create and
maintain a database and provide control access to data.
NEED FOR A DATABASE MANAGEMENT SYSTEM
A Database is an organized, persistent collection of data of an organization.
The database management system manages the database of an enterprise.
But why do we need the database management system? To describe it, let us
first discuss the alternative to it, that is the file-based system.
The File Based System : File based systems are an early attempt to
computerize the manual filing system. For example, a manual file can
be set up to hold all the correspondence relating to a particular matter as a
project, product, task, client or employee. In an organization there could be
many such files which may be labeled and stored. The same could be done
at homes where file relating to bank statements, receipts, tax payments,
etc., could be maintained.
Limitations of File Based System
• Separation and isolation of data: When the data is stored in
separate files it becomes difficult to access. It becomes extremely
complex when the data has to be retrieved from more than two files
as a large amount of data has to be searched.
• Duplication of data: Due to the decentralized approach, the file
system leads to uncontrolled duplication of data. This is undesirable as
the duplication leads to wastage of a lot of storage space. It also
costs time and money to enter the data more than once.
• Inconsistent Data: The data in a file system can become
inconsistent if more than one person modifies the data
concurrently, for example, if any student changes the residence and the
change is notified to only his/her file and not to bus list. Entering wrong
data is also another reason for inconsistencies.
Data dependence: The physical structure and storage of data files and records are
defined in the application code.This means that it is extremely difficult to make
changes to the existing structure. The programmer would have to identify all the
affected programs, modify them and retest them. This characteristic of the File Based
system is called program data dependence.
• Incompatible File Formats: Since the structure of the files is embedded in
application programs, the structure is dependent on application programming
languages. Hence the structure of a file generated by COBOL programming language
may be quite different from a file generated by ‘C’ programming language. This
incompatibility makes them difficult to process jointly. The application developer may
have to develop software to convert the files to some common format for processing.
However, this may be time consuming and expensive.
• Fixed Queries: File based systems are very much dependent on application
programs. Any query or report needed by the organization has to be developed by
the application programmer. With time, the type and number of queries or reports
increases. Producing different types of queries or reports is not possible in File Based
Systems. As a result, in some organizations the type of queries or reports to be
produced is fixed. No new query or report of the data could be generated.
The Database Approach
In order to overcome the limitations of a file system, a new approach was
required. Hence a database approach emerged. A database is a persistent
collection of logically related data. The initial attempts were to provide a
centralized collection of data.
A database has a self-describing nature.
It contains not only the data but also the complete definition of the
database structure and constraints, which are stored in a system catalog.
A DBMS manages this data.
It allows data sharing and integration of data of an organization in a single
database.
DBMS controls access to this data and thus needs to provide features for database
creation, data manipulation such as data value modification, data retrieval, data
integrity and security etc.
Advantages of the database approach
Reduction of Redundancies: In a file processing system, each user
group maintains its own files resulting in a considerable amount of
redundancy of the stored data. This results in wastage of storage space
but more importantly may result in data inconsistencies. Also, the same
data has to be updated more than once resulting in duplication of effort.
The files that represent the same data may become inconsistent as some
may be updated whereas others may not be. In database approach
data can be stored at a single place or with controlled redundancy
under DBMS, which saves space and does not permit inconsistency.
Shared Data: A DBMS allows the sharing of database under its
control by any number of application programs or users. A database
belongs to the entire organization and is shared by all authorized users.
Data Independence In the file-based system, the descriptions of data
and logic for accessing the data are built into each application program
making the program more dependent on data. A change in the structure
of data may require alterations to programs. Database Management
systems separates data descriptions from data. Hence it is not
affected by changes.This is called Data Independence, where details
of data are not exposed. DBMS provides an abstract view and hides
details.
Improved Integrity Data Integrity refers to validity and
consistency of data. Data Integrity means that the data should be
accurate and consistent.This is done by providing some checks or
constraints. These are consistency rules that the database is not
permitted to violate. Constraints may apply to data items within a record
or relationships between records. For example, the age of an employee
can be between 18 and 70 years only. While entering the data for the age
of an employee, the database should check this. However, if Grades of
any student are entered, the data can be erroneously entered as Grade C
for Grade A. In this case DBMS will not be able to provide any check as
both A and C are of the same data type and are valid values.
Efficient Data Access DBMS utilizes techniques to store and retrieve
the data efficiently at least for unforeseen queries. A complex DBMS
should be able to provide services to end users, where they can efficiently
retrieve the data almost immediately.
Multiple User Interfaces Since many users having varying levels of
technical knowledge use a database, a DBMS should be able to provide a
variety of interfaces. This includes ─
a. query language for casual users,
b. programming language interfaces for application programmers,
c. forms and codes for parametric users,
d. menu driven interfaces, and
e. natural language interfaces for standalone users, these interfaces are still not
available in standard form with commercial database.
Representing complex relationship among data A database may
include varieties of data interrelated to each other in many ways. A DBMS
must have the capability to represent a variety of relationships among the
data as well as to retrieve and update related data easily and efficiently.
Improved Security Data is vital to any organization and also confidential. In a
shared system where multiple users share the data, all information should not be
shared by all users. For example, the salary of the employees should not be visible to
anyone other than the department dealing in this. Hence, database should be protected
from unauthorized users. This is done by Database Administrator (DBA) by providing
the usernames and passwords only to authorized users as well as granting privileges or
the type of operation allowed. This is done by using security and authorization
subsystem. Only authorized users may use the database and their access types
can be restricted to only retrieval, insert, update or delete or any of these. For
example, the Branch Manager of any company may have access to all data whereas the
Sales Assistant may not have access to salary details.
Improved Backup and Recovery A file-based system may fail to provide
measures to protect data from system failures. This lies solely on the user by taking
backups periodically. DBMS provides facilities for recovering the hardware and
software failures. A backup and recovery subsystem is responsible for this. In case
a program fails, it restores the database to a state in which it was before the execution
of the program.
Support for concurrent transactions A transaction is defined as the unit of
work. For example, a bank may be involved in a transaction where an amount of
Rs.5000/- is transferred from account X to account Y. A DBMS also allows multiple
transactions to occur simultaneously.
Disadvantages of a DBMS
Danger of a Overkill: For small and simple applications for single
users a database system is often not advisable.
Complexity: A database system creates additional complexity and
requirements. The supply and operation of a database management
system with several users and databases is quite costly and
demanding.
Qualified Personnel: The professional operation of a database
system requires appropriately trained staff. Without a qualified
database administrator nothing will work for long.
Costs: Through the use of a database system new costs are generated
for the system itself’s but also for additional hardware and the more
complex handling of the system.
Lower Efficiency: A database system is a multi-use software which
is often less efficient than specialized software which is produced and
optimized exactly for one problem.
THE DBMS ARCHITECTURE
There are two different ways to look at the architecture of a
DBMS:
the logical DBMS architecture and the
physical DBMS architecture.
The logical architecture deals with the way data is stored and
presented to users, while the physical architecture is concerned
with the software components that make up a DBMS.
The logical architecture
The logical architecture describes how data in the database is perceived
by users. It is not concerned with how the data is handled and processed
by the DBMS, but only with how it looks. The method of data storage on
the underlying file system is not revealed, and the users can manipulate
the data without worrying about where it is located or how it is actually
stored. This results in the database having different levels of abstraction.
It divides the system into three levels of abstraction:
the internal or physical level,
the conceptual level, and
the external or view level.
The diagram below shows the logical architecture for a typical DBMS.
The External or View Level Basic
Concepts
The external or view level is the highest level of abstraction
of database. It provides a window on the conceptual view,
which allows the user to see only the data of interest to them.
The user can be either an application program or an end user.
There can be many external views as any number of external
schema an be defined and they can overlap each other. It consists
of the definition of logical records and relationships in the
external view. It also contains the methods for deriving the
objects such as entities, attributes and relationships in the
external view from the Conceptual view.
The Conceptual Level or Global level
The conceptual level presents a logical view of the entire
database as a unified whole. It allows the user to bring all
the data in the database together and see it in a consistent
manner. Hence, there is only one conceptual schema per
database. The first stage in the design of a database is to define the
conceptual view, and a DBMS provides a data definition language
for this purpose. It describes all the records and relationships
included in the database. The data definition language used to
create the conceptual level must not specify any physical storage
considerations that should be handled by the physical level. It does
not provide any storage or access details, but defines the
information content only.
The Internal or Physical Level
The collection of files permanently stored on secondary storage
devices is known as the physical database. The physical or
internal level is the one closest to physical storage, and it provides a
low-level description of the physical database, and an interface
between the operating systems file system and the record structures
used in higher levels of abstraction. It is at this level that record
types and methods of storage are defined, as well as how stored
fields are represented, what physical sequence the stored records
are in, and what other physical structures exist.
Data Dictionary
A Data Dictionary stores information about the structure of the
database. It is used heavily. Hence a good data dictionary should
have a good design and efficient implementation. It is seen that
when a program becomes somewhat large in size, keeping track of all
the available names that are used and the purpose for which they were
used becomes more and more difficult. After a significant time if the
same or another programmer has to modify the program, it becomes
extremely difficult.
Data Independence
The three-schema architecture can be used to explain the concept
of data independence, which can be defined as the capacity to
change the schema at one level of a database system without
having to change the schema at the next higher level.
The ability to modify a schema definition in one level without
affecting a schema definition in the next higher level is called data
independence.
DBMS ARCHITECTURE
A DBMS architecture defines how users interact with the database to
read, write, or update information. A well-designed architecture and
schema (a blueprint detailing tables, fields and relationships) ensure
data consistency, improve performance and keep data secure.
There are several types of DBMS Architecture that we use according
to the usage requirements.
1-Tier Architecture
2-Tier Architecture
3-Tier Architecture
1-Tier Architecture
In 1-Tier Architecture, the user works directly with
the database on the same system. This means the client, server
and database are all in one application. The user can open the
application, interact with the data and perform tasks without
needing a separate server or network connection.
A common example is Microsoft Excel. Everything from the user
interface to the logic and data storage happens on the same
device. The user enters data, performs calculations and saves files
directly on their computer.
2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model.
The application at the client end directly communicates with the
database on the server side. APIs like ODBC and JDBC are used
for this interaction. The server side is responsible for providing
query processing and transaction management functionalities.
On the client side, the user interfaces and application programs
are run. The application on the client side establishes a connection
with the server side to communicate with the DBMS. For
Example: A Library Management System used in schools or small
organizations is a classic example of two-tier architecture.
3-Tier Architecture
In 3-Tier Architecture, there is another layer between the client and the
server. The client does not directly communicate with the server.
Instead, it interacts with an application server which further
communicates with the database system and then the query processing
and transaction management takes place. This intermediate layer acts as
a medium for the exchange of partially processed data between the
server and the client. This type of architecture is used in the case of
large web applications.
Example: E-commerce Store
User:You visit an online store, search for a product and add it to your
cart.
Processing: The system checks if the product is in stock, calculates
the total price and applies any discounts.
Database: The product details, your cart and order history are stored
in the database for future reference.
Data Models
Data Model is the modeling of the data description, data semantics,
and consistency constraints of the data. It provides the conceptual
tools for describing the design of a database at each level of data
abstraction.
1) Relational Data Model: This type of model designs the data
in the form of rows and columns within a table.Thus, a
relational model uses tables for representing data and in-
between relationships. Tables are also called relations. This model
was initially described by Edgar F. Codd, in 1969. The relational data
model is the widely used model which is primarily used by
commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the
logical representation of data as objects and relationships
among them.These objects are known as entities, and
relationship is an association among these entities. This model
was designed by Peter Chen and published in 1976 papers. It was
widely used in database designing. A set of attributes describe the
entities. For example, student_name, student_id describes the
'student' entity. A set of the same type of entities is known as an
'Entity set', and the set of the same type of relationships is known as
'relationship set'.
RELATIONAL MODEL
3) Object-based Data Model: An extension of the ER model with
notions of functions, encapsulation, and object identity, as well.This
model supports a rich type system that includes structured and
collection types. Thus, in 1980s, various database systems following the
object-oriented approach were developed. Here, the objects are nothing
but the data carrying its properties.
4) Semi structured Data Model: This type of data model is different
from the other three data models (explained above). The semi structured
data model allows the data specifications at places where the
individual data items of the same type may have different attributes
sets.The Extensible Markup Language, also known as XML, is widely
used for representing the semi structured data. Although XML was
initially designed for including the markup information to the text
document, it gains importance because of its application in the exchange of
data.
Object-based Data Model
Semi structured Data Model
Database Users
Database users are categorized based on their interaction with the database. There are seven
types of database users in DBMS. Below mentioned are the types of database users:
1. Database Administrator (DBA)
A Database Administrator (DBA) defines the schema and manages all three levels of the database
architecture. They create user accounts, control access, and ensure data security by authorizing
only trusted users. DBAs are also responsible for handling security breaches and optimizing
system performance.
DBA also monitors the recovery and backup and provides technical support.
The DBA has a DBA account in the DBMS which is called a system or superuser account.
DBA repairs damage caused due to hardware and/or software failures.
DBA is the one having privileges to perform DCL (Data Control Language) operations such
as GRANT and REVOKE, to allow/restrict a particular user from accessing the database.
2. Naive / Parametric End Users
Parametric End Users are the unsophisticated who lacks DBMS knowledge but they frequently
use the database applications in their daily life to get the desired results. For example, Railway’s
ticket booking users, Clerks in any bank are naive user because they lacks DBMS knowledge but
they still use the database.
3. A System Analyst
A system Analyst is a user who analyzes the requirements of
parametric end users. They check whether all the requirements of
end users are satisfied.
4. Sophisticated Users
Sophisticated users like engineers or analysts interact directly with
the database using SQL queries. They don't write full application
code but use the query processor to access and manipulate data.
These users develop custom database applications based on their
specific needs.
.
5. Database Designers
Database Designers create the structure of a database, including
tables, views, indexes, triggers, and constraints. They decide what
data to store and how it should be related based on user
requirements. Their design is finalized before the database is built or
populated with data.
6. Application Programmers
Application Programmers, also known as System Analysts or
Software Engineers, write backend code for application programs.
They use languages like C, Visual Basic, COBOL, etc., to design,
test, and maintain programs. These programs, called "canned
transactions," help naive users interact with the database efficiently.
7. Specialized users
Specialized users are sophisticated users who write specialized
database application that does not fit into the traditional data-
processing framework. Among these applications are computer
aided-design systems, knowledge-base and expert systems etc.
Database Administrator
One of the main reasons for using DBMSs is to have central control
of both the data and the programs that access those data. A person
who has such central control over the system is called a database
administrator (DBA)
The functions of a DBA include:
Schema definition. The DBA creates the original database schema by executing a set
of data definition statements in the DDL.
Storage structure and access-method definition.
Schema and physical-organization modification. The DBA carries out changes
to the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access. The authorization information is kept in a special system
structure that the database system consults whenever someone attempts to access the
data in the system.
Routine maintenance. Examples of the database administrator’s routine
maintenance activities are:
Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users