UNIT - 1
History of DBMS
Data is a collection of facts and figures. The data collection was increasing day to
day and they needed to be stored in a device or a software which is safer.
Charles Bachman was the first person to develop the Integrated Data Store (IDS)
which was based on network data model for which he was inaugurated with the
Turing Award (The most prestigious award which is equivalent to Nobel prize in
the field of Computer Science.). It was developed in early 1960’s.
In the late 1960’s, IBM (International Business Machines Corporation) developed
the Integrated Management Systems which is the standard database system used
till date in many places. It was developed based on the hierarchical database
model. It was during the year 1970 that the relational database model was
developed by Edgar Codd. Many of the database models we use today are
relational based. It was considered the standardized database model from then.
The relational model was still in use by many people in the market.Later during the
same decade (1980’s), IBM developed the Structured Query Language (SQL) as a
part of R project. It was declared as a standard language for the queries by ISO and
ANSI. The Transaction Management Systems for processing transactions was also
developed by James Gray for which he was felicitated the Turing Award.
Further, there were many other models with rich features like complex queries,
data types to insert images and many others. The Internet Age has perhaps
influenced the data models much more. Data models were developed using object
oriented programming features, embedding with scripting languages like Hyper
Text Markup Language (HTML) for queries. With humongous data being available
online, DBMS is gaining more significance day by day.
File systems verses a DBMS
File System :
The file system is basically a way of arranging the files in a storage medium like
a hard disk. The file system organizes the files and helps in the retrieval of files
when they are required. File systems consist of different files which are grouped
into directories. The directories further contain other folders and files. The file
system performs basic operations like management, file naming, giving access
rules, etc.
Example: NTFS(New Technology File System), EXT(Extended File System).
DBMS(Database Management System) :
Database Management System is basically software that manages the collection
of related data. It is used for storing data and retrieving the data effectively when
it is needed. It also provides proper security measures for protecting the data from
unauthorized access. In Database Management System the data can be fetched by
SQL queries and relational algebra. It also provides mechanisms for data
recovery and data backup.
Example:
Oracle, MySQL, MS SQL server.
Basis File System DBMS
he file system is software that
manages and organizes the DBMS is software for
Structure
files in a storage medium managing the database.
within a computer.
Redundant data can be present In DBMS there is no
Data Redundancy
in a file system. redundant data.
It provides backup and
It doesn’t provide backup and
Backup and Recovery recovery of data even if it is
recovery of data if it is lost.
lost.
There is no efficient query Efficient query processing
Query Processing
processing in the file system. is there in DBMS.
There is more data
There is less data consistency
Consistency consistency because of the
in the file system.
process of normalization.
It has more complexity in
It is less complex as compared
Complexity handling as compared to
to DBMS
the file system.
File systems provide less DBMS has more security
Security Constraints security in comparison to mechanisms as compared to
DBMS. file systems.
It has a comparatively
It is less expensive than
Cost higher cost than a file
DBMS.
system.
There is no data In DBMS data
Data Independence
independence. independence exists.
Only one user can access data Multiple users can access
User Access
at a time. data at a time.
The Data Model:
Data Models in DBMS are:
Hierarchical Model.
Network Model.
Entity-Relationship Model.
Relational Model.
Object-Oriented Data Model.
Object-Relational Data Model.
Flat Data Model.
Semi-Structured Data Model.
Data Model
Data Model gives us an idea that how the final system will look like after its
complete implementation. It defines the data elements and the relationships
between the data elements. Data Models are used to show how data is stored,
connected, accessed and updated in the database management system. Here, we
use a set of symbols and text to represent the information so that members of the
organisation can communicate and understand it. Though there are many data
models being used nowadays but the Relational model is the most widely used
model. Apart from the Relational model, there are many other types of data
models about which we will study in details in this blog. Some of the Data
Models in DBMS are:
1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
6. Object-Relational Data Model
7. Flat Data Model
8. Semi-Structured Data Model
9. Associative Data Model
10. Context Data Model
Hierarchical Model
Hierarchical Model was the first DBMS model. This model organises the data in
the hierarchical tree structure. The hierarchy starts from the root which has root
data and then it expands in the form of a tree adding child node to the parent
node. This model easily represents some of the real-world relationships like food
recipes, sitemap of a website etc.
Example: We can represent the relationship between the shoes present on a
shopping website in the following way:
Features of a Hierarchical Model
1. One-to-many relationship: The data here is organised in a tree-like
structure where the one-to-many relationship is between the datatypes.
Also, there can be only one path from parent to any node. Example: In the
above example, if we want to go to the node sneakers we only have one
path to reach there i.e through men's shoes node.
2. Parent-Child Relationship: Each child node has a parent node but a parent
node can have more than one child node. Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted then the child node is
automatically deleted.
4. Pointers: Pointers are used to link the parent node with the child node and
are used to navigate between the stored data. Example: In the above
example the 'shoes' node points to the two other nodes 'women shoes' node
and 'men's shoes' node.
Advantages of Hierarchical Model
It is very simple and fast to traverse through a tree-like structure.
Any change in the parent node is automatically reflected in the child node
so, the integrity of data is maintained.
Disadvantages of Hierarchical Model
Complex relationships are not supported.
As it does not support more than one parent of the child node so if we have
some complex relationship where a child node needs to have two parent
node then that can't be represented using this model.
If a parent node is deleted then the child node is automatically deleted.
Network Model
This model is an extension of the hierarchical model. It was the most popular
model before the relational model. This model is the same as the hierarchical
model, the only difference is that a record can have more than one parent. It
replaces the hierarchical tree with a graph. Example: In the example below we
can see that node student has two parents i.e. CSE Department and Library. This
was earlier not possible in the hierarchical model.
Features of a Network Model
1. Ability to Merge more Relationships: In this model, as there are more
relationships so data is more related. This model has the ability to manage
one-to-one relationships as well as many-to-many relationships.
2. Many paths: As there are more relationships so there can be more than one
path to the same record. This makes data access fast and simple.
3. Circular Linked List: The operations on the network model are done with
the help of the circular linked list. The current position is maintained with
the help of a program and this position navigates through the records
according to the relationship.
Advantages of Network Model
The data can be accessed faster as compared to the hierarchical model. This
is because the data is more related in the network model and there can be
more than one path to reach a particular node. So the data can be accessed
in many ways.
As there is a parent-child relationship so data integrity is present. Any
change in parent record is reflected in the child record.
Disadvantages of Network Model
As more and more relationships need to be handled the system might get
complex. So, a user must be having detailed knowledge of the model to
work with the model.
Any change like updation, deletion, insertion is very complex.
Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model
diagram. In this model, we represent the real-world problem in the pictorial form
to make it easy for the stakeholders to understand. It is also very easy for the
developers to understand the system by just looking at the ER diagram. We use
the ER diagram as a visual tool to represent an ER Model. ER diagram has the
following three components:
Entities: Entity is a real-world thing. It can be a person, place, or even a
concept. Example: Teachers, Students, Course, Building, Department, etc
are some of the entities of a School Management System.
Attributes: An entity contains a real-world property called attribute. This is
the characteristics of that attribute. Example: The entity teacher has the
property like teacher id, salary, age, etc.
Relationship: Relationship tells how two attributes are
related. Example: Teacher works for a department.
Example:
In the above diagram, the entities are Teacher and Department. The attributes
of Teacher entity are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number.
The attributes of entity Department entity are Dept_id, Dept_name. The two
entities are connected using the relationship. Here, each teacher works for a
department.
Features of ER Model
Graphical Representation for Better Understanding: It is very easy and
simple to understand so it can be used by the developers to communicate
with the stakeholders.
ER Diagram: ER diagram is used as a visual tool for representing the
model.
Database Design: This model helps the database designers to build the
database and is widely used in database design.
Advantages of ER Model
Simple: Conceptually ER Model is very easy to build. If we know the
relationship between the attributes and the entities we can easily build the
ER Diagram for the model.
Effective Communication Tool: This model is used widely by the database
designers for communicating their ideas.
Easy Conversion to any Model: This model maps well to the relational
model and can be easily converted relational model by converting the ER
model to the table. This model can also be converted to any other model
like network model, hierarchical model etc.
Disadvatages of ER Model
No industry standard for notation: There is no industry standard for
developing an ER model. So one developer might use notations which are
not understood by other developers.
Hidden information: Some information might be lost or hidden in the ER
model. As it is a high-level view so there are chances that some details of
information might be hidden.
Relational Model
Relational Model is the most widely used model. In this model, the data is
maintained in the form of a two-dimensional table. All the information is stored
in the form of row and columns. The basic structure of a relational model is
tables. So, the tables are also called relations in the relational model.
Example: In this example, we have an Employee table.
Features of Relational Model
Tuples: Each row in the table is called tuple. A row contains all the
information about any instance of the object. In the above example, each
row has all the information about any specific individual like the first row
has information about John.
Attribute or field: Attributes are the property which defines the table or
relation. The values of the attribute should be from the same domain. In the
above example, we have different attributes of the employee like Salary,
Mobile_no, etc.
Advnatages of Relational Model
Simple: This model is more simple as compared to the network and
hierarchical model.
Scalable: This model can be easily scaled as we can add as many rows and
columns we want.
Structural Independence: We can make changes in database structure
without changing the way to access the data. When we can make changes
to the database structure without affecting the capability to DBMS to
access the data we can say that structural independence has been achieved.
Disadvantages of Relatinal Model
Hardware Overheads: For hiding the complexities and making things
easier for the user this model requires more powerful hardware computers
and data storage devices.
Bad Design: As the relational model is very easy to design and use. So the
users don't need to know how the data is stored in order to access it. This
ease of design can lead to the development of a poor database which would
slow down if the database grows.
But all these disadvantages are minor as compared to the advantages of the
relational model. These problems can be avoided with the help of proper
implementation and organisation.
Object-Oriented Data Model
The real-world problems are more closely represented through the object-oriented
data model. In this model, both the data and relationship are present in a single
structure known as an object. We can store audio, video, images, etc in the
database which was not possible in the relational model(although you can store
audio and video in relational database, it is advised not to store in the relational
database). In this model, two are more objects are connected through links. We
use this link to relate one object to other objects. This can be understood by the
example given below.
In the above example, we have two objects Employee and Department. All the
data and relationships of each object are contained as a single unit. The attributes
like Name, Job_title of the employee and the methods which will be performed
by that object are stored as a single object. The two objects are connected through
a common attribute i.e the Department_id and the communication between these
two will be done with the help of this common id.
Object-Relational Model
As the name suggests it is a combination of both the relational model and the
object-oriented model. This model was built to fill the gap between object-
oriented model and the relational model. We can have many advanced features
like we can make complex data types according to our requirements using the
existing data types. The problem with this model is that this can get complex and
difficult to handle. So, proper understanding of this model is required.
Flat Data Model
It is a simple model in which the database is represented as a table consisting of
rows and columns. To access any data, the computer has to read the entire table.
This makes the modes slow and inefficient.
Semi-Structured Model
Semi-structured model is an evolved form of the relational model. We cannot
differentiate between data and schema in this model.
Example: Web-Based data sources which we can't differentiate between the
schema and data of the website. In this model, some entities may have missing
attributes while others may have an extra attribute. This model gives flexibility in
storing the data. It also gives flexibility to the attributes.
Example: If we are storing any value in any attribute then that value can be either
atomic value or a collection of values.
Associative Data Model
Associative Data Model is a model in which the data is divided into two parts.
Everything which has independent existence is called as an entity and the
relationship among these entities are called association. The data divided into two
parts are called items and links.
Item: Items contain the name and the identifier (some numeric value).
Links: Links contain the identifier, source, verb and subject.
Example: Let us say we have a statement "The world cup is being hosted by
London from 30 May 2020". In this data two links need to be stored:
1. The world cup is being hosted by London. The source here is 'the world
cup', the verb 'is being' and the target is 'London'.
2. from 30 May 2020. The source here is the previous link, the verb is 'from'
and the target is '30 May 2020'.
This is represented using the table as follows:
Context Data Model
Context Data Model is a collection of several models. This consists of models
like network model, relational models etc. Using this model we can do various
types of tasks which are not possible using any model alone.
Levels of Abstraction in a DBMS
Data Abstraction is a process of hiding unwanted or irrelevant details from the
end user. It provides a different view and helps in achieving data independence
which is used to enhance the security of data.
The database systems consist of complicated data structures and relations. For
users to access the data easily, these complications are kept hidden, and only the
relevant part of the database is made accessible to the users through data
abstraction.
Levels of abstraction
Database systems include complex data-structures. In terms of retrieval of data,
reduce complexity in terms of usability of users and in order to make the system
efficient, developers use levels of abstraction that hide irrelevant details from the
users. Levels of abstraction simplify database design.
Mainly there are three levels of abstraction for DBMS, which are as follows:
Physical or Internal Level
Logical or Conceptual Level
View or External Level
Physical: This is the lowest level of data abstraction. It tells us how the data is
actually stored in memory. The access methods like sequential or random access
and file organization methods like B+ trees, hashing used for the same. Usability,
size of memory, and the number of times the records are factors that we need to
know while designing the database. Suppose we need to store the details of an
employee. Blocks of storage and the amount of memory used for these purposes
are kept hidden from the user.
Logical: Logical level is the intermediate level or next higher level. It describes what
data is stored in the database and what relationship exists among those data. It tries to
describe the entire or whole data because it describes what tables to be created and
what are the links among those tables that are created.
It is less complex than the physical level. Logical level is used by developers or
database administrators (DBA). So, overall, the logical level contains tables (fields and
attributes) and relationships among table attributes.
View: This is the highest level of abstraction. Only a part of the actual database is
viewed by the users. This level exists to ease the accessibility of the database by an
individual user. Users view data in the form of rows and columns. Tables and
relations are used to store data. Multiple views of the same database may exist.
Users can just view the data and interact with the database, storage and
implementation details are hidden from them.
Data Independence
The main purpose of data abstraction is to achieve data independence in order to
save time and cost required when the database is modified or altered.
We have namely two levels of data independence arising from these levels of
abstraction:
Physical level data independence: It refers to the characteristic of being able to
modify the physical schema without any alterations to the conceptual or logical
schema, done for optimization purposes, e.g., Conceptual structure of the
database would not be affected by any change in storage size of the database
system server. Changing from sequential to random access files is one such
example. These alterations or modifications to the physical structure may
include:
Utilizing new storage devices.
Modifying data structures used for storage.
Altering indexes or using alternative file organization techniques etc.
Logical level data independence: It refers characteristic of being able to modify
the logical schema without affecting the external schema or application program.
The user view of the data would not be affected by any changes to the conceptual
view of the data. These changes may include insertion or deletion of attributes,
altering table structures entities or relationships to the logical schema, etc.
Importance of Data Independence
Helps you to improve the quality of the data
Database system maintenance becomes affordable
Enforcement of standards and improvement in database security
You don’t need to alter data structure in application programs
Permit developers to focus on the general structure of the Database rather
than worrying about the internal implementation
It allows you to improve state which is undamaged or undivided
Database incongruity is vastly reduced.
Easily make modifications in the physical level is needed to improve the
performance of the system.
Structure of a DBMS
Database Management System (DBMS) is a software that allows access to data
stored in a database and provides an easy and effective method of
Defining the information.
Storing the information.
Manipulating the information.
Protecting the information from system crashes or data theft.
Differentiating access permissions for different users.
Please be note that the Structure of Database Management System is also
referred to as Overall System Structure or Database Architecture but it is
different from the tier architecture of Database.
The database system is divided into three components: Query Processor, Storage
Manager, and Disk Storage. These are explained as following below.
1. Query Processor:
It interprets the requests (queries) received from end user via an application
program into instructions. It also executes the user request which is received
from the DML compiler.
Query Processor contains the following components –
DML Compiler:
It processes the DML statements into low level instruction (machine language),
so that they can be executed.
DDL Interpreter:
It processes the DDL statements into a set of table containing meta data (data
about data).
Embedded DML Pre-compiler:
It processes DML statements embedded in an application program into
procedural calls.
Query Optimizer:
It executes the instruction generated by DML Compiler.
3. Storage Manager:
Storage Manager is a program that provides an interface between the data
stored in the database and the queries received. It is also known as Database
Control System. It maintains the consistency and integrity of the database by
applying the constraints and executes the DCL statements. It is responsible for
updating, storing, deleting, and retrieving data in the database.
It contains the following components:
Authorization Manager:
It ensures role-based access control, i.e,. checks whether the particular person
is privileged to perform the requested operation or not.
Integrity Manager:
It checks the integrity constraints when the database is modified.
Transaction Manager:
It controls concurrent access by performing the operations in a scheduled way
that it receives the transaction. Thus, it ensures that the database remains in
the consistent state before and after the execution of a transaction.
File Manager:
It manages the file space and the data structure used to represent information
in the database.
Buffer Manager:
It is responsible for cache memory and the transfer of data between the
secondary storage and main memory.
3. Disk Storage: It contains the following components:
Data File: It stores the data.
Data Dictionary:
It contains the information about the structure of any database object. It is the
repository of information that governs the metadata.
Indices: It provides faster retrieval of data item.