0% found this document useful (0 votes)
17 views79 pages

Unit-1 DBMS Notes

The document explains the concepts of data, information, and knowledge, highlighting their differences and the role of information systems in organizations. It details the components, dimensions, types, and advantages of information systems, as well as the significance of database management systems (DBMS) in managing data efficiently. Additionally, it contrasts DBMS with traditional file systems, emphasizing the benefits of using a centralized approach to data management.

Uploaded by

anikasahu720
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views79 pages

Unit-1 DBMS Notes

The document explains the concepts of data, information, and knowledge, highlighting their differences and the role of information systems in organizations. It details the components, dimensions, types, and advantages of information systems, as well as the significance of database management systems (DBMS) in managing data efficiently. Additionally, it contrasts DBMS with traditional file systems, emphasizing the benefits of using a centralized approach to data management.

Uploaded by

anikasahu720
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 79

UNIT-1

Data
Data is the raw material that can be processed for any computing machine.
For example − Employee name, Product name, Name of the student, Marks of the
student, Mobile number, Image etc.

Information
Information is the data that has been converted into more useful or intelligent form.
For example: Report card sheet.
The information is needed for the following reasons −
 To gain knowledge about the surroundings.
 To keep the system up to date.
 To know about the rules and regulations of the society.
Knowledge
The human mind purposefully organizes the information and evaluates it to produce
knowledge.
Example of data, information and knowledge
A student secures 450 marks. Here 450 is data, marks of the student is
the information and hard work required to get the marks is knowledge.
Differences
The major differences between Data and Information are as follows −

Data Information

Data is the raw fact. It is a processed form of data.

It is not significant to a business. It is significant to a business.

Data is an atomic level piece of information. It is a collection of data.

Example: Product name, Name of student. Example: Report card of student.

It is a phenomenal fact. It is organized data.

This is the primary level of intelligence. It is a secondary level of intelligence.

May or may not be meaningful. Always meaningful.

Understanding is difficult. Understanding is easy.


Diagram
The diagram given below depicts the use of data and information in a database −
Information System Definition
"Information system is set of people, information technology, and business
process in order to achieve a business objective."

Information systems are a set of interconnected elements working together to


collect, process, store, and distribute information to help coordination, visualization
in an organization, analysis, and decision-making.

The Information system can be defined as a collection of software, hardware,


and telecommunications network that people develop and use to gather,
create, and distribute useful data, mainly in organizational settings.

In other words, an information system means a collection of interrelated components


which work together to gather, process, store, and break down the information to
help decision making.
Dimensions of an information system
There are various dimensions of an information system:

1. Organizational dimension
2. Management dimension
3. Technology dimension

Let's discuss the above dimensions in detail:

1. Organizational Dimension
The information system is the organization's part. The ordinary operating procedure
and culture of an organization would be embedded in the information system. This
includes the following:

o Business processes
o Political interest groups
o Functional specialties
o Cultured

2. Management Dimension
In today's world, managers face business challenges. Information systems provide
managers with the tools and information they have to plan, manage, monitor their
work, make decisions, develop new goods and services, and make long-term tactical
decisions.

3. Technology Dimension
Management makes use of technology to fulfill their duties. It contains- computer
hardware and software, networking/telecom technology, and data management. It's
one of the many strategies a manager can use to deal with changes. Organizational
levels, processing, system goals, mode of data and type of support provided are used
to classify information systems.

Types of Information System


The types of information systems are as follows:

1. Transaction Processing System (TPS):

o The term "transaction processing system" refers to an information system that


processes data are originating from business transactions.
o The primary purpose of a transaction processing system is to offer
transactions to update records and produce reports required for storekeeping.
o Online Transaction Processing and Batching Processing are the methods
which we used to complete the transaction.
o Examples of transaction processing systems are Stock control systems, Payroll
systems, Bill systems.

2. Management Information System (MIS):

o The purpose of a management information system is to transform


comparatively raw data accessible through using Transaction Processing
System into a summarized and aggregated form for managers, generally in
the form of a report. Operational supervisors and middle management are
likely to use the reports.
o In MIS, there are various kinds of reports generated. Few reports are a kind of
summary report, ad-hoc reports, exception report, and on-demand report.
o Examples of Management Information System are Human resource
management system and Sales management systems.

3. Decision Support System (DSS)

o Another type of information system is a decision support system. It is


interactive, which offers information, data manipulation tools, and models to
support decision-making in a semi-structured and unstructured scenario.
o This type of information system includes tools and techniques to help gather
relevant information and examine options, and substitutes, the end-user being
more elaborate in making DSS than MIS.
o Examples of Decision Support System: Bank loan management systems,
financial planning systems.

4. Experts System

o The expert system contains expertise which is helpful for a manager


in identifying problems or in problem-solving. The principles of artificial
intelligence research are used to develop these kinds of information systems.
o This type of information system is a knowledge-based system. It acts as an
expert consultant to users by utilizing its knowledge of a specific area.
o There are some components of expert systems such as Knowledgebase and
software modules. These modules make inferences based on knowledge and
provide answers to a user's query.
Components of Information System
There are various components of an information system:

1. Hardware
2. Software
3. Data
4. Procedures
5. People
6. Feedback

Let's discuss the above components in detail:

1. Hardware
Hardware means equipment and machinery. This category encompasses the
computer and all of its supporting equipment in modern information systems. The
supporting devices contain input and output devices, communication devices and
storage device. Hardware in pre-computer information systems may contain ledger
books and ink.

2. Software
In an information system, software means computer programs as well as the manuals
which support them. Computer program means the machine-readable instructions
that tell circuitry in the system's hardware to work to generate helpful information
from the data. In most cases, programs are stored on an input/output medium, such
as a tape or disk. The software which is for pre-computer information systems
comprised instruction for using them means the guidebook for a card catalog and
the information regarding how the hardware was configured for use such as columns
headings in the ledger book.

3. Data
Data means facts that systems use to generate valuable knowledge. Data is usually
stored in machine-readable form on tape or disk until the computer requires them.
The data in pre-computer information systems is usually stored in a human-readable
format.

4. Procedures
Procedures mean rules which govern how an operation is performed in information
system. "Procedures are for people what software is for hardware" is a general
analogy that we used to clarify the importance of procedures in a system.

5. People
Every system requires individuals if the system is to be beneficial. People are often
the most neglected part of the system, and they are possibly the factor that has the
greatest impact on the success or failure of information systems.

This contains clients, yet additionally the individuals who operate as well as service
the computers, those who support the network of computers, and the individuals
who keep up the information.

6. Feedback
Another component of an information system is feedback, which determines that an
information system can be offered with feedback. However, this component is not
needed to function.

Advantages of Information System


There are various advantages of the information system:

o Communication
o Availability
o Creation of new types of jobs
o Globalization and cultural gap
Communication
Using information technology, instant messaging, emails, voice, and video calls,
communication become inexpensive, faster, and effective.

Availability
With the help of the Information system, it is possible for businesses around the
world to be open around the clock. This implies that a business can be open anytime,
anyplace, making buys from various nations simpler and more helpful. It likewise
implies that you can have your products delivered right to your doorstep without
making more effort.

Creation of New Types of Jobs


The creation of new and exciting jobs is another advantage of the information
system. We can create various opportunities using IT such as computer
programmers, Hardware and Software developers, Web designers, and Systems
analyzers.

Globalization and Cultural Gap


We may reduce semantic, Geological, and some social constraints by implementing
an information system. Sharing data, information, knowledge, communication, and
connections between different nations, societies, and dialects are a lot easier.

Components of Information Systems


Every information system includes several key components: hardware, software,
telecommunications, people and data. Hardware refers to the physical pieces of the
information system; software is the programming that controls the information
system; telecommunication transmits information through the system; humans
manage and interact with the information system; and data is information stored
within and processed by the system.

Hardware
The hardware component of an information system comprises the physical elements
of the system. People can touch and feel pieces of hardware. These mechanisms,
equipment and wiring allow systems like computers, smartphones and tablets to
function.

Input and output devices are essential pieces of technology that allow humans to
interact with computers and other information systems. Keyboards, mice,
microphones and scanners are all examples of input devices. And output devices
might include printers, monitors, speakers and sound and video cards.
Pieces of hardware including microprocessors, hard drives, electric power supply
units, and removable storage also allow computers to store and process data.

Software
Software are the intangible programs that manage information system functions,
including input, output, processing and storage.

System software – such as the MacOS or Microsoft Windows operating systems –


provides a base for application software to run.

Application software operates programs geared toward particular uses in information


systems. For example, word processing applications are used to create and edit text
documents. Graphical user interface (GUI) software is among the most common
application softwareExternal link:open_in_new; it presents the information stored in
computers and allows users to interact with computers through digital graphics –
such as icons, buttons and scroll bars – rather than through text-based commands.
Software can be either open source or closed source. Open source software coding
is publicly available for users and programmers to manipulate, whereas closed
source software is proprietary.

Telecommunications
Telecommunications systems connect computer networks and allow information to
be transmitted through them. Telecommunications networks also allow computers
and storage services to access information from the cloud.

There are a number of methods telecommunications networks use to convey


information. Coaxial cables and fiber optic cables are used by telephone, internet
and cable providers to transmit data, video and audio messages.

Local-area networks (LANs) connect computers to create computer networks in a


designated space, like a school or home. Wide-area networks (WANs) are
collections of LANs that facilitate data-sharing across large areas. A virtual private
network (VPN) allows a user to protect their online privacy by encrypting data on
public networks.

Microwaves and radio waves can also be used to transmit information in


telecommunications networks.

Data
Data are intangible, raw facts that are stored, transmitted, analyzed and processed
by other components of information systems. Data are often stored as numerical
facts, and they represent quantitative or qualitative information.

Data can be stored in a database or data warehouse, in a form that best suits the
organization using it.

Databases house collections of data that can be queried or retrieved for specific
purposes. Databases allow users to perform fundamental operations, such as
storage and retrieval. Data warehouses, on the other hand, store data from multiple
sources for analytical purposes. They allow users to assess an organization or its
operations.

Human Resources
Human resources are a crucial part of information systems. The human component
of information systems encompasses the qualified people who influence and
manipulate the data, software and processes in information systems. Humans
involved in information systems may include business analysts, information security
analysts or system analysts.

Business analysts work to elevate an organization’s operations and processes. They


often focus on improving efficiency and productivity or streamlining distribution.
Information security analysts work to prevent data breaches and cybersecurity
attacks. And system analysts use information technology to help organizations
optimize their user experiences with programs.

The Role of Information Systems


Information systems allow users to collect, store, organize and distribute data—
functions that can serve a variety of purposes for companies. Many businesses use
their information systems to manage resources and improve efficiency. And some
rely on information systems to compete in global markets. Huawei researchers found
that in 2016, the digital economy worldwide was worth $11.5 trillion dollars or 15.5%
of the global GDP [PDF, 22.8 MB]External link:open_in_new. By 2025, that number is
projected to grow further, to about 24% of the global GDP.
There are a variety of applications for different types of information systems. For
example, GIS can help researchers track the movement of sea ice, help inform
agricultural decisions, or offer insight into crime patterns. Email software, such as
Microsoft Outlook, is a common type of office automation system that can
automatically sort, prioritize, file and respond to messages. And Apple’s SIRI is a
well-known expert system that works to replicate human decision-making when
prompted by speech from users. From internet browsing to online banking,
information systems are becoming increasingly integrated in daily life.
INTRODUCTION TO DBMS
A database management system (DBMS) refers to the technology for creating and managing
databases. DBMS is a software tool to organize (create, retrieve, update, and manage) data in
a database.

The main aim of a DBMS is to supply a way to store up and retrieve database information
that is both convenient and efficient. By data, we mean known facts that can be recorded and
that have embedded meaning. Usually, people use software such as DBASE IV or V,
Microsoft ACCESS, or EXCEL to store data in the form of a database. A datum is a unit of
data. Meaningful data combined to form information. Hence, information is interpreted data -
data provided with semantics. MS. ACCESS is one of the most common examples of
database management software.

Why Use DBMS?


 To develop software applications In less time.
 Data independence and efficient use of data.
 For uniform data administration.
 For data integrity and security.
 For concurrent access to data, and data recovery from crashes.
 To use user-friendly declarative query language.

Where is a Database Management System (DBMS) being


Used?
 Airlines: reservations, schedules, etc
 Telecom: calls made, customer details, network usage, etc
 Universities: registration, results, grades, etc
 Sales: products, purchases, customers, etc
 Banking: all transactions etc

Advantages of DBMS
A DBMS manages data and has many benefits. These are:

 Data independence: Application programs should be as free or independent as possible


from details of data representation and storage. DBMS can supply an abstract view of the
data for insulating application code from such facts.
 Efficient data access: DBMS utilizes a mixture of sophisticated concepts and techniques for
storing and retrieving data competently. This feature becomes important in cases where the
data is stored on external storage devices.
 Data integrity and security: If data is accessed through the DBMS, the DBMS can enforce
integrity constraints on the data.
 Data administration: When several users share the data, integrating the administration of
data can offer significant improvements. Experienced professionals understand the nature of
the data being managed and can be responsible for organizing the data representation to
reduce redundancy and make the data to retrieve efficiently.

Components of DBMS
 Users: Users may be of any kind such as DB administrator, System developer, or database
users.
 Database application: Database application may be Departmental, Personal, organization's
and / or Internal.
 DBMS: Software that allows users to create and manipulate database access,
 Database: Collection of logical data as a single unit.

DBMS vs. File System


File System Approach
File based systems were an early attempt to computerize the manual system. It is
also called a traditional based approach in which a decentralized approach was taken
where each department stored and controlled its own data with the help of a data
processing specialist. The main role of a data processing specialist was to create the
necessary computer file structures, and also manage the data within structures and
design some application programs that create reports based on file data.

In the above figure:

Consider an example of a student's file system. The student file will contain
information regarding the student (i.e. roll no, student name, course etc.). Similarly,
we have a subject file that contains information about the subject and the result file
which contains the information regarding the result.

Some fields are duplicated in more than one file, which leads to data redundancy. So
to overcome this problem, we need to create a centralized system, i.e. DBMS
approach.

DBMS:
A database approach is a well-organized collection of data that are related in a
meaningful way which can be accessed by different users but stored only once in a
system. The various operations performed by the DBMS system are: Insertion,
deletion, selection, sorting etc.

In the above figure,

In the above figure, duplication of data is reduced due to centralization of data.


There are the following differences between DBMS and File systems:

Basis DBMS Approach File System Approach

Meaning DBMS is a collection of data. In DBMS, The file system is a collection of


the user is not required to write the data. In this system, the user has to
procedures. write the procedures for managing
the database.

Sharing of data Due to the centralized approach, data Data is distributed in many files,
sharing is easy. and it may be of different formats,
so it isn't easy to share data.

Data Abstraction DBMS gives an abstract view of data The file system provides the detail
that hides the details. of the data representation and
storage of data.

Security and DBMS provides a good protection It isn't easy to protect a file under
Protection mechanism. the file system.

Recovery DBMS provides a crash recovery The file system doesn't have a crash
Mechanism mechanism, i.e., DBMS protects the mechanism, i.e., if the system
user from system failure. crashes while entering some data,
then the content of the file will be
lost.

Manipulation DBMS contains a wide variety of The file system can't efficiently
Techniques sophisticated techniques to store and store and retrieve the data.
retrieve the data.

Concurrency DBMS takes care of Concurrent access In the File system, concurrent
Problems of data using some form of locking. access has many problems like
redirecting the file while deleting
some information or updating
some information.

Where to use Database approach used in large File system approach used in large
systems which interrelate many files. systems which interrelate many
files.

Cost The database system is expensive to The file system approach is cheaper
design. to design.

Data Due to the centralization of the In this, the files and application
Redundancy and database, the problems of data programs are created by different
Inconsistency redundancy and inconsistency are programmers so that there exists a
controlled. lot of duplication of data which
may lead to inconsistency.

Structure The database structure is complex to The file system approach has a
design. simple structure.

Data In this system, Data Independence In the File system approach, there
Independence exists, and it can be of two types. exists no Data Independence.

o Logical Data Independence


o Physical Data Independence

Integrity Integrity Constraints are easy to apply. Integrity Constraints are difficult to
Constraints implement in file system.

Data Models In the database approach, 3 types of In the file system approach, there is
data models exist: no concept of data models exists.

o Hierarchal data models


o Network data models
o Relational data models

Flexibility Changes are often a necessity to the The flexibility of the system is less
content of the data stored in any as compared to the DBMS
system, and these changes are more approach.
easily with a database approach.
Examples Oracle, SQL Server, Sybase etc. Cobol, C++ etc.

Characteristics of Database Management System

1. Real World Entity

DBMS these days is very realistic and real-world entities are used to design its architecture. Also,
behavior and attributes are used by DBMS. To simplify it we can take an example of an
organization database where employee is an entity and his employee id is an attribute.

2. Self-Describing Nature

Before DBMS, a traditional file management system was used for storing information and data.
There was no concept of definition in traditional file management system like we have in DBMS.
A DBMS should be of Self- Describing nature as it not only contains the database itself but also
the metadata. A metadata (data about data) defines and describes not only the extent, type,
structure, and format of all data but also relationship between data. This data represents itself that
what actions should be taken on it.

3. Support ACID Properties

Any DBMS is able to support ACID (Accuracy, Completeness, Isolation, and Durability)
properties. It is made sure in every DBMS that the real purpose of data should not be lost while
performing transactions like delete, insert, and update. Let us take an example; if an employee’s
name is updated then it should make sure that there is no duplicate data and no mismatch of
employee information.

4. Concurrent Use of Database

There are many chances that many users will be accessing the data at the same time. They may
require altering the database system concurrently. At that time, DBMS supports them to
concurrently use database without any problem. With the help of concurrency, economy of the
system can be increased. For Example, employees of the railway reservation system can book and
access tickets for passengers concurrently. Every employee can see on his own interface that how
many seats are available or bogie is fully booked.

5. Insulation Between Data and Program

Program data independence provides a big relief to database users. In traditional file management
system, structure of data files was defined in the application programs so the user had to change
all the programs that are using that particular data file.
But in DBMS, structure of data files is not stored in the program but it is stored in system
catalogue. With the help of this, internal improvement of data efficiency or any changes in the
data do not have any effect on application software.

6. Transactions

Transactions are bunch of actions that are done to bring database from one consistent state to new
consistent state. Traditional file-based system did not have this feature. Transaction is always
atomic which means it can never be further divided. It can only be completed or uncompleted.
For example, A person wants to credit money from his account to another person’s account. Then
transaction will be complete if he sends money and another guy receives his money. Anything
other than this can lead to an inconsistent transaction.

7. Data Persistence

Persistence means if the data is not removed explicitly then all the data will be maintained in
DBMS. If any system failure happens then life span of data stored in the DBMS will be decided
by the users directly or indirectly. Any data stored in the DBMS can never be lost. If system
failure happens in between any transaction then it will be rolled back or fully completed, but data
will never be at risk.

8. Backup and Recovery

There are many chances of failure of the whole database. At that time no one will be able to get
the database back and for sure company will be in a big loss. The only solution is to take backup
of database and whenever it is needed, it can be stored back. A database must have this
characteristic to enable more effectiveness.

9. Data Integrity

This is one of the most important characteristics of database management system. Integrity
ensures the quality and reliability of database system. It protects unauthorized access to the
database and makes it more secure. It brings only consistent and accurate data into the database.

10. Multiple Views

Users can have multiple views of database depending on their department and interest. DBMS
support multiple views of database to the users. For example, a user of the teaching department
will have different view and user of hostel department will have different. This feature helps
users to have somewhat security because users of other departments cannot access their files.

11. Stores Any Kind of Data

A database management system should be able to store any kind of data. It should not be
restricted to employee name, salary, and address. Any kind of data that exists in the real world
can be stored in DBMS because we need to work with all kinds of data that is present around us.

12. Security

DBMS provides security to the data stored in it because all users have different rights to access
database. Some of the users can access the whole database while other can access a small part of
database. For example, a computer network lecturer can only access files that are related to
computer subjects but HOD of the department can access files of all subjects that are related to
their department.

13. Represents Complex Relationship Between Data

Data stored in a database is connected with each other and a relationship is made in between data.
DBMS should be able to represent the complex relationship between data to make efficient and
accurate use of data.
14. Query Language

Queries are used to retrieve and manipulate data but DBMS is armed by a strong query language
that makes it more effective and efficient. Users have the power to retrieve any kind of data they
want from the database by applying different sets of queries. The file-Based system has not this
luxury of the query language.

15. Cost

The cost of the DBMS is high as compared to the other software and technology available in the
market. But if you consider the long run then DBMS is way far better because its maintenance
cost will be almost nothing.

Data Model
Data Model gives us an idea that how the final system will look like after its
complete implementation. It defines the data elements and the relationships
between the data elements. Data Models are used to show how data is stored,
connected, accessed and updated in the database management system. Here, we
use a set of symbols and text to represent the information so that members of the
organisation can communicate and understand it. Though there are many data
models being used nowadays but the Relational model is the most widely used
model. Apart from the Relational model, there are many other types of data models
about which we will study in details in this blog. Some of the Data Models in DBMS
are:

1. Hierarchical Model
2. Network Model
3. Relational Model
4. Object-Oriented Data Model
5. Object-Relational Data Model
6. Flat Data Model
7. Semi-Structured Data Model
8. Associative Data Model
9. Context Data Model
1.Hierarchical Model
Hierarchical Model was the first DBMS model. This model organises
the data in the hierarchical tree structure. The hierarchy starts from the
root which has root data and then it expands in the form of a tree
adding child node to the parent node. This model easily represents
some of the real-world relationships like food recipes, sitemap of a
website etc. Example: We can represent the relationship between the
shoes present on a shopping website in the following way:

Features of a Hierarchical Model

1. One-to-many relationship: The data here is organised in a


tree-like structure where the one-to-many relationship is between
the datatypes. Also, there can be only one path from parent to any
node. Example: In the above example, if we want to go to the
node sneakers we only have one path to reach there i.e through
men's shoes node.
2. Parent-Child Relationship: Each child node has a parent
node but a parent node can have more than one child node.
Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted then the child
node is automatically deleted.
4. Pointers: Pointers are used to link the parent node with the
child node and are used to navigate between the stored
data. Example: In the above example the 'shoes' node points to
the two other nodes 'women shoes' node and 'men's shoes' node.
Advantages of Hierarchical Model

 It is very simple and fast to traverse through a tree-like structure.


 Any change in the parent node is automatically reflected in the
child node so, the integrity of data is maintained.
Disadvantages of Hierarchical Model

 Complex relationships are not supported.


 As it does not support more than one parent of the child node so if
we have some complex relationship where a child node needs to
have two parent node then that can't be represented using this
model.
 If a parent node is deleted then the child node is automatically
deleted.

2.Network Model
This model is an extension of the hierarchical model. It was the most
popular model before the relational model. This model is the same as
the hierarchical model, the only difference is that a record can have
more than one parent. It replaces the hierarchical tree with a
graph. Example: In the example below we can see that node student
has two parents i.e. CSE Department and Library. This was earlier not
possible in the hierarchical model.
Features of a Network Model

1. Ability to Merge more Relationships: In this model, as


there are more relationships so data is more related. This model
has the ability to manage one-to-one relationships as well as
many-to-many relationships.
2. Many paths: As there are more relationships so there can be
more than one path to the same record. This makes data access
fast and simple.
3. Circular Linked List: The operations on the network model
are done with the help of the circular linked list. The current
position is maintained with the help of a program and this
position navigates through the records according to the
relationship.
Advantages of Network Model

 The data can be accessed faster as compared to the hierarchical


model. This is because the data is more related in the network
model and there can be more than one path to reach a particular
node. So the data can be accessed in many ways.
 As there is a parent-child relationship so data integrity is present.
Any change in parent record is reflected in the child record.
Disadvantages of Network Model

 As more and more relationships need to be handled the system


might get complex. So, a user must be having detailed knowledge
of the model to work with the model.
 Any change like updation, deletion, insertion is very complex.

3.Relational Model
Relational Model is the most widely used model. In this model, the
data is maintained in the form of a two-dimensional table. All the
information is stored in the form of row and columns. The basic
structure of a relational model is tables. So, the tables are also
called relations in the relational model. Example: In this
example, we have an Employee table.
Features of Relational Model

 Tuples: Each row in the table is called tuple. A row contains


all the information about any instance of the object. In the
above example, each row has all the information about any
specific individual like the first row has information about
John.
 Attribute or field: Attributes are the property which
defines the table or relation. The values of the attribute
should be from the same domain. In the above example, we
have different attributes of the employee like Salary,
Mobile_no, etc.
Advnatages of Relational Model

 Simple: This model is more simple as compared to the


network and hierarchical model.
 Scalable: This model can be easily scaled as we can add as
many rows and columns we want.
 Structural Independence: We can make changes in
database structure without changing the way to access the
data. When we can make changes to the database structure
without affecting the capability to DBMS to access the data
we can say that structural independence has been achieved.
Disadvantages of Relatinal Model

 Hardware Overheads: For hiding the complexities and


making things easier for the user this model requires more
powerful hardware computers and data storage devices.
 Bad Design: As the relational model is very easy to design
and use. So the users don't need to know how the data is
stored in order to access it. This ease of design can lead to the
development of a poor database which would slow down if
the database grows.
But all these disadvantages are minor as compared to the
advantages of the relational model. These problems can be avoided
with the help of proper implementation and organisation.

4.Object-Oriented Data Model


The real-world problems are more closely represented through the
object-oriented data model. In this model, both the data and
relationship are present in a single structure known as an object. We
can store audio, video, images, etc in the database which was not
possible in the relational model(although you can store audio and video
in relational database, it is adviced not to store in the relational
database). In this model, two are more objects are connected through
links. We use this link to relate one object to other objects. This can be
understood by the example given below.

In the above example, we have two objects Employee and Department.


All the data and relationships of each object are contained as a single
unit. The attributes like Name, Job_title of the employee and the
methods which will be performed by that object are stored as a single
object. The two objects are connected through a common attribute i.e the
Department_id and the communication between these two will be done
with the help of this common id.
Components of Object-Oriented Data Model:
The OODBMS is based on three major components, namely: Object
structure, Object classes, and Object identity. These are explained below.
1. Object Structure:
The structure of an object refers to the properties that an object is made up
of. These properties of an object are referred to as an attribute. Thus, an
object is a real-world entity with certain attributes that makes up the object
structure. Also, an object encapsulates the data code into a single unit which
in turn provides data abstraction by hiding the implementation details from
the user.
The object structure is further composed of three types of components:
Messages, Methods, and Variables. These are explained below.
1. Messages –
A message provides an interface or acts as a communication medium
between an object and the outside world. A message can be of two
types:
 Read-only message: If the invoked method does not change the
value of a variable, then the invoking message is said to be a read-
only message.
 Update message: If the invoked method changes the value of a
variable, then the invoking message is said to be an update message.

2. Methods –
When a message is passed then the body of code that is executed is
known as a method. Whenever a method is executed, it returns a value
as output. A method can be of two types:
 Read-only method: When the value of a variable is not affected by a
method, then it is known as the read-only method.
 Update-method: When the value of a variable change by a method,
then it is known as an update method.

3. Variables –
It stores the data of an object. The data stored in the variables makes the
object distinguishable from one another.

2. Object Classes:
An object which is a real-world entity is an instance of a class. Hence first we
need to define a class and then the objects are made which differ in the
values they store but share the same class definition. The objects in turn
correspond to various messages and variables stored in them.

Coding example:
// Create a Car class with some attributes
class Car {
public:
string brand;
string model;
int year;
};

int main() {
// Create an object of Car
Car carObj1;
carObj1.brand = "BMW";
carObj1.model = "X5";
carObj1.year = 1999;

// Create another object of Car


Car carObj2;
carObj2.brand = "Ford";
carObj2.model = "Mustang";
carObj2.year = 1969;

// Print attribute values


cout << carObj1.brand << " " << carObj1.model << " " <<
carObj1.year << "\n";
cout << carObj2.brand << " " << carObj2.model << " " <<
carObj2.year << "\n";
return 0;
}

5.Object-Relational Model
An Object relational model is a combination of a Object oriented database model
and a Relational database model. So, it supports objects, classes, inheritance etc. just
like Object Oriented models and has support for data types, tabular structures etc.
like Relational data model.
One of the major goals of Object relational data model is to close the gap between
relational databases and the object oriented practises frequently used in many
programming languages such as C++, C#, Java etc.

History of Object Relational Data Model


Both Relational data models and Object oriented data models are very useful. But it
was felt that they both were lacking in some characteristics and so work was started
to build a model that was a combination of them both. Hence, Object relational data
model was created as a result of research that was carried out in the 1990’s.

Advantages of Object Relational model


The advantages of the Object Relational model are −
Inheritance
The Object Relational data model allows its users to inherit objects, tables etc. so
that they can extend their functionality. Inherited objects contains new attributes as
well as the attributes that were inherited.
Complex Data Types
Complex data types can be formed using existing data types. This is useful in Object
relational data model as complex data types allow better manipulation of the data.
Extensibility
The functionality of the system can be extended in Object relational data model. This
can be achieved using complex data types as well as advanced concepts of object
oriented model such as inheritance.

Disadvantages of Object Relational model


The object relational data model can get quite complicated and difficult to handle at
times as it is a combination of the Object oriented data model and Relational data
model and utilizes the functionalities of both of them.

6.Flat Data Model


A flat data model is one in which all of the data is kept in the same plane. It’s a
standard data model that’s been introduced first and foremost, but it’s not really
scientific.
In the flat database model, each data element is represented by a single two-
dimensional array, with columns having similar types of values and components in
the row having relational values.
The database is represented in this model as a single table with fields (columns) and
tuples (rows). This database management system (DBMS) data model is a slow and
inefficient procedure.

Example

There are two columns entitled name and password, for example, that
can be used by any given security system. As a result, each row is utilised
to record various accounts and passwords. In the flat model, no two
entries are the same. The table format is used to store the database in the
flat model. Because it is hard to manage such a vast collection of entries
in the case of a flat database model, this database model has the problem
of being unable to store big chunks of data in the 2D array.
7.Semi-Structured Model
Semi-structured model is an evolved form of the relational model. We
cannot differentiate between data and schema in this model. This model
can primarily be used to represent data from some data sources that
aren’t bound or constrained by the schema. Example: Web-Based data
sources which we can't differentiate between the schema and data of the
website. In this model, some entities may have missing attributes while
others may have an extra attribute. This model gives flexibility in storing
the data. It also gives flexibility to the attributes. Example: If we are
storing any value in any attribute then that value can be either atomic
value or a collection of values.

8.Associative Data Model


Associative Data Model is a model in which the data is divided into two
parts. Everything which has independent existence is called as
an entity and the relationship among these entities are
called association. The data divided into two parts are called items and
links.

 Item: Items contain the name and the identifier(some numeric


value).
 Links: Links contain the identifier, source, verb and subject.
Example: Let us say we have a statement "The world cup is being
hosted by London from 30 May 2020". In this data two links need to be
stored:

1. The world cup is being hosted by London. The source here is 'the
world cup', the verb 'is being' and the target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb
is 'from' and the target is '30 May 2020'.
This is represented using the table as follows:
9.Context Data Model
Context Data Model is a collection of several models. This consists of
models like network model, relational models etc. Using this model we
can do various types of tasks which are not possible using any model
alone.

File Organization in DBMS


A database consist of a huge amount of data. The data is grouped within a table in
RDBMS, and each table have related records. A user can see that the data is stored in
form of tables, but in actual this huge amount of data is stored in physical memory in
form of files.
File – A file is named collection of related information that is recorded on secondary
storage such as magnetic disks, magnetic tapes and optical disks.
What is File Organization?
File Organization refers to the logical relationships among various records that
constitute the file, particularly with respect to the means of identification and access to
any specific record. In simple terms, Storing the files in certain order is called file
Organization. File Structure refers to the format of the label and data blocks and of
any logical control record.
Types of File Organizations –

Various methods have been introduced to Organize files. These particular methods
have advantages and disadvantages on the basis of access or selection . Thus it is all
upon the programmer to decide the best suited file Organization method according to
his requirements.
Some types of File Organizations are :

 Sequential File Organization


 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization

We will be discussing each of the file Organizations in further sets of this article along
with differences and advantages/ disadvantages of each file Organization methods.

Sequential File Organization –

The easiest method for file Organization is Sequential method. In this method the file
are stored one after another in a sequential manner. There are two ways to implement
this method:
 Pile File Method – This method is quite simple, in which we store the records in a
sequence i.e one after other in the order in which they are inserted into the tables.

1. Insertion of new record –


Let the R1, R3 and so on upto R5 and R4 be four records in the sequence. Here,
records are nothing but a row in any table. Suppose a new record R2 has to be
inserted in the sequence, then it is simply placed at the end of the file.
 Sorted File Method –In this method, As the name itself suggest whenever a new
record has to be inserted, it is always inserted in a sorted (ascending or
descending) manner. Sorting of records may be based on any primary key or any
other key.

1. Insertion of new record –


Let us assume that there is a preexisting sorted sequence of four records R1, R3,
and so on upto R7 and R8. Suppose a new record R2 has to be inserted in the
sequence, then it will be inserted at the end of the file and then it will sort the
sequence .
Pros and Cons of Sequential File Organization –
Pros –
 Fast and efficient method for huge amount of data.
 Simple design.
 Files can be easily stored in magnetic tapes i.e cheaper storage mechanism.
Cons –
 Time wastage as we cannot jump on a particular record that is required, but we
have to move in a sequential manner which takes our time.
 Sorted file method is inefficient as it takes time and space for sorting records.

Heap File Organization –

Heap File Organization works with data blocks. In this method records are inserted at
the end of the file, into the data blocks. No Sorting or Ordering is required in this
method. If a data block is full, the new record is stored in some other block, Here the
other data block need not be the very next data block, but it can be any block in the
memory. It is the responsibility of DBMS to store and manage the new records.

Insertion of new record –


Suppose we have four records in the heap R1, R5, R6, R4 and R3 and suppose a new
record R2 has to be inserted in the heap then, since the last data block i.e data block 3
is full it will be inserted in any of the data blocks selected by the DBMS, lets say data
block 1.

If we want to search, delete or update data in heap file Organization the we will
traverse the data from the beginning of the file till we get the requested record. Thus if
the database is very huge, searching, deleting or updating the record will take a lot of
time.
Pros and Cons of Heap File Organization –
Pros –
 Fetching and retrieving records is faster than sequential record but only in case of
small databases.
 When there is a huge number of data needs to be loaded into the database at a time,
then this method of file Organization is best suited.
Cons –
 Problem of unused memory blocks.
 Inefficient for larger databases.

In a database management system, When we want to retrieve a particular


data, It becomes very inefficient to search all the index values and reach the
desired data. In this situation, Hashing technique comes into picture.
Hashing is an efficient technique to directly search the location of desired
data on the disk without using index structure. Data is stored at the data
blocks whose address is generated by using hash function. The memory
location where these records are stored is called as data block or data
bucket.
Prerequisite - Hashing Data Structure

Hash File Organization:


 Data bucket – Data buckets are the memory locations where the records
are stored. These buckets are also considered as Unit Of Storage.
 Hash Function – Hash function is a mapping function that maps all the
set of search keys to actual record address. Generally, hash function
uses the primary key to generate the hash index – address of the data
block. Hash function can be simple mathematical function to any complex
mathematical function.
 Hash Index-The prefix of an entire hash value is taken as a hash index.
Every hash index has a depth value to signify how many bits are used for
computing a hash function. These bits can address 2n buckets. When all
these bits are consumed ? then the depth value is increased linearly and
twice the buckets are allocated.

Static Hashing:

In static hashing, when a search-key value is provided, the hash function


always computes the same address. For example, if we want to generate an
address for STUDENT_ID = 104 using mod (5) hash function, it always
results in the same bucket address 4. There will not be any changes to the
bucket address here. Hence a number of data buckets in the memory for this
static hashing remain constant throughout.
Operations:
 Insertion – When a new record is inserted into the table, The hash
function h generates a bucket address for the new record based on its
hash key K. Bucket address = h(K)
 Searching – When a record needs to be searched, The same hash
function is used to retrieve the bucket address for the record. For
Example, if we want to retrieve the whole record for ID 104, and if the
hash function is mod (5) on that ID, the bucket address generated would
be 4. Then we will directly got to address 4 and retrieve the whole record
for ID 104. Here ID acts as a hash key.
 Deletion – If we want to delete a record, Using the hash function we will
first fetch the record which is supposed to be deleted. Then we will
remove the records for that address in memory.
 Updation – The data record that needs to be updated is first searched
using hash function, and then the data record is updated.
Now, If we want to insert some new records into the file But the data bucket
address generated by the hash function is not empty or the data already
exists in that address. This becomes a critical situation to handle. This
situation in the static hashing is called bucket overflow. How will we insert
data in this case? There are several methods provided to overcome this
situation. Some commonly used methods are discussed below:
1. Open Hashing – In Open hashing method, next available data block is
used to enter the new record, instead of overwriting the older one. This
method is also called linear probing. For example, D3 is a new record
that needs to be inserted, the hash function generates the address as
105. But it is already full. So the system searches next available data
bucket, 123 and assigns D3 to

it.
2. Closed hashing – In Closed hashing method, a new data bucket is
allocated with same address and is linked it after the full data bucket. This
method is also known as overflow chaining. For example, we have to
insert a new record D3 into the tables. The static hash function generates
the data bucket address as 105. But this bucket is full to store the new
data. In this case is a new data bucket is added at the end of 105 data
bucket and is linked to it. Then new record D3 is inserted into the new
bucket.

 Quadratic probing : Quadratic probing is very much similar to open


hashing or linear probing. Here, The only difference between old and
new bucket is linear. Quadratic function is used to determine the new
bucket address.
 Double Hashing : Double Hashing is another method similar to linear
probing. Here the difference is fixed as in linear probing, but this fixed
difference is calculated by using another hash function. That’s why the
name is double hashing.

Dynamic Hashing –

The drawback of static hashing is that that it does not expand or shrink
dynamically as the size of the database grows or shrinks. In Dynamic
hashing, data buckets grows or shrinks (added or removed dynamically) as
the records increases or decreases. Dynamic hashing is also known
as extended hashing. In dynamic hashing, the hash function is made to
produce a large number of values. For Example, there are three data records
D1, D2 and D3 . The hash function generates three addresses 1001, 0101
and 1010 respectively. This method of storing considers only part of this
address – especially only first one bit to store the data. So it tries to load
three of them at address 0 and 1.

But the problem is that No bucket address is remaining for D3. The bucket
has to grow dynamically to accommodate D3. So it changes the address
have 2 bits rather than 1 bit, and then it updates the existing data to have 2
bit address. Then it tries to accommodate D3.
B+ Tree File Organization –

B+ Tree, as the name suggests, It uses a tree like structure to store records in File. It
uses the concept of Key indexing where the primary key is used to sort the records.
For each primary key, an index value is generated and mapped with the record. An
index of a record is the address of record in the file.
B+ Tree is very much similar to binary search tree, with the only difference that
instead of just two children, it can have more than two. All the information is stored in
leaf node and the intermediate nodes acts as pointer to the leaf nodes. The information
in leaf nodes always remain a sorted sequential linked list.
In the above diagram 56 is the root node which is also called the main node of the
tree.
The intermediate nodes here, just consist the address of leaf nodes. They do not
contain any actual record. Leaf nodes consist of the actual record. All leaf nodes are
balanced.
Pros and Cons of B+ Tree File Organization –

Pros –

 Tree traversal is easier and faster.


 Searching becomes easy as all records are stored only in leaf nodes and are sorted
sequential linked list.
 There is no restriction on B+ tree size. It may grows/shrink as the size of data
increases/decreases.
Cons –
 Inefficient for static tables.

Cluster File Organization –

In cluster file organization, two or more related tables/records are stored within same
file known as clusters. These files will have two or more tables in the same data block
and the key attributes which are used to map these table together are stored only once.
Thus it lowers the cost of searching and retrieving various records in different files as
they are now combined and kept in a single cluster.
For example we have two tables or relation Employee and Department. These table
are related to each other.

Therefore these table are allowed to combine using a join operation and can be seen in
a cluster file.
If we have to insert, update or delete any record we can directly do so. Data is sorted
based on the primary key or the key with which searching is done. Cluster key is the
key with which joining of the table is performed.
Types of Cluster File Organization – There are two ways to implement this
method:
1. Indexed Clusters –
In Indexed clustering the records are group based on the cluster key and stored
together. The above mentioned example of the Employee and Department
relationship is an example of Indexed Cluster where the records are based on the
Department ID.
2. Hash Clusters –
This is very much similar to indexed cluster with only difference that instead of
storing the records based on cluster key, we generate hash key value and store the
records with same hash key value.

B-Tree is a self-balancing search tree. In most of the other self-balancing


search trees (like AVL and Red-Black Trees), it is assumed that everything is
in main memory. To understand the use of B-Trees, we must think of the
huge amount of data that cannot fit in main memory. When the number of
keys is high, the data is read from disk in the form of blocks. Disk access
time is very high compared to the main memory access time. The main idea
of using B-Trees is to reduce the number of disk accesses. Most of the tree
operations (search, insert, delete, max, min, ..etc ) require O(h) disk
accesses where h is the height of the tree. B-tree is a fat tree. The height of
B-Trees is kept low by putting maximum possible keys in a B-Tree node.
Generally, the B-Tree node size is kept equal to the disk block size. Since
the height of the B-tree is low so total disk accesses for most of the
operations are reduced significantly compared to balanced Binary Search
Trees like AVL Tree, Red-Black Tree, ..etc.
Time Complexity of B-Tree:

Sr. No. Algorithm Time Complexity

1. Search O(log n)

2. Insert O(log n)

3. Delete O(log n)

“n” is the total number of elements in the B-tree.


Properties of B-Tree:

1. All leaves are at the same level.


2. A B-Tree is defined by the term minimum degree ‘t’. The value of t
depends upon disk block size.
3. Every node except root must contain at least t-1 keys. The root may
contain minimum 1 key.
4. All nodes (including root) may contain at most 2*t – 1 keys.
5. Number of children of a node is equal to the number of keys in it plus 1.
6. All keys of a node are sorted in increasing order. The child between two
keys k1 and k2 contains all keys in the range from k1 and k2.
7. B-Tree grows and shrinks from the root which is unlike Binary Search
Tree. Binary Search Trees grow downward and also shrink from
downward.
8. Like other balanced Binary Search Trees, time complexity to search,
insert and delete is O(log n).
9. Insertion of a Node in B-Tree happens only at Leaf Node.
Following is an example of B-Tree of minimum order 5. Note that in practical
B-Trees, the value of the minimum order is much more than 5.
We can see in the above diagram that all the leaf nodes are at the same
level and all non-leaf have no empty sub-tree and have keys one less than
the number of their children.
Interesting Facts:
1. The minimum height of the B-Tree that can exist with n number of nodes
and m is the maximum number of children of a node can have
is:
2. The maximum height of the B-Tree that can exist with n number of nodes
and t is the minimum number of children that a non-root node can have

is: and
Traversal in B-Tree:
Traversal is also similar to Inorder traversal of Binary Tree. We start from the
leftmost child, recursively print the leftmost child, then repeat the same
process for remaining children and keys. In the end, recursively print the
rightmost child.
Search Operation in B-Tree:
Search is similar to the search in Binary Search Tree. Let the key to be
searched be k. We start from the root and recursively traverse down. For
every visited non-leaf node, if the node has the key, we simply return the
node. Otherwise, we recur down to the appropriate child (The child which is
just before the first greater key) of the node. If we reach a leaf node and
don’t find k in the leaf node, we return NULL.
Logic:
Searching a B-Tree is similar to searching a binary tree. The algorithm is
similar and goes with recursion. At each level, the search is optimized as if
the key value is not present in the range of parent then the key is present in
another branch. As these values limit the search they are also known as
limiting value or separation value. If we reach a leaf node and don’t find the
desired key then it will display NULL.
Algorithm for Searching an Element:-
BtreeSearch(x, k)
i = 1

while i ≤ n[x] and k ≥ keyi[x] // n[x] means number of


keys in x node
do i = i + 1

if i n[x] and k = keyi[x]


then return (x, i)

if leaf [x]
then return NIL

else
return BtreeSearch(ci[x], k)
Example: Searching 120 in the given B-Tree.

Solution:
In this example, we can see that our search was reduced by just limiting the
chances where the key containing the value could be present. Similarly if
within the above example we’ve to look for 180, then the control will stop at
step 2 because the program will find that the key 180 is present within the
current node. And similarly, if it’s to seek out 90 then as 90 < 100 so it’ll go to
the left subtree automatically and therefore the control flow will go similarly
as shown within the above example.

 C++
 Java
 C#
 Javascript

// C++ implementation of search() and traverse() methods

#include<iostream>

using namespace std;

// A BTree node

class BTreeNode

int *keys; // An array of keys

int t; // Minimum degree (defines the range for number of keys)


BTreeNode **C; // An array of child pointers

int n; // Current number of keys

bool leaf; // Is true when node is leaf. Otherwise false

public:

BTreeNode(int _t, bool _leaf); // Constructor

// A function to traverse all nodes in a subtree rooted with this


node

void traverse();

// A function to search a key in the subtree rooted with this


node.

BTreeNode *search(int k); // returns NULL if k is not present.

// Make the BTree friend of this so that we can access private members
of this

// class in BTree functions

friend class BTree;

};

// A BTree

class BTree

{
BTreeNode *root; // Pointer to root node

int t; // Minimum degree

public:

// Constructor (Initializes tree as empty)

BTree(int _t)

{ root = NULL; t = _t; }

// function to traverse the tree

void traverse()

{ if (root != NULL) root->traverse(); }

// function to search a key in this tree

BTreeNode* search(int k)

{ return (root == NULL)? NULL : root->search(k); }

};

// Constructor for BTreeNode class

BTreeNode::BTreeNode(int _t, bool _leaf)

// Copy the given minimum degree and leaf property

t = _t;

leaf = _leaf;
// Allocate memory for maximum number of possible keys

// and child pointers

keys = new int[2*t-1];

C = new BTreeNode *[2*t];

// Initialize the number of keys as 0

n = 0;

// Function to traverse all nodes in a subtree rooted with this node

void BTreeNode::traverse()

// There are n keys and n+1 children, traverse through n keys

// and first n children

int i;

for (i = 0; i < n; i++)

// If this is not leaf, then before printing key[i],

// traverse the subtree rooted with child C[i].

if (leaf == false)

C[i]->traverse();
cout << " " << keys[i];

// Print the subtree rooted with last child

if (leaf == false)

C[i]->traverse();

// Function to search key k in subtree rooted with this node

BTreeNode *BTreeNode::search(int k)

// Find the first key greater than or equal to k

int i = 0;

while (i < n && k > keys[i])

i++;

// If the found key is equal to k, return this node

if (keys[i] == k)

return this;

// If the key is not found here and this is a leaf node

if (leaf == true)
return NULL;

// Go to the appropriate child

return C[i]->search(k);

The above code doesn’t contain the driver program. We will be covering the
complete program in our next post on B-Tree Insertion.
There are two conventions to define a B-Tree, one is to define by minimum
degree (followed in Cormen book), second is define by order. We have
followed the minimum degree convention and will be following same in
coming posts on B-Tree. The variable names used in the above program are
also kept same as Cormen book for better readability.
Applications of B-Trees:
 It is used in large databases to access data stored in the disk
 Searching of data in a data set can be achieved in significantly less time
using B tree
 With the indexing feature multilevel indexing can be achieved.
 Most of the servers also use B-tree approach.
Insertion and Deletion
B-Tree Insertion
B-Tree Deletion
References:
Introduction to Algorithms 3rd Edition by Clifford Stein, Thomas H. Cormen,
Charles E. Leiserson, Ronald L. Rivest
Please write comments if you find anything incorrect, or you want to share
more information about the topic discussed above.

Indexed sequential access method (ISAM)


ISAM method is an advanced sequential file organization. In this method, records are
stored in the file using the primary key. An index value is generated for each primary
key and mapped with the record. This index contains the address of the record in the
file.
If any record has to be retrieved based on its index value, then the address of the
data block is fetched and the record is retrieved from the memory.

Pros of ISAM:
o In this method, each record has the address of its data block, searching a record in a
huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index
is based on the primary key values, we can retrieve the data for the given range of
value. In the same way, the partial value can also be easily searched, i.e., the student
name starting with 'JA' can be easily searched.

Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed to
maintain the sequence.
o When the record is deleted, then the space used by it needs to be released.
Otherwise, the performance of the database will slow down.
File Access Methods in Operating
System
When a file is used, information is read and accessed into computer memory and there
are several ways to access this information of the file. Some systems provide only one
access method for files. Other systems, such as those of IBM, support many access
methods, and choosing the right one for a particular application is a major design
problem.
There are three ways to access a file into a computer system: Sequential-Access,
Direct Access, Index sequential Method.
1. Sequential Access –
It is the simplest access method. Information in the file is processed in order, one
record after the other. This mode of access is by far the most common; for
example, editor and compiler usually access the file in this fashion.
Read and write make up the bulk of the operation on a file. A read operation -read
next- read the next position of the file and automatically advance a file pointer,
which keeps track I/O location. Similarly, for the -write next- append to the end of
the file and advance to the newly written material.
Key points:
 Data is accessed one record right after another record in an order.
 When we use read command, it move ahead pointer by one
 When we use write command, it will allocate memory and move the pointer to
the end of the file
 Such a method is reasonable for tape.

2. Direct Access –
Another method is direct access method also known as relative access method. A
filed-length logical record that allows the program to read and write record rapidly.
in no particular order. The direct access is based on the disk model of a file since
disk allows random access to any file block. For direct access, the file is viewed as
a numbered sequence of block or record. Thus, we may read block 14 then block
59, and then we can write block 17. There is no restriction on the order of reading
and writing for a direct access file.
A block number provided by the user to the operating system is normally a relative
block number, the first relative block of the file is 0 and then 1 and so on.

3. Index sequential method –


It is the other method of accessing a file that is built on the top of the sequential
access method. These methods construct an index for the file. The index, like an
index in the back of a book, contains the pointer to the various blocks. To find a
record in the file, we first search the index, and then by the help of pointer we
access the file directly.
Key points:
 It is built on top of Sequential access.
 It control the pointer by using index.

File Allocation Methods


The allocation methods define how the files are stored in the disk blocks. There are
three main disk space or file allocation methods.
 Contiguous Allocation
 Linked Allocation
 Indexed Allocation
The main idea behind these methods is to provide:
 Efficient disk space utilization.
 Fast access to the file blocks.
All the three methods have their own advantages and disadvantages as discussed
below:
1. Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk. For example,
if a file requires n blocks and is given a block b as the starting location, then the
blocks assigned to the file will be: b, b+1, b+2,……b+n-1. This means that given the
starting block address and the length of the file (in terms of blocks required), we can
determine the blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
 Address of starting block
 Length of the allocated portion.
The file ‘mail’ in the following figure starts from the block 19 with length = 6 blocks.
Therefore, it occupies 19, 20, 21, 22, 23, 24 blocks.
Advantages:
 Both the Sequential and Direct Accesses are supported by this. For direct access,
the address of the kth block of the file which starts at block b can easily be
obtained as (b+k).
 This is extremely fast since the number of seeks are minimal because of
contiguous allocation of file blocks.
Disadvantages:
 This method suffers from both internal and external fragmentation. This makes it
inefficient in terms of memory utilization.
 Increasing file size is difficult because it depends on the availability of contiguous
memory at a particular instance.
2. Linked List Allocation
In this scheme, each file is a linked list of disk blocks which need not be contiguous.
The disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block. Each
block contains a pointer to the next block occupied by the file.
The file ‘jeep’ in following image shows how the blocks are randomly distributed. The
last block (25) contains -1 indicating a null pointer and does not point to any other
block.

Advantages:
 This is very flexible in terms of file size. File size can be increased easily since the
system does not have to look for a contiguous chunk of memory.
 This method does not suffer from external fragmentation. This makes it relatively
better in terms of memory utilization.
Disadvantages:
 Because the file blocks are distributed randomly on the disk, a large number of
seeks are needed to access every block individually. This makes linked allocation
slower.
 It does not support random or direct access. We can not directly access the blocks
of a file. A block k of a file can be accessed by traversing k blocks sequentially
(sequential access ) from the starting block of the file via block pointers.
 Pointers required in the linked allocation incur some extra overhead.
3. Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers to all
the blocks occupied by a file. Each file has its own index block. The ith entry in the
index block contains the disk address of the ith file block. The directory entry contains
the address of the index block as shown in the image:
Advantages:
 This supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
 It overcomes the problem of external fragmentation.
Disadvantages:
 The pointer overhead for indexed allocation is greater than linked allocation.
 For very small files, say files that expand only 2-3 blocks, the indexed allocation
would keep one entire block (index block) for the pointers which is inefficient in
terms of memory utilization. However, in linked allocation we lose the space of
only 1 pointer per block.
For files that are very large, single index block may not be able to hold all the
pointers.
Following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index blocks together for holding
the pointers. Every index block would then contain a pointer or the address to the
next index block.
2. Multilevel index: In this policy, a first level index block is used to point to the
second level index blocks which inturn points to the disk blocks occupied by the
file. This can be extended to 3 or more levels depending on the maximum file size.
3. Combined Scheme: In this scheme, a special block called the Inode (information
Node) contains all the information about the file such as the name, size, authority,
etc and the remaining space of Inode is used to store the Disk Block addresses
which contain the actual file as shown in the image below. The first few of these
pointers in Inode point to the direct blocks i.e the pointers contain the addresses of
the disk blocks that contain data of the file. The next few pointers point to indirect
blocks. Indirect blocks may be single indirect, double indirect or triple
indirect. Single Indirect block is the disk block that does not contain the file data
but the disk address of the blocks that contain the file data. Similarly, double
indirect blocks do not contain the file data but the disk address of the blocks that
contain the address of the blocks containing the file data.

This article is contributed by Saloni Baweja. If you like GeeksforGeeks and would
like to contribute, you can also write an article using contribute.geeksforgeeks.org or
mail your article to contribute@geeksforgeeks.org. See your article appearing on the
GeeksforGeeks main page and help other Geeks.
Please write comments if you find anything incorrect, or you want to share more
information about the topic discussed above.

DBMS Architecture

o The DBMS design depends upon its architecture. The basic client/server architecture
is used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get
their request done.

Types of DBMS Architecture


Database architecture can be seen as a single tier or multi-tier. But logically, database
architecture is of two types like: 2-tier architecture and 3-tier architecture.

1-Tier Architecture

o In this architecture, the database is directly available to the user. It means the user
can directly sit on the DBMS and uses it.
o Any changes done here will directly be done on the database itself. It doesn't provide
a handy tool for end users.
o The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick response.

2-Tier Architecture

o The 2-Tier architecture is same as basic client-server. In the two-tier architecture,


applications on the client end can directly communicate with the database at the
server side. For this interaction, API's like: ODBC, JDBC are used.
o The user interfaces and application programs are run on the client-side.
o The server side is responsible to provide the functionalities like: query processing and
transaction management.
o To communicate with the DBMS, client-side application establishes a connection with
the server side.
Fig: 2-tier Architecture

3-Tier Architecture

o The 3-Tier architecture contains another layer between the client and server. In this
architecture, client can't directly communicate with the server.
o The application on the client-end interacts with an application server which further
communicates with the database system.
o End user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the application.
o The 3-Tier architecture is used in case of large web application.
Fig: 3-tier Architecture

Three schema Architecture


o The three schema architecture is also called ANSI/SPARC architecture or three-level
architecture.
o This framework is used to describe the structure of a specific database system.
o The three schema architecture is also used to separate the user applications and
physical database.
o The three schema architecture contains three-levels. It breaks the database down into
three different categories.

The three-schema architecture is as follows:


In the above diagram:

o It shows the DBMS architecture.


o Mapping is used to transform the request and response between various database
levels of architecture.
o Mapping is not good for small DBMS because it takes more time.
o In External / Conceptual mapping, it is necessary to transform the request from
external level to conceptual schema.
o In Conceptual / Internal mapping, DBMS transform the request from the conceptual
to internal level.

Objectives of Three schema Architecture


The main objective of three level architecture is to enable multiple users to access
the same data with a personalized view while storing the underlying data only once.
Thus it separates the user's view from the physical structure of the database. This
separation is desirable for the following reasons:

o Different users need different views of the same data.


o The approach in which a particular user needs to see the data may change over time.
o The users of the database should not worry about the physical implementation and
internal workings of the database such as data compression and encryption
techniques, hashing, optimization of the internal structures etc.
o All users should be able to access the same data according to their requirements.
o DBA should be able to change the conceptual structure of the database without
affecting the user's
o Internal structure of the database should be unaffected by changes to physical
aspects of the storage.

1. Internal Level

o The internal level has an internal schema which describes the physical storage
structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It is used to define that how the data will be stored in
a block.
o The physical level is used to describe complex low-level data structures in detail.

The internal level is generally is concerned with the following activities:

o Storage space allocations.


For Example: B-Trees, Hashing etc.
o Access paths.
For Example: Specification of primary and secondary keys, indexes, pointers and
sequencing.
o Data compression and encryption techniques.
o Optimization of internal structures.
o Representation of stored fields.
2. Conceptual Level

o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data
structure are hidden.
o Programmers and database administrators work at this level.

3. External Level

o At the external level, a database contains several schemas that sometimes called as
subschema. The subschema is used to describe the different view of the database.
o An external schema is also known as view schema.
o Each view schema describes the database part that a particular user group is
interested and hides the remaining database from that user group.
o The view schema describes the end user interaction with database systems.

Mapping between Views


The three levels of DBMS architecture don't exist independently of each other. There
must be correspondence between the three levels i.e. how they actually correspond
with each other. DBMS is responsible for correspondence between the three types of
schema. This correspondence is called Mapping.
There are basically two types of mapping in the database architecture:

o Conceptual/ Internal Mapping


o External / Conceptual Mapping

Conceptual/ Internal Mapping

The Conceptual/ Internal Mapping lies between the conceptual level and the internal
level. Its role is to define the correspondence between the records and fields of the
conceptual level and files and data structures of the internal level.

External/ Conceptual Mapping

The external/Conceptual Mapping lies between the external level and the Conceptual
level. Its role is to define the correspondence between a particular external and the
conceptual view.

Components of DBMS
DBMS stands for DataBase Management System. DBMS is a type of software by
which we can save and retrieve the user's data with the security process. DBMS can
manipulate the database with the help of a group of programs. The DBMS can accept
the request from the operating system to supply the data. The DBMS also can accept
the request to retrieve a large amount of data through the user and third-party
software.

DBMS also give permission to the user to use the data according to their needs. The
word "DBMS" contains information regarding the database program and the users. It
also provides an interface between the user and the software. In this topic, we are
going to discuss the various types of DBMS.

Components of DBMS
There are many components available in the DBMS. Each component has a
significant task in the DBMS. A database environment is a collection of components
that regulates the use of data, management, and a group of data. These components
consist of people, the technique of Handel the database, data, hardware, software,
etc. there are several components available for the DBMS. We are going to explain
five main topics of the database below.
1. Hardware

o Here the hardware means the physical part of the DBMS. Here the hardware includes
output devices like a printer, monitor, etc., and storage devices like a hard disk.
o In DBMS, information hardware is the most important visible part. The equipment
which is used for the visibility of the data is the printer, computer, scanner, etc. This
equipment is used to capture the data and present the output to the user.
o With the help of hardware, the DBMS can access and update the database.
o The server can store a large amount of data, which can be shared with the help of the
user's own system.
o The database can be run in any system that ranges from microcomputers to
mainframe computers. And this database also provides an interface between the real
worlds to the database.
o When we try to run any database software like MySQL, we can type any commands
with the help of our keyboards, and RAM, ROM, and processor are part of our
computer system.

2. Software

o Software is the main component of the DBMS.


o Software is defined as the collection of programs that are used to instruct the
computer about its work. The software consists of a set of procedures, programs, and
routines associated with the computer system's operation and performance. Also, we
can say that computer software is a set of instructions that is used to instruct the
computer hardware for the operation of the computers.
o The software includes so many software like network software and operating
software. The database software is used to access the database, and the database
application performs the task.
o This software has the ability to understand the database accessing language and then
convert these languages to real database commands and then execute the database.
o This is the main component as the total database operation works on a software or
application. We can also be called as database software the wrapper of the whole
physical database, which provides an easy interface for the user to store, update and
delete the data from the database.
o Some examples of DBMS software include MySQL, Oracle, SQL Server, dBase,
FileMaker, Clipper, Foxpro, Microsoft Access, etc.

3. Data

o The term data means the collection of any raw fact stored in the database. Here the
data are any type of raw material from which meaningful information is generated.
o The database can store any form of data, such as structural data, non-structural data,
and logical data.
o The structured data are highly specific in the database and have a structured format.
But in the case of non-structural data, it is a collection of different types of data, and
these data are stored in their native format.
o We also call the database the structure of the DBMS. With the help of the database,
we can create and construct the DBMS. After the creation of the database, we can
create, access, and update that database.
o The main reason behind discovering the database is to create and manage the data
within the database.
o Data is the most important part of the DBMS. Here the database contains the actual
data and metadata. Here metadata means data about data.
o For example, when the user stores the data in a database, some data, such as the size
of the data, the name of the data, and some data related to the user, are stored
within the database. These data are called metadata.

4. Procedures

o The procedure is a type of general instruction or guidelines for the use of DBMS. This
instruction includes how to set up the database, how to install the database, how to
log in and log out of the database, how to manage the database, how to take a
backup of the database, and how to generate the report of the database.
o In DBMS, with the help of procedure, we can validate the data, control the access and
reduce the traffic between the server and the clients. The DBMS can offer better
performance to extensive or complex business logic when the user follows all the
procedures correctly.
o The main purpose of the procedure is to guide the user during the management and
operation of the database.
o The procedure of the databases is so similar to the function of the database. The
major difference between the database procedure and database function is that the
database function acts the same as the SQL statement. In contrast, the database
procedure is invoked using the CALL statement of the DBMS.
o Database procedures can be created in two ways in enterprise architecture. These
two ways are as below.
o The individual object or the default object.
o The operations in a container.

1. CREATE [OR REPLACE] PROCEDURE procedure_name (<Argument> {IN, OUT, IN OUT}


2. <Datatype>,...)
3. IS
4. Declaration section<variable, constant> ;
5. BEGIN
6. Execution section
7. EXCEPTION
8. Exception section
9. END

5. Database Access Language

o Database Access Language is a simple language that allows users to write commands
to perform the desired operations on the data that is stored in the database.
o Database Access Language is a language used to write commands to access, upsert,
and delete data stored in a database.
o Users can write commands or query the database using Database Access Language
before submitting them to the database for execution.
o Through utilizing the language, users can create new databases and tables, insert
data and delete data.
o Examples of database languages are SQL (structured query language), My Access,
Oracle, etc. A database language is comprised of two languages.

Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.

There are two types of data independence:

1. Logical Data Independence


o Logical data independence refers characteristic of being able to change the
conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual
view.
o If we do any changes in the conceptual view of the data, then the user view of the
data would not be affected.
o Logical data independence occurs at the user interface level.

2. Physical Data Independence


o Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
o If we do any changes in the storage size of the database system server, then the
Conceptual structure of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal
levels.
o Physical data independence occurs at the logical interface level.

Fig: Data Independence

ER (Entity Relationship) Diagram in DBMS


o ER model stands for an Entity-Relationship model. It is a high-level data model. This
model is used to define the data elements and relationship for a specified system.
o It develops a conceptual design for the database. It also develops a very simple and
easy to design view of data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.

For example, Suppose we design a school database. In this database, the student
will be an entity with attributes like address, name, id, age, etc. The address can be
another entity with attributes like city, street name, pin code, etc and there will be a
relationship between them.

Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity can
be represented as rectangles.

Consider an organization as an example- manager, product, employee, department


etc. can be taken as an entity.

a. Weak Entity
An entity that depends on another entity called a weak entity. The weak entity
doesn't contain any key attribute of its own. The weak entity is represented by a
double rectangle.

2. Attribute
The attribute is used to describe the property of an entity. Eclipse is used to
represent an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It


represents a primary key. The key attribute is represented by an ellipse with the text
underlined.
b. Composite Attribute

An attribute that composed of many other attributes is known as a composite


attribute. The composite attribute is represented by an ellipse, and those ellipses are
connected with an ellipse.

c. Multivalued Attribute

An attribute can have more than one value. These attributes are known as a
multivalued attribute. The double oval is used to represent multivalued attribute.

For example, a student can have more than one phone number.
d. Derived Attribute

An attribute that can be derived from other attribute is known as a derived attribute.
It can be represented by a dashed ellipse.

For example, A person's age changes over time and can be derived from another
attribute like Date of birth.

3. Relationship
A relationship is used to describe the relation between entities. Diamond or rhombus
is used to represent the relationship.
Types of relationship are as follows:

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, then it is
known as one to one relationship.

For example, A female can marry to one male, and a male can marry to one female.

b. One-to-many relationship

When only one instance of the entity on the left, and more than one instance of an
entity on the right associates with the relationship then this is known as a one-to-
many relationship.

For example, Scientist can invent many inventions, but the invention is done by the
only specific scientist.

c. Many-to-one relationship
When more than one instance of the entity on the left, and only one instance of an
entity on the right associates with the relationship then it is known as a many-to-one
relationship.

For example, Student enrolls for only one course, but a course can have many
students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance
of an entity on the right associates with the relationship then it is known as a many-
to-many relationship.

For example, Employee can assign by many projects and project can have many
employees.

Cardinality in DBMS (Mapping Constraints)


DBMS
DBMS stands for Database Management System, which is a tool, or a software used
to do various operations on a Database like the Creation of the Database, Deletion of
the Database, or Updating the current Database. To simplify processing and data
querying, the most popular types of Databases currently in use typically model their
data as rows and columns in a set of tables. The data may then be handled, updated,
regulated, and structured with ease. For writing and querying data, most Databases
employ Structured Query Language (SQL).

Cardinality
Cardinality means how the entities are arranged to each other or what is the
relationship structure between entities in a relationship set. In a Database
Management System, Cardinality represents a number that denotes how many times
an entity is participating with another entity in a relationship set. The Cardinality of
DBMS is a very important attribute in representing the structure of a Database. In a
table, the number of rows or tuples represents the Cardinality.

Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the mapping
of one entity set to another entity set in a relationship set. We generally take the
example of a binary relationship set where two entities are mapped to each other.

Cardinality is very important in the Database of various businesses. For example, if we


want to track the purchase history of each customer then we can use the one-to-
many cardinality to find the data of a specific customer. The Cardinality model can be
used in Databases by Database Managers for a variety of purposes, but corporations
often use it to evaluate customer or inventory data.

There are four types of Cardinality Mapping in Database Management Systems:

1. One to one
2. Many to one
3. One to many
4. Many to many

One to One
One to one cardinality is represented by a 1:1 symbol. In this, there is at most one
relationship from one entity to another entity. There are a lot of examples of one-to-
one cardinality in real life databases.

For example, one student can have only one student id, and one student id can
belong to only one student. So, the relationship mapping between student and
student id will be one to one cardinality mapping.
Another example is the relationship between the director of the school and the
school because one school can have a maximum of one director, and one director
can belong to only one school.

Note: it is not necessary that there would be a mapping for all entities in an entity set in
one-to-one cardinality. Some entities cannot participate in the mapping.

Many to One Cardinality:


In many to one cardinality mapping, from set 1, there can be multiple sets that can
make relationships with a single entity of set 2. Or we can also describe it as from set
2, and one entity can make a relationship with more than one entity of set 1.

One to one Cardinality is the subset of Many to one Cardinality. It can be represented
by M:1.

For example, there are multiple patients in a hospital who are served by a single
doctor, so the relationship between patients and doctors can be represented by
Many to one Cardinality.
One to Many Cardinalities:
In One-to-many cardinality mapping, from set 1, there can be a maximum single set
that can make relationships with a single or more than one entity of set 2. Or we can
also describe it as from set 2, more than one entity can make a relationship with only
one entity of set 1.

One to one cardinality is the subset of One-to-many Cardinality. It can be


represented by 1: M.

For Example, in a hospital, there can be various compounders, so the relationship


between the hospital and compounders can be mapped through One-to-many
Cardinality.
Many to Many Cardinalities:
In many, many cardinalities mapping, there can be one or more than one entity that
can associate with one or more than one entity of set 2. In the same way from the
end of set 2, one or more than one entity can make a relation with one or more than
one entity of set 1.

It is represented by M: N or N: M.

One to one cardinality, One to many cardinalities, and Many to one cardinality is the
subset of the many to many cardinalities.

For Example, in a college, multiple students can work on a single project, and a
single student can also work on multiple projects. So, the relationship between the
project and the student can be represented by many to many cardinalities.
Appropriate Mapping Cardinality
Evidently, the real-world context in which the relation set is modeled determines the
Appropriate Mapping Cardinality for a specific relation set.

o We can combine relational tables with many involved tables if the Cardinality is one-
to-many or many-to-one.
o One entity can be combined with a relation table if it has a one-to-one relationship
and total participation, and two entities can be combined with their relation to form a
single table if both of them have total participation.
o We cannot mix any two tables if the Cardinality is many-to-many.

You might also like