1.6.1 Storage Manager
The storage manager is the component of a database
system that provides the interface between the low-level
data stored in the database and the application programs
and queries submitted to the system. The storage manager
is responsible for the interaction with the file manager. The
raw data are stored on the disk using the file system
provided by the operating system. The storage manager
translates the various DML statements into low-level file
system commands. Thus, the storage manager is
responsible for storing, retrieving, and updating data in the
database.
The storage manager components include:
Authorization and integrity manager, which tests
for the satisfaction of integrity constraints and checks
the authority of users to access data.
Transaction manager, which ensures that the
database remains in a consistent (correct) state despite
system failures, and that concurrent transaction
executions proceed without conflicts.
File manager, which manages the allocation of space
on disk storage and the data structures used to
represent information stored on disk.
Buffer manager, which is responsible for fetching data
from disk storage into main memory, and deciding what
data to cache in main memory. The buffer manager is a
critical part of the database system, since it enables the
database to handle data sizes that are much larger than
the size of main memory.
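The caching role of the buffer manager can be illustrated with a minimal sketch. The text does not say which replacement policy is used, so the least-recently-used (LRU) policy below is an assumption, as are the class name `BufferManager`, the `fetch` method, and the dictionary standing in for disk storage.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool: caches disk blocks in memory, evicting with LRU (assumed policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # stand-in for disk storage: block id -> contents
        self.capacity = capacity    # number of blocks that fit in "main memory"
        self.pool = OrderedDict()   # cached blocks, least recently used first
        self.disk_reads = 0         # counts simulated disk accesses

    def fetch(self, block_id):
        if block_id in self.pool:
            # Cache hit: mark the block as most recently used.
            self.pool.move_to_end(block_id)
            return self.pool[block_id]
        if len(self.pool) >= self.capacity:
            # Pool is full: evict the least recently used block.
            self.pool.popitem(last=False)
        # Cache miss: read the block from "disk" and keep it in the pool.
        self.disk_reads += 1
        self.pool[block_id] = self.disk[block_id]
        return self.pool[block_id]


# 100 blocks on "disk", but room for only 3 in "memory".
disk = {i: f"block-{i}" for i in range(100)}
bm = BufferManager(disk, capacity=3)
for b in [0, 1, 2, 0, 3, 0, 1]:
    bm.fetch(b)
print(bm.disk_reads, list(bm.pool))
```

Every block remains reachable even though only three are in memory at once; repeated requests for a cached block avoid a disk read.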
The storage manager implements several data structures
as part of the physical system implementation:
Data files, which store the database itself.
Data dictionary, which stores metadata about the
structure of the database, in particular the schema of
the database.
Indices, which can provide fast access to data items.
Like the index in this textbook, a database index
provides pointers to those data items that hold a
particular value. For example, we could use an index to
find the instructor record with a particular ID, or all
instructor records with a particular name.
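As a rough illustration of the two lookups just mentioned, the sketch below builds in-memory indices over a small list of instructor records. The sample rows and the helper names `find_by_id` and `find_by_name` are illustrative assumptions; real database indices are disk-based structures rather than Python dictionaries.

```python
from collections import defaultdict

# Illustrative instructor records: (ID, name, dept_name).
instructors = [
    ("10101", "Srinivasan", "Comp. Sci."),
    ("12121", "Wu", "Finance"),
    ("15151", "Mozart", "Music"),
    ("98345", "Kim", "Elec. Eng."),
    ("99999", "Wu", "Physics"),
]

# Each index maps a value to the position(s) of the records holding it.
id_index = {}                    # ID is unique: one position per value
name_index = defaultdict(list)   # names may repeat: a list of positions
for pos, (id_, name, _dept) in enumerate(instructors):
    id_index[id_] = pos
    name_index[name].append(pos)


def find_by_id(id_):
    """Return the record with the given ID, or None if there is none."""
    pos = id_index.get(id_)
    return None if pos is None else instructors[pos]


def find_by_name(name):
    """Return all records with the given name (possibly an empty list)."""
    return [instructors[p] for p in name_index.get(name, [])]


print(find_by_id("12121"))
print(find_by_name("Wu"))
```

Both lookups follow pointers from the index directly to the matching records instead of scanning every record.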
We discuss storage media, file structures, and
buffer management in
Chapter 12 and
Chapter 13.
Methods of accessing data efficiently are discussed in
Chapter 14.
1.6.2 The Query Processor
The query processor components include:
DDL interpreter, which interprets DDL statements and
records the definitions in the data dictionary.
DML compiler, which translates DML statements in a
query language into an evaluation plan consisting of
low-level instructions that the query-evaluation engine
understands.
A query can usually be translated into any of a number
of alternative evaluation plans that all give the same
result. The DML compiler also performs query
optimization; that is, it picks the lowest cost
evaluation plan from among the alternatives.
Query evaluation engine, which executes low-level
instructions generated by the DML compiler.
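The idea of picking the lowest-cost plan among equivalent alternatives can be sketched as follows. The cost model here (number of block reads for a full file scan versus an index lookup) is a deliberately simplified assumption for illustration, and the names `candidate_plans`, `choose_plan`, and `index_height` are hypothetical; real optimizers consider many more plans and statistics.

```python
def index_height(n_records, fanout):
    """Levels needed for a tree index with the given fanout to reach n_records entries."""
    height, reach = 1, fanout
    while reach < n_records:
        reach *= fanout
        height += 1
    return height


def candidate_plans(n_records, records_per_block, has_index, fanout=100):
    """List (description, estimated block reads) for finding one record by ID."""
    n_blocks = (n_records + records_per_block - 1) // records_per_block
    plans = [("full file scan", n_blocks)]
    if has_index:
        # Assumed cost: one read per index level plus one for the data block.
        plans.append(("index lookup", index_height(n_records, fanout) + 1))
    return plans


def choose_plan(plans):
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan[1])


plans = candidate_plans(n_records=1_000_000, records_per_block=50, has_index=True)
best = choose_plan(plans)
print(plans, best)
```

Both plans return the same record; only their estimated costs differ, which is what the optimizer compares.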
Query evaluation is covered in
Chapter 15, while the
methods by which the query optimizer chooses from among
the possible evaluation strategies are discussed in
Chapter 16.
1.6.3 Transaction Management
Often, several operations on the database form a single
logical unit of work. An example is a funds transfer, as in
Section 1.2, in which one account A is debited and another
account B is credited. Clearly, it is essential that either both
the credit and debit occur, or that neither occur. That is, the
funds transfer must happen in its entirety or not at all. This
all-or-none requirement is called atomicity. In addition, it is
essential that the execution of the funds transfer preserves
the consistency of the database. That is, the value of the
sum of the balances of A and B must be preserved. This
correctness requirement is called consistency. Finally, after
the successful execution of a funds transfer, the new values
of the balances of accounts A and B must persist, despite
the possibility of system failure. This persistence
requirement is called durability.
A transaction is a collection of operations that performs a
single logical function in a database application. Each
transaction is a unit of both atomicity and consistency. Thus,
we require that transactions do not violate any database
consistency constraints. That is, if the database was
consistent when a transaction started, the database must
be consistent when the transaction successfully terminates.
However, during the execution of a transaction, it may be
necessary temporarily to allow inconsistency, since
either the debit of A or the credit of B must be done
before the other. This temporary inconsistency, although
necessary, may lead to difficulty if a failure occurs.
It is the programmer's responsibility to properly define the
various transactions so that each preserves the consistency
of the database. For example, the transaction to transfer
funds from account A to account B could be defined to be
composed of two separate programs: one that debits
account A and another that credits account B. The execution
of these two programs one after the other will indeed
preserve consistency. However, each program by itself does
not transform the database from a consistent state to a new
consistent state. Thus, those programs are not transactions.
Ensuring the atomicity and durability properties is the
responsibility of the database system itself—specifically, of
the recovery manager. In the absence of failures, all
transactions complete successfully, and atomicity is
achieved easily. However, because of various types of
failure, a transaction may not always complete its execution
successfully. If we are to ensure the atomicity property, a
failed transaction must have no effect on the state of the
database. Thus, the database must be restored to the state
in which it was before the transaction in question started
executing. The database system must therefore perform
failure recovery, that is, it must detect system failures and
restore the database to the state that existed prior to the
occurrence of the failure.
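The all-or-none behavior of a funds transfer can be demonstrated with a minimal sketch using Python's standard `sqlite3` module. The `account` table, the `CHECK` constraint that forbids negative balances, and the helper names `transfer` and `balances` are assumptions made for illustration. The credit is applied before the debit so that a failing debit leaves a partially applied transfer to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict, e.g. {'A': ..., 'B': ...}."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 30)        # succeeds
failed = transfer(conn, "A", "B", 500)   # debit would make A negative
print(ok, failed, balances(conn))
```

After the failed transfer, neither balance has changed and the sum of the two balances is the same as before, so the failed transaction has had no effect on the database state.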
Finally, when several transactions update the database
concurrently, the consistency of data may no longer be
preserved, even though each individual transaction is
correct. It is the responsibility of the concurrency-control
manager to control the interaction among the concurrent
transactions, to ensure the consistency of the database. The
transaction manager consists of the concurrency-control
manager and the recovery manager.
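To see how individually correct transactions can still produce an incorrect result when they run concurrently, consider the following deterministic simulation. Two transactions each add 50 to the same balance. The `run` function and the schedule encoding are hypothetical constructs for illustration; the anomaly shown is commonly known as a lost update.

```python
def run(schedule, balance):
    """Execute a schedule of (transaction, action) steps on one shared balance."""
    local = {}  # each transaction's private copy of the balance
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance      # copy the shared value
        elif action == "add50":
            local[txn] += 50          # update the private copy only
        elif action == "write":
            balance = local[txn]      # overwrite the shared value
    return balance


# T1 runs to completion before T2 starts.
serial = [
    ("T1", "read"), ("T1", "add50"), ("T1", "write"),
    ("T2", "read"), ("T2", "add50"), ("T2", "write"),
]

# Both transactions read before either one writes.
interleaved = [
    ("T1", "read"), ("T2", "read"),
    ("T1", "add50"), ("T2", "add50"),
    ("T1", "write"), ("T2", "write"),
]

print(run(serial, 100), run(interleaved, 100))
```

In the interleaved schedule, T2 overwrites T1's update because it read the balance before T1 wrote it. Preventing such interleavings, or their effects, is the job of concurrency control.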
The basic concepts of transaction processing are covered
in
Chapter 17. The management of concurrent transactions
is covered in
Chapter 18.
Chapter 19 covers failure recovery
in detail.
The concept of a transaction has been applied broadly in
database systems and applications. While the initial use of
transactions was in financial applications, the concept is
now used in real-time applications in telecommunication, as
well as in the management of long-duration activities such
as product design or administrative workflows.
1.7 Database and Application Architecture
We are now in a position to provide a single picture of the
various components of a database system and the
connections among them.
Figure 1.3 shows the architecture
of a database system that runs on a centralized server
machine. The figure summarizes how different types of
users interact with a database, and how the different
components of a database engine are connected to each
other.
Figure 1.3 System structure.
The centralized architecture shown in
Figure 1.3 is
applicable to shared-memory server architectures, which
have multiple CPUs and exploit parallel processing, but all
the CPUs access a common shared memory. To scale
up to even larger data volumes and even higher processing
speeds, parallel databases are designed to run on a cluster
consisting of multiple machines. Further, distributed
databases allow data storage and query processing across
multiple geographically separated machines.
In
Chapter 20, we cover the general structure of
modern computer systems, with a focus on parallel
system architectures.
Chapter 21 and
Chapter 22 describe
how query processing can be implemented to exploit
parallel and distributed processing.
Chapter 23 presents a
number of issues that arise in processing transactions in a
parallel or a distributed database and describes how to deal
with each issue. The issues include how to store data, how
to ensure atomicity of transactions that execute at multiple
sites, how to perform concurrency control, and how to
provide high availability in the presence of failures.
We now consider the architecture of applications that use
databases as their backend. Database applications can be
partitioned into two or three parts, as shown in
Figure 1.4.
Earlier-generation database applications used a two-tier
architecture, where the application resides at the client
machine, and invokes database system functionality at the
server machine through query language statements.
Figure 1.4 Two-tier and three-tier architectures.
In contrast, modern database applications use a three-tier
architecture, where the client machine acts as merely
a front end and does not contain any direct database calls;
web browsers and mobile applications are the most
commonly used application clients today. The front end
communicates with an application server. The application
server, in turn, communicates with a database system to
access data. The business logic of the application, which
says what actions to carry out under what conditions, is
embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications
provide better security as well as better performance than
two-tier applications.
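The division of responsibilities in a three-tier application can be sketched in a few lines. Everything here is an illustrative assumption: an in-memory SQLite database stands in for the database system, the `ApplicationServer` class and `client_request` function are hypothetical, and the business rule (a class accepts registrations only while it has room) is invented for the example.

```python
import sqlite3

# Tier 3: the database system (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE section (course TEXT PRIMARY KEY, capacity INTEGER)")
db.execute(
    "CREATE TABLE takes (student TEXT, course TEXT, PRIMARY KEY (student, course))"
)
db.execute("INSERT INTO section VALUES ('CS-101', 2)")
db.commit()


# Tier 2: the application server holds the business logic and all database calls.
class ApplicationServer:
    def __init__(self, db):
        self.db = db

    def register(self, student, course):
        row = self.db.execute(
            "SELECT capacity FROM section WHERE course = ?", (course,)
        ).fetchone()
        if row is None:
            return "no such course"
        enrolled = self.db.execute(
            "SELECT COUNT(*) FROM takes WHERE course = ?", (course,)
        ).fetchone()[0]
        if enrolled >= row[0]:
            return "class full"   # business rule: no room left
        with self.db:             # commit the new registration
            self.db.execute("INSERT INTO takes VALUES (?, ?)", (student, course))
        return "registered"


# Tier 1: the front end only sends requests; it contains no database calls.
def client_request(server, student, course):
    return server.register(student, course)


server = ApplicationServer(db)
results = [client_request(server, s, "CS-101") for s in ("alice", "bob", "carol")]
print(results)
```

Because the capacity rule lives only in the application server, changing it requires no change to any client.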
1.8 Database Users and Administrators
A primary goal of a database system is to retrieve
information from and store new information in the database.
People who work with a database can be categorized as
database users or database administrators.
1.8.1 Database Users and User Interfaces
There are four different types of database-system users,
differentiated by the way they expect to interact with the
system. Different types of user interfaces have been
designed for the different types of users.
Naïve users are unsophisticated users who interact with
the system by using predefined user interfaces, such as
web or mobile applications. The typical user interface
for naïve users is a forms interface, where the user can
fill in appropriate fields of the form. Naïve users may
also view reports generated from the database.
As an example, consider a student who, during the class
registration period, wishes to register for a class by
using a web interface. Such a user connects to a web
application program that runs at a web server. The
application first verifies the identity of the user and then
allows her to access a form where she enters the
desired information. The form information is sent back
to the web application at the server, which then
determines if there is room in the class (by retrieving
information from the database) and if so adds the
student information to the class roster in the database.
Application programmers are computer professionals
who write application programs. Application
programmers can choose from many tools to develop
user interfaces.
Sophisticated users interact with the system without
writing programs. Instead, they form their requests
either using a database query language or by using
tools such as data analysis software. Analysts who
submit queries to explore data in the database fall in
this category.