Lecturer Notes: Fundamentals of Database Systems
These notes are compiled from "Fundamentals of Database Systems, Sixth Edition" by
Elmasri and Navathe.
1. Characteristics of the Database Approach
The database approach differs significantly from traditional file processing systems. Here are
its key characteristics:
Self-Describing Nature of a Database System: A fundamental characteristic of the
database approach is that the database system itself contains not only the data but also
a complete definition or description of the database structure and constraints. This
information is known as metadata, and it is stored in the database catalog or
dictionary. This makes the database system self-describing.
o Example: In a file system, a file processing program needs to know the exact
record format, file structure, and access methods. If these change, the program
must be modified. In a DBMS, this metadata is stored and accessed by the
DBMS software, allowing for greater flexibility and reducing the need to
change programs when data definitions change.
Insulation between Programs and Data (Program-Data Independence): The
structure of data files is stored in the DBMS catalog separately from the access
programs. This means that changes to the data structure (e.g., adding a new field) do
not require changes to programs that do not use the changed data.
o Example: If a new field for Date_of_Birth is added to a STUDENT record,
existing programs that access STUDENT records but do not use
Date_of_Birth do not need to be recompiled.
Data Abstraction: The DBMS provides users with a conceptual view of the data,
hiding the details of data storage and organization. This is achieved through the use of
data models.
Support for Multiple Views of the Data: Each user or application program may
require its own view of the database. A view is a subset of the database or contains
virtual data derived from the database files but not explicitly stored. The DBMS
generates these views.
o Example: A university database might have a registrar's view focusing on
student grades and course enrollments, while a financial aid office has a view
focused on student financial information.
Sharing of Data and Multiuser Transaction Processing: A DBMS allows multiple
users to access and share the same database concurrently. The DBMS includes
concurrency control software to ensure that data updates are handled correctly and
consistently when multiple users access the same data simultaneously. This is crucial
for transaction processing, where many users execute updates concurrently.
o Example: Multiple airline agents can book flights concurrently without
interfering with each other's updates to seat availability.
Control of Redundancy: The database approach aims to reduce data redundancy by
storing each data item only once. This helps in maintaining data consistency and
integrity.
o Example: Instead of storing student address in multiple course registration
files, it is stored once in a STUDENT table and referenced by other tables.
Controlled redundancy might be used for performance, but the DBMS
manages it.
Enforcement of Integrity Constraints: The DBMS allows the DBA to define and
enforce various data integrity constraints, such as data type constraints, uniqueness
constraints, and referential integrity (e.g., ensuring a foreign key value refers to an
existing primary key value).
o Example: A constraint can ensure that STUDENT_ID is unique for each
student record.
Restriction of Unauthorized Access: A DBMS provides security and authorization
mechanisms to restrict access to the database. The DBA can grant and revoke specific
privileges to users.
Providing Persistent Storage for Program Objects: Database systems provide
persistent storage for program objects and data structures. This means that the data
remains available even after the program that created it terminates.
Providing Backup and Recovery: The DBMS provides mechanisms for backing up
the database and recovering it to a consistent state in case of hardware or software
failures.
Providing Multiple Interface: A DBMS offers various interfaces for different types
of users, including query languages, programming language interfaces, graphical user
interfaces (GUIs), and natural language interfaces.
Representing Complex Relationships Among Data: A DBMS can store and manage
complex relationships between different data entities.
Enforcing Standards: The DBMS facilitates the enforcement of standards within the
organization for data representation, naming conventions, and documentation.
Reduced Application Development Time: The features offered by a DBMS, such as
data independence, integrity enforcement, and multiple interfaces, significantly
reduce the time and effort required to develop database applications.
2. Actors on the Scene
These are the people whose jobs involve the day-to-day use of a database system.
Database Administrators (DBA): The DBA is responsible for authorizing access to
the database, coordinating and monitoring its use, and acquiring software and
hardware resources. The DBA is accountable for the smooth functioning of the
DBMS and the overall database system. Their responsibilities include:
o Defining the conceptual schema.
o Defining the internal schema.
o Liaising with users.
o Defining security and integrity checks.
o Database tuning.
o Maintaining system software and hardware.
Database Designers: They are responsible for identifying the data to be stored in the
database and for choosing appropriate structures to represent and store this data. They
communicate with all prospective database users to understand their requirements.
Their tasks include:
o Designing the conceptual schema.
o Designing the external schema.
o Participating in physical design.
End Users: These are the people whose jobs require access to the database for
querying, updating, and generating reports. There are several categories of end users:
o Casual End Users: Occasionally access the database, often using
sophisticated query languages to get new information. Managers and senior
executives often fall into this category.
o Naive or Parametric End Users: Make up a sizeable portion of database
users. Their main job function revolves around constantly querying and
updating the database, using "canned" transactions that have been carefully
programmed and tested. Sales clerks, reservation agents, and bank tellers are
examples.
o Sophisticated End Users: Include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the DBMS.
They use tools like business intelligence (BI) dashboards or statistical
packages.
o Standalone End Users: Maintain personal databases by using ready-to-use
program packages that provide easy-to-use menu- or graphics-based
interfaces. An example is a user of a tax package that stores various financial
figures.
System Analysts and Application Programmers (Software Engineers):
o System Analysts: Determine the requirements of end users, especially naive
and parametric end users, and develop specifications for canned transactions
that meet these requirements.
o Application Programmers (Software Engineers): Implement these
specifications as application programs, which are then tested and debugged.
3. Workers behind the Scene
These are the people who design, develop, and maintain the DBMS software and related
system environments.
DBMS Designers and Implementers: These are the people who design and
implement the DBMS modules and interfaces as a software package. They include
individuals who design and implement the query language, data storage, concurrency
control, and recovery components.
Tool Developers: These people design and implement software packages that
facilitate database system design, use, and administration. These tools include:
o Design tools.
o Performance monitoring tools.
o Prototyping tools.
o Simulation tools.
o Graphical user interface (GUI) development tools.
o Data dictionary systems.
Operators and Maintenance Personnel: These are the system administration
personnel who are responsible for the actual running and maintenance of the hardware
and software environment for the database system.
4. Advantages of Using the DBMS Approach
Using a DBMS offers several advantages over traditional file processing systems:
Controlling Redundancy: DBMSs reduce data redundancy by integrating files and
centralizing data. This prevents data duplication and inconsistencies.
Restricting Unauthorized Access: Security and authorization subsystems within the
DBMS control which users can access what data.
Providing Persistent Storage for Program Objects: Data structures and objects
created by programs can be stored persistently in the database, meaning they remain
available even after program execution ends.
Providing Storage Structures for Efficient Query Processing: The DBMS uses
various data structures (e.g., indexes) to allow efficient retrieval and update of data.
Providing Backup and Recovery: The DBMS provides facilities to recover the
database from hardware or software failures to a consistent state.
Providing Multiple User Interfaces: DBMSs offer diverse interfaces (e.g., query
languages, graphical interfaces, programming interfaces) to cater to different user
needs.
Representing Complex Relationships Among Data: DBMSs are designed to
manage intricate relationships between various data elements.
Enforcing Integrity Constraints: The DBMS enforces rules (integrity constraints) to
ensure the validity and consistency of data, such as unique primary keys or referential
integrity.
Permitting Inferencing and Actions Using Rules: Some advanced DBMSs or
knowledge-base systems can define rules to infer new facts or trigger actions based on
data.
Potential for Enforcing Standards: The DBA can define and enforce standards for
data naming, formatting, and access, leading to better data consistency.
Reduced Application Development Time: Many DBMS features, such as built-in
functions, query optimizers, and development tools, accelerate the creation of new
applications.
Flexibility: Database systems are more flexible in handling evolving data and new
application requirements compared to traditional file systems.
Up-to-date Information: The centralized nature and transaction processing
capabilities of a DBMS ensure that data is always current.
Economy of Scale: Consolidating data and applications under a single DBMS can
lead to economies of scale by centralizing control and reducing duplicated efforts.
5. Overview of Database and Architectures - Data Models, Schemas, and Instances
Data Model: A data model is a collection of concepts that can be used to describe the
structure of a database, providing the necessary means to achieve data abstraction. It
includes data types, relationships, and constraints.
o Categories of Data Models:
High-Level (Conceptual) Data Models: Provide concepts close to the
way users perceive data. Examples include the Entity-Relationship
(ER) model.
Record-Based (Representation) Data Models: Provide concepts that
can be understood by users but are not too far removed from how data
is organized in the computer. Examples include relational, network,
and hierarchical models.
Low-Level (Physical) Data Models: Describe the details of how data
is stored on computer storage media.
Schema: The description of a database is called the database schema. It is specified
during database design and is not expected to change frequently.
o Example: The schema for a COMPANY database might include a STUDENT
relation with attributes Name, Student_Number, Class, and Major.
State (or Instance): The data in the database at a particular moment in time is called
a database state or instance. It changes frequently as data is updated.
o Example: A particular snapshot of the STUDENT table with actual student
names and numbers is a database instance.
Schema vs. State Analogy: The database schema is like the "intention" (or database
definition), whereas a database state is like the "extension" (or actual data).
6. Three-Schema Architecture and Data Independence
The three-schema architecture is a proposed standard for database systems, intended to
separate user applications from the physical database and to support multiple views of the
data.
Internal Schema: Describes the physical storage structure of the database. It uses a
physical data model and describes full details of data storage and access paths.
Conceptual Schema: Describes the structure of the whole database for a community
of users. It hides the details of physical storage structures and concentrates on
describing entities, data types, relationships, and constraints. It uses a conceptual or
high-level data model.
External Schemas (User Views): Describe the part of the database that a particular
user group is interested in and hides the rest of the database from that user group.
Each external schema uses the conceptual data model or a specific external data
model.
Diagram of Three-Schema Architecture: (The book typically includes a diagram similar to
Figure 2.1 showing these three levels with mappings between them. I cannot reproduce it, but
it illustrates how users interact with external views, which are mapped to the conceptual
schema, which in turn is mapped to the internal schema.)
Mappings: The DBMS is responsible for transforming requests and data between the
three levels.
o Conceptual/Internal Mapping: Transforms requests and results between the
conceptual and internal levels.
o External/Conceptual Mapping: Transforms requests and results between
external and conceptual levels.
Data Independence: The ability to change the schema at one level of a database system
without having to change the schema at the next higher level.
Logical Data Independence: The ability to change the conceptual schema without
changing the external schemas or application programs. This allows for changes to the
overall logical structure of the database without affecting user views.
o Example: Adding a new record type or attribute to the conceptual schema
would not necessitate changing existing external views or applications, as long
as they do not depend on the added data.
Physical Data Independence: The ability to change the internal schema without
changing the conceptual schema. This allows for changes to the physical storage
organization (e.g., file structures, indexing) without affecting the logical view of the
data.
o Example: Changing from a hashed file organization to a B-tree index for a
table would not require changes to the conceptual schema or application
programs.
7. Database Languages and Interfaces
DBMSs provide various languages and interfaces for users to interact with the database.
Data Definition Language (DDL): Used by the DBA and database designers to
define the conceptual schema and internal schema, as well as to define external
schemas and their mappings. DDL statements are compiled by a DDL compiler to
generate and store the schema descriptions in the DBMS catalog.
o Example (Conceptual Schema): Defining tables, attributes, and constraints
in SQL.
o Example (Internal Schema): Specifying file organization, indexing, and
storage details.
Data Manipulation Language (DML): Used for retrieving, inserting, deleting, and
modifying data in the database.
o Procedural DML (Low-Level): Requires users to specify what data is
needed and how to retrieve it. This is typically used by database programmers.
o Non-Procedural DML (High-Level): Allows users to specify what data is
needed without specifying how to retrieve it. These are easier for casual users.
SQL is a prime example.
Query Language: The part of a DML that involves data retrieval.
Example (Non-Procedural): SELECT * FROM EMPLOYEE
WHERE Salary > 50000;
Interfaces:
o Menu-Based Interfaces for Web Clients or Browse: Present users with a
menu of options, guiding them through the selection of query components.
o Forms-Based Interfaces: Display a form to the user, who then fills in the
blanks to specify queries or updates. SQL*Forms and Oracle Forms are
examples.
o Graphical User Interfaces (GUIs): Display the database schema in a
diagrammatic form, allowing users to specify queries by manipulating the
diagram, often using a mouse.
o Natural Language Interfaces: Allow users to input requests in plain English
or another natural language. These interfaces typically have their own schema
and a dictionary to interpret the queries.
o Speech Input and Output: Used for certain applications where voice
interaction is more convenient.
o Interfaces for Parametric Users: A small set of pre-defined, "canned"
transactions that can be executed by naive end users.
o Interfaces for the DBA: Special commands for creating accounts, managing
storage, setting parameters, and monitoring the database.
8. The Database System Environment
A database system environment consists of several components working together.
DBMS Software: The core software that manages the database.
Database: The actual collection of data.
DBMS Catalog (Metadata): Stores the schema definitions.
Query Processor: Interprets and optimizes user queries.
Runtime Database Processor: Executes the data manipulation operations.
Stored Data Manager: Manages the interaction with the file system and actual data
storage.
Disk Storage: Where the database files are physically stored.
DDL Compiler: Processes DDL statements to store schema information in the
catalog.
DML Compiler/Query Optimizer: Processes DML queries, optimizes them, and
generates an execution plan.
Precompiler: Extracts DML commands from application programs (written in a host
language) and converts them into function calls.
Runtime Database Processor: This component works with the stored data manager
to execute the actual data retrieval or update operations.
Transaction Manager: Manages concurrent access and ensures transaction properties
(ACID).
Recovery Manager: Handles database recovery from failures.
Diagram of a Simplified Database System Environment: (The book typically includes
Figure 2.3 to illustrate these components and their interactions, showing how user queries and
application programs pass through various DBMS components before interacting with the
stored database and catalog.)
9. Centralized and Client/Server Architectures for DBMSs
Centralized DBMSs Architecture: In earlier architectures, mainframe computers
performed all system functions, including user applications, user interface, and
DBMS functionality. Users accessed these systems via dumb terminals that only
displayed information.
o Diagram: Figure 2.4 typically shows a central computer (CPU, memory, disk
I/O) connected to multiple display monitors and terminals via controllers and a
system bus. All processing occurs on the central machine.
Client/Server Architectures: Developed to exploit the processing power of PCs and
workstations. It defines specialized servers with specific functionalities.
o Logical Client/Server Architecture: Clients request services from servers.
Servers provide specialized functionalities (e.g., file server, print server,
DBMS server).
o Physical Client/Server Architecture: Shows machines as client-only sites,
dedicated servers, or machines with both client and server functionality, all
connected via a communication network.
Two-Tier Client/Server Architecture for DBMSs: In relational
DBMSs, the user interface and application programs were moved to
the client side. The server side (often called a query server or
transaction server) handles query processing and transaction
management.
Client: Runs the user interface and application programs.
Server: Provides database functionality (query processing,
transaction management).
Three-Tier Client/Server Architectures for Web Applications: This
architecture is common for large applications and web-based systems.
First Tier (Client): User interface, typically a web browser.
Second Tier (Application Server or Web Server): Contains
the business logic and application programs. It communicates
with the database server.
Third Tier (Database Server): Stores the database and DBMS
software.
Advantages: This architecture enhances security, scalability,
and performance by distributing processing load and allowing
more specialized functionality at each tier.
10. Classification of Database Management Systems
DBMSs can be classified based on various criteria:
Based on Data Model:
o Relational DBMS (RDBMS): The most common type, based on the
relational data model (e.g., MySQL, Oracle, SQL Server).
o Object-Oriented DBMS (OODBMS): Based on the object data model,
allowing complex objects and inheritance (e.g., ObjectStore).
o Object-Relational DBMS (ORDBMS): Hybrid systems that combine
features of both relational and object-oriented models (e.g., PostgreSQL,
Oracle's object-relational features).
o Hierarchical DBMS: Data organized in a tree-like structure (legacy systems
like IBM IMS).
o Network DBMS: Data organized in a graph structure (legacy systems like
IDMS).
o NoSQL Databases: A broad class of databases that do not use the traditional
relational model, often designed for specific use cases like big data or real-
time web applications (e.g., MongoDB, Cassandra). (While not explicitly
detailed in the provided text's overview of classification, this is a modern and
relevant category.)
Based on Number of Users:
o Single-User Systems: Support only one user at a time (e.g., personal
databases on a local PC).
o Multiuser Systems: Support multiple users concurrently.
Based on Distribution:
o Centralized DBMS: All data and DBMS software reside on a single machine.
o Distributed DBMS (DDBMS): The database and DBMS software are
distributed over multiple sites connected by a network.
Homogeneous DDBMS: All sites use the same DBMS software.
Heterogeneous DDBMS / Federated Database Systems (FDBS) /
Multidatabase Systems: Different sites may use different DBMS
software, potentially with different data models.
Based on Cost: (Not explicitly detailed as a primary classification criterion in the
provided content, but implicitly influences choice of DBMS)
Based on Purpose/Application Area:
o General-Purpose DBMS: Designed for a wide range of applications.
o Special-Purpose DBMS: Optimized for specific applications (e.g., CAD
tools, GIS, real-time systems, data warehouses, search engines).
Other Classifications:
o In-memory Databases: Store data primarily in RAM for faster access.
o Column-Oriented Databases: Store data in columns rather than rows, good
for analytical workloads.
o Cloud Databases: Databases offered as a service over the internet (e.g.,
Amazon RDS, Google Cloud SQL).