1: Introduction to databases
Outline
Introduction
Traditional File based Systems
Database Approach
Roles in Database Environment
History of Database management systems
Advantages and Disadvantages of DBMS’s.
Structure of Relational Databases
Database Schema
Keys
Schema Diagram
Relational Query Languages.
Relational Operations
Introduction
Data
Information
DBMS (Database Management System)
Relational Queries
SQL (Structured Query Language)
Indexing
Hashing
Transactions
Concurrency Control
Recovery Systems
Database Management System
(DBMS)
DBMS contains information about a particular enterprise
Collection of interrelated data
Set of programs to access the data
An environment that is both convenient and efficient to use
Database Applications:
Banking: transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Online retailers: order tracking, customized
recommendations
Manufacturing: production, inventory, orders, supply chain
Human resources: employee records, salaries, tax
deductions
Databases can be very large.
Databases touch all aspects of our lives
University Database Example
Application program examples
Add new students, instructors, and courses
Register students for courses, and generate
class rosters
Assign grades to students, compute grade
point averages (GPA) and generate transcripts
In the early days, database applications were
built directly on top of file systems
Traditional File based Systems
File-based systems were an early attempt to
computerize the manual filing system that we are all
familiar with.
We may have divisions in the filing system or separate
folders for different types of item that are in some way
logically related.
The manual filing system works well while the number
of items to be stored is small.
It even works quite adequately when there are large
numbers of items and we have only to store and retrieve
them.
Drawbacks of using file systems to
store data
Data redundancy and inconsistency
Multiple file formats, duplication of information in
different files
Difficulty in accessing data
Need to write a new program to carry out each new
task
Data isolation
Multiple files and formats
Integrity problems
Integrity constraints (e.g., account balance > 0)
become “buried” in program code rather than being
stated explicitly
Hard to add new constraints or change existing ones
Drawbacks of using file systems to store data
(Cont.)
Atomicity of updates
Failures may leave database in an inconsistent state with
partial updates carried out
Example: Transfer of funds from one account to another
should either complete or not happen at all
Concurrent access by multiple users
Concurrent access needed for performance
Uncontrolled concurrent accesses can lead to inconsistencies
Example: Two people reading a balance (say 100) and
updating it by withdrawing money (say 50 each) at the same
time
Security problems
Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
Database Approach
A shared collection of logically related data, and a
description of this data, designed to meet the information
needs of an organization.
The database holds not only the organization’s operational
data but also a description of this data.
The Database Management System (DBMS)
DDL
DML
Query Language
Structured Query Language
It may provide:
a security system, which prevents unauthorized users
accessing the database;
an integrity system, which maintains the consistency
of stored data;
a concurrency control system, which allows shared
access of the database;
Roles in Database Environment
We can identify four distinct types of people that
participate in the DBMS environment:
Data and database administrators
database designers
application developers
the end-users.
Data and Database administrators
The Data Administrator (DA) is responsible for the
management of the data resource including database
planning, development and maintenance of standards,
policies and procedures, and conceptual/logical
database design.
The Database Administrator (DBA) is responsible for the
physical realization of the database, including physical
database design and implementation, security and
integrity control, maintenance of the operational
system, and ensuring satisfactory performance of the
applications for users.
Database designers
In large database design projects, we can distinguish
between two types of designer:
Logical database designers
Physical database designers.
The logical database designer is concerned with
identifying the data (that is, the entities and attributes),
the relationships between the data, and the constraints
on the data that is to be stored in the database.
The logical database designer must have a thorough and
complete understanding of the organization’s data and
any constraints on this data.
The physical database designer decides how the logical
database design is to be physically realized. This
involves:
mapping the logical database design into a set of
tables and integrity constraints;
selecting specific storage structures and access
methods for the data to achieve good performance;
designing any security measures required on the
data.
Application developers and end-
users
Once the database has been implemented, the
application programs that provide the required
functionality for the end-users must be implemented.
This is the responsibility of the application developers.
The end-users are the ‘clients’ for the database, which
has been designed and implemented, and is being
maintained to serve their information needs.
End-users can be classified according to the way they
use the system:
Naive users are typically unaware of the DBMS
Sophisticated users.
History of Database Systems
1950s and early 1960s:
Data processing using magnetic tapes for storage
Tapes provided only sequential access
Punched cards for input
Late 1960s and 1970s:
Hard disks allowed direct access to data
Network and hierarchical data models in widespread
use
Ted Codd defines the relational data model
Would win the ACM Turing Award for this work
IBM Research begins System R prototype
UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing
History (cont.)
1980s:
Research relational prototypes evolve into commercial
systems
SQL becomes industrial standard
Parallel and distributed database systems
Object-oriented database systems
1990s:
Large decision support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
Early 2000s:
XML and XQuery standards
Automated database administration
Later 2000s:
Giant data storage systems
Google BigTable, Yahoo PNuts, Amazon, ..
Advantages and Disadvantages of
DBMS’s
Structure of Relational Databases
A relational database consists of a collection of tables,
tables
each of which is assigned a unique name.
A row in a table represents a relationship among a set of
values.
In the relational model the term relation is used to refer
to a table,
table while the term tuple is used to refer to a row.
row
Similarly, the term attribute refers to a column of a
table.
Relation Schema and Instance
A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
Formally, given sets D1, D2, …. Dn a relation r is a subset of
D 1 x D 2 x … x Dn
Thus, a relation is a set of n-tuples (a1, a2, …, an) where
The current values (relation instance) of a relation are
each ai Di
specified by a table
An element t of r is a tuple, represented by a row in a
table
For each attribute of a relation, there is a set of
permitted values, called the domain of that attribute.
We require that, for all relations r, the domains of all
attributes of r be atomic.
atomic
A domain is atomic if elements of the domain are
considered to be indivisible units.
Database Schema
Database schema is the logical design of the database,
and the database instance,
instance which is a snapshot of the
data in the database at a given instant in time.
In general, a relation schema consists of a list of
attributes and their corresponding domains.
The schema of a relation does not generally change.
Example Schema
Keys
Super key
An attribute, or set of attributes, that uniquely
identifies a tuple within a relation.
Candidate Key
A super key such that no proper subset is a super key
within the relation.
When a key consists of more than one attribute, we
call it a composite key.
Primary
The candidate key that is selected to identify tuples
uniquely within the relation.
Foreign
An attribute, or set of attributes, within one relation
that matches the candidate key of some (possibly the
same) relation.
Keys
Let K R
K is a superkey of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
Example: {ID} and {ID,name} are both superkeys of
instructor.
Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
One of the candidate keys is selected to be the primary key.
which one?
Foreign key constraint: Value in one relation must appear in
another
Referencing relation
Referenced relation
Example – dept_name in instructor is a foreign key from
instructor referencing department
Schema Diagrams
A database schema, along with primary key and foreign
key dependencies, can be depicted by schema diagrams
Relational Query Languages
A query language is a language in which a user requests
information from the database. These languages are
usually on a level higher than that of a standard
programming language.
Query languages can be categorized as either
procedural or nonprocedural.
In a procedural language, the user instructs the system
to perform a sequence of operations on the database to
compute the desired result.
In a nonprocedural language, the user describes the
desired information without giving a specific procedure
for obtaining that information.
Relational Operations
RELATIONAL ALGEBRA
The relational algebra defines a set of operations on
relations, paralleling the usual algebraic operations
such as addition, subtraction or multiplication, which
operate on numbers.
Select Operation – selection of rows
(tuples)
Relation r
A=B ^ D > 5 (r)
Project Operation – selection of columns
(Attributes)
Relation r:
A,C (r)
Union of two relations
Relations r, s:
r s:
Set difference of two relations
Relations r, s:
r – s:
Set intersection of two relations
Relation r, s:
rs
Note: r s = r – (r – s)
joining two relations -- Cartesian-
product
Relations r, s:
r x s:
Cartesian-product – naming issue
Relations r, s: B
r x s: r.B s.B
Notes about Relational Languages
Each Query input is a table (or set of tables)
Each query output is a table.
All data in the output table appears in one of the input
tables
Relational Algebra is not Turning complete
Can we compute:
SUM
AVG
MAX
MIN
Summary of Relational Algebra
Operators
Symbol (Name) Example of Use
σ
σ
(Selection) salary > = 85000 (instructor)
Return rows of the input relation that satisfy the predicate.
Π
Π
(Projection) ID, salary (instructor)
Output specified attributes from all rows of the input relation. Remove
duplicate tuples from the output.
x
(Cartesian Product) instructor x department
Output pairs of rows from the two input relations that have the same value on
all attributes that have the same name.
∪
Π ∪ Π
(Union) name (instructor) name (student)
Output the union of tuples from the two input relations.
-
Π -- Π
(Set Difference) name (instructor) name (student)
Output the set difference of tuples from the two input relations.
⋈
(Natural Join) instructor ⋈ department
Output pairs of rows from the two input relations that have the same value on
all attributes that have the same name.
End of Unit 1