RELATIONAL LANGUAGES
A data model is a collection of conceptual tools for
describing data, data relationships, data semantics, and
consistency constraints. The relational model uses a
collection of tables to represent both data and the
relationships among those data. Its conceptual simplicity
has led to its widespread adoption; today a vast majority of
database products are based on the relational model. The
relational model describes data at the logical and view
levels, abstracting away low-level details of data storage.
To make data from a relational database available to users,
we have to address how users specify requests for retrieving
and updating data. Several query languages have been
developed for this task, which are covered in this part.
Chapter 2 introduces the basic concepts underlying
relational databases, including the coverage of relational
algebra-a formal query language that forms the basis for
SQL. The language SQL is the most widely used relational
query language today and is covered in great detail in this
part.
Chapter 3 provides an overview of the SQL query
language, including the SQL data definition, the basic
structure of SQL queries, set operations, aggregate
functions, nested subqueries, and modification of the
database.
Chapter 4 provides further details of SQL, including join
expressions, views, transactions, integrity constraints that
are enforced by the database, and authorization
mechanisms that control what access and update actions
can be carried out by a user.
Chapter 5 covers advanced topics related to SQL including
access to SQL from programming languages, functions,
procedures, triggers, recursive queries, and advanced
aggregation features.
CH
AP
TE
R 2
Page 37
Introduction to the
Relational Model
The relational model remains the primary data model for
commercial data-processing applications. It attained its primary
position because of its simplicity, which eases the job of the
programmer, compared to earlier data models such as the
network model or the hierarchical model. It has retained this
position by incorporating various new features and capabilities
over its half-century of existence. Among those additions are
object-relational features such as complex data types and stored
procedures, support for XML data, and various tools to support
semi-structured data. The relational model's independence from
any specific underlying low-level data structures has allowed it to
persist despite the advent of new approaches to data storage,
including modern column-stores that are designed for large-scale
data mining.
In this chapter, we first study the fundamentals of the
relational model. A substantial theory exists for relational
databases. In
Chapter 6 and
Chapter 7, we shall examine
aspects of database theory that help in the design of relational
database schemas, while in
Chapter 15 and
Chapter 16 we
discuss aspects of the theory dealing with efficient processing of
queries. In Chapter 27, we study aspects of formal relational
languages beyond our basic coverage in this chapter.
2.1 Structure of Relational Databases
A relational database consists of a collection of tables, each of
which is assigned a unique name. For example, consider the
instructor table of
Figure 2.1, which stores information about
instructors. The table has four column headers: ID, name,
dep_Dname, and salary. Each row of this table records
information about an instructor, consisting of the instructor's ID,
name, dep_Dname, and salary. Similarly, the course table of
Figure 2.2 stores information about courses, consisting of a
course_id, title, dept_name, and credits, for each course. Note
that each instructor is identfied by the value of the column ID,
while each course is identified by the value of the column
course_id.
Figure 2.1 The instructor relation.
Page 38
Figure 2.2 The course relation.
Figure 2.3 shows a third table, prereq, which stores the
prerequisite courses for each course. The table has two columns,
course_id and prereq_id. Each row consists of a pair of course
identifiers such that the second course is a prerequisite for the
f
irst course.
Figure 2.3 The prereq relation.
Page 39
Thus, a row in the prereq table indicates that two courses are
related in the sense that one course is a prerequisite for the
other. As another example, when we consider the table
instructor, a row in the table can be thought of as representing
the relationship between a specified ID and the corresponding
values for name, dept_name, and salary values.
In general, a row in a table represents a relationship among a
set of values. Since a table is a collection of such relationships,
there is a close correspondence between the concept of table
and the mathematical concept of relation, from which the
relational data model takes its name. In mathematical
terminology, a tuple is simply a sequence (or list) of values. A
relationship between n values is represented mathematically by
an n_tuple of values, that is, a tuple with n values, which
corresponds to a row in a table.
Thus, in the relational model the term relation is used to refer
to a table, while the term tuple is used to refer to a row.
Similarly, the term attribute refers to a column of a table.
Examining
Figure 2.1, we can see that the relation instructor
has four attributes: ID, name, dept_name, and salary.
We use the term relation instance to refer to a specific
instance of a relation, that is, containing a specific set of rows.
The !instance of instructor shown in
Figure 2.1 has 12 tuples,
corresponding to 12 instructors.
In this chapter, we shall be using a number of different
relations to illustrate the various concepts underlying the
relational data model. These relations represent part of a
university. To simplify our presentation, we exclude much of the
data an actual university database would contain. We shall
discuss criteria for the appropriateness of relational structures in
great detail in
Chapter 6 and
Chapter 7.
The order in which tuples appear in a relation is irrelevant,
since a relation is a set of tuples. Thus, whether the tuples of a
relation are listed in sorted order, as in
Figure 2.1, or are
unsorted, as in
Figure 2.4, does not matter; the relations in the
two figures are the same, since both contain the same set of
tuples. For ease of exposition, we generally show the relations
sorted by their first attribute.
Page 40
Figure 2.4 Unsorted display of the instructor relation.
For each attribute of a relation, there is a set of permitted
values, called the domain of that attribute. Thus, the domain of
the salary attribute of the instructor relation is the set of all
possible salary values, while the domain of the name attribute is
the set of all possible instructor names.
We require that, for all relations r, the domains of all attributes
of r be atomic. A domain is atomic if elements of the domain are
considered to be indivisible units. For example, suppose the
table instructor had an attribute phone_number, which can store
a set of phone numbers corresponding to the instructor. Then the
domain of phone_number would not be atomic, since an element
of the domain is a set of phone numbers, and it has subparts,
namely, the individual phone numbers in the set.
The important issue is not what the domain itself is, but rather
how we use domain elements in our database. Suppose now that
the phone_number attribute stores a single phone number. Even
then, if we split the value from the phone number attribute into a
country code, an area code, and a local number, we would be
treating it as a non-atomic value. If we treat each phone number
as a single indivisible unit, then the attribute phone _number
would have an atomic domain.
The null value is a special value that signifies that the value is
unknown or does not exist. For example, suppose as before that
we include the attribute phone_number in the instructor relation.
It may be that an instructor does not have a phone number at
all, or that the telephone number is unlisted. We would then
have to use the null value to signify that the value is unknown or
does not exist. We shall see later that null values cause a
number of difficulties when we access or update the database,
and thus they should be eliminated if at all possible. We shall
assume null values are absent initially, and in
Section 3.6 we
describe the effect of nulls on different operations.