(R.D.B.M.S)
UNIT I
INTRODUCTION TO DATA BASE MANAGEMENT
Data
Data is the raw material that can be processed for any computing machine.
For example − Employee name, Product name, Name of the student, Marks of
the student, Mobile number, Image etc.
Information
Information is the data that has been converted into more useful or intelligent
form.
For example: Report card sheet.
Differences
Data                                        Information
1. Data is the raw fact.                    1. It is a processed form of data.
2. It is not significant to a business.     2. It is significant to a business.
3. Data is an atomic-level piece of         3. It is a collection of data.
   information.
4. Example: product name, name of           4. Example: report card of a student.
   a student.
5. It is a phenomenal fact.                 5. It is organized data.
6. This is the primary level of             6. It is the secondary level of
   intelligence.                               intelligence.
7. May or may not be meaningful.            7. Always meaningful.
Diagram
The diagram given below depicts the use of data and information in a database.
As the name suggests, the database management system consists of two parts. They are:
1. Database and
2. Management System
WHAT IS A DATABASE?
To find out what database is, we have to start from data, which is the basic building block
of any DBMS.
Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc).
Record: Collection of related data items, e.g.

    Roll  Name  Age
    1     ABC   19
The columns of this relation are called Fields, Attributes or Domains. The rows are called Tuples
Or Records.
Database: Collection of related relations. Consider the following collection of tables:
T1
    Roll  Name  Age
    1     ABC   19
    2     DEF   22
    3     XYZ   28

T2
    Roll  Address
    1     KOL
    2     DEL
    3     MUM

T3
    Roll  Year
    1     I
    2     II
    3     I

T4
    Year  Hostel
    I     H1
    II    H2
We now have a collection of 4 tables. They can be called a “related collection” because
we can clearly find out that there are some common attributes existing in a selected pair
of tables.
Tuple or Record.
In a database, data is organized strictly in row and column format. The rows are called
Tuple or Record.
The data items within one row may belong to different data types.
Domain or Attribute.
On the other hand, the columns are often called Domain or Attribute.
All the data items within a single attribute are of the same data type.
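The four related tables above can be sketched concretely. The following is a minimal illustration using Python's built-in sqlite3 module: because the tables share common attributes (Roll links T1, T2, T3; Year links T3 and T4), one query can chain them together.

```python
import sqlite3

# The four related tables T1-T4 from the text, linked through
# their common attributes (Roll, Year).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE T1 (Roll INTEGER, Name TEXT, Age INTEGER);
CREATE TABLE T2 (Roll INTEGER, Address TEXT);
CREATE TABLE T3 (Roll INTEGER, Year TEXT);
CREATE TABLE T4 (Year TEXT, Hostel TEXT);
INSERT INTO T1 VALUES (1,'ABC',19),(2,'DEF',22),(3,'XYZ',28);
INSERT INTO T2 VALUES (1,'KOL'),(2,'DEL'),(3,'MUM');
INSERT INTO T3 VALUES (1,'I'),(2,'II'),(3,'I');
INSERT INTO T4 VALUES ('I','H1'),('II','H2');
""")
# Because the tables are related, one query can answer
# "which hostel does each student live in?" by chaining the links.
rows = cur.execute("""
    SELECT T1.Name, T4.Hostel
    FROM T1 JOIN T3 ON T1.Roll = T3.Roll
            JOIN T4 ON T3.Year = T4.Year
    ORDER BY T1.Roll
""").fetchall()
print(rows)   # [('ABC', 'H1'), ('DEF', 'H2'), ('XYZ', 'H1')]
```

This is what makes the collection a "related collection": no single table answers the question, but the shared attributes do.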
DBMS
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.
The collection of data, usually referred to as the database, contains information relevant to
an enterprise.
The primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.
For example, consider the names, telephone numbers, and addresses of the people you
know.
You may have recorded this data in an indexed address book, or you may have stored it on a
diskette, using a personal computer and software such as DBASE IV or V, Microsoft
ACCESS, or EXCEL.
A datum – a unit of data – is a symbol or a set of symbols which is used to represent
something.
Information is interpreted data – data supplied with semantics.
Knowledge refers to the practical use of information.
Database systems are designed to manage large bodies of information.
Management of data involves both defining structures for storage of information and
providing mechanisms for the manipulation of information. In addition, the database
system must ensure the safety of the information stored, despite system crashes or attempts
at unauthorized access.
ADVANTAGES OF A DBMS
Using a DBMS to manage data has many advantages:
Reduction of Redundancy: This is perhaps the most significant advantage of using DBMS.
Redundancy is the problem of storing the same data item in more than one place.
Redundancy creates several problems: it requires extra storage space, the same data must be entered more than once during insertion, and data must be deleted from more than one place during deletion. Anomalies may occur in the database if insertion, deletion, etc. are not done properly.
Sharing of Data: In a paper-based record keeping, data cannot be shared among many users.
But in computerized DBMS, many users can share the same database if they are
connected via a network.
Data Integrity: We can maintain data integrity by specifying integrity constraints, which
are rules and restrictions about what kind of data may be entered or manipulated within the
database. This increases the reliability of the database as it can be guaranteed that no
wrong data can exist within the database at any point of time.
Data independence: Application programs should be as independent as possible from
details of data representation and storage. The DBMS can provide an abstract view of the
data to insulate application code from such details.
Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
Data integrity and security: If data is always accessed through the DBMS, the DBMS
can enforce integrity constraints on the data. For example, before inserting salary
information for an employee, the DBMS can check that the department budget is not
exceeded. Also, the DBMS can enforce access controls that govern what data is visible to
different classes of users.
Data administration: When several users share the data, centralizing the administration
of data can offer significant improvements. Experienced professionals, who understand
the nature of the data being managed, and how different groups of users use it, can be
responsible for organizing the data representation to minimize redundancy and fine-tuning the storage of the data to make retrieval efficient.
Concurrent access and crash recovery:
A DBMS schedules concurrent accesses to the data in such a manner that users can think of the
data as being accessed by only one user at a time. Further, the DBMS protects users from the
effects of system failures.
Reduced application development time: Clearly, the DBMS supports many important functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch, because many important tasks are handled by the DBMS instead of being implemented by the application.
What are the components of DBMS?
Hardware, Software, Data, Database Access Language, Procedures and Users all together form the
components of a DBMS.
Hardware
The hardware is the actual computer system used for keeping and accessing the database. The
conventional DBMS hardware consists of secondary storage devices such as hard disks. Databases run on a range of machines, from microcomputers to mainframes.
Software
Software is the actual DBMS, sitting between the physical database and the users of the system. All the requests from the user for accessing the database are handled by the DBMS.
Data
It is an important component of the database management system. The main task of a DBMS is to process the data: data is stored in, retrieved from, and updated in the database.
Users
There are a number of users who can access or retrieve the data on demand using the application and
the interfaces provided by the DBMS.
Naive Users
Online Users
Sophisticated Users
Specialized Users
Application Users
DBA (Database Administrator)
There are four main components on which the working of a DBMS depends:
Data: The main component is the data. The entire database is built on the data and the information processed from it. This data acts as a bridge between the software and hardware components of the DBMS. It can further be divided into three varieties:
User Data – The actual data on which the work is done.
Metadata – This is data about the data, i.e., the data that describes and manages the stored data.
Application Metadata – This is the structure and format of the queries.
To simplify: in a table, the information in each row is the User Data; the number of tables, rows and columns is the Metadata; and the structure we choose is the Application Metadata.
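The user-data vs. metadata distinction can be seen directly in a DBMS catalog. A minimal sketch with Python's sqlite3 module: the table contents are User Data, while the schema the engine keeps about that table (column names and types) is Metadata.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll INTEGER, name TEXT, age INTEGER)")
con.execute("INSERT INTO student VALUES (1, 'ABC', 19)")

# User Data: the rows stored in the table.
user_data = con.execute("SELECT * FROM student").fetchall()
# Metadata: what the engine records about the table's structure.
metadata = [(row[1], row[2]) for row in con.execute("PRAGMA table_info(student)")]
print(user_data)  # [(1, 'ABC', 19)]
print(metadata)   # [('roll', 'INTEGER'), ('name', 'TEXT'), ('age', 'INTEGER')]
```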
Hardware: These are the general hardware devices which help us save and enter the data like
hard disks, magnetic tapes, etc.
Software: The software acts as a medium of communication between the user and the database.
Based on the user’s requirement, the database can be modified and updated. To perform
operations on the data, query languages like SQL are used.
Users: No function can be performed without the Users. Thus, they form the fourth most
important component of DBMS. The information entered into a database is used by the User or
the administrator to perform their business operations and responsibilities
Figure 1.1: The Three-Schema Architecture
The three-schema architecture is a convenient tool for the user to visualize the schema levels in a
database system. Most DBMSs do not separate the three levels completely, but support the three-
schema architecture to some extent. Some DBMSs may include physical-level details in the
conceptual schema. In most DBMSs that support user views, external schemas are specified in the
same data model that describes the conceptual-level information. Some DBMSs allow different
data models to be used at the conceptual and external levels.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at
the physical level. In a DBMS based on the three-schema architecture, each user group refers only
to its own external schema. Hence, the DBMS must transform a request specified on an external
schema into a request against the conceptual schema, and then into a request on the internal
schema for processing over the stored database. If the request is a database retrieval, the data
extracted from the stored database must be reformatted to match the user’s external view. The
processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs—especially those that are meant to support
small databases—do not support external views. Even in such systems, however, a certain amount
of mapping is necessary to transform requests between the conceptual and internal levels.
DATA MODEL
A data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints.
One to one. An entity in A is associated with at most one entity in B, and an entity in
B is associated with at most one entity in A.
One to many. An entity in A is associated with any number (zero or more) of entities
in B. An entity in B, however, can be associated with at most one entity in A.
Many to one. An entity in A is associated with at most one entity in B. An entity in
B, however, can be associated with any number (zero or more) of entities in A.
Many to many. An entity in A is associated with any number (zero or more) of entities
in B, and an entity in B is associated with any number (zero or more) of entities in A.
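In a relational schema these mapping cardinalities are typically enforced with keys. A minimal sqlite3 sketch (table and column names assumed for illustration): a foreign key alone yields many-to-one (many accounts per customer); adding UNIQUE on the foreign key column would tighten it to one-to-one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
-- one customer : many accounts
CREATE TABLE account (
    acc_no  INTEGER PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customer(cust_id)
);
INSERT INTO customer VALUES (1, 'ABC');
INSERT INTO account VALUES (101, 1), (102, 1);  -- two accounts, same owner: allowed
""")
n = con.execute("SELECT COUNT(*) FROM account WHERE cust_id = 1").fetchone()[0]
print(n)  # 2
```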
• Easy conversion from E-R to other data models: Conversion from an E-R diagram to a network or hierarchical data model can easily be accomplished.
• Graphical representation for better understanding: An E-R model gives a graphical, diagrammatic representation of the various entities, their attributes and the relationships between entities. This in turn helps in the clear understanding of the data structure and in minimizing redundancy and other problems.
• No industry standard for notation: There is no industry standard notation for developing an E-R diagram.
• Popular for high-level design: The E-R data model is especially popular for high-level database design.
E-R Modelling
1. Entity
It may be an object, person, place or event about which data is stored in a database. In an E-R diagram an entity is represented by a rectangle. For example: students, employees, managers, etc.
Entity set
It is a collection of entities of the same type which share similar properties. For example, the group of students in a college is an entity set.
An entity is characterised into two types as follows:
a. Strong entity set: An entity type that has a key attribute, or enough attributes to form a primary key, is called a strong entity set. It is represented by a single rectangle.
b. Weak entity set: An entity set that does not have a primary key attribute and depends on another strong entity via a foreign key attribute. It is represented by a double rectangle.
UNIT-II:
We live in a world where every entity, living or non-living, has relationships with other entities. For example, you are a single entity, but you share different relations with your family and friends. Even within a family, you are the son of your father and, at the same time, a sibling of your brother. Similar relationships exist in a Database Management System (DBMS). In this article, we are going to learn about the degree of a relationship and the types of relationships.
Degree of Relationship
In DBMS, the degree of a relationship is the number of entity types that participate in the relationship. For example, suppose we have two entities, a student and a bag, connected through a primary key and a foreign key. Here the degree of the relationship is 2, because two entity types are associated in one relationship.
Types of degree
Now, based on the number of linked entity types, we have 4 types of degrees of relationships.
1. Unary
2. Binary
3. Ternary
4. N-ary
Let’s discuss them one by one with the help of examples.
Unary (Degree 1)
In this type of relationship, both associated entity types are the same. A unary relationship exists when only one entity set participates in the relationship; we say that the degree of the relationship is 1.
Example: In a particular class, we have many students, there are monitors too. So, here class
monitors are also students. Thus, we can say that only students are participating here. So the
degree of such type of relationship is 1.
(Figure: unary relationship)
Binary (Degree 2)
In a binary relationship, two entity types are associated, so the degree of the relationship is 2. In other words, when two entity sets participate in a relation, the relationship is known as a binary relationship. This is the most commonly used relationship, and it can easily be converted into a relational table.
Example: We have two entity types, 'Student' and 'ID', where each 'Student' has his own 'ID'. Since two entity types are associated, it is a binary relationship. Similarly, one 'Father' can have many 'Daughters', but each 'Daughter' belongs to only one 'Father'; this is a one-to-many binary relationship.
(Figure: binary relationship)
Ternary (Degree 3)
In a ternary relationship, three entity types are associated; we say that the degree of the relationship is 3. Because the number of participating entities increases, it becomes more complex to turn the E-R model into relational tables. Let's understand this with an example.
Example: We have three entity types, 'Teacher', 'Course', and 'Class'. The relationship between these entities is that a teacher teaches a particular course and also teaches a particular class. Since three entity types are associated, it is a ternary relationship.
(Figure: ternary relationship)
N-ary (n Degree)
In an N-ary relationship, n entity types are associated; an N-ary relationship exists when there are n types of entities. Its limitation is that, with so many participating entities, it is very hard to convert into relational tables, so it is very uncommon, unlike the binary relationship, which is very popular.
Example: We have five entity types: Teacher, Class, Location, Salary, and Course. Since five entity types are associated, this is an n-ary relationship of degree 5.
(Figure: n-ary relationship)
Conclusion
In summary, the complexity of the real-world scenario determines the type of relationship
(unary, binary, ternary, or n-ary) selected in a database architecture; simpler relationships are
favored where feasible. While ternary and n-ary relationships are utilized for more
sophisticated interactions, unary and binary relationships are frequently used. The selection
should fit the particular requirements and data structure of the application.
1. Unary Relationship: A unary relationship exists when both participating entity types are the same. When such a relationship is present, we say that the degree of the relationship is 1.
2. Binary Relationship: A binary relationship exists when exactly two entity types participate. When such a relationship is present, we say that the degree is 2. This is the most common degree of relationship. It is easy to deal with such relationships, as these can be easily converted into relational tables.
For example, we have two entity types, 'Customer' and 'Account', where each 'Customer' has an 'Account' storing the account details of that 'Customer'. Since two entity types participate, we call it a binary relationship. Also, one 'Customer' can have many 'Accounts', but each 'Account' should belong to only one 'Customer'; we can say that it is a one-to-many binary relationship.
3. Ternary Relationship: A ternary relationship exists when exactly three entity types participate. When such a relationship is present, we say that the degree is 3. As the number of entities in the relationship increases, it becomes complex to convert them into relational tables.
For example, we have three entity types, 'Employee', 'Department' and 'Location'. The relationships between these entities are defined as: an employee works in a department, and an employee works at a particular location. Since three entities participate in the relationship, it is a ternary relationship of degree 3.
4. n-ary Relationship: An N-ary relationship exists when ‘n’ number of entities are
participating. So, any number of entities can participate in a relationship. There is no
limitation to the maximum number of entities that can participate.
A given ER model can be converted into Relational model. A Relational model includes
Relations, Tuples, Attributes, Keys, and Foreign keys.
Rules of converting ER model to Relational Model:
Now for the above example we can create three relations:
Employee
Works_On
Projects
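The three relations above can be sketched as DDL. The attribute names here (emp_id, proj_id, hours) are assumed for illustration, since the original diagram is not reproduced; the point is that Works_On, the relationship relation, carries the keys of both entity relations as its combined primary key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Employee (emp_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Projects (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Works_On (
    emp_id  INTEGER REFERENCES Employee(emp_id),
    proj_id INTEGER REFERENCES Projects(proj_id),
    hours   INTEGER,
    PRIMARY KEY (emp_id, proj_id)   -- combined key of both participants
);
""")
tables = sorted(r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)  # ['Employee', 'Projects', 'Works_On']
```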
Aggregation
In aggregation, the relationship between two entities is treated as a single entity; the relationship, together with its corresponding entities, is aggregated into a higher-level entity.
For example, the Centre entity offering the Course entity acts as a single entity in a relationship with another entity, Visitor. In the real world, if a visitor visits a coaching centre, he will never enquire about the course alone or about the centre alone; instead, he will enquire about both.
IsA Relationship and Attribute Inheritance
This relationship is called IsA. Some texts call this an IsA relationship, but it should not be confused with an ordinary relationship between two distinct entities.
The upper entity type (connected to the apex of the IsA triangle) is the more abstract/general
entity type (super type) from which the lower entities inherit its attributes.
Properties of IsA
1. CONSTRAINTS
Constraints are the rules enforced on the data columns of a table. They are used to limit the type of data that can go into a table, which ensures the accuracy and reliability of the data in the database.
Constraints can be at the column level or the table level. Column-level constraints are applied only to one column, whereas table-level constraints are applied to the whole table.
The following constraints are commonly available in SQL. They have already been discussed in the SQL - RDBMS Concepts chapter, but it is worth revising them at this point.
NOT NULL Constraint: Ensures that a column cannot have a NULL value.
DEFAULT Constraint: Provides a default value for a column when none is specified.
UNIQUE Constraint: Ensures that all values in a column are different.
PRIMARY Key: Uniquely identifies each row/record in a database table.
FOREIGN Key: References a row/record in another database table.
CHECK Constraint: The CHECK constraint ensures that all values in a column
satisfy certain conditions.
INDEX: Used to retrieve data from the database very quickly.
Constraints can be specified when a table is created with the CREATE TABLE statement, or you can use the ALTER TABLE statement to add constraints even after the table is created.
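The constraints listed above can all be declared at CREATE TABLE time. A minimal sqlite3 sketch (the table and column names are illustrative), showing a CHECK constraint rejecting bad data and a DEFAULT being filled in:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE employees (
    id     INTEGER PRIMARY KEY,          -- PRIMARY KEY
    name   TEXT NOT NULL,                -- NOT NULL
    email  TEXT UNIQUE,                  -- UNIQUE
    status TEXT DEFAULT 'active',        -- DEFAULT
    salary REAL CHECK (salary >= 0)      -- CHECK
)""")
con.execute("INSERT INTO employees (id, name, salary) VALUES (1, 'Ramesh', 2000)")
# The CHECK constraint rejects a negative salary:
try:
    con.execute("INSERT INTO employees (id, name, salary) VALUES (2, 'Khilan', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
status = con.execute("SELECT status FROM employees WHERE id = 1").fetchone()[0]
print(status)  # 'active'
```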
Dropping Constraints:
Any constraint that you have defined can be dropped using the ALTER TABLE command with the
DROP CONSTRAINT option.
For example, to drop the primary key constraint in the EMPLOYEES table, you can use the following command (the constraint name EMPLOYEES_PK is an assumed example):

    ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;

Some implementations may provide shortcuts for dropping certain constraints. For example, to drop the primary key constraint for a table in Oracle, you can use the following command:

    ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
Integrity Constraints:
Integrity constraints are used to ensure accuracy and consistency of data in a relational database.
Data integrity is handled in a relational database through the concept of referential integrity.
There are many types of integrity constraints that play a role in referential integrity (RI). These
constraints include Primary Key, Foreign Key, Unique Constraints and other constraints
mentioned above.
The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN
is a means for combining fields from two tables by using values common to each.
(a) One table is CUSTOMERS as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
| 1  | Ramesh   | 32  | Ahmedabad | 2000.00  |
| 2  | Khilan   | 25  | Delhi     | 1500.00  |
| 3  | kaushik  | 23  | Kota      | 2000.00  |
| 4  | Chaitali | 25  | Mumbai    | 6500.00  |
| 5  | Hardik   | 27  | Bhopal    | 8500.00  |
| 6  | Komal    | 22  | MP        | 4500.00  |
| 7  | Muffy    | 24  | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
(b) Another table is ORDERS as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3           | 3000   |
| 100 | 2009-10-08 00:00:00 | 3           | 1500   |
| 101 | 2009-11-20 00:00:00 | 2           | 1560   |
| 103 | 2008-05-20 00:00:00 | 4           | 2060   |
+-----+---------------------+-------------+--------+
Now let us join these two tables in our SELECT statement as follows:

    SELECT ID, NAME, AGE, AMOUNT
    FROM CUSTOMERS, ORDERS
    WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce the following result:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
| 3  | kaushik  | 23  | 3000   |
| 3  | kaushik  | 23  | 1500   |
| 2  | Khilan   | 25  | 1560   |
| 4  | Chaitali | 25  | 2060   |
+----+----------+-----+--------+
Here, it is notable that the join is performed in the WHERE clause. Several operators can be used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; however, the most common operator is the equals sign.
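The CUSTOMERS/ORDERS join above can be reproduced end-to-end with Python's sqlite3 module, with the join condition expressed in the WHERE clause exactly as in the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL);
CREATE TABLE ORDERS (OID INTEGER, DATE TEXT, CUSTOMER_ID INTEGER, AMOUNT INTEGER);
INSERT INTO CUSTOMERS VALUES
 (1,'Ramesh',32,'Ahmedabad',2000),(2,'Khilan',25,'Delhi',1500),
 (3,'kaushik',23,'Kota',2000),(4,'Chaitali',25,'Mumbai',6500);
INSERT INTO ORDERS VALUES
 (102,'2009-10-08',3,3000),(100,'2009-10-08',3,1500),
 (101,'2009-11-20',2,1560),(103,'2008-05-20',4,2060);
""")
# Join condition in the WHERE clause, as in the text.
rows = con.execute("""
    SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
    FROM CUSTOMERS c, ORDERS o
    WHERE c.ID = o.CUSTOMER_ID
    ORDER BY o.OID
""").fetchall()
for r in rows:
    print(r)
```

Only customers with matching orders appear in the result; customers 1, 5, 6 and 7 are dropped, which is the behaviour of an inner join.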
DATA-MANIPULATION LANGUAGE
Data manipulation is
The retrieval of information stored in the database
The insertion of new information into the database
The deletion of information from the database
The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as organized by the appropriate data model. There are basically two types:
Procedural DMLs require a user to specify what data are needed and how to get those data. Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since
a user does not have to specify how to get the data, the database system has to figure out an
efficient means of accessing data. The DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data manipulation language
synonymously.
This query in the SQL language finds the name of the customer whose
customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and the customer-name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following
query finds the balance of all accounts owned by the customer with customerid 192-83-
7465.
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
      depositor.account-number = account.account-number
There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating
data. At the physical level, we must define algorithms that allow efficient access to data. At
higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact
efficiently with the system. The query processor component of the database system translates
DML queries into sequences of actions at the physical level of the database system.
UNIT III
Functional Dependencies
Functional dependencies (FDs) are constraints on well-formed relations and represent a formalism for describing the internal structure of a relation.
Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where
X and Y are subsets of attributes of R. An FD is a relationship between an attribute "Y" and a
determinant (1 or more other attributes) "X" such that for a given value of a determinant the
value of the attribute is uniquely defined.
X is a determinant
X determines Y
Y is functionally dependent on X
X→Y
X → Y is trivial if Y ⊆ X
Example:
Let R be
NewStudent(stuId, lastName, major, credits, status,
ZipCode → AddressCity
ArtistName → BirthYear
Author, Title → PublDate
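An FD X → Y holds in a relation instance exactly when no two tuples agree on X but differ on Y. That check can be sketched in a few lines (the zip-code data below is a hypothetical instance of the ZipCode → AddressCity example):

```python
def fd_holds(rows, X, Y):
    """Return True iff the FD X -> Y holds in the given relation instance."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # Two tuples agreeing on X must also agree on Y.
        if x_val in seen and seen[x_val] != y_val:
            return False
        seen[x_val] = y_val
    return True

rows = [
    {"ZipCode": "700001", "AddressCity": "KOL"},
    {"ZipCode": "110001", "AddressCity": "DEL"},
    {"ZipCode": "700001", "AddressCity": "KOL"},   # same zip, same city: ok
]
ok_before = fd_holds(rows, ["ZipCode"], ["AddressCity"])
print(ok_before)  # True
rows.append({"ZipCode": "700001", "AddressCity": "MUM"})  # violates the FD
ok_after = fd_holds(rows, ["ZipCode"], ["AddressCity"])
print(ok_after)   # False
```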
NORMALIZATION
ANOMALIES IN DBMS:
Insertion Anomaly
It is a failure to place information about a new database entry into all the places in the database where information about the new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a new entry may need to be inserted into more than one place, and, human fallibility being what it is, some of the needed additional insertions may be missed.
Deletion anomaly
It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-removed entry needs to be deleted from only one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted from more than one place.
Update Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus "update anomalies" can be either of the kinds discussed above.
All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the database. Properly normalized databases are much less susceptible to corruption than are un-normalized databases.
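An update anomaly is easy to demonstrate. In this small sketch (a hypothetical unnormalized student table), the hostel of Year I is stored redundantly in every Year-I row, so a partial update leaves the database inconsistent:

```python
# Unnormalized table: the year -> hostel fact is repeated per student.
rows = [
    {"roll": 1, "year": "I",  "hostel": "H1"},
    {"roll": 3, "year": "I",  "hostel": "H1"},
    {"roll": 2, "year": "II", "hostel": "H2"},
]
# Update the Year-I hostel, but (by mistake) only in the first matching row:
rows[0]["hostel"] = "H3"
hostels_for_year_I = {r["hostel"] for r in rows if r["year"] == "I"}
print(hostels_for_year_I)  # {'H1', 'H3'} -- two answers: the data is corrupt
```

Splitting the year-to-hostel fact into its own relation (as table T4 did earlier) removes the redundancy, so the fact can only be updated in one place.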
Normalization Avoids
Duplication of Data – The same data is listed in multiple lines of the database
Insert Anomaly – A record about an entity cannot be inserted into the table without
first inserting information about another entity – Cannot enter a customer without a
sales order
Database security
Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. It involves various types or categories of controls,
such as technical, procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk management.
Many layers and types of information security control are appropriate to databases, including:
Access control
Auditing
Authentication
Encryption
Integrity controls
Backups
Application security
Traditionally databases have been largely secured against hackers through network security
measures such as firewalls, and network-based intrusion detection systems. While network
security controls remain valuable in this regard, securing the database systems themselves, and the
programs/functions and data within them, has arguably become more critical as networks are
increasingly opened to wider access, in particular access from the Internet. Furthermore, system,
program, function and data access controls, along with the associated user identification,
authentication and rights management functions, have always been important to limit and in some
cases log the activities of authorized users and administrators. In other words, these are
complementary approaches to database security, working from both the outside-in and the
inside-out as it were.
Many organizations develop their own "baseline" security standards and designs detailing basic
security control measures for their database systems. These may reflect general information
security requirements or obligations imposed by corporate information security policies and
applicable laws and regulations (e.g. concerning privacy, financial management and reporting
systems), along with generally-accepted good database security practices (such as appropriate
hardening of the underlying systems) and perhaps security recommendations from the relevant
database system and software vendors. The security designs for specific database systems
typically specify further security administration and management functions (such as
administration and reporting of user access rights, log management and analysis, database
replication/synchronization and backups) along with various business-driven information security
controls within the database programs and functions (e.g. data entry validation and audit trails).
Furthermore, various security-related activities (manual controls) are normally incorporated into
the procedures, guidelines etc. relating to the design, development, configuration, use,
management and maintenance of databases.
Recovery
In the case of system crashes or security violations, recovery techniques are used to reconstruct correct or valid database states.
Auditing
The requirement to keep records of all security relevant actions issued by a user is called auditing.
Resulting audit records are the basis for further reviews and examinations in order to test the
adequacy of system controls and to recommend any changes in the security policy.
Locking
A lock is a system object associated with a shared resource such as a data item of an elementary
type, a row in a database, or a page of memory. In a database, a lock on a database object (a data-
access lock) may need to be acquired by a transaction before accessing the object. Correct use of
locks prevents undesired, incorrect or inconsistent operations on shared resources by other
concurrent transactions. When a database object with an existing lock acquired by one transaction
needs to be accessed by another transaction, the existing lock for the object and the type of the intended access are checked by the system. If the existing lock type does not allow this specific
attempted concurrent access type, the transaction attempting access is blocked (according to a
predefined agreement/scheme). In practice a lock on an object does not directly block a
transaction's operation upon the object, but rather blocks that transaction from acquiring another
lock on the same object, needed to be held/owned by the transaction before performing this
operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper
lock blocking scheme, which indicates which lock type blocks which lock type.
The common interactions between these lock types are defined by blocking behavior as follows:
An existing write-lock on a database object blocks an intended write upon the same object
(already requested/issued) by another transaction by blocking a respective write-lock from
being acquired by the other transaction. The second write-lock will be acquired and the
requested write of the object will take place (materialize) after the existing write-lock is
released.
A write-lock blocks an intended (already requested/issued) read by another transaction by
blocking the respective read-lock .
A read-lock blocks an intended write by another transaction by blocking the respective
write- lock .
A read-lock does not block an intended read by another transaction. The respective read-
lock for the intended read is acquired (shared with the previous read) immediately after
the intended read is requested, and then the intended read itself takes place.
TRANSCATION
A transaction is a set of changes that must all be made together. It is a program unit whose
execution mayor may not change the contents of a database. Transaction is executed as a single
unit. If the database was in consistent state before a transaction, then after execution of the
transaction also, the database must be in a consistent state. For example, a transfer of money from
one bank account to another requires two changes to the database both must succeed or fail
together.
Example:
You are working on a system for a bank. A customer goes to the ATM and instructs it to transfer
Rs. 1000 from savings to a checking account. This simple transaction requires two steps:
40
• Subtracting the money from the savings account balance.
The code to create this transaction will require two updates to the database. For example, there
will be two SQL statements: one UPDATE command to decrease the balance in savings and a
second UPDATE command to increase the balance in the checking account.
You have to consider what would happen if a machine crashed between these two operations. The
money has already been subtracted from the savings account will not be added to the checking
account. It is lost. You might consider performing the addition to checking first, but then the
customer ends up with extra money, and the bank loses. The point is that both changes must be
made successfully. Thus, a transaction is defined as a set of changes that must be made together
States of Transaction
Active: the initial state, the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
Committed: after successful completion.
We say that a transaction has committed only if it has entered the committed state. Similarly, we
say that a transaction has aborted only if it has entered the aborted state. A transaction is said to
have terminated if has either committed or aborted.
41
A transaction starts in the active state. When it finishes its final statement, it enters the partially
committed state. At this point, the transaction has completed its execution, but it is still possible
that
42
it may have to be aborted, since the actual output may still be temporarily hiding in main
memory and thus a hardware failure may preclude its successful completion
The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.
Most of what we’re calling transactional locking relates to the ability of a database management
system (DBMS) to ensure reliable transactions that adhere to these ACID properties. ACID is an
acronym that stands for Atomicity, Consistency, Isolation, and Durability. Each of these
properties is described in more detail below. However, all of these properties are related and must
be considered together. They are more like different views of the same object than independent
things.
Atomicity
Atomicity means all or nothing. Transactions often contain multiple separate actions. For
example, a transaction may insert data into one table, delete from another table, and update a third
table. Atomicity ensures that either all of these actions occur or none at all.
Consistency
Consistency means that transactions always take the database from one consistent state to
another. So, if a transaction violates the databases consistency rules, then the entire transaction
will be rolled back.
Isolation
Isolation means that concurrent transactions, and the changes made within them, are not visible to
each other until they complete. This avoids many problems, including those that could lead to
violation of other properties. The implementation of isolation is quite different in different
DBMS’. This is also the property most often related to locking problems.
Durability
Durability means that committed transactions will not be lost, even in the event of abnormal
termination. That is, once a user or program has been notified that a transaction was committed,
they can be certain that the data will not be lost.
Delete Anomaly – A record cannot be deleted without also deleting a record about a
related entity. For example, a sales order cannot be deleted without deleting all of the
customer's information.
Update Anomaly – Information cannot be updated without changing it in many places.
To update customer information, it must be updated for each sales order the customer
has placed.
Process of normalization:
Before getting to know the normalization techniques in detail, let us define a few building
blocks which are used to define normal forms.
1. Determinant: An attribute that decides the value of another attribute.
Example: In the RESULT relation, the Marks attribute may decide the Grade attribute. This is
represented as Marks -> Grade and read as "Marks decides Grade".
Marks -> Grade
In the RESULT relation, Marks is not a key attribute. Hence it can be concluded that key
attributes are determinants, but not all determinants are key attributes.
2. Functional Dependency: Functional dependency has a formal definition, but let us first
understand the concept through an example. Consider the following relation:
REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade)
Where:
Student# - Student number
Course# - Course number
CourseName - Name of the course
IName - Name of the instructor who delivered the course
Room# - Room number which is assigned to the respective instructor
Marks - Marks scored in course Course# by student Student#
Grade - Grade obtained by student Student# in course Course#
Student# and Course# together (called a composite attribute) define EXACTLY ONE value
of Marks. This can be symbolically represented as
Student# Course# -> Marks
This type of dependency is called functional dependency. In the above example, Marks is
functionally dependent on Student# Course#.
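The idea can be checked mechanically on sample data. The helper below is a small illustrative sketch (not from the text): it tests whether a set of attributes X functionally determines a set of attributes Y in a list of rows, using invented REPORT rows.

```python
# Illustrative helper: does X functionally determine Y in the given rows?
def holds(rows, X, Y):
    """Return True if every X-value maps to exactly one Y-value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if seen.setdefault(x, y) != y:
            return False  # same X-value, two different Y-values
    return True

# Hypothetical REPORT rows for the demonstration:
report = [
    {"Student#": 1, "Course#": "C1", "Marks": 78},
    {"Student#": 1, "Course#": "C2", "Marks": 64},
    {"Student#": 2, "Course#": "C1", "Marks": 85},
]
print(holds(report, ["Student#", "Course#"], ["Marks"]))  # True
print(holds(report, ["Student#"], ["Marks"]))             # False
```

The second call fails because student 1 has two different Marks values, so Student# alone does not determine Marks.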
Other functional dependencies in the above example include Course# -> CourseName and
IName -> Room#.
Formally, we can define functional dependency as: In a given relation R, X and Y are attributes.
Attribute Y is functionally dependent on attribute X if each value of X determines exactly one
value of Y. This is represented as:
X -> Y
However, X may be composite in nature.
CourseName is not fully functionally dependent on Student# Course# because one of the
subsets, Course#, alone determines CourseName, and Student# has no role in deciding it.
Hence CourseName is not fully functionally dependent on Student# Course#.
Student# Course# -> Marks (Marks is fully functionally dependent on the composite attribute Student# Course#)
Formal definition of full functional dependency: In a given relation R, X and Y are attributes.
Y is fully functionally dependent on attribute X only if it is not functionally dependent on any
subset of X. However, X may be composite in nature.
Course# -> CourseName (CourseName depends on the subset Course# of the composite attribute Student# Course#)
Formal definition of partial dependency: In a given relation R, X and Y are attributes. Attribute
Y is partially dependent on attribute X only if it is dependent on a subset of X. However,
X may be composite in nature.
Course# -> IName -> Room# (Room# depends on IName, and IName in turn depends on Course#; hence Room# transitively depends on Course#)
Similarly, Grade depends on Marks, and in turn Marks depends on Student# Course#; hence
Grade fully transitively depends on Student# Course#.
6. Key attributes: In a given relation R, if an attribute X uniquely defines all other attributes,
then X is a key attribute, which is nothing but a candidate key.
Ex: Student# Course# together is a composite key attribute which determines all attributes in
the relation REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade)
uniquely. Hence Student# and Course# are key attributes.
If a table contains non-atomic values in its rows, it is said to be in UNF (unnormalized form).
An atomic value is something that cannot be further decomposed. A non-atomic value, as the
name suggests, can be further decomposed and simplified. Consider the following table:

Emp-Id   Emp-Name   Month                Sales                     Bank-Id   Bank-Name
E01      AA         Jan, Feb, Mar        1000, 1200, 850           B01       SBI
E02      BB         Jan, Feb             2200, 2500                B02       UTI
E03      CC         Jan, Feb, Mar, Apr   1700, 1800, 1850, 1725    B01       SBI

In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us unique identification
for any single row. Further, each primary key points to a variable-length record (3 rows for
E01, 2 for E02 and 4 for E03).
A relation is said to be in 1NF if it contains no non-atomic values and each row provides a
unique combination of values. The above table in UNF can be processed to create the following
table in 1NF.
Emp-Id   Emp-Name   Month   Sales   Bank-Id   Bank-Name
E01      AA         Jan     1000    B01       SBI
E01      AA         Feb     1200    B01       SBI
E01      AA         Mar     850     B01       SBI
E02      BB         Jan     2200    B02       UTI
E02      BB         Feb     2500    B02       UTI
E03      CC         Jan     1700    B01       SBI
E03      CC         Feb     1800    B01       SBI
E03      CC         Mar     1850    B01       SBI
E03      CC         Apr     1725    B01       SBI
As you can see now, each row contains a unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the rows cannot be further decomposed, so the relation
is now in 1NF.
A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends
on the primary key of the relation. Speaking inversely, if a table has some attributes which are
not dependent on the primary key of that table, then it is not in 2NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which
is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be
moved into another related relation, the design would come to 2NF.
Bank-Id   Bank-Name
B01       SBI
B02       UTI

After removing this portion into another relation, we store a smaller amount of data in two
relations, without any loss of information. There is also a significant reduction in redundancy.
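The split can be tried out concretely. The sketch below uses Python's built-in `sqlite3` module with illustrative table and column names (underscores replace the hyphens, which are not legal in SQL identifiers), and shows that the original rows are recoverable with a join, so no information is lost.

```python
# Sketch of the 2NF split above: Bank-Name moves to its own table keyed
# by Bank-Id, removing the repeated bank names.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE emp_sales (emp_id TEXT, emp_name TEXT, month TEXT,
                        sales INTEGER, bank_id TEXT);
CREATE TABLE bank (bank_id TEXT PRIMARY KEY, bank_name TEXT);
INSERT INTO bank VALUES ('B01', 'SBI'), ('B02', 'UTI');
INSERT INTO emp_sales VALUES ('E01', 'AA', 'Jan', 1000, 'B01'),
                             ('E02', 'BB', 'Jan', 2200, 'B02');
""")

# The original rows are recovered with a join, so no information is lost.
rows = db.execute("""SELECT e.emp_id, e.month, e.sales, b.bank_name
                     FROM emp_sales e JOIN bank b ON e.bank_id = b.bank_id
                     ORDER BY e.emp_id""").fetchall()
print(rows)  # [('E01', 'Jan', 1000, 'SBI'), ('E02', 'Jan', 2200, 'UTI')]
```

Each bank name is now stored once, however many sales rows reference it.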
Third Normal Form (3NF)
A relation is said to be in 3NF if it is already in 2NF and there exists no transitive dependency
in that relation. Speaking inversely, if a table contains a transitive dependency, then it is not in
3NF, and the table must be split to bring it into 3NF.
Such derived dependencies hold well in most situations. For example, if we have
Roll → Marks
and
Marks → Grade
then we may safely derive
Roll → Grade.
This third dependency was not originally specified, but we have derived it.
The derived dependency is called a transitive dependency when such a dependency becomes
improbable. For example, we have been given
Roll → City
and
City → STDCode
If we try to derive Roll → STDCode, it becomes a transitive dependency, because obviously the
STDCode of a city cannot depend on the roll number issued by a school or college. In such a
case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
and
City → STDCode
Boyce-Codd Normal Form (BCNF)
A relation is said to be in BCNF if it is already in 3NF and the left-hand side of every
dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. There
could, however, be situations when a 3NF relation is not in BCNF, if certain conditions are
found true.
Professor Code   Department    Head of Dept.   Percent Time
P1               Physics       Ghosh           50
P1               Mathematics   Krishnan        50
P2               Chemistry     Rao             25
P2               Physics       Ghosh           75
P3               Mathematics   Krishnan        100
The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the names of Department and Head of
Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the
information that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Department and Head of
Dept., and deleting Head of Dept. from the given relation. The normalized relations are shown
in the following.
Professor Code   Department    Percent Time
P1               Physics       50
P1               Mathematics   50
P2               Chemistry     25
P2               Physics       75
P3               Mathematics   100
Department    Head of Dept.
Physics       Ghosh
Mathematics   Krishnan
Chemistry     Rao
When attributes in a relation have multi-valued dependency, further normalization to 4NF and
5NF is required. Let us first find out what multi-valued dependency is.
A multi-valued dependency is a typical kind of dependency in which each and every attribute
within a relation depends upon the other, yet none of them is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The assumptions are that a vendor is capable of supplying many
items, that a vendor may be assigned to many projects, and that the items a vendor supplies
are independent of the projects the vendor is assigned to.
A multi-valued dependency exists here because all the attributes depend upon the other, and
yet none of them is a primary key having a unique value.
Vendor Code   Item Code   Project No.
V1            I1          P1
V1            I2          P1
V1            I1          P3
V1            I2          P3
V2            I2          P1
V2            I3          P1
V3            I1          P2
V3            I1          P3
The relation presents the following problems:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row
with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problems mentioned
above. The problem is reduced by expressing this relation as two relations in the Fourth Normal
Form (4NF). A relation is in 4NF if it has no more than one independent multi-valued
dependency, or one independent multi-valued dependency with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that vendors are
capable of supplying certain items, and that they are assigned to supply for some projects, is
independently specified in the 4NF relations.
Vendor-Supply
Vendor Code   Item Code
V1            I1
V1            I2
V2            I2
V2            I3
V3            I1

Vendor-Project
Vendor Code   Project No.
V1            P1
V1            P3
V2            P1
V3            P2
Fifth Normal Form (5NF)
These relations still have a problem. While defining the 4NF we mentioned that all the attributes
depend upon each other. While creating the two tables in the 4NF, although we have preserved
the dependencies between Vendor Code and Item Code in the first table and between Vendor
Code and Project No. in the second table, we have lost the relationship between Item Code and
Project No. If there were a primary key then this loss of dependency would not have occurred.
In order to revive this relationship, we must add a new table like the following. Please note that
during the entire process of normalization, this is the only step where a new table is created by
joining two attributes, rather than splitting them into separate tables.
Project No.   Item Code
P1            I1
P1            I2
P2            I1
P3            I1
P3            I3
Database security
Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. It involves various types or categories of controls,
such as technical, procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk management.
Many layers and types of information security control are appropriate to databases, including:
Access control
Auditing
Authentication
Encryption
Integrity controls
Backups
Application security
Traditionally databases have been largely secured against hackers through network security
measures such as firewalls, and network-based intrusion detection systems. While network
security controls remain valuable in this regard, securing the database systems themselves, and the
programs/functions and data within them, has arguably become more critical as networks are
increasingly opened to wider access, in particular access from the Internet. Furthermore, system,
program, function and data access controls, along with the associated user identification,
authentication and rights management functions, have always been important to limit and in some
cases log the activities of authorized users and administrators. In other words, these are
complementary approaches to database security, working from both the outside-in and the
inside-out as it were.
Many organizations develop their own "baseline" security standards and designs detailing basic
security control measures for their database systems. These may reflect general information
security requirements or obligations imposed by corporate information security policies and
applicable laws and regulations (e.g. concerning privacy, financial management and reporting
systems), along with generally-accepted good database security practices (such as appropriate
hardening of the underlying systems) and perhaps security recommendations from the relevant
database system and software vendors. The security designs for specific database systems
typically specify further security administration and management functions (such as
administration and reporting of user access rights, log management and analysis, database
replication/synchronization and backups) along with various business-driven information security
controls within the database programs and functions (e.g. data entry validation and audit trails).
Furthermore, various security-related activities (manual controls) are normally incorporated into
the procedures, guidelines etc. relating to the design, development, configuration, use,
management and maintenance of databases.
In the case of system crashes or security violations, recovery techniques are used to reconstruct
correct or valid database states.
Auditing
The requirement to keep records of all security relevant actions issued by a user is called auditing.
Resulting audit records are the basis for further reviews and examinations in order to test the
adequacy of system controls and to recommend any changes in the security policy.
Locking
A lock is a system object associated with a shared resource such as a data item of an elementary
type, a row in a database, or a page of memory. In a database, a lock on a database object (a data-
access lock) may need to be acquired by a transaction before accessing the object. Correct use of
locks prevents undesired, incorrect or inconsistent operations on shared resources by other
concurrent transactions. When a database object with an existing lock acquired by one transaction
needs to be accessed by another transaction, the existing lock for the object and the type of the
intended access are checked by the system. If the existing lock type does not allow this specific
attempted concurrent access type, the transaction attempting access is blocked (according to a
predefined agreement/scheme). In practice a lock on an object does not directly block a
transaction's operation upon the object, but rather blocks that transaction from acquiring another
lock on the same object, needed to be held/owned by the transaction before performing this
operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper
lock blocking scheme, which indicates which lock type blocks which lock type.
The common interactions between the two lock types (read-locks and write-locks) are defined by blocking behavior as follows:
An existing write-lock on a database object blocks an intended write upon the same object
(already requested/issued) by another transaction by blocking a respective write-lock from
being acquired by the other transaction. The second write-lock will be acquired and the
requested write of the object will take place (materialize) after the existing write-lock is
released.
A write-lock blocks an intended (already requested/issued) read by another transaction by
blocking the respective read-lock.
A read-lock blocks an intended write by another transaction by blocking the respective
write-lock.
A read-lock does not block an intended read by another transaction. The respective read-lock
for the intended read is acquired (shared with the previous read) immediately after the
intended read is requested, and then the intended read itself takes place.
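The four interactions above boil down to a small compatibility check. The function below is an illustrative sketch (not a real DBMS lock manager): only two read-locks may coexist, and every combination involving a write-lock blocks the requesting transaction.

```python
# Minimal sketch of the read/write lock compatibility rules listed above.
def compatible(existing, requested):
    """True if the requested lock can be granted alongside the existing lock."""
    # Only two read-locks may coexist; any combination involving a
    # write-lock blocks the requesting transaction.
    return existing == "read" and requested == "read"

print(compatible("write", "write"))  # False: write-lock blocks an intended write
print(compatible("write", "read"))   # False: write-lock blocks an intended read
print(compatible("read", "write"))   # False: read-lock blocks an intended write
print(compatible("read", "read"))    # True: read-locks are shared
```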
TRANSACTION
A transaction is a set of changes that must all be made together. It is a program unit whose
execution may or may not change the contents of a database. A transaction is executed as a
single unit. If the database was in a consistent state before a transaction, then after execution of
the transaction the database must also be in a consistent state. For example, a transfer of money
from one bank account to another requires two changes to the database; both must succeed or
fail together.
Example:
You are working on a system for a bank. A customer goes to the ATM and instructs it to transfer
Rs. 1000 from savings to a checking account. This simple transaction requires two steps:
• Subtracting the money from the savings account balance.
• Adding the money to the checking account balance.
The code to create this transaction will require two updates to the database. For example, there
will be two SQL statements: one UPDATE command to decrease the balance in savings and a
second UPDATE command to increase the balance in the checking account.
You have to consider what would happen if the machine crashed between these two operations.
The money that has already been subtracted from the savings account will not be added to the
checking account; it is lost. You might consider performing the addition to checking first, but
then the customer ends up with extra money and the bank loses. The point is that both changes
must be made successfully. Thus, a transaction is defined as a set of changes that must be made
together.
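The ATM transfer can be sketched as one atomic transaction. The example below uses Python's built-in `sqlite3` module; the `accounts` table and its starting balances are invented for the demonstration. If either UPDATE fails, the rollback restores the state before the transaction.

```python
# Minimal sketch of the ATM transfer as one atomic transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('savings', 5000), ('checking', 1000)")
conn.commit()

try:
    # Both UPDATEs succeed or fail together.
    conn.execute("UPDATE accounts SET balance = balance - 1000 "
                 "WHERE name = 'savings'")
    conn.execute("UPDATE accounts SET balance = balance + 1000 "
                 "WHERE name = 'checking'")
    conn.commit()        # transaction enters the committed state
except sqlite3.Error:
    conn.rollback()      # aborted state: database restored to its prior state

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'savings': 4000.0, 'checking': 2000.0}
```

Had the process died between the two UPDATEs, the uncommitted changes would never become visible, so no money is lost.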
States of Transaction
Active: the initial state, the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
Committed: after successful completion.
We say that a transaction has committed only if it has entered the committed state. Similarly, we
say that a transaction has aborted only if it has entered the aborted state. A transaction is said to
have terminated if it has either committed or aborted.
A transaction starts in the active state. When it finishes its final statement, it enters the partially
committed state. At this point, the transaction has completed its execution, but it is still possible
that it may have to be aborted, since the actual output may still be temporarily residing in main
memory, and thus a hardware failure may preclude its successful completion.
The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.
Most of what we’re calling transactional locking relates to the ability of a database management
system (DBMS) to ensure reliable transactions that adhere to these ACID properties. ACID is an
acronym that stands for Atomicity, Consistency, Isolation, and Durability. Each of these
properties is described in more detail below. However, all of these properties are related and must
be considered together. They are more like different views of the same object than independent
things.
Atomicity
Atomicity means all or nothing. Transactions often contain multiple separate actions. For
example, a transaction may insert data into one table, delete from another table, and update a third
table. Atomicity ensures that either all of these actions occur or none at all.
Consistency
Consistency means that transactions always take the database from one consistent state to
another. So, if a transaction violates the database's consistency rules, then the entire transaction
will be rolled back.
Isolation
Isolation means that concurrent transactions, and the changes made within them, are not visible
to each other until they complete. This avoids many problems, including those that could lead to
violation of other properties. The implementation of isolation differs considerably between
DBMSs. This is also the property most often related to locking problems.
Durability
Durability means that committed transactions will not be lost, even in the event of abnormal
termination. That is, once a user or program has been notified that a transaction was committed,
they can be certain that the data will not be lost.
A timestamp is a tag that can be attached to any transaction or any data item, which denotes a
specific time on which the transaction or data item had been activated in any way. We, who use
computers, must all be familiar with the concepts of “Date Created” or “Last Modified” properties
of files and folders. Well, timestamps are things like that.
A timestamp can be implemented in two ways. The simplest one is to directly assign the
current value of the clock to the transaction or the data item. The other policy is to attach the
value of a logical counter that keeps incrementing as new timestamps are required.
The timestamp of a transaction denotes the time when it was first activated. The timestamp of a
data item can be of the following two types:
W-timestamp (Q): the latest time at which the data item Q has been written.
R-timestamp (Q): the latest time at which the data item Q has been read.
These two timestamps are updated each time a successful read/write operation is performed on
the data item Q.
The timestamp ordering protocol ensures that any pair of conflicting read/write operations will
be executed in their respective timestamp order. This is an alternative solution to using locks.
When a transaction T issues a read (Q):
1. If TS (T) < W-timestamp (Q), then transaction T is trying to read a value of data item
Q which has already been overwritten by some other transaction. Hence the value which T
wanted to read from Q no longer exists there, and T is rolled back.
2. If TS (T) >= W-timestamp (Q), then transaction T is trying to read a value of data item
Q which has been written and committed by some other transaction earlier. Hence T is
allowed to read the value of Q, and the R-timestamp of Q is updated to TS (T).
When a transaction T issues a write (Q):
1. If TS (T) < R-timestamp (Q), then the system has waited too long for transaction T to
write its value, and the delay has become so great that it has allowed another transaction
to read the old value of data item Q. In such a case T has lost its relevance and is rolled back.
2. Else if TS (T) < W-timestamp (Q), then transaction T has been delayed so much that the
system has allowed another transaction to write into data item Q. In such a case too, T has
lost its relevance and is rolled back.
3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS (T).
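The checks above can be sketched in a few lines. The data item Q is modeled here as a dict holding its R- and W-timestamps; the function and key names are illustrative only, and the R-timestamp is kept as the latest read time, matching its definition above.

```python
# Sketch of the timestamp-ordering read/write checks described above.
def read_item(ts, q):
    """Apply the read rules: roll back, or read and record the latest read time."""
    if ts < q["W"]:               # rule 1: the value T wanted was overwritten
        return "rollback"
    q["R"] = max(q["R"], ts)      # rule 2: keep the latest read time
    return "read"

def write_item(ts, q):
    """Apply the write rules: roll back on a late write, otherwise write."""
    if ts < q["R"]:               # rule 1: a younger transaction read old Q
        return "rollback"
    if ts < q["W"]:               # rule 2: a younger transaction wrote Q
        return "rollback"
    q["W"] = ts                   # rule 3: perform the write
    return "write"

q = {"R": 5, "W": 3}
print(read_item(2, q))   # rollback (TS 2 < W-timestamp 3)
print(read_item(7, q))   # read; R-timestamp becomes 7
print(write_item(6, q))  # rollback (TS 6 < R-timestamp 7)
print(write_item(8, q))  # write; W-timestamp becomes 8
```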
UNIT IV
DATA-MANIPULATION LANGUAGE
Data manipulation is
The retrieval of information stored in the database
The insertion of new information into the database
The deletion of information from the database
The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as organized by the appropriate data model. There are basically two types:
Procedural DMLs require a user to specify what data are needed and how to get those data.
Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify
what data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than procedural DMLs. However, since
a user does not have to specify how to get the data, the database system has to figure out an
efficient means of accessing the data. The DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data manipulation language
synonymously.
This query in the SQL language finds the name of the customer whose
customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = 192-83-7465
The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and the customer-name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following
query finds the balance of all accounts owned by the customer with customerid 192-83-
7465.
select account.balance
from depositor, account
where depositor.customer-id = 192-83-7465 and
depositor.account-number = account.account-number
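Both queries can be run end to end with Python's built-in `sqlite3` module. The schema and sample rows below are invented for the demonstration, and `customer_id` stands in for `customer-id` since hyphens are not legal in SQL identifiers.

```python
# The two queries above, run against a tiny illustrative schema.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
CREATE TABLE account   (account_number TEXT, balance REAL);
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
INSERT INTO depositor VALUES ('192-83-7465', 'A-101');
INSERT INTO account   VALUES ('A-101', 500);
""")

# Single-table query: the customer's name.
name = db.execute(
    "SELECT customer_name FROM customer WHERE customer_id = '192-83-7465'"
).fetchone()[0]

# Two-table query: balances of all accounts owned by that customer.
balances = [b for (b,) in db.execute("""
    SELECT account.balance
    FROM depositor, account
    WHERE depositor.customer_id = '192-83-7465'
      AND depositor.account_number = account.account_number""")]

print(name, balances)  # Johnson [500.0]
```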
There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating
data. At the physical level, we must define algorithms that allow efficient access to data. At
higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact
efficiently with the system. The query processor component of the database system translates
DML queries into sequences of actions at the physical level of the database system.
UNION
RESULT ← A ∪ B
SQL Statement:
SELECT * FROM A
UNION
SELECT * FROM B
INTERSECTION
RESULT ← A ∩ B
SQL Statement:
SELECT * FROM A
INTERSECT
SELECT * FROM B
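Both set operations can be checked quickly with `sqlite3`, which supports UNION and INTERSECT directly; the single-column tables A and B below are invented for the demonstration.

```python
# UNION and INTERSECT on two single-column tables.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE A (x INTEGER);
CREATE TABLE B (x INTEGER);
INSERT INTO A VALUES (1), (2), (3);
INSERT INTO B VALUES (2), (3), (4);
""")

union = [x for (x,) in db.execute(
    "SELECT * FROM A UNION SELECT * FROM B ORDER BY x")]
inter = [x for (x,) in db.execute(
    "SELECT * FROM A INTERSECT SELECT * FROM B ORDER BY x")]

print(union)  # [1, 2, 3, 4]  (UNION removes duplicates)
print(inter)  # [2, 3]
```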
COMMANDS IN SQL:
1. CREATE DATABASE
The SQL CREATE DATABASE statement is used to create a new SQL database.
Syntax:
CREATE DATABASE DatabaseName;
Example:
If you want to create a new database <testDB>, then the CREATE DATABASE statement
would be as follows:
CREATE DATABASE testDB;
2. DROP DATABASE
The SQL DROP DATABASE statement is used to drop any existing database in the SQL schema.
Syntax:
DROP DATABASE DatabaseName;
Example:
If you want to delete an existing database <testDB>, then the DROP DATABASE statement
would be as follows:
DROP DATABASE testDB;
3. USE
The SQL USE statement is used to select any existing database in SQL schema.
Syntax:
USE DatabaseName;
4. CREATE TABLE
Syntax:
CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   .....
   columnN datatype,
   PRIMARY KEY( one or more columns )
);
CREATE TABLE is the keyword telling the database system what you want to do. In this case,
you want to create a new table. The unique name or identifier for the table follows the
CREATE TABLE statement.
Then in brackets comes the list defining each column in the table and what sort of data type it is.
The syntax becomes clearer with an example below.
A copy of an existing table can be created using a combination of the CREATE TABLE
statement and the SELECT statement.
5. DROP TABLE
The SQL DROP TABLE statement is used to remove a table definition and all data, indexes,
triggers, constraints, and permission specifications for that table.
Syntax:
DROP TABLE table_name;
6. INSERT INTO
The SQL INSERT INTO statement is used to add new rows of data to a table in the database.
Syntax:
INSERT INTO TABLE_NAME (column1, column2, ... columnN)
VALUES (value1, value2, ... valueN);
Here column1, column2, ... columnN are the names of the columns in the table into which you want to insert data.
You may not need to specify the column name(s) in the SQL query if you are adding values for all the columns of the table. But make sure the order of the values is the same as the order of the columns in the table. In that case the SQL INSERT INTO syntax would be as follows:
INSERT INTO TABLE_NAME VALUES (value1, value2, ... valueN);
You can create a record in the CUSTOMERS table using the second syntax as follows:
INSERT INTO CUSTOMERS VALUES (7, 'Muffy', 24, 'Indore', 10000.00);
All the above statements would produce the following records in the CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
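Both INSERT forms can be sketched in Python against SQLite; the row values for IDs 8 and 9 below are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")

# First form: column names listed explicitly.
con.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) "
            "VALUES (8, 'Asha', 29, 'Pune', 3000.00)")
# Second form: values for every column, in table-column order.
con.execute("INSERT INTO CUSTOMERS VALUES (9, 'Ravi', 31, 'Goa', 3500.00)")

count = con.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
print(count)  # 2
```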
7. SELECT
The SQL SELECT statement is used to fetch data from a database table, returning the data in the form of a result table. These result tables are called result-sets.
Syntax:
SELECT column1, column2, ... columnN FROM table_name;
Here column1, column2, ... are the fields of a table whose values you want to fetch. If you want to fetch all the fields available in the table, then you can use the following syntax:
SELECT * FROM table_name;
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch the ID, Name and Salary fields of the customers available in the CUSTOMERS table:
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;
+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  1 | Ramesh   |  2000.00 |
|  2 | Khilan   |  1500.00 |
|  3 | kaushik  |  2000.00 |
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
8. WHERE CLAUSE
The SQL WHERE clause is used to specify a condition while fetching data from a single table or joining multiple tables.
If the given condition is satisfied, only then does the query return specific values from the table. You would use the WHERE clause to filter the records and fetch only the necessary records.
The WHERE clause is not only used in the SELECT statement; it is also used in the UPDATE and DELETE statements, etc., which we will examine in subsequent chapters.
Syntax:
SELECT column1, column2, ... columnN
FROM table_name
WHERE [condition];
You can specify a condition using comparison or logical operators like >, <, =, LIKE, NOT, etc. The examples below will make this concept clear.
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch the ID, Name and Salary fields from the CUSTOMERS table where the salary is greater than 2000:
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000;
+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
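The WHERE filter above can be reproduced as a sketch in Python using SQLite and the CUSTOMERS data from the text:

```python
import sqlite3

# Load the CUSTOMERS table used throughout this chapter.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# Only rows satisfying the WHERE condition are returned.
rows = con.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000").fetchall()
print(sorted(rows))
```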
9. AND & OR OPERATORS
The SQL AND and OR operators are used to combine multiple conditions to narrow data in an SQL statement. These two operators are called conjunctive operators.
These operators provide a means to make multiple comparisons with different operators in the same SQL statement.
The AND operator allows the existence of multiple conditions in an SQL statement's WHERE clause.
Syntax:
SELECT column1, column2, ... columnN
FROM table_name
WHERE [condition1] AND [condition2] ... AND [conditionN];
You can combine N number of conditions using the AND operator. For an action to be taken by the SQL statement, whether it be a transaction or a query, all conditions separated by AND must be TRUE.
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch the ID, Name and Salary fields from the CUSTOMERS table where the salary is greater than 2000 AND the age is less than 25 years:
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000 AND AGE < 25;
+----+-------+----------+
| ID | NAME  | SALARY   |
+----+-------+----------+
|  6 | Komal |  4500.00 |
|  7 | Muffy | 10000.00 |
+----+-------+----------+
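The conjunctive AND condition can be sketched the same way in Python with SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# Both conditions joined by AND must be TRUE for a row to be returned.
rows = con.execute("SELECT ID, NAME, SALARY FROM CUSTOMERS "
                   "WHERE SALARY > 2000 AND AGE < 25").fetchall()
print(sorted(rows))
```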
10. UPDATE
The SQL UPDATE query is used to modify the existing records in a table.
You can use the WHERE clause with an UPDATE query to update selected rows; otherwise all the rows would be affected.
Syntax:
The basic syntax of the UPDATE query with a WHERE clause is as follows:
UPDATE table_name
SET column1 = value1, column2 = value2, ..., columnN = valueN
WHERE [condition];
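A minimal sketch of UPDATE in Python with SQLite; the new address value 'Pune' is made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (6, "Komal", 22, "MP", 4500)])

# The WHERE clause limits the update to one row; without it,
# every row's ADDRESS would change.
con.execute("UPDATE CUSTOMERS SET ADDRESS = 'Pune' WHERE ID = 6")
addr = con.execute("SELECT ADDRESS FROM CUSTOMERS WHERE ID = 6").fetchone()[0]
print(addr)  # Pune
```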
11. DELETE
The SQL DELETE query is used to delete existing records from a table.
You can use the WHERE clause with a DELETE query to delete selected rows; otherwise all the records would be deleted.
Syntax:
DELETE FROM table_name
WHERE [condition];
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would delete the customer whose ID is 6:
SQL> DELETE FROM CUSTOMERS WHERE ID = 6;
Now the CUSTOMERS table would have the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
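The same DELETE can be sketched in Python with SQLite, confirming that only the matching row disappears:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# Delete only the row with ID 6; all other rows remain.
con.execute("DELETE FROM CUSTOMERS WHERE ID = 6")
ids = [r[0] for r in con.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]
print(ids)  # [1, 2, 3, 4, 5, 7]
```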
12. LIKE
The SQL LIKE clause is used to compare a value to similar values using wildcard operators. There are two wildcards used in conjunction with the LIKE operator:
The percent sign (%) represents zero, one, or multiple characters. The underscore (_) represents a single number or character. The symbols can be used in combinations.
Syntax:
SELECT * FROM table_name
WHERE column LIKE 'XXXX%'
or
SELECT * FROM table_name
WHERE column LIKE '%XXXX%'
or
SELECT * FROM table_name
WHERE column LIKE 'XXXX_'
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would display all the records from the CUSTOMERS table where SALARY starts with 200:
SQL> SELECT * FROM CUSTOMERS WHERE SALARY LIKE '200%';
+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
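A sketch of the LIKE example in Python with SQLite (SQLite coerces the numeric SALARY to text before applying LIKE, so '200%' matches the salaries that start with 200):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# '%' matches zero or more trailing characters.
rows = con.execute(
    "SELECT ID, NAME FROM CUSTOMERS WHERE SALARY LIKE '200%'").fetchall()
print(sorted(rows))
```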
13. TOP
The SQL TOP clause is used to fetch the top N number or X percent of records from a table.
Syntax:
The basic syntax of the TOP clause with a SELECT statement would be as follows:
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE [condition]
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example on SQL Server which would fetch the top 3 records from the CUSTOMERS table:
SQL> SELECT TOP 3 * FROM CUSTOMERS;
+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan  |  25 | Delhi     | 1500.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
14. ORDER BY
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on one or more columns. Some databases sort query results in ascending order by default.
Syntax:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
You can use more than one column in the ORDER BY clause. Make sure that whatever column you are using to sort, that column is in the column-list.
Example:
Consider CUSTOMERS table is having following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would sort the result in ascending order by NAME and SALARY:
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME, SALARY;
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+
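A sketch of the same sort in Python with SQLite. Note one assumption: the case-insensitive ordering shown above (kaushik between Khilan and Hardik) is the default in some servers, but SQLite needs an explicit NOCASE collation to reproduce it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# Sort by NAME (case-insensitively), then SALARY as a tie-breaker.
names = [r[0] for r in con.execute(
    "SELECT NAME FROM CUSTOMERS ORDER BY NAME COLLATE NOCASE, SALARY")]
print(names)
```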
15. GROUP BY
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups.
The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.
Syntax:
The basic syntax of the GROUP BY clause is given below. The GROUP BY clause must follow the conditions in the WHERE clause and must precede the ORDER BY clause if one is used.
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to know the total salary of each customer, then the GROUP BY query would be as follows:
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME;
+----------+-------------+
| NAME     | SUM(SALARY) |
+----------+-------------+
| Chaitali |     6500.00 |
| Hardik   |     8500.00 |
| kaushik  |     2000.00 |
| Khilan   |     1500.00 |
| Komal    |     4500.00 |
| Muffy    |    10000.00 |
| Ramesh   |     2000.00 |
+----------+-------------+
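The GROUP BY aggregation can be sketched in Python with SQLite; each group here is a single customer name, so the sum equals that customer's salary:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# SUM is computed once per NAME group.
totals = dict(con.execute("SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY NAME"))
print(totals["Muffy"])  # 10000.0
```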
16. DISTINCT
The SQL DISTINCT keyword is used in conjunction with the SELECT statement to eliminate all duplicate records and fetch only unique records.
There may be a situation when you have multiple duplicate records in a table. While fetching such records, it makes more sense to fetch only the unique records instead of fetching duplicate records.
Syntax:
SELECT DISTINCT column1, column2, ... columnN
FROM table_name
WHERE [condition];
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
First let us see how the following SELECT query returns duplicate salary records:
SQL> SELECT SALARY FROM CUSTOMERS ORDER BY SALARY;
This would produce the following result, where the salary 2000 appears twice, which is a duplicate record from the original table:
+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
Now let us use the DISTINCT keyword with the above SELECT query and see the result:
SQL> SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY;
This would produce the following result, where we do not have any duplicate entry:
+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
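The duplicate-elimination step can be verified with a sketch in Python and SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# DISTINCT collapses the two 2000.00 salaries into one row.
salaries = [r[0] for r in con.execute(
    "SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY")]
print(salaries)
```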
17. SQL JOINS
The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN is a means for combining fields from two tables by using values common to each.
Example:
Consider the following two tables, CUSTOMERS and ORDERS:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+
Now let us join these two tables in our SELECT statement as follows:
SQL> SELECT ID, NAME, AGE, AMOUNT
     FROM CUSTOMERS, ORDERS
     WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+
Here, it is notable that the join is performed in the WHERE clause. Several operators can be used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT. However, the most common operator is the equals sign.
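The implicit join in the WHERE clause can be sketched in Python with SQLite using the CUSTOMERS and ORDERS data from the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])
con.execute("CREATE TABLE ORDERS "
            "(OID INTEGER, DATE TEXT, CUSTOMER_ID INTEGER, AMOUNT INTEGER)")
con.executemany("INSERT INTO ORDERS VALUES (?, ?, ?, ?)", [
    (102, "2009-10-08 00:00:00", 3, 3000), (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560), (103, "2008-05-20 00:00:00", 4, 2060)])

# The join condition lives in the WHERE clause, as in the text.
rows = con.execute("""SELECT ID, NAME, AGE, AMOUNT
                      FROM CUSTOMERS, ORDERS
                      WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID""").fetchall()
print(sorted(rows))
```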
UNIT V
18. UNION
The SQL UNION clause/operator is used to combine the results of two or more SELECT statements without returning any duplicate rows.
To use UNION, each SELECT must have the same number of columns selected, the same number of column expressions, the same data types, and have them in the same order, but they do not have to be the same length.
Syntax:
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Here, the given condition could be any expression based on your requirement.
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+
Now let us combine the joins of these two tables with UNION in our SELECT statement as follows:
SQL> SELECT ID, NAME, AMOUNT, DATE
     FROM CUSTOMERS
     LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
     UNION
     SELECT ID, NAME, AMOUNT, DATE
     FROM CUSTOMERS
     RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
+----+----------+--------+---------------------+
| ID | NAME     | AMOUNT | DATE                |
+----+----------+--------+---------------------+
|  1 | Ramesh   |   NULL | NULL                |
|  2 | Khilan   |   1560 | 2009-11-20 00:00:00 |
|  3 | kaushik  |   3000 | 2009-10-08 00:00:00 |
|  3 | kaushik  |   1500 | 2009-10-08 00:00:00 |
|  4 | Chaitali |   2060 | 2008-05-20 00:00:00 |
|  5 | Hardik   |   NULL | NULL                |
|  6 | Komal    |   NULL | NULL                |
|  7 | Muffy    |   NULL | NULL                |
+----+----------+--------+---------------------+
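A sketch of the same UNION in Python with SQLite. One assumption: since RIGHT JOIN is only available in recent SQLite versions, the second branch is written as a LEFT JOIN from ORDERS, which produces the same rows here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])
con.execute("CREATE TABLE ORDERS "
            "(OID INTEGER, DATE TEXT, CUSTOMER_ID INTEGER, AMOUNT INTEGER)")
con.executemany("INSERT INTO ORDERS VALUES (?, ?, ?, ?)", [
    (102, "2009-10-08 00:00:00", 3, 3000), (100, "2009-10-08 00:00:00", 3, 1500),
    (101, "2009-11-20 00:00:00", 2, 1560), (103, "2008-05-20 00:00:00", 4, 2060)])

# UNION merges both SELECTs and removes duplicate rows.
rows = con.execute("""
    SELECT C.ID, C.NAME, O.AMOUNT FROM CUSTOMERS C
    LEFT JOIN ORDERS O ON C.ID = O.CUSTOMER_ID
    UNION
    SELECT C.ID, C.NAME, O.AMOUNT FROM ORDERS O
    LEFT JOIN CUSTOMERS C ON C.ID = O.CUSTOMER_ID""").fetchall()
print(len(rows))  # 8
```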
19. HAVING
The HAVING clause enables you to specify conditions that filter which group results appear in the final results.
The WHERE clause places conditions on the selected columns, whereas the HAVING clause places conditions on groups created by the GROUP BY clause.
Syntax:
The following is the position of the HAVING clause in a query:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The HAVING clause must follow the GROUP BY clause in a query and must also precede the ORDER BY clause if used. The following is the syntax of the SELECT statement, including the HAVING clause:
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
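A HAVING filter can be sketched in Python with SQLite; the hypothetical query below keeps only the AGE groups that contain at least two customers (only age 25, shared by Khilan and Chaitali, qualifies):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000), (2, "Khilan", 25, "Delhi", 1500),
    (3, "kaushik", 23, "Kota", 2000), (4, "Chaitali", 25, "Mumbai", 6500),
    (5, "Hardik", 27, "Bhopal", 8500), (6, "Komal", 22, "MP", 4500),
    (7, "Muffy", 24, "Indore", 10000)])

# HAVING filters whole groups after GROUP BY has formed them.
rows = con.execute("SELECT AGE, COUNT(*) FROM CUSTOMERS "
                   "GROUP BY AGE HAVING COUNT(*) >= 2").fetchall()
print(rows)  # [(25, 2)]
```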
Relational Algebra Operators are mathematical functions used to retrieve queries by describing a sequence of operations on tables or even databases (schemas). With relational algebra operators, a query is always composed of a number of operators, which each in turn are composed of relations as variables and return an individual abstraction as the end product.
The following are the main relational algebra operators as applied to SQL:
The SELECT Operator
The SELECT operator is used to choose a subset of the tuples (rows) from a relation that satisfies a selection condition, acting as a filter to retain only the tuples that fulfill a qualifying requirement.
σ<selection condition>(R)
To implement the SELECT operator in SQL, we take a look at an example in which we would like to select the EMPLOYEE tuples whose employee number is 7, or those whose date of birth is before 1980:
σ empno=7 (EMPLOYEE)
σ dob<'01-Jan-1980' (EMPLOYEE)
SELECT empno
FROM EMPLOYEE
WHERE empno = 7
SELECT dob
FROM EMPLOYEE
WHERE dob < '01-Jan-1980'
The PROJECT Operator
This operator is used to reorder, select and get rid of attributes from a table. At some point we might want only certain attributes in a relation and to eliminate others from our query result. Therefore, the PROJECT operator would be used in such operations.
∏<attribute list>(R)
∏ dob, empno (EMPLOYEE)
SELECT dob, empno
FROM EMPLOYEE
The RENAME Operator
The RENAME operator is used to give a name to the results or output of queries, returns of selection statements, and views of queries that we would like to view at some other point in time.
The arrow symbol ← means that we first get the PROJECT operation results on the right side of the arrow, then apply the RENAME operation on the results on the left side of the arrow.
In SQL we would translate the RENAME operator using the SQL 'AS' keyword, for example renaming the projected attributes:
SELECT dob AS date_of_birth, empno AS emp_number
FROM EMPLOYEE
UNION
RESULT ← A ∪ B
SQL Statement:
SELECT * FROM A
UNION
SELECT * FROM B
INTERSECTION
RESULT ← A ∩ B
SQL Statement:
SELECT * FROM A
INTERSECT
SELECT * FROM B
MINUS
The MINUS operation includes tuples from one Relation that are not in another Relation. Let the Relations be A and B; the MINUS operation A MINUS B is denoted by A – B, which results in the tuples that are in A and not in B. The SQL implementation of the MINUS operation would be as follows:
RESULT ← A – B
SQL Statement:
SELECT * FROM A
MINUS
SELECT * FROM B
The CARTESIAN PRODUCT Operator
The CARTESIAN PRODUCT operator, also referred to as the cross product or cross join, creates a relation that has all the attributes of A and B, allowing all the attainable combinations of tuples from A and B in the result. The CARTESIAN PRODUCT of A and B is symbolized by X, as in A X B:
C = A X B
SELECT *
FROM A, B
JOIN Operator
The JOIN operation is denoted by the symbol ⋈ and is used to compound similar tuples from two Relations into single longer tuples. Every row of the first table is joined to every row of the second table. The result is tuples taken from both tables.
SQL translation example where attribute dob is Date of Birth and empno is Employee Number:
SELECT A.dob
FROM A
JOIN B ON B.empno = A.empno
THETA JOIN Operator
This operation results in all combinations of tuples from Relation A and Relation B satisfying a join requirement. The THETA JOIN is designated by:
A ⋈<join condition> B
The SQL implementation would be the same as for the JOIN example above.
EQUIJOIN Operator
The EQUIJOIN operation returns all combinations of tuples from Relation A and Relation B satisfying a join requirement with only equality comparisons. The EQUIJOIN operation is symbolized by:
A ⋈<join condition> B, OR
A ⋈(<join attributes 1>), (<join attributes 2>) B
SQL translation example where attribute dob is Date of Birth and empno is Employee Number:
SELECT * FROM A
INNER JOIN B
ON A.empno = B.empno
NATURAL JOIN Operator
The NATURAL JOIN operation returns results that do not include the JOIN attributes of the second Relation B. It is not required that attributes with the same name be mentioned. The NATURAL JOIN operator is symbolized by:
A *<join condition> B, OR
A *(<join attributes>) B, OR
A * B
SQL translation example where attribute dob is Date of Birth and empno is Employee Number:
SELECT *
FROM A
NATURAL JOIN B
-- where depno = 5
We can always use the WHERE clause to further restrict our output and prevent a Cartesian product output.
DIVISION Operator
The DIVISION operation will return a Relation R(X) that includes all tuples t[X] in R1(Z) that appear in R1 in combination with every tuple from R2(Y), where Z = X ∪ Y. The DIVISION operator is symbolized by:
R1(Z) ÷ R2(Y)
The DIVISION operator is the most difficult to implement in SQL, as no SQL command is given for the DIVISION operation. The DIVISION operator can be seen as the opposite of the CARTESIAN PRODUCT operator, just as in standard math division is the opposite of multiplication.
Therefore, a series of existing SQL commands has to be utilized in an implementation of the DIVISION operator. An example of the SQL implementation of the DIVISION operator:
SELECT surname, empno
FROM employee x
WHERE NOT EXISTS
  (SELECT * FROM employee y
   WHERE NOT EXISTS
     (SELECT * FROM employee z
      WHERE x.empno = z.empno
      AND y.surname = z.surname))
ORDER BY empno
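The double NOT EXISTS idiom behind relational division can be shown as a runnable sketch in Python with SQLite. The TAKES and COURSE tables below are made up for illustration: the query finds the students (SNO) that take every course in COURSE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TAKES (SNO INTEGER, CNO INTEGER)")
con.execute("CREATE TABLE COURSE (CNO INTEGER)")
con.executemany("INSERT INTO COURSE VALUES (?)", [(1,), (2,)])
# Student 10 takes both courses; student 20 takes only course 1.
con.executemany("INSERT INTO TAKES VALUES (?, ?)", [(10, 1), (10, 2), (20, 1)])

rows = con.execute("""
    SELECT DISTINCT t.SNO FROM TAKES t
    WHERE NOT EXISTS (
        -- there is no course ...
        SELECT * FROM COURSE c
        WHERE NOT EXISTS (
            -- ... that this student does not take
            SELECT * FROM TAKES t2
            WHERE t2.SNO = t.SNO AND t2.CNO = c.CNO))""").fetchall()
print(rows)  # only student 10 takes every course
```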
RELATIONAL CALCULUS
Relational calculus, which in effect means calculating with relations, is based on predicate calculus,
which is calculating with predicates. It is a formal language used to symbolize logical arguments
in mathematics. Propositions specifying a property consist of an expression that names an
individual object, and another expression, called the predicate, that stands for the property that the
individual object possesses. If for instance, p and q are propositions, we can build other
propositions "not p", "p or q", "p and q" and so on. In predicate calculus, propositions may be
built not only out of other propositions but also out of elements that are not themselves
propositions. In this manner we can build a proposition that specifies a certain property or
characteristic of an object.
The tuple relational calculus is based on specifying a number of tuple variables. Each such tuple variable normally ranges over a particular database relation. This means that the variable may take any individual tuple from that relation as its value. A simple tuple relational calculus query is of the form { t | COND(t) }, where 't' is a tuple variable and COND(t) is a conditional expression involving 't'. The result of such a query is a relation that contains all the tuples (rows) that satisfy COND(t).
For example, the relational calculus query {t | BOOK(t) and t.PRICE>100} will get you all the books whose price is greater than 100. In the above example, the condition 'BOOK(t)' specifies that the range relation of the tuple variable 't' is BOOK. Each BOOK tuple 't' that satisfies the condition 't.PRICE>100' will be retrieved. Note that 't.PRICE' references the attribute PRICE of the tuple variable 't'.
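The BOOK query reads naturally as a filter over tuples, which a Python comprehension can illustrate; the BOOK relation and prices below are made up for illustration:

```python
# A toy BOOK relation: each dict plays the role of one tuple t.
BOOK = [
    {"TITLE": "DB Concepts", "PRICE": 250},
    {"TITLE": "Intro to C", "PRICE": 90},
]

# {t | BOOK(t) and t.PRICE > 100}: keep every tuple t whose PRICE exceeds 100.
result = [t for t in BOOK if t["PRICE"] > 100]
print(result)
```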
The query {t | BOOK(t) and t.PRICE>100} retrieves all attribute values for each selected BOOK tuple. To retrieve only some of the attributes (say TITLE, AUTHOR and PRICE) we can modify the query as follows:
For each tuple variable 't', the range relation 'R' of 't' is specified by a condition of the form R(t).
Thus, to retrieve the details of all books (Title and Author name) which were published by 'Kalyani' and whose price is greater than 100, we would write the query as follows:
Here t1, t2, ..., tn, tn+1, tn+2, ..., tn+m are tuple variables, each Ai is an attribute of the relation on which ti ranges, and COND is a condition or formula of the tuple relational calculus.
Every condition is a WFF (Well Formed Formula). Here, a well-formed formula is constructed from conditions, Boolean operations (AND, OR, NOT) and quantifiers like "for all values" (∀) or "there exists" (∃).
• If F1 and F2 are WFFs, then (F1 AND F2) and (F1 OR F2) are also WFFs.
• Nothing else is a WFF.
Bound variables: Bound variables are those range variables where the meaning of the formula would remain unchanged if all the occurrences of a range variable, say 'x', were replaced by some other variable, say 'y'. Then the range variable 'x' is called a Bound variable.
For example, the WFF EXISTS x (x>3) simply states that there exists some integer x that is greater than 3. Note that the meaning of this WFF would remain totally unchanged if all references to x were replaced by references to some other variable y. In other words, the WFF EXISTS y (y>3) is semantically the same.
Free Variables: Free variables are those range variables where the meaning of the formula changes if all the occurrences of a range variable, say 'x', are replaced by some other variable, say 'y'. Then the range variable 'x' is called a Free variable.
For example, in the WFF EXISTS x (x>3) AND x<0, there are three references to x, denoting two different variables. The first two references are bound and could be replaced by references to some other variable y without changing the overall meaning. The third reference is free, and cannot be replaced without changing the meaning of the formula. Thus, of the two WFFs shown below, the first is equivalent to the one just given and the second is not:
EXISTS y (y>3) AND x<0
EXISTS y (y>3) AND y<0
Closed and Open WFF: A WFF in which all variable references are bound is called a Closed WFF.
e.g. EXISTS x (x>3) is a closed WFF.
An open WFF is a WFF that is not closed, i.e. one that contains at least one free variable reference.
e.g. EXISTS y (y>3) AND x<0
DOMAIN RELATIONAL CALCULUS
The domain calculus differs from the tuple calculus in the type of variables used in formulas. In domain calculus the variables range over single values from domains of attributes rather than ranging over tuples. To form a relation of degree 'n' for a query result, we must have 'n' of these domain variables, one for each attribute.
In the above expression x1, x2, ..., xn, xn+1, xn+2, ..., xn+m are domain variables that range over domains of attributes, and COND is a condition or formula of the domain relational calculus.
Expressions of the domain calculus are constructed from the following elements:
• Domain variables x1, x2, ..., xn, xn+1, xn+2, ..., xn+m; each domain variable is to range over some specified domain.
• Simple comparisons of the form x * y, as for the tuple calculus, except that x and y are now domain variables.
• Membership conditions of the form R(term, term, ...). Here, R is a relation, and each "term" is a pair A:v, where A in turn is an attribute of R and v is either a domain variable or a constant. For example, EMP (empno: 100, ename: 'Ajay') is a membership condition (which evaluates to true if and only if there exists an EMP tuple having empno=100 and ename='Ajay').
• Well Formed Formulas (WFFs), formed in accordance with the rules of the tuple calculus (but with the revised definition of "condition").
The rules concerning free and bound variables given for the tuple calculus also apply similarly to the domain calculus.
SQL
What is SQL?
SQL (Structured Query Language) is a computer language for storing, manipulating and retrieving data stored in a relational database.
SQL is the standard language for Relational Database Systems. All relational database management systems like MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server use SQL as the standard database language.
Why SQL?
SQL Process:
When you are executing an SQL command for any RDBMS, the system determines the best way to carry out your request, and the SQL engine figures out how to interpret the task.
There are various components included in the process. These components are the Query Dispatcher, Optimization Engines, Classic Query Engine and SQL Query Engine, etc. The Classic Query Engine handles all non-SQL queries, but the SQL Query Engine won't handle logical files.
SQL DATABASES:
MySQL
MySQL is an open source SQL database, which is developed by the Swedish company MySQL AB. MySQL is pronounced "my ess-que-ell," in contrast with SQL, pronounced "sequel."
MySQL supports many different platforms including Microsoft Windows, the major Linux distributions, UNIX, and Mac OS X.
MySQL has free and paid versions, depending on its usage (non-commercial/commercial) and features. MySQL comes with a very fast, multi-threaded, multi-user, and robust SQL database server.
Features:
High Performance.
High Availability.
Scalability and Flexibility Run anything.
Robust Transactional Support.
Web and Data Warehouse Strengths.
Strong Data Protection.
Comprehensive Application Development.
Management Ease.
Open Source Freedom and 24 x 7 Support.
Lowest Total Cost of Ownership.
MS SQL Server
MS SQL Server is a Relational Database Management System developed by Microsoft Inc. Its
primary query languages are:
T-SQL.
ANSI SQL.
Features:
High Performance.
High Availability.
Database mirroring.
Database snapshots.
CLR integration.
Service Broker.
DDL triggers.
Ranking functions.
Row version-based isolation levels.
XML integration.
TRY...CATCH.
Database Mail.
ORACLE
Oracle is a very large, multi-user database management system. Oracle is a relational database management system developed by 'Oracle Corporation'.
Oracle works to efficiently manage its resource, a database of information, among the multiple clients requesting and sending data in the network.
It is an excellent database server choice for client/server computing. Oracle supports all major operating systems for both clients and servers, including MSDOS, NetWare, UnixWare, OS/2 and most UNIX flavors.
Features:
Concurrency
Read Consistency
Locking Mechanisms
Quiesce Database
Portability
Self managing database
SQL*Plus
ASM
Scheduler
Resource Manager
Data Warehousing
Materialized views
Bitmap indexes
Table compression
Parallel Execution
Analytic SQL
Data mining
Partitioning
MS ACCESS
This is one of the most popular Microsoft products. Microsoft Access is entry-level database management software. An MS Access database is not only inexpensive but also a powerful database for small-scale projects.
MS Access uses the Jet database engine, which utilizes a specific SQL language dialect.
Features:
Users can create tables, queries, forms and reports, and connect them together with macros.
The import and export of data to many formats including Excel, Outlook, ASCII,
dBase, Paradox, FoxPro, SQL Server, Oracle, ODBC, etc.
There is also the Jet Database format (MDB or ACCDB in Access 2007) which can
contain the application and data in one file. This makes it very convenient to distribute
the entire application to another user, who can run it in disconnected environments.
Microsoft Access offers parameterized queries. These queries and Access tables
can be referenced from other programs like VB6 and .NET through DAO or ADO.
The desktop editions of Microsoft SQL Server can be used with Access as an
alternative to the Jet Database Engine.
Microsoft Access is a file-server-based database. Unlike client-server relational
database management systems (RDBMS), Microsoft Access does not implement
database triggers, stored procedures, or transaction logging.
SQL DATA TYPES
A SQL data type is an attribute that specifies the type of data of any object. Each column,
variable and expression has a related data type in SQL.
You would use these data types while creating your tables. You would choose a
particular data type for a table column based on your requirement.
SQL Server offers six categories of data types for your use; some representative types are listed below:
Data Type       Range / Description
tinyint         0 to 255
bit             0 to 1
money           -922,337,203,685,477.5808 to +922,337,203,685,477.5807
smallmoney      -214,748.3648 to +214,748.3647
smalldatetime   Jan 1, 1900 to Jun 6, 2079
varchar         Maximum of 8,000 characters (variable-length non-Unicode data)
varchar(max)    Maximum length of 2^31 characters, variable-length non-Unicode data (SQL Server 2005 only)
text            Variable-length non-Unicode data with a maximum length of 2,147,483,647 characters
Operator in SQL
Operators are used to specify conditions in an SQL statement and to serve as conjunctions for
multiple conditions in a statement.
Arithmetic operators
Comparison operators
Logical operators
Operators used to negate conditions
SQL Arithmetic Operators:
The arithmetic operators are + (addition), - (subtraction), * (multiplication), / (division) and % (modulus).
SQL Comparison Operators:
>          Checks if the value of the left operand is greater than the value of the right
           operand; if yes, then the condition becomes true. For example, with a = 10 and
           b = 20, (a > b) is not true.
SQL Logical Operators:
Operator   Description
ALL        The ALL operator is used to compare a value to all values in another value set.
AND        The AND operator allows the existence of multiple conditions in an SQL
           statement's WHERE clause.
BETWEEN    The BETWEEN operator is used to search for values that are within a set
           of values, given the minimum value and the maximum value.
EXISTS     The EXISTS operator is used to search for the presence of a row in a
           specified table that meets certain criteria.
NOT        The NOT operator reverses the meaning of the logical operator with which it is
           used, e.g. NOT EXISTS, NOT BETWEEN, NOT IN etc. This is the negate operator.
IS NULL    The IS NULL operator is used to compare a value with a NULL value.
COMMANDS IN SQL :
1. CREATE DATABASE
The SQL CREATE DATABASE statement is used to create new SQL database.
Syntax:
CREATE DATABASE DatabaseName;
Example:
If you want to create a new database <testDB>, then the CREATE DATABASE statement would be as
follows:
CREATE DATABASE testDB;
2. DROP DATABASE
The SQL DROP DATABASE statement is used to drop any existing database in SQL schema.
Syntax:
DROP DATABASE DatabaseName;
Example:
If you want to delete an existing database <testDB>, then the DROP DATABASE statement would
be as follows:
DROP DATABASE testDB;
3. USE
The SQL USE statement is used to select any existing database in SQL schema.
Syntax:
USE DatabaseName;
4. CREATE TABLE
Syntax:
CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   .....
   columnN datatype,
   PRIMARY KEY( one or more columns )
);
CREATE TABLE is the keyword telling the database system what you want to do; in this case,
you want to create a new table. The unique name or identifier for the table follows the CREATE
TABLE statement.
Then in brackets comes the list defining each column in the table and what sort of data type it is.
The syntax becomes clearer with an example below.
A copy of an existing table can be created using a combination of the CREATE TABLE
statement and the SELECT statement.
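For instance, the CUSTOMERS table used throughout this chapter could be created like this (the column sizes shown are illustrative choices, not mandated by the examples):

```sql
CREATE TABLE CUSTOMERS(
   ID      INT             NOT NULL,  -- unique customer number
   NAME    VARCHAR (20)    NOT NULL,
   AGE     INT             NOT NULL,
   ADDRESS CHAR (25),
   SALARY  DECIMAL (18, 2),
   PRIMARY KEY (ID)                   -- ID uniquely identifies each row
);
```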
5. DROP TABLE
The SQL DROP TABLE statement is used to remove a table definition and all data, indexes,
triggers, constraints, and permission specifications for that table.
Syntax:
DROP TABLE table_name;
6. INSERT INTO
The SQL INSERT INTO Statement is used to add new rows of data to a table in the database.
Syntax:
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
Here column1, column2,...columnN are the names of the columns in the table into which you
want to insert data.
You may not need to specify the column(s) name in the SQL query if you are adding values for
all the columns of the table. But make sure the order of the values is in the same order as the
columns in the table. The SQL INSERT INTO syntax would then be as follows:
INSERT INTO TABLE_NAME VALUES (value1, value2, value3,...valueN);
You can create a record in the CUSTOMERS table using the second syntax as follows:
INSERT INTO CUSTOMERS VALUES (7, 'Muffy', 24, 'Indore', 10000.00);
All the above statements would produce the following records in the CUSTOMERS table:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
7. SELECT
The SQL SELECT statement is used to fetch data from a database table, which returns the data in
the form of a result table. These result tables are called result-sets.
Syntax:
SELECT column1, column2, columnN FROM table_name;
Here column1, column2... are the fields of a table whose values you want to fetch. If you want to
fetch all the fields available in the table, then you can use the following syntax:
SELECT * FROM table_name;
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch ID, Name and Salary fields of the customers available
in CUSTOMERS table:
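A statement of the following form produces that result:

```sql
SELECT ID, NAME, SALARY FROM CUSTOMERS;
```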
+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  1 | Ramesh   |  2000.00 |
|  2 | Khilan   |  1500.00 |
|  3 | kaushik  |  2000.00 |
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
8. WHERE CLAUSE
The SQL WHERE clause is used to specify a condition while fetching data from a single
table or joining multiple tables.
If the given condition is satisfied, then only specific values are returned from the table. You would
use the WHERE clause to filter the records and fetch only the necessary records.
The WHERE clause is not only used in the SELECT statement; it is also used in the UPDATE, DELETE
statements etc., which we examine in subsequent sections.
Syntax:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition];
You can specify a condition using comparison or logical operators like >, <, =, LIKE, NOT etc.
The examples below make this concept clear.
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000:
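A statement of the following form produces that result:

```sql
SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;
```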
+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
9. AND & OR
The SQL AND and OR operators are used to combine multiple conditions to narrow data in an
SQL statement. These two operators are called conjunctive operators.
These operators provide a means to make multiple comparisons with different operators in the
same SQL statement.
The AND operator allows the existence of multiple conditions in an SQL statement's WHERE
clause.
Syntax:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];
You can combine N number of conditions using the AND operator. For an action to be taken by
the SQL statement, whether it be a transaction or query, all conditions separated by AND
must be TRUE.
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would fetch the ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000 AND age is less than 25 years:
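A statement of the following form produces that result:

```sql
SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 AND AGE < 25;
```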
+----+-------+----------+
| ID | NAME  | SALARY   |
+----+-------+----------+
|  6 | Komal |  4500.00 |
|  7 | Muffy | 10000.00 |
+----+-------+----------+
10. UPDATE
The SQL UPDATE Query is used to modify the existing records in a table.
You can use the WHERE clause with the UPDATE query to update selected rows; otherwise all the
rows would be affected.
Syntax:
The basic syntax of UPDATE query with WHERE clause is as follows:
UPDATE table_name
SET column1 = value1, column2 = value2, ..., columnN = valueN
WHERE [condition];
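For example, to change the address of the customer whose ID is 6 (the new value here is illustrative):

```sql
UPDATE CUSTOMERS
SET ADDRESS = 'Pune'
WHERE ID = 6;
```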
11. DELETE
The SQL DELETE Query is used to delete the existing records from a table.
You can use the WHERE clause with the DELETE query to delete selected rows; otherwise all the
records would be deleted.
Syntax:
DELETE FROM table_name
WHERE [condition];
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
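For example, the customer whose ID is 6 can be removed with:

```sql
DELETE FROM CUSTOMERS
WHERE ID = 6;
```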
Now CUSTOMERS table would have following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
12. LIKE
The SQL LIKE clause is used to compare a value to similar values using wildcard operators.
There are two wildcards used in conjunction with the LIKE operator:
The percent sign (%) represents zero, one, or multiple characters. The underscore (_) represents a
single number or character. The symbols can be used in combination.
Syntax:
SELECT * FROM table_name
WHERE column LIKE 'XXXX%'
or
SELECT * FROM table_name
WHERE column LIKE 'XXXX_'
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would display all the records from CUSTOMERS table where
SALARY starts with 200:
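A statement of the following form produces that result:

```sql
SELECT * FROM CUSTOMERS
WHERE SALARY LIKE '200%';
```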
+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
13. TOP
The SQL TOP clause is used to fetch a TOP N number or X percent records from a table.
Syntax:
The basic syntax of the TOP clause with a SELECT statement would be as follows:
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE [condition];
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example on SQL server which would fetch top 3 records from CUSTOMERS table:
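The SQL Server statement assumed here is:

```sql
SELECT TOP 3 * FROM CUSTOMERS;
```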
+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan  |  25 | Delhi     | 1500.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
14. ORDER BY
The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on
one or more columns. Some databases sort query results in ascending order by default.
Syntax:
SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];
You can use more than one column in the ORDER BY clause. Make sure that whatever column you
are using to sort is present in the column-list.
Example:
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would sort the result in ascending order by NAME and SALARY:
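A statement of the following form produces that result:

```sql
SELECT * FROM CUSTOMERS
ORDER BY NAME, SALARY;
```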
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+
15. GROUP BY
The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange
identical data into groups.
The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.
Syntax:
The basic syntax of the GROUP BY clause is given below. The GROUP BY clause must follow the
conditions in the WHERE clause and must precede the ORDER BY clause if one is used.
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to know the total amount of salary of each customer, then the GROUP BY query would
be as follows:
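A statement of the following form produces that result:

```sql
SELECT NAME, SUM(SALARY)
FROM CUSTOMERS
GROUP BY NAME;
```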
+----------+-------------+
| NAME     | SUM(SALARY) |
+----------+-------------+
| Chaitali |     6500.00 |
| Hardik   |     8500.00 |
| kaushik  |     2000.00 |
| Khilan   |     1500.00 |
| Komal    |     4500.00 |
| Muffy    |    10000.00 |
| Ramesh   |     2000.00 |
+----------+-------------+
16. DISTINCT
The SQL DISTINCT keyword is used in conjunction with the SELECT statement to eliminate all
duplicate records, fetching only unique records.
There may be a situation when you have multiple duplicate records in a table. While fetching such
records, it makes more sense to fetch only unique records instead of fetching duplicate records.
Syntax:
SELECT DISTINCT column1, column2, ..., columnN
FROM table_name
WHERE [condition];
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
First let us see how the following SELECT query returns duplicate salary records:
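The query assumed for this step is a plain SELECT without DISTINCT:

```sql
SELECT SALARY FROM CUSTOMERS
ORDER BY SALARY;
```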
This would produce the following result, where the salary 2000.00 appears twice, a duplicate
record from the original table:
+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
Now let us use DISTINCT keyword with the above SELECT query and see the result:
SQL> SELECT DISTINCT SALARY FROM CUSTOMERS
     ORDER BY SALARY;
This would produce the following result, where we do not have any duplicate entries:
+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
17. CONSTRAINTS
Constraints are the rules enforced on the data columns of a table. They are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.
Constraints can be column level or table level. Column-level constraints are applied only to one
column, whereas table-level constraints are applied to the whole table.
Following are the commonly used constraints available in SQL. These constraints have already
been discussed in the SQL - RDBMS Concepts chapter, but it is worth revising them at this point.
NOT NULL Constraint: Ensures that a column cannot have a NULL value.
DEFAULT Constraint: Provides a default value for a column when none is specified.
UNIQUE Constraint: Ensures that all values in a column are different.
PRIMARY Key: Uniquely identifies each row/record in a database table.
FOREIGN Key: Uniquely identifies a row/record in another database table.
CHECK Constraint: Ensures that all values in a column satisfy certain conditions.
INDEX: Used to create and retrieve data from the database very quickly.
Constraints can be specified when a table is created with the CREATE TABLE statement, or you
can use the ALTER TABLE statement to create constraints even after the table is created.
Dropping Constraints:
Any constraint that you have defined can be dropped using the ALTER TABLE command with the
DROP CONSTRAINT option.
For example, to drop the primary key constraint in the EMPLOYEES table, you can use the
following command:
Some implementations may provide shortcuts for dropping certain constraints. For example, to
drop the primary key constraint for a table in Oracle, you can use the following command:
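For example (EMPLOYEES_PK is an assumed name for the primary key constraint on the EMPLOYEES table):

```sql
-- generic form: drop the constraint by its name
ALTER TABLE EMPLOYEES
DROP CONSTRAINT EMPLOYEES_PK;

-- Oracle shortcut for dropping the primary key constraint
ALTER TABLE EMPLOYEES
DROP PRIMARY KEY;
```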
Integrity Constraints:
Integrity constraints are used to ensure accuracy and consistency of data in a relational database.
Data integrity is handled in a relational database through the concept of referential integrity.
There are many types of integrity constraints that play a role in referential integrity (RI). These
constraints include Primary Key, Foreign Key, Unique Constraints and other constraints
mentioned above.
18. JOINS
The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN
is a means for combining fields from two tables by using values common to each.
Example:
Consider the following two tables. (a) The CUSTOMERS table is as follows:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
(b) Another table is ORDERS, as follows:
+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+
Now let us join these two tables in our SELECT statement as follows:
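The join producing the next result can be written with the join condition in the WHERE clause:

```sql
SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
```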
+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+
Here, it is notable that the join is performed in the WHERE clause. Several operators can be used
to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT. However, the most
common operator is the equal symbol.
19. UNION
The SQL UNION clause/operator is used to combine the results of two or more SELECT
statements without returning any duplicate rows.
To use UNION, each SELECT must have the same number of columns selected, the same
number of column expressions, the same data type, and have them in the same order but they do
not have to be the same length.
Syntax:
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Here the given condition could be any expression based on your requirement.
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+
Now let us join these two tables in our SELECT statement as follows:
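The result shown next (which keeps customers without orders as NULL rows) can be produced by combining a left join and a right join with UNION:

```sql
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
UNION
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
```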
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
|    1 | Ramesh   |   NULL | NULL                |
|    2 | Khilan   |   1560 | 2009-11-20 00:00:00 |
|    3 | kaushik  |   3000 | 2009-10-08 00:00:00 |
|    3 | kaushik  |   1500 | 2009-10-08 00:00:00 |
|    4 | Chaitali |   2060 | 2008-05-20 00:00:00 |
|    5 | Hardik   |   NULL | NULL                |
|    6 | Komal    |   NULL | NULL                |
|    7 | Muffy    |   NULL | NULL                |
+------+----------+--------+---------------------+
20. NULL
The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.
A field with a NULL value is a field with no value. It is very important to understand that a
NULL value is different from a zero value or a field that contains spaces.
Syntax:
Here, NOT NULL signifies that the column should always accept an explicit value of the given
data type. There are two columns where we did not use NOT NULL, which means these columns
could be NULL.
A field with a NULL value is one that has been left blank during record creation.
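The situation described above can be sketched with the CUSTOMERS layout used throughout this chapter (column sizes are illustrative):

```sql
CREATE TABLE CUSTOMERS(
   ID      INT             NOT NULL,
   NAME    VARCHAR (20)    NOT NULL,
   AGE     INT             NOT NULL,
   ADDRESS CHAR (25),         -- no NOT NULL: this column may hold NULL
   SALARY  DECIMAL (18, 2),   -- no NOT NULL: this column may hold NULL
   PRIMARY KEY (ID)
);
```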
Example:
The NULL value can cause problems when selecting data, however, because when comparing an
unknown value to any other value, the result is always unknown and not included in the final results.
You must use the IS NULL or IS NOT NULL operators in order to check for a NULL value.
+----+----------+-----+-----------+---------+
| ID | NAME     | AGE | ADDRESS   | SALARY  |
+----+----------+-----+-----------+---------+
|  1 | Ramesh   |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan   |  25 | Delhi     | 1500.00 |
|  3 | kaushik  |  23 | Kota      | 2000.00 |
|  4 | Chaitali |  25 | Mumbai    | 6500.00 |
|  5 | Hardik   |  27 | Bhopal    | 8500.00 |
|  6 | Komal    |  22 | MP        |         |
|  7 | Muffy    |  24 | Indore    |         |
+----+----------+-----+-----------+---------+
Now, following is the usage of the IS NOT NULL operator:
SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY
     FROM CUSTOMERS
     WHERE SALARY IS NOT NULL;
+----+----------+-----+-----------+---------+
| ID | NAME     | AGE | ADDRESS   | SALARY  |
+----+----------+-----+-----------+---------+
|  1 | Ramesh   |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan   |  25 | Delhi     | 1500.00 |
|  3 | kaushik  |  23 | Kota      | 2000.00 |
|  4 | Chaitali |  25 | Mumbai    | 6500.00 |
|  5 | Hardik   |  27 | Bhopal    | 8500.00 |
+----+----------+-----+-----------+---------+
21. ALIAS
You can rename a table or a column temporarily by giving it another name, known as an alias.
The use of table aliases means renaming a table in a particular SQL statement. The renaming is a
temporary change and the actual table name does not change in the database.
Column aliases are used to rename a table's columns for the purpose of a particular SQL query.
Syntax:
The basic syntax of a table alias is as follows:
SELECT column1, column2....
FROM table_name AS alias_name
WHERE [condition];
The basic syntax of a column alias is as follows:
SELECT column_name AS alias_name
FROM table_name
WHERE [condition];
Example:
Consider the following two tables. (a) The CUSTOMERS table is as follows:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
(b) Another table is ORDERS, as follows:
+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+
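Following is the usage of a table alias; a join of this form produces the result shown next:

```sql
SELECT C.ID, C.NAME, C.AGE, O.AMOUNT
FROM CUSTOMERS AS C, ORDERS AS O
WHERE C.ID = O.CUSTOMER_ID;
```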
+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+
Following is the usage of a column alias:
SQL> SELECT ID AS CUSTOMER_ID, NAME AS CUSTOMER_NAME
     FROM CUSTOMERS
     WHERE SALARY IS NOT NULL;
+-------------+---------------+
| CUSTOMER_ID | CUSTOMER_NAME |
+-------------+---------------+
|           1 | Ramesh        |
|           2 | Khilan        |
|           3 | kaushik       |
|           4 | Chaitali      |
|           5 | Hardik        |
|           6 | Komal         |
|           7 | Muffy         |
+-------------+---------------+
22. ALTER TABLE
The SQL ALTER TABLE command is used to add, delete, or modify columns in an existing table.
You would also use the ALTER TABLE command to add and drop various constraints on an
existing table.
Syntax:
The basic syntax of ALTER TABLE to add a new column to an existing table is as follows:
ALTER TABLE table_name ADD column_name datatype;
The basic syntax of ALTER TABLE to DROP a COLUMN from an existing table is as follows:
ALTER TABLE table_name DROP COLUMN column_name;
The basic syntax of ALTER TABLE to change the DATA TYPE of a column in a table is as
follows:
ALTER TABLE table_name ALTER COLUMN column_name datatype;
The basic syntax of ALTER TABLE to add a NOT NULL constraint to a column in a table is as
follows:
ALTER TABLE table_name MODIFY column_name datatype NOT NULL;
The basic syntax of ALTER TABLE to ADD a UNIQUE CONSTRAINT to a table is as follows:
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint UNIQUE(column1, column2...);
The basic syntax of ALTER TABLE to ADD a CHECK CONSTRAINT to a table is as follows:
ALTER TABLE table_name
ADD CONSTRAINT MyCheckConstraint CHECK (CONDITION);
The basic syntax of ALTER TABLE to ADD a PRIMARY KEY constraint to a table is as follows:
ALTER TABLE table_name
ADD CONSTRAINT MyPrimaryKey PRIMARY KEY (column1, column2...);
The basic syntax of ALTER TABLE to DROP a CONSTRAINT from a table is as follows:
ALTER TABLE table_name
DROP CONSTRAINT MyUniqueConstraint;
The basic syntax of ALTER TABLE to DROP a PRIMARY KEY constraint from a table is as
follows:
ALTER TABLE table_name
DROP PRIMARY KEY;
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is the example to ADD a new column in an existing table:
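The statement assumed here adds a SEX column:

```sql
ALTER TABLE CUSTOMERS ADD SEX char(1);
```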
Now CUSTOMERS table is changed and following would be output from SELECT statement:
+----+----------+-----+-----------+----------+------+
| ID | NAME     | AGE | ADDRESS   | SALARY   | SEX  |
+----+----------+-----+-----------+----------+------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 | NULL |
|  2 | Khilan   |  25 | Delhi     |  1500.00 | NULL |
|  3 | kaushik  |  23 | Kota      |  2000.00 | NULL |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 | NULL |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 | NULL |
|  6 | Komal    |  22 | MP        |  4500.00 | NULL |
|  7 | Muffy    |  24 | Indore    | 10000.00 | NULL |
+----+----------+-----+-----------+----------+------+
Following is the example to DROP the SEX column from the existing table:
ALTER TABLE CUSTOMERS DROP COLUMN SEX;
Now the CUSTOMERS table is changed and the following would be the output from a SELECT
statement:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
23. TRUNCATE TABLE
The SQL TRUNCATE TABLE command is used to delete the complete data from an existing table.
You can also use the DROP TABLE command to delete a complete table, but it would remove the
complete table structure from the database and you would need to re-create the table once
again if you wish to store some data.
Syntax:
TRUNCATE TABLE table_name;
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
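The table can be emptied with:

```sql
TRUNCATE TABLE CUSTOMERS;
```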
Now the CUSTOMERS table is truncated and the output of a SELECT statement would be an empty set.
HAVING CLAUSE
The HAVING clause enables you to specify conditions that filter which group results appear in
the final results.
The WHERE clause places conditions on the selected columns, whereas the HAVING clause places
conditions on groups created by the GROUP BY clause.
Syntax:
The following is the position of the HAVING clause in a query:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The HAVING clause must follow the GROUP BY clause in a query and must also precede the
ORDER BY clause if used. The following is the syntax of the SELECT statement, including the
HAVING clause:
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2
Example:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which would display records where the count of a given age is greater
than or equal to 2:
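The query assumed here groups by AGE; note that selecting non-aggregated columns alongside GROUP BY in this way is MySQL-specific behavior (with ONLY_FULL_GROUP_BY disabled) and returns one representative row per qualifying group:

```sql
SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
GROUP BY AGE
HAVING COUNT(AGE) >= 2;
```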
+----+--------+-----+---------+---------+
| ID | NAME   | AGE | ADDRESS | SALARY  |
+----+--------+-----+---------+---------+
|  2 | Khilan |  25 | Delhi   | 1500.00 |
+----+--------+-----+---------+---------+
24. DATE FUNCTIONS
Following is a list of important Date and Time related functions available through SQL.
There are various other functions supported by your RDBMS. The given list is based on the
MySQL RDBMS.
Name                                    Description
CURRENT_TIMESTAMP(),                    Synonyms for NOW()
CURRENT_TIMESTAMP
DAY()                                   Synonym for DAYOFMONTH()
DAYOFYEAR()                             Return the day of the year (1-366)
LOCALTIMESTAMP,                         Synonym for NOW()
LOCALTIMESTAMP()
MAKETIME()                              Create a time value from hour, minute and second arguments
SUBTIME()                               Subtract times
Examples of SQL Commands in Different Tables:
ITEMS table
This table stores information about all the items that are offered by the company. The structure of
the table is as follows:
Column     Datatype       Meaning
Itemno     Number(5)      A unique number assigned to each item.
ItemName   Varchar2(20)   Name of the item.
Rate       Number(8,2)    Rate of the item.
taxrate    Number(4,2)    Sales tax rate for this item.
create table ITEMS (
   itemno   number(5) constraint items_pk primary key,
   itemname varchar2(20),
   rate     number(8,2) constraint items_rate_chk check( rate >= 0 ),
   taxrate  number(4,2) default 0 constraint items_taxrate_chk check( taxrate >= 0 )
);
CUSTOMERS Table
This table contains information about customers who have placed one or more orders. The
following is the structure of the table.
Column Datatype Meaning
Custno Number(5) A unique number assigned to each customer.
CustName Varchar2(20) Complete name of the customer.
Address1 varchar2(50) First line of address.
Address2 varchar2(50) Second line of address.
City varchar2(30) Name of the city where customer lives.
state varchar2(30) Name of the state where customer lives.
PinCode varchar2(10) Pincode of the city.
Phone varchar2(30) One or more phone numbers separated using
comma(,).
create table CUSTOMERS (
   custno   number(5) constraint customers_pk primary key,
   custname varchar2(20) constraint customers_custname_nn not null,
   address1 varchar2(50),
   address2 varchar2(50),
   city     varchar2(30),
   state    varchar2(30),
   pin      varchar2(10),
   phone    varchar2(30)
);
ORDERS Table
Contains information about all orders placed by customers. Contains one row for each order.
The details of items ordered in an order will be found in LINEITEMS table. The following is
the structure of the table.
Column     Datatype       Meaning
OrdNo      Number(5)      A unique number assigned to each order.
OrdDate    Date           Date on which the order is placed.
ShipDate   Date           Date on which goods are to be shipped to the customer.
CustNo     Number(5)      Refers to the number of the customer who placed the order.
Address1   varchar2(50)   First line of shipping address.
Address2   varchar2(50)   Second line of shipping address.
City       varchar2(30)   City name in shipping address.
state      varchar2(30)   State name in shipping address.
PinCode    varchar2(10)   Pincode of the city in shipping address.
Phone      varchar2(30)   One or more phone numbers, separated using comma (,), of the
                          shipping place.
create table ORDERS (
   ordno    number(5) constraint orders_pk primary key,
   orddate  date,
   shipdate date,
   custno   number(5) constraint orders_custno_fk references customers,
   address1 varchar2(50),
   address2 varchar2(50),
   city     varchar2(30),
   state    varchar2(30),
   pin      varchar2(10),
   phone    varchar2(30),
   constraint order_dates_chk check( orddate <= shipdate )
);
LINEITEMS Table
Contains details of items ordered in each order. For each item in each order this table contains
one row. The following is the structure of the table.
Column    Datatype      Meaning
OrdNo     Number(5)     Refers to the order number of the order.
Itemno    Number(5)     Refers to the item number of the item.
qty       number(3)     How many units of this item are required in this order.
price     Number(8,2)   Selling price of the item for this order.
DisRate   Number(4,2)   Discount rate for this item in this order.
211
Default DISRATE is 0
QTY must be >= 1
DISRATE must be >= 0
create table LINEITEMS (
    ordno   number(5) constraint LINEITEMS_ORDNO_FK references ORDERS,
    itemno  number(5) constraint LINEITEMS_ITEMNO_FK references ITEMS,
    qty     number(3) constraint LINEITEMS_QTY_CHK check( qty >= 1 ),
    price   number(8,2),
    disrate number(4,2) default 0
            constraint LINEITEMS_DISRATE_CHK check( disrate >= 0 ),
    constraint lineitems_pk primary key (ordno, itemno)
);
insert into lineitems values (1006, 2, 10, 950, 20.0);
insert into lineitems values (1006, 4, 5, 7800, 10.0);
insert into lineitems values (1006, 3, 5, 6600, 15.0);
SQL scripts that create these tables and load the sample data can be downloaded and run to set up this practice schema.
Queries
DISPLAY ALL THE ORDERS THAT ARE PLACED IN THE CURRENT MONTH
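The worked answer for this exercise is missing in the source; one possible solution, written against the ORDERS table defined earlier, is:

```sql
-- Orders whose order date falls in the current month and year
SELECT *
FROM ORDERS
WHERE TO_CHAR(ORDDATE, 'MMYYYY') = TO_CHAR(SYSDATE, 'MMYYYY');
```

Comparing month and year together (rather than month alone) avoids matching the same month of an earlier year.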
DISPLAY ORDERNO FOR ORDERS WHERE AT LEAST ONE PRODUCT HAS A RATE OF MORE THAN 5000 AND THE TOTAL NO. OF UNITS IS MORE THAN 10
SELECT ORDNO
FROM LINEITEMS
GROUP BY ORDNO
HAVING MAX(PRICE) > 5000 AND SUM(QTY) > 10;
DISPLAY CUSTNO WHO HAVE PLACED MORE THAN 2 ORDERS IN THE LAST 3 MONTHS
SELECT CUSTNO
FROM ORDERS
WHERE ORDDATE > ADD_MONTHS(SYSDATE,-3)
GROUP BY CUSTNO
HAVING COUNT(*) > 2;
DISPLAY CUSTNO,DATE ON WHICH FIRST ORDER WAS PLACED AND THE GAP
BETWEEN FIRST ORDER AND LAST ORDER IN DAYS
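The source omits the solution; a sketch that assumes only the ORDERS table defined earlier:

```sql
-- First order date per customer and days between first and last order
SELECT CUSTNO,
       MIN(ORDDATE) "FIRST_ORDER",
       MAX(ORDDATE) - MIN(ORDDATE) "GAP_IN_DAYS"
FROM ORDERS
GROUP BY CUSTNO;
```

In Oracle, subtracting two DATE values yields the difference in days, so no conversion function is needed.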
DISPLAY ORDERNO, MAX PRICE IN THE ORDER FOR THE ORDERS WHERE THE AMOUNT OF ITEMS IS MORE THAN 10000
FROM ORDERS O, CUSTOMERS C
WHERE SHIPDATE IS NOT NULL AND O.CUSTNO = C.CUSTNO;
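The worked answer for the exercise above does not survive in the source; a sketch using only the LINEITEMS table defined earlier:

```sql
-- Max item price per order, restricted to orders worth more than 10000
SELECT ORDNO, MAX(PRICE)
FROM LINEITEMS
GROUP BY ORDNO
HAVING SUM(QTY * PRICE) > 10000;
```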
DISPLAY DETAILS OF CUSTOMERS WHO PLACED ANY ORDERS WORTH MORE THAN
30000
SELECT * FROM CUSTOMERS
WHERE CUSTNO IN
    ( SELECT CUSTNO
      FROM ORDERS
      WHERE ORDNO IN
          ( SELECT ORDNO
            FROM LINEITEMS
            GROUP BY ORDNO
            HAVING SUM(QTY * PRICE) > 30000 )
    );
SELECT * FROM ORDERS
WHERE ORDNO IN
    ( SELECT ORDNO
      FROM LINEITEMS
      WHERE PRICE = ( SELECT MAX(PRICE)
                      FROM LINEITEMS
                      WHERE ITEMNO = 3 )
      AND ITEMNO = 3
    );
DISPLAY DETAILS OF ITEMS FOR WHICH THERE IS AN ORDER IN THE LAST 7
DAYS OR TOTAL NO.OF UNITS ORDERED IS MORE THAN 10.
SELECT * FROM ITEMS
WHERE ITEMNO IN
    ( SELECT ITEMNO
      FROM LINEITEMS
      WHERE ORDNO IN
          ( SELECT ORDNO FROM ORDERS WHERE SYSDATE - ORDDATE <= 7 )
    )
OR ITEMNO IN
    ( SELECT ITEMNO
      FROM LINEITEMS
      GROUP BY ITEMNO
      HAVING SUM(QTY) > 10
    );
DISPLAY ALL THE LINEITEMS IN WHICH THE RATE OF THE ITEM IS MORE THAN AVG
RATE OF THE ITEMS
SELECT * FROM ORDERS
WHERE ORDNO IN
    ( SELECT ORDNO
      FROM LINEITEMS L, ITEMS I
      WHERE L.ITEMNO = I.ITEMNO
      AND PRICE > RATE );
DETAILS OF CUSTOMERS WHO HAVE NOT PLACED ANY ORDER FOR THE LAST 15
DAYS
SELECT * FROM CUSTOMERS
WHERE CUSTNO NOT IN
    ( SELECT CUSTNO
      FROM ORDERS
      WHERE SYSDATE - ORDDATE <= 15 );
SELECT O1.*
FROM ORDERS O1, ORDERS O2
WHERE TO_CHAR(O1.ORDDATE,'MMYY') = TO_CHAR(SYSDATE,'MMYY')
OR ( O2.ORDNO = 1004 AND O1.ORDDATE > O2.ORDDATE );
SELECT * FROM ITEMS
WHERE ITEMNO IN
    ( SELECT ITEMNO
      FROM LINEITEMS
      WHERE ORDNO IN
          ( SELECT ORDNO
            FROM ORDERS
            WHERE CUSTNO = 102 )
    );
CHANGE SHIPDATE OF ORDER 1004 TO THE ORDER DATE OF MOST RECENT ORDER
UPDATE ORDERS
SET SHIPDATE = ( SELECT MAX(ORDDATE)
FROM ORDERS)
WHERE ORDNO = 1004;
DISPLAY DETAILS OF ORDERS THAT WERE PLACED IN THE MONTH OF JUNE 2000.
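No solution follows in the source; one way to write it against the ORDERS table:

```sql
-- Orders placed in June 2000
SELECT *
FROM ORDERS
WHERE TO_CHAR(ORDDATE, 'MMYYYY') = '062000';
```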
DISPLAY ORDERNO,ORDERDATE AND APPROXIMATE SHIPDATE(15 DAYS FROM
ORDDATE) FOR ALL ORDERS THAT ARE NOT SHIPPED.
SELECT ORDNO, ORDDATE, ORDDATE + 15 "SHIPDATE"
FROM ORDERS
WHERE SHIPDATE IS NULL;
DISPLAY THE DETAILS OF ORDERS THAT WERE PLACED IN THE LAST 20 DAYS AND DELIVERED.
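The source omits the answer; a sketch, taking "delivered" to mean that SHIPDATE has been filled in:

```sql
-- Orders placed in the last 20 days that have been shipped
SELECT *
FROM ORDERS
WHERE SYSDATE - ORDDATE <= 20
  AND SHIPDATE IS NOT NULL;
```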
CHANGE THE RATE OF ITEMS IN ORDER 1003 SO THAT 10% DISCOUNT IS GIVEN
TO ALL ITEMS.
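The answer is missing in the source. One reading of the exercise is to cut each item's selling price by 10%; another is to record 10 in the discount rate column. A sketch of the first reading:

```sql
-- Give a 10% discount on every item of order 1003
UPDATE LINEITEMS
SET PRICE = PRICE * 0.90
WHERE ORDNO = 1003;
-- Alternatively: SET DISRATE = 10 to record the discount instead
```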
SELECT * FROM ITEMS
WHERE LENGTH(ITEMNAME) > 10;
DISPLAY ITEMS WHERE ITEMNAME CONTAINS LETTER ‘O’ AFTER 5TH POSITION.
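A sketch of one possible answer, using Oracle's INSTR with a starting position of 6 so that only occurrences after the 5th character are counted:

```sql
-- Items whose name contains 'O' after the 5th position
SELECT *
FROM ITEMS
WHERE INSTR(UPPER(ITEMNAME), 'O', 6) > 0;
```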
SELECT ITEMNO, UPPER(ITEMNAME)
FROM ITEMS
WHERE UPPER(ITEMNAME) LIKE '%M%';
DISPLAY ITEMNO, PRICE, QUANTITY, DISCOUNT RATE FOR ITEMS WHERE THE DISCOUNT RATE IS NON-ZERO. DISCOUNT RATE IS CALCULATED AS 10% FOR ITEM 1, 7% FOR ITEM 6 AND 8% FOR THE REMAINING.
SELECT ITEMNO, PRICE, QTY, DECODE(ITEMNO, 1, 10, 6, 7, 8) "DISRATE"
FROM LINEITEMS
WHERE DISRATE <> 0;
SELECT SUM(QTY*PRICE)
FROM LINEITEMS;
SELECT COUNT(*)
FROM ORDERS
WHERE SHIPDATE IS NULL;
SELECT STATE, COUNT(*)
FROM CUSTOMERS
WHERE CUSTNAME LIKE '%NIKE%'
GROUP BY STATE;
DISPLAY CUSTOMER WHO HAS PLACED MORE THAN 2 ORDERS IN A SINGLE MONTH.
SELECT CUSTNO
FROM ORDERS
GROUP BY CUSTNO, TO_CHAR(ORDDATE,'MMYY')
HAVING COUNT(*) > 2;
DISPLAY ORDERNO,ITEMNO,AMOUNT FOR ITEMS WHERE THE PRICE OF THE
ITEM IS MORE THAN THE CURRENT PRICE OF THE ITEM.
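The solution is missing in the source; a sketch joining LINEITEMS with ITEMS to compare the selling price against the current rate:

```sql
-- Line items sold above the item's current rate, with the line amount
SELECT L.ORDNO, L.ITEMNO, L.QTY * L.PRICE "AMOUNT"
FROM LINEITEMS L, ITEMS I
WHERE L.ITEMNO = I.ITEMNO
  AND L.PRICE > I.RATE;
```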
DISPLAY THE ITEMS FOR WHICH WE HAVE SOLD MORE THAN 50 UNITS, TAKING INTO ACCOUNT ONLY THE ORDERS WHERE THE PRICE IS MORE THAN 5000.
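No answer follows in the source; a sketch that counts units only from line items priced above 5000:

```sql
-- Items with more than 50 units sold, counting only lines priced over 5000
SELECT *
FROM ITEMS
WHERE ITEMNO IN
    ( SELECT ITEMNO
      FROM LINEITEMS
      WHERE PRICE > 5000
      GROUP BY ITEMNO
      HAVING SUM(QTY) > 50 );
```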
SELECT * FROM CUSTOMERS
WHERE CUSTNO IN
    ( SELECT CUSTNO
      FROM ORDERS
      GROUP BY CUSTNO
      HAVING COUNT(*) = ( SELECT MAX(COUNT(*))
                          FROM ORDERS
                          GROUP BY CUSTNO )
    );
DISPLAY DETAILS OF ITEMS FOR WHICH THERE WAS NO ORDER IN THE CURRENT
MONTH BUT THERE WAS AN ORDER IN THE PREVIOUS MONTH.
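The worked answer is missing; a sketch using TO_CHAR to compare month-and-year, and ADD_MONTHS to reach the previous month:

```sql
-- Items ordered last month but not in the current month
SELECT *
FROM ITEMS
WHERE ITEMNO NOT IN
    ( SELECT ITEMNO
      FROM LINEITEMS L, ORDERS O
      WHERE L.ORDNO = O.ORDNO
        AND TO_CHAR(O.ORDDATE, 'MMYYYY') = TO_CHAR(SYSDATE, 'MMYYYY') )
  AND ITEMNO IN
    ( SELECT ITEMNO
      FROM LINEITEMS L, ORDERS O
      WHERE L.ORDNO = O.ORDNO
        AND TO_CHAR(O.ORDDATE, 'MMYYYY') = TO_CHAR(ADD_MONTHS(SYSDATE, -1), 'MMYYYY') );
```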
DISPLAY THE ORDERS IN WHICH THE GAP BETWEEN SHIPDATE AND ORDERDATE IS
MORE THAN THE AVERAGE GAP FOR INDIVIDUAL CUSTOMERS.
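The source gives no solution; a sketch using a correlated subquery so each order is compared against its own customer's average gap:

```sql
-- Orders whose ship gap exceeds the customer's average gap
SELECT *
FROM ORDERS O1
WHERE SHIPDATE - ORDDATE >
    ( SELECT AVG(SHIPDATE - ORDDATE)
      FROM ORDERS O2
      WHERE O2.CUSTNO = O1.CUSTNO );
```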
DISPLAY THE DETAILS OF ITEMS IN WHICH THE CURRENT PRICE IS MORE THAN THE
MAXIMUM PRICE AT WHICH WE SOLD IT.
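No answer appears in the source; a sketch with a correlated subquery over LINEITEMS:

```sql
-- Items whose current rate exceeds the highest price we ever sold them at
SELECT *
FROM ITEMS I
WHERE RATE > ( SELECT MAX(PRICE)
               FROM LINEITEMS L
               WHERE L.ITEMNO = I.ITEMNO );
```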
ADD A NEW ITEM TO THE LAST ORDER PLACED BY CUSTOMER 106 WITH THE FOLLOWING DETAILS: ITEMNO-3, QUANTITY-2, PRICE AS THE CURRENT RATE OF THE ITEM, DISCOUNT-8%.
DECLARE
    V_ORDNO ORDERS.ORDNO%TYPE;
    V_RATE  ITEMS.RATE%TYPE;
BEGIN
    SELECT MAX(ORDNO) INTO V_ORDNO
    FROM ORDERS WHERE CUSTNO = 106;
    SELECT RATE INTO V_RATE
    FROM ITEMS WHERE ITEMNO = 3;
    INSERT INTO LINEITEMS
    VALUES (V_ORDNO, 3, 2, V_RATE, 8);
END;
DECLARE
    V_APRICE LINEITEMS.PRICE%TYPE;
    V_RATE   ITEMS.RATE%TYPE;
BEGIN
    SELECT AVG(PRICE) INTO V_APRICE
    FROM LINEITEMS WHERE ITEMNO = 5;
DECLARE
    V_ORDNO  ORDERS.ORDNO%TYPE;
    V_PRICE  LINEITEMS.PRICE%TYPE;
    V_DIS    NUMBER(2);
    V_RATE   ITEMS.RATE%TYPE;
    V_ITEMNO ITEMS.ITEMNO%TYPE;
BEGIN
    SELECT MAX(ORDNO) INTO V_ORDNO
    FROM ORDERS WHERE CUSTNO = 102;
END;
DECLARE
    V_MAXORDNO ORDERS.ORDNO%TYPE;
    V_MINORDNO ORDERS.ORDNO%TYPE;
    V_CNT      NUMBER(2);
BEGIN
    SELECT MAX(ORDNO), MIN(ORDNO) INTO V_MAXORDNO, V_MINORDNO
    FROM ORDERS;
    FOR I IN V_MINORDNO..V_MAXORDNO
    LOOP
        SELECT COUNT(*) INTO V_CNT
        FROM ORDERS WHERE ORDNO = I;
        IF V_CNT = 0 THEN
            DBMS_OUTPUT.PUT_LINE(I);
            EXIT;
        END IF;
    END LOOP;
END;
DISPLAY CUSTOMER NAMES OF THE CUSTOMERS WHO HAVE PLACED MORE THAN 3 ORDERS WHERE THE TOTAL AMOUNT OF THE ORDER IS MORE THAN 10,000.
SELECT CUSTNAME
FROM CUSTOMERS
WHERE CUSTNO IN
    ( SELECT CUSTNO
      FROM ORDERS
      WHERE ORDNO IN
          ( SELECT ORDNO
            FROM LINEITEMS
            GROUP BY ORDNO
            HAVING SUM(QTY * PRICE) > 10000 )
      GROUP BY CUSTNO
      HAVING COUNT(*) > 3 );
CHANGE THE RATE OF EACH ITEM AS FOLLOWS: (1) INCREASE THE RATE BY 10% IF THE ITEM WAS SOLD IN MORE THAN 5 ORDERS. (2) OTHERWISE, INCREASE THE RATE BY 2% IF THE AVERAGE PRICE IS GREATER THAN THE CURRENT RATE, AND DECREASE THE RATE BY 3% IF NOT.
DECLARE
    CURSOR CITEMS IS
        SELECT ITEMNO, COUNT(*) CNT, AVG(PRICE) APRICE
        FROM LINEITEMS
        GROUP BY ITEMNO;
    V_PER  NUMBER(5,2);
    V_RATE ITEMS.RATE%TYPE;
BEGIN
    FOR REC IN CITEMS
    LOOP
        -- GET CURRENT RATE
        SELECT RATE INTO V_RATE
        FROM ITEMS WHERE ITEMNO = REC.ITEMNO;
        IF REC.CNT > 5 THEN
            V_PER := 1.10;
        ELSIF REC.APRICE > V_RATE THEN
            V_PER := 1.02;
        ELSE
            V_PER := 0.97;
        END IF;
        UPDATE ITEMS SET RATE = RATE * V_PER
        WHERE ITEMNO = REC.ITEMNO;
    END LOOP;
END;
CREATE A NEW TABLE CALLED CUSTSUM AND STORE THE FOLLOWING DATA
INTO THE TABLE - CUSTOMERNO,CUSTOMER NAME,NO.OF ORDERS PLACED,
DATE OF MOST RECENT ORDER AND TOTAL AMOUNT OF ALL THE ORDERS.
DECLARE
    CURSOR CUSTCUR IS
        SELECT CUSTNO, CUSTNAME FROM CUSTOMERS;
    V_ORDCNT   NUMBER(5);
    V_MORDDATE DATE;
    V_TOTAMT   NUMBER(10);
BEGIN
    FOR REC IN CUSTCUR
    LOOP
        -- GET DETAILS OF CUSTOMER
        SELECT COUNT(*), MAX(ORDDATE), SUM(QTY * PRICE)
        INTO V_ORDCNT, V_MORDDATE, V_TOTAMT
        FROM ORDERS O, LINEITEMS L
        WHERE O.ORDNO = L.ORDNO AND CUSTNO = REC.CUSTNO;
        INSERT INTO CUSTSUM
        VALUES (REC.CUSTNO, REC.CUSTNAME, V_ORDCNT, V_MORDDATE, V_TOTAMT);
    END LOOP;
END;
DISPLAY ITEMNAMES OF ITEMS FOR WHICH THE CURRENT PRICE IS LESS THAN THE
AVERAGE PRICE OR TOTAL QUANTITY SOLD IS LESS THAN 10 UNITS.
SELECT ITEMNAME
FROM ITEMS
WHERE ITEMNO IN
    ( SELECT ITEMNO
      FROM ITEMS I
      WHERE RATE < ( SELECT AVG(PRICE)
                     FROM LINEITEMS
                     WHERE ITEMNO = I.ITEMNO )
    )
OR ITEMNO IN
    ( SELECT ITEMNO
      FROM LINEITEMS
      GROUP BY ITEMNO
      HAVING SUM(QTY) < 10 );
INSERT INTO LINEITEMS
VALUES ( P_ORDNO, P_ITEMNO, 1, V_RATE, 10 );
END;
BEGIN
    SELECT MAX(ORDNO), MIN(ORDNO) INTO V_MAXORDNO, V_MINORDNO
    FROM ORDERS;
    FOR I IN V_MINORDNO..V_MAXORDNO
    LOOP
        SELECT COUNT(*) INTO V_CNT
        FROM ORDERS WHERE ORDNO = I;
        IF V_CNT = 0 THEN
            RETURN I;
        END IF;
    END LOOP;
    -- NO MISSING ORDNO
    RETURN NULL;
END;
RETURN V_CUSTNAME;
END;
-- CHECK CONDITIONS
IF V_CNT = 1 THEN
    RAISE_APPLICATION_ERROR(-20002, 'ITEM IS ALREADY EXISTING');
END IF;
SELECT SUM(QTY * PRICE) INTO V_TOTAMT
FROM LINEITEMS WHERE ORDNO = V_ORDNO;
IF V_TOTAMT > 50000 THEN
    RAISE_APPLICATION_ERROR(-20003, 'TOTAL AMOUNT EXCEEDED 50000');
END IF;
P_PRICE, P_QTY, 0 );
END;
DO NOT ALLOW ANY CHANGE TO THE ITEM RATE IN SUCH A WAY THAT THE DIFFERENCE IS MORE THAN 25% OF THE EXISTING RATE.
CREATE OR REPLACE TRIGGER TRGDIFFRATE
BEFORE UPDATE
ON ITEMS
FOR EACH ROW
DECLARE
    V_DIFF NUMBER(5);
BEGIN
    V_DIFF := ABS(:NEW.RATE - :OLD.RATE);
    IF V_DIFF > :OLD.RATE * 0.25 THEN
        RAISE_APPLICATION_ERROR(-20004, 'RATE CHANGE EXCEEDS 25% OF EXISTING RATE');
    END IF;
END;
The database development life cycle (DDLC) is a process of designing, implementing and maintaining a database system to meet the strategic or operational information needs of an organization or enterprise.
PHASES OF DDLC
Software development is the group of activities needed to transform the user's needs into an effective software solution. The software development process consists of the activities needed for building software systems and integrating the techniques and practices to be adopted. It also includes project planning, progress tracking and managing the complexity of building software.
These database-related activities can be grouped into the phases below (together known as the DDLC, the Database Development Life Cycle):
Requirements Analysis
Database Design
Evaluation and Selection
Logical Database Design
Physical Database Design
Implementation
Data Loading
Testing and Performance Tuning
Operation
Maintenance
Requirements Analysis
The most important step in implementing a database system is to find out what is needed: what type of database is required for the business organization, the daily volume of data, how much data needs to be stored in the master files, etc. In order to collect all this required information, a database analyst needs to spend a lot of time within the business organization talking to people and end users and getting acquainted with the day-to-day processes.
Database Design
In this stage the database designers will make a decision on the database model that is perfectly
suited for the organization’s requirements. The database designers will study the documents
prepared by the analysts in the requirements analysis stage and then start developing a system
that fulfills the needs.
Evaluation and Selection
Once the data model is designed, tested and demonstrated, the next phase is to evaluate the diverse database management systems and choose the one that is best suited for the requirements of the organization. In order to identify the best performing database for the organization, end users should be involved in this phase.
Logical Database Design
Once the evaluation and selection phase is completed successfully, the next step in the database development life cycle is logical database design. The conceptual design is translated into an internal model in the logical design phase. This includes the mapping of all objects, i.e. table design, indexes, views, transactions, access privileges etc.
Physical Database Design
Physical database design is the procedure of selecting and characterizing the data storage and data access of the database. The data storage depends on the type of devices supported by the hardware, the data access methods and the DBMS.
Physical design is especially significant for older database models like the hierarchical and network models. Physical design is very vital in the database development life cycle, as a bad design can result in poor performance.
Implementation
In most databases a new database implementation needs the creation of special storage-related constructs to house the end user tables. These constructs typically comprise storage groups, tablespaces, data files, tables etc.
Data Loading
Once the database has been created, the data must be loaded into the database. The data may need to be converted and migrated to the new database if it is currently stored in a different system or in a different format.
Testing and Performance Tuning
The next phase is testing and performance tuning; this phase starts as soon as the data is loaded into the database. In this phase, the database is tested and fine-tuned for performance, integrity, access and security constraints. It is very important that the database administrators and application programmers work together during this phase, because testing and performance tuning happen in parallel.
Operation
Once the data is loaded into the database and it is fully tested, the database is then released into production.
In the operation phase, the database is accessed by the end users and application programs. This stage includes the addition of new data, modification of existing data and deletion of obsolete data. The database administrators perform administrative tasks periodically, such as performance tuning, expanding storage space, database backup etc. This is a crucial phase, as it provides useful information and helps management make business decisions, thus ensuring the smooth and well-organized functioning of the organization.
Maintenance
Database maintenance phase is very important and it is one of the ongoing phases in DDLC.
Factors such as new business needs, new information requirements, acquisition of new data etc. will make it essential to formulate ongoing changes and improvements to the existing design. The major tasks in this phase include: database backup and recovery, performance tuning, design modifications, access management and audits, usage monitoring, hardware maintenance, upgrades etc.
Functional Dependencies
FDs are constraints on well-formed relations and represent a formalism on the infrastructure of a relation.
Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where
X and Y are subsets of attributes of R. An FD is a relationship between an attribute "Y" and a
determinant (1 or more other attributes) "X" such that for a given value of a determinant the
value of the attribute is uniquely defined.
X is a determinant
X determines Y
Y is functionally dependent on X
X→Y
X → Y is trivial if Y ⊆ X
Example:
Let R be NewStudent(stuId, lastName, major, credits, status, ...). Some sample FDs:
ZipCode → AddressCity
ArtistName → BirthYear
Author, Title → PublDate
Functional dependency between attributes eliminates the repetition of information. Functional dependency is related to a candidate key, which uniquely identifies a tuple and determines the values of all other attributes in the relation. In some cases, a set of functional dependencies is irreducible if: the right-hand set of each functional dependency holds only one attribute; the left-hand set of each functional dependency cannot be reduced, since this may change the entire content of the set; and reducing any of the existing functional dependencies might change the content of the set. An important property of functional dependencies is Armstrong's axioms, which are used in database normalization. In a relation R with three attributes (X, Y, Z), Armstrong's axioms hold if the following conditions are satisfied:
Axiom of Reflexivity (Subset Property): If Y is a subset of X, then X → Y.
Axiom of Augmentation: If X → Y, then XZ → YZ.
Axiom of Transitivity: If X → Y and Y → Z, then X → Z.
Dependency Preservation
o Let R1, R2, ..., Rn be a decomposition of R.
o The restriction of F to Ri is the set Fi of all functional dependencies in F+ that include only attributes of Ri.
o Functional dependencies in a restriction can be tested in one relation, as they involve attributes in one relation schema.
o The set of restrictions F1, F2, ..., Fn is the set of dependencies that can be checked efficiently.
o We need to know whether testing only the restrictions is sufficient.
o Let F' = F1 ∪ F2 ∪ ... ∪ Fn.
o F' is a set of functional dependencies on schema R, but in general, F' ≠ F.
o However, it may be that F'+ = F+.
o If this is so, then every functional dependency in F is implied by F', and if F' is satisfied, then F must also be satisfied.
o A decomposition having the property F'+ = F+ is a dependency-preserving decomposition.
MULTIVALUED DEPENDENCIES
Multivalued dependencies occur when the presence of one or more rows in a table implies the
presence of one or more other rows in that same table.
Examples:
For example, imagine a car company that manufactures many models of car, but always makes
both red and blue colors of each model. If you have a table that contains the model name, color
and year of each car the company manufactures, there is a multivalued dependency in that table.
If there is a row for a certain model name and year in blue, there must also be a similar row
corresponding to the red version of that same car.
NORMALIZATION
ANOMALIES IN DBMS:
Insertion Anomaly
It is a failure to place information about a new database entry into all the places in the database where information about that new entry needs to be stored. In a properly normalized database, information about a new entry needs to be inserted into only one place in the database; in an inadequately normalized database, information about a new entry may need to be inserted into more than one place, and, human fallibility being what it is, some of the needed additional insertions may be missed.
Deletion anomaly
It is a failure to remove information about an existing database entry when it is time to remove that entry. In a properly normalized database, information about an old, to-be-removed entry needs to be deleted from only one place in the database; in an inadequately normalized database, information about that old entry may need to be deleted from more than one place.
Update Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus
“update anomalies” can be either of the kinds discussed above.
All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption of the database. Properly normalized databases are much less susceptible to corruption than are un-normalized databases.
Normalization Avoids
Duplication of Data – The same data is listed in multiple lines of the database
Insert Anomaly – A record about an entity cannot be inserted into the table without
first inserting information about another entity – Cannot enter a customer without a
sales order
Delete Anomaly – A record cannot be deleted without deleting a record about a
related entity. Cannot delete a sales order without deleting all of the customer’s
information.
Update Anomaly – Cannot update information without changing information in many
places. To update customer information, it must be updated for each sales order the
customer has placed
Process of normalization:
Before getting to know the normalization techniques in detail, let us define a few building blocks which are used to define normal forms.
1. Determinant: An attribute (or set of attributes) whose value determines the value of another attribute in the relation.
Example: In the RESULT relation, the Marks attribute may decide the Grade attribute. This is represented as Marks -> Grade and read as "Marks decides Grade".
Marks -> Grade
In the RESULT relation, the Marks attribute is not a key attribute. Hence it can be concluded that key attributes are determinants, but not all determinants are key attributes.
2. Functional Dependency: Functional dependency has a formal definition, but let us first try to understand the concept by example. Consider the following relation:
REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade)
Where:
Student# - Student Number
Course# - Course Number
CourseName - Course Name
IName - Name of the instructor who delivered the course
Room# - Room number which is assigned to the respective instructor
Marks - Marks scored in course Course# by student Student#
Grade - Grade obtained by student Student# in course Course#
Student# and Course# together (called a composite attribute) define EXACTLY ONE value of Marks. This can be symbolically represented as
Student#Course# -> Marks
This type of dependency is called functional dependency. In the above example, Marks is functionally dependent on Student#Course#.
Other functional dependencies in the above example are:
Course# -> CourseName
Course# -> IName
IName -> Room#
Formally we can define functional dependency as: In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines exactly one value of Y. This is represented as:
X -> Y
However, X may be composite in nature.
CourseName is not fully functionally dependent on Student#Course# because one of the subsets, Course#, alone determines CourseName, and Student# has no role in deciding CourseName. Hence CourseName is not fully functionally dependent on Student#Course#.
(Diagram: Student# and Course# together determine Marks.)
3. Full functional dependency: In a given relation R, X and Y are attributes. Y is fully functionally dependent on attribute X only if it is not functionally dependent on any subset of X. However, X may be composite in nature.
(Diagram: Course# alone determines CourseName, IName and Room#, even though Student#Course# is the key.)
4. Partial dependency: In a given relation R, X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a subset of attribute X. However, X may be composite in nature.
(Diagram: Course# -> IName -> Room#, a transitive dependency.)
5. Transitive dependency: Similarly, Grade depends on Marks, and in turn Marks depends on Student#Course#; hence Grade transitively depends on Student#Course#.
6. Key attributes: In a given relation R, if the attribute X uniquely defines all other attributes, then the attribute X is a key attribute, which is nothing but the candidate key.
Ex: Student#Course# together is a composite key attribute which determines all attributes in the relation REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade) uniquely. Hence Student# and Course# are key attributes.
Unnormalized Form (UNF)
If a table contains non-atomic values in each row, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:
In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us the unique identification
facility for any single row. Further, each primary key points to a variable length record (3 for
E01, 2 for E02 and 4 for E03).
First Normal Form (1NF)
A relation is said to be in 1NF if it contains no non-atomic values and each row provides a unique combination of values. The above table in UNF can be processed to create the following table in 1NF.
Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
E01     AA        Feb    1200   B01      SBI
E01     AA        Mar    850    B01      SBI
E02     BB        Jan    2200   B02      UTI
E02     BB        Feb    2500   B02      UTI
E03     CC        Jan    1700   B01      SBI
E03     CC        Feb    1800   B01      SBI
E03     CC        Mar    1850   B01      SBI
E03     CC        Apr    1725   B01      SBI
As you can see now, each row contains a unique combination of values. Unlike in UNF, this relation contains only atomic values, i.e. the rows cannot be further decomposed, so the relation is now in 1NF.
Second Normal Form (2NF)
A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends on the primary key of the relation. Speaking inversely, if a table has some attributes which are not dependent on the primary key of that table, then it is not in 2NF.
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be moved into another related relation, it would come to 2NF.
Bank-Id  Bank-Name
B01      SBI
B02      UTI
After moving the portion into another relation we store a lesser amount of data in two relations without any loss of information. There is also a significant reduction in redundancy.
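The 2NF decomposition described above can be sketched as DDL. The column names follow the example; the datatypes are assumptions, since the source never defines them:

```sql
-- Bank data moved to its own relation (determined by Bank-Id alone)
CREATE TABLE BANK (
    BANKID   VARCHAR2(5) PRIMARY KEY,
    BANKNAME VARCHAR2(30)
);

-- The remaining attributes keep only the Bank-Id as a foreign key
CREATE TABLE EMPSALES (
    EMPID   VARCHAR2(5),
    EMPNAME VARCHAR2(30),
    MONTH   VARCHAR2(10),
    SALES   NUMBER(8),
    BANKID  VARCHAR2(5) REFERENCES BANK
);
```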
Third Normal Form (3NF)
A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency
in that relation. Speaking inversely, if a table contains transitive dependency, then it is not in
3NF, and the table must be split to bring it into 3NF.
Such derived dependencies hold well in most of the situations. For example if we
have Roll → Marks
And
Marks → Grade
Then we may safely
derive Roll → Grade.
This third dependency was not originally specified but we have derived it.
The derived dependency is called a transitive dependency when such dependency becomes
improbable. For example we have been given
Roll → City
And
City → STDCode
If we try to derive Roll → STDCode it becomes a transitive dependency, because obviously the
STDCode of a city cannot depend on the roll number issued by a school or college. In such a
case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
And
City → STD code
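The split can be sketched as DDL; the table and column names here are illustrative, and the datatypes are assumptions:

```sql
-- Each relation now holds exactly one of the two dependencies
CREATE TABLE CITY_STD (
    CITY    VARCHAR2(30) PRIMARY KEY,  -- City -> STDCode
    STDCODE VARCHAR2(10)
);

CREATE TABLE STUDENT (
    ROLL NUMBER(5) PRIMARY KEY,        -- Roll -> City
    CITY VARCHAR2(30) REFERENCES CITY_STD
);
```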
Boyce-Codd Normal Form (BCNF)
A relation is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. There could be situations when a 3NF relation is not in BCNF, if certain conditions are found true.
Professor Code  Department   Head of Dept.  Percent Time
P1              Physics      Ghosh          50
P1              Mathematics  Krishnan       50
P2              Chemistry    Rao            25
P2              Physics      Ghosh          75
P3              Mathematics  Krishnan       100
The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information
that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown in the following.
Professor Code  Department   Percent Time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100

Department   Head of Dept.
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao
When attributes in a relation have multi-valued dependency, further normalization to 4NF and 5NF is required. Let us first find out what multi-valued dependency is.
A multi-valued dependency is a particular kind of dependency in which each and every attribute within a relation depends upon the other, yet none of them is a unique primary key.
We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. A multi-valued dependency exists here because all the attributes depend upon each other and yet none of them is a primary key having a unique value.
Vendor Code  Item Code  Project No.
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row
with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned
above. The problem is reduced by expressing this relation as two relations in the Fourth Normal
Form (4NF). A relation is in 4NF if it has no more than one independent multi valued
dependency or one independent multi valued dependency with a functional dependency.
The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of supplying certain items and that they are assigned to supply for some projects is independently specified in the 4NF relations.
Vendor-Supply
Vendor Code  Item Code
V1           I1
V1           I2
V2           I2
V2           I3
V3           I1

Vendor-Project
Vendor Code  Project No.
V1           P1
V1           P3
V2           P1
V3           P2
V3           P3
Fifth Normal Form (5NF)
These relations still have a problem. While defining the 4NF we mentioned that all the attributes depend upon each other. While creating the two tables in the 4NF, although we have preserved the dependencies between Vendor Code and Item Code in the first table and between Vendor Code and Project No. in the second table, we have lost the relationship between Item Code and Project No. If there were a primary key then this loss of dependency would not have occurred. In order to revive this relationship we must add a new table like the following. Please note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.
Project No.  Item Code
P1           I1
P1           I2
P2           I1
P3           I1
P3           I3
QUERY PROCESSING
Query processing includes translation of high-level queries into low-level expressions that can
be used at the physical level of the file system, query optimization and actual execution of the
query to get the result. It is a three-step process that consists of parsing and translation,
optimization and execution of the query submitted by the user.
A query is processed in four general steps:
1. Scanning and Parsing
2. Query Optimization or planning the execution strategy
3. Query Code Generator (interpreted or compiled)
4. Execution in the runtime database processor
For any given query, there may be a number of different ways to execute it.
Each operation in the query (SELECT, JOIN, etc.) can be implemented using one or more
different
Access Routines.
For example, an access routine that employs an index to retrieve some rows would be more efficient than an access routine that performs a full table scan.
The goal of the query optimizer is to find a reasonably efficient strategy for executing
the query (not quite what the name implies) using the access routines.
Optimization typically takes one of two forms: Heuristic Optimization or Cost Based
Optimization
In Heuristic Optimization, the query execution is refined based on heuristic rules for
reordering the individual operations.
With Cost Based Optimization, the overall cost of executing the query is
systematically reduced by estimating the costs of executing several different execution
plans.
Once the query optimizer has determined the execution plan (the specific ordering of
access routines), the code generator writes out the actual access routines to be executed.
With an interactive session, the query code is interpreted and passed directly to the
runtime database processor for execution.
It is also possible to compile the access routines and store them for later execution.
4. Execution in the runtime database processor
At this point, the query has been scanned, parsed, planned and (possibly) compiled.
The runtime database processor then executes the access routines against the database.
The results are returned to the application that made the query in the first place.
Any runtime errors are also returned.
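In Oracle, the plan chosen by the optimizer can be inspected before the query is run, which ties the steps above to something observable:

```sql
-- Ask the optimizer to plan (but not execute) a query
EXPLAIN PLAN FOR
SELECT CUSTNAME FROM CUSTOMERS WHERE CUSTNO = 102;

-- Show the chosen access routines (index scan vs. full table scan, etc.)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```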
Query Optimization
1. Cast the query into some internal representation, such as a query tree structure.
2. Convert the internal representation to canonical form.
*A subset (say C) of a set of queries (say Q) is said to be a set of canonical forms for Q if and only
if every query Q is equivalent to just one query in C.
During this step, some optimization is already achieved by transforming the internal
representation to a better canonical form.
3. Choose candidate low-level procedures, considering possible improvements.
*Access path selection: Consider the query expression as a series of basic operations (join, restriction, etc.); the optimizer then chooses from a set of pre-defined, low-level implementation procedures. These procedures may involve the use of primary keys, foreign keys or indexes and other information about the database.
4. Generate query plans and choose the cheapest, by constructing a set of candidate query plans first and then choosing the best plan. Picking the best plan is achieved by assigning a cost to each candidate plan. The cost is computed according to the number of disk I/O's involved.
MODULE 3
Database security
Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. It involves various types or categories of controls,
such as technical, procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk management.
Many layers and types of information security control are appropriate to databases, including:
Access control
Auditing
Authentication
Encryption
Integrity controls
Backups
Application security
Traditionally databases have been largely secured against hackers through network security
measures such as firewalls, and network-based intrusion detection systems. While network
security controls remain valuable in this regard, securing the database systems themselves, and the
programs/functions and data within them, has arguably become more critical as networks are
increasingly opened to wider access, in particular access from the Internet. Furthermore, system,
program, function and data access controls, along with the associated user identification,
authentication and rights management functions, have always been important to limit and in some
cases log the activities of authorized users and administrators. In other words, these are
complementary approaches to database security, working from both the outside-in and the
inside-out as it were.
Many organizations develop their own "baseline" security standards and designs detailing basic
security control measures for their database systems. These may reflect general information
security requirements or obligations imposed by corporate information security policies and
applicable laws and regulations (e.g. concerning privacy, financial management and reporting
systems), along with generally-accepted good database security practices (such as appropriate
hardening of the underlying systems) and perhaps security recommendations from the relevant
database system and software vendors. The security designs for specific database systems
typically specify further security administration and management functions (such as
administration and reporting of user access rights, log management and analysis, database
replication/synchronization and backups) along with various business-driven information security
controls within the database programs and functions (e.g. data entry validation and audit trails).
Furthermore, various security-related activities (manual controls) are normally incorporated into
the procedures, guidelines etc. relating to the design, development, configuration, use,
management and maintenance of databases.
Recovery
After system crashes or security violations, recovery techniques are used to reconstruct correct or
valid database states.
Auditing
The requirement to keep records of all security relevant actions issued by a user is called auditing.
Resulting audit records are the basis for further reviews and examinations in order to test the
adequacy of system controls and to recommend any changes in the security policy.
Locking
A lock is a system object associated with a shared resource such as a data item of an elementary
type, a row in a database, or a page of memory. In a database, a lock on a database object (a data-
access lock) may need to be acquired by a transaction before accessing the object. Correct use of
locks prevents undesired, incorrect or inconsistent operations on shared resources by other
concurrent transactions. When a database object with an existing lock acquired by one transaction
needs to be accessed by another transaction, the existing lock for the object and the type of
the
intended access are checked by the system. If the existing lock type does not allow this specific
attempted concurrent access type, the transaction attempting access is blocked (according to a
predefined agreement/scheme). In practice a lock on an object does not directly block a
transaction's operation upon the object, but rather blocks that transaction from acquiring another
lock on the same object, needed to be held/owned by the transaction before performing this
operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper
lock blocking scheme, which indicates which lock type blocks which lock type.
The common interactions between these lock types are defined by blocking behavior as follows:
An existing write-lock on a database object blocks an intended write upon the same object
(already requested/issued) by another transaction by blocking a respective write-lock from
being acquired by the other transaction. The second write-lock will be acquired and the
requested write of the object will take place (materialize) after the existing write-lock is
released.
A write-lock blocks an intended (already requested/issued) read by another transaction by
blocking the respective read-lock.
A read-lock blocks an intended write by another transaction by blocking the respective
write-lock.
A read-lock does not block an intended read by another transaction. The respective read-
lock for the intended read is acquired (shared with the previous read) immediately after
the intended read is requested, and then the intended read itself takes place.
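The blocking behavior above reduces to a small compatibility table: only two read-locks can coexist on the same object. A sketch, using a plain dictionary rather than any real DBMS lock manager:

```python
# Sketch of the read/write lock compatibility rules described above.
# True means the requested lock can be granted while the held lock exists.
COMPATIBLE = {
    ("read",  "read"):  True,   # read-locks are shared
    ("read",  "write"): False,  # a read-lock blocks a write-lock
    ("write", "read"):  False,  # a write-lock blocks a read-lock
    ("write", "write"): False,  # a write-lock blocks another write-lock
}

def can_grant(held_lock, requested_lock):
    """Check whether a requested lock is compatible with a held lock."""
    return COMPATIBLE[(held_lock, requested_lock)]

assert can_grant("read", "read")        # shared reads proceed
assert not can_grant("write", "read")   # reader must wait for the writer
```

A transaction whose request is not compatible is blocked until the conflicting lock is released, exactly as the text describes.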
TRANSACTION
A transaction is a set of changes that must all be made together. It is a program unit whose
execution may or may not change the contents of a database. A transaction is executed as a single
unit. If the database was in a consistent state before a transaction, then after execution of the
transaction the database must also be in a consistent state. For example, a transfer of money from
one bank account to another requires two changes to the database; both must succeed or fail
together.
Example:
You are working on a system for a bank. A customer goes to the ATM and instructs it to transfer
Rs. 1000 from savings to a checking account. This simple transaction requires two steps:
• Subtracting the money from the savings account balance.
• Adding the money to the checking account balance.
The code to create this transaction will require two updates to the database. For example, there
will be two SQL statements: one UPDATE command to decrease the balance in savings and a
second UPDATE command to increase the balance in the checking account.
You have to consider what would happen if the machine crashed between these two operations. The
money that has already been subtracted from the savings account will not be added to the checking
account; it is lost. You might consider performing the addition to checking first, but then the
customer ends up with extra money and the bank loses. The point is that both changes must be
made successfully. Thus, a transaction is defined as a set of changes that must be made together.
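The transfer above can be sketched with Python's built-in `sqlite3` module. The table and column names are invented for illustration; the point is that the two UPDATE statements commit together, and any failure rolls both back:

```python
import sqlite3

# Sketch of the savings-to-checking transfer as a single transaction.
# Table/column names are invented; amounts are in rupees.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("savings", 5000), ("checking", 1000)])
conn.commit()

def transfer(conn, amount):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? "
                     "WHERE name = 'savings'", (amount,))
        conn.execute("UPDATE accounts SET balance = balance + ? "
                     "WHERE name = 'checking'", (amount,))
        conn.commit()      # both updates become durable together
    except sqlite3.Error:
        conn.rollback()    # on any failure, neither update survives
        raise

transfer(conn, 1000)
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```

If a crash occurred between the two UPDATEs, the uncommitted first update would be rolled back on restart, so the money is never lost in transit.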
States of Transaction
Active: the initial state, the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
Committed: after successful completion.
We say that a transaction has committed only if it has entered the committed state. Similarly, we
say that a transaction has aborted only if it has entered the aborted state. A transaction is said to
have terminated if it has either committed or aborted.
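The five states form a small state machine. A sketch of the allowed transitions, purely to make the lifecycle concrete:

```python
# Sketch of the transaction state machine described above. Each state
# maps to the set of states it may legally move to next.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),   # terminal state
    "aborted":             set(),   # terminal state
}

def step(state, next_state):
    """Move to next_state, rejecting transitions the model forbids."""
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

s = "active"
s = step(s, "partially committed")  # final statement executed
s = step(s, "committed")            # updates safely on disk
```

Note that a transaction in the partially committed state can still move to failed, which is exactly the hardware-failure scenario the text goes on to describe.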
A transaction starts in the active state. When it finishes its final statement, it enters the partially
committed state. At this point, the transaction has completed its execution, but it is still possible
that
it may have to be aborted, since the actual output may still be temporarily residing in main
memory, and thus a hardware failure may preclude its successful completion.
The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.
Most of what we’re calling transactional locking relates to the ability of a database management
system (DBMS) to ensure reliable transactions that adhere to these ACID properties. ACID is an
acronym that stands for Atomicity, Consistency, Isolation, and Durability. Each of these
properties is described in more detail below. However, all of these properties are related and must
be considered together. They are more like different views of the same object than independent
things.
Atomicity
Atomicity means all or nothing. Transactions often contain multiple separate actions. For
example, a transaction may insert data into one table, delete from another table, and update a third
table. Atomicity ensures that either all of these actions occur or none at all.
Consistency
Consistency means that transactions always take the database from one consistent state to
another. So, if a transaction violates the database's consistency rules, then the entire transaction
will be rolled back.
Isolation
Isolation means that concurrent transactions, and the changes made within them, are not visible to
each other until they complete. This avoids many problems, including those that could lead to
violation of other properties. The implementation of isolation differs considerably between
DBMSs. This is also the property most often related to locking problems.
Durability
Durability means that committed transactions will not be lost, even in the event of abnormal
termination. That is, once a user or program has been notified that a transaction was committed,
they can be certain that the data will not be lost.
A timestamp is a tag that can be attached to any transaction or any data item, which denotes a
specific time on which the transaction or data item had been activated in any way. We, who use
computers, must all be familiar with the concepts of “Date Created” or “Last Modified” properties
of files and folders. Well, timestamps are things like that.
A timestamp can be implemented in two ways. The simplest one is to directly assign the
current value of the clock to the transaction or the data item. The other policy is to attach the
value of a logical counter that keeps incrementing as new timestamps are
required. The timestamp of a transaction denotes the time when it was first activated. The
timestamp of a data item can be of the following two types:
W-timestamp (Q): This means the latest time when the data item Q has been written into.
R-timestamp (Q): This means the latest time when the data item Q has been read from.
These two timestamps are updated each time a successful read/write operation is performed on
the data item Q.
The timestamp ordering protocol ensures that any pair of conflicting read/write operations will be
executed in their respective timestamp order. This is an alternative solution to using locks.
For a read operation issued by transaction T on data item Q:
1. If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data item
Q which has already been overwritten by some other transaction. Hence the value which T
wanted to read from Q does not exist there anymore, and T would be rolled back.
2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item
Q which has been written and committed by some other transaction earlier. Hence T will
be allowed to read the value of Q, and the R-timestamp of Q should be updated to TS (T).
For a write operation issued by transaction T on data item Q:
1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for
transaction T to write its value, and the delay has become so great that it has allowed
another transaction to read the old value of data item Q. In such a case T has lost its
relevance and will be rolled back.
2. Else if TS (T) < W-timestamp (Q), then transaction T has delayed so much that the
system has allowed another transaction to write into the data item Q. In such a case too, T
has lost its relevance and will be rolled back.
3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS (T).
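The read and write rules above can be sketched directly. This is an illustrative model only, with timestamps as plain integers and rollback signalled by an exception:

```python
# Sketch of the basic timestamp-ordering checks described above. Each
# data item carries an R-timestamp and a W-timestamp; ts_t is TS(T),
# the timestamp of the transaction issuing the operation.

class Rollback(Exception):
    """Raised when a transaction must be rolled back and restarted."""

def to_read(ts_t, item):
    if ts_t < item["W"]:            # value already overwritten by a later txn
        raise Rollback
    item["R"] = max(item["R"], ts_t)

def to_write(ts_t, item):
    if ts_t < item["R"] or ts_t < item["W"]:   # T arrived too late
        raise Rollback
    item["W"] = ts_t

q = {"R": 0, "W": 0}
to_read(5, q)       # allowed; R-timestamp of Q becomes 5
to_write(7, q)      # allowed; W-timestamp of Q becomes 7
try:
    to_read(6, q)   # TS(T) = 6 < W-timestamp = 7: T must be rolled back
except Rollback:
    print("transaction rolled back")
```

Because every conflicting pair of operations is forced to execute in timestamp order, no locks are needed; the cost is that late-arriving transactions are rolled back instead of made to wait.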
Transaction Failure.
Transaction failure can occur when some, but not all, physical data bases are updated at
the same time.
System Failure.
System failure can be caused by bugs in the data base software, operating system, or hardware. In
each case, transaction processing is terminated without control of the application. Data in main
memory is lost; however, disk storage remains stable. The system must recover in the amount of
time it takes to complete all interrupted transactions. At one transaction per second, the system
should recover in a few seconds. System failures may occur as often as several times a week.
Media Failure.
Disk crashes or controller failures can occur because of disk-write bugs in the operating system
release, hardware errors in the channel or controller, head crashes, or media degradation. These
failures are rare but costly.
By identifying the type of DBMS failure, an organization can define the state of activity to return
to after recovery. To design the data base recovery procedures, the potential failures must be
identified and the reliability of the hardware and software must be determined. The following is a
summary of four such recovery actions:
· TRANSACTION UNDO. A transaction that aborts itself, or must be aborted by the system
during routine execution, has its effects rolled back.
· GLOBAL UNDO. When recovering from a system failure, the effects of all incomplete
transactions must be rolled back.
· PARTIAL REDO. While a system is recovering from failure, the results of completed
transactions may not yet be reflected in the data base because execution was terminated in an
uncontrolled manner. Therefore, they must be repeated, if necessary, by the recovery component.
· GLOBAL REDO. If the data base is totally destroyed, a copy of the entire data base must be
reloaded from a backup source. A supplemental copy of the transaction log is necessary to roll
forward the state of the data base to the present.
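The undo/redo decision comes straight from the transaction log: committed work is redone, incomplete work is undone. A simplified sketch, with the log as a list of (transaction, record-type) pairs rather than any real log format:

```python
# Sketch of log-based recovery matching the actions above: transactions
# with a COMMIT record are redone; started-but-uncommitted ones are
# undone. The log format here is invented for illustration.
log = [
    ("T1", "BEGIN"), ("T1", "UPDATE"), ("T1", "COMMIT"),
    ("T2", "BEGIN"), ("T2", "UPDATE"),        # no COMMIT: incomplete
]

def recover(log):
    """Partition logged transactions into redo and undo sets."""
    committed = {t for t, rec in log if rec == "COMMIT"}
    started   = {t for t, rec in log if rec == "BEGIN"}
    redo = committed            # repeat their effects if not yet on disk
    undo = started - committed  # roll their effects back
    return redo, undo

redo, undo = recover(log)
print("redo:", redo, "undo:", undo)
```

Real recovery managers also scan the log for checkpoints to bound how far back they must read, but the committed/uncommitted partition is the core idea.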
Techniques for Reviewing DBMS Recovery
The review of a DBMS recovery must ensure that employees with specific responsibilities
perform their functions in accordance with operational policy and procedure. There are several
useful DBMS recovery review techniques.
There are two ways to make the system operate again. First, all transactions that have occurred
since the last backup can be reapplied, which would bring the data base up to date.
Second, the current contents of the data base can be taken and all transactions can be backed out
until the integrity and validity of the data are restored. Whichever method is selected, it should be
documented and a checklist of specific tasks and responsibilities identified.
The DBMS typically provides exhaustive review trails so that the system can know its exact state
at any time. These review trails should be complete enough to reconstruct transactions and aid in
recovery procedures. A data base administrator should know how to use these review trails in
recovery and should fully understand the inner workings of the DBMS.
A data base that has been backed up regularly helps the system recover from a failure and begin
operating again as soon as possible. Daily backups are sufficient in most organizations. Those
organizations that must always have current data must sometimes perform hourly backups. Each
backup should be well documented to provide further insight into the review process.
Review techniques should examine application design, security procedures, and personnel control
to ensure that managers can meet emergencies and have effective contingencies in place. These
three areas are extremely critical review points for the auditor, management, users, and IS
personnel.
Application Design
It is important to build sound recovery procedures and processes into an application during the
design phase. The design of an application should take into consideration the data base control
issues that affect backup and recovery processes. Possible weaknesses in controls include:
· Inaccurate or incomplete data in the data base.
· An inadequate audit trail.
· An inadequate service level.
· Failure of the DBMS to function as specified.
· Inadequate documentation.
· Lack of processing continuity.
· Lack of management support.
· Fraud or embezzlement.
The data base administrator should be responsible for examining the backup and recovery controls
being considered by the user and developer when reviewing application design. The user and the
developer of the application must assess the risks of not having appropriate controls in place to aid
in recovery. Some key controls that should be adopted are:
· Review trails. A method of chronologically recording system activities that allows the
reconstruction, review, and examination of each event in a transaction from inception to
the final results.
· Recovery procedures. Automated or manual tools and techniques for recovering the integrity of
a data base.
· Application system failure procedures. Procedures for users to follow in the event that their
applications cannot operate.
· Checkpoint data bases. Copies of the data base and transaction files that are made at specific
points in time for recovery purposes.
At a minimum, these controls should be tested during the module and integration testing phases of
development. In terms of a new system review before implementation, these controls are most
effective if thoroughly validated and approved by the user and developer before the system is
placed into operation. One important issue to be considered in application design is data integrity.
Maintaining Data Integrity.
Data integrity concerns the accuracy of the contents of the data base. The integrity of the data can
be compromised because of failures (i.e., events in which the system fails to provide normal
operation or correct data). Failures are caused primarily by errors, which may originate in
programs, interactions between these programs, or the system. A transaction is a sequence of
actions. It should be designed and executed so that it either is successfully completed or has no
effect on the data base. A transaction can fail to be completed for the following reasons:
· An action violates a security or integrity constraint.
· The user cancels the transaction.
· An unrecoverable I/O error occurs.
· The system backs out the transaction to resolve a deadlock.
· The application program fails.
· The system crashes.
Semantic Integrity.
This refers to the accuracy of the data base despite the fact that users or applications programs try
to modify it incorrectly. Assuming that the data base security system prevents unauthorized
access, and hence malicious attempts to corrupt data, most potential errors will be caused by
incorrect input, incorrect programs, or lack of user understanding.
Traditionally, most integrity checking has been performed by the applications programs and by
periodic auditing of the data base. The following are some problems that occur when relying on
application programs for integrity checking:
· Checking is likely to be incomplete because the applications programmer may not be aware of
the semantics of the complete data base.
· Each application program relies on other programs that can modify the data base, and a problem
in one program could corrupt the whole data base.
· Code that enforces the same integrity constraints occurs in several programs. This leads to
unnecessary duplication of the programming effort and exposes the system to potential
inconsistencies.
· The criteria for integrity are buried within procedures and are therefore difficult to understand
and control.
· Maintenance operations performed by users of high-level query languages cannot be controlled.
Most of these errors could be detected through auditing, although the time lag in detecting
errors by auditing can cause problems, such as difficulty in tracing the source of an error and
hence correcting it as well as incorrect data used in various ways, causing errors to propagate
through the data base and into the environment.
The semantics, or meaning, of a data base is partly drawn from a shared understanding among the
users, partly implied by the data structures used, and partly expressed as integrity constraints.
These constraints are explicitly stated by the individuals responsible for data control. Data
bases can also be classified as:
· A single record or set.
· Static or transitional.
· General or selective.
· Immediate or deferred.
· Unconditional or conditional.
A system of concurrent transactions must be correctly synchronized; that is, the processing of
these transactions must reach the same final state and produce the same output as if they had been
executed one at a time. Three forms of inconsistency result from concurrence: lost updates, an
incorrect read, and an unrepeatable read. Lost updates can also result from backing up or undoing
a transaction.
Correcting Inconsistency Problems.
The most commonly used approach to eliminate consistency problems is locking. The DBMS
can use the locking facilities that the operating system provides so that multiple processes can
synchronize their concurrent access of shared resources. A lock can be granted to multiple
processes, but a given object cannot be locked in shared and exclusive mode at the same time.
Shared and exclusive modes conflict because they are incompatible. The operating system usually
provides lock and unlock commands for requesting and releasing locks. If a lock request
cannot be granted, the
process is suspended until the request can be granted. If transactions do not follow restrictive
locking rules, deadlock can occur. Deadlock can cause the loss of an entire file; therefore, it
is critical to have a recovery system in place to alleviate this problem.
The deadlock problem can be solved either by preventing deadlocks or by detecting them after
they occur and taking steps to resolve them. Deadlocks can be prevented by placing restrictions on
the way locks are requested. They can be detected by examining the status of locks. After they
are detected, a deadlock can be resolved by aborting a transaction and rescheduling it. Methods
for selecting the best transaction to abort have also been developed.
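Deadlock detection is commonly done with a wait-for graph: an edge A → B means transaction A is waiting for a lock held by B, and a cycle in the graph means deadlock. A sketch using a plain depth-first search:

```python
# Sketch of deadlock detection via a wait-for graph. An edge A -> B
# means transaction A is waiting for a lock held by transaction B;
# a cycle in the graph means the transactions are deadlocked.
def has_cycle(graph):
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True          # node revisited on the current path: cycle
        if node in done:
            return False
        visiting.add(node)
        for nxt in graph.get(node, []):
            if dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in graph)

wait_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}  # T1 -> T2 -> T3 -> T1
assert has_cycle(wait_for)       # deadlock: abort one transaction to break it
assert not has_cycle({"T1": ["T2"], "T2": []})
```

Once a cycle is found, the system picks a victim transaction from the cycle, aborts it, and reschedules it, as the text describes.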
A synchronization problem can occur in a distributed data base environment, such as a
client/server network. Data bases can become out of sync when data from one data base fails
to be updated on other data bases. When updates fail to occur, users at some locations may use
data that is not current with data at other locations. Distributed data bases provide different
types of updating mechanisms. In a two-phase commit update process, network nodes must be
online and receive data simultaneously before updates can occur. A newer update method called
data replication enables updates to be stored until nodes are online and ready to receive. Update
methods must ensure currency in all network data bases.
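The two-phase commit update process mentioned above can be sketched as a voting protocol: the coordinator first collects votes from every node, and only if all vote yes does it tell them to commit. Node behavior here is simulated with plain dictionaries, not a real network:

```python
# Sketch of two-phase commit across distributed nodes. Each node is a
# dict with a "vote" function (simulated); a real system would exchange
# prepare/commit messages over the network.
def two_phase_commit(nodes):
    # Phase 1 (voting): every node votes on whether it can commit.
    votes = [node["vote"]() for node in nodes]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (completion): every node applies the global decision.
    for node in nodes:
        node["decision"] = decision
    return decision

ok_node  = {"vote": lambda: True}    # node online and ready
bad_node = {"vote": lambda: False}   # node offline or unable to commit

assert two_phase_commit([dict(ok_node), dict(ok_node)]) == "commit"
assert two_phase_commit([dict(ok_node), dict(bad_node)]) == "abort"
```

This illustrates why two-phase commit requires all nodes to be online simultaneously: a single "no" vote (or unreachable node) aborts the whole update, which is the limitation that replication-based updating relaxes.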
Security Procedures
A data base usually contains information that is vital to an organization's survival. A secure
data base environment, with physical and logical security controls, is essential during recovery
procedures.
Physical Security.
In some distributed environments, many physical security controls, such as the use of security
badges and cipher locks, are not feasible and the organization must rely more heavily on logical
security measures. In these cases, many organizational members may have data processing
needs that do not involve a data base but require the use of computer peripherals.
Logical Security.
Logical security prevents unauthorized users from invoking DBMS functions. The primary means
of implementing this type of security is the use of passwords to prevent access to files, records,
data elements, and DBMS utilities. Passwords should be checked to ensure that they are
designated in an intelligent, logical manner.
Security Logs.
Each time an unauthorized user attempts to access the data base, the attempt should be recorded in
a security log. Entries in this log should consist of user ID, terminal or port number, time, date,
and type of infraction. With this information, it is possible to investigate any serious breaches of
security. From the data base administrator's standpoint, evidence that the DBMS is detecting
security violations and that a consistent procedure is used to follow them up should be sufficient.
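A security-log record holding exactly the fields listed above might be sketched as follows; the field and function names are invented for illustration:

```python
# Sketch of a security-log record with the fields listed above:
# user ID, terminal or port number, time/date, and type of infraction.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SecurityLogEntry:
    user_id: str
    terminal: str
    timestamp: datetime
    infraction: str

security_log = []

def record_violation(user_id, terminal, infraction):
    """Append one unauthorized-access attempt to the security log."""
    entry = SecurityLogEntry(user_id, terminal, datetime.now(), infraction)
    security_log.append(entry)
    return entry

record_violation("jdoe", "port-7", "invalid password, third attempt")
```

Keeping the record structured like this is what makes later investigation possible: entries can be filtered by user, terminal, or time window when a breach is suspected.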
Personnel Control
Data base recovery involves ensuring that only authorized users are allowed access and that no
subsequent misuse of information occurs. These controls are usually reestablished when a system
becomes operational. When operations cease or problems occur, however, controls often become
inoperative.
The three primary classes of data base users are data base administrators, applications and systems
programmers, and end users, and each class has a unique view of the data. The DBMS must be
flexible enough to present data appropriately to each class of user and maintain the proper controls to
enough to present data appropriately to each class of user and maintain the proper controls to
inhibit abuse of the system, especially during recovery, when controls may not be fully
operational.
Data Base Administrator.
The data base administrator is responsible for ensuring that the data base retains its integrity and is
accountable if the data base becomes compromised, no matter what circumstances arise. This
individual has ultimate power over the schema that the organization has implemented. Any
modifications or additions to this schema must be approved by the data base administrator.
Permission to use subschema (i.e., logical views) is given to end users and programmers only after
their intentions are fully known and are consistent with organizational goals.
Because the data base administrator has immediate and unrestricted access to almost every piece
of valuable organizational information, an incompetent employee in this position can expose the
organization to enormous risk, especially during DBMS recovery. Therefore, an organization
should have controls in place to ensure the appointment of a qualified data base administrator.
The data base administrator must ensure that appropriate procedures are followed during DBMS
recovery. The data base administrator should also validate and verify the system once it has been
recovered, before allowing user access, so that if controls are not functioning or access problems
continue, users will not be affected.
OBJECT ORIENTED DATABASE
Object Oriented Database (OODB) provides all the facilities associated with object oriented
paradigm. It enables us to create classes, organize objects, structure an inheritance hierarchy and
call methods of other classes. Besides these, it also provides the facilities associated with
standard database systems. However, object oriented database systems have not yet replaced the
RDBMS in commercial business applications. Following are the two different approaches for
designing an object-oriented database:
• Designed to store, retrieve and manage objects created by programs written in some object
oriented language (OOL) such as C++ or Java. Although a relational database can be used
to store and manage objects, it does not understand objects as such. Therefore, a middle layer
called an object manager or object-oriented layer software is required to translate objects into
tuples of a relation.
• Designed to provide object-oriented facilities to users of non-object-oriented programming
languages. The user will create classes, objects, inheritance and so on, and the database system
will store and manage these objects and classes. This second approach, thus, turns non-OOPLs
into OOPLs. A translation layer is required to map the objects created by the user into objects
of the database system.
Advantages of OODBMS
The object-oriented data model allows the 'real world' to be modeled more closely. The object,
which encapsulates both state and behavior, is a more natural and realistic representation of
real-world objects. An object can store all the relationships it has with other objects, including
many-to-many relationships, and objects can be formed into complex objects that the traditional
data models cannot cope with easily.
Extensibility
OODBMSs allow new data types to be built from existing types. The ability to factor out
common properties of several classes and form them into a superclass that can be shared with
subclasses can greatly reduce redundancy within a system and, as we stated at the start of this
chapter, is regarded as one of the main advantages of object orientation. Further, the reusability of
classes promotes faster development and easier maintenance of the database and its applications.
Unlike traditional databases (such as hierarchical, network or relational), object oriented
databases are capable of storing different types of data: for example, pictures, voice, video,
text, numbers and so on.
A single language interface between the Data Manipulation Language (DML) and the
programming language overcomes the impedance mismatch. This eliminates many of the
inefficiencies that occur in mapping a declarative language such as SQL to an imperative
language such as C. Most OODBMSs provide a DML that is computationally complete compared
with SQL, the standard language of RDBMSs.
Navigational access from the object is the most common form of data access in an OODBMS.
This is in contrast to the associative access of SQL (that is, declarative statements with selection
based on one or more predicates). Navigational access is more suitable for handling parts
explosion, recursive queries, and so on.
The tight coupling between data and applications in an OODBMS makes schema evolution more
feasible.
There are many areas where traditional DBMSs have not been particularly successful, such as,
Computer-Aided Design (CAD), Computer-Aided Software Engineering (CASE), Office
Information System(OIS), and Multimedia Systems. The enriched modeling capabilities of
OODBMSs have made them suitable for these applications.
Improved performance
There have been a number of benchmarks that have suggested OODBMSs provide significant
performance improvements over relational DBMSs. The results showed an average 30-fold
performance improvement for the OODBMS over the RDBMS.
Disadvantages of OODBMSs
Lack of universal data model: There is no universally agreed data model for an OODBMS, and
most models lack a theoretical foundation. This disadvantage is seen as a significant drawback,
and is comparable to that of pre-relational systems.
Lack of experience: In comparison to RDBMSs the use of OODBMS is still relatively limited.
This means that we do not yet have the level of experience that we have with traditional systems.
OODBMSs are still very much geared towards the programmer rather than the naïve end-user.
There is also resistance to the acceptance of the technology. While the OODBMS is limited to a
small niche market, this problem will continue to exist.
Competition: Perhaps one of the most significant issues that face OODBMS vendors is the
competition posed by the RDBMS and the emerging ORDBMS products. These products have an
established user base with significant experience available, SQL is an approved standard, the
relational data model has a solid theoretical foundation, and relational products have many
supporting tools to help both end-users and developers.
Locking at object level may impact performance: Many OODBMSs use locking as the basis for
their concurrency control protocol. However, if locking is applied at the object level, locking of an
inheritance hierarchy may be problematic, as well as impacting performance.
Complexity: The increased functionality provided by the OODBMS (such as the illusion of a
single-level storage model, pointer swizzling, long-duration transactions, version management, and
schema evolution) makes the system more complex than that of traditional DBMSs. This
increased complexity leads to products that are more expensive and more difficult to use.
Lack of support for views: Currently, most OODBMSs do not provide a view mechanism,
which, as we have seen previously, provides many advantages such as data independence,
security, reduced complexity, and customization.
Lack of support for security: Currently, OODBMSs do not provide adequate security
mechanisms. The user cannot grant access rights on individual objects or classes.
If OODBMSs are to expand fully into the business field, these deficiencies must be rectified.
OBJECT RELATIONAL DBMS
Relational DBMSs are currently the dominant database technology, while the OODBMS has
become the favored system for financial and telecommunications applications. Although the
OODBMS market is still small, the OODBMS continues to find new application areas, such as
the World Wide Web. Some industry analysts expect the market for OODBMSs to grow at
over 50% per year, a rate faster than the total database market. However, their sales are unlikely
to overtake those of relational systems because of the wealth of businesses that find RDBMSs
acceptable, and because businesses have invested so much money and resources in their
development that change is prohibitive. One response has been to extend the relational model
with object-oriented features; this is the approach that has been taken by many
extended relational DBMSs, although each has implemented different combinations of features.
Thus there is no single extended relational model; rather, there are a variety of these models,
whose characteristics depend upon the way and the degree to which extensions were made.
However, all the models do share the same basic relational tables and query language, all
incorporate some concept of 'object', and some have the ability to store methods (or procedures
or triggers) as well as data in the database.
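As an illustration of storing behaviour (triggers) alongside data, the sketch below uses SQLite via Python's sqlite3 module purely as a stand-in for an extended relational system; the table and trigger names are invented for the example:

```python
import sqlite3

# Illustrative only: the trigger definition lives inside the database and
# fires automatically, rather than being coded in each application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("CREATE TABLE audit (emp_id INTEGER, old_salary REAL, new_salary REAL)")
conn.execute("""
    CREATE TRIGGER log_salary AFTER UPDATE OF salary ON emp
    BEGIN
        INSERT INTO audit VALUES (OLD.id, OLD.salary, NEW.salary);
    END
""")
conn.execute("INSERT INTO emp VALUES (1, 'ABC', 1000.0)")
conn.execute("UPDATE emp SET salary = 1200.0 WHERE id = 1")
rows = conn.execute("SELECT * FROM audit").fetchall()
print(rows)  # the audit row was produced by the stored trigger, not by the client
```

Because the logging logic is stored centrally, every application that updates emp gets consistent auditing for free, which is exactly the reuse-and-sharing benefit discussed below.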
In a four-quadrant view of the database world, as illustrated in the figure, the lower-left quadrant
contains those applications that process simple data and have no requirements for querying the
data. These types of application, for example standard text-processing packages such as Word,
WordPerfect, and FrameMaker, can use the underlying operating system to obtain the essential
DBMS functionality of persistence. In the lower-right quadrant are those applications that process
complex data but again have no significant requirements for querying the data. For these types of
application, for example computer-aided design packages, an OODBMS may be an appropriate
choice of DBMS.
In the top-left quadrant are those applications that process simple data and also have
requirements for complex querying. Many traditional business applications fall into this quadrant
and an RDBMS may be the most appropriate DBMS.
Finally, in the top-right quadrant are those applications that process complex data and have
complex querying requirements. This represents many of the advanced database applications and
for these applications an ORDBMS may be the appropriate choice of DBMS.
ORDBMSs can provide appropriate solutions for many types of advanced database applications.
However, there are also disadvantages.
Advantages of ORDBMSs
Reuse and Sharing: The main advantages of extending the Relational data model come from
reuse and sharing. Reuse comes from the ability to extend the DBMS server to perform standard
functionality centrally, rather than have it coded in each application.
Increased Productivity: An ORDBMS provides increased productivity both for the developer and
for the end user.
Use of experience in developing RDBMS: Another obvious advantage is that the extended
relational approach preserves the significant body of knowledge and experience that has gone into
developing relational applications. This is a significant advantage, as many organizations would
find it prohibitively expensive to change. If the new functionality is designed appropriately, this
approach should allow organizations to take advantage of the new extensions in an evolutionary
way without losing the benefits of current database features and functions.
Disadvantages of ORDBMSs
The ORDBMS approach has the obvious disadvantages of complexity and associated increased
costs. Further, there are proponents of the relational approach who believe the essential
simplicity and purity of the relational model are lost with these types of extension.
ORDBMS vendors are attempting to portray object models as extensions to the relational model
with some additional complexities. This potentially misses the point of object orientation,
highlighting the large semantic gap between these two technologies. Object applications are
simply not as data-centric as relational-based ones.
PARALLEL DATABASE
Parallel database systems improve processing and I/O speeds by using multiple processors and
disks operating in parallel. They are designed for applications that query extremely large
databases or process very high transaction rates; centralized and client-server database systems
are not powerful enough to handle such applications. In parallel processing, many operations are
performed simultaneously, as opposed to serial processing, in which the computational steps are
performed sequentially.
Three main architectures have been proposed for parallel databases:
• Shared memory architecture, where multiple processors share the main memory space.
• Shared disk architecture, where each node has its own main memory, but all nodes share
mass storage, usually a storage area network. In practice, each node usually also has
multiple processors.
• Shared nothing architecture, where each node has its own mass storage as well as
main memory.
DISTRIBUTED DATABASE
For a client application, the location and platform of the databases are transparent. You can also
create synonyms for remote objects in the distributed system so that users can access them with
the same syntax as local objects. For example, if you are connected to database mfg but want to
access data on database hq, creating a synonym on mfg for the remote dept table enables you to
issue this query:
SELECT * FROM dept;
An Oracle Database distributed database system can incorporate Oracle Databases of different
versions. All supported releases of Oracle Database can participate in a distributed database
system. Nevertheless, the applications that work with the distributed database must understand the
functionality that is available at each node in the system. A distributed database application cannot
expect an Oracle7 database to understand the SQL extensions that are only available with Oracle
Database.
Heterogeneous Distributed Database Systems
In a heterogeneous distributed database system, at least one of the databases is a non-Oracle
Database system. To the application, the heterogeneous distributed database system appears as a
single, local, Oracle Database. The local Oracle Database server hides the distribution and
heterogeneity of the data.
The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous
Services in conjunction with an agent. If you access the non-Oracle Database data store using an
Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you
include a Sybase database in an Oracle Database distributed system, then you need to obtain a
Sybase-specific transparent gateway so that the Oracle Database in the system can
communicate with it.
Alternatively, you can use generic connectivity to access non-Oracle Database data stores so
long as the non-Oracle Database system supports the ODBC or OLE DB protocols.
Heterogeneous Services
Heterogeneous Services (HS) is an integrated component within the Oracle Database server and
the enabling technology for the current suite of Oracle Transparent Gateway products. HS
provides the common architecture and administration mechanisms for Oracle Database gateway
products and other heterogeneous access facilities. Also, it provides upwardly compatible
functionality for users of most of the earlier Oracle Transparent Gateway releases.
Transparent Gateway Agents
For each non-Oracle Database system that you access, Heterogeneous Services can use a
transparent gateway agent to interface with the specified non-Oracle Database system. The agent
is specific to the non-Oracle Database system, so each type of system requires a different agent.
The transparent gateway agent facilitates communication between Oracle Database and non-
Oracle Database systems and uses the Heterogeneous Services component in the Oracle
Database server. The agent executes SQL and transactional requests at the non-Oracle Database
system on behalf of the Oracle Database server.
Client/Server Database Architecture
A database server is the Oracle software managing a database, and a client is an application that
requests information from a server. Each computer in a network is a node that can host one or
more databases. Each node in a distributed database system can act as a client, a server, or both,
depending on the situation.
An Oracle Database Distributed Database System
A client can connect directly or indirectly to a database server. A direct connection occurs when
a client connects to a server and accesses information from a database contained on that server.
For example, if you connect to the hq database and access the dept table on this database as in
the figure below, you can issue the following:
SELECT * FROM dept;
This query is direct because you are not accessing an object on a remote database.
In contrast, an indirect connection occurs when a client connects to a server and then accesses
information contained in a database on a different server. For example, if you connect
to the hq database but access the emp table on the remote sales database as in above Figure you
can issue the following:
SELECT * FROM emp@sales;
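As a rough, local analogy to this qualified remote naming (emp@sales names a table on the remote sales database), SQLite's ATTACH lets one connection address a second database by a schema-qualified name. This is only an analogy to Oracle database links, and the database and table names are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")                    # plays the role of hq
conn.execute("ATTACH DATABASE ':memory:' AS sales")   # plays the "remote" sales db
conn.execute("CREATE TABLE sales.emp (id INTEGER, name TEXT)")
conn.execute("INSERT INTO sales.emp VALUES (1, 'ABC')")

# The client stays connected to "hq" but reads the "remote" table through
# a qualified name, much as emp@sales does in the Oracle example.
rows = conn.execute("SELECT * FROM sales.emp").fetchall()
print(rows)
```

In a real distributed system the second database would live on another server, and the local server would forward the query over the network transparently.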
In Distributed Database System, data is physically stored across several sites, and each site is
typically managed by a DBMS capable of running independent of the other sites. In contrast to
parallel databases, the distribution of data is governed by factors such as local ownership and
increased availability.
1. System Components
- A distributed DBMS consists of many geographically distributed, autonomous sites
connected by low-bandwidth links.
- A parallel DBMS consists of tightly coupled, non-autonomous nodes connected by
high-bandwidth links.
2. Component Role
- Sites in a distributed DBMS can work independently to handle local transactions or work
together to handle global transactions.
- Nodes in a parallel DBMS can only work together to handle global transactions.
3. Design Purposes
- A distributed DBMS is designed for:
- Sharing data
- Local autonomy
- High availability
- A parallel DBMS is designed for:
- High performance
- High availability
DATA WAREHOUSING
A data warehouse is a collection of data marts representing historical data from different
operations in the company. This data is stored in a structure optimized for querying and data
analysis as a data warehouse. Table design, dimensions and organization should be consistent
throughout a data warehouse so that reports or queries across the data warehouse are consistent.
A data warehouse can also be viewed as a database for historical data from different functions
within a company. The term Data Warehouse was coined by Bill Inmon in 1990, who defined
it in the following way: "A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision making process". He
defined the terms in the sentence as follows:
Subject Oriented: Data that gives information about a particular subject instead of about a
company's ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of sources and
merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular time period.
Non-volatile: Data is stable in a data warehouse. More data is added but data is never
removed. This enables management to gain a consistent picture of the business.
A data warehouse is a single, complete and consistent store of data obtained from a variety of
different sources, made available to end users in a form they can understand and use in a
business context. It can be:
• Used for decision Support
• Used to manage and control business
• Used by managers and end-users to understand the business and make judgments
• Data warehouses are designed to perform well with aggregate queries running on large
amounts of data.
• The structure of data warehouses is easier for end users to navigate, understand and query
against, unlike relational databases, which are primarily designed to handle lots of transactions.
• Data warehouses enable queries that cut across different segments of a company's operation.
E.g. production data could be compared against inventory data even if they were originally
stored in different databases with different structures.
• Queries that would be complex in very normalized databases could be easier to build and
maintain in data warehouses, decreasing the workload on transaction systems.
• Data warehousing is an efficient way to manage and report on data that is from a variety
of sources, non uniform and scattered throughout a company.
• Data warehousing is an efficient way to manage demand for lots of information from lots of users.
• Data warehousing provides the capability to analyze large amounts of historical data for
nuggets of wisdom that can provide an organization with competitive advantage.
• A data warehouse can be viewed as an information system with the following attributes:
• Purging data
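The aggregate, cross-segment queries described above (e.g. comparing production data against inventory data that originated in different systems) can be sketched with SQLite; the table and column names here are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production (product TEXT, units_made INTEGER);
    CREATE TABLE inventory  (product TEXT, units_in_stock INTEGER);
    INSERT INTO production VALUES ('widget', 100), ('widget', 50), ('gadget', 70);
    INSERT INTO inventory  VALUES ('widget', 30), ('gadget', 60);
""")

# One aggregate query cuts across both "segments" of the business, even
# though in the operational world these came from different databases.
rows = conn.execute("""
    SELECT p.product, SUM(p.units_made) AS made, i.units_in_stock
    FROM production p JOIN inventory i ON p.product = i.product
    GROUP BY p.product, i.units_in_stock
    ORDER BY p.product
""").fetchall()
print(rows)
```

A warehouse is physically organized (denormalized tables, pre-built aggregates, indexes) so that this kind of query over large volumes of history stays fast.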
8. The need to track slowly changing dimensions
Technical considerations
• The communication infrastructure that connects data marts, operational systems and end users
• The systems management framework that enables administration of the entire environment
Implementation considerations
• Extract the data from operational db, transform it, clean it up and load it into the warehouse
Access tools
Data warehouse implementation relies on selecting suitable data access tools. The best way to
choose a tool is based on the type of data that can be selected with it and the kind of access it
permits for a particular user. The following lists the various types of data that can be accessed:
• Ranking data
• Multivariable data
• Complex queries with multiple joins, multi level sub queries and sophisticated search criteria
Proper attention must be paid to data extraction, which represents a success factor for a data
warehouse architecture. When implementing a data warehouse, the following selection
criteria, which affect the ability to transform, consolidate, integrate and repair the data, should be
considered:
• The tool must have the ability to identify the particular data so that it can be read by the conversion tool
• The tool must support flat files and indexed files, since much corporate data is still stored in these formats
• The tool must have the capability to merge data from multiple data stores
• The tool should have specification interface to indicate the data to be extracted
• The tool should have the ability to read data from data dictionary
• The tool should permit the user to extract the required data
• The tool must have the facility to perform data type and character set translation
• The tool must have the capability to create summarization, aggregation and derivation of records
• The data warehouse database system must be able to load data directly
– As a data warehouse grows, there are at least two options for data placement. One is to put
some of the data in the data warehouse into another storage media.
– The second option is to distribute the data in the data warehouse across multiple servers.
User levels
The users of data warehouse data can be classified on the basis of their skill level in accessing
the warehouse. There are three classes of users:
Casual users: are most comfortable retrieving info from the warehouse in predefined formats
and running pre-existing queries and reports. These users do not need tools that allow for
building standard and ad hoc reports.
Power Users: can use pre defined as well as user defined queries to create simple and ad hoc
reports. These users can engage in drill down operations. These users may have the experience
of using reporting and query tools.
Expert users: These users tend to create their own complex queries and perform standard
analysis on the info they retrieve. These users have the knowledge about the use of query and
report tools
– Presentation of info
– Testing of hypothesis
– Discovery of info
The data in a data warehouse comes from operational systems of the organization as well as
from other external sources. These are collectively referred to as source systems. The data
extracted from source systems is stored in an area called the data staging area, where the data is
cleaned, transformed, combined and de-duplicated to prepare it for use in the data warehouse. The
data staging area is generally a collection of machines where simple activities like sorting and
sequential processing takes place. The data staging area does not provide any query or
presentation services. As soon as a system provides query or presentation services, it is
categorized as a presentation server. A presentation server is the target machine on which the
data is loaded from the data staging area organized and stored for direct querying by end users,
report writers and other applications. The three different kinds of systems that are required for a
data warehouse are:
1. Source Systems
2. Data Staging Area
3. Presentation servers
The data travels from source systems to presentation servers via the data staging area. The entire
process is popularly known as ETL (extract, transform, and load) or ETT (extract, transform,
and transfer). Oracle’s ETL tool is called Oracle Warehouse Builder (OWB) and MS SQL
Server’s ETL tool is called Data Transformation Services (DTS).
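A minimal, illustrative ETL pass (not OWB or DTS, just the shape of the process) can be written with SQLite standing in for both the operational source and the warehouse; all table and column names are invented for the sketch:

```python
import sqlite3

# "Operational" source database with its own naming convention.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE emp (EMP_NAME TEXT, DEPT TEXT)")
source.executemany("INSERT INTO emp VALUES (?, ?)",
                   [("abc", "sales"), ("def", "hr")])

# "Warehouse" database, with the chosen consistent column name ENAME.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE emp_dim (ENAME TEXT, DEPT TEXT)")

# Extract: pull the relevant fields out of the source.
rows = source.execute("SELECT EMP_NAME, DEPT FROM emp").fetchall()
# Transform: rename to the warehouse convention and apply a common format.
clean = [(name.upper(), dept.upper()) for name, dept in rows]
# Load: move the transformed rows into the warehouse table.
warehouse.executemany("INSERT INTO emp_dim VALUES (?, ?)", clean)

loaded = warehouse.execute("SELECT * FROM emp_dim ORDER BY ENAME").fetchall()
print(loaded)
```

Real ETL tools add scheduling, error handling, and metadata-driven mappings around this same extract-transform-load skeleton.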
Each component and the tasks performed by them are explained below:
1. OPERATIONAL DATA
The data for the data warehouse is supplied from the following sources:
(i) The data from the mainframe systems in the traditional network and hierarchical
format.
(ii) Data can also come from the relational DBMS like Oracle, Informix.
(iii) In addition to these internal data, operational data also includes external data
obtained from commercial databases and databases associated with supplier and
customers.
2. LOAD MANAGER
The load manager performs all the operations associated with extraction and loading data into
the data warehouse. These operations include simple transformations of the data to prepare the
data for entry into the warehouse. The size and complexity of this component will vary between
data warehouses and may be constructed using a combination of vendor data loading tools
and custom built programs.
3. WAREHOUSE MANAGER
The warehouse manager performs all the operations associated with the management of data in
the warehouse. This component is built using vendor data management tools and custom built
programs. The operations performed by warehouse manager include:
(i) Analysis of data to ensure consistency
(ii) Transformation and merging the source data from temporary storage into data
warehouse tables
(iii) Creation of indexes and views on the base tables
(iv) Denormalization
(v) Generation of aggregation
(vi) Backing up and archiving of data
In certain situations, the warehouse manager also generates query profiles to determine which
indexes and aggregations are appropriate.
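Two of the warehouse manager's duties, creating indexes on base tables and generating aggregations, can be sketched as follows; the fact table and summary names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (region TEXT, amount REAL);
    INSERT INTO sales_fact VALUES ('N', 10.0), ('N', 20.0), ('S', 5.0);
""")

# Duty 1: build an index on the base table to speed region lookups.
conn.execute("CREATE INDEX idx_region ON sales_fact(region)")

# Duty 2: generate an aggregation once, so later queries read the small
# summary table instead of re-scanning all the detailed data.
conn.execute("""
    CREATE TABLE sales_summary AS
    SELECT region, SUM(amount) AS total FROM sales_fact GROUP BY region
""")

summary = conn.execute(
    "SELECT region, total FROM sales_summary ORDER BY region").fetchall()
print(summary)
```

Query profiles, as mentioned above, tell the warehouse manager which such indexes and summary tables are actually worth maintaining.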
4. QUERY MANAGER
The query manager performs all operations associated with management of user queries. This
component is usually constructed using vendor end-user access tools, data warehousing
monitoring tools, database facilities and custom built programs. The complexity of a query
manager is determined by facilities provided by the end-user access tools and database.
5. DETAILED DATA
This area of the warehouse stores all the detailed data in the database schema. In most cases
the detailed data is not stored online but is aggregated to the next level of detail. However,
detailed data is added regularly to the warehouse to supplement the aggregated data.
8. META DATA
The data warehouse also stores all the Meta data (data about data) definitions used by all
processes in the warehouse. It is used for a variety of purposes, including:
(i) The extraction and loading process – Meta data is used to map data sources
to a common view of information within the warehouse.
(ii) The warehouse management process – Meta data is used to automate the
production of summary tables.
(iii) As part of Query Management process Meta data is used to direct a query to the
most appropriate data source.
The structure of Meta data will differ in each process, because the purpose is different. More
about Meta data will be discussed in the later Lecture Notes.
EXTRACT
Some of the data elements in the operational database can reasonably be expected to be useful
in decision making, but others are of less value for that purpose. For this reason, it is necessary
to extract the relevant data from the operational database before bringing into the data
warehouse. Many commercial tools are available to help with the extraction process. Data
Junction is one of the commercial products. The user of one of these tools typically has an
easy-to-use windowed interface by which to specify the following:
(i) Which files and tables are to be accessed in the source database?
(ii) Which fields are to be extracted from them? This is often done internally by
an SQL SELECT statement.
(iii) What are those to be called in the resulting database?
(iv) What is the target machine and database format of the output?
(v) On what schedule should the extraction process be repeated?
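The first three of these specification questions can be captured in a small, declarative config that drives the extract, in the spirit of the windowed tools described above (this is not Data Junction's actual interface; all names are invented):

```python
import sqlite3

# Hypothetical extraction specification answering questions (i)-(iii).
spec = {
    "source_table": "emp",            # (i)  which table to access
    "fields": ["EMP_NAME", "AGE"],    # (ii) which fields to extract
    "rename": {"EMP_NAME": "ENAME"},  # (iii) names in the resulting database
}

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE emp (EMP_NAME TEXT, AGE INTEGER, SSN TEXT)")
source.execute("INSERT INTO emp VALUES ('ABC', 19, 'x')")

# The tool builds the SELECT internally from the specification.
cols = ", ".join(spec["fields"])
rows = source.execute(f"SELECT {cols} FROM {spec['source_table']}").fetchall()
target_names = [spec["rename"].get(f, f) for f in spec["fields"]]
print(target_names, rows)
```

Questions (iv) and (v), the target format and the schedule, would be further keys in the same specification, consumed by the load and scheduling components.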
TRANSFORM
The operational databases developed can be based on any set of priorities, which keep changing
with the requirements. Therefore, those who develop a data warehouse based on these databases
are typically faced with inconsistency among their data sources. The transformation process
deals with rectifying any such inconsistency.
One of the most common transformation issues is Attribute Naming Inconsistency. It is
common for a given data element to be referred to by different names in different
databases: Employee Name may be EMP_NAME in one database and ENAME in another. Thus one
set of data names is picked and used consistently in the data warehouse. Once all the data
elements have the right names, they must be converted to common formats.
CLEANSING
Information quality is the key consideration in determining the value of the information. The
developer of the data warehouse is not usually in a position to change the quality of its underlying
historic data, though a data warehousing project can put a spotlight on the data quality issues and
lead to improvements for the future. It is, therefore, usually necessary to go through the data
entered into the data warehouse and make it as error free as possible. This process is known as
Data Cleansing.
Data Cleansing must deal with many types of possible errors. These include missing data and
incorrect data at one source; inconsistent data and conflicting data when two or more source are
involved. There are several algorithms followed to clean the data, which will be discussed in the
coming lecture notes.
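Two of the error types just mentioned, missing data at one source and conflicting data across sources, can be handled with simple cleansing rules. The sketch below is illustrative only (it is not one of the algorithms from the lecture notes); the records, the "prefer the more recent source" conflict rule, and the sentinel value are all invented:

```python
records = [
    {"name": "ABC", "age": 19,   "source": "hr",      "year": 2020},
    {"name": "ABC", "age": 21,   "source": "payroll", "year": 2022},  # conflict
    {"name": "DEF", "age": None, "source": "hr",      "year": 2021},  # missing
]

DEFAULT_AGE = -1  # explicit "unknown" sentinel, flagged rather than guessed

def cleanse(rows):
    # Conflict rule: per name, keep the record from the most recent year.
    latest = {}
    for r in rows:
        if r["name"] not in latest or r["year"] > latest[r["name"]]["year"]:
            latest[r["name"]] = r
    # Missing-data rule: replace absent ages with the sentinel.
    out = []
    for r in latest.values():
        r = dict(r)
        if r["age"] is None:
            r["age"] = DEFAULT_AGE
        out.append(r)
    return sorted(out, key=lambda r: r["name"])

clean = cleanse(records)
print([(r["name"], r["age"]) for r in clean])
```

Production cleansing tools apply many such rules, plus fuzzy matching for duplicates, but the pattern of detect, resolve, and flag is the same.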
LOADING
Loading often implies physical movement of the data from the computer(s) storing the source
database(s) to that which will store the data warehouse database, assuming it is different. This
takes place immediately after the extraction phase. The most common channel for data movement
is a high-speed communication link. Ex: Oracle Warehouse Builder is the API from Oracle, which
provides the features to perform the ETL task on Oracle Data Warehouse.
DATA MINING
The architecture of a typical data mining system may have the following major components.
Database or data warehouse server: The database or data warehouse server is responsible
for fetching the relevant data, based on the user’s data mining request.
Knowledge base: This is the domain knowledge that is used to guide the search, or evaluate
the interestingness of resulting patterns. Such knowledge can include concept hierarchies,
used to organize
attributes or attribute values into different levels of abstraction. Knowledge such as user
beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness,
may also be included. Other examples of domain knowledge are additional interestingness
constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous
sources).
Data mining engine: This is essential to the data mining system and ideally consists of a set
of functional modules for tasks such as characterization, association, classification, cluster
analysis, and evolution and deviation analysis.
Pattern evaluation module: This component typically employs interestingness measures
and interacts with the data mining modules so as to focus the search towards interesting
patterns. It may use interestingness thresholds to filter out discovered patterns.
Alternatively, the pattern evaluation module may be integrated with the mining module,
depending on the implementation of the data mining method used. For efficient data mining,
it is highly recommended to push the evaluation of pattern interestingness as deep as possible
into the mining process so as to confine the search to only the interesting patterns.
Graphical user interface: This module communicates between users and the data mining
system, allowing the user to interact with the system by specifying a data mining query or
task, providing information to help focus the search, and performing exploratory data mining
based on the intermediate data mining results. In addition, this component allows the user to
browse database and data warehouse schemas or data structures, evaluate mined patterns, and
visualize the patterns in different forms.
Data mining identifies facts or suggests conclusions based on sifting through the data to
discover either patterns or anomalies. Data mining has five main functions:
Association: identifies relationships between events that occur at one time (such as the
contents of a shopping basket).
Sequencing: similar to association, except that the relationship exists over a period of
time (such as repeat visits to a supermarket or use of a financial planning product).
Forecasting: estimates future values based on patterns within large sets of data (such
as demand forecasting).
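The association function above can be sketched as a simple co-occurrence count over shopping baskets, reporting support (the fraction of baskets containing a pair); the baskets are invented for the example:

```python
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

# Count how often each pair of items appears together in one basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

n = len(baskets)
# Support = fraction of baskets containing the pair.
support = {pair: c / n for pair, c in pair_counts.items()}
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

Real association mining (e.g. the Apriori family of algorithms) extends this idea to larger itemsets and adds confidence thresholds; sequencing applies the same counting across time-ordered events rather than single baskets.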
The areas where data mining has been applied recently include:
Science
o astronomy,
o bioinformatics,
o drug discovery, ...
Business
o advertising,
o Customer modeling and CRM (Customer Relationship management)
o e-Commerce,
o fraud detection
o health care, ...
o investments,
o manufacturing,
o sports/entertainment,
o telecom (telephone and communications),
o targeted marketing,
Web:
o search engines, bots, ...
Government
o anti-terrorism efforts
o law enforcement,
o profiling tax cheaters
One of the most important and widespread business applications of data mining is Customer
Modeling, also called Predictive Analytics. This includes tasks such as:
predicting attrition or churn, i.e. finding which customers are likely to terminate service
targeted marketing:
o customer acquisition - find which prospects are likely to become customers
o cross-sell - for a given customer and product, find which other product(s) they are
likely to buy
credit-risk - identify the risk that this customer will not pay back the loan or credit card
fraud detection - is this transaction fraudulent?
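A toy version of churn prediction can be sketched with hand-written scoring rules; a real system would instead fit a statistical model to historical customer data. The fields, thresholds, and weights below are all invented for the illustration:

```python
customers = [
    {"id": 1, "months_active": 36, "support_calls": 1, "late_payments": 0},
    {"id": 2, "months_active": 3,  "support_calls": 7, "late_payments": 2},
]

def churn_score(c):
    # Higher score = more likely to terminate service.
    score = 0.0
    if c["months_active"] < 6:
        score += 0.4           # new customers churn more often
    if c["support_calls"] > 5:
        score += 0.3           # many complaints signal dissatisfaction
    if c["late_payments"] > 0:
        score += 0.2           # payment trouble often precedes churn
    return score

# Flag the customers whose score crosses an (invented) risk threshold.
at_risk = [c["id"] for c in customers if churn_score(c) >= 0.5]
print(at_risk)
```

The business value lies in acting on the flagged list, e.g. targeting retention offers at the at-risk customers before they leave.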
The largest users of Customer Analytics are industries with large numbers of customers, such as
banking, telecom and retail, which make extensive use of these technologies.