This document provides an overview of Relational Database Management Systems (RDBMS), explaining the differences between data and information, the structure of databases, and the components of a DBMS. It outlines the advantages of using a DBMS, such as data integrity, efficient access, and reduced redundancy, while also detailing the roles of various users and the database administrator. Additionally, it introduces the three-schema architecture that separates user applications from the physical database for better management and organization.

RELATIONAL DATABASE MANAGEMENT SYSTEM

(RDBMS)

UNIT I
INTRODUCTION TO DATABASE MANAGEMENT
Data

 Data is the raw material that can be processed by any computing machine.
 For example − employee name, product name, name of a student, marks of a
student, mobile number, image, etc.

Information

 Information is data that has been converted into a more useful or intelligible
form.
 For example: Report card sheet.

Differences

The major differences between Data and Information are as follows –

Data                                          Information
1. Data is the raw fact.                      1. It is a processed form of data.
2. It is not significant to a business.       2. It is significant to a business.
3. It is an atomic-level piece of             3. It is a collection of data.
   information.
4. Example: product name, name of a           4. Example: a student's report card.
   student.
5. It is a phenomenal fact.                   5. It is organized data.
6. It is the primary level of intelligence.   6. It is a secondary level of intelligence.
7. It may or may not be meaningful.           7. It is always meaningful.
8. It is difficult to understand.             8. It is easy to understand.

Diagram

 The diagram given below depicts the use of data and information in a database.

As the name suggests, the database management system consists of two parts. They are:
1. Database and
2. Management System
WHAT IS A DATABASE?
 To understand what a database is, we have to start from data, which is the basic building block
of any DBMS.
Data: Facts, figures, statistics, etc. having no particular meaning (e.g., 1, ABC, 19).
Record: Collection of related data items,
e.g.
Roll Name Age
1 ABC 19

Table or Relation: Collection of related records.

Roll Name Age


1 ABC 19
2 DEF 22
3 XYZ 28

The columns of this relation are called Fields, Attributes or Domains. The rows are called Tuples
or Records.
Database: Collection of related relations. Consider the following collection of tables:
T1
Roll Name Age
1 ABC 19
2 DEF 22
3 XYZ 28

T2
Roll Address
1 KOL
2 DEL
3 MUM

T3
Roll Year
1 I
2 II
3 I

T4
Year Hostel
I H1
II H2

 We now have a collection of 4 tables. They can be called a “related collection” because
we can clearly find out that there are some common attributes existing in a selected pair
of tables.
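The "related collection" above can be sketched in code. The following is a minimal illustration using Python's built-in sqlite3 module (chosen only because it is self-contained); the table and column names follow the example tables T1 and T2, and the common attribute Roll is what lets the DBMS combine them:

```python
import sqlite3

# Build the example tables T1 and T2 from the text in an in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
cur.execute("CREATE TABLE T2 (Roll INTEGER PRIMARY KEY, Address TEXT)")
cur.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                [(1, "ABC", 19), (2, "DEF", 22), (3, "XYZ", 28)])
cur.executemany("INSERT INTO T2 VALUES (?, ?)",
                [(1, "KOL"), (2, "DEL"), (3, "MUM")])

# The common attribute Roll relates the two tables, so they can be joined.
rows = cur.execute("""
    SELECT T1.Name, T2.Address
    FROM T1 JOIN T2 ON T1.Roll = T2.Roll
    ORDER BY T1.Roll
""").fetchall()
print(rows)  # [('ABC', 'KOL'), ('DEF', 'DEL'), ('XYZ', 'MUM')]
```

Because every table shares an attribute with some other table, any pair of them can be related in the same way.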
Tuple or Record.

 In a database, data is organized strictly in row and column format. The rows are called
Tuple or Record.
 The data items within one row may belong to different data types.
Domain or Attribute.

 On the other hand, the columns are often called Domain or Attribute.
 All the data items within a single attribute are of the same data type.

 DBMS
 A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.
 The collection of data, usually referred to as the database, contains information relevant to
an enterprise.
 The primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.
 For example, consider the names, telephone numbers, and addresses of the people you
know.
 You may have recorded this data in an indexed address book, or you may have stored it on a
diskette, using a personal computer and software such as DBASE IV or V, Microsoft
ACCESS, or EXCEL.
 A datum – a unit of data – is a symbol or a set of symbols which is used to represent
something.
 Information is interpreted data – data supplied with semantics.
 Knowledge refers to the practical use of information.
 Database systems are designed to manage large bodies of information.
 Management of data involves both defining structures for storage of information and
providing mechanisms for the manipulation of information. In addition, the database
system must ensure the safety of the information stored, despite system crashes or attempts
at unauthorized access.

 ADVANTAGES OF A DBMS
Using a DBMS to manage data has many advantages:
 Reduction of Redundancy: This is perhaps the most significant advantage of using a DBMS.
Redundancy is the problem of storing the same data item in more than one place.
Redundancy creates several problems: it requires extra storage space, the same data must be
entered more than once during insertion, and data must be deleted from more than one place
during deletion. Anomalies may occur in the database if insertion, deletion, etc. are not
done properly.
 Sharing of Data: In a paper-based record keeping, data cannot be shared among many users.
But in computerized DBMS, many users can share the same database if they are
connected via a network.
 Data Integrity: We can maintain data integrity by specifying integrity constraints, which
are rules and restrictions about what kind of data may be entered or manipulated within the
database. This increases the reliability of the database, as it can be guaranteed that no
wrong data can exist within the database at any point of time.
 Data independence: Application programs should be as independent as possible from
details of data representation and storage. The DBMS can provide an abstract view of the
data to insulate application code from such details.
 Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
 Data integrity and security: If data is always accessed through the DBMS, the DBMS
can enforce integrity constraints on the data. For example, before inserting salary
information for an employee, the DBMS can check that the department budget is not
exceeded. Also, the DBMS can enforce access controls that govern what data is visible to
different classes of users.
 Data administration: When several users share the data, centralizing the administration
of data can offer significant improvements. Experienced professionals, who understand
the nature of the data being managed, and how different groups of users use it, can be
responsible for organizing the data representation to minimize redundancy and fine-
tuning the storage of the data to make retrieval efficient.
Concurrent access and crash recovery:
A DBMS schedules concurrent accesses to the data in such a manner that users can think of the
data as being accessed by only one user at a time. Further, the DBMS protects users from the
effects of system failures.
 Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS. This,
in conjunction with the high-level interface to the data, facilitates quick development of
applications. Such applications are also likely to be more robust than applications developed from
scratch, because many important tasks are handled by the DBMS instead of being implemented by the
application.
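The integrity-constraint advantage above can be made concrete with a small sketch. The table and the Age rule below are illustrative assumptions, not from the text; the point is that the schema itself, enforced by the DBMS, rejects wrong data (shown here with Python's built-in sqlite3):

```python
import sqlite3

# A CHECK constraint is one kind of integrity constraint the DBMS can enforce.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        Roll INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        Age  INTEGER CHECK (Age BETWEEN 15 AND 60)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'ABC', 19)")   # accepted

try:
    conn.execute("INSERT INTO student VALUES (2, 'DEF', -5)")  # violates CHECK
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True: the DBMS blocked the invalid row
```

No application code had to check the age; the constraint lives in the database, so every program accessing it gets the same guarantee.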
What are the components of DBMS?

Hardware, Software, Data, Database Access Language, Procedures and Users all together form the

components of a DBMS.

Hardware

The hardware is the actual computer system used for keeping and accessing the database. The
conventional DBMS hardware consists of secondary storage devices such as hard disks. Databases run
on a range of machines, from microcomputers to mainframes.

Software

Software is the actual DBMS, which sits between the physical database and the users of the system. All
requests from users for access to the database are handled by the DBMS.

Data

It is an important component of the database management system. The main task of the DBMS is to
process the data: the database stores the data, which is retrieved from and updated in it through the DBMS.

Users

There are a number of users who can access or retrieve the data on demand using the application and
the interfaces provided by the DBMS.

The users of the database can be classified into different groups −

 Naive Users
 Online Users
 Sophisticated Users
 Specialized Users
 Application Users
 DBA – Database Administrator

The components of DBMS are given below in pictorial form −

Components of Database Management Software

There are four main components on which the working of a DBMS depends:

 Data: The main component is the data. The entire database is set based on the data and the
information processed based on it. This data acts as a bridge between the software and hardware
components of DBMS. This can further be divided into three varieties:

 User Data – the actual data on which the work is done
 Metadata – data about the data, i.e., the information the DBMS keeps in order to manage
the user data
 Application Metadata – the structure and format of the queries
To simplify: in a table, the information in each row is the User Data; the number of tables,
rows and columns is the Metadata; and the structure we choose is the Application Metadata.
 Hardware: These are the general hardware devices which help us save and enter the data like
hard disks, magnetic tapes, etc.
 Software: The software acts as a medium of communication between the user and the database.
Based on the user’s requirement, the database can be modified and updated. To perform
operations on the data, query languages like SQL are used.
 Users: No function can be performed without the Users. Thus, they form the fourth most
important component of DBMS. The information entered into a database is used by the User or
the administrator to perform their business operations and responsibilities.
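The distinction between user data and metadata described above can be seen directly in a real system. As an illustrative sketch (using Python's built-in sqlite3; SQLite happens to expose its catalog as the table sqlite_master), the rows of a table are user data while the catalog entry describing the table is metadata:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (Roll INTEGER PRIMARY KEY, Name TEXT)")

# User data lives in the table itself...
conn.execute("INSERT INTO student VALUES (1, 'ABC')")

# ...while metadata about the table lives in the DBMS's catalog.
meta = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(meta)  # [('student', 'table')]
```

Every DBMS keeps such a catalog in some form; the name and layout of the catalog differ between systems.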

 DATABASE USERS AND USER INTERFACES


There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different
types of users.
Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. For example, a bank teller who needs to
transfer $50 from account A to account B invokes a program called transfer. This program asks
the teller for the amount of money to be transferred, the account from which the money is to be
transferred, and the account to which the money is to be transferred.
As another example, consider a user who wishes to find her account balance over the World Wide
Web. Such a user may access a form, where she enters her account number. An application
program at the Web server then retrieves the account balance, using the given account
number, and passes this information back to the user. The typical user interface for naive
users is a forms interface, where the user can fill in appropriate fields of the form. Naive
users may also simply read reports
generated from the database.
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program. There are also special types of
programming languages that combine imperative control structures (for example, for loops, while
loops and if-then-else statements) with statements of the data manipulation language. These
languages, sometimes called fourth-generation languages, often
include special features to facilitate the generation of forms and the display of data on the screen.
Most major commercial database systems include a fourth generation language.
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager understands.
Analysts who submit queries to explore data in the database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view
summaries of data in different ways. For instance, an analyst can see total sales by region (for
example, North, South, East, and West), or by product, or by a combination of region and product
(that is, total sales of each product in each region). The tools also permit the analyst to select
specific regions, look at data in more detail (for example, sales by city within a region) or look at
the data in less detail (for example, aggregate products together by category).
Another class of tools for analysts is data mining tools, which help them find certain
kinds of patterns in data.
Specialized users are sophisticated users who write specialized database applications that do not
fit into the traditional data-processing framework.
Among these applications are computer-aided design systems, knowledge base and expert
systems, systems that store data with complex data types (for example, graphics data and audio
data), and environment-modeling systems.
 DATABASE ADMINISTRATOR
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called a
database administrator (DBA). The functions of a DBA include:
 Schema definition. The DBA creates the original database schema by executing a set of
data definition statements in the DDL.
 Schema and physical-organization modification. The DBA carries out changes to the
schema and physical organization to reflect the changing needs of the organization, or to
alter the physical organization to improve performance.
 Granting of authorization for data access. By granting different types of authorization,
the database administrator can regulate which parts of the database various users can
access. The Authorization information is kept in a special system structure that the
database system consults whenever someone attempts to access the data in the system.
 Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:
1. Periodically backing up the database, either onto tapes or onto remote servers, to prevent
loss of data in case of disasters such as flooding.
2. Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
3. Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
 DBMS ARCHITECTURE
Three important characteristics of the database approach are (1) insulation of programs and data
(program-data and program-operation independence); (2) support of multiple user views; and (3)
use of a catalog to store the database description (schema). In this section we specify an
architecture for database systems, called the three-schema architecture, which was proposed to
help achieve and visualize these characteristics.
The goal of the three-schema architecture, illustrated in Figure 1.1, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the
following three levels:
1. The internal level has an internal schema, which describes the physical storage structure
of the database. The internal schema uses a physical data model and describes the
complete details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data types, relationships, user
operations, and constraints. A high-level data model or an implementation data model can
be used at this level.
3. The external or view level includes a number of external schemas or user views. Each
external schema describes the part of the database that a particular user group is interested
in and hides the rest of the database from that user group. A high-level data model or an
implementation data model can be used at this level.

Figure 1.1: The Three-Schema Architecture
The three-schema architecture is a convenient tool for the user to visualize the schema levels in a
database system. Most DBMSs do not separate the three levels completely, but support the three-
schema architecture to some extent. Some DBMSs may include physical-level details in the
conceptual schema. In most DBMSs that support user views, external schemas are specified in the
same data model that describes the conceptual-level information. Some DBMSs allow different
data models to be used at the conceptual and external levels.
Notice that the three schemas are only descriptions of data; the only data that actually exists is at
the physical level. In a DBMS based on the three-schema architecture, each user group refers only
to its own external schema. Hence, the DBMS must transform a request specified on an external
schema into a request against the conceptual schema, and then into a request on the internal
schema for processing over the stored database. If the request is a database retrieval, the data
extracted from the stored database must be reformatted to match the user’s external view. The
processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs—especially those that are meant to support
small databases—do not support external views. Even in such systems, however, a certain amount
of mapping is necessary to transform requests between the conceptual and internal levels.
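A small sketch can show an external schema and the mapping the DBMS performs. The employee table and view below are illustrative assumptions (using Python's built-in sqlite3): the stored table plays the role of the conceptual/internal schema, and a view presents an external schema that hides the salary column from one user group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)""")
conn.execute("INSERT INTO employee VALUES (1, 'ABC', 'Sales', 50000)")
conn.execute("INSERT INTO employee VALUES (2, 'DEF', 'HR', 45000)")

# External schema: a user view that exposes only part of the database.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name, dept FROM employee")

# A request against the view is mapped by the DBMS onto the stored table.
rows = conn.execute("SELECT name, dept FROM emp_public ORDER BY emp_id").fetchall()
print(rows)  # [('ABC', 'Sales'), ('DEF', 'HR')]
```

The query never mentions the employee table, yet the data comes from it: that translation is exactly the external-to-conceptual mapping described above.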

DATA MODEL
A data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints.

 Entity: An entity is a “thing” or “object” in the real world that is distinguishable
from all other objects. For example, each person in an enterprise is an entity.
 Entity Set: An entity set is a set of entities of the same type that share the same
properties, or attributes. The set of all persons who are customers at a given bank, for
example, can be defined as the entity set customer. Similarly, the entity set loan might
represent the set of all loans awarded by a particular bank.
An entity is represented by a set of attributes. Attributes are descriptive properties
possessed by each member of an entity set. The designation of an attribute for an entity
set expresses that the database stores similar information concerning each entity in the
entity set; however, each entity may have its own value for each attribute.
 Simple and composite attributes: Attributes that are not divided into subparts are
called simple attributes. Attributes that can be divided into subparts are called
composite attributes. For example, an attribute name could be structured as a
composite attribute consisting of first-name, middle-initial, and last-name.
 Single-valued and multivalued attributes: For instance, the loan-number attribute for a
specific loan entity refers to only one loan number. Such attributes are said to be single
valued. There may be instances where an attribute has a set of values for a specific entity.
Consider an employee entity set with the attribute phone-number. An employee may have
zero, one, or several phone numbers, and different employees may have different
numbers of phones.
This type of attribute is said to be multivalued.
 Derived attribute: The value for this type of attribute can be derived from the values of
other related attributes or entities. For instance, let us say that the customer entity set has
an attribute loans-held, which represents how many loans a customer has from the bank.
We can derive the value for this attribute by counting the number of loan entities
associated with that customer.
 Relationship Sets: A relationship is an association among several entities. A
relationship set is a set of relationships of the same type.
 Mapping Cardinalities: Mapping cardinalities, or cardinality ratios, express the number
of entities to which another entity can be associated via a relationship set. Mapping
cardinalities are most useful in describing binary relationship sets, although they can
contribute to the description of relationship sets that involve more than two entity sets.

 One to one. An entity in A is associated with at most one entity in B, and an entity in
B is associated with at most one entity in A.
 One to many. An entity in A is associated with any number (zero or more) of entities
in B. An entity in B, however, can be associated with at most one entity in A.
 Many to one. An entity in A is associated with at most one entity in B. An entity in
B, however, can be associated with any number (zero or more) of entities in A.
 Many to many. An entity in A is associated with any number (zero or more) of entities
in B, and an entity in B is associated with any number (zero or more) of entities in A.
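Two of the notions above, a one-to-many mapping cardinality and a derived attribute, can be sketched together. The customer/loan names follow the bank examples in the text, but the concrete schema below is an illustrative assumption (using Python's built-in sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One-to-many: each loan references exactly one customer,
# while one customer may hold many loans.
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE loan (
    loan_no INTEGER PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customer(cust_id))""")
conn.execute("INSERT INTO customer VALUES (1, 'ABC')")
conn.execute("INSERT INTO customer VALUES (2, 'DEF')")
conn.executemany("INSERT INTO loan VALUES (?, ?)", [(101, 1), (102, 1), (103, 2)])

# Derived attribute loans-held: not stored, computed by counting loan entities.
loans_held = dict(conn.execute("""
    SELECT c.name, COUNT(l.loan_no)
    FROM customer c JOIN loan l ON l.cust_id = c.cust_id
    GROUP BY c.cust_id""").fetchall())
print(loans_held)  # {'ABC': 2, 'DEF': 1}
```

Storing loans-held as a column instead would create redundancy; deriving it keeps the database consistent automatically.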

 THE ENTITY-RELATIONSHIP MODEL


 The entity-relationship (E-R) data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of relationships among these
objects. An entity is a “thing” or “object” in the real world that is distinguishable from
other objects. For example, each person is an entity, and bank accounts can be considered
as entities.
 Entities are described in a database by a set of attributes. For example, the attributes
account-number and balance may describe one particular account in a bank, and they form
attributes of the account entity set. Similarly, attributes customer-name, customer-street
address and customer-city may describe a customer entity.
 An extra attribute customer-id is used to uniquely identify customers (since it may be
possible to have two customers with the same name, street address, and city).
 A unique customer identifier must be assigned to each customer. In the United States,
many enterprises use the social-security number of a person (a unique number the U.S.
government assigns to every person in the United States) as a customer identifier.
 A relationship is an association among several entities. For example, a depositor
relationship associates a customer with each account that she has. The set of all entities of
the same type and the set of all relationships of the same type are termed an entity set and
relationship set, respectively.
 The overall logical structure (schema) of a database can be expressed graphically by an E-
R diagram.

Advantages and Disadvantages of E-R Data Model

Following are advantages of an E-R Model:

• Straightforward relation representation: Having designed an E-R diagram for a database
application, the relational representation of the database model becomes relatively
straightforward.

• Easy conversion from E-R to other data models: Conversion from an E-R diagram to a network or
hierarchical data model can easily be accomplished.

• Graphical representation for better understanding: An E-R model gives a graphical and
diagrammatical representation of the various entities, their attributes and the relationships
between entities. This in turn helps in the clear understanding of the data structure and in
minimizing redundancy and other problems.

Disadvantages of E-R Data Model

Following are disadvantages of an E-R Model:

• No industry standard for notation: There is no industry-standard notation for developing an E-
R diagram.

• Popular for high-level design only: The E-R data model is especially popular for high-level
design, but it provides limited support for lower levels of detail.
Symbols used in the E-R notation:

E-R Modelling

 ER model stands for an Entity-Relationship model. It is a high-level data model.


 The Entity Relationship model was proposed by Peter Chen in 1976.
 The ER model is a logical representation of an enterprise's data. It is a
diagrammatic representation of the logical structure of a database.
 E-R model describes relationship among entities and attributes.
 Entity Relationship Diagrams are the best tools to communicate within the entire
system.
 These diagrams are the graphical representation of the flow of data and information.
 These diagrams are most commonly used in business organizations to make the flow of
data easy to follow.
 This conceptual database model is an effective way of communicating with the
individuals at all the levels.
 The most common use of this diagram is to present the relation of the various tables
present in a database.

Following are the main components and its symbols in ER Diagrams:

 Rectangles: This symbol represents entity types
 Ellipses: This symbol represents attributes
 Diamonds: This symbol represents relationship types
 Lines: Link attributes to entity types and entity types with other relationship types
 Primary key: Its attributes are underlined
 Double Ellipses: Represent multi-valued attributes

1. Entity

It may be an object, person, place or event about which data is stored in a database. In an E-R
diagram an entity is represented as a rectangle. For example: students, employees, managers, etc.


Entity set

It is a collection of entities of the same type which share similar properties. For example, a
group of students in a college forms an entity set.
Entity sets are characterised into two types, as follows:

a. Strong entity set


b. Weak entity set

a. Strong entity set: An entity type that has a key attribute, or enough attributes
to form a primary key, is called a strong entity set. It is represented by a single
rectangle.

b. Weak entity set: An entity set that does not have a primary key attribute and depends on
another strong entity set via a foreign key attribute. It is represented by a double rectangle.

UNIT-II:

Degree of Relations in DBMS



We live in a world where every entity has relationships with others, whether living
or non-living. For example, you are a single entity, but you share different relations with
your family and friends. Even within a family, you are the son of your father and at the same
time a sibling of your brother. Similar relationships exist in Database
Management Systems (DBMS). In this article, we are going to learn what the degree of a
relationship is, and the types of relationships.
Degree of Relationship
In DBMS, the degree of a relationship is the number of entity types that participate
in the relationship. For example, we have two entities, one a student and the other a bag,
connected through a primary key and foreign key. Here the degree of the
relationship is 2, as 2 entity types participate in the relationship.
Types of degree
Now, based on the number of linked entity types, we have 4 types of degrees of relationships.
1. Unary
2. Binary
3. Ternary
4. N-ary
Let’s discuss them one by one with the help of examples.
Unary (Degree 1)
In this type of relationship, the participating entity types are the same. So, we can say that
a unary relationship exists when both entity types are the same, and the degree of the
relationship is 1. In other words, when only one entity set participates in a relation, such
a relationship is known as a unary relationship.
Example: In a particular class, we have many students, there are monitors too. So, here class
monitors are also students. Thus, we can say that only students are participating here. So the
degree of such type of relationship is 1.

(Figure: unary relationship)
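A unary (degree-1) relationship can be sketched as a table that refers to itself. The monitor example below follows the class-monitor scenario above, though the column names are illustrative assumptions (using Python's built-in sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# 'monitor_roll' relates a student to another student: one entity set,
# so the relationship is unary.
conn.execute("""CREATE TABLE student (
    roll INTEGER PRIMARY KEY,
    name TEXT,
    monitor_roll INTEGER REFERENCES student(roll))""")
conn.execute("INSERT INTO student VALUES (1, 'ABC', NULL)")   # ABC is the monitor
conn.execute("INSERT INTO student VALUES (2, 'DEF', 1)")      # monitored by ABC

# Joining the table with itself recovers the (student, monitor) pairs.
pairs = conn.execute("""
    SELECT s.name, m.name
    FROM student s JOIN student m ON s.monitor_roll = m.roll""").fetchall()
print(pairs)  # [('DEF', 'ABC')]
```

The self-join is the relational counterpart of a relationship line looping back to the same entity rectangle.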

Binary (Degree 2)
In a binary relationship, two entity types participate. So, we can say that a binary
relationship exists when there are two entity types, and the degree of the
relationship is 2. In other words, when two entity sets participate in a relation,
such a relationship is known as a binary relationship. This is the most used relationship,
and it can easily be converted into a relational table.

Example: We have two entity types, ‘Student’ and ‘ID’, where each ‘Student’ has his ‘ID’. Since
two entity types are associating, we can say it is a binary relationship. Also, one ‘Father’
can have many ‘Daughters’, but each ‘Daughter’ should belong to only one ‘Father’. We can say
that it is a one-to-many binary relationship.

(Figure: binary relationship)

Ternary (Degree 3)
In a ternary relationship, three entity types participate. So, we can say that a
ternary relationship exists when there are three entity types, and the degree of the
relationship is 3. Since the number of entities increases, it becomes more complex
to turn the E-R model into a relational table. Now let’s understand with examples.
Example: We have three entity types ‘Teacher’, ‘Course’, and ‘Class’. The relationship
between these entities is defined as the teacher teaching a particular course, also the teacher
teaches a particular class. So, here three entity types are associating we can say it is a ternary
relationship.

(Figure: ternary relationship)

N-ary (n Degree)
In an N-ary relationship, n entity types participate. So, we can say that an N-ary
relationship exists when there are n entity types. There is one limitation of the N-ary
relationship: as there are many entities, it is very hard to convert into a relational
table. So, this is very uncommon, unlike the binary relationship, which is very popular.
Example: We have 5 entities: Teacher, Class, Location, Salary, Course. Since five entity
types are associating, the degree of the relationship is 5.
(Figure: n-ary relationship)

Conclusion
In summary, the complexity of the real-world scenario determines the type of relationship
(unary, binary, ternary, or n-ary) selected in a database architecture; simpler relationships are
favored where feasible. While ternary and n-ary relationships are utilized for more
sophisticated interactions, unary and binary relationships are frequently used. The selection
should fit the particular requirements and data structure of the application.

Degree of Relationship: The number of different entity sets that participate in a relationship
is called the degree of the relationship.

It is categorised into the following:

1. Unary Relationship: A unary relationship exists when both the participating entity
types are the same. When such a relationship is present, we say that the degree of
the relationship is 1.

For example, suppose in a classroom we have many students who belong to a
particular club, like a dance club, basketball club, etc., and some of them are club leads.
So, a particular group of students is managed by their respective club lead. Here, the
group is formed from students and the club leads are also chosen from among the students.
So, ‘Student’ is the only entity type participating here. We can represent this relationship
using an E-R diagram as follows:

2. Binary Relationship: A binary relationship exists when exactly two entity types
participate. When such a relationship is present, we say that the degree is 2. This is
the most common degree of relationship, and it is easy to deal with because such
relationships can be readily converted into relational tables.

For example, we have two entity types ‘Customer’ and ‘Account’, where each
‘Customer’ has an ‘Account’ which stores the account details of the ‘Customer’.
Since two entity types participate, we call it a binary relationship. Also, one
‘Customer’ can have many ‘Account’s, but each ‘Account’ should belong to only one
‘Customer’, so it is a one-to-many binary relationship.

3. Ternary Relationship: A ternary relationship exists when exactly three entity types
participate. When such a relationship is present, we say that the degree is 3. As the
number of entities in the relationship increases, it becomes more complex to convert
them into relational tables.

For example, we have three entity types ‘Employee’, ‘Department’ and ‘Location’.
The relationship between these entities is defined as: an employee works in a
department, and an employee works at a particular location. Since three entities
participate in the relationship, it is a ternary relationship of degree 3.

4. n-ary Relationship: An n-ary relationship exists when ‘n’ entity types participate.
Any number of entity types can participate in a relationship; there is no limit on the
maximum number.
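A relationship of degree 3 (or higher) is typically mapped to its own table holding one foreign key per participating entity type. The following is a minimal sketch in Python with SQLite, using the Employee–Department–Location example; the column names and sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee   (e_id INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE Department (d_id INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE Location   (l_id INTEGER PRIMARY KEY, city  TEXT);
-- The ternary relationship becomes its own table with three foreign keys.
CREATE TABLE Works (
    e_id INTEGER REFERENCES Employee(e_id),
    d_id INTEGER REFERENCES Department(d_id),
    l_id INTEGER REFERENCES Location(l_id),
    PRIMARY KEY (e_id, d_id, l_id)
);
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO Department VALUES (10, 'Sales')")
conn.execute("INSERT INTO Location VALUES (100, 'Pune')")
conn.execute("INSERT INTO Works VALUES (1, 10, 100)")

# Reassembling the ternary fact requires joining all three entity tables.
row = conn.execute("""
    SELECT name, dname, city FROM Works
    JOIN Employee   USING (e_id)
    JOIN Department USING (d_id)
    JOIN Location   USING (l_id)
""").fetchone()
print(row)  # ('Asha', 'Sales', 'Pune')
```

Note that each foreign key points at one participating entity; this is exactly why higher-degree relationships are harder to manage — every query over the relationship needs one join per participant.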

Converting E-R model into relational model

A given ER model can be converted into a Relational model. A Relational model includes
Relations, Tuples, Attributes, Keys, and Foreign keys.

 A Relation is a table made from tuples.
 A Tuple is a row of data.
 An Attribute is a characteristic of the relation.

There is a direct mapping between the ER model and the Relational model.
Rules of converting ER model to Relational Model:

 Entity type is converted to a Relation (table).
 1:1 or 1:N relationship type is converted to a foreign key.
 M:N relationship type is converted to a relation with two foreign keys.
 Simple attribute is converted to an attribute.
 Value set is converted to a domain.
 Key attribute is converted to a primary key.
Overall transformation summary is as follows:

Consider the following example:

Now for the above example we can create three relations:

 Employee
 Works_On
 Projects

Transform attributes to fields:

 Employee will have E_ID, Name, Designation and Dob.


 Works_On will have E_ID, Status and P_ID.
 Projects will have P_ID, S_Date and E_Date.

Now we can create tables in DBMS.
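A sketch of those three relations in SQLite follows; the column types are assumptions, since the text lists only the attribute names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    E_ID        INTEGER PRIMARY KEY,   -- key attribute -> primary key
    Name        TEXT,
    Designation TEXT,
    Dob         TEXT
);
CREATE TABLE Projects (
    P_ID   INTEGER PRIMARY KEY,
    S_Date TEXT,
    E_Date TEXT
);
-- M:N relationship -> its own relation with two foreign keys
CREATE TABLE Works_On (
    E_ID   INTEGER REFERENCES Employee(E_ID),
    P_ID   INTEGER REFERENCES Projects(P_ID),
    Status TEXT,
    PRIMARY KEY (E_ID, P_ID)
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['Employee', 'Projects', 'Works_On']
```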

Advantages of E-R Model


1. Conceptually, the E-R model is very simple: if we know the relationships between
entities and attributes, we can easily draw an ER diagram.
2. Better visual representation: The ER model is a diagrammatic representation of the
logical structure of a database. By looking at an ER diagram, we can easily understand the
entities and the relationships among them.
3. Effective communication tool: It is an effective communication tool for the
database designer. The clear representation of the data listed under proper headings and
tables results in an effective flow of information and communication.
4. Highly integrated with the relational model: The ER model can be easily converted
into the relational model by simply converting the ER model into tables.
5. Easy conversion to other data models: The ER model can be easily converted into other
data models, such as the hierarchical data model, the network data model, and so on.
6. Straightforward relation representation: Having designed an E-R diagram for a
database application, the relational representation of the database model becomes relatively
straightforward.

Aggregation

In aggregation, the relationship between two entities is treated as a single entity: the
relationship, together with its corresponding entities, is aggregated into a higher-level entity.

For example: The Centre entity offering the Course entity acts as a single entity in the
relationship with another entity, Visitor. In the real world, if a visitor visits a coaching
centre, he will never enquire about the course alone or the centre alone; instead, he will
enquire about both.

IsA Relationship and Attribute Inheritance

This relationship is called IsA. Some texts call this an IsA relationship, but do not confuse
it with the concept of a relationship between entities.

 Freshman IsA Student; an eagle IsA bird.
 The two entities related by IsA are always descriptions of the same real-world object.
 Typically used in databases that are implemented as object-oriented models.

 The upper entity type (connected to the apex of the IsA triangle) is the more abstract/general
entity type (super type) from which the lower entities inherit its attributes.

Properties of IsA

1. Inheritance - All attributes of the supertype apply to the subtype.

 E.g., An attribute of Student applies to Freshman


 The subtype inherits all attributes of its supertype.
 The key of the supertype is also the key of the subtype

2. Transitivity - This property creates a hierarchy of IsA relationships

 Student is subtype of Person,


Freshman is subtype of Student,
therefore Freshman is also a subtype of Person

1. CONSTRAINTS

Constraints are the rules enforced on the data columns of a table. They are used to limit the type
of data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Constraints can be column level or table level. Column-level constraints are applied only to one
column, whereas table-level constraints are applied to the whole table.

Following are the commonly used constraints available in SQL. These constraints have already
been discussed in the SQL - RDBMS Concepts chapter, but it is worth revising them at this point.

 NOT NULL Constraint: Ensures that a column cannot have a NULL value.
 DEFAULT Constraint: Provides a default value for a column when none is specified.
 UNIQUE Constraint: Ensures that all values in a column are different.
 PRIMARY Key: Uniquely identifies each row/record in a database table.
 FOREIGN Key: Uniquely identifies a row/record in another database table.
 CHECK Constraint: The CHECK constraint ensures that all values in a column
satisfy certain conditions.
 INDEX: Used to create and retrieve data from the database very quickly.

Constraints can be specified when a table is created with the CREATE TABLE statement, or you
can use the ALTER TABLE statement to create constraints even after the table is created.
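A sketch of several of these constraints declared in a CREATE TABLE statement, run through Python's built-in sqlite3 module — the EMPLOYEES columns here are invented for illustration, not taken from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE DEPARTMENTS (
    DEPT_ID INTEGER PRIMARY KEY
);
CREATE TABLE EMPLOYEES (
    ID      INTEGER PRIMARY KEY,                      -- PRIMARY KEY
    NAME    TEXT NOT NULL,                            -- NOT NULL
    EMAIL   TEXT UNIQUE,                              -- UNIQUE
    AGE     INTEGER CHECK (AGE >= 18),                -- CHECK
    DEPT_ID INTEGER REFERENCES DEPARTMENTS(DEPT_ID),  -- FOREIGN KEY
    SALARY  REAL DEFAULT 0.0                          -- DEFAULT
);
""")
conn.execute("INSERT INTO DEPARTMENTS VALUES (1)")
conn.execute("INSERT INTO EMPLOYEES (ID, NAME, EMAIL, AGE, DEPT_ID) "
             "VALUES (1, 'Ramesh', 'r@example.com', 32, 1)")
try:
    # Violates the CHECK constraint (AGE >= 18), so the row is rejected.
    conn.execute("INSERT INTO EMPLOYEES (ID, NAME, AGE) VALUES (2, 'Kid', 10)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The first row is accepted and receives the DEFAULT salary of 0.0; the second is rejected, leaving exactly one row in the table.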

Dropping Constraints:

Any constraint that you have defined can be dropped using the ALTER TABLE command with the
DROP CONSTRAINT option.

For example, to drop the primary key constraint in the EMPLOYEES table, you can use the
following command:

ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;

Some implementations may provide shortcuts for dropping certain constraints. For example, to
drop the primary key constraint for a table in Oracle, you can use the following command:

ALTER TABLE EMPLOYEES DROP PRIMARY KEY;

Some implementations allow you to disable constraints. Instead of permanently dropping a
constraint from the database, you may want to temporarily disable the constraint and then
enable it later.

Integrity Constraints:

Integrity constraints are used to ensure accuracy and consistency of data in a relational database.
Data integrity is handled in a relational database through the concept of referential integrity.

There are many types of integrity constraints that play a role in referential integrity (RI). These
constraints include Primary Key, Foreign Key, Unique Constraints and other constraints
mentioned above.

The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN
is a means for combining fields from two tables by using values common to each.

Consider following two tables, (a) CUSTOMERS table is as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

(b) Another table is ORDERS as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+

Now let us join these two tables in our SELECT statement as follows:

SQL> SELECT ID, NAME, AGE, AMOUNT
     FROM CUSTOMERS, ORDERS
     WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce following result:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+

Here it is notable that the join is performed in the WHERE clause. Several operators can be used
to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; however, the most
common operator is the equal sign.
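The join above can be reproduced end to end; here is a sketch using Python's sqlite3 module with a subset of the sample rows (the results are sorted, since a join does not guarantee row order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER,
                        ADDRESS TEXT, SALARY REAL);
CREATE TABLE ORDERS (OID INTEGER, DATE TEXT, CUSTOMER_ID INTEGER,
                     AMOUNT INTEGER);
INSERT INTO CUSTOMERS VALUES
    (2, 'Khilan',   25, 'Delhi',  1500.00),
    (3, 'kaushik',  23, 'Kota',   2000.00),
    (4, 'Chaitali', 25, 'Mumbai', 6500.00);
INSERT INTO ORDERS VALUES
    (102, '2009-10-08', 3, 3000),
    (100, '2009-10-08', 3, 1500),
    (101, '2009-11-20', 2, 1560),
    (103, '2008-05-20', 4, 2060);
""")
# Equi-join expressed in the WHERE clause, exactly as in the text.
rows = sorted(conn.execute("""
    SELECT ID, NAME, AGE, AMOUNT
    FROM CUSTOMERS, ORDERS
    WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID
""").fetchall())
for r in rows:
    print(r)
# (2, 'Khilan', 25, 1560)
# (3, 'kaushik', 23, 1500)
# (3, 'kaushik', 23, 3000)
# (4, 'Chaitali', 25, 2060)
```

Each customer row is paired with every matching order row, which is why 'kaushik' appears twice.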

 DATA-MANIPULATION LANGUAGE
Data manipulation is:
 The retrieval of information stored in the database
 The insertion of new information into the database
 The deletion of information from the database
 The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access or
manipulate data as organized by the appropriate data model. There are basically two types:
Procedural DMLs require a user to specify what data are needed and how to get those
data. Declarative DMLs (also referred to as nonprocedural DMLs) require a user to
specify what data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than procedural DMLs. However, since
the user does not have to specify how to get the data, the database system has to figure out an
efficient means of accessing the data. The DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data manipulation language
synonymously.
This query in the SQL language finds the name of the customer whose
customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = 192-83-7465

The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and that the customer-name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following
query finds the balance of all accounts owned by the customer with customer-id 192-83-7465.
select account.balance
from depositor, account
where depositor.customer-id = 192-83-7465 and
depositor.account-number = account.account-number
There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating
data. At the physical level, we must define algorithms that allow efficient access to data. At
higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact
efficiently with the system. The query processor component of the database system translates
DML queries into sequences of actions at the physical level of the database system.
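Both sample queries can be run once the banking tables exist. A sketch in Python with SQLite follows; note that SQL identifiers cannot contain hyphens, so underscores are used, and the schemas and the sample name 'Johnson' are assumptions inferred from the queries rather than given in the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
CREATE TABLE account   (account_number TEXT, balance REAL);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
INSERT INTO account   VALUES ('A-101', 500.0), ('A-201', 900.0);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                             ('192-83-7465', 'A-201');
""")
# Query 1: the name of the customer whose customer_id is 192-83-7465.
name = conn.execute(
    "SELECT customer_name FROM customer WHERE customer_id = '192-83-7465'"
).fetchone()[0]
# Query 2: balances of all accounts owned by that customer.
balances = sorted(r[0] for r in conn.execute("""
    SELECT account.balance
    FROM depositor, account
    WHERE depositor.customer_id = '192-83-7465'
      AND depositor.account_number = account.account_number
"""))
print(name, balances)  # Johnson [500.0, 900.0]
```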

UNIT III
Functional Dependencies
Functional dependencies (FDs) are constraints on well-formed relations and represent a
formalism on the infrastructure of a relation.

Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where
X and Y are subsets of attributes of R. An FD is a relationship between an attribute "Y" and a
determinant (1 or more other attributes) "X" such that for a given value of a determinant the
value of the attribute is uniquely defined.

 X is a determinant
 X determines Y
 Y is functionally dependent on X
 X → Y
 X → Y is trivial if Y ⊆ X

Example:

Let R be NewStudent(stuId, lastName, major, credits, status, socSecNo).

FDs in R include:

 {stuId} → {lastName}, but not the reverse
 {stuId} → {lastName, major, credits, status, socSecNo, stuId}
 {socSecNo} → {stuId, lastName, major, credits, status, socSecNo}
 {credits} → {status}, but not {status} → {credits}

ZipCode→AddressCity

ArtistName→BirthYear

Autobrand→Manufacturer, Engine type

Author, Title→PublDate
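A functional dependency X → Y can be checked mechanically on a concrete relation instance: group the tuples by their X-value and verify that each group carries exactly one Y-value. A small sketch in Python (the sample rows are invented for illustration):

```python
def holds(rows, X, Y):
    """Return True if the FD X -> Y holds in the given relation instance.

    rows: list of dicts (the tuples of the relation)
    X, Y: lists of attribute names (determinant and dependent sides)
    """
    seen = {}
    for row in rows:
        xv = tuple(row[a] for a in X)
        yv = tuple(row[a] for a in Y)
        if xv in seen and seen[xv] != yv:
            return False  # same determinant value, different dependent value
        seen[xv] = yv
    return True

students = [
    {"stuId": "S1", "lastName": "Smith", "credits": 30, "status": "Soph"},
    {"stuId": "S2", "lastName": "Smith", "credits": 30, "status": "Soph"},
    {"stuId": "S3", "lastName": "Lee",   "credits": 60, "status": "Junior"},
]
print(holds(students, ["stuId"], ["lastName"]))  # True
print(holds(students, ["lastName"], ["stuId"]))  # False: Smith maps to S1 and S2
print(holds(students, ["credits"], ["status"]))  # True
```

Note that an instance can only refute an FD, never prove it in general; the FD itself is a statement about all legal states of the relation.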

NORMALIZATION

Normalization is a process of reducing redundancies of data in a database. Quite often we come
across tables having a lot of bulk data with many columns. All these data might not be
necessary every time we use those tables. So, a better option is to split the bulk table into
small parts and use only those tables which suit the actual purpose at a given instance of time.
In this way, redundancy is reduced. To make a long story short, we can simply say that
normalization is a process of dividing a big table into smaller ones in order to reduce
redundancy.

ANOMALIES IN DBMS:
Insertion Anomaly
It is a failure to place information about a new database entry into all the places in the database
where information about the new entry needs to be stored. In a properly normalized database,
information about a new entry needs to be inserted into only one place in the database; in an
inadequately normalized database, information about a new entry may need to be inserted into
more than one place, and, human fallibility being what it is, some of the needed additional
insertions may be missed.

Deletion Anomaly
It is a failure to remove information about an existing database entry when it is time to remove
that entry. In a properly normalized database, information about an old, to-be-removed entry
needs to be deleted from only one place in the database; in an inadequately normalized database,
information about that old entry may need to be deleted from more than one place.

Update Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus
“update anomalies” can be either of the kinds discussed above.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption
of the database. Properly normalized databases are much less susceptible to corruption than are
un-normalized databases.
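The update anomaly is easy to exhibit with an unnormalized table that repeats a customer's address on every order row: updating only one copy leaves the database contradicting itself. A sketch with sqlite3 (the table and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized: the customer's address is duplicated on every order row.
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, address TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "Khilan", "Delhi"),
    (2, "Khilan", "Delhi"),
])
# A faulty update touches only one of the duplicated copies...
conn.execute("UPDATE orders SET address = 'Mumbai' WHERE order_id = 1")
addresses = sorted({r[0] for r in conn.execute(
    "SELECT DISTINCT address FROM orders WHERE customer = 'Khilan'")})
print(addresses)  # ['Delhi', 'Mumbai'] -- the database now contradicts itself
```

In a normalized design the address would live in a separate customer table, so there would be exactly one copy to update.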

Normalization Avoids

 Duplication of Data – The same data is listed in multiple lines of the database
 Insert Anomaly – A record about an entity cannot be inserted into the table without
first inserting information about another entity – Cannot enter a customer without a
sales order

Database security

Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. It involves various types or categories of controls,
such as technical, procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk management.

Security risks to database systems include, for example:

 Unauthorized or unintended activity or misuse by authorized database users, database
administrators, or network/systems managers, or by unauthorized users or hackers (e.g.
inappropriate access to sensitive data, metadata or functions within databases, or
inappropriate changes to the database programs, structures or security configurations);
 Malware infections causing incidents such as unauthorized access, leakage or disclosure of
personal or proprietary data, deletion of or damage to the data or programs, interruption or
denial of authorized access to the database, attacks on other systems and the unanticipated
failure of database services;
 Overloads, performance constraints and capacity issues resulting in the inability of
authorized users to use databases as intended;
 Physical damage to database servers caused by computer room fires or floods, overheating,
lightning, accidental liquid spills, static discharge, electronic breakdowns/equipment
failures and obsolescence;
 Design flaws and programming bugs in databases and the associated programs and
systems, creating various security vulnerabilities (e.g. unauthorized privilege escalation),
data loss/corruption, performance degradation etc.;
 Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in
database or system administration processes, sabotage/criminal damage etc.

Many layers and types of information security control are appropriate to databases, including:

 Access control
 Auditing
 Authentication
 Encryption
 Integrity controls
 Backups
 Application security

Traditionally databases have been largely secured against hackers through network security
measures such as firewalls, and network-based intrusion detection systems. While network
security controls remain valuable in this regard, securing the database systems themselves, and the
programs/functions and data within them, has arguably become more critical as networks are
increasingly opened to wider access, in particular access from the Internet. Furthermore, system,
program, function and data access controls, along with the associated user identification,
authentication and rights management functions, have always been important to limit and in some
cases log the activities of authorized users and administrators. In other words, these are
complementary approaches to database security, working from both the outside-in and the
inside-out as it were.

Many organizations develop their own "baseline" security standards and designs detailing basic
security control measures for their database systems. These may reflect general information
security requirements or obligations imposed by corporate information security policies and
applicable laws and regulations (e.g. concerning privacy, financial management and reporting
systems), along with generally-accepted good database security practices (such as appropriate
hardening of the underlying systems) and perhaps security recommendations from the relevant
database system and software vendors. The security designs for specific database systems
typically specify further security administration and management functions (such as
administration and reporting of user access rights, log management and analysis, database
replication/synchronization and backups) along with various business-driven information security
controls within the database programs and functions (e.g. data entry validation and audit trails).
Furthermore, various security-related activities (manual controls) are normally incorporated into
the procedures, guidelines etc. relating to the design, development, configuration, use,
management and maintenance of databases.

Database security cannot be seen as an isolated problem because it is affected by other
components of a computerized system as well. The security requirements of a system are specified
by means of a security policy, which is then enforced by various security mechanisms. For
databases, requirements on security can be classified into the following categories:
□Identification, Authentication
Usually before getting access to a database each user has to identify himself to the computer
system. Authentication is the way to verify the identity of a user at log-on time. Most common
authentication methods are passwords but more advanced techniques like badge readers,
biometric recognition techniques, or signature analysis devices are also available.
□Authorization, Access Controls
Authorization is the specification of a set of rules that specify who has which type of access to
what information. Authorization policies therefore govern the disclosure and modification of
information. Access controls are procedures that are designed to control authorizations; they are
responsible for limiting access to stored data to authorized users only.
□Integrity, Consistency
An integrity policy states a set of rules (i. e. semantic integrity constraints) that define the
correct states of the database during database operation and therefore can protect against
malicious or accidental modification of information. Closely related issues to integrity and
consistency are concurrency control and recovery. Concurrency control policies protect the
integrity of the database in the presence of concurrent transactions. If these transactions do
not terminate normally due to
system crashes or security violations, recovery techniques are used to reconstruct correct or
valid database states.
□Auditing
The requirement to keep records of all security relevant actions issued by a user is called auditing.
Resulting audit records are the basis for further reviews and examinations in order to test the
adequacy of system controls and to recommend any changes in the security policy.
Locking

Locking is a mechanism commonly used by systems to control access to shared resources by
concurrently running users. In the context of a DBMS, these shared resources are data objects,
and the users are transactions.
Locking is typically implemented using a lock manager, which records which objects are locked,
by whom, and in what mode. When a transaction wishes to use a particular
object (to read or write), it must request a lock from the lock manager. After it is done with the
object, it releases the lock by again notifying the lock manager. In certain cases, the lock manager
is not able to immediately grant a lock when it is requested (e.g., if it is held by another
transaction). In this case, the lock manager maintains a queue of transactions waiting for the lock.
It is important to also recognize that some data items can be shared simultaneously between
transactions (e.g., transactions T1 and T2 both want to read object X), but in other cases it is
necessary for a transaction to have an exclusive lock (e.g., T1 wants to write to X). This
motivates the need for multiple lock modes. In this case, the idea is that, if a transaction requests a
lock on an object in a mode that is incompatible with an existing lock on that object, then it must
wait on the lock queue until the existing lock is released.
In a DBMS, the goal is to develop a locking protocol that guarantees a schedule with desirable
properties (e.g., serializability, recoverability, avoid cascading aborts). Two common protocols
are two-phase locking (2PL) and strict two-phase locking (Strict 2PL).
Strict Two-Phase Locking (Strict 2PL)
1. If a transaction T wants to read object X, it requests a shared lock on X. If it wants to write X, it
requests an exclusive lock.
2. All locks requested by a transaction are held until the transaction is completed (commits or
aborts), at which point the locks are released. It can be shown that Strict 2PL guarantees
schedules that are serializable, recoverable, and that avoid cascading aborts.
Two-Phase Locking (2PL)
2PL relaxes Strict 2PL slightly. A transaction need not hold all locks until completion, but
once it has released a lock, it may not request any more locks. 2PL is guaranteed to produce
schedules that are serializable.
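The lock-manager bookkeeping behind these protocols can be sketched compactly: shared locks are mutually compatible, while an exclusive lock conflicts with everything. Below is a simplified single-threaded sketch in Python — unlike a real lock manager it has no wait queue and no deadlock handling, so these class and method names are illustrative only:

```python
class LockManager:
    """Minimal lock table: object -> [mode, set of holding transactions]."""
    SHARED, EXCLUSIVE = "S", "X"

    def __init__(self):
        self.table = {}

    def request(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        if obj not in self.table:
            self.table[obj] = [mode, {txn}]
            return True
        held_mode, holders = self.table[obj]
        if txn in holders and held_mode == mode:
            return True               # re-request of a lock already held
        if held_mode == self.SHARED and mode == self.SHARED:
            holders.add(txn)          # S is compatible with S: share it
            return True
        return False                  # any combination involving X conflicts

    def release_all(self, txn):
        """Strict 2PL: all of txn's locks are released at commit/abort."""
        for obj in list(self.table):
            mode, holders = self.table[obj]
            holders.discard(txn)
            if not holders:
                del self.table[obj]

lm = LockManager()
print(lm.request("T1", "A", lm.SHARED))     # True: first reader
print(lm.request("T2", "A", lm.SHARED))     # True: readers share the lock
print(lm.request("T3", "A", lm.EXCLUSIVE))  # False: writer must wait
lm.release_all("T1"); lm.release_all("T2")
print(lm.request("T3", "A", lm.EXCLUSIVE))  # True: object is now free
```

The four calls trace exactly the compatibility rules described in the next section: shared/shared is granted, shared/exclusive conflicts, and the writer proceeds once all readers release.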

Shared and Exclusive Locks

A lock is a system object associated with a shared resource such as a data item of an elementary
type, a row in a database, or a page of memory. In a database, a lock on a database object (a data-
access lock) may need to be acquired by a transaction before accessing the object. Correct use of
locks prevents undesired, incorrect or inconsistent operations on shared resources by other
concurrent transactions. When a database object with an existing lock acquired by one transaction
needs to be accessed by another transaction, the existing lock for the object and the type of
the
intended access are checked by the system. If the existing lock type does not allow this specific
attempted concurrent access type, the transaction attempting access is blocked (according to a
predefined agreement/scheme). In practice a lock on an object does not directly block a
transaction's operation upon the object, but rather blocks that transaction from acquiring another
lock on the same object, needed to be held/owned by the transaction before performing this
operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper
lock blocking scheme, which indicates which lock type blocks which lock type.

Two major types of locks are utilized:

 Write-lock (exclusive lock) is associated with a database object by a transaction
(Terminology: "the transaction locks the object," or "acquires lock for it") before writing
(inserting/modifying/deleting) this object.
 Read-lock (shared lock) is associated with a database object by a transaction before reading
(retrieving the state of) this object.

The common interactions between these lock types are defined by blocking behavior as follows:

 An existing write-lock on a database object blocks an intended write upon the same object
(already requested/issued) by another transaction by blocking a respective write-lock from
being acquired by the other transaction. The second write-lock will be acquired and the
requested write of the object will take place (materialize) after the existing write-lock is
released.
 A write-lock blocks an intended (already requested/issued) read by another transaction by
blocking the respective read-lock.
 A read-lock blocks an intended write by another transaction by blocking the respective
write-lock.
 A read-lock does not block an intended read by another transaction. The respective
read-lock for the intended read is acquired (shared with the previous read) immediately after
the intended read is requested, and then the intended read itself takes place.

TRANSACTION

A transaction is a set of changes that must all be made together. It is a program unit whose
execution may or may not change the contents of a database. A transaction is executed as a single
unit. If the database was in a consistent state before a transaction, then after execution of the
transaction the database must again be in a consistent state. For example, a transfer of money from
one bank account to another requires two changes to the database; both must succeed or fail
together.

Example:

You are working on a system for a bank. A customer goes to the ATM and instructs it to transfer
Rs. 1000 from savings to a checking account. This simple transaction requires two steps:
• Subtracting the money from the savings account balance.

• Adding the money to the checking account balance.

The code to create this transaction will require two updates to the database. For example, there
will be two SQL statements: one UPDATE command to decrease the balance in savings and a
second UPDATE command to increase the balance in the checking account.

You have to consider what would happen if a machine crashed between these two operations. The
money that has already been subtracted from the savings account will not be added to the checking
account; it is lost. You might consider performing the addition to checking first, but then the
customer ends up with extra money and the bank loses. The point is that both changes must be
made successfully. Thus, a transaction is defined as a set of changes that must be made together.
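The transfer above maps directly onto a transaction: both UPDATE statements become permanent together at COMMIT, and a failure in between rolls both back. A sketch with Python's sqlite3 module (account names and balances are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("savings", 5000), ("checking", 2000)])
conn.commit()

def transfer(amount):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? "
                     "WHERE name = 'savings'", (amount,))
        # Simulate a crash between the two UPDATEs for large amounts.
        if amount > 5000:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? "
                     "WHERE name = 'checking'", (amount,))
        conn.commit()    # both changes become permanent together
    except Exception:
        conn.rollback()  # the partial change to savings is undone

transfer(1000)  # succeeds: savings 4000, checking 3000
transfer(9999)  # fails mid-way: rolled back, balances unchanged
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'savings': 4000, 'checking': 3000}
```

After the failed transfer the money is neither lost nor duplicated, which is exactly the atomicity property discussed below.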

States of Transaction

A transaction must be in one of the following states:

 Active: the initial state, the transaction stays in this state while it is executing.
 Partially committed: after the final statement has been executed.
 Failed: when the normal execution can no longer proceed.
 Aborted: after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
 Committed: after successful completion.

We say that a transaction has committed only if it has entered the committed state. Similarly, we
say that a transaction has aborted only if it has entered the aborted state. A transaction is said to
have terminated if has either committed or aborted.

A transaction starts in the active state. When it finishes its final statement, it enters the partially
committed state. At this point, the transaction has completed its execution, but it is still possible
that it may have to be aborted, since the actual output may still be temporarily residing in main
memory, and thus a hardware failure may preclude its successful completion.

The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.

ACID Properties of Transactions

Most of what we’re calling transactional locking relates to the ability of a database management
system (DBMS) to ensure reliable transactions that adhere to these ACID properties. ACID is an
acronym that stands for Atomicity, Consistency, Isolation, and Durability. Each of these
properties is described in more detail below. However, all of these properties are related and must
be considered together. They are more like different views of the same object than independent
things.
Atomicity
Atomicity means all or nothing. Transactions often contain multiple separate actions. For
example, a transaction may insert data into one table, delete from another table, and update a third
table. Atomicity ensures that either all of these actions occur or none at all.
Consistency
Consistency means that transactions always take the database from one consistent state to
another. So, if a transaction violates the database's consistency rules, then the entire transaction
will be rolled back.
Isolation
Isolation means that concurrent transactions, and the changes made within them, are not visible to
each other until they complete. This avoids many problems, including those that could lead to
violation of other properties. The implementation of isolation differs considerably among
DBMSs. This is also the property most often related to locking problems.
Durability
Durability means that committed transactions will not be lost, even in the event of abnormal
termination. That is, once a user or program has been notified that a transaction was committed,
they can be certain that the data will not be lost.

 Delete Anomaly – A record cannot be deleted without deleting a record about a
related entity – Cannot delete a sales order without deleting all of the customer's
information.
 Update Anomaly – Information cannot be updated without changing it in many
places – To update customer information, it must be updated for each sales order the
customer has placed.
Process of normalization:
Before getting to know the normalization techniques in detail, let us define a few building
blocks which are used to define normal forms.

1. Determinant: Attribute X can be defined as a determinant if it uniquely defines the value
of attribute Y in a given relationship or entity. To qualify as a determinant, an attribute need
NOT be a key attribute. Usually a dependency of attributes is represented as X -> Y, which
means attribute X decides attribute Y.

Example: In the RESULT relation, the Marks attribute may decide the Grade attribute. This is
represented as Marks -> Grade and read as "Marks decides Grade".
Marks -> Grade
In the RESULT relation, the Marks attribute is not a key attribute. Hence it can be concluded
that key attributes are determinants, but not all determinants are key attributes.
2. Functional Dependency: Functional dependency does have a formal definition, but let
us first understand the concept through an example. Consider the following relation:

REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade)
Where:

 Student# - Student number
 Course# - Course number
 CourseName - Course name
 IName - Name of the instructor who delivered the course
 Room# - Room number which is assigned to the respective instructor
 Marks - Marks scored in course Course# by student Student#
 Grade - Grade obtained by student Student# in course Course#

Student# and Course# together (called a composite attribute) define EXACTLY ONE value
of Marks. This can be symbolically represented as

Student#Course# -> Marks

This type of dependency is called functional dependency. In the above example, Marks is
functionally dependent on Student#Course#.
44
Other functional dependencies in the above example are:

 Course# -> CourseName
 Course# -> IName (assuming one course is taught by one and only one instructor)
 IName -> Room# (assuming each instructor has his/her own, non-shared room)
 Marks -> Grade

Formally, we can define functional dependency as: In a given relation R, X and Y are
attributes. Attribute Y is functionally dependent on attribute X if each value of X determines
exactly one value of Y. This is represented as X -> Y. However, X may be composite in nature.
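The definition above can be checked mechanically. The following Python sketch (with invented sample tuples, not data taken from the text) tests whether attributes X functionally determine an attribute Y in a given set of rows:

```python
# Check whether attributes X functionally determine attribute Y in a
# relation given as a list of dicts (illustrative sketch).
def holds_fd(rows, x_attrs, y_attr):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x_attrs)
        if key in seen and seen[key] != row[y_attr]:
            return False          # same X value maps to two different Y values
        seen[key] = row[y_attr]
    return True

# Hypothetical REPORT tuples (values invented for illustration):
report = [
    {"Student#": 1, "Course#": "C1", "Marks": 78, "Grade": "B"},
    {"Student#": 1, "Course#": "C2", "Marks": 91, "Grade": "A"},
    {"Student#": 2, "Course#": "C1", "Marks": 91, "Grade": "A"},
]

print(holds_fd(report, ["Student#", "Course#"], "Marks"))  # True
print(holds_fd(report, ["Student#"], "Marks"))             # False
print(holds_fd(report, ["Marks"], "Grade"))                # True
```

The check simply verifies that no X value is ever paired with two different Y values, which is exactly the definition.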

3. Full functional dependency: In the above example, Marks is fully functionally dependent
on Student#Course# and not on a subset of Student#Course#. This means Marks cannot be
determined either by Student# or by Course# alone; it can be determined only by using
Student# and Course# together. Hence Marks is fully functionally dependent on
Student#Course#.

CourseName is not fully functionally dependent on Student#Course# because one of the
subsets, Course#, determines the course name, while Student# plays no role in deciding it.
Hence CourseName is not fully functionally dependent on Student#Course#.

Student#Course# -> Marks
Formal definition of full functional dependency: In a given relation R, X and Y are attributes.
Y is fully functionally dependent on attribute X only if it is not functionally dependent on any
subset of X. However, X may be composite in nature.

4. Partial Dependency: In the above relationship, CourseName, IName and Room# are
partially dependent on the composite attribute Student#Course# because Course# alone
defines CourseName, IName and Room#.

Course# -> CourseName, IName, Room#

Formal definition of partial dependency: In a given relation R, X and Y are attributes.
Attribute Y is partially dependent on attribute X only if it is dependent on a subset of
attribute X. However, X may be composite in nature.
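Full and partial dependency can likewise be tested: Y fully depends on X only when X -> Y holds and no proper subset of X determines Y. A hedged Python sketch, using invented REPORT-like rows:

```python
from itertools import combinations

# X -> Y holds if no X value maps to two different Y values.
def holds_fd(rows, x_attrs, y_attr):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x_attrs)
        if key in seen and seen[key] != row[y_attr]:
            return False
        seen[key] = row[y_attr]
    return True

def fully_depends(rows, x_attrs, y_attr):
    """Y fully depends on X: X -> Y holds and no proper subset of X -> Y."""
    if not holds_fd(rows, x_attrs, y_attr):
        return False
    for r in range(1, len(x_attrs)):
        for subset in combinations(x_attrs, r):
            if holds_fd(rows, list(subset), y_attr):
                return False       # a partial dependency via this subset
    return True

# Invented rows: Course# alone determines CourseName (partial dependency)
rows = [
    {"Student#": 1, "Course#": "C1", "CourseName": "DBMS", "Marks": 70},
    {"Student#": 1, "Course#": "C2", "CourseName": "OS",   "Marks": 80},
    {"Student#": 2, "Course#": "C1", "CourseName": "DBMS", "Marks": 65},
]
print(fully_depends(rows, ["Student#", "Course#"], "Marks"))       # True
print(fully_depends(rows, ["Student#", "Course#"], "CourseName"))  # False
```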

5. Transitive Dependency: In the above example, Room# depends on IName, and IName in
turn depends on Course#. Here Room# transitively depends on Course#.

Course# -> IName -> Room#

Similarly, Grade depends on Marks, and Marks in turn depends on Student#Course#; hence
Grade fully transitively depends on Student#Course#.

6. Key attributes: In a given relationship R, if attribute X uniquely defines all other
attributes, then attribute X is a key attribute, which is nothing but the candidate key.

Example: Student#Course# together is a composite key attribute which determines all
attributes in the relationship REPORT(Student#, Course#, CourseName, IName, Room#,
Marks, Grade) uniquely. Hence Student# and Course# are key attributes.

Un-Normalized Form (UNF)

If a table contains non-atomic values in each row, it is said to be in UNF. An atomic value is
something that cannot be further decomposed. A non-atomic value, as the name suggests,
can be further decomposed and simplified. Consider the following table:

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
                  Feb    1200
                  Mar    850
E02     BB        Jan    2200   B02      UTI
                  Feb    2500
E03     CC        Jan    1700   B01      SBI
                  Feb    1800
                  Mar    1850
                  Apr    1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us the unique identification
facility for any single row. Further, each primary key points to a variable length record (3 for
E01, 2 for E02 and 4 for E03).

First Normal Form (1NF)

A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a
unique combination of values. The above table in UNF can be processed to create the following
table in 1NF.

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
E01     AA        Feb    1200   B01      SBI
E01     AA        Mar    850    B01      SBI
E02     BB        Jan    2200   B02      UTI
E02     BB        Feb    2500   B02      UTI
E03     CC        Jan    1700   B01      SBI
E03     CC        Feb    1800   B01      SBI
E03     CC        Mar    1850   B01      SBI
E03     CC        Apr    1725   B01      SBI

As you can see now, each row contains a unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the rows cannot be further decomposed, so the
relation is now in 1NF.
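The UNF-to-1NF step above amounts to flattening each repeating group into its own row. A small Python sketch of that transformation (the nested Sales list stands in for the repeating group):

```python
# Flatten the UNF records (one employee carrying a list of monthly sales)
# into 1NF rows where every field is atomic.
unf = [
    {"Emp-Id": "E01", "Emp-Name": "AA", "Bank-Id": "B01", "Bank-Name": "SBI",
     "Sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"Emp-Id": "E02", "Emp-Name": "BB", "Bank-Id": "B02", "Bank-Name": "UTI",
     "Sales": [("Jan", 2200), ("Feb", 2500)]},
]

rows_1nf = [
    (e["Emp-Id"], e["Emp-Name"], month, amount, e["Bank-Id"], e["Bank-Name"])
    for e in unf
    for month, amount in e["Sales"]
]

for row in rows_1nf:
    print(row)
# Each row now has only atomic values, and (Emp-Id, Month) identifies it.
```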

Second Normal Form (2NF)

A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully
depends on the primary key of the relation. Speaking inversely, if a table has some attributes
which are not dependent on the primary key of that table, then it is not in 2NF.

Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id,
which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion
is moved into another related relation, the design comes to 2NF.

Emp-Id  Emp-Name  Month  Sales  Bank-Id
E01     AA        Jan    1000   B01
E01     AA        Feb    1200   B01
E01     AA        Mar    850    B01
E02     BB        Jan    2200   B02
E02     BB        Feb    2500   B02
E03     CC        Jan    1700   B01
E03     CC        Feb    1800   B01
E03     CC        Mar    1850   B01
E03     CC        Apr    1725   B01

Bank-Id  Bank-Name
B01      SBI
B02      UTI
After moving that portion into another relation, we store a smaller amount of data in the two
relations without any loss of information. There is also a significant reduction in redundancy.
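The split can be exercised in SQLite to confirm that the decomposition loses no information. The schema below is a sketch with underscored column names and a subset of the rows, not the book's exact table definitions:

```python
import sqlite3

# 2NF split: Bank-Name moves into its own relation keyed by bank_id, and a
# join shows the decomposition is lossless (the original rows come back).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE emp_sales(emp_id TEXT, emp_name TEXT, month TEXT,
                       sales INT, bank_id TEXT);
CREATE TABLE bank(bank_id TEXT PRIMARY KEY, bank_name TEXT);
""")
con.executemany("INSERT INTO emp_sales VALUES (?,?,?,?,?)", [
    ("E01", "AA", "Jan", 1000, "B01"),
    ("E01", "AA", "Feb", 1200, "B01"),
    ("E02", "BB", "Jan", 2200, "B02"),
])
con.executemany("INSERT INTO bank VALUES (?,?)",
                [("B01", "SBI"), ("B02", "UTI")])

joined = con.execute("""
    SELECT e.emp_id, e.emp_name, e.month, e.sales, e.bank_id, b.bank_name
    FROM emp_sales e JOIN bank b ON e.bank_id = b.bank_id
""").fetchall()
print(joined)   # every original row is recovered; bank_name is stored once per bank
```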
Third Normal Form (3NF)

A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency
in that relation. Speaking inversely, if a table contains transitive dependency, then it is not in
3NF, and the table must be split to bring it into 3NF.

What is a transitive dependency? Within a relation, if we see

A → B [B depends on A]
and
B → C [C depends on B]

then we may derive

A → C [C depends on A]

Such derived dependencies hold well in most situations. For example, if we have

Roll → Marks
and
Marks → Grade

then we may safely derive

Roll → Grade.

This third dependency was not originally specified but we have derived it.

The derived dependency is called a transitive dependency when such a dependency becomes
improbable. For example, we have been given

Roll → City
and
City → STDCode

If we try to derive Roll → STDCode it becomes a transitive dependency, because obviously
the STDCode of a city cannot depend on the roll number issued by a school or college. In
such a case the relation should be broken into two, each containing one of these two
dependencies:

Roll → City
and
City → STDCode
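The Roll/City/STDCode split can be illustrated in a few lines of Python (the values are invented); the join of the two smaller relations still answers Roll → STDCode:

```python
# Splitting Roll -> City and City -> STDCode into two relations removes the
# transitive dependency; a natural join on City still recovers Roll -> STDCode.
roll_city = {(1, "Delhi"), (2, "Mumbai"), (3, "Delhi")}
city_std = {("Delhi", "011"), ("Mumbai", "022")}

# Natural join of the two relations on City:
joined = {(roll, city, std)
          for roll, city in roll_city
          for c, std in city_std if c == city}

print(sorted(joined))
# [(1, 'Delhi', '011'), (2, 'Mumbai', '022'), (3, 'Delhi', '011')]
```

Note that each city's STD code is now stored exactly once, so updating it cannot introduce an inconsistency.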

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left hand side of every
dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. A 3NF
relation may fail to be in BCNF when the following conditions are found true:

1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. There are some common attributes in the relation.
Professor Code  Department   Head of Dept.  Percent Time
P1              Physics      Ghosh          50
P1              Mathematics  Krishnan       50
P2              Chemistry    Rao            25
P2              Physics      Ghosh          75
P3              Mathematics  Krishnan       100
Consider, as an example, the above relation. It is assumed that:

1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relation diagram for the above relation is given as the following:

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information
that Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of
Dept. and deleting Head of Dept. from the given relation. The normalized relations are shown
in the following.

Professor Code  Department   Percent Time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100

Department   Head of Dept.
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao

See the dependency diagrams for these new relations.

Fourth Normal Form (4NF)

When attributes in a relation have multi-valued dependency, further normalization to 4NF
and 5NF is required. Let us first find out what multi-valued dependency is.

A multi-valued dependency is a typical kind of dependency in which each and every attribute
within a relation depends upon the other, yet none of them is a unique primary key.

We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The following are the assumptions:

1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.

A multi-valued dependency exists here because all the attributes depend upon each other,
and yet none of them is a primary key having a unique value.
Vendor Code  Item Code  Project No.
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3

The given relation has a number of problems. For example:

1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row
with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the relation given is in 3NF and also in BCNF. It still has the problems
mentioned above. The problems are reduced by expressing this relation as two relations in
the Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent
multi-valued dependency, or one independent multi-valued dependency with a functional
dependency.

The table can be expressed as the two 4NF relations given below. The facts that vendors are
capable of supplying certain items and that they are assigned to supply for some projects are
independently specified in the 4NF relations.

Vendor-Supply

Vendor Code  Item Code
V1           I1
V1           I2
V2           I2
V2           I3
V3           I1

Vendor-Project

Vendor Code  Project No.
V1           P1
V1           P3
V2           P1
V3           P2
V3           P3

Fifth Normal Form (5NF)

These relations still have a problem. While defining the 4NF we mentioned that all the
attributes depend upon each other. While creating the two tables in the 4NF, although we
have preserved the dependencies between Vendor Code and Item Code in the first table and
between Vendor Code and Project No. in the second table, we have lost the relationship
between Item Code and Project No. If there were a primary key then this loss of dependency
would not have occurred. In order to recover this relationship we must add a new table like
the following. Please note that during the entire process of normalization, this is the only
step where a new table is created by joining two attributes, rather than splitting them into
separate tables.

Project No.  Item Code
P1           I1
P1           I2
P2           I1
P3           I1
P3           I3
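The claim that the three binary tables together reproduce the original relation can be checked directly. The Python sketch below computes the three projections from the original Vendor/Item/Project table and re-joins them:

```python
# Project the Vendor/Item/Project relation onto its three attribute pairs;
# the three-way join of the projections reconstructs the original relation,
# which is the idea behind adding the Project/Item table above.
original = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I1", "P3"),
    ("V1", "I2", "P3"), ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P2"), ("V3", "I1", "P3"),
}
vendor_item = {(v, i) for v, i, p in original}       # Vendor-Supply
vendor_project = {(v, p) for v, i, p in original}    # Vendor-Project
item_project = {(i, p) for v, i, p in original}      # Project/Item

rejoined = {(v, i, p)
            for v, i in vendor_item
            for w, p in vendor_project if w == v
            if (i, p) in item_project}

print(rejoined == original)   # True: no spurious tuples for this data
```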

Database security

Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. It involves various types or categories of controls,
such as technical, procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk management.

Security risks to database systems include, for example:

 Unauthorized or unintended activity or misuse by authorized database users, database
administrators, or network/systems managers, or by unauthorized users or hackers (e.g.
inappropriate access to sensitive data, metadata or functions within databases, or
inappropriate changes to the database programs, structures or security configurations);
 Malware infections causing incidents such as unauthorized access, leakage or disclosure of
personal or proprietary data, deletion of or damage to the data or programs, interruption or
denial of authorized access to the database, attacks on other systems and the unanticipated
failure of database services;
 Overloads, performance constraints and capacity issues resulting in the inability of
authorized users to use databases as intended;
 Physical damage to database servers caused by computer room fires or floods, overheating,
lightning, accidental liquid spills, static discharge, electronic breakdowns/equipment
failures and obsolescence;
 Design flaws and programming bugs in databases and the associated programs and
systems, creating various security vulnerabilities (e.g. unauthorized privilege escalation),
data loss/corruption, performance degradation etc.;
 Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in
database or system administration processes, sabotage/criminal damage etc.

Many layers and types of information security control are appropriate to databases, including:

 Access control
 Auditing
 Authentication
 Encryption
 Integrity controls
 Backups
 Application security

Traditionally databases have been largely secured against hackers through network security
measures such as firewalls, and network-based intrusion detection systems. While network
security controls remain valuable in this regard, securing the database systems themselves, and the
programs/functions and data within them, has arguably become more critical as networks are
increasingly opened to wider access, in particular access from the Internet. Furthermore, system,
program, function and data access controls, along with the associated user identification,
authentication and rights management functions, have always been important to limit and in some
cases log the activities of authorized users and administrators. In other words, these are
complementary approaches to database security, working from both the outside-in and the
inside-out as it were.

Many organizations develop their own "baseline" security standards and designs detailing basic
security control measures for their database systems. These may reflect general information
security requirements or obligations imposed by corporate information security policies and
applicable laws and regulations (e.g. concerning privacy, financial management and reporting
systems), along with generally-accepted good database security practices (such as appropriate
hardening of the underlying systems) and perhaps security recommendations from the relevant
database system and software vendors. The security designs for specific database systems
typically specify further security administration and management functions (such as
administration and reporting of user access rights, log management and analysis, database
replication/synchronization and backups) along with various business-driven information security
controls within the database programs and functions (e.g. data entry validation and audit trails).
Furthermore, various security-related activities (manual controls) are normally incorporated into
the procedures, guidelines etc. relating to the design, development, configuration, use,
management and maintenance of databases.

Database security cannot be seen as an isolated problem because it is affected by other
components of a computerized system as well. The security requirements of a system are
specified by means of a security policy which is then enforced by various security
mechanisms. For databases, requirements on security can be classified into the following
categories:
□Identification, Authentication
Usually before getting access to a database each user has to identify himself to the computer
system. Authentication is the way to verify the identity of a user at log-on time. Most common
authentication methods are passwords but more advanced techniques like badge readers,
biometric recognition techniques, or signature analysis devices are also available.
□Authorization, Access Controls
Authorization is the specification of a set of rules that specify who has which type of access to
what information. Authorization policies therefore govern the disclosure and modification of
information. Access controls are procedures that are designed to control authorizations. They are
responsible to limit access to stored data to authorized users only.
□Integrity, Consistency
An integrity policy states a set of rules (i. e. semantic integrity constraints) that define the
correct states of the database during database operation and therefore can protect against
malicious or accidental modification of information. Closely related issues to integrity and
consistency are concurrency control and recovery. Concurrency control policies protect the
integrity of the database in the presence of concurrent transactions. If these transactions do
not terminate normally due to

system crashes or security violations recovery techniques are used to reconstruct correct or
valid database states.
□Auditing
The requirement to keep records of all security relevant actions issued by a user is called auditing.
Resulting audit records are the basis for further reviews and examinations in order to test the
adequacy of system controls and to recommend any changes in the security policy.
Locking

Locking is a mechanism commonly used by systems to control access to shared resources by
concurrently running users. In the context of a DBMS, these shared resources are data
objects, and the users are transactions.
Locking is typically implemented using a lock manager, which records which objects are locked,
by whom, and in what mode. When a transaction wishes to use a particular
object (to read or write), it must request a lock from the lock manager. After it is done with the
object, it releases the lock by again notifying the lock manager. In certain cases, the lock manager
is not able to immediately grant a lock when it is requested (e.g., if it is held by another
transaction). In this case, the lock manager maintains a queue of transactions waiting for the lock.
It is important to also recognize that some data items can be shared simultaneously between
transactions (e.g., transactions T1 and T2 both want to read object X), but in other cases it is
necessary for a transaction to have an exclusive lock (e.g., T1 wants to write to X). This
motivates the need for multiple lock modes. In this case, the idea is that, if a transaction requests a
lock on an object in a mode that is incompatible with an existing lock on that object, then it must
wait on the lock queue until the existing lock is released.
In a DBMS, the goal is to develop a locking protocol that guarantees a schedule with desirable
properties (e.g., serializability, recoverability, avoid cascading aborts). Two common protocols
are two-phase locking (2PL) and strict two-phase locking (Strict 2PL).
Strict Two-Phase Locking (Strict 2PL)
1. If a transaction T wants to read object X, it requests a shared lock on X. If it wants to write
X, it requests an exclusive lock.
2. All locks requested by a transaction are held until the transaction is completed (commits
or aborts), at which point the locks are released.
It can be shown that Strict 2PL guarantees schedules that are serializable and recoverable,
and that avoid cascading aborts.
Two-Phase Locking (2PL)
2PL relaxes Strict 2PL slightly. A transaction need not hold all locks until completion, but
once it has released a lock, it may not request any more locks. 2PL is guaranteed to produce
schedules that are serializable.
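The lock-manager behaviour described above can be sketched as follows. This is a deliberately simplified, single-threaded illustration with invented transaction names and a single object Q; a real lock manager would queue a blocked request rather than merely report it:

```python
# Minimal lock table with shared ('S') and exclusive ('X') modes.
# grant() reports whether a request can proceed immediately.
class LockManager:
    def __init__(self):
        self.table = {}                    # object -> list of (txn, mode)

    def grant(self, txn, obj, mode):
        holders = self.table.setdefault(obj, [])
        for t, m in holders:
            if t != txn and (mode == "X" or m == "X"):
                return False               # incompatible: requester must wait
        holders.append((txn, mode))
        return True

    def release_all(self, txn):            # called at commit/abort (Strict 2PL)
        for holders in self.table.values():
            holders[:] = [(t, m) for t, m in holders if t != txn]

lm = LockManager()
print(lm.grant("T1", "Q", "S"))   # True: first shared lock
print(lm.grant("T2", "Q", "S"))   # True: shared locks are compatible
print(lm.grant("T3", "Q", "X"))   # False: exclusive conflicts with shared
lm.release_all("T1"); lm.release_all("T2")
print(lm.grant("T3", "Q", "X"))   # True once the readers have released
```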

Shared and Exclusive Locks

A lock is a system object associated with a shared resource such as a data item of an elementary
type, a row in a database, or a page of memory. In a database, a lock on a database object (a data-
access lock) may need to be acquired by a transaction before accessing the object. Correct use of
locks prevents undesired, incorrect or inconsistent operations on shared resources by other
concurrent transactions. When a database object with an existing lock acquired by one transaction
needs to be accessed by another transaction, the existing lock for the object and the type of
the

intended access are checked by the system. If the existing lock type does not allow this specific
attempted concurrent access type, the transaction attempting access is blocked (according to a
predefined agreement/scheme). In practice a lock on an object does not directly block a
transaction's operation upon the object, but rather blocks that transaction from acquiring another
lock on the same object, needed to be held/owned by the transaction before performing this
operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper
lock blocking scheme, which indicates which lock type blocks which lock type.

Two major types of locks are utilized:

 Write-lock (exclusive lock) is associated with a database object by a transaction
(terminology: "the transaction locks the object," or "acquires a lock for it") before writing
(inserting/modifying/deleting) this object.
 Read-lock (shared lock) is associated with a database object by a transaction before
reading (retrieving the state of) this object.

The common interactions between these lock types are defined by blocking behavior as follows:

 An existing write-lock on a database object blocks an intended write upon the same object
(already requested/issued) by another transaction by blocking a respective write-lock from
being acquired by the other transaction. The second write-lock will be acquired and the
requested write of the object will take place (materialize) after the existing write-lock is
released.
 A write-lock blocks an intended (already requested/issued) read by another transaction by
blocking the respective read-lock .
 A read-lock blocks an intended write by another transaction by blocking the respective
write- lock .
 A read-lock does not block an intended read by another transaction. The respective read-
lock for the intended read is acquired (shared with the previous read) immediately after
the intended read is requested, and then the intended read itself takes place.
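These four blocking rules reduce to a small compatibility matrix; a request is grantable only if its mode is compatible with every lock already held by other transactions. A sketch:

```python
# Compatibility matrix for (held mode, requested mode), following the
# blocking rules above: only shared/shared is compatible.
COMPATIBLE = {
    ("S", "S"): True,   # a read-lock does not block another read
    ("S", "X"): False,  # a read-lock blocks an intended write
    ("X", "S"): False,  # a write-lock blocks an intended read
    ("X", "X"): False,  # a write-lock blocks another write
}

def can_grant(held_modes, requested):
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant(["S", "S"], "S"))   # True
print(can_grant(["S"], "X"))        # False
```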

TRANSACTION

A transaction is a set of changes that must all be made together. It is a program unit whose
execution may or may not change the contents of a database. A transaction is executed as a
single unit. If the database was in a consistent state before a transaction, then after execution
of the transaction also, the database must be in a consistent state. For example, a transfer of
money from one bank account to another requires two changes to the database; both must
succeed or fail together.

Example:

You are working on a system for a bank. A customer goes to the ATM and instructs it to transfer
Rs. 1000 from savings to a checking account. This simple transaction requires two steps:
• Subtracting the money from the savings account balance.

• Adding the money to the checking account balance.

The code to create this transaction will require two updates to the database. For example, there
will be two SQL statements: one UPDATE command to decrease the balance in savings and a
second UPDATE command to increase the balance in the checking account.

You have to consider what would happen if the machine crashed between these two
operations. The money that has already been subtracted from the savings account would not
be added to the checking account; it would be lost. You might consider performing the
addition to checking first, but then the customer would end up with extra money, and the
bank would lose. The point is that both changes must be made successfully. Thus, a
transaction is defined as a set of changes that must be made together.
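The ATM example can be made concrete with SQLite, whose connections provide exactly this all-or-nothing behaviour; the balances and the simulated crash below are invented for illustration:

```python
import sqlite3

# Both UPDATE statements run inside one transaction, so a simulated
# crash between them rolls both back and no money is lost.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INT)")
con.executemany("INSERT INTO account VALUES (?, ?)",
                [("savings", 5000), ("checking", 2000)])
con.commit()

def transfer(con, amount, fail_midway=False):
    con.execute("UPDATE account SET balance = balance - ? "
                "WHERE name = 'savings'", (amount,))
    if fail_midway:
        raise RuntimeError("simulated crash between the two updates")
    con.execute("UPDATE account SET balance = balance + ? "
                "WHERE name = 'checking'", (amount,))
    con.commit()

try:
    transfer(con, 1000, fail_midway=True)   # crash mid-transfer
except RuntimeError:
    con.rollback()                          # atomicity: undo the half-done work

print(con.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('checking', 2000), ('savings', 5000)]  (nothing was lost)

transfer(con, 1000)                         # the same transfer, completed
print(con.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('checking', 3000), ('savings', 4000)]
```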

States of Transaction

A transaction must be in one of the following states:

 Active: the initial state, the transaction stays in this state while it is executing.
 Partially committed: after the final statement has been executed.
 Failed: when the normal execution can no longer proceed.
 Aborted: after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
 Committed: after successful completion.

We say that a transaction has committed only if it has entered the committed state. Similarly,
we say that a transaction has aborted only if it has entered the aborted state. A transaction is
said to have terminated if it has either committed or aborted.
A transaction starts in the active state. When it finishes its final statement, it enters the
partially committed state. At this point, the transaction has completed its execution, but it is
still possible that it may have to be aborted, since the actual output may still be temporarily
residing in main memory, and thus a hardware failure may preclude its successful
completion.

The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.

ACID Properties of Transactions

Most of what we’re calling transactional locking relates to the ability of a database management
system (DBMS) to ensure reliable transactions that adhere to these ACID properties. ACID is an
acronym that stands for Atomicity, Consistency, Isolation, and Durability. Each of these
properties is described in more detail below. However, all of these properties are related and must
be considered together. They are more like different views of the same object than independent
things.
Atomicity
Atomicity means all or nothing. Transactions often contain multiple separate actions. For
example, a transaction may insert data into one table, delete from another table, and update a third
table. Atomicity ensures that either all of these actions occur or none at all.
Consistency
Consistency means that transactions always take the database from one consistent state to
another. So, if a transaction violates the database's consistency rules, then the entire
transaction will be rolled back.
Isolation
Isolation means that concurrent transactions, and the changes made within them, are not
visible to each other until they complete. This avoids many problems, including those that
could lead to violation of other properties. The implementation of isolation is quite different
in different DBMSs. This is also the property most often related to locking problems.
Durability
Durability means that committed transactions will not be lost, even in the event of abnormal
termination. That is, once a user or program has been notified that a transaction was committed,
they can be certain that the data will not be lost.

Timestamp Ordering Protocol

A timestamp is a tag that can be attached to any transaction or any data item, which denotes a
specific time on which the transaction or data item had been activated in any way. We, who use
computers, must all be familiar with the concepts of “Date Created” or “Last Modified” properties
of files and folders. Well, timestamps are things like that.

A timestamp can be implemented in two ways. The simplest one is to directly assign the
current value of the clock to the transaction or the data item. The other policy is to attach the
value of a logical counter that keeps incrementing as new timestamps are required. The
timestamp of a transaction denotes the time when it was first activated. The timestamp of a
data item can be of the following two types:
W-timestamp (Q): This means the latest time when the data item Q has been written into.
R-timestamp (Q): This means the latest time when the data item Q has been read from.

These two timestamps are updated each time a successful read/write operation is performed on
the data item Q.

How should timestamps be used?

The timestamp ordering protocol ensures that any pair of conflicting read/write operations will be
executed in their respective timestamp order. This is an alternative solution to using locks.

For Read operations:

1. If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data item
Q which has already been overwritten by some other transaction. Hence the value which T
wanted to read from Q does not exist there anymore, and T would be rolled back.
2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item
Q which has been written and committed by some other transaction earlier. Hence T will
be allowed to read the value of Q, and the R-timestamp of Q should be updated to TS (T).

For Write operations:

1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for
transaction T to write its value, and the delay has become so great that it has allowed
another transaction to read the old value of data item Q. In such a case T has lost its
relevance and will be rolled back.
2. Else if TS (T) < W-timestamp (Q), then transaction T has delayed so much that the
system has allowed another transaction to write into the data item Q. in such a case too, T
has lost its relevance and will be rolled back.
3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS (T).
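The read and write rules can be sketched directly in Python. The timestamp values and the single item Q below are invented, and a real scheduler would restart a rolled-back transaction with a new timestamp:

```python
# Timestamp-ordering rules for a single data item Q. Each operation
# returns "ok" or "rollback" and maintains Q's R/W timestamps as described.
class Item:
    def __init__(self):
        self.r_ts = 0     # R-timestamp(Q): latest read
        self.w_ts = 0     # W-timestamp(Q): latest write

def read(q, ts):
    if ts < q.w_ts:
        return "rollback"            # the value T wanted was already overwritten
    q.r_ts = max(q.r_ts, ts)
    return "ok"

def write(q, ts):
    if ts < q.r_ts or ts < q.w_ts:
        return "rollback"            # T is too old to write here
    q.w_ts = ts
    return "ok"

q = Item()
print(write(q, 5))    # ok: W-timestamp(Q) becomes 5
print(read(q, 3))     # rollback: T would read a value written at time 5
print(read(q, 7))     # ok: R-timestamp(Q) becomes 7
print(write(q, 6))    # rollback: a younger transaction already read Q
```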

UNIT IV

 DATA-MANIPULATION LANGUAGE
Data manipulation is
 The retrieval of information stored in the database
 The insertion of new information into the database
 The deletion of information from the database
 The modification of information stored in the database

A data-manipulation language (DML) is a language that enables users to access or
manipulate data as organized by the appropriate data model. There are basically two types:
Procedural DMLs require a user to specify what data are needed and how to get those data.
Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However,
since a user does not have to specify how to get the data, the database system has to figure
out an efficient means of accessing data. The DML component of the SQL language is
nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data manipulation language
synonymously.
This query in the SQL language finds the name of the customer whose
customer-id is 192-83-7465:

select customer.customer-name
from customer
where customer.customer-id = 192-83-7465
The query specifies that those rows from the table customer where the customer-id is 192-83-
7465 must be retrieved, and the customer-name attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the following
query finds the balance of all accounts owned by the customer with customerid 192-83-
7465.
select account.balance
from depositor, account
where depositor.customer-id = 192-83-7465 and
depositor.account-number = account.account-number
There are a number of database query languages in use, either commercially or experimentally.
The levels of abstraction apply not only to defining or structuring data, but also to manipulating
data. At the physical level, we must define algorithms that allow efficient access to data. At
higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact
efficiently with the system. The query processor component of the database system translates
DML queries into sequences of actions at the physical level of the database system.
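The second query above can be run as-is against a small SQLite database through Python (a sketch; the schema and sample row are assumptions, and the hyphenated column names from the text are written with underscores so they are valid SQL identifiers):

```python
import sqlite3

# Minimal depositor/account schema (illustrative; hyphens in the text's
# column names are replaced with underscores).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE depositor (customer_id TEXT, account_number TEXT)")
con.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
con.execute("INSERT INTO depositor VALUES ('192-83-7465', 'A-101')")
con.execute("INSERT INTO account VALUES ('A-101', 500.0)")

# The declarative query: it states what is wanted, not how to fetch it.
rows = con.execute("""
    SELECT account.balance
    FROM depositor, account
    WHERE depositor.customer_id = '192-83-7465'
      AND depositor.account_number = account.account_number
""").fetchall()
print(rows)  # [(500.0,)]
```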


COMMANDS IN SQL :

1. CREATE DATABASE

The SQL CREATE DATABASE statement is used to create new SQL database.

Syntax:

Basic syntax of CREATE DATABASE statement is as follows:

CREATE DATABASE DatabaseName;

The database name should always be unique within the RDBMS.

Example:

If you want to create new database <testDB>, then CREATE DATABASE statement would be as
follows:

SQL> CREATE DATABASE testDB;

2. DROP DATABASE

The SQL DROP DATABASE statement is used to drop any existing database in SQL schema.

Syntax:

Basic syntax of DROP DATABASE statement is as follows:

DROP DATABASE DatabaseName;

The database name should always be unique within the RDBMS.

Example:

If you want to delete an existing database <testDB>, then DROP DATABASE statement would
be as follows:

SQL> DROP DATABASE testDB;

3. USE

The SQL USE statement is used to select any existing database in SQL schema.

Syntax:

Basic syntax of USE statement is as follows:

USE DatabaseName;

4. CREATE TABLE

The SQL CREATE TABLE statement is used to create a new table.

Syntax:

Basic syntax of CREATE TABLE statement is as follows:

CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   .....
   columnN datatype,
   PRIMARY KEY( one or more columns )
);

CREATE TABLE is the keyword telling the database system what you want to do: in this case,
you want to create a new table. The unique name or identifier for the table follows the CREATE
TABLE statement.

Then, in parentheses, comes the list defining each column in the table and what sort of data type
it is. The syntax becomes clearer with an example below.

A copy of an existing table can be created using a combination of the CREATE TABLE
statement and the SELECT statement.
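Both ideas can be sketched with SQLite through Python: CREATE TABLE with a PRIMARY KEY, then a copy built by combining CREATE TABLE with SELECT (table name and contents are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CREATE TABLE with a column list and a PRIMARY KEY.
con.execute("""
    CREATE TABLE CUSTOMERS (
        ID      INT,
        NAME    TEXT,
        AGE     INT,
        ADDRESS TEXT,
        SALARY  REAL,
        PRIMARY KEY (ID)
    )
""")
con.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00)")

# A copy of an existing table: CREATE TABLE combined with SELECT.
con.execute("CREATE TABLE CUSTOMERS_COPY AS SELECT * FROM CUSTOMERS")
n = con.execute("SELECT COUNT(*) FROM CUSTOMERS_COPY").fetchone()[0]
print(n)  # 1
```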

5. DROP TABLE

The SQL DROP TABLE statement is used to remove a table definition and all data, indexes,
triggers, constraints, and permission specifications for that table.

Syntax:

Basic syntax of DROP TABLE statement is as follows:

DROP TABLE table_name;

6. INSERT INTO

The SQL INSERT INTO Statement is used to add new rows of data to a table in the database.

Syntax:

There are two basic syntax of INSERT INTO statement is as follows:

INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);

Here column1, column2,...columnN are the names of the columns in the table into which you
want to insert data.

You may not need to specify the column(s) name in the SQL query if you are adding values for
all the columns of the table. But make sure the order of the values is in the same order as the
columns in the table. The SQL INSERT INTO syntax would be as follows:

INSERT INTO TABLE_NAME VALUES (value1, value2, value3,...valueN);


Example:

Following statements would create six records in CUSTOMERS table:

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (6, 'Komal', 22, 'MP', 4500.00 );

You can create a record in CUSTOMERS table using second syntax as follows:

INSERT INTO CUSTOMERS
VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );

All the above statements would produce the following records in the CUSTOMERS table:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
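Both INSERT forms can be checked with a quick SQLite sketch (the schema mirrors the CUSTOMERS table; the code itself is illustrative, not part of the text):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL)")

# First form: the column list is named explicitly.
con.execute("""INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
               VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00)""")

# Second form: values supplied in the same order as the table's columns.
con.execute("INSERT INTO CUSTOMERS VALUES (7, 'Muffy', 24, 'Indore', 10000.00)")

count = con.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
print(count)  # 2
```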

7. SELECT

SQL SELECT Statement is used to fetch the data from a database table which returns data in
the form of result table. These result tables are called result-sets.

Syntax:

The basic syntax of SELECT statement is as follows:

SELECT column1, column2, columnN FROM table_name;

Here column1, column2... are the fields of a table whose values you want to fetch. If you want to
fetch all the fields available in the table, then you can use the following syntax:

SELECT * FROM table_name;

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would fetch ID, Name and Salary fields of the customers available
in CUSTOMERS table:

SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;

This would produce following result:

+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  1 | Ramesh   |  2000.00 |
|  2 | Khilan   |  1500.00 |
|  3 | kaushik  |  2000.00 |
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+

8. WHERE CLAUSE

The SQL WHERE clause is used to specify a condition while fetching data from a single
table or joining multiple tables.

Rows are returned only if the given condition is satisfied. You would use the WHERE clause to
filter the records and fetch only the necessary ones.

The WHERE clause is not only used in the SELECT statement; it is also used in the UPDATE,
DELETE statements, etc., which we will examine in subsequent chapters.

Syntax:

The basic syntax of SELECT statement with WHERE clause is as follows:

SELECT column1, column2, ... columnN
FROM table_name
WHERE [condition]

You can specify a condition using comparison or logical operators such as >, <, =, LIKE, NOT, etc.
The examples below make this concept clear.

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000:

SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;

This would produce following result:

+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+

9. AND and OR OPERATORS

The SQL AND and OR operators are used to combine multiple conditions to narrow down data in an
SQL statement. These two operators are called conjunctive operators.

These operators provide a means to make multiple comparisons with different operators in the
same SQL statement.

The AND Operator:

The AND operator allows the existence of multiple conditions in an SQL statement's WHERE
clause.

Syntax:

The basic syntax of AND operator with WHERE clause is as follows:

SELECT column1, column2, ... columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];

You can combine N number of conditions using AND operator. For an action to be taken by
the SQL statement, whether it be a transaction or query, all conditions separated by the AND
must be TRUE.

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would fetch the ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000 AND age is less than 25 years:

SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 AND age < 25;

This would produce following result:

+----+-------+----------+
| ID | NAME  | SALARY   |
+----+-------+----------+
|  6 | Komal |  4500.00 |
|  7 | Muffy | 10000.00 |
+----+-------+----------+
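The difference between AND and OR can be seen in a small SQLite sketch over the same data (illustrative code, not part of the text):

```python
import sqlite3

# Sketch contrasting AND and OR on the CUSTOMERS data shown above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
    (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
    (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
    (7, 'Muffy', 24, 'Indore', 10000.00)])

# AND: both conditions must hold.
and_names = [r[0] for r in con.execute(
    "SELECT NAME FROM CUSTOMERS WHERE SALARY > 2000 AND AGE < 25 ORDER BY ID")]
# OR: either condition is enough.
or_names = [r[0] for r in con.execute(
    "SELECT NAME FROM CUSTOMERS WHERE SALARY > 2000 OR AGE < 25 ORDER BY ID")]
print(and_names)  # ['Komal', 'Muffy']
print(or_names)   # ['kaushik', 'Chaitali', 'Hardik', 'Komal', 'Muffy']
```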

10. UPDATE

The SQL UPDATE Query is used to modify the existing records in a table.

You can use the WHERE clause with the UPDATE query to update selected rows; otherwise all the
rows would be affected.

Syntax:
The basic syntax of UPDATE query with WHERE clause is as follows:

UPDATE table_name
SET column1 = value1, column2 = value2, ..., columnN = valueN
WHERE [condition];
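A sketch of UPDATE with a WHERE clause, showing that only matching rows change (data is illustrative):

```python
import sqlite3

# Sketch: UPDATE with WHERE touches only the matching row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?)",
                [(1, 'Ramesh', 2000.00), (2, 'Khilan', 1500.00)])

con.execute("UPDATE CUSTOMERS SET SALARY = 3000.00 WHERE ID = 1")
salaries = con.execute("SELECT SALARY FROM CUSTOMERS ORDER BY ID").fetchall()
print(salaries)  # [(3000.0,), (1500.0,)]
```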

11. DELETE

The SQL DELETE Query is used to delete the existing records from a table.

You can use WHERE clause with DELETE query to delete selected rows, otherwise all the
records would be deleted.

Syntax:

The basic syntax of DELETE query with WHERE clause is as follows:

DELETE FROM table_name
WHERE [condition];

You can combine N number of conditions using AND or OR operators.

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would DELETE a customer whose ID is 6:

SQL> DELETE FROM CUSTOMERS WHERE ID = 6;

Now CUSTOMERS table would have following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
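The same DELETE can be checked with a short SQLite sketch (illustrative data):

```python
import sqlite3

# Sketch: DELETE with WHERE removes only the matching row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?)",
                [(5, 'Hardik'), (6, 'Komal'), (7, 'Muffy')])

con.execute("DELETE FROM CUSTOMERS WHERE ID = 6")
ids = con.execute("SELECT ID FROM CUSTOMERS ORDER BY ID").fetchall()
print(ids)  # [(5,), (7,)]
```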

12. LIKE

The SQL LIKE clause is used to compare a value to similar values using wildcard operators.
There are two wildcards used in conjunction with the LIKE operator:

 The percent sign (%)


 The underscore (_)

The percent sign represents zero, one, or multiple characters. The underscore represents a single
character. The symbols can be used in combination.

Syntax:

The basic syntax of % and _ is as follows:

SELECT * FROM table_name
WHERE column LIKE 'XXXX%'

or

SELECT * FROM table_name
WHERE column LIKE '%XXXX%'

or

SELECT * FROM table_name
WHERE column LIKE 'XXXX_'

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would display all the records from CUSTOMERS table where
SALARY starts with 200:

SQL> SELECT * FROM CUSTOMERS
WHERE SALARY LIKE '200%';

This would produce following result:

+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
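A sketch of the two wildcards on text data (illustrative; % matches any run of characters, _ matches exactly one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?)",
                [(1, 'Ramesh'), (2, 'Khilan'), (6, 'Komal')])

# % : any number of characters after K.
k_names = [r[0] for r in con.execute(
    "SELECT NAME FROM CUSTOMERS WHERE NAME LIKE 'K%' ORDER BY NAME")]
# _ : exactly one character between K and mal.
one_char = [r[0] for r in con.execute(
    "SELECT NAME FROM CUSTOMERS WHERE NAME LIKE 'K_mal'")]
print(k_names)   # ['Khilan', 'Komal']
print(one_char)  # ['Komal']
```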

13. TOP

The SQL TOP clause is used to fetch the top N rows or X percent of rows from a table. TOP is SQL Server syntax; MySQL uses the LIMIT clause for the same purpose.

Syntax:

The basic syntax of TOP clause with SELECT statement would be as follows:

SELECT TOP number|percent column_name(s)
FROM table_name
WHERE [condition]

Example:
Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example on SQL Server which would fetch the top 3 records from the CUSTOMERS table:

SQL> SELECT TOP 3 * FROM CUSTOMERS;

This would produce following result:

+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan  |  25 | Delhi     | 1500.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
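Since SQLite and MySQL spell this idea with LIMIT rather than TOP, a runnable sketch of "first 3 rows" looks like this (illustrative data):

```python
import sqlite3

# TOP is SQL Server syntax; SQLite (used here) and MySQL use LIMIT.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?)",
                [(1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'),
                 (4, 'Chaitali'), (5, 'Hardik')])

top3 = con.execute("SELECT ID, NAME FROM CUSTOMERS ORDER BY ID LIMIT 3").fetchall()
print(top3)  # [(1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik')]
```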

14. ORDER BY

The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on
one or more columns. Some databases sort query results in ascending order by default.

Syntax:

The basic syntax of ORDER BY clause is as follows:

SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];

You can use more than one column in the ORDER BY clause. Make sure whatever column you
are using to sort, that column should be in column-list.

Example:
Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would sort the result in ascending order by NAME and SALARY:

SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME, SALARY;

This would produce following result:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+

15. GROUP BY

The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange
identical data into groups.

The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.

Syntax:

The basic syntax of the GROUP BY clause is given below. The GROUP BY clause must follow the
conditions in the WHERE clause and must precede the ORDER BY clause if one is used.

SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

If you want to know the total salary of each customer, then the GROUP BY query would
be as follows:

SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;

This would produce following result:

+----------+-------------+
| NAME     | SUM(SALARY) |
+----------+-------------+
| Chaitali |     6500.00 |
| Hardik   |     8500.00 |
| kaushik  |     2000.00 |
| Khilan   |     1500.00 |
| Komal    |     4500.00 |
| Muffy    |    10000.00 |
| Ramesh   |     2000.00 |
+----------+-------------+
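A sketch of GROUP BY with an aggregate where the grouping actually collapses rows; here salaries are totalled per city (the grouping column and data are assumptions chosen for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (NAME TEXT, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?)",
                [('Ramesh', 'Kota', 2000.0), ('kaushik', 'Kota', 2000.0),
                 ('Khilan', 'Delhi', 1500.0)])

# One output row per ADDRESS group, with SUM computed within each group.
totals = con.execute(
    "SELECT ADDRESS, SUM(SALARY) FROM CUSTOMERS GROUP BY ADDRESS "
    "ORDER BY ADDRESS").fetchall()
print(totals)  # [('Delhi', 1500.0), ('Kota', 4000.0)]
```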

16. DISTINCT KEYWORD

The SQL DISTINCT keyword is used in conjunction with SELECT statement to eliminate all the
duplicate records and fetching only unique records.

There may be a situation when you have multiple duplicate records in a table. While fetching such
records, it makes more sense to fetch only unique records instead of fetching duplicate records.

Syntax:

The basic syntax of DISTINCT keyword to eliminate duplicate records is as follows:

SELECT DISTINCT column1, column2, ..., columnN
FROM table_name
WHERE [condition]

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

First let us see how the following SELECT query returns duplicate salary records:

SQL> SELECT SALARY FROM CUSTOMERS
ORDER BY SALARY;

This would produce the following result, where the salary 2000.00 appears twice, a duplicate
record from the original table:

+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+

Now let us use DISTINCT keyword with the above SELECT query and see the result:

SQL> SELECT DISTINCT SALARY FROM CUSTOMERS
ORDER BY SALARY;

This would produce following result where we do not have any duplicate entry:

+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+
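The same behaviour in a compact SQLite sketch (illustrative data):

```python
import sqlite3

# Sketch: DISTINCT collapses duplicate values in the result set.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?)",
                [(1500.0,), (2000.0,), (2000.0,), (4500.0,)])

distinct_salaries = con.execute(
    "SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY").fetchall()
print(distinct_salaries)  # [(1500.0,), (2000.0,), (4500.0,)]
```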

The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN
is a means for combining fields from two tables by using values common to each.

Consider the following two tables. (a) The CUSTOMERS table is as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

(b) The ORDERS table is as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+

Now let us join these two tables in our SELECT statement as follows:

SQL> SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce following result:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+

Here it is notable that the join is performed in the WHERE clause. Several operators can be used
to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; however, the most
common operator is the equal symbol.
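The WHERE-clause join above can be run directly against small CUSTOMERS/ORDERS tables in SQLite (illustrative code):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT)")
con.execute("CREATE TABLE ORDERS (OID INT, CUSTOMER_ID INT, AMOUNT INT)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?)",
                [(2, 'Khilan'), (3, 'kaushik'), (4, 'Chaitali')])
con.executemany("INSERT INTO ORDERS VALUES (?,?,?)",
                [(102, 3, 3000), (100, 3, 1500), (101, 2, 1560), (103, 4, 2060)])

# Join expressed in the WHERE clause, as in the text.
pairs = con.execute("""
    SELECT NAME, AMOUNT
    FROM CUSTOMERS, ORDERS
    WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID
    ORDER BY AMOUNT
""").fetchall()
print(pairs)  # [('kaushik', 1500), ('Khilan', 1560), ('Chaitali', 2060), ('kaushik', 3000)]
```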

UNIT V

JOIN Operator

The JOIN operation is denoted by the symbol ⋈ and is used to compound similar tuples from two
Relations into single, longer tuples. Rows of the first table are combined with the rows of the
second table that satisfy the join condition; the resulting tuples take attributes from both tables.

 The general syntax would be A ⋈<join condition> B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT A.dob, A.empno
FROM employee A
JOIN B ON B.empno = A.empno

THETA JOIN Operator

This operation results in all combinations of tuples from Relation A and Relation B satisfying a
join requirement. The THETA JOIN is designated by ⋈θ. The SQL implementation would be the
same as for the JOIN example above.

 A ⋈θ<join condition> B

EQUIJOIN Operator

The EQUIJOIN operation returns all combinations of tuples from Relation A and Relation B
satisfying a join requirement with only equality comparisons. The EQUIJOIN operation is
symbolized by:

 A ⋈<join condition> B, OR
 A ⋈(<join attributes 1>), (<join attributes 2>) B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT * FROM A
INNER JOIN B
ON A.empno = B.empno

17. SQL JOIN TYPES

There are different types of joins available in SQL:

 INNER JOIN: returns rows when there is a match in both tables.
 LEFT JOIN: returns all rows from the left table, even if there are no matches in the
right table.
 RIGHT JOIN: returns all rows from the right table, even if there are no matches in the
left table.
 FULL JOIN: returns rows when there is a match in one of the tables.
 SELF JOIN: is used to join a table to itself, as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
 CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or
more joined tables.
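The difference between an INNER JOIN and a LEFT JOIN shows up when a row has no match; a SQLite sketch (illustrative data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT, NAME TEXT)")
con.execute("CREATE TABLE ORDERS (CUSTOMER_ID INT, AMOUNT INT)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?)", [(1, 'Ramesh'), (2, 'Khilan')])
con.execute("INSERT INTO ORDERS VALUES (2, 1560)")

# INNER JOIN drops the unmatched customer.
inner = con.execute("""SELECT NAME, AMOUNT FROM CUSTOMERS
                       JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID""").fetchall()
# LEFT JOIN keeps every left-table row, padding with NULL (None in Python).
left = con.execute("""SELECT NAME, AMOUNT FROM CUSTOMERS
                      LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
                      ORDER BY NAME""").fetchall()
print(inner)  # [('Khilan', 1560)]
print(left)   # [('Khilan', 1560), ('Ramesh', None)]
```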

18. UNION

The SQL UNION clause/operator is used to combine the results of two or more SELECT
statements without returning any duplicate rows.

To use UNION, each SELECT must have the same number of columns selected, with the same
data types, in the same order; the columns do not have to be the same length.

Syntax:

The basic syntax of UNION is as follows:

SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]

UNION

SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]

Here the given condition could be any expression based on your requirement.

Example:

Consider the following two tables. (a) The CUSTOMERS table is as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

(b) Another table is ORDERS, as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+

Now let us join these two tables in our SELECT statement as follows:

SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
UNION
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce following result:

+----+----------+--------+---------------------+
| ID | NAME     | AMOUNT | DATE                |
+----+----------+--------+---------------------+
|  1 | Ramesh   |   NULL | NULL                |
|  2 | Khilan   |   1560 | 2009-11-20 00:00:00 |
|  3 | kaushik  |   3000 | 2009-10-08 00:00:00 |
|  3 | kaushik  |   1500 | 2009-10-08 00:00:00 |
|  4 | Chaitali |   2060 | 2008-05-20 00:00:00 |
|  5 | Hardik   |   NULL | NULL                |
|  6 | Komal    |   NULL | NULL                |
|  7 | Muffy    |   NULL | NULL                |
+----+----------+--------+---------------------+

19. HAVING

The HAVING clause enables you to specify conditions that filter which group results appear in
the final results.

The WHERE clause places conditions on the selected columns, whereas the HAVING clause places
conditions on groups created by the GROUP BY clause.

Syntax:

The following is the position of the HAVING clause in a query:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

The HAVING clause must follow the GROUP BY clause in a query and must also precede the
ORDER BY clause if used. The following is the syntax of the SELECT statement, including the
HAVING clause:

SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
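A short sketch of HAVING over data like the above (the grouping column and the threshold are assumptions chosen for illustration):

```python
import sqlite3

# HAVING filters the groups built by GROUP BY, not individual rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (NAME TEXT, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?)",
                [('Ramesh', 'Kota', 2000.0), ('kaushik', 'Kota', 2000.0),
                 ('Khilan', 'Delhi', 1500.0)])

# Only ADDRESS groups whose total salary exceeds 2000 survive.
groups = con.execute("""
    SELECT ADDRESS, SUM(SALARY) FROM CUSTOMERS
    GROUP BY ADDRESS
    HAVING SUM(SALARY) > 2000
""").fetchall()
print(groups)  # [('Kota', 4000.0)]
```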



Module 2
RELATIONAL ALGEBRA

Relational Algebra operators are mathematical functions used to retrieve queries by describing a
sequence of operations on tables or even databases (schemas) involved. With relational algebra
operators, a query is always composed of a number of operators, each of which in turn is composed
of relations as variables and returns an individual abstraction as the end product.

The following are the main relational algebra operators as applied to SQL:

The SELECT Operator

The SELECT operator is used to choose a subset of the tuples (rows) from a relation that satisfies a
selection condition, acting as a filter to retain only tuples that fulfill a qualifying requirement.

 The SELECT operator in relational algebra is denoted by the symbol σ (sigma).


 The syntax for the SELECT statement is then as follows:

σ<Selection condition>(R)

 The σ would represent the SELECT command


 The <selection condition> would represent the condition for selection.
 The (R) would represent the Relation or the Table from which we are making a
selection of the tuples.

To implement the SELECT statement in SQL, we take a look at an example in which we would
like to select the EMPLOYEE tuples whose employee number is 7, or those whose date of birth is
before 1980:

σempno=7(EMPLOYEE)

σdob<'01-Jan-1980'(EMPLOYEE)

The SQL implementation would translate into:

SELECT empno
FROM EMPLOYEE
WHERE empno = 7

SELECT dob
FROM EMPLOYEE
WHERE dob < '01-Jan-1980'

The PROJECT Operator

This operator is used to reorder, select and get rid of attributes from a table. At some point we
might want only certain attributes in a relation and eliminate others from our query result.
Therefore the PROJECT operator would be used in such operations.

 The symbol used for the PROJECT operation is ∏ (pi).


 The general syntax for the PROJECT operator is:

∏<attribute list>(R )

 ∏ would represent the PROJECT.


 <attribute list> would represent the attributes(columns) we want from a relational.
 (R ) would represent the relation or table we want to choose the attributes from.

To implement the PROJECT statement in SQL, we take a look at an example in which we would
like to choose the Date of Birth (dob) and Employee Number (empno) from the relation
EMPLOYEE:

 ∏dob, empno(EMPLOYEE)

In SQL this would translate to:

SELECT dob, empno
FROM EMPLOYEE

The RENAME Operator

The RENAME operator is used to give a name to results or output of queries, returns of selection
statements, and views of queries that we would like to view at some other point in time:

 The RENAME operator is symbolized by ρ (rho).


 The general syntax for RENAME operator is: ρ s(B1, B2, B3,….Bn)(R )
 ρ is the RENAME operation.
 S is the new relation name.
 B1, B2, B3, …Bn are the new renamed attributes (columns).
 R is the relation or table from which the attributes are chosen.
To implement the RENAME statement in SQL, we take a look at an example in which we
would like to choose the Date of Birth and Employee Number attributes and RENAME them as
‘Birth_Date’ and ‘Employee_Number’ from the EMPLOYEE relation…

ρ s(Birth_Date, Employee_Number)(EMPLOYEE ) ← ∏dob, empno(EMPLOYEE )

 The arrow symbol ← means that we first get the PROJECT operation results on the right
side of the arrow then apply the RENAME operation on the results on the left side of the
arrow.

In SQL we would translate the RENAME operator using the SQL ‘AS’ statement:

SELECT dob AS 'Birth_Date', empno AS 'Employee_Number'
FROM EMPLOYEE

The UNION, INTERSECTION, and MINUS Operators

UNION: the UNION operation on relation A UNION relation B, designated as A ∪ B, joins or
includes all tuples that are in A or in B, eliminating duplicate tuples. The SQL implementation
of the UNION operation would be as follows:

UNION

RESULT ← A ∪ B

SQL Statement:

SELECT * From A

UNION

SELECT * From B

INTERSECTION: the INTERSECTION operation on a relation A INTERSECTION relation
B, designated by A ∩ B, includes tuples that are in both A and B. In other words, only tuples
belonging to both A and B, or shared by A and B, are included in the result. The SQL
implementation of the INTERSECTION operation would be as follows:

INTERSECTION

RESULT ← A ∩ B

SQL Statement:

SELECT dob From A

INTERSECT

SELECT dob from B

MINUS Operations: the MINUS operation includes tuples from one Relation that are not in
another Relation. Let the Relations be A and B; the MINUS operation A MINUS B is denoted by
A – B, which results in tuples that are in A and not in B. The SQL implementation of the MINUS
operation would be as follows:

MINUS

RESULT ← A – B

SQL Statement

SELECT dob From A

MINUS

SELECT dob from B
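All three set operators can be tried in one SQLite sketch; note that SQLite spells MINUS as EXCEPT (MINUS is the Oracle keyword), with the same semantics (tables and data are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (dob TEXT)")
con.execute("CREATE TABLE B (dob TEXT)")
con.executemany("INSERT INTO A VALUES (?)", [('1980-01-01',), ('1985-05-05',)])
con.executemany("INSERT INTO B VALUES (?)", [('1985-05-05',), ('1990-09-09',)])

union = con.execute("SELECT dob FROM A UNION SELECT dob FROM B ORDER BY dob").fetchall()
inter = con.execute("SELECT dob FROM A INTERSECT SELECT dob FROM B").fetchall()
minus = con.execute("SELECT dob FROM A EXCEPT SELECT dob FROM B").fetchall()
print(union)  # [('1980-01-01',), ('1985-05-05',), ('1990-09-09',)]
print(inter)  # [('1985-05-05',)]
print(minus)  # [('1980-01-01',)]
```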

CARTESIAN PRODUCT Operator

The CARTESIAN PRODUCT operator, also referred to as the cross product or cross join,
creates a relation that has all the attributes of A and B, allowing all the attainable combinations of
tuples from A and B in the result. The CARTESIAN PRODUCT of A and B is symbolized by X,
as in A X B.

Let there be Relation A(A1, A2) and Relation B(B1, B2).

The CARTESIAN PRODUCT C of A and B, which is A X B,

is C = A X B

C = (A1B1, A1B2, A2B1, A2B2)

The SQL implementation would be something like:

SELECT A.dob, B.empno
FROM A, B
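The cross product is easy to see in SQLite: a FROM clause with two tables and no join condition yields every pairing (illustrative tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (a TEXT)")
con.execute("CREATE TABLE B (b TEXT)")
con.executemany("INSERT INTO A VALUES (?)", [('A1',), ('A2',)])
con.executemany("INSERT INTO B VALUES (?)", [('B1',), ('B2',)])

# No WHERE clause: every row of A paired with every row of B.
rows = con.execute("SELECT a, b FROM A, B ORDER BY a, b").fetchall()
print(rows)  # [('A1', 'B1'), ('A1', 'B2'), ('A2', 'B1'), ('A2', 'B2')]
```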

JOIN Operator

The JOIN operation, denoted by the symbol ⋈, is used to combine related tuples from two
Relations into single longer tuples. Rows of the first table are combined with the rows of the
second table that satisfy the join condition. The result is tuples taken from both tables.

 The general syntax would be A ⋈<join condition> B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT A.dob, A.empno
FROM A
JOIN B ON B.empno = A.empno
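The filtering effect of the join condition can be seen with a small sketch; only the tuple pair whose empno values match survives (tables and rows invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (empno INTEGER, dob TEXT);
CREATE TABLE B (empno INTEGER, depno INTEGER);
INSERT INTO A VALUES (1, '1990-01-01'), (2, '1985-06-15');
INSERT INTO B VALUES (2, 5);
""")

# Only A's tuple with empno = 2 has a matching tuple in B.
joined = con.execute(
    "SELECT A.dob, A.empno FROM A JOIN B ON B.empno = A.empno").fetchall()
print(joined)
```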

THETA JOIN Operator

This operation results in all combinations of tuples from Relation A and Relation B satisfying a
join requirement. The THETA JOIN is designated by:

 A ⋈<join condition> B

The SQL implementation would be the same as for the JOIN example above.

EQUIJOIN Operator

The EQUIJOIN operation returns all combinations of tuples from Relation A and Relation B
satisfying a join requirement with only equality comparisons. The EQUIJOIN operation is
symbolized by :

 A ⋈<join condition> B, OR

 A ⋈(<join attributes 1>), (<join attributes 2>) B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT * from A

INNER JOIN B

on A.empno=B.empno

NATURAL JOIN Operator

The NATURAL JOIN operation returns results that do not include the JOIN attributes of the
second Relation B. It is not required that attributes with the same name be mentioned. The
NATURAL JOIN operator is symbolized by:

 A * <join condition> B, OR

 A * (<join attributes 1>), (<join attributes 2>) B, OR

 A * B

SQL translation example where attribute dob is Date of Birth and empno is Employee Number:

SELECT A.dob, B.empno

FROM A

NATURAL JOIN B

-- an optional WHERE clause (e.g. WHERE depno = 5) can further restrict the output

We can always use the ‘where’ clause to further restrict our output and stop a Cartesian product
output.

DIVISION Operator

The DIVISION operation will return a Relation R(X) that includes all tuples t[X] in R1(Z) that
appear in R1 in combination with every tuple from R2(Y), where Z = X 𝖴 Y. The DIVISION
operator is symbolized by:

 R1(Z) ÷ R2(Y)

The DIVISION operator is the most difficult to implement in SQL, as no SQL command is given
for the DIVISION operation. The DIVISION operator can be seen as the opposite of the
CARTESIAN PRODUCT operator, just as in standard math division is the opposite of
multiplication.
Therefore a series of existing SQL commands has to be utilized in the implementation of the
DIVISION operator. An example of the SQL implementation of the DIVISION operator:

SELECT surname, forenames
FROM employee x
WHERE NOT EXISTS
      (SELECT 'X'
       FROM employee y
       WHERE NOT EXISTS
             (SELECT 'X'
              FROM employee z
              WHERE x.empno = z.empno
              AND y.surname = z.surname))
ORDER BY empno
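The double NOT EXISTS pattern is easier to see on a tiny concrete case. In this invented sketch, works_on ÷ project should return exactly the employees that appear with every project:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE project (projno INTEGER);
CREATE TABLE works_on (empno INTEGER, projno INTEGER);
INSERT INTO project VALUES (10), (20);
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10), (3, 20);
""")

# Keep an employee only if no project exists that the employee misses.
divided = con.execute("""
    SELECT DISTINCT w.empno
    FROM works_on w
    WHERE NOT EXISTS (
        SELECT * FROM project p
        WHERE NOT EXISTS (
            SELECT * FROM works_on w2
            WHERE w2.empno = w.empno AND w2.projno = p.projno))
""").fetchall()
print(divided)  # only employee 1 works on both projects
```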

RELATIONAL CALCULUS

In relational calculus, a query is expressed as a formula consisting of a number of variables and an


expression involving these variables. It is up to the DBMS to transform these nonprocedural
queries into equivalent, efficient, procedural queries. The concept of relational calculus was
first proposed by Codd. The relational calculus is used to measure the selective power of
relational languages. A language that can be used to produce any relation that can be derived
using the relational calculus is said to be relationally complete.

Relational calculus, which in effect means calculating with relations, is based on predicate calculus,
which is calculating with predicates. It is a formal language used to symbolize logical arguments
in mathematics. Propositions specifying a property consist of an expression that names an
individual object, and another expression, called the predicate, that stands for the property that the
individual object possesses. If for instance, p and q are propositions, we can build other
propositions "not p", "p or q", "p and q" and so on. In predicate calculus, propositions may be
built not only out of other propositions but also out of elements that are not themselves
propositions. In this manner we can build a proposition that specifies a certain property or
characteristic of an object.

TUPLE RELATIONAL CALCULUS

The tuple relational calculus is based on specifying a number of tuple variables. Each such tuple
variable normally ranges over a particular database relation. This means that the variable may
take any individual tuple from that relation as its value. A simple tuple relational calculus query is
of the form {t | COND(t)}, where 't' is a tuple variable and COND(t) is a conditional
expression involving 't'. The result of such a query is a relation that contains all the tuples (rows)
that satisfy COND(t).

For example, the relational calculus query {t | BOOK(t) and t.PRICE>100} will get you all the
books whose price is greater than 100. In the above example, the condition 'BOOK(t)' specifies
that the range relation of the tuple variable 't' is BOOK. Each BOOK tuple 't' that satisfies the
condition 't.PRICE>100' will be retrieved. Note that 't.PRICE' references the attribute PRICE of
the tuple variable 't'.

The query {t | BOOK(t) and t.PRICE>100} retrieves all attribute values for each selected

BOOK tuple. To retrieve only some of the attributes (say TITLE, AUTHOR and PRICE) we can
modify the query as follows:

{t.TITLE, t.AUTHOR, t.PRICE | BOOK(t) and t.PRICE>100}

Thus, in a tuple calculus expression we need to specify the following information:

• For each tuple variable 't', the range relation 'R' of 't'. This value is specified by a condition of
the form R(t).

• A condition to select the required tuples from the relation.


• A set of attributes to be retrieved. This set is called the requested attributes. The values of these
attributes for each selected combination of tuples. If the requested attribute list is not specified,
then all the attributes of the selected tuples are retrieved.

Thus, to retrieve the details of all books (Title and Author name) which were published by
'Kalyani' and whose price is greater than 100 we will write the query as follows:

{t.TITLE, t.AUTHOR | BOOK(t) and t.PUBLISHER='Kalyani' and t.PRICE>100}

Expressions and Formulas

A general expression of the tuple relational calculus is of the following form:

{t1.A1, t2.A2, ..., tn.An | COND(t1, t2, ..., tn, tn+1, tn+2, ..., tn+m)}

where t1, t2, ..., tn, tn+1, tn+2, ..., tn+m are tuple variables, each Ai is an attribute of the relation
on which ti ranges and COND is a condition or formula of the tuple relational calculus.

A formula is defined as follows:

Every condition is a WFF (Well Formed Formula). Here, a well-formed formula is constructed
from conditions, Boolean operations (AND, OR, NOT) and quantifiers like for all values (∀) or
there exists (∃).

There are the following rules which are applicable to WFFs:

• Every condition is a WFF.

• If F is a WFF, then (F) and NOT(F) are also WFFs.

• If F1 and F2 are WFFs, then (F1 AND F2) and (F1 OR F2) are also WFFs.

• If F is a WFF in which T occurs as a free variable (free variables are those range variables
where the meaning of the formula changes if all the occurrences of the range variable, say 'x',
are replaced by some other variable, say 'y'), then ∃T(F) and ∀T(F) are WFFs.

• Nothing else is a WFF.

Bound variables: Bound variables are those range variables for which the meaning of the formula
would remain unchanged if all the occurrences of the range variable, say 'x', were replaced by some
other variable, say 'y'. The range variable 'x' is then called a bound variable.

For example: ∃x (x>3)

means EXISTS x (x>3)

Here, WFF simply states that there exists some integer x that is greater than 3. Note, that the
meaning of this WFF would remain totally unchanged if all references of x were replaced by
references to some other variable y. In other words the WFF EXISTS y(y>3) is semantically
same.

Free Variables: Free variables are those range variables for which the meaning of the formula
changes if all the occurrences of the range variable, say 'x', are replaced by some other variable,
say 'y'. The range variable 'x' is then called a free variable.

For example: ∃x (x>3) and x<0

means EXISTS x (x>3) and x<0

Here, there are three references to x, denoting two different variables. The first two references are
bound and could be replaced by references to some other variable y without changing the overall
meaning. The third reference is free, and cannot be replaced without changing the meaning of the
formula. Thus, of the two WFFs shown below, the first is equivalent to the one just given and the
second is not:

EXISTS y (y>3) and x<0

EXISTS y (y>3) and y<0

Closed and Open WFF: A WFF in which all variables references are bound is called Closed
WFF.
e.g. EXISTS x (x>3) is a closed WFF.

An open WFF is a WFF that is not closed, i.e. one that contains at least one free variable
reference.
e.g. EXISTS y (y>3) and x<0
DOMAIN RELATIONAL CALCULUS

The domain calculus differs from the tuple calculus in the type of variables used in formulas. In
domain calculus the variables range over single values from domains of attributes rather than
ranging over tuples. To form a relation of degree 'n' for a query result, we must have 'n' of these
domain variables-one for each attribute.

An expression of the domain calculus is of the following form:

{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, xn+2, ..., xn+m)}

In the above expression x1, x2, ..., xn, xn+1, xn+2, ..., xn+m are domain variables that range over
domains of attributes and COND is a condition or formula of the domain relational calculus.

Expression of the domain calculus are constructed from the following elements:

• Domain variables x1, x2, ..., xn, xn+1, xn+2, ..., xn+m; each domain variable is to range
over some specified domain.

• Conditions, which can take two forms:

• Simple comparisons of the form x * y, as for the tuple calculus, except that x and y are now
domain variables.

• Membership conditions, of the form R(term, term, ...).

Here, R is a relation, and each "term" is a pair A:V, where A in turn is an attribute of R and V is
either a domain variable or a constant. For example EMP(empno: 100, ename: 'Ajay') is a
membership condition (which evaluates to true if and only if there exists an EMP tuple having
empno=100 and ename = 'Ajay').

• Well Formed Formulas (WFFs), formed in accordance with the rules of the tuple calculus (but
with the revised definition of "condition").

Free and Bound Variables

The rules concerning free and bound variables given for the tuple calculus apply similarly to the
domain calculus.

SQL

What is SQL?

SQL is Structured Query Language, a computer language for storing, manipulating and
retrieving data stored in a relational database.

SQL is the standard language for Relational Database Systems. All relational database management
systems like MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server use SQL
as the standard database language.

They also use different dialects, such as:

 MS SQL Server uses T-SQL,

 Oracle uses PL/SQL,
 MS Access's version of SQL is called JET SQL (native format), etc.

Why SQL?

 Allows users to access data in relational database management systems.

 Allows users to describe the data.
 Allows users to define the data in a database and manipulate that data.
 Allows embedding within other languages using SQL modules, libraries & pre-compilers.
 Allows users to create and drop databases and tables.
 Allows users to create views, stored procedures and functions in a database.
 Allows users to set permissions on tables, procedures, and views.

SQL Process:

When you execute an SQL command on any RDBMS, the system determines the best
way to carry out your request, and the SQL engine figures out how to interpret the task.

There are various components included in the process, such as the Query
Dispatcher, Optimization Engines, Classic Query Engine and SQL Query Engine. The classic
query engine handles all non-SQL queries, but the SQL query engine won't handle logical files.

[Figure: SQL Architecture diagram — not reproduced here.]

SQL DATABASE :

MySQL

MySQL is an open-source SQL database, developed by the Swedish company MySQL AB.
MySQL is pronounced "my ess-que-ell," in contrast with SQL, pronounced "sequel."

MySQL supports many different platforms including Microsoft Windows, the major Linux
distributions, UNIX, and Mac OS X.

MySQL has free and paid versions, depending on its usage (non-commercial/commercial)
and features. MySQL comes with a very fast, multi-threaded, multi-user, and robust SQL
database server.

Features:

 High Performance.
 High Availability.
 Scalability and Flexibility - run anything.
 Robust Transactional Support.
 Web and Data Warehouse Strengths.
 Strong Data Protection.
 Comprehensive Application Development.
 Management Ease.
 Open Source Freedom and 24 x 7 Support.
 Lowest Total Cost of Ownership.

MS SQL Server

MS SQL Server is a Relational Database Management System developed by Microsoft Inc. Its
primary query languages are:

 T-SQL.
 ANSI SQL.

Features:

 High Performance.
 High Availability.
 Database mirroring.
 Database snapshots.
 CLR integration.
 Service Broker.
 DDL triggers.
 Ranking functions.
 Row version-based isolation levels.
 XML integration.
 TRY...CATCH.
 Database Mail.

ORACLE

It is a very large, multi-user database management system. Oracle is a relational database
management system developed by 'Oracle Corporation'.

Oracle works to efficiently manage its resource, a database of information, among the multiple
clients requesting and sending data in the network.

It is an excellent database server choice for client/server computing. Oracle supports all major
operating systems for both clients and servers, including MSDOS, NetWare, UnixWare, OS/2
and most UNIX flavors.

Features:

 Concurrency
 Read Consistency
 Locking Mechanisms
 Quiesce Database
 Portability
 Self managing database
 SQL*Plus
 ASM
 Scheduler
 Resource Manager
 Data Warehousing
 Materialized views
 Bitmap indexes
 Table compression
 Parallel Execution
 Analytic SQL
 Data mining
 Partitioning

MS- ACCESS

This is one of the most popular Microsoft products. Microsoft Access is entry-level database
management software. The MS Access database is not only an inexpensive but also a powerful
database for small-scale projects.

MS Access uses the Jet database engine, which utilizes a specific SQL language dialect.

Features:

 Users can create tables, queries, forms and reports, and connect them together with macros.
 The import and export of data to many formats including Excel, Outlook, ASCII,
dBase, Paradox, FoxPro, SQL Server, Oracle, ODBC, etc.
 There is also the Jet Database format (MDB or ACCDB in Access 2007) which can
contain the application and data in one file. This makes it very convenient to distribute
the entire application to another user, who can run it in disconnected environments.
 Microsoft Access offers parameterized queries. These queries and Access tables
can be referenced from other programs like VB6 and .NET through DAO or ADO.
 The desktop editions of Microsoft SQL Server can be used with Access as an
alternative to the Jet Database Engine.
 Microsoft Access is a file server-based database. Unlike client-server relational
database management systems (RDBMS), Microsoft Access does not implement
database triggers, stored procedures, or transaction logging.

SQL DATA TYPE:

 SQL data type is an attribute that specifies the type of data of any object. Each column,
variable and expression has a related data type in SQL.
 You would use these data types while creating your tables. You would choose a
particular data type for a table column based on your requirement.
 SQL Server offers six categories of data types for your use:

Exact Numeric Data Types:

DATA TYPE    FROM                            TO
bigint       -9,223,372,036,854,775,808      9,223,372,036,854,775,807
int          -2,147,483,648                  2,147,483,647
smallint     -32,768                         32,767
tinyint      0                               255
bit          0                               1
decimal      -10^38 +1                       10^38 -1
numeric      -10^38 +1                       10^38 -1
money        -922,337,203,685,477.5808       +922,337,203,685,477.5807
smallmoney   -214,748.3648                   +214,748.3647

 Approximate Numeric Data Types:


DATA TYPE FROM TO

float -1.79E + 308 1.79E + 308

real -3.40E + 38 3.40E + 38

 Date and Time Data Types:


DATA TYPE       FROM          TO
datetime        Jan 1, 1753   Dec 31, 9999
smalldatetime   Jan 1, 1900   Jun 6, 2079
date            Stores a date like June 30, 1991
time            Stores a time of day like 12:30 P.M.

Character Strings Data Types:


DATA TYPE      DESCRIPTION
char           Maximum length of 8,000 characters (fixed-length non-Unicode characters).
varchar        Maximum of 8,000 characters (variable-length non-Unicode data).
varchar(max)   Maximum length of 2^31 characters, variable-length non-Unicode data
               (SQL Server 2005 only).
text           Variable-length non-Unicode data with a maximum length of
               2,147,483,647 characters.

Operators in SQL

An operator is a reserved word or a character used primarily in an SQL statement's WHERE


clause to perform operation(s), such as comparisons and arithmetic operations.

Operators are used to specify conditions in an SQL statement and to serve as conjunctions for
multiple conditions in a statement.
 Arithmetic operators
 Comparison operators
 Logical operators
 Operators used to negate conditions

SQL Arithmetic Operators:

Assume variable a holds 10 and variable b holds 20 then:

Operator : Description : Example

+ : Addition - adds the values on either side of the operator. Example: a + b will give 30.

- : Subtraction - subtracts the right-hand operand from the left-hand operand. Example: a - b will give -10.

* : Multiplication - multiplies the values on either side of the operator. Example: a * b will give 200.

/ : Division - divides the left-hand operand by the right-hand operand. Example: b / a will give 2.

% : Modulus - divides the left-hand operand by the right-hand operand and returns the remainder. Example: b % a will give 0.

SQL Comparison Operators:

Assume variable a holds 10 and variable b holds 20 then:

Operator : Description : Example

= : Checks if the values of the two operands are equal or not; if yes, then the condition becomes true. Example: (a = b) is not true.

!= : Checks if the values of the two operands are equal or not; if the values are not equal, then the condition becomes true. Example: (a != b) is true.

<> : Checks if the values of the two operands are equal or not; if the values are not equal, then the condition becomes true. Example: (a <> b) is true.

> : Checks if the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true. Example: (a > b) is not true.

< : Checks if the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true. Example: (a < b) is true.

>= : Checks if the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true. Example: (a >= b) is not true.

<= : Checks if the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true. Example: (a <= b) is true.

!< : Checks if the value of the left operand is not less than the value of the right operand; if yes, then the condition becomes true. Example: (a !< b) is false.

!> : Checks if the value of the left operand is not greater than the value of the right operand; if yes, then the condition becomes true. Example: (a !> b) is true.

SQL Logical Operators:

Here is a list of all the logical operators available in SQL.

Operator : Description

ALL : The ALL operator is used to compare a value to all values in another value set.

AND : The AND operator allows the existence of multiple conditions in an SQL statement's WHERE clause.

ANY : The ANY operator is used to compare a value to any applicable value in the list according to the condition.

BETWEEN : The BETWEEN operator is used to search for values that are within a set of values, given the minimum value and the maximum value.

EXISTS : The EXISTS operator is used to search for the presence of a row in a specified table that meets certain criteria.

IN : The IN operator is used to compare a value to a list of literal values that have been specified.

LIKE : The LIKE operator is used to compare a value to similar values using wildcard operators.

NOT : The NOT operator reverses the meaning of the logical operator with which it is used, e.g. NOT EXISTS, NOT BETWEEN, NOT IN etc. This is the negate operator.

OR : The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.

IS NULL : The NULL operator is used to compare a value with a NULL value.

UNIQUE : The UNIQUE operator searches every row of a specified table for uniqueness (no duplicates).

COMMANDS IN SQL :

1. CREATE DATABASE

The SQL CREATE DATABASE statement is used to create new SQL database.

Syntax:

Basic syntax of CREATE DATABASE statement is as follows:

CREATE DATABASE DatabaseName;

The database name should always be unique within the RDBMS.

Example:

If you want to create new database <testDB>, then CREATE DATABASE statement would be as
follows:

SQL> CREATE DATABASE testDB;

2. DROP DATABASE
The SQL DROP DATABASE statement is used to drop any existing database in SQL schema.

Syntax:

Basic syntax of DROP DATABASE statement is as follows:

DROP DATABASE DatabaseName;

The database name must refer to an existing database in the RDBMS.

Example:

If you want to delete an existing database <testDB>, then DROP DATABASE statement would
be as follows:

SQL> DROP DATABASE testDB;

3. USE

The SQL USE statement is used to select any existing database in SQL schema.

Syntax:

Basic syntax of USE statement is as follows:

USE DatabaseName;

4. CREATE TABLE

The SQL CREATE TABLE statement is used to create a new table.

Syntax:

Basic syntax of CREATE TABLE statement is as follows:

CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   .....
   columnN datatype,
   PRIMARY KEY( one or more columns )
);

CREATE TABLE is the keyword telling the database system what you want to do. In this case,
you want to create a new table. The unique name or identifier for the table follows the CREATE
TABLE statement.

Then in brackets comes the list defining each column in the table and what sort of data type it is.
The syntax becomes clearer with an example below.

A copy of an existing table can be created using a combination of the CREATE TABLE
statement and the SELECT statement.

5. DROP TABLE

The SQL DROP TABLE statement is used to remove a table definition and all data, indexes,
triggers, constraints, and permission specifications for that table.

Syntax:

Basic syntax of DROP TABLE statement is as follows:

DROP TABLE table_name;

6. INSERT INTO

The SQL INSERT INTO Statement is used to add new rows of data to a table in the database.

Syntax:

There are two basic syntax of INSERT INTO statement is as follows:

INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);

Here column1, column2,...columnN are the names of the columns in the table into which you
want to insert data.

You may not need to specify the column(s) name in the SQL query if you are adding values for
all the columns of the table. But make sure the order of the values is in the same order as the
columns in the table. The SQL INSERT INTO syntax would be as follows:

INSERT INTO TABLE_NAME VALUES (value1, value2, value3,...valueN);


Example:

Following statements would create six records in CUSTOMERS table:

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (6, 'Komal', 22, 'MP', 4500.00 );

You can create a record in the CUSTOMERS table using the second syntax as follows:

INSERT INTO CUSTOMERS
VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );

All the above statements would produce the following records in the CUSTOMERS table:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
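Both INSERT syntaxes can be exercised with SQLite from Python; the CREATE TABLE definition below is an assumed sketch of the chapter's CUSTOMERS table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE CUSTOMERS (
    ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)""")

# First syntax: column names listed explicitly.
con.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) "
            "VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00)")
# Second syntax: a value for every column, in table order.
con.execute("INSERT INTO CUSTOMERS VALUES (7, 'Muffy', 24, 'Indore', 10000.00)")

count = con.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
print(count)  # 2 rows inserted
```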

7. SELECT

SQL SELECT Statement is used to fetch the data from a database table which returns data in
the form of result table. These result tables are called result-sets.

Syntax:

The basic syntax of SELECT statement is as follows:

SELECT column1, column2, columnN FROM table_name;

Here column1, column2...are the fields of a table whose values you want to fetch. If you want to
fetch all the fields available in the table, then you can use the following syntax:

SELECT * FROM table_name;

Example:

Consider CUSTOMERS table is having following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would fetch ID, Name and Salary fields of the customers available
in CUSTOMERS table:

SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;

This would produce following result:

+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  1 | Ramesh   |  2000.00 |
|  2 | Khilan   |  1500.00 |
|  3 | kaushik  |  2000.00 |
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+

8. WHERE CLAUSE

The SQL WHERE clause is used to specify a condition while fetching the data from a single
table or while joining with multiple tables.

Only if the given condition is satisfied does it return specific values from the table. You would use
the WHERE clause to filter the records and fetch only the necessary records.

The WHERE clause is not only used in the SELECT statement; it is also used in the UPDATE,
DELETE statements etc., which we will examine in subsequent chapters.

Syntax:

The basic syntax of SELECT statement with WHERE clause is as follows:

SELECT column1, column2, columnN
FROM table_name
WHERE [condition]

You can specify a condition using comparison or logical operators like >, <, =, LIKE, NOT etc.
The examples below will make this concept clear.

Example:

Consider CUSTOMERS table is having following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000:

SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;

This would produce following result:

+----+----------+----------+
| ID | NAME     | SALARY   |
+----+----------+----------+
|  4 | Chaitali |  6500.00 |
|  5 | Hardik   |  8500.00 |
|  6 | Komal    |  4500.00 |
|  7 | Muffy    | 10000.00 |
+----+----------+----------+
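The same filter can be reproduced with SQLite from Python (the table definition is an assumed sketch of CUSTOMERS):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
    (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
    (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
    (7, 'Muffy', 24, 'Indore', 10000.00)])

# WHERE keeps only the rows whose salary exceeds 2000.
rows = con.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000").fetchall()
print([name for _, name, _ in rows])  # the four customers above 2000
```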

9. AND and OR OPERATORS

The SQL AND and OR operators are used to combine multiple conditions to narrow data in an
SQL statement. These two operators are called conjunctive operators.

These operators provide a means to make multiple comparisons with different operators in the
same SQL statement.

The AND Operator:

The AND operator allows the existence of multiple conditions in an SQL statement's WHERE
clause.

Syntax:

The basic syntax of AND operator with WHERE clause is as follows:

SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];

You can combine N number of conditions using AND operator. For an action to be taken by
the SQL statement, whether it be a transaction or query, all conditions separated by the AND
must be TRUE.

Example:

Consider CUSTOMERS table is having following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would fetch ID, Name and Salary fields from the CUSTOMERS
table where salary is greater than 2000 AND age is less than 25 years:

SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 AND age < 25;

This would produce following result:

+----+-------+----------+
| ID | NAME  | SALARY   |
+----+-------+----------+
|  6 | Komal |  4500.00 |
|  7 | Muffy | 10000.00 |
+----+-------+----------+
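Both conditions must hold, so only Komal and Muffy qualify. A runnable sketch of the same query (CUSTOMERS re-created as an assumed copy of the chapter's table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
    (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
    (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
    (7, 'Muffy', 24, 'Indore', 10000.00)])

# A row survives only when BOTH conditions joined by AND are true.
rows = con.execute("SELECT ID, NAME, SALARY FROM CUSTOMERS "
                   "WHERE SALARY > 2000 AND AGE < 25").fetchall()
print(rows)
```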

10. UPDATE

The SQL UPDATE Query is used to modify the existing records in a table.

You can use the WHERE clause with the UPDATE query to update selected rows; otherwise all the
rows would be affected.

Syntax:
The basic syntax of UPDATE query with WHERE clause is as follows:

UPDATE table_name

SET column1 = value1, column2 = value2, ..., columnN = valueN
WHERE [condition];
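A minimal sketch: only the row selected by the WHERE clause changes, while every other row keeps its old value (CUSTOMERS is an assumed re-creation of the chapter's table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh', 32, 'Ahmedabad', 2000.00),
    (2, 'Khilan', 25, 'Delhi', 1500.00)])

# Without the WHERE clause this would overwrite every row's SALARY.
con.execute("UPDATE CUSTOMERS SET SALARY = 3000.00 WHERE ID = 2")

updated = con.execute("SELECT SALARY FROM CUSTOMERS WHERE ID = 2").fetchone()[0]
untouched = con.execute("SELECT SALARY FROM CUSTOMERS WHERE ID = 1").fetchone()[0]
print(updated, untouched)
```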

11. DELETE

The SQL DELETE Query is used to delete the existing records from a table.

You can use WHERE clause with DELETE query to delete selected rows, otherwise all the
records would be deleted.

Syntax:

The basic syntax of DELETE query with WHERE clause is as follows:

DELETE FROM
table_name WHERE
[condition];

You can combine N number of conditions using AND or OR operators.

Example:

Consider CUSTOMERS table is having following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would DELETE a customer whose ID is 6:

SQL> DELETE FROM CUSTOMERS
WHERE ID = 6;

Now CUSTOMERS table would have following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
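The deletion can be replayed in SQLite; afterwards ID 6 is gone and six rows remain (CUSTOMERS re-created as an assumed sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS "
            "(ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh', 32, 'Ahmedabad', 2000.00), (2, 'Khilan', 25, 'Delhi', 1500.00),
    (3, 'kaushik', 23, 'Kota', 2000.00), (4, 'Chaitali', 25, 'Mumbai', 6500.00),
    (5, 'Hardik', 27, 'Bhopal', 8500.00), (6, 'Komal', 22, 'MP', 4500.00),
    (7, 'Muffy', 24, 'Indore', 10000.00)])

# Only the row matched by the WHERE clause is removed.
con.execute("DELETE FROM CUSTOMERS WHERE ID = 6")

ids = con.execute("SELECT ID FROM CUSTOMERS").fetchall()
remaining = len(ids)
print(remaining)  # 6 rows left, ID 6 gone
```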

12. LIKE

The SQL LIKE clause is used to compare a value to similar values using wildcard operators.
There are two wildcards used in conjunction with the LIKE operator:

 The percent sign (%)


 The underscore (_)

The percent sign represents zero, one, or multiple characters. The underscore represents a single
number or character. The symbols can be used in combinations.

Syntax:

The basic syntax of % and _ is as follows:

SELECT * FROM table_name
WHERE column LIKE 'XXXX%'

or

SELECT * FROM table_name
WHERE column LIKE '%XXXX%'

or

SELECT * FROM table_name
WHERE column LIKE 'XXXX_'

Consider CUSTOMERS table is having following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would display all the records from CUSTOMERS table where
SALARY starts with 200:

SQL> SELECT * FROM CUSTOMERS
WHERE SALARY LIKE '200%';

This would produce following result:

+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+

13. TOP

The SQL TOP clause is used to fetch the top N rows or X percent of rows from a table.

Syntax:

The basic syntax of TOP clause with SELECT statement would be as follows:

SELECT TOP number|percent column_name(s)
FROM table_name
WHERE [condition]

Example:
Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example on SQL Server which would fetch the top 3 records from the CUSTOMERS table:

SQL> SELECT TOP 3 * FROM CUSTOMERS;

This would produce following result:

+----+---------+-----+-----------+---------+
| ID | NAME    | AGE | ADDRESS   | SALARY  |
+----+---------+-----+-----------+---------+
|  1 | Ramesh  |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan  |  25 | Delhi     | 1500.00 |
|  3 | kaushik |  23 | Kota      | 2000.00 |
+----+---------+-----+-----------+---------+
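Note that TOP is specific to SQL Server and MS Access. Other databases provide equivalent clauses; as a rough sketch, the same three-row fetch would be written as follows on MySQL and Oracle:

```sql
-- MySQL: the LIMIT clause goes at the end of the query
SELECT * FROM CUSTOMERS LIMIT 3;

-- Oracle: restrict rows with the ROWNUM pseudo-column
SELECT * FROM CUSTOMERS WHERE ROWNUM <= 3;
```

Both forms simply cut off the result set after three rows; combine them with ORDER BY carefully, since in the classic Oracle form the ROWNUM filter is applied before the sort.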

14. ORDER BY

The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on
one or more columns. Some databases sort query results in ascending order by default.

Syntax:

The basic syntax of ORDER BY clause is as follows:

SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1, column2, .. columnN] [ASC | DESC];

You can use more than one column in the ORDER BY clause. Make sure that any column you use
for sorting is present in the column-list.

Example:
Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would sort the result in ascending order by NAME and SALARY:

SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME, SALARY;

This would produce following result:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
+----+----------+-----+-----------+----------+

15. GROUP BY

The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange
identical data into groups.

The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.

Syntax:

The basic syntax of the GROUP BY clause is given below. The GROUP BY clause must follow the
conditions in the WHERE clause and must precede the ORDER BY clause if one is used.

SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

If you want to know the total salary of each customer, the GROUP BY query would be as
follows:

SQL> SELECT NAME, SUM(SALARY)
FROM CUSTOMERS
GROUP BY NAME;

This would produce following result:

+----------+-------------+
| NAME     | SUM(SALARY) |
+----------+-------------+
| Chaitali |     6500.00 |
| Hardik   |     8500.00 |
| kaushik  |     2000.00 |
| Khilan   |     1500.00 |
| Komal    |     4500.00 |
| Muffy    |    10000.00 |
| Ramesh   |     2000.00 |
+----------+-------------+

16. DISTINCT KEYWORD

The SQL DISTINCT keyword is used in conjunction with the SELECT statement to eliminate all
duplicate records and fetch only unique records.

There may be a situation when you have multiple duplicate records in a table. While fetching such
records, it makes more sense to fetch only unique records instead of fetching duplicate records.

Syntax:

The basic syntax of DISTINCT keyword to eliminate duplicate records is as follows:

SELECT DISTINCT column1, column2, ..., columnN
FROM table_name
WHERE [condition]

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

First let us see how the following SELECT query returns duplicate salary records:

SQL> SELECT SALARY FROM CUSTOMERS
ORDER BY SALARY;

This would produce the following result, where the salary 2000.00 appears twice, which is a
duplicate record from the original table.

+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+

Now let us use DISTINCT keyword with the above SELECT query and see the result:

SQL> SELECT DISTINCT SALARY FROM
CUSTOMERS ORDER BY SALARY;

This would produce following result where we do not have any duplicate entry:

+----------+
| SALARY   |
+----------+
|  1500.00 |
|  2000.00 |
|  4500.00 |
|  6500.00 |
|  8500.00 |
| 10000.00 |
+----------+

17. CONSTRAINTS

Constraints are the rules enforced on the data columns of a table. They are used to limit the type of
data that can go into a table. This ensures the accuracy and reliability of the data in the database.

Constraints can be column level or table level. Column level constraints are applied only to one
column, whereas table level constraints are applied to the whole table.

Following are commonly used constraints available in SQL. These constraints have already
been discussed in the SQL - RDBMS Concepts chapter, but it is worth revising them at this point.

 NOT NULL Constraint: Ensures that a column cannot have a NULL value.
 DEFAULT Constraint: Provides a default value for a column when none is specified.
 UNIQUE Constraint: Ensures that all values in a column are different.
 PRIMARY Key: Uniquely identifies each row/record in a database table.
 FOREIGN Key: Uniquely identifies a row/record in another database table.
 CHECK Constraint: The CHECK constraint ensures that all values in a column
satisfy certain conditions.
 INDEX: Used to create and retrieve data from the database very quickly.

Constraints can be specified when a table is created with the CREATE TABLE statement, or you
can use the ALTER TABLE statement to create constraints even after the table is created.
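As a brief sketch of the two constraint levels, the EMPLOYEES table below declares NOT NULL at the column level, and a primary key and a composite UNIQUE constraint at the table level; a CHECK constraint is then added afterwards with ALTER TABLE. All table, column, and constraint names here are illustrative:

```sql
CREATE TABLE EMPLOYEES (
    EMP_ID  INT         NOT NULL,            -- column level constraint
    NAME    VARCHAR(50) NOT NULL,            -- column level constraint
    DEPT_ID INT,
    CONSTRAINT EMPLOYEES_PK PRIMARY KEY (EMP_ID),       -- table level
    CONSTRAINT EMP_NAME_DEPT_UQ UNIQUE (NAME, DEPT_ID)  -- table level
);

-- A constraint can also be added after the table exists:
ALTER TABLE EMPLOYEES ADD CONSTRAINT DEPT_CHK CHECK (DEPT_ID > 0);
```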

Dropping Constraints:
Any constraint that you have defined can be dropped using the ALTER TABLE command with the
DROP CONSTRAINT option.

For example, to drop the primary key constraint in the EMPLOYEES table, you can use the
following command:

ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;

Some implementations may provide shortcuts for dropping certain constraints. For example, to
drop the primary key constraint for a table in Oracle, you can use the following command:

ALTER TABLE EMPLOYEES DROP PRIMARY KEY;

Some implementations allow you to disable constraints. Instead of permanently dropping a
constraint from the database, you may want to temporarily disable the constraint, and then
enable it later.

Integrity Constraints:

Integrity constraints are used to ensure accuracy and consistency of data in a relational database.
Data integrity is handled in a relational database through the concept of referential integrity.

There are many types of integrity constraints that play a role in referential integrity (RI). These
constraints include Primary Key, Foreign Key, Unique Constraints and other constraints
mentioned above.

The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN
is a means for combining fields from two tables by using values common to each.

Consider the following two tables. (a) The CUSTOMERS table is as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

(b) The ORDERS table is as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+

Now let us join these two tables in our SELECT statement as follows:

SQL> SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce following result:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+

Here it is notable that the join is performed in the WHERE clause. Several operators can be used
to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT. However, the most
common operator is the equal symbol.

18. SQL JOIN TYPES

There are different types of joins available in SQL:

 INNER JOIN: returns rows when there is a match in both tables.
 LEFT JOIN: returns all rows from the left table, even if there are no matches in the
right table.
 RIGHT JOIN: returns all rows from the right table, even if there are no matches in the
left table.
 FULL JOIN: returns rows when there is a match in one of the tables.
 SELF JOIN: is used to join a table to itself, as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
 CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or
more joined tables.
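The joins listed above can also be written with the explicit JOIN ... ON syntax. As a sketch against the same CUSTOMERS and ORDERS sample tables used in this chapter, the following INNER JOIN is equivalent to the WHERE-clause join shown earlier, while the LEFT JOIN also keeps customers who have placed no orders (their AMOUNT shows as NULL):

```sql
-- INNER JOIN: only customers that have matching orders
SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS
INNER JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

-- LEFT JOIN: all customers, with NULL AMOUNT where no order exists
SELECT ID, NAME, AMOUNT
FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
```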

19. UNION

The SQL UNION clause/operator is used to combine the results of two or more SELECT
statements without returning any duplicate rows.

To use UNION, each SELECT must select the same number of columns, with the same number of
column expressions and the same data types, and have them in the same order, but the columns
do not have to be the same length.

Syntax:

The basic syntax of UNION is as follows:

SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]

UNION

SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]

Here the given condition could be any expression based on your requirement.

Example:

Consider the following two tables. (a) The CUSTOMERS table is as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

(b) The ORDERS table is as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+

Now let us join these two tables in our SELECT statement as follows:

SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
UNION
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;

This would produce following result:

+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
|    1 | Ramesh   |   NULL | NULL                |
|    2 | Khilan   |   1560 | 2009-11-20 00:00:00 |
|    3 | kaushik  |   3000 | 2009-10-08 00:00:00 |
|    3 | kaushik  |   1500 | 2009-10-08 00:00:00 |
|    4 | Chaitali |   2060 | 2008-05-20 00:00:00 |
|    5 | Hardik   |   NULL | NULL                |
|    6 | Komal    |   NULL | NULL                |
|    7 | Muffy    |   NULL | NULL                |
+------+----------+--------+---------------------+

20. NULL

The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.

A field with a NULL value is a field with no value. It is very important to understand that a
NULL value is different from a zero value or a field that contains spaces.

Syntax:

The basic syntax of NULL while creating a table:

SQL> CREATE TABLE CUSTOMERS(
       ID      INT           NOT NULL,
       NAME    VARCHAR (20)  NOT NULL,
       AGE     INT           NOT NULL,
       ADDRESS CHAR (25),
       SALARY  DECIMAL (18, 2),
       PRIMARY KEY (ID)
);

Here NOT NULL signifies that the column should always accept an explicit value of the given
data type. There are two columns where we did not use NOT NULL, which means these columns
could be NULL.

A field with a NULL value is one that has been left blank during record creation.

Example:

The NULL value can cause problems when selecting data, however, because when comparing an
unknown value to any other value, the result is always unknown and not included in the final results.

You must use the IS NULL or IS NOT NULL operators in order to check for a NULL value.

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+---------+
| ID | NAME     | AGE | ADDRESS   | SALARY  |
+----+----------+-----+-----------+---------+
|  1 | Ramesh   |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan   |  25 | Delhi     | 1500.00 |
|  3 | kaushik  |  23 | Kota      | 2000.00 |
|  4 | Chaitali |  25 | Mumbai    | 6500.00 |
|  5 | Hardik   |  27 | Bhopal    | 8500.00 |
|  6 | Komal    |  22 | MP        |         |
|  7 | Muffy    |  24 | Indore    |         |
+----+----------+-----+-----------+---------+
Now following is the usage of IS NOT NULL operator:

SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;

This would produce following result:

+----+----------+-----+-----------+---------+
| ID | NAME     | AGE | ADDRESS   | SALARY  |
+----+----------+-----+-----------+---------+
|  1 | Ramesh   |  32 | Ahmedabad | 2000.00 |
|  2 | Khilan   |  25 | Delhi     | 1500.00 |
|  3 | kaushik  |  23 | Kota      | 2000.00 |
|  4 | Chaitali |  25 | Mumbai    | 6500.00 |
|  5 | Hardik   |  27 | Bhopal    | 8500.00 |
+----+----------+-----+-----------+---------+

21. ALIAS

You can rename a table or a column temporarily by giving it another name, known as an alias.

The use of table aliases means to rename a table in a particular SQL statement. The renaming is a
temporary change and the actual table name does not change in the database.

The column aliases are used to rename a table's columns for the purpose of a particular SQL query.

Syntax:

The basic syntax of table alias is as follows:

SELECT column1, column2....
FROM table_name AS alias_name
WHERE [condition];

The basic syntax of column alias is as follows:

SELECT column_name AS alias_name
FROM table_name
WHERE [condition];

Example:

Consider the following two tables. (a) The CUSTOMERS table is as follows:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

(b) The ORDERS table is as follows:

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 |           3 |   3000 |
| 100 | 2009-10-08 00:00:00 |           3 |   1500 |
| 101 | 2009-11-20 00:00:00 |           2 |   1560 |
| 103 | 2008-05-20 00:00:00 |           4 |   2060 |
+-----+---------------------+-------------+--------+

Now following is the usage of table alias:

SQL> SELECT C.ID, C.NAME, C.AGE, O.AMOUNT
FROM CUSTOMERS AS C, ORDERS AS O
WHERE C.ID = O.CUSTOMER_ID;

This would produce following result:

+----+----------+-----+--------+
| ID | NAME     | AGE | AMOUNT |
+----+----------+-----+--------+
|  3 | kaushik  |  23 |   3000 |
|  3 | kaushik  |  23 |   1500 |
|  2 | Khilan   |  25 |   1560 |
|  4 | Chaitali |  25 |   2060 |
+----+----------+-----+--------+

Following is the usage of column alias:

SQL> SELECT ID AS CUSTOMER_ID, NAME AS
CUSTOMER_NAME FROM CUSTOMERS
WHERE SALARY IS NOT NULL;

This would produce following result:

+-------------+---------------+
| CUSTOMER_ID | CUSTOMER_NAME |
+-------------+---------------+
|           1 | Ramesh        |
|           2 | Khilan        |
|           3 | kaushik       |
|           4 | Chaitali      |
|           5 | Hardik        |
|           6 | Komal         |
|           7 | Muffy         |
+-------------+---------------+

22. ALTER TABLE

The SQL ALTER TABLE command is used to add, delete, or modify columns in an existing table.

You would also use the ALTER TABLE command to add and drop various constraints on an
existing table.

Syntax:

The basic syntax of ALTER TABLE to add a new column in an existing table is as follows:

ALTER TABLE table_name ADD column_name datatype;

The basic syntax of ALTER TABLE to DROP COLUMN in an existing table is as follows:

ALTER TABLE table_name DROP COLUMN column_name;

The basic syntax of ALTER TABLE to change the DATA TYPE of a column in a table is as
follows:

ALTER TABLE table_name MODIFY COLUMN column_name datatype;

The basic syntax of ALTER TABLE to add a NOT NULL constraint to a column in a table is as
follows:

ALTER TABLE table_name MODIFY column_name datatype NOT NULL;

The basic syntax of ALTER TABLE to ADD UNIQUE CONSTRAINT to a table is as follows:
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint UNIQUE(column1, column2...);

The basic syntax of ALTER TABLE to ADD CHECK CONSTRAINT to a table is as follows:

ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint CHECK (CONDITION);

The basic syntax of ALTER TABLE to ADD PRIMARY KEY constraint to a table is as follows:

ALTER TABLE table_name
ADD CONSTRAINT MyPrimaryKey PRIMARY KEY (column1, column2...);

The basic syntax of ALTER TABLE to DROP CONSTRAINT from a table is as follows:

ALTER TABLE table_name
DROP CONSTRAINT MyUniqueConstraint;

If you're using MySQL, the code is as follows:

ALTER TABLE table_name
DROP INDEX MyUniqueConstraint;

The basic syntax of ALTER TABLE to DROP PRIMARY KEY constraint from a table is as
follows:

ALTER TABLE table_name
DROP CONSTRAINT MyPrimaryKey;

If you're using MySQL, the code is as follows:

ALTER TABLE table_name
DROP PRIMARY KEY;

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is the example to ADD a new column in an existing table:

ALTER TABLE CUSTOMERS ADD SEX char(1);

Now CUSTOMERS table is changed and following would be output from SELECT statement:

+----+----------+-----+-----------+----------+------+
| ID | NAME     | AGE | ADDRESS   | SALARY   | SEX  |
+----+----------+-----+-----------+----------+------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 | NULL |
|  2 | Khilan   |  25 | Delhi     |  1500.00 | NULL |
|  3 | kaushik  |  23 | Kota      |  2000.00 | NULL |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 | NULL |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 | NULL |
|  6 | Komal    |  22 | MP        |  4500.00 | NULL |
|  7 | Muffy    |  24 | Indore    | 10000.00 | NULL |
+----+----------+-----+-----------+----------+------+

Following is the example to DROP the SEX column from the existing table:

ALTER TABLE CUSTOMERS DROP SEX;

Now CUSTOMERS table is changed and following would be output from SELECT statement:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

23. TRUNCATE TABLE

The SQL TRUNCATE TABLE command is used to delete complete data from an existing table.

You can also use the DROP TABLE command to delete a complete table, but it would remove
the complete table structure from the database and you would need to re-create the table once
again if you wish to store some data.

Syntax:

The basic syntax of TRUNCATE TABLE is as follows:

TRUNCATE TABLE table_name;

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example to truncate:

SQL > TRUNCATE TABLE CUSTOMERS;

Now CUSTOMERS table is truncated and following would be output from SELECT statement:

SQL> SELECT * FROM CUSTOMERS;


Empty set (0.00 sec)

The HAVING clause enables you to specify conditions that filter which group results appear in
the final results.

The WHERE clause places conditions on the selected columns, whereas the HAVING clause places
conditions on groups created by the GROUP BY clause.

Syntax:

The following is the position of the HAVING clause in a query:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

The HAVING clause must follow the GROUP BY clause in a query and must also precede the
ORDER BY clause if used. The following is the syntax of the SELECT statement, including the
HAVING clause:

SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2

Example:

Consider the CUSTOMERS table having the following records:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example which would display records for which the count of identical ages is
greater than or equal to 2:

SQL> SELECT *
FROM CUSTOMERS
GROUP BY age
HAVING COUNT(age) >= 2;

This would produce following result:

+----+--------+-----+---------+---------+
| ID | NAME   | AGE | ADDRESS | SALARY  |
+----+--------+-----+---------+---------+
|  2 | Khilan |  25 | Delhi   | 1500.00 |
+----+--------+-----+---------+---------+

24. DATE FUNCTIONS

Following is the list of all important date and time related functions available through SQL.
There are various other functions supported by your RDBMS. The given list is based on the
MySQL RDBMS.

Name Description

ADDDATE() Add dates

ADDTIME() Add time

CONVERT_TZ() Convert from one timezone to another

CURDATE() Return the current date

CURRENT_DATE(), CURRENT_DATE Synonyms for CURDATE()

CURRENT_TIME(), CURRENT_TIME Synonyms for CURTIME()

CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP Synonyms for NOW()

CURTIME() Return the current time

DATE_ADD() Add a time interval to a date

DATE_FORMAT() Format date as specified

DATE_SUB() Subtract a time interval from a date

DATE() Extract the date part of a date or datetime expression

DATEDIFF() Return the number of days between two dates

DAY() Synonym for DAYOFMONTH()

DAYNAME() Return the name of the weekday

DAYOFMONTH() Return the day of the month (1-31)

DAYOFWEEK() Return the weekday index of the argument

DAYOFYEAR() Return the day of the year (1-366)

EXTRACT Extract part of a date

FROM_DAYS() Convert a day number to a date

FROM_UNIXTIME() Format a UNIX timestamp as a date

HOUR() Extract the hour

LAST_DAY() Return the last day of the month for the argument

LOCALTIME(), LOCALTIME Synonym for NOW()

LOCALTIMESTAMP, LOCALTIMESTAMP() Synonym for NOW()

MAKEDATE() Create a date from the year and day of year

MAKETIME() Return a time value calculated from the hour, minute, and second arguments

MICROSECOND() Return the microseconds from argument

MINUTE() Return the minute from the argument

MONTH() Return the month from the date passed

MONTHNAME() Return the name of the month

NOW() Return the current date and time

PERIOD_ADD() Add a period to a year-month

PERIOD_DIFF() Return the number of months between periods

QUARTER() Return the quarter from a date argument

SEC_TO_TIME() Converts seconds to 'HH:MM:SS' format

SECOND() Return the second (0-59)


STR_TO_DATE() Convert a string to a date

SUBDATE() A synonym for DATE_SUB() when invoked with three arguments

SUBTIME() Subtract times

SYSDATE() Return the time at which the function executes

TIME_FORMAT() Format as time

TIME_TO_SEC() Return the argument converted to seconds

TIME() Extract the time portion of the expression passed

TIMEDIFF() Subtract time

TIMESTAMP() With a single argument, return the date or datetime expression; with two
arguments, return the sum of the arguments

TIMESTAMPADD() Add an interval to a datetime expression

TIMESTAMPDIFF() Return the difference of two datetime expressions, in the given unit

TO_DAYS() Return the date argument converted to days

UNIX_TIMESTAMP() Return a UNIX timestamp

UTC_DATE() Return the current UTC date

UTC_TIME() Return the current UTC time

UTC_TIMESTAMP() Return the current UTC date and time

WEEK() Return the week number

WEEKDAY() Return the weekday index

WEEKOFYEAR() Return the calendar week of the date (1-53)

YEAR() Return the year

YEARWEEK() Return the year and week

Examples of SQL Commands in Different Tables:

ITEMS table

This table stores information about all the items that are offered by the company. The structure of
the table is as follows:
Column    Datatype      Meaning
Itemno    Number(5)     A unique number assigned to each item.
ItemName  Varchar2(20)  Name of the item.
Rate      Number(8,2)   Rate of the item.
taxrate   Number(4,2)   Sales tax rate for this item.

The following are the constraints related to the ITEMS table:

 ITEMNO is primary key
 RATE and TAXRATE must be >= 0
 Default value for TAXRATE is 0

create table ITEMS (
    itemno   number(5) constraint items_pk primary key,
    itemname varchar2(20),
    rate     number(8,2) constraint items_rate_chk check( rate >= 0),
    taxrate  number(4,2) default 0 constraint items_taxrate_chk check( taxrate >= 0)
);

insert into items values(1,'Samsung 14" monitor',7000,10.5);
insert into items values(2,'TVS Gold Keyboard',1000,10);
insert into items values(3,'Segate HDD 20GB',6500,12.5);
insert into items values(4,'PIII processor',8000,8);
insert into items values(5,'Logitech Mouse',500,5);
insert into items values(6,'Creative MMK',4500,11.5);

CUSTOMERS Table

This table contains information about customers who have placed one or more orders. The
following is the structure of the table.
Column    Datatype      Meaning
Custno    Number(5)     A unique number assigned to each customer.
CustName  Varchar2(20)  Complete name of the customer.
Address1  varchar2(50)  First line of address.
Address2  varchar2(50)  Second line of address.
City      varchar2(30)  Name of the city where customer lives.
state     varchar2(30)  Name of the state where customer lives.
PinCode   varchar2(10)  Pincode of the city.
Phone     varchar2(30)  One or more phone numbers separated using comma(,).

The following are the constraints related to the CUSTOMERS table.

 CUSTNO is primary key
 CUSTNAME is not null column

create table CUSTOMERS (
    custno   number(5) constraint customers_pk primary key,
    custname varchar2(20) constraint customers_custname_nn not null,
    address1 varchar2(50),
    address2 varchar2(50),
    city     varchar2(30),
    state    varchar2(30),
    pin      varchar2(10),
    phone    varchar2(30)
);

insert into customers values(101,'Raul','12-22-29','Dwarakanagar',
    'Vizag','AP','530016','453343,634333');
insert into customers values(102,'Denilson','43-22-22','CBM Compound',
    'Vizag','AP','530012','744545');
insert into customers values(103,'Mendiator','45-45-52','Abid Nagar',
    'Vizag','AP','530016','567434');
insert into customers values(104,'Figo','33-34-56','Muralinagar',
    'Vizag','AP','530021','875655,876563,872222');
insert into customers values(105,'Zidane','23-22-56','LB Colony',
    'Vizag','AP','530013','765533');

ORDERS Table

Contains information about all orders placed by customers. Contains one row for each order.
The details of items ordered in an order will be found in LINEITEMS table. The following is
the structure of the table.
Column    Datatype      Meaning
OrdNo     Number(5)     A unique number assigned to each order.
OrdDate   Date          Date on which order is placed.
ShipDate  Date          Date on which goods are to be shipped to customer.
Custno    Number(5)     Number of the customer who placed the order.
Address1  varchar2(50)  First line of shipping address.
Address2  varchar2(50)  Second line of shipping address.
City      varchar2(30)  City name in shipping address.
state     varchar2(30)  State name in shipping address.
PinCode   varchar2(10)  Pincode of the city in shipping address.
Phone     varchar2(30)  One or more phone numbers, separated using comma(,), of the shipping place.

The following are the constraints related to the ORDERS table.

 ORDNO is primary key
 CUSTNO is foreign key referencing CUSTNO of CUSTOMERS table.
 SHIPDATE must be >= ORDDATE.

create table ORDERS (
    ordno    number(5) constraint orders_pk primary key,
    orddate  date,
    shipdate date,
    custno   number(5) constraint orders_custno_fk references customers,
    address1 varchar2(50),
    address2 varchar2(50),
    city     varchar2(30),
    state    varchar2(30),
    pin      varchar2(10),
    phone    varchar2(30),
    constraint order_dates_chk check( orddate <= shipdate)
);

insert into orders values(1001,'15-May-2001','10-jun-2001',102,
    '43-22-22','CBM Compound','Vizag','AP','530012','744545');

insert into orders values(1002,'15-May-2001','5-jun-2001',101,
    '12-22-29','Dwarakanagar','Vizag','AP','530016','453343,634333');

insert into orders values(1003,'17-May-2001','7-jun-2001',101,
    '12-22-29','Dwarakanagar','Vizag','AP','530016','453343,634333');

insert into orders values(1004,'18-May-2001','17-jun-2001',103,
    '45-45-52','Abid Nagar','Vizag','AP','530016','567434');

insert into orders values(1005,'20-May-2001','3-jun-2001',104,
    '33-34-56','Muralinagar','Vizag','AP','530021','875655,876563,872222');

insert into orders values(1006,'23-May-2001','11-jun-2001',104,
    '54-22-12','MVP Colony','Vizag','AP','530024',null);

LINEITEMS Table

Contains details of items ordered in each order. For each item in each order this table contains
one row. The following is the structure of the table.
Column   Datatype     Meaning
OrdNo    Number(5)    Refers to the order number of the order.
Itemno   Number(5)    Refers to the item number of the item.
qty      number(3)    How many units of this item are required in this order.
price    Number(8,2)  Selling price of the item for this order.
DisRate  Number(4,2)  Discount rate for this item in this order.

The following are the constraints related to the LINEITEMS table.


 Primary key is ORDNO and ITEMNO.
 ORDNO is a foreign key referencing ORDNO of ORDERS table.
 ITEMNO is a foreign key referencing ITEMNO of ITEMS table.
 Default DISRATE is 0
 QTY must be >= 1
 DISRATE must be >= 0

create table LINEITEMS (
    ordno   number(5) constraint LINEITEMS_ORDNO_FK references ORDERS,
    itemno  number(5) constraint LINEITEMS_ITEMNO_FK references ITEMS,
    qty     number(3) constraint LINEITEMS_QTY_CHK check( qty >= 1),
    price   number(8,2),
    disrate number(4,2) default 0
        constraint LINEITEMS_DISRATE_CHK check( disrate >= 0),
    constraint lineitems_pk primary key (ordno,itemno)
);

insert into lineitems values(1001,2,3,1000,10.0);
insert into lineitems values(1001,1,3,7000,15.0);
insert into lineitems values(1001,4,2,8000,10.0);
insert into lineitems values(1001,6,1,4500,10.0);

insert into lineitems values(1002,6,4,4500,20.0);
insert into lineitems values(1002,4,2,8000,15.0);
insert into lineitems values(1002,5,2,600,10.0);

insert into lineitems values(1003,5,10,500,0.0);
insert into lineitems values(1003,6,2,4750,5.0);

insert into lineitems values(1004,1,1,7000,10.0);
insert into lineitems values(1004,3,2,6500,10.0);
insert into lineitems values(1004,4,1,8000,20.0);

insert into lineitems values(1005,6,1,4600,10.0);
insert into lineitems values(1005,2,2,900,10.0);

insert into lineitems values(1006,2,10,950,20.0);
insert into lineitems values(1006,4,5,7800,10.0);
insert into lineitems values(1006,3,5,6600,15.0);

Procedure to create tables and data

You can download and run sql scripts used to create these tables and data using the following
procedure.

1. First download tables.sql.


2. Get into SQL*PLUS and run it by using START filename. Where filename is complete
path of the tables.sql file in your system. For example, if you downloaded the file into
c:\downloads then the command will be
SQL>START C:\DOWNLOADS\TABLES.SQL
This will create all four tables with required constraints.
3. Download data.sql
4. Follow the same procedure as above to run it using START command in SQL*PLUS.

Queries

DISPLAY DETAILS OF ITEMS WHERE ITEMNAME CONTAINS LETTER 'O' TWICE

SELECT * FROM ITEMS
WHERE ITEMNAME LIKE '%O%O%';

DISPLAY ITEMNO, NAME, PRICE AND SELLING PRICE (PRICE + TAX), ROUND
SELLING PRICE TO 100

SELECT ITEMNO, ITEMNAME, RATE, ROUND(RATE + RATE * TAXRATE /100) "SPRICE"
FROM ITEMS;

DISPLAY DETAILS OF ITEMS BY PADDING ITEMNAME TO 20 CHARACTERS WITH
'.' AND IN UPPERCASE

SELECT ITEMNO, UPPER(RPAD(ITEMNAME,20,'.')) ITEMNAME, RATE, TAXRATE
FROM ITEMS;

DISPLAY CUSTNO,NAME AND ADDRESS

COLUMN ADDRESS FORMAT A40
COLUMN PHONE FORMAT A15
SELECT CUSTNO, CUSTNAME, TRIM(ADDRESS1 || ',' || ADDRESS2 || ',' ||
CITY || ',' || STATE || ',' || PIN) ADDRESS , PHONE
FROM CUSTOMERS;

DISPLAY ORDERDATE AND APPROXIMATE SHIPDATE, WHICH WILL BE THE COMING
MONDAY AFTER 7 DAYS FROM ORDERDATE

SELECT ORDNO, ORDDATE, NEXT_DAY(ORDDATE+7,'MON') SHIPDATE
FROM ORDERS;

DISPLAY ALL THE ORDERS THAT ARE PLACED IN THE CURRENT MONTH

SELECT * FROM ORDERS
WHERE TO_CHAR(ORDDATE,'MMYY') = TO_CHAR(SYSDATE,'MMYY');

DISPLAY THE ORDERS THAT WERE PLACED IN THE LAST WEEK OF PREVIOUS MONTH

SELECT * FROM ORDERS
WHERE ORDDATE BETWEEN LAST_DAY( ADD_MONTHS(SYSDATE,-1)) - 7
AND LAST_DAY( ADD_MONTHS(SYSDATE,-1));

DISPLAY ORDERNO, ORDERDATE IN DD-MM HH24:MI FORMAT AND SHIPDATE; IF NOT
AVAILABLE, TAKE IT AS 15 DAYS FROM THE DAY OF ORDER

SELECT ORDNO, TO_CHAR(ORDDATE,'DD-MM HH24:MI') ORDDATE,
NVL(SHIPDATE,ORDDATE + 15) SHIPDATE
FROM ORDERS;

DISPLAY TOTAL NO. OF ORDERS

SELECT COUNT(*) "TOTAL NO. ORDERS"
FROM ORDERS;

DISPLAY ORDERNO, NO. OF ITEMS IN AN ORDER AND AVG RATE OF ORDERS

SELECT ORDNO, COUNT(*) "NO ITEMS", ROUND(AVG(PRICE),2) "AVERAGE RATE"
FROM LINEITEMS
GROUP BY ORDNO;

DISPLAY ORDERNO FOR ORDERS WHERE ATLEAST ONE PRODUCT IS HAVING RATE
MORE THAN 5000 AND TOTAL NO.OF UNITS IS MORE THAN 10

SELECT ORDNO
FROM LINEITEMS
GROUP BY ORDNO
HAVING MAX(PRICE) > 5000 AND SUM(QTY) > 10;

DISPLAY MONTH NAME AND NO.OF ORDERS RECEIVED IN THE MONTH

SELECT TO_CHAR(ORDDATE,'MONTH') MONTH, COUNT(*) "NO. ORDERS"
FROM ORDERS
GROUP BY TO_CHAR(ORDDATE,'MONTH');

DISPLAY CUSTNO WHO HAVE PLACED MORE THAN 2 ORDERS IN THE LAST 3 MONTHS
SELECT CUSTNO
FROM ORDERS
WHERE ORDDATE > ADD_MONTHS(SYSDATE,-3)
GROUP BY CUSTNO
HAVING COUNT(*) > 2;

DISPLAY CUSTNO,NO.OF ORDERS ,DATE OF MOST RECENT ORDER

SELECT CUSTNO, COUNT(*) "NO. ORDERS", MAX(ORDDATE) "RECENT ORDER ON"
FROM ORDERS
GROUP BY CUSTNO;

DISPLAY CUSTNO,DATE ON WHICH FIRST ORDER WAS PLACED AND THE GAP
BETWEEN FIRST ORDER AND LAST ORDER IN DAYS

SELECT CUSTNO, MIN(ORDDATE) "FIRST ORDER",
MAX(ORDDATE) - MIN(ORDDATE) "GAP IN DAYS"
FROM ORDERS
GROUP BY CUSTNO;

DISPLAY ORDERNO,MAX PRICE IN THE ORDER FOR THE ORDERS WHERE THE
AMOUNT OF ITEMS IS MORE THAN 10000

SELECT ORDNO, MAX(PRICE) "MAX PRICE"
FROM LINEITEMS
GROUP BY ORDNO
HAVING SUM(PRICE * QTY) > 10000;

DISPLAY ITEMNO,TOTAL NO.OF UNITS SOLD,MAXPRICE,MINPRICE

SELECT ITEMNO, SUM(QTY) "TOTAL NO. UNITS", MAX(PRICE), MIN(PRICE)
FROM LINEITEMS
GROUP BY ITEMNO;

DISPLAY CUSTNO,DATE,NO.OF ORDERS PLACED

SELECT CUSTNO, ORDDATE, COUNT(*) "NO. ORDERS"
FROM ORDERS
GROUP BY CUSTNO, ORDDATE;

DISPLAY ORDERNO, CUSTNAME, ORDERDATE AND NO. OF DAYS BETWEEN SHIPDATE AND
ORDERDATE FOR ORDERS THAT HAVE BEEN SHIPPED

SELECT ORDNO, CUSTNAME, ORDDATE, SHIPDATE - ORDDATE "DAYS"

FROM ORDERS O, CUSTOMERS C
WHERE SHIPDATE IS NOT NULL AND O.CUSTNO = C.CUSTNO;

DISPLAY ORDERNO, ORDERDATE, CUSTNO AND NAME FOR ALL THE ORDERS WHERE THE
ORDER CONTAINS ITEMNO 5.

SELECT O.ORDNO, ORDDATE, O.CUSTNO, CUSTNAME
FROM ORDERS O, CUSTOMERS C, LINEITEMS L
WHERE ITEMNO = 5 AND L.ORDNO = O.ORDNO AND O.CUSTNO = C.CUSTNO;

The above query can also be written as follows.

SELECT ORDNO, ORDDATE, O.CUSTNO, CUSTNAME

FROM ORDERS O, CUSTOMERS C

WHERE O.CUSTNO = C.CUSTNO

AND ORDNO IN

( SELECT ORDNO FROM LINEITEMS WHERE ITEMNO = 5);

DISPLAY ITEMNO,NAME,ORDERNO,CUSTNAME AND AMOUNT.

SELECT I.ITEMNO, ITEMNAME, O.ORDNO, CUSTNAME, PRICE * QTY "AMOUNT"
FROM CUSTOMERS C, ORDERS O, LINEITEMS L, ITEMS I
WHERE O.CUSTNO = C.CUSTNO AND O.ORDNO = L.ORDNO AND I.ITEMNO = L.ITEMNO;

DISPLAY DETAILS OF ORDERS IN WHICH ORDERDATE IS A MONDAY AND THE
CUSTOMER RESIDES IN VIZAG

SELECT * FROM ORDERS
WHERE TO_CHAR(ORDDATE,'fmDAY') = 'MONDAY'
AND CUSTNO IN (SELECT CUSTNO FROM CUSTOMERS WHERE UPPER(CITY) = 'VIZAG');

DISPLAY DETAILS OF CUSTOMERS WHO PLACED ANY ORDERS WORTH MORE THAN
30000
SELECT * FROM CUSTOMERS
WHERE CUSTNO IN
( SELECT CUSTNO
  FROM ORDERS
  WHERE ORDNO IN
  ( SELECT ORDNO
    FROM LINEITEMS
    GROUP BY ORDNO
    HAVING SUM(QTY*PRICE) > 30000)
);

DISPLAY DETAILS OF ITEMS FOR WHICH THERE IS AN ORDER IN THE CURRENT
MONTH

SELECT * FROM ITEMS
WHERE ITEMNO IN
( SELECT ITEMNO
  FROM LINEITEMS
  WHERE ORDNO IN
  ( SELECT ORDNO
    FROM ORDERS
    WHERE TO_CHAR(ORDDATE,'MMYY') = TO_CHAR(SYSDATE,'MMYY')
  )
);

DISPLAY DETAILS OF ORDER IN WHICH WE SOLD ITEM 3 FOR MAX PRICE

SELECT * FROM ORDERS
WHERE ORDNO IN
( SELECT ORDNO
  FROM LINEITEMS
  WHERE PRICE =
  ( SELECT MAX(PRICE) FROM LINEITEMS
    WHERE ITEMNO = 3)
  AND ITEMNO = 3
);

DISPLAY DETAILS OF ITEMS FOR WHICH THERE IS AN ORDER IN THE LAST 7
DAYS OR TOTAL NO.OF UNITS ORDERED IS MORE THAN 10.

SELECT * FROM ITEMS
WHERE ITEMNO IN
( SELECT ITEMNO
  FROM LINEITEMS
  WHERE ORDNO IN
  ( SELECT ORDNO FROM ORDERS WHERE SYSDATE-ORDDATE <= 7)
)
OR ITEMNO IN
( SELECT ITEMNO
  FROM LINEITEMS
  GROUP BY ITEMNO
  HAVING SUM(QTY) > 10
);

DISPLAY ALL THE LINEITEMS IN WHICH THE RATE OF THE ITEM IS MORE THAN AVG
RATE OF THE ITEMS

SELECT * FROM LINEITEMS L
WHERE PRICE >
( SELECT AVG(PRICE)
  FROM LINEITEMS
  WHERE ITEMNO = L.ITEMNO);

DISPLAY DETAILS OF CUSTOMER WHO HAS PLACED MAX NO. OF ORDERS

SELECT * FROM CUSTOMERS
WHERE CUSTNO IN
( SELECT CUSTNO
  FROM ORDERS
  GROUP BY CUSTNO
  HAVING COUNT(*) =
  ( SELECT MAX(COUNT(*))
    FROM ORDERS
    GROUP BY CUSTNO
  )
);

DISPLAY DETAILS OF ORDERS IN WHICH ATLEAST ONE ITEM IS SOLD FOR A
HIGHER RATE THAN THE ACTUAL RATE

SELECT * FROM ORDERS
WHERE ORDNO IN
( SELECT ORDNO
  FROM LINEITEMS L, ITEMS I
  WHERE L.ITEMNO = I.ITEMNO
  AND PRICE > RATE );

DETAILS OF CUSTOMERS WHO HAVE NOT PLACED ANY ORDER FOR THE LAST 15
DAYS

SELECT * FROM CUSTOMERS
WHERE CUSTNO NOT IN
( SELECT CUSTNO
  FROM ORDERS
  WHERE SYSDATE - ORDDATE <= 15);

DISPLAY DETAILS OF ITEMS FOR WHICH THERE WAS NO ORDER IN THE
PREVIOUS MONTH

SELECT * FROM ITEMS
WHERE ITEMNO NOT IN
( SELECT ITEMNO
  FROM LINEITEMS
  WHERE ORDNO IN
  ( SELECT ORDNO
    FROM ORDERS
    WHERE TO_CHAR(ORDDATE,'MMYY') = TO_CHAR( ADD_MONTHS(SYSDATE,-1),'MMYY')
  )
);

DISPLAY ORDERS WHERE ORDDATE IS IN THE CURRENT MONTH OR AFTER
ORDER 1004.

SELECT * FROM ORDERS
WHERE TO_CHAR(ORDDATE,'MMYY') = TO_CHAR(SYSDATE,'MMYY')
OR ORDDATE > ( SELECT ORDDATE FROM ORDERS WHERE ORDNO = 1004);

DISPLAY DETAILS OF ITEMS THAT ARE PURCHASED BY CUSTOMER 102

SELECT * FROM ITEMS
WHERE ITEMNO IN
( SELECT ITEMNO
  FROM LINEITEMS
  WHERE ORDNO IN
  ( SELECT ORDNO
    FROM ORDERS
    WHERE CUSTNO = 102
  )
);


CHANGE SHIPDATE OF ORDER 1004 TO THE ORDER DATE OF MOST RECENT ORDER

UPDATE ORDERS
SET SHIPDATE = ( SELECT MAX(ORDDATE)
FROM ORDERS)
WHERE ORDNO = 1004;

DISPLAY THE DETAILS OF ITEMS WHERE ITEMNAME CONTAINS LETTER O OR M

SELECT * FROM ITEMS
WHERE ITEMNAME LIKE '%O%' OR ITEMNAME LIKE '%M%';

DISPLAY DETAILS OF ORDERS THAT WERE PLACED IN THE MONTH OF JUNE 2000.

SELECT * FROM ORDERS
WHERE ORDDATE BETWEEN '01-JUN-2000' AND '30-JUN-2000';

DISPLAY ORDERNO,ORDERDATE AND APPROXIMATE SHIPDATE(15 DAYS FROM
ORDDATE) FOR ALL ORDERS THAT ARE NOT SHIPPED.

SELECT ORDNO, ORDDATE, ORDDATE + 15 "SHIPDATE"
FROM ORDERS
WHERE SHIPDATE IS NULL;

DISPLAY ITEMNO, ORDERNO AND TOTAL AMOUNT AFTER ROUNDING THE VALUE
TO 100'S FOR ALL THE ITEMS WHERE THE QUANTITY IS MORE THAN 5 UNITS OR
PRICE IS LESS THAN 5000.

SELECT ITEMNO, ORDNO, ROUND(QTY*PRICE,-2) "TOTAL"
FROM LINEITEMS
WHERE QTY > 5 OR PRICE < 5000;

DISPLAY ITEMNO,ITEMNAME,PRICE AND TAX FOR ITEMS THAT ARE TAXABLE.

SELECT ITEMNO, ITEMNAME, RATE, RATE * TAXRATE /100 "TAX"
FROM ITEMS
WHERE TAXRATE IS NOT NULL;

DISPLAY ORDERNO, CUSTOMERNO, ORDERDATE, NO. OF DAYS BETWEEN
ORDERDATE AND SYSTEM DATE AND THE DATE ON WHICH THE AMOUNT SHOULD BE
COLLECTED, WHICH IS 5TH OF NEXT MONTH OF THE MONTH IN WHICH ITEMS ARE
DELIVERED.

SELECT ORDNO, CUSTNO, ORDDATE, SYSDATE - ORDDATE "NODAYS" ,
LAST_DAY(SHIPDATE) + 5 "COLLDATE"
FROM ORDERS
WHERE SHIPDATE IS NOT NULL;

DISPLAY THE DETAILS OF ORDERS THAT WERE PLACED IN THE LAST 20 DAYS AND
DELIVERED.

SELECT * FROM ORDERS
WHERE SYSDATE - ORDDATE <= 20 AND SHIPDATE IS NOT NULL;

CHANGE THE RATE OF ITEMS IN ORDER 1003 SO THAT 10% DISCOUNT IS GIVEN
TO ALL ITEMS.

UPDATE LINEITEMS SET PRICE = PRICE * 0.90
WHERE ORDNO = 1003;

DISPLAY THE ITEMS WHERE ITEMNAME CONTAINS MORE THAN 10 CHARACTERS.

SELECT * FROM ITEMS
WHERE LENGTH(ITEMNAME) > 10;

DISPLAY ITEMS WHERE ITEMNAME CONTAINS LETTER ‘O’ AFTER 5TH POSITION.

SELECT * FROM ITEMS
WHERE INSTR(ITEMNAME,'O') > 5;

DISPLAY FIRST NAME OF THE CUSTOMER.

SELECT SUBSTR(CUSTNAME,1, INSTR(CUSTNAME,' ') -1 ) "FIRST NAME"
FROM CUSTOMERS;

DISPLAY ITEMNO AND ITEMNAME IN UPPER CASE FOR ALL ITEMS WHERE THE
LETTER 'M' EXISTS IN ANY CASE.

SELECT ITEMNO, UPPER(ITEMNAME)
FROM ITEMS
WHERE UPPER(ITEMNAME) LIKE '%M%';

DISPLAY THE ORDERS THAT ARE PLACED IN THE CURRENT MONTH.

SELECT * FROM ORDERS
WHERE TO_CHAR(ORDDATE,'YYMM') = TO_CHAR(SYSDATE,'YYMM');

INSERT A NEW ORDER WITH THE FOLLOWING: ORDERNO-1010, CUSTOMERNO-105,
ORDERDATE-13-JULY-2001 AT 4:45 PM, SHIPDATE-NULL, SHIPADDRESS-NULL.

INSERT INTO ORDERS VALUES(1010, TO_DATE('13-07-2001 16:45','DD-MM-YYYY HH24:MI'),
NULL, 105, NULL, NULL, NULL, NULL, NULL, NULL);

DISPLAY ORDERNO, CUSTOMERNO AND THE NO. OF DAYS BETWEEN SHIPDATE AND
ORDERDATE. IF SHIPDATE IS NOT AVAILABLE, TAKE IT AS SYSTEM DATE.

SELECT ORDNO, CUSTNO, NVL(SHIPDATE,SYSDATE)-ORDDATE
FROM ORDERS;

DISPLAY ITEMNO, PRICE, QUANTITY AND DISCOUNT RATE FOR ITEMS WHERE THE
DISCOUNT RATE IS NON-ZERO. DISCOUNT RATE IS CALCULATED AS 10% FOR
ITEM 1, 7% FOR ITEM 6 AND 8% FOR THE REMAINING.

SELECT ITEMNO, PRICE, QTY, DECODE(ITEMNO,1,10,6,7,8) "DISRATE"
FROM LINEITEMS
WHERE DISRATE <> 0;

DISPLAY TOTAL AMOUNT OF ORDERS WE RECEIVED SO FAR.

SELECT SUM(QTY*PRICE)
FROM LINEITEMS;

DISPLAY CUSTOMERNO,MONTH-NAME,NO. OF ORDERS OF THE CURRENT YEAR.

SELECT CUSTNO, TO_CHAR(ORDDATE,'MONTH'), COUNT(*)
FROM ORDERS
WHERE TO_CHAR(ORDDATE,'YYYY') = TO_CHAR(SYSDATE,'YYYY')
GROUP BY CUSTNO, TO_CHAR(ORDDATE,'MONTH');

DISPLAY DIFFERENCE BETWEEN HIGHEST PRICE AND LOWEST PRICE AT WHICH
EACH ITEM WAS SOLD.

SELECT ITEMNO, MAX(PRICE) - MIN(PRICE)
FROM LINEITEMS
GROUP BY ITEMNO;

DISPLAY HOW MANY ORDERS ARE STILL PENDING.

SELECT COUNT(*)
FROM ORDERS
WHERE SHIPDATE IS NULL;

DISPLAY ORDERNO AND AVERAGE PRICE, TAKING INTO ACCOUNT ORDERS THAT WERE
PLACED IN THE LAST 15 DAYS.

SELECT O.ORDNO, AVG(PRICE)
FROM ORDERS O, LINEITEMS L
WHERE O.ORDNO = L.ORDNO AND SYSDATE - ORDDATE <= 15
GROUP BY O.ORDNO;

DISPLAY YEAR AND NO. OF ORDERS IN WHICH THE DIFFERENCE BETWEEN SHIPDATE
AND ORDERDATE IS LESS THAN 10 DAYS.

SELECT TO_CHAR(ORDDATE,'YYYY'), COUNT(*)
FROM ORDERS
WHERE SHIPDATE - ORDDATE <= 10
GROUP BY TO_CHAR(ORDDATE,'YYYY');

DISPLAY STATE AND NO. OF CUSTOMERS IN THE STATE WHERE THE CUSTOMER NAME
CONTAINS THE WORD 'NIKE'.

SELECT STATE, COUNT(*)
FROM CUSTOMERS
WHERE CUSTNAME LIKE '%NIKE%'
GROUP BY STATE;

DISPLAY CUSTOMER WHO HAS PLACED MORE THAN 2 ORDERS IN A SINGLE MONTH.

SELECT CUSTNO
FROM ORDERS
GROUP BY CUSTNO, TO_CHAR(ORDDATE,'MMYY')
HAVING COUNT(*) > 2;

DISPLAY HIGHEST NO.OF ORDERS PLACED BY A SINGLE CUSTOMER.

SELECT MAX( COUNT(*) )
FROM ORDERS
GROUP BY CUSTNO;

DISPLAY CUSTOMERNO, NO. OF COMPLETED ORDERS AND NO. OF INCOMPLETE
ORDERS.

SELECT CUSTNO, SUM( DECODE(SHIPDATE,NULL,1,0) ) "INCOMP ORDERS",
SUM( DECODE(SHIPDATE,NULL,0,1) ) "COMP ORDERS"
FROM ORDERS
GROUP BY CUSTNO;

DISPLAY ORDERNO, ITEMNO, ITEMNAME, PRICE AT WHICH ITEM IS SOLD AND
CURRENT PRICE OF THE ITEM.

SELECT ORDNO, L.ITEMNO, ITEMNAME, PRICE, RATE
FROM LINEITEMS L , ITEMS I
WHERE L.ITEMNO = I.ITEMNO;

DISPLAY ORDERNO,ITEMNO,AMOUNT FOR ITEMS WHERE THE PRICE OF THE
ITEM IS MORE THAN THE CURRENT PRICE OF THE ITEM.

SELECT ORDNO, L.ITEMNO, QTY * PRICE
FROM LINEITEMS L, ITEMS I
WHERE PRICE > RATE
AND L.ITEMNO = I.ITEMNO;

DISPLAY ITEMNO, ITEMNAME, ORDERNO AND THE DIFFERENCE BETWEEN CURRENT PRICE
AND SELLING PRICE FOR THE ITEMS WHERE THERE IS A DIFFERENCE BETWEEN
CURRENT PRICE AND SELLING PRICE.

SELECT L.ITEMNO, ITEMNAME, ORDNO, RATE - PRICE
FROM ITEMS I, LINEITEMS L
WHERE I.ITEMNO = L.ITEMNO AND RATE <> PRICE;

DISPLAY CUSTOMERNO, CUSTOMER NAME, ORDERNO AND ORDERDATE FOR ORDERS
WHERE THE SHIPADDRESS AND CUSTOMER ADDRESS ARE SAME.

SELECT O.CUSTNO, CUSTNAME, ORDNO, ORDDATE
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTNO = C.CUSTNO
AND O.ADDRESS1 = C.ADDRESS1 AND O.ADDRESS2 = C.ADDRESS2 AND C.CITY = O.CITY
AND C.STATE = O.STATE AND C.PIN = O.PIN;

DISPLAY ITEMNO, ITEMNAME, ORDERNO AND QUANTITY REQUIRED FOR ALL ITEMS
(INCLUDING THOSE THAT ARE NOT EVEN ORDERED).

SELECT I.ITEMNO, ITEMNAME, ORDNO, QTY
FROM LINEITEMS L , ITEMS I
WHERE I.ITEMNO = L.ITEMNO(+);

DISPLAY NO.OF ORDERS PLACED BY CUSTOMERS RESIDING IN VIZAG.

SELECT O.CUSTNO, COUNT(*)
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTNO = C.CUSTNO AND UPPER(C.CITY) = 'VIZAG'
GROUP BY O.CUSTNO;
DISPLAY ORDERNO,CUSTOMER NAME,DIFFERENCE BETWEEN SYSTEM DATE
AND ORDERDATE FOR ORDERS THAT HAVE NOT BEEN SHIPPED AND OLDER
THAN 10 DAYS.

SELECT ORDNO, CUSTNAME, SYSDATE - ORDDATE
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTNO = C.CUSTNO AND SYSDATE - ORDDATE > 10 AND SHIPDATE IS NULL;

DISPLAY CUSTOMER NAME AND TOTAL AMOUNT OF ITEMS PURCHASED BY THE
CUSTOMER.

SELECT CUSTNAME, SUM(QTY * PRICE)
FROM LINEITEMS L, ORDERS O, CUSTOMERS C
WHERE L.ORDNO = O.ORDNO AND O.CUSTNO = C.CUSTNO
GROUP BY CUSTNAME;

DISPLAY THE DETAILS OF ITEM THAT HAS THE HIGHEST PRICE.

SELECT * FROM ITEMS
WHERE RATE = ( SELECT MAX(RATE) FROM ITEMS);

DISPLAY DETAILS OF CUSTOMERS WHO PLACED MORE THAN 5 ORDERS.

SELECT * FROM CUSTOMERS
WHERE CUSTNO IN ( SELECT CUSTNO FROM ORDERS GROUP BY CUSTNO HAVING
COUNT(*) > 5);

DISPLAY DETAILS OF CUSTOMERS WHO HAVE NOT PLACED ANY ORDER.

SELECT * FROM CUSTOMERS
WHERE CUSTNO NOT IN ( SELECT CUSTNO FROM ORDERS);

DISPLAY DETAILS OF CUSTOMERS WHO HAVE PLACED AN ORDER IN THE LAST 6
MONTHS.

SELECT * FROM CUSTOMERS
WHERE CUSTNO IN ( SELECT CUSTNO FROM ORDERS WHERE
MONTHS_BETWEEN(SYSDATE,ORDDATE) <= 6);

DISPLAY THE ITEMS FOR WHICH WE HAVE SOLD MORE THAN 50 UNITS, TAKING
INTO ACCOUNT ORDERS WHERE THE PRICE IS MORE THAN 5000.

SELECT * FROM ITEMS
WHERE ITEMNO IN ( SELECT ITEMNO FROM LINEITEMS WHERE PRICE > 5000
GROUP BY ITEMNO
HAVING SUM(QTY) > 50);

DISPLAY THE DETAILS OF ORDERS THAT WERE PLACED BY A CUSTOMER WITH
PHONE NUMBER STARTING WITH 541, OR THE ORDERS IN WHICH WE HAVE MORE
THAN 5 ITEMS.

SELECT * FROM ORDERS
WHERE CUSTNO IN (SELECT CUSTNO FROM CUSTOMERS WHERE PHONE LIKE '541%')
OR ORDNO IN (SELECT ORDNO FROM LINEITEMS GROUP BY ORDNO
HAVING COUNT(*) > 5);

CHANGE THE RATE OF ITEMNO 1 IN ITEMS TABLE TO THE HIGHEST RATE IN THE
LINEITEMS TABLE FOR THAT ITEM.

UPDATE ITEMS SET RATE = ( SELECT MAX(PRICE) FROM LINEITEMS
WHERE ITEMNO = 1)
WHERE ITEMNO = 1;

DELETE CUSTOMERS WHO HAVE NOT PLACED ANY ORDER.

DELETE FROM CUSTOMERS WHERE CUSTNO NOT IN ( SELECT CUSTNO FROM ORDERS);

RENAME COLUMN RATE IN ITEMS TO PRICE

STEP1: CREATE TABLE NEWITEMS AS SELECT ITEMNO, ITEMNAME, RATE PRICE, TAXRATE
FROM ITEMS;

STEP2: DROP TABLE ITEMS;

STEP3: RENAME NEWITEMS TO ITEMS;

DISPLAY DETAILS OF CUSTOMERS WHO HAVE PLACED MAXIMUM NUMBER OF
ORDERS.

SELECT * FROM CUSTOMERS
WHERE CUSTNO IN ( SELECT CUSTNO FROM ORDERS
GROUP BY CUSTNO
HAVING COUNT(*) = ( SELECT MAX(COUNT(*))
                    FROM ORDERS
                    GROUP BY CUSTNO));

DISPLAY DETAILS OF CUSTOMERS WHO HAVEN'T PLACED ANY ORDER IN THE
CURRENT MONTH.

SELECT * FROM CUSTOMERS
WHERE CUSTNO NOT IN ( SELECT CUSTNO FROM ORDERS WHERE
TO_CHAR(ORDDATE,'MMYY') = TO_CHAR(SYSDATE,'MMYY'));

DISPLAY DETAILS OF ITEMS FOR WHICH THERE WAS NO ORDER IN THE CURRENT
MONTH BUT THERE WAS AN ORDER IN THE PREVIOUS MONTH.

SELECT * FROM ITEMS
WHERE ITEMNO IN ( SELECT ITEMNO FROM LINEITEMS L, ORDERS O
WHERE L.ORDNO = O.ORDNO AND
TO_CHAR( ADD_MONTHS(SYSDATE,-1),'MMYY') = TO_CHAR(ORDDATE,'MMYY'))
AND ITEMNO NOT IN ( SELECT ITEMNO FROM LINEITEMS L, ORDERS O
WHERE L.ORDNO = O.ORDNO AND
TO_CHAR(SYSDATE,'MMYY') = TO_CHAR(ORDDATE,'MMYY'));

DISPLAY DETAILS OF ITEMS THAT WERE PURCHASED BY CUSTOMERS WHO HAVE
PLACED MORE THAN 3 ORDERS.

SELECT * FROM ITEMS
WHERE ITEMNO IN ( SELECT ITEMNO FROM LINEITEMS
  WHERE ORDNO IN ( SELECT ORDNO FROM ORDERS
    WHERE CUSTNO IN ( SELECT CUSTNO
      FROM ORDERS
      GROUP BY CUSTNO
      HAVING COUNT(*) > 3
    )
  )
);

DISPLAY THE ORDERS IN WHICH THE GAP BETWEEN SHIPDATE AND ORDERDATE IS
MORE THAN THE AVERAGE GAP FOR INDIVIDUAL CUSTOMERS.

SELECT * FROM ORDERS O
WHERE SHIPDATE - ORDDATE >
( SELECT AVG(SHIPDATE - ORDDATE)
  FROM ORDERS
  WHERE CUSTNO = O.CUSTNO);

DISPLAY THE DETAILS OF ITEMS IN WHICH THE CURRENT PRICE IS MORE THAN THE
MAXIMUM PRICE AT WHICH WE SOLD IT.

SELECT * FROM ITEMS I
WHERE RATE >
( SELECT MAX(PRICE)
  FROM LINEITEMS
  WHERE ITEMNO = I.ITEMNO);

CREATE A NEW TABLE 'COMPORDERS' WITH ORDNO, CUSTOMER
NAME, ORDERDATE, SHIPDATE AND DIFFERENCE BETWEEN SHIPDATE AND
ORDERDATE.

CREATE TABLE COMPORDERS AS
SELECT ORDNO, CUSTNAME, ORDDATE, SHIPDATE, SHIPDATE-ORDDATE "NODAYS"
FROM ORDERS O, CUSTOMERS C
WHERE O.CUSTNO = C.CUSTNO AND SHIPDATE IS NOT NULL;

DISPLAY THE ITEMS THAT HAVE TOP 3 HIGHEST PRICES.

SELECT * FROM ITEMS I
WHERE 2 >= ( SELECT COUNT(*) FROM ITEMS WHERE RATE > I.RATE)
ORDER BY RATE DESC;

DISPLAY DETAILS OF ITEM THAT HAS SECOND LOWEST PRICE.

SELECT * FROM ITEMS I
WHERE 1 = ( SELECT COUNT(*) FROM ITEMS WHERE RATE < I.RATE);
ADD A NEW ITEM TO THE LAST ORDER PLACED BY CUSTOMER 106 WITH THE
FOLLOWING DETAILS - ITEMNO-3, QUANTITY-2, PRICE AS THE CURRENT RATE OF
THE ITEM, DISCOUNT-8%.

DECLARE
V_ORDNO ORDERS.ORDNO%TYPE;
V_RATE ITEMS.RATE%TYPE;

BEGIN
SELECT MAX(ORDNO) INTO V_ORDNO
FROM ORDERS WHERE CUSTNO = 106;

SELECT RATE INTO V_RATE
FROM ITEMS WHERE ITEMNO = 3;

INSERT INTO LINEITEMS VALUES (V_ORDNO, 3, 2, V_RATE, 8);
END;

CHANGE RATE OF ITEM 5 TO EITHER AVERAGE RATE OF ITEM 5 OR CURRENT RATE,
WHICHEVER IS HIGHER.

DECLARE
V_APRICE LINEITEMS.PRICE%TYPE;
V_RATE ITEMS.RATE%TYPE;

BEGIN
SELECT AVG(PRICE) INTO V_APRICE
FROM LINEITEMS WHERE ITEMNO = 5;

SELECT RATE INTO V_RATE
FROM ITEMS WHERE ITEMNO = 5;

UPDATE ITEMS SET RATE = GREATEST( V_APRICE, V_RATE)
WHERE ITEMNO = 5;
END;

INSERT A NEW ROW INTO LINEITEMS WITH THE FOLLOWING DETAILS.
ORDERNO IS THE LAST ORDER PLACED BY CUSTOMERNO 102, ITEMNO IS THE
ITEM OF P3 PROCESSOR, RATE IS LOWEST RATE OF THAT ITEM, QUANTITY IS
2, DISCOUNT IS 10% IF ITEM'S CURRENT RATE IS MORE THAN THE LEAST RATE,
OTHERWISE NO DISCOUNT.

DECLARE
V_ORDNO ORDERS.ORDNO%TYPE;
V_PRICE LINEITEMS.PRICE%TYPE;
V_DIS NUMBER(2);
V_RATE ITEMS.RATE%TYPE;
V_ITEMNO ITEMS.ITEMNO%TYPE;

BEGIN
SELECT MAX(ORDNO) INTO V_ORDNO
FROM ORDERS WHERE CUSTNO = 102;

SELECT ITEMNO, RATE INTO V_ITEMNO, V_RATE
FROM ITEMS WHERE UPPER(ITEMNAME) = 'PIII PROCESSOR';

-- GET LOWEST RATE OF THE ITEM
SELECT MIN(PRICE) INTO V_PRICE
FROM LINEITEMS
WHERE ITEMNO = V_ITEMNO;

IF V_RATE > V_PRICE THEN
V_DIS := 10;
ELSE
V_DIS := 0;
END IF;

INSERT INTO LINEITEMS VALUES ( V_ORDNO, V_ITEMNO, 2, V_PRICE, V_DIS);
END;

DISPLAY THE HIGHEST OF THE MISSING ORDERNOS.

DECLARE
V_MAXORDNO ORDERS.ORDNO%TYPE;
V_MINORDNO ORDERS.ORDNO%TYPE;
V_CNT NUMBER(2);

BEGIN
SELECT MAX(ORDNO), MIN(ORDNO) INTO V_MAXORDNO, V_MINORDNO
FROM ORDERS;

FOR I IN REVERSE V_MINORDNO..V_MAXORDNO
LOOP
SELECT COUNT(*) INTO V_CNT
FROM ORDERS WHERE ORDNO = I;

IF V_CNT = 0 THEN
DBMS_OUTPUT.PUT_LINE(I);
EXIT;
END IF;
END LOOP;
END;

DISPLAY CUSTOMER NAMES OF THE CUSTOMERS WHO HAVE PLACED MORE THAN
3 ORDERS WHERE THE TOTAL AMOUNT OF THE ORDER IS MORE THAN 10,000.

SELECT CUSTNAME
FROM CUSTOMERS
WHERE CUSTNO IN
( SELECT CUSTNO
  FROM ORDERS
  WHERE ORDNO IN ( SELECT ORDNO
    FROM LINEITEMS
    GROUP BY ORDNO
    HAVING SUM(QTY*PRICE) > 10000)
  GROUP BY CUSTNO
  HAVING COUNT(*) > 3 );

CHANGE THE RATE OF EACH ITEM AS FOLLOWS: (1) INCREASE THE RATE BY 10%
IF THE ITEM WAS SOLD IN MORE THAN 5 ORDERS. (2) OTHERWISE, INCREASE THE
RATE BY 2% IF AVERAGE PRICE IS GREATER THAN CURRENT RATE, OR ELSE
DECREASE THE RATE BY 3%.

DECLARE
CURSOR CITEMS IS
SELECT ITEMNO, COUNT(*) CNT, AVG(PRICE) APRICE FROM LINEITEMS
GROUP BY ITEMNO;

V_PER NUMBER(5,2);
V_RATE ITEMS.RATE%TYPE;

BEGIN
FOR REC IN CITEMS
LOOP
IF REC.CNT > 5 THEN
V_PER := 1.10;
ELSE
-- GET CURRENT RATE
SELECT RATE INTO V_RATE
FROM ITEMS WHERE ITEMNO = REC.ITEMNO;

IF REC.APRICE > V_RATE THEN
V_PER := 1.02;
ELSE
V_PER := 0.97;
END IF;
END IF;

UPDATE ITEMS SET RATE = RATE * V_PER
WHERE ITEMNO = REC.ITEMNO;

END LOOP;
END;

CREATE A NEW TABLE CALLED CUSTSUM AND STORE THE FOLLOWING DATA
INTO THE TABLE - CUSTOMERNO,CUSTOMER NAME,NO.OF ORDERS PLACED,
DATE OF MOST RECENT ORDER AND TOTAL AMOUNT OF ALL THE ORDERS.

BEFORE THIS PROGRAM IS RUN, YOU HAVE TO CREATE TABLE AS FOLLOWS:

CREATE TABLE CUSTSUM
( CUSTNO NUMBER(5),
  CUSTNAME VARCHAR2(20),
  NOORD NUMBER(5),
  RORDDATE DATE,
  TOTAMT NUMBER(10)
);

DECLARE
CURSOR CUSTCUR IS
SELECT CUSTNO, CUSTNAME FROM CUSTOMERS;
V_ORDCNT NUMBER(5);
V_MORDDATE DATE;
V_TOTAMT NUMBER(10);

BEGIN
FOR REC IN CUSTCUR
LOOP
-- GET DETAILS OF CUSTOMER
SELECT COUNT(*), MAX(ORDDATE), SUM(QTY*PRICE)
INTO V_ORDCNT, V_MORDDATE, V_TOTAMT
FROM ORDERS O, LINEITEMS L
WHERE O.ORDNO = L.ORDNO AND CUSTNO = REC.CUSTNO;

INSERT INTO CUSTSUM VALUES ( REC.CUSTNO, REC.CUSTNAME, V_ORDCNT,
V_MORDDATE, V_TOTAMT);
END LOOP;
END;

DISPLAY ITEMNAMES OF ITEMS FOR WHICH THE CURRENT PRICE IS LESS THAN THE
AVERAGE PRICE OR TOTAL QUANTITY SOLD IS LESS THAN 10 UNITS.

SELECT ITEMNAME
FROM ITEMS
WHERE ITEMNO IN
( SELECT ITEMNO
  FROM ITEMS I
  WHERE RATE < ( SELECT AVG(PRICE) FROM LINEITEMS WHERE ITEMNO = I.ITEMNO)
)
OR ITEMNO IN
( SELECT ITEMNO
  FROM LINEITEMS
  GROUP BY ITEMNO
  HAVING SUM(QTY) < 10 );

CREATE A PROCEDURE THAT TAKES ORDERNO, ITEMNO AND INSERTS A ROW INTO
LINEITEMS, WITH PRICE = RATE OF THE ITEM, QTY = 1, DISCOUNT = 10%.

CREATE OR REPLACE PROCEDURE NEWITEM(P_ORDNO NUMBER, P_ITEMNO NUMBER)
AS
V_RATE ITEMS.RATE%TYPE;
BEGIN
SELECT RATE INTO V_RATE
FROM ITEMS WHERE ITEMNO = P_ITEMNO;

INSERT INTO LINEITEMS VALUES ( P_ORDNO, P_ITEMNO, 1, V_RATE, 10);
END;

CREATE A FUNCTION THAT RETURNS THE FIRST MISSING ORDERNO.

CREATE OR REPLACE FUNCTION FIRSTMISORDNO RETURN NUMBER
AS
V_MAXORDNO ORDERS.ORDNO%TYPE;
V_MINORDNO ORDERS.ORDNO%TYPE;
V_CNT NUMBER(2);

BEGIN
SELECT MAX(ORDNO), MIN(ORDNO) INTO V_MAXORDNO, V_MINORDNO
FROM ORDERS;

FOR I IN V_MINORDNO..V_MAXORDNO
LOOP
SELECT COUNT(*) INTO V_CNT
FROM ORDERS WHERE ORDNO = I;

IF V_CNT = 0 THEN
RETURN I;
END IF;
END LOOP;

-- NO MISSING ORDNO
RETURN NULL;
END;

CREATE A FUNCTION THAT TAKES ORDERNO AND RETURNS CUSTOMER NAME
OF THAT ORDER.

CREATE OR REPLACE FUNCTION GETCUSTNAME ( P_ORDNO NUMBER) RETURN VARCHAR2
IS
V_CUSTNAME VARCHAR2(30);
BEGIN
SELECT CUSTNAME INTO V_CUSTNAME
FROM CUSTOMERS
WHERE CUSTNO = ( SELECT CUSTNO FROM ORDERS WHERE ORDNO = P_ORDNO);

RETURN V_CUSTNAME;
END;

CREATE A PROCEDURE THAT INSERTS A NEW ROW INTO LINEITEMS WITH
GIVEN ITEMNO, PRICE AND QUANTITY; ORDERNO IS THE MOST RECENT ORDER.
CHECK WHETHER PRICE IS MORE THAN THE CURRENT RATE OF THE ITEM,
CHECK WHETHER ITEM IS ALREADY EXISTING IN THE ORDER AND CHECK
WHETHER THE TOTAL AMOUNT OF THE ORDER INCLUDING THE NEW ITEM HAS
EXCEEDED 50,000.

CREATE OR REPLACE PROCEDURE NEWITEMS(P_ITEMNO NUMBER, P_PRICE NUMBER, P_QTY NUMBER)
IS
V_CNT NUMBER(2);
V_RATE ITEMS.RATE%TYPE;
V_TOTAMT NUMBER(10);
V_ORDNO ORDERS.ORDNO%TYPE;
BEGIN
SELECT MAX(ORDNO) INTO V_ORDNO FROM ORDERS;

-- CHECK CONDITIONS
SELECT RATE INTO V_RATE FROM ITEMS WHERE ITEMNO = P_ITEMNO;

IF P_PRICE > V_RATE THEN
RAISE_APPLICATION_ERROR(-20001,'PRICE IS MORE THAN CURRENT PRICE');
END IF;

SELECT COUNT(*) INTO V_CNT
FROM LINEITEMS
WHERE ORDNO = V_ORDNO AND ITEMNO = P_ITEMNO;

IF V_CNT = 1 THEN
RAISE_APPLICATION_ERROR(-20002,'ITEM IS ALREADY EXISTING');
END IF;

-- GET TOTAL AMOUNT
SELECT SUM(QTY * PRICE) INTO V_TOTAMT
FROM LINEITEMS WHERE ORDNO = V_ORDNO;

IF V_TOTAMT + P_PRICE * P_QTY > 50000 THEN
RAISE_APPLICATION_ERROR(-20003,'TOTAL AMOUNT EXCEEDED 50000');
END IF;

INSERT INTO LINEITEMS VALUES (V_ORDNO, P_ITEMNO, P_QTY, P_PRICE, 0);
END;

MAKE SURE AN ORDER IS NOT CONTAINING MORE THAN 5 ITEMS.

CREATE OR REPLACE TRIGGER CHECKITEMCOUNT
BEFORE INSERT
ON LINEITEMS
FOR EACH ROW
DECLARE
V_CNT NUMBER(5);
BEGIN
SELECT COUNT(*) INTO V_CNT
FROM LINEITEMS WHERE ORDNO = :NEW.ORDNO;

IF V_CNT >= 5 THEN
RAISE_APPLICATION_ERROR(-20010,'CANNOT HAVE MORE THAN 5 ITEMS IN AN ORDER');
END IF;
END;

DO NOT ALLOW ANY CHANGES TO ITEMS TABLE AFTER 9PM AND BEFORE 9AM.

CREATE OR REPLACE TRIGGER CHECKTIME
BEFORE INSERT OR DELETE OR UPDATE
ON ITEMS
BEGIN
IF TO_CHAR(SYSDATE,'HH24') < 9 OR TO_CHAR(SYSDATE,'HH24') >= 21 THEN
RAISE_APPLICATION_ERROR(-20011,'NO CHANGES CAN BE MADE BEFORE 9 A.M AND AFTER 9 P.M');
END IF;
END;

DO NOT ALLOW ANY CHANGE TO ITEM RATE IN SUCH A WAY THAT THE DIFFERENCE
IS MORE THAN 25% OF THE EXISTING RATE.

CREATE OR REPLACE TRIGGER TRGDIFFRATE
BEFORE UPDATE
ON ITEMS
FOR EACH ROW
DECLARE
V_DIFF NUMBER(5);
BEGIN
V_DIFF := ABS(:NEW.RATE - :OLD.RATE);

IF V_DIFF > :OLD.RATE * 0.25 THEN
RAISE_APPLICATION_ERROR(-20014,'INVALID RATE FOR AMOUNT. CHANGE IS TOO BIG');
END IF;
END;

DATABASE DEVELOPMENT LIFE CYCLE (DDLC)

The database development life cycle (DDLC) is a process of designing, implementing and
maintaining a database system to meet strategic or operational information needs of an
organization or enterprise such as:

 Improved customer support and customer satisfaction.


 Better production management.
 Better inventory management.
 More accurate sales forecasting.

PHASES OF DDLC

Software development is the group of actions needed to transform the user's needs into an
effective software solution. The software development procedure consists of the activities needed for
building software systems and integrating the techniques and practices to be adopted. It
also includes project planning, tracking development and managing the complications of
building software.

These different database related activities can be grouped into the below phases (more commonly
known as DDLC – Database Development Life Cycle):

 Requirements Analysis
 Database Design
 Evaluation and Selection
 Logical Database Design
 Physical Database Design
 Implementation
 Data Loading
 Testing and Performance Tuning
 Operation
 Maintenance
 Requirements Analysis

The most important step in implementing a database system is to find out what is needed —
what type of a database is required for the business organization, the daily volume of data, how
much data needs to be stored in the master files etc. In order to collect all this required
information, a database analyst needs to spend a lot of time within the business organization
talking to people and end users and getting acquainted with the day-to-day process.

 Database Design

In this stage the database designers will make a decision on the database model that is perfectly
suited for the organization’s requirements. The database designers will study the documents
prepared by the analysts in the requirements analysis stage and then start developing a system
that fulfills the needs.

 Evaluation and Selection

Once the data model is designed, tested and demonstrated, the next phase is to evaluate the diverse
database management systems and choose the one that is perfectly suited for the requirements of the
organization. In order to identify best performing database for the organization, end user should be
involved in this phase.

 Logical Database Design

Once the evaluation and selection phase is completed successfully, the next step in the database
development life cycle is logical database design. The conceptual design is translated into an
internal model in the logical design phase. This includes the mapping of all objects i.e. table
design, indexes, views, transactions, access privileges etc.

 Physical Database Design

Physical database design is the procedure of selecting and characterizing the data storage and
data access of the database. The data storage depends on the type of devices supported by the
hardware, the data access methods and the DBMS.

Physical design is mainly significant for older database models like hierarchical and network
models. Physical design is very vital in database development life cycle and has great
significance as a bad design can result in deprived performance.

 Implementation
In most databases a new database implementation needs the formation of special storage
related constructs to house the end user tables. These constructs typically comprise storage
group, table spaces, data files, tables etc.

 Data Loading

Once the database has been created, the data must be loaded into the database. The data needs
to be converted and migrated to the new database if it is currently stored in a
different system or in a different format.

 Testing and Performance Tuning

The next phase is testing and performance tuning, which starts as soon as the data is loaded into
the database. In this phase, the database is tested and fine-tuned for performance, integrity, access
and security constraints. It is very important that the database administrators and application
programmers work together during this phase, because testing and performance tuning happen
in parallel.

 Operation

Once the data is loaded into the database and it is fully tested, the database is then released into
production.

In operation phase, the database is accessed by the end users and application programs. This
stage includes adding of new data, modifying existing data and deletion of obsolete data. The
database administrators perform the administrative tasks periodically such as performance
tuning, expanding storage space, database backup etc. This is the crucial phase as it provides
useful information and helps management to make a business decision, thus making the smooth
and well-organized functioning of the organization.

 Maintenance

Database maintenance phase is very important and it is one of the ongoing phases in DDLC.
Factors such as new business needs, new information requirements, acquisition of new data etc
will make it essential to formulate ongoing changes and improvements to the existing design.
The major tasks in this phase include: database backup and recovery, performance tuning, design
modifications, access management and audits, usage monitoring, hardware maintenance,
upgradation etc.

Functional Dependencies
FD's are constraints on well-formed relations and represent a formalism on the infrastructure of
relation.

Definition: A functional dependency (FD) on a relation schema R is a constraint X → Y, where
X and Y are subsets of attributes of R. An FD is a relationship between an attribute "Y" and a
determinant (1 or more other attributes) "X" such that for a given value of a determinant the
value of the attribute is uniquely defined.

 X is a determinant
 X determines Y
 Y is functionally dependent on X
 X → Y
 X → Y is trivial if Y ⊆ X

Example:

Let R be
NewStudent(stuId, lastName, major, credits, status, socSecNo)

FDs in R include

 {stuId}→{lastName}, but not the reverse


 {stuId} →{lastName, major, credits, status, socSecNo, stuId}
 {socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
 {credits}→{status}, but not {status}→{credits}

ZipCode→AddressCity

ArtistName→BirthYear

Autobrand→Manufacturer, Engine type

Author, Title→PublDate
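Whether a given FD holds in a concrete table instance can be checked mechanically: group the rows by their X-values and verify each group agrees on Y. The sketch below does this with a hypothetical helper `fd_holds`; the sample rows are illustrative stand-ins for the NewStudent relation.

```python
def fd_holds(rows, X, Y):
    """Check whether the FD X -> Y holds in a concrete table (list of dicts).

    Returns True if every pair of rows that agrees on the attributes in X
    also agrees on the attributes in Y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        if key in seen and seen[key] != val:
            return False        # same X-value, different Y-value: FD violated
        seen[key] = val
    return True

# Illustrative rows in the spirit of the NewStudent relation above
students = [
    {"stuId": "S1", "lastName": "Smith", "credits": 90, "status": "senior"},
    {"stuId": "S2", "lastName": "Smith", "credits": 30, "status": "freshman"},
    {"stuId": "S3", "lastName": "Jones", "credits": 90, "status": "senior"},
]

print(fd_holds(students, ["stuId"], ["lastName"]))    # True: stuId -> lastName
print(fd_holds(students, ["lastName"], ["stuId"]))    # False: not the reverse
print(fd_holds(students, ["credits"], ["status"]))    # True: credits -> status
```

Note that a check on one table instance can only *refute* an FD; whether the FD holds in general is a statement about all legal instances of the schema.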

TRIVIAL FUNCTIONAL DEPENDENCY

A functional dependency is trivial if Y is a subset of X. In a table with attributes of employee
name and Social Security number (SSN), employee name is functionally dependent on SSN
because each SSN is unique to an individual. An SSN identifies the employee specifically, but an
employee name cannot determine the SSN because more than one employee could have the
same name.
Functional dependency defines Boyce-Codd normal form and third normal form. This preserves
dependency between attributes, eliminating the repetition of information. Functional dependency is
related to a candidate key, which uniquely identifies a tuple and determines the value of all other
attributes in the relation. A set of functional dependencies is irreducible if:

 The right-hand set of each functional dependency holds only one attribute.
 The left-hand set of each functional dependency cannot be reduced, since this may change
the entire content of the set.
 Removing any of the existing functional dependencies would change the content of the set.

An important property of functional dependencies is Armstrong's axioms, which are used in
database normalization. In a relation R with attributes X, Y and Z, Armstrong's axioms hold if the
following conditions are satisfied:

 Axiom of Reflexivity (Subset Property): If Y is a subset of X, then X → Y.
 Axiom of Augmentation: If X → Y, then XZ → YZ.
 Axiom of Transitivity: If X → Y and Y → Z, then X → Z.
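In practice, Armstrong's axioms are applied through the attribute-closure algorithm: starting from a set X, repeatedly add the right-hand side of any FD whose left-hand side is already covered, until nothing changes. A small sketch (the helper name `closure` and the sample FDs are illustrative):

```python
def closure(attrs, fds):
    """Compute the closure X+ of a set of attributes under a set of FDs.

    `fds` is a list of (lhs, rhs) pairs of attribute sets.  Iterating to a
    fixed point is equivalent to exhaustively applying Armstrong's axioms."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is covered, pull in the right-hand side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Transitivity falls out automatically: X -> Y and Y -> Z give X -> Z
fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
print(sorted(closure({"X"}, fds)))   # ['X', 'Y', 'Z']
```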

Dependency Preservation

1. Another desirable property in database design is dependency preservation.


o We would like to check easily that updates to the database do not result in
illegal relations being created.
o It would be nice if our design allowed us to check updates without having to
compute natural joins.
o To know whether joins must be computed, we need to determine what
functional dependencies may be tested by checking each relation
individually.
o Let F be a set of functional dependencies on schema R.
o Let R1, R2, ..., Rn be a decomposition of R.
o The restriction of F to Ri is the set Fi of all functional dependencies in F+
that include only attributes of Ri.
o Functional dependencies in a restriction can be tested in one relation, as
they involve attributes in one relation schema.
o The set of restrictions F1, F2, ..., Fn is the set of dependencies that can be
checked efficiently.
o We need to know whether testing only the restrictions is sufficient.

o Let F' = F1 ∪ F2 ∪ ... ∪ Fn.
o F' is a set of functional dependencies on schema R, but in general, F' ≠ F.
o However, it may be that F'+ = F+.
o If this is so, then every functional dependency in F is implied by F', and if F' is
satisfied, then F must also be satisfied.
o A decomposition having the property that F'+ = F+ is a dependency-preserving
decomposition.
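Dependency preservation can be tested without materializing F+ explicitly, using the standard polynomial-time algorithm: for each FD X → Y, grow X by repeatedly projecting its closure onto each schema Ri of the decomposition, then check that Y ends up covered. A sketch under assumed single-letter attributes (the relation and FDs are illustrative):

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds` (a list of (lhs, rhs) frozenset pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, decomposition):
    """True if `decomposition` (a list of attribute sets) preserves every FD.

    For each X -> Y, repeatedly replace `result` with
    result ∪ (closure(result ∩ Ri) ∩ Ri) for every Ri, and test Y ⊆ result."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                t = closure(result & ri, fds) & ri
                if not t <= result:
                    result |= t
                    changed = True
        if not rhs <= result:
            return False
    return True

# R(A, B, C) with A -> B and B -> C
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(preserves(fds, [set("AB"), set("BC")]))  # True: each FD sits in one Ri
print(preserves(fds, [set("AB"), set("AC")]))  # False: B -> C is lost
```

The second decomposition is lossless but not dependency-preserving, which is exactly the distinction this section draws.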

MULTIVALUED DEPENDENCIES

Multivalued dependencies occur when the presence of one or more rows in a table implies the
presence of one or more other rows in that same table.
Examples:

For example, imagine a car company that manufactures many models of car, but always makes
both red and blue colors of each model. If you have a table that contains the model name, color
and year of each car the company manufactures, there is a multivalued dependency in that table.
If there is a row for a certain model name and year in blue, there must also be a similar row
corresponding to the red version of that same car.
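The "every model comes in both colors" constraint above is the multivalued dependency model →→ color. On a concrete table it can be checked by verifying that, within each model group, the color values and the remaining values pair up freely as a full cross product. A sketch (hypothetical helper `mvd_holds`, illustrative data):

```python
from itertools import product

def mvd_holds(rows, X, Y):
    """Check the multivalued dependency X ->> Y in a concrete table.

    For every X-value, the Y-values and the remaining-attribute values
    must combine freely (form a full cross product)."""
    attrs = list(rows[0])
    Z = [a for a in attrs if a not in X and a not in Y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in X)
        groups.setdefault(key, set()).add(
            (tuple(row[a] for a in Y), tuple(row[a] for a in Z)))
    for pairs in groups.values():
        ys = {y for y, z in pairs}
        zs = {z for y, z in pairs}
        if pairs != set(product(ys, zs)):
            return False        # a required (color, year) combination is missing
    return True

cars = [  # every model is made in both red and blue, every year
    {"model": "M1", "color": "red",  "year": 2020},
    {"model": "M1", "color": "blue", "year": 2020},
    {"model": "M1", "color": "red",  "year": 2021},
    {"model": "M1", "color": "blue", "year": 2021},
]
print(mvd_holds(cars, ["model"], ["color"]))       # True: cross product is full
print(mvd_holds(cars[:3], ["model"], ["color"]))   # False: blue 2021 row missing
```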

NORMALIZATION

Normalization is a process of reducing redundancies of data in a database. Quite often we come
across tables having a lot of bulk data with many columns. All these data might not be
necessary all the time whenever we use those tables. So, a better option is to split up the bulk
table into small parts and use only those tables which suit the actual purpose at a given instance
of time. In this way, redundancy is reduced. To make the long story short, we can simply say
that normalization is a process of dividing a big table into smaller ones in order to reduce
redundancy.

ANOMALIES IN DBMS:

Insertion Anomaly
It is a failure to place information about a new database entry into all the places in the database
where information about the new entry needs to be stored. In a properly normalized database,
information about a new entry needs to be inserted into only one place in the database; in an
inadequately normalized database, information about a new entry may need to be inserted into
more than one place, and human fallibility being what it is, some of the needed additional
insertions may be missed.

Deletion anomaly
It is a failure to remove information about an existing database entry when it is time to remove
that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of entry
needs to be deleted from only one place in the database; in an inadequately normalized database,
information about that old entry may need to be deleted from more than one place.

Update Anomaly
An update of a database involves modifications that may be additions, deletions, or both. Thus
“update anomalies” can be either of the kinds discussed above.

All three kinds of anomalies are highly undesirable, since their occurrence constitutes corruption
of the database. Properly normalized databases are much less susceptible to corruption than are
un-normalized databases.

Normalization Avoids

 Duplication of Data – The same data is listed in multiple lines of the database
 Insert Anomaly – A record about an entity cannot be inserted into the table without
first inserting information about another entity – Cannot enter a customer without a
sales order

 Delete Anomaly – A record cannot be deleted without deleting a record about a
related entity. Cannot delete a sales order without deleting all of the customer’s
information.
 Update Anomaly – Cannot update information without changing information in many
places. To update customer information, it must be updated for each sales order the
customer has placed
Process of normalization:
Before getting to know the normalization techniques in detail, let us define a few building
blocks which are used to define normal form.

1. Determinant: Attribute X can be defined as a determinant if it uniquely defines the value
of attribute Y in a given relationship or entity. To qualify as a determinant, an attribute need
NOT be a key attribute. Usually the dependency of an attribute is represented as X -> Y,
which means attribute X decides attribute Y.

Example: In the RESULT relation, the Marks attribute may decide the Grade attribute. This is
represented as Marks -> Grade and read as "Marks decides Grade".
Marks -> Grade
In the RESULT relation, Marks is not a key attribute. Hence it can be concluded that key
attributes are determinants, but not all determinants are key attributes.

2. Functional Dependency: Before stating the formal definition, let us try to
understand the concept through an example. Consider the following
relation:

REPORT(Student#, Course#, CourseName, IName, Room#, Marks, Grade)
Where:

 Student#-Student Number
 Course#-Course Number
 CourseName -CourseName
 IName- Name of the instructor who delivered the course
 Room#-Room number which is assigned to respective instructor
 Marks- Scored in Course Course# by student Student #
 Grade –Obtained by student Student# in course Course #
 Student#, Course# together (called a composite attribute) define EXACTLY ONE value
of Marks. This can be symbolically represented as

Student#Course# -> Marks

This type of dependency is called functional dependency. In the above example, Marks is
functionally dependent on Student#Course#.
Other Functional dependencies in above examples are:

 Course# -> CourseName


 Course# -> IName (assuming one course is taught by one and only one instructor)
 IName -> Room# (assuming each instructor has his/her own, non-shared room)
 Marks ->Grade

Formally, we can define functional dependency as: In a given relation R, X and Y are attributes.
Attribute Y is functionally dependent on attribute X if each value of X determines exactly one
value of Y. This is represented as:

X -> Y

However, X may be composite in nature.

3. Full functional dependency: In the above example, Marks is fully functionally dependent
on Student#Course# and not on any subset of Student#Course#. This means Marks cannot
be determined either by Student# or by Course# alone; it can be determined only by using
Student# and Course# together. Hence Marks is fully functionally dependent on
Student#Course#.

CourseName is not fully functionally dependent on Student#Course# because one of the subsets,
Course#, alone determines CourseName, and Student# plays no role in deciding it. Hence
CourseName is not fully functionally dependent on Student#Course#.

Student#Course# -> Marks
Formal definition of full functional dependency: In a given relation R, X and Y are attributes.
Y is fully functionally dependent on attribute X only if it is not functionally dependent on any
subset of X. However, X may be composite in nature.

4. Partial Dependency: In the above relationship, CourseName, IName and Room# are
partially dependent on the composite attribute Student#Course# because Course# alone
can define CourseName, IName and Room#.

Course# -> CourseName, IName, Room#
Formal definition of partial dependency: In a given relation R, X and Y are attributes. Attribute
Y is partially dependent on attribute X only if it is dependent on a subset of X. However, X may
be composite in nature.

5. Transitive Dependency: In the above example, Room# depends on IName, which in turn
depends on Course#. Here Room# transitively depends on Course#.

Course# -> IName -> Room#
Similarly, Grade depends on Marks and, in turn, Marks depends on Student#Course#; hence
Grade transitively depends on Student#Course#.

6. Key attributes: In a given relationship R, if the attribute X uniquely defines all
other attributes, then attribute X is a key attribute, which is nothing but the
candidate key.

Ex: Student#Course# together is a composite key attribute which uniquely determines all
attributes in the relationship REPORT(Student#, Course#, CourseName, IName, Room#,
Marks, Grade). Hence Student# and Course# are key attributes.

Un-Normalized Form (UNF)

If a table contains non-atomic values at each row, it is said to be in UNF. An atomic value is
something that can not be further decomposed. A non-atomic value, as the name suggests, can
be further decomposed and simplified. Consider the following table:

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
                  Feb    1200
                  Mar    850
E02     BB        Jan    2200   B02      UTI
                  Feb    2500
E03     CC        Jan    1700   B01      SBI
                  Feb    1800
                  Mar    1850
                  Apr    1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us the unique identification
facility for any single row. Further, each primary key points to a variable length record (3 for
E01, 2 for E02 and 4 for E03).

First Normal Form (1NF)

A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a
unique combination of values. The above table in UNF can be processed to create the following
table in 1NF.

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
E01     AA        Feb    1200   B01      SBI
E01     AA        Mar    850    B01      SBI
E02     BB        Jan    2200   B02      UTI
E02     BB        Feb    2500   B02      UTI
E03     CC        Jan    1700   B01      SBI
E03     CC        Feb    1800   B01      SBI
E03     CC        Mar    1850   B01      SBI
E03     CC        Apr    1725   B01      SBI

As you can see now, each row contains unique combination of values. Unlike in UNF, this
relation contains only atomic values, i.e. the rows can not be further decomposed, so the relation
is now in 1NF.
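The UNF-to-1NF step above is a flattening of the non-atomic (Month, Sales) lists into one row per atomic value. Sketched on the sample data (the field names and record layout are illustrative):

```python
# Hypothetical UNF records: each employee row carries a list of
# (month, sales) pairs -- a non-atomic value.
unf = [
    {"emp_id": "E01", "emp_name": "AA", "bank_id": "B01", "bank_name": "SBI",
     "sales": [("Jan", 1000), ("Feb", 1200), ("Mar", 850)]},
    {"emp_id": "E02", "emp_name": "BB", "bank_id": "B02", "bank_name": "UTI",
     "sales": [("Jan", 2200), ("Feb", 2500)]},
]

# Flattening to 1NF: one row per atomic (month, sales) value, so every row
# has a unique (emp_id, month) combination.
nf1 = [
    {"emp_id": r["emp_id"], "emp_name": r["emp_name"], "month": m,
     "sales": s, "bank_id": r["bank_id"], "bank_name": r["bank_name"]}
    for r in unf
    for m, s in r["sales"]
]
print(len(nf1))   # 5 rows: 3 for E01 plus 2 for E02
```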

Second Normal Form (2NF)

A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends
on the primary key of the relation. Speaking inversely, if a table has some attributes which are
not dependent on the primary key of that table, then it is not in 2NF.

Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and
Bank- Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which
is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion is
moved into another related relation, the table would come to 2NF.

Emp-Id  Emp-Name  Month  Sales  Bank-Id
E01     AA        Jan    1000   B01
E01     AA        Feb    1200   B01
E01     AA        Mar    850    B01
E02     BB        Jan    2200   B02
E02     BB        Feb    2500   B02
E03     CC        Jan    1700   B01
E03     CC        Feb    1800   B01
E03     CC        Mar    1850   B01
E03     CC        Apr    1725   B01

Bank-Id  Bank-Name
B01      SBI
B02      UTI
After moving this portion into another relation, we store a smaller amount of data in two
relations without any loss of information. There is also a significant reduction in redundancy.

Third Normal Form (3NF)

A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency
in that relation. Speaking inversely, if a table contains transitive dependency, then it is not in
3NF, and the table must be split to bring it into 3NF.

What is a transitive dependency? Within a relation, if we see

A → B [B depends on A]
and
B → C [C depends on B]

then we may derive

A → C [C depends on A]

Such derived dependencies hold well in most of the situations. For example, if we have

Roll → Marks
and
Marks → Grade

then we may safely derive

Roll → Grade

This third dependency was not originally specified but we have derived it.

The derived dependency is called a transitive dependency when such dependency becomes
improbable. For example we have been given
Roll → City
And
City → STDCode

If we try to derive Roll → STDCode it becomes a transitive dependency, because obviously the
STDCode of a city cannot depend on the roll number issued by a school or college. In such a
case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
and
City → STDCode
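Deriving a transitive dependency can be seen as composing two lookup tables: given Roll → City and City → STDCode, the transitive Roll → STDCode is recoverable by composition, which is why the split relations lose no information. A sketch (all values are made up for illustration):

```python
# Roll -> City and City -> STDCode, represented as lookup tables.
roll_to_city = {"R1": "Pune", "R2": "Delhi", "R3": "Pune"}
city_to_std = {"Pune": "020", "Delhi": "011"}

# Composing the two dependencies yields the derived Roll -> STDCode:
roll_to_std = {roll: city_to_std[city] for roll, city in roll_to_city.items()}
print(roll_to_std["R1"])   # "020" -- derived by composition, not stored
```

Because the composition can always be recomputed, storing STDCode alongside Roll would only duplicate what the City table already says.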

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left-hand side of every
dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. There
could, however, be a situation when a 3NF relation is not in BCNF, if the following conditions
are found true:

1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. There are some common attributes in the relation.

Professor Code  Department   Head of Dept.  Percent Time
P1              Physics      Ghosh          50
P1              Mathematics  Krishnan       50
P2              Chemistry    Rao            25
P2              Physics      Ghosh          75
P3              Mathematics  Krishnan       100

Consider, as an example, the above relation. It is assumed that:

1. A professor can work in more than one department


2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relation diagram for the above relation is given as the following:

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information
that Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and
deleting Head of Dept. from the given relation. The normalized relations are shown in the following.

Professor Code  Department   Percent Time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100

Department   Head of Dept.
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao

See the dependency diagrams for these new relations.

Fourth Normal Form (4NF)

When attributes in a relation have multi-valued dependency, further normalization to 4NF and
5NF is required. Let us first find out what multi-valued dependency is.

A multi-valued dependency is a typical kind of dependency in which each and every attribute
within a relation depends upon the other, yet none of them is a unique primary key.

We will illustrate this with an example. Consider a vendor supplying many items to many
projects in an organization. The following are the assumptions:

1. A vendor is capable of supplying many items.


2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.

A multi-valued dependency exists here because all the attributes depend upon one another and
yet none of them is a primary key having a unique value.
Vendor Code  Item Code  Project No.
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3

The given relation has a number of problems. For example:

1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row
with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned
above. The problem is reduced by expressing this relation as two relations in the Fourth Normal
Form (4NF). A relation is in 4NF if it has no more than one independent multi valued
dependency or one independent multi valued dependency with a functional dependency.

The table can be expressed as the two 4NF relations given as following. The fact that vendors
are capable of supplying certain items and that they are assigned to supply for some projects is
independently specified in the 4NF relations.

Vendor-Supply

Vendor Code  Item Code
V1           I1
V1           I2
V2           I2
V2           I3
V3           I1
Vendor-Project

Vendor Code  Project No.
V1           P1
V1           P3
V2           P1
V3           P2

Fifth Normal Form (5NF)

These relations still have a problem. While defining the 4NF we mentioned that all the attributes
depend upon each other. While creating the two tables in the 4NF, although we have preserved
the dependencies between Vendor Code and Item Code in the first table and between Vendor
Code and Project No. in the second table, we have lost the relationship between Item Code and
Project No. If there were a primary key then this loss of dependency would not have occurred.
In order to revive this relationship we must add a new table like the following. Please note that
during the entire process of normalization, this is the only step where a new table is created by
joining two attributes, rather than splitting them into separate tables.

Project No.  Item Code
P1           I1
P1           I2
P2           I1
P3           I1
P3           I3
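The lossless reconstruction described above can be checked directly: project the original Vendor-Item-Project relation onto the three binary schemas and verify that their natural join reproduces it exactly. A sketch on the sample data (variable names are illustrative):

```python
from itertools import product

# The original Vendor-Item-Project relation from the text.
original = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"), ("V1", "I1", "P3"),
    ("V1", "I2", "P3"), ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P2"), ("V3", "I1", "P3"),
}

# Project onto the three binary relations kept after decomposition.
vendor_item = {(v, i) for v, i, p in original}
vendor_proj = {(v, p) for v, i, p in original}
proj_item   = {(p, i) for v, i, p in original}

# Natural join of all three projections: match on vendor, then keep only
# the (project, item) pairs that actually occur together.
joined = {
    (v, i, p)
    for (v, i), (v2, p) in product(vendor_item, vendor_proj)
    if v == v2 and (p, i) in proj_item
}
print(joined == original)   # True: the three-way join is lossless here
```

For this data the three-way join loses nothing; the point of 5NF is that a join over only two of the projections may produce spurious tuples, and the third table is what filters them out.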

QUERY PROCESSING

Query processing includes translation of high-level queries into low-level expressions that can
be used at the physical level of the file system, query optimization and actual execution of the
query to get the result. It is a three-step process that consists of parsing and translation,
optimization and execution of the query submitted by the user.

A query is processed in four general steps:

1. Scanning and Parsing
2. Query Optimization or planning the execution strategy
3. Query Code Generator (interpreted or compiled)
4. Execution in the runtime database processor

1. Scanning and Parsing

 When a query is first submitted (via an application program), it must be scanned
and parsed to determine whether the query has appropriate syntax.
 Scanning is the process of converting the query text into a tokenized representation.
 The tokenized representation is more compact and is suitable for processing by the parser.
 This representation may be in a tree form.
 The Parser checks the tokenized representation for correct syntax.
 In this stage, checks are made to determine if columns and tables identified in the
query exist in the database and if the query has been formed correctly with the
appropriate keywords and structure.
 If the query passes the parsing checks, then it is passed on to the Query Optimizer.
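The scanning step above can be sketched as a toy tokenizer; the token categories and the tiny grammar fragment are illustrative only, not the lexer of any particular DBMS:

```python
import re

# One regex alternative per token category; keywords are tried before
# general identifiers so that "SELECT" is not tokenized as an identifier.
TOKEN_RE = re.compile(
    r"\s*(?:(SELECT|FROM|WHERE)\b|([A-Za-z_]\w*)|(\d+)|(=|,|\*))", re.I)

def scan(query):
    """Convert query text into a tokenized representation."""
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if not m:
            raise SyntaxError("unexpected character at position %d" % pos)
        kw, ident, num, sym = m.groups()
        if kw:
            tokens.append(("KEYWORD", kw.upper()))
        elif ident:
            tokens.append(("IDENT", ident))
        elif num:
            tokens.append(("NUMBER", num))
        else:
            tokens.append(("SYMBOL", sym))
        pos = m.end()
    return tokens

print(scan("SELECT name FROM emp WHERE dept = 10"))
```

The parser would then walk this flat token list and build the tree-form representation the text mentions, rejecting sequences that violate the grammar.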

2. Query Optimization or Planning the Execution Strategy

 For any given query, there may be a number of different ways to execute it.
 Each operation in the query (SELECT, JOIN, etc.) can be implemented using one or more
different Access Routines.
 For example, an access routine that employs an index to retrieve some rows would be
more efficient than an access routine that performs a full table scan.
 The goal of the query optimizer is to find a reasonably efficient strategy for executing
the query (not quite what the name implies) using the access routines.
 Optimization typically takes one of two forms: Heuristic Optimization or Cost Based
Optimization
 In Heuristic Optimization, the query execution is refined based on heuristic rules for
reordering the individual operations.
 With Cost Based Optimization, the overall cost of executing the query is
systematically reduced by estimating the costs of executing several different execution
plans.

3. Query Code Generator (interpreted or compiled)

 Once the query optimizer has determined the execution plan (the specific ordering of
access routines), the code generator writes out the actual access routines to be executed.
 With an interactive session, the query code is interpreted and passed directly to the
runtime database processor for execution.
 It is also possible to compile the access routines and store them for later execution.
4. Execution in the runtime database processor

 At this point, the query has been scanned, parsed, planned and (possibly) compiled.
 The runtime database processor then executes the access routines against the database.
 The results are returned to the application that made the query in the first place.
 Any runtime errors are also returned.

Query Optimization

 The goal is to enable the system to achieve (or improve) acceptable performance by
choosing a better (if not the best) strategy during the processing of a query. This is one of
the great strengths of the relational database.

Automatic Optimization vs. Human Programmer

1. A good automatic optimizer will have a wealth of information available to it
that human programmers typically do not have.
2. An automatic optimizer can easily reprocess the original relational request
when the organization of the database is changed. For a human programmer,
reorganization would involve rewriting the program.
3. The optimizer is a program, and therefore is capable of considering literally
hundreds of different implementation strategies for a given request, which is
much more than a human programmer can.
4. The optimizer is available to a wide range of users, in an efficient and cost-effective
manner.

The Optimization Process

1. Cast the query into some internal representation, such as a query tree structure.
2. Convert the internal representation to canonical form.

*A subset (say C) of a set of queries (say Q) is said to be a set of canonical forms for Q if and only
if every query in Q is equivalent to just one query in C.

During this step, some optimization is already achieved by transforming the internal
representation to a better canonical form.

Possible improvements

a. Doing the restrictions (selects) before the join.


b. Reduce the amount of comparisons by converting a restriction condition to an
equivalent condition in conjunctive normal form- that is, a condition consisting of
a set of restrictions that are ANDed together, where each restriction in turn consists
of a set of simple comparisons connected only by OR's.
c. A sequence of restrictions (selects) before the join.
d. In a sequence of projections, all but the last can be ignored.
e. A restriction of projection is equivalent to a projection of a restriction.
f. Others
3. Choose candidate low-level procedures by evaluating the transformed query.

*Access path selection: Consider the query expression as a series of basic operations
(join, restriction, etc.); the optimizer then chooses from a set of pre-defined, low-level
implementation procedures. These procedures may involve the use of primary keys,
foreign keys or indexes and other information about the database.

4. Generate query plans and choose the cheapest, by first constructing a set of candidate
query plans and then choosing the best one. Picking the best plan is achieved by
assigning a cost to each candidate plan; the cost is computed according to the
number of disk I/Os involved.
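Improvement (a) above — doing the restriction before the join — can be illustrated by comparing the size of the intermediate results the two orderings produce. A toy sketch (the relations, sizes, and predicate are made up for illustration):

```python
from itertools import product

# 100 employees spread over 10 departments, and a department table.
emp  = [("E%03d" % i, i % 10) for i in range(100)]   # (emp_id, dept_id)
dept = [(d, "D%d" % d) for d in range(10)]           # (dept_id, dept_name)

# Join first, filter after: the join examines 100 x 10 candidate pairs.
join_then_filter = [
    (e, d) for e, d in product(emp, dept) if e[1] == d[0] and e[1] == 3
]

# Filter first, join after: only the 10 matching employee rows enter the join.
emp_d3 = [e for e in emp if e[1] == 3]
filter_then_join = [
    (e, d) for e, d in product(emp_d3, dept) if e[1] == d[0]
]

print(len(join_then_filter) == len(filter_then_join))  # same answer either way
print(len(emp), "vs", len(emp_d3))  # but far fewer rows are fed into the join
```

Both orderings yield the same result set, which is exactly why the optimizer is free to reorder: it chooses the ordering with the cheaper intermediate results.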

MODULE 3

Database security

Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. It involves various types or categories of controls,
such as technical, procedural/administrative and physical. Database security is a specialist topic
within the broader realms of computer security, information security and risk management.

Security risks to database systems include, for example:

 Unauthorized or unintended activity or misuse by authorized database users, database
administrators, or network/systems managers, or by unauthorized users or hackers (e.g.
inappropriate access to sensitive data, metadata or functions within databases, or
inappropriate changes to the database programs, structures or security configurations);
 Malware infections causing incidents such as unauthorized access, leakage or disclosure of
personal or proprietary data, deletion of or damage to the data or programs, interruption or
denial of authorized access to the database, attacks on other systems and the unanticipated
failure of database services;
 Overloads, performance constraints and capacity issues resulting in the inability of
authorized users to use databases as intended;
 Physical damage to database servers caused by computer room fires or floods, overheating,
lightning, accidental liquid spills, static discharge, electronic breakdowns/equipment
failures and obsolescence;
 Design flaws and programming bugs in databases and the associated programs and
systems, creating various security vulnerabilities (e.g. unauthorized privilege escalation),
data loss/corruption, performance degradation etc.;
 Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in
database or system administration processes, sabotage/criminal damage etc.

Many layers and types of information security control are appropriate to databases, including:

 Access control
 Auditing
 Authentication
 Encryption
 Integrity controls
 Backups
 Application security

Traditionally databases have been largely secured against hackers through network security
measures such as firewalls, and network-based intrusion detection systems. While network
security controls remain valuable in this regard, securing the database systems themselves, and the
programs/functions and data within them, has arguably become more critical as networks are
increasingly opened to wider access, in particular access from the Internet. Furthermore, system,
program, function and data access controls, along with the associated user identification,
authentication and rights management functions, have always been important to limit and in some
cases log the activities of authorized users and administrators. In other words, these are
complementary approaches to database security, working from both the outside-in and the
inside-out as it were.

Many organizations develop their own "baseline" security standards and designs detailing basic
security control measures for their database systems. These may reflect general information
security requirements or obligations imposed by corporate information security policies and
applicable laws and regulations (e.g. concerning privacy, financial management and reporting
systems), along with generally-accepted good database security practices (such as appropriate
hardening of the underlying systems) and perhaps security recommendations from the relevant
database system and software vendors. The security designs for specific database systems
typically specify further security administration and management functions (such as
administration and reporting of user access rights, log management and analysis, database
replication/synchronization and backups) along with various business-driven information security
controls within the database programs and functions (e.g. data entry validation and audit trails).
Furthermore, various security-related activities (manual controls) are normally incorporated into
the procedures, guidelines etc. relating to the design, development, configuration, use,
management and maintenance of databases.

Database security cannot be seen as an isolated problem because it is affected by other
components of a computerized system as well. The security requirements of a system are
specified by means of a security policy which is then enforced by various security mechanisms.
For databases, requirements on the security can be classified into the following categories:
 Identification, Authentication
Usually before getting access to a database each user has to identify himself to the computer
system. Authentication is the way to verify the identity of a user at log-on time. Most common
authentication methods are passwords but more advanced techniques like badge readers,
biometric recognition techniques, or signature analysis devices are also available.
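Password-based authentication as described above is commonly implemented by storing a salted hash rather than the password itself, so that log-on verification recomputes and compares the hash. A minimal sketch of one such hypothetical scheme (not the mechanism of any particular DBMS):

```python
import hashlib
import hmac
import os

def enroll(password):
    """Enrolment: derive and store a salted hash, never the raw password."""
    salt = os.urandom(16)                      # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest                        # this pair is what gets stored

def authenticate(password, salt, digest):
    """Log-on time: recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = enroll("s3cret")                # "s3cret" is a sample password
print(authenticate("s3cret", salt, digest))    # True
print(authenticate("wrong", salt, digest))     # False
```

The salt defeats precomputed-hash attacks, and the slow key-derivation function makes brute-force guessing expensive even if the stored digests leak.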
 Authorization, Access Controls
Authorization is the specification of a set of rules that specify who has which type of access to
what information. Authorization policies therefore govern the disclosure and modification of
information. Access controls are procedures that are designed to control authorizations. They are
responsible to limit access to stored data to authorized users only.
 Integrity, Consistency
An integrity policy states a set of rules (i. e. semantic integrity constraints) that define the
correct states of the database during database operation and therefore can protect against
malicious or accidental modification of information. Closely related issues to integrity and
consistency are concurrency control and recovery. Concurrency control policies protect the
integrity of the database in the presence of concurrent transactions. If these transactions do
not terminate normally due to system crashes or security violations, recovery techniques are
used to reconstruct correct or valid database states.
 Auditing
The requirement to keep records of all security relevant actions issued by a user is called auditing.
Resulting audit records are the basis for further reviews and examinations in order to test the
adequacy of system controls and to recommend any changes in the security policy.
Locking

Locking is a mechanism commonly used by systems to control access to shared resources by
concurrently running users. In the context of a DBMS, these shared resources are data objects,
and the users are transactions.
Locking is typically implemented using a lock manager, which records which objects are locked,
by whom, and in what mode. When a transaction wishes to use a particular
object (to read or write), it must request a lock from the lock manager. After it is done with the
object, it releases the lock by again notifying the lock manager. In certain cases, the lock manager
is not able to immediately grant a lock when it is requested (e.g., if it is held by another
transaction). In this case, the lock manager maintains a queue of transactions waiting for the lock.
It is important to also recognize that some data items can be shared simultaneously between
transactions (e.g., transactions T1 and T2 both want to read object X), but in other cases it is
necessary for a transaction to have an exclusive lock (e.g., T1 wants to write to X). This
motivates the need for multiple lock modes. In this case, the idea is that, if a transaction requests a
lock on an object in a mode that is incompatible with an existing lock on that object, then it must
wait on the lock queue until the existing lock is released.
In a DBMS, the goal is to develop a locking protocol that guarantees a schedule with desirable
properties (e.g., serializability, recoverability, avoid cascading aborts). Two common protocols
are two-phase locking (2PL) and strict two-phase locking (Strict 2PL).
Strict Two-Phase Locking (Strict 2PL)
1. If a transaction T wants to read object X, it requests a shared lock on X. If it wants to write X, it
requests an exclusive lock.
2. All locks requested by a transaction are held until the transaction is completed (commits or
aborts), at which point the locks are released. It can be shown that Strict 2PL guarantees
schedules that are serializable, recoverable, and that avoid cascading aborts.
Two-Phase Locking (2PL)
2PL relaxes Strict 2PL slightly. A transaction need not hold all locks until completion, but
once it has released a lock, it may not request any more locks. 2PL is guaranteed to produce
schedules that are serializable.
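The 2PL rule (once a transaction releases any lock, it may acquire no more) can be checked mechanically. The sketch below, with invented helper names, validates a single transaction's lock/unlock sequence against that rule; it is illustrative only, not part of any real lock manager.

```python
# Sketch: check that one transaction's sequence of lock/unlock actions
# obeys two-phase locking. Once the first unlock is seen (shrinking
# phase begins), no further locks may be acquired.

def obeys_2pl(actions):
    """actions: list of ('lock', obj) or ('unlock', obj) tuples."""
    shrinking = False            # True once the first unlock is seen
    for kind, _obj in actions:
        if kind == 'unlock':
            shrinking = True
        elif kind == 'lock' and shrinking:
            return False         # acquired a lock after releasing one
    return True

print(obeys_2pl([('lock', 'X'), ('lock', 'Y'), ('unlock', 'X'), ('unlock', 'Y')]))  # True
print(obeys_2pl([('lock', 'X'), ('unlock', 'X'), ('lock', 'Y')]))                   # False
```

Under Strict 2PL, the second sequence could not even arise, since every unlock is deferred to commit or abort.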

Shared and Exclusive Locks

A lock is a system object associated with a shared resource such as a data item of an elementary
type, a row in a database, or a page of memory. In a database, a lock on a database object (a data-
access lock) may need to be acquired by a transaction before accessing the object. Correct use of
locks prevents undesired, incorrect or inconsistent operations on shared resources by other
concurrent transactions. When a database object with an existing lock acquired by one transaction
needs to be accessed by another transaction, the existing lock for the object and the type of the
intended access are checked by the system. If the existing lock type does not allow this specific
attempted concurrent access type, the transaction attempting access is blocked (according to a
predefined agreement/scheme). In practice a lock on an object does not directly block a
transaction's operation upon the object, but rather blocks that transaction from acquiring another
lock on the same object, needed to be held/owned by the transaction before performing this
operation. Thus, with a locking mechanism, needed operation blocking is controlled by a proper
lock blocking scheme, which indicates which lock type blocks which lock type.

Two major types of locks are utilized:

 Write-lock (exclusive lock) is associated with a database object by a transaction
(Terminology: "the transaction locks the object," or "acquires lock for it") before writing
(inserting/modifying/deleting) this object.
 Read-lock (shared lock) is associated with a database object by a transaction before reading
(retrieving the state of) this object.

The common interactions between these lock types are defined by blocking behavior as follows:

 An existing write-lock on a database object blocks an intended write upon the same object
(already requested/issued) by another transaction by blocking a respective write-lock from
being acquired by the other transaction. The second write-lock will be acquired and the
requested write of the object will take place (materialize) after the existing write-lock is
released.
 A write-lock blocks an intended (already requested/issued) read by another transaction by
blocking the respective read-lock.
 A read-lock blocks an intended write by another transaction by blocking the respective
write-lock.
 A read-lock does not block an intended read by another transaction. The respective read-
lock for the intended read is acquired (shared with the previous read) immediately after
the intended read is requested, and then the intended read itself takes place.
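The blocking behavior above amounts to a small compatibility matrix: shared (read) locks are compatible with each other, while an exclusive (write) lock is compatible with nothing. A minimal Python sketch, with invented function and table names:

```python
# Sketch of shared/exclusive lock compatibility (illustrative only).
# 'S' = read-lock (shared), 'X' = write-lock (exclusive). A requested
# mode is granted only if it is compatible with every lock currently
# held on the object by other transactions.

COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False,
              ('X', 'S'): False, ('X', 'X'): False}

def can_grant(held_modes, requested):
    """held_modes: lock modes other transactions hold on the object."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant(['S', 'S'], 'S'))  # True: reads share the object
print(can_grant(['S'], 'X'))       # False: a read-lock blocks a write
print(can_grant(['X'], 'S'))       # False: a write-lock blocks a read
```

A real lock manager would additionally queue the blocked request until the conflicting locks are released.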

TRANSACTION

A transaction is a set of changes that must all be made together. It is a program unit whose
execution may or may not change the contents of a database. A transaction is executed as a single
unit. If the database was in a consistent state before a transaction, then after execution of the
transaction the database must again be in a consistent state. For example, a transfer of money from
one bank account to another requires two changes to the database; both must succeed or fail
together.

Example:

You are working on a system for a bank. A customer goes to the ATM and instructs it to transfer
Rs. 1000 from savings to a checking account. This simple transaction requires two steps:
• Subtracting the money from the savings account balance.

• Adding the money to the checking account balance.

The code to create this transaction will require two updates to the database. For example, there
will be two SQL statements: one UPDATE command to decrease the balance in savings and a
second UPDATE command to increase the balance in the checking account.

You have to consider what would happen if a machine crashed between these two operations. The
money that has already been subtracted from the savings account will not be added to the checking
account; it is lost. You might consider performing the addition to checking first, but then the
customer ends up with extra money, and the bank loses. The point is that both changes must be
made successfully. Thus, a transaction is defined as a set of changes that must be made together.
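As a concrete sketch, the transfer can be expressed with Python's standard sqlite3 module, which wraps the two UPDATE statements in one transaction; the table and column names here are made up for the example:

```python
# Sketch of the savings-to-checking transfer as a single transaction.
# Both UPDATEs become permanent together on commit; if either fails,
# the rollback leaves both balances untouched.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('savings', 5000), ('checking', 200)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 1000 WHERE name = 'savings'")
    conn.execute("UPDATE accounts SET balance = balance + 1000 WHERE name = 'checking'")
    conn.commit()          # both changes succeed together
except sqlite3.Error:
    conn.rollback()        # neither change survives a failure

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'savings': 4000, 'checking': 1200}
```

If the process crashed between the two UPDATEs, the uncommitted first change would be rolled back on restart, so the money is never lost.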

States of Transaction

A transaction must be in one of the following states:

 Active: the initial state, the transaction stays in this state while it is executing.
 Partially committed: after the final statement has been executed.
 Failed: when the normal execution can no longer proceed.
 Aborted: after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
 Committed: after successful completion.

We say that a transaction has committed only if it has entered the committed state. Similarly, we
say that a transaction has aborted only if it has entered the aborted state. A transaction is said to
have terminated if it has either committed or aborted.

A transaction starts in the active state. When it finishes its final statement, it enters the partially
committed state. At this point, the transaction has completed its execution, but it is still possible
that
it may have to be aborted, since the actual output may still be temporarily hiding in main
memory and thus a hardware failure may preclude its successful completion.

The database system then writes out enough information to disk that, even in the event of a
failure, the updates performed by the transaction can be recreated when the system restarts after
the failure. When the last of this information is written out, the transaction enters the committed
state.
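The state diagram described above can be sketched as a small transition table (illustrative only; a real DBMS tracks these states internally):

```python
# Sketch of the transaction state machine: active, partially committed,
# failed, aborted, committed. Event names are invented for illustration.
TRANSITIONS = {
    ("active", "last statement executed"): "partially committed",
    ("active", "error"): "failed",
    ("partially committed", "write to disk succeeds"): "committed",
    ("partially committed", "hardware failure"): "failed",
    ("failed", "rollback complete"): "aborted",
}

def next_state(state, event):
    return TRANSITIONS[(state, event)]

# A successful run: active -> partially committed -> committed.
state = "active"
for event in ("last statement executed", "write to disk succeeds"):
    state = next_state(state, event)
print(state)  # committed
```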

ACID Properties of Transactions

Most of what we’re calling transactional locking relates to the ability of a database management
system (DBMS) to ensure reliable transactions that adhere to these ACID properties. ACID is an
acronym that stands for Atomicity, Consistency, Isolation, and Durability. Each of these
properties is described in more detail below. However, all of these properties are related and must
be considered together. They are more like different views of the same object than independent
things.
Atomicity
Atomicity means all or nothing. Transactions often contain multiple separate actions. For
example, a transaction may insert data into one table, delete from another table, and update a third
table. Atomicity ensures that either all of these actions occur or none at all.
Consistency
Consistency means that transactions always take the database from one consistent state to
another. So, if a transaction violates the database's consistency rules, then the entire transaction
will be rolled back.
Isolation
Isolation means that concurrent transactions, and the changes made within them, are not visible to
each other until they complete. This avoids many problems, including those that could lead to
violation of other properties. The implementation of isolation is quite different in different
DBMSs. This is also the property most often related to locking problems.
Durability
Durability means that committed transactions will not be lost, even in the event of abnormal
termination. That is, once a user or program has been notified that a transaction was committed,
they can be certain that the data will not be lost.

Timestamp Ordering Protocol

A timestamp is a tag that can be attached to any transaction or any data item, which denotes a
specific time on which the transaction or data item had been activated in any way. We, who use
computers, must all be familiar with the concepts of “Date Created” or “Last Modified” properties
of files and folders. Well, timestamps are things like that.

A timestamp can be implemented in two ways. The simplest one is to directly assign the
current value of the clock to the transaction or the data item. The other policy is to attach the
value of a logical counter that keeps incrementing as new timestamps are
required. The timestamp of a transaction denotes the time when it was first activated. The
timestamp of a data item can be of the following two types:

W-timestamp (Q): This means the latest time when the data item Q has been written into.
R-timestamp (Q): This means the latest time when the data item Q has been read from.

These two timestamps are updated each time a successful read/write operation is performed on
the data item Q.

How should timestamps be used?

The timestamp ordering protocol ensures that any pair of conflicting read/write operations will be
executed in their respective timestamp order. This is an alternative solution to using locks.

For Read operations:

1. If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data item
Q which has already been overwritten by some other transaction. Hence the value which T
wanted to read from Q does not exist there anymore, and T would be rolled back.
2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item
Q which has been written and committed by some other transaction earlier. Hence T will
be allowed to read the value of Q, and the R-timestamp of Q should be updated to TS (T).

For Write operations:

1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for
transaction T to write its value, and the delay has become so great that it has allowed
another transaction to read the old value of data item Q. In such a case T has lost its
relevance and will be rolled back.
2. Else if TS (T) < W-timestamp (Q), then transaction T has been delayed so much that the
system has allowed another transaction to write into the data item Q. In such a case too, T
has lost its relevance and will be rolled back.
3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS (T).
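The read and write rules above can be sketched directly in Python. Here a False return stands for rolling the transaction back, and the item dictionary holds the R- and W-timestamps (the names are invented for the example):

```python
# Sketch of the timestamp-ordering protocol checks for one data item Q.
# ts is TS(T), the timestamp of the requesting transaction.

def ts_read(item, ts):
    if ts < item["W"]:             # value already overwritten by a younger txn
        return False               # roll T back
    item["R"] = max(item["R"], ts)
    return True

def ts_write(item, ts):
    if ts < item["R"] or ts < item["W"]:  # a younger txn already read/wrote Q
        return False                      # roll T back
    item["W"] = ts
    return True

q = {"R": 0, "W": 0}
print(ts_read(q, 5))    # True, and R-timestamp(Q) becomes 5
print(ts_write(q, 3))   # False: a younger transaction (ts 5) already read Q
print(ts_write(q, 7))   # True, and W-timestamp(Q) becomes 7
```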

Causes of DBMS Failure


There are many causes of DBMS failure. When a DBMS fails, it falls into an incorrect state and
will likely contain erroneous data. Typical causes of DBMS failures include errors in the
application program, an error by the terminal user, an operator error, loss of data validity and
consistency, a hardware error, media failures, an error introduced by the environment, and errors
caused by mischief or catastrophe.
Typically, the three major types of failure that result from a major hardware or software
malfunction are transaction, system, and media. These failures may be caused by a natural
disaster, computer crime, or user, designer, developer, or operator error. Each type of failure is
described in the following paragraphs.
Transaction Failure.
Transaction failures occur when the transaction is not processed and the processing steps are
rolled back to a specific point in the processing cycle. In a distributed data base environment, a
single logical data base may be spread across several physical data bases.

Transaction failure can occur when some, but not all, physical data bases are updated at
the same time.
System Failure.
System failure can be caused by bugs in the data base, operating system, or hardware. In each
case, transaction processing is terminated without control of the application. Data in the
memory is lost; however, disk storage remains stable. The system must recover in the amount of
time it takes to complete all interrupted transactions. At one transaction per second, the system
should recover in a few seconds. System failures may occur as often as several times a week.
Media Failure.
Disk crashes or controller failures can occur because of disk-write bugs in the operating system
release, hardware errors in the channel or controller, head crashes, or media degradation. These
failures are rare but costly.
By identifying the type of DBMS failure, an organization can define the state of activity to return
to after recovery. To design the data base recovery procedures, the potential failures must be
identified and the reliability of the hardware and software must be determined. The following is a
summary of four such recovery actions:
· TRANSACTION UNDO. A transaction that aborts itself or must be aborted by the system
during routine execution is rolled back.
· GLOBAL UNDO. When recovering from a system failure, the effects of all incomplete
transactions must be rolled back.
· PARTIAL REDO. While a system is recovering from failure, the results of completed
transactions may not yet be reflected in the data base because execution has been terminated in an
uncontrolled manner. Therefore, they must be repeated, if necessary, by the recovery component.
· GLOBAL REDO. If the data base is totally destroyed, a copy of the entire data base must be
reloaded from a backup source. A supplemental copy of the transactions is necessary to roll the
state of the data base forward to the present.
Techniques for Reviewing DBMS Recovery
The review of a DBMS recovery must ensure that employees with specific responsibilities
perform their functions in accordance with operational policy and procedure. There are several
useful DBMS recovery review techniques.
There are two ways to make the system operate again. First, all transactions that have occurred
since the last backup can be reapplied, which would bring the data base up to date.
Second, the current contents of the data base can be taken and all transactions can be backed out
until the integrity and validity of the data are restored. Whichever method is selected, it should be
documented and a checklist of specific tasks and responsibilities identified.
The DBMS typically provides exhaustive review trails so that the system can know its exact state
at any time. These review trails should be complete enough to reconstruct transactions and aid in
recovery procedures. A data base administrator should know how to use these review trails in
recovery to fully understand the inner workings of the DBMS.
A data base that has been backed up regularly helps the system recover from a failure and begin
operating again as soon as possible. Daily backups are sufficient in most organizations. Those
organizations that must always have current data must sometimes perform hourly backups. Each
backup should be well documented to provide further insight into the review process.
Review techniques should examine application design, security procedures, and personnel control
to ensure that managers can meet emergencies and have effective contingencies in place. These
three areas are extremely critical review points for the auditor, management, users, and IS
personnel.

Application Design
It is important to build sound recovery procedures and processes into an application during the
design phase. The design of an application should take into consideration the data base control
issues that affect backup and recovery processes. Possible weaknesses in controls include:
· Inaccurate or incomplete data in the data base.
· An inadequate audit trail.
· An inadequate service level.
· Failure of the DBMS to function as specified.
· Inadequate documentation.
· Lack of processing continuity.
· Lack of management support.
· Fraud or embezzlement.
The data base administrator should be responsible for examining the backup and recovery controls
being considered by the user and developer when reviewing application design. The user and the
developer of the application must assess the risks of not having appropriate controls in place to aid
in recovery. Some key controls that should be adopted are:
· Review trails. A method of chronologically recording system activities that allows the
reconstruction, review, and examination of each event in a transaction from inception to
the final results.
· Recovery procedures. Automated or manual tools and techniques for recovering the integrity of
a data base.
· Application system failure procedures. Procedures for users to follow in the event that their
applications cannot operate.
· Checkpoint data bases. Copies of the data base and transaction files that are made at a specific
point in time for recovery purposes.
At a minimum, these controls should be tested during the module and integration testing phases of
development. In terms of a new system review before implementation, these controls are most
effective if thoroughly validated and approved by the user and developer before the system is
placed into operation. One important issue to be considered in application design is data integrity.
Maintaining Data Integrity.
Data integrity concerns the accuracy of the contents of the data base. The integrity of the data can
be compromised because of failures (i.e., events at which the system fails to provide normal
operation or correct data). Failures are caused primarily by errors, which may originate in
programs, interactions between these programs, or the system. A transaction is a sequence of
actions. It should be designed and executed so that it either is successfully completed or has no
effect on the data base. A transaction can fail to be completed for the following reasons:
· An action violates a security or integrity constraint.
· The user cancels the transaction.
· An unrecoverable I/O error occurs.

· The system backs out the transaction to resolve a deadlock.
· The application program fails.
· The system crashes.
Semantic Integrity.
This refers to the accuracy of the data base despite the fact that users or applications programs try
to modify it incorrectly. Assuming that the data base security system prevents unauthorized
access, and hence malicious attempts to corrupt data, most potential errors will be caused by
incorrect input, incorrect programs, or lack of user understanding.
Traditionally, most integrity checking has been performed by the applications programs and by
periodic auditing of the data base. The following are some problems that occur when relying on
application programs for integrity checking:
· Checking is likely to be incomplete because the applications programmer may not be aware of
the semantics of the complete data base.
· Each application program relies on other programs that can modify the data base, and a problem
in one program could corrupt the whole data base.
· Code that enforces the same integrity constraints occurs in several programs. This leads to
unnecessary duplication of the programming effort and exposes the system to potential
inconsistencies.
· The criteria for integrity are buried within procedures and are therefore difficult to understand
and control.
· Maintenance operations performed by users of high-level query language cannot be controlled.
Most of these errors could be detected through auditing, although the time lag in detecting
errors by auditing can cause problems, such as difficulty in tracing the source of an error and
hence correcting it, as well as incorrect data being used in various ways, causing errors to
propagate through the data base and into the environment.
The semantics, or meaning, of a data base is partly drawn from a shared understanding among the
users, partly implied by the data structures used, and partly expressed as integrity constraints.
These constraints are explicitly stated by the individuals responsible for data control. Integrity
constraints can also be classified as:
· A single record or set.
· Static or transitional.
· General or selective.
· Immediate or deferred.
· Unconditional or conditional.
A system of concurrent transactions must be correctly synchronized—that is, the processing of
these transactions must reach the same final state and produce the same output. Three forms of
inconsistency result from concurrence: lost updates, an incorrect read, and an unrepeatable read.
Lost updates can also result from backing up or undoing a transaction.
Correcting Inconsistency Problems.
The most commonly used approach to eliminate consistency problems is locking. The DBMS
can use the locking facilities that the operating system provides so that multiple processes can
synchronize their concurrent access of shared resources. A lock can be granted to multiple
processes, but a given object cannot be locked in shared and exclusive mode at the same time.
Shared and exclusive modes conflict because they are incompatible. The operating system usually
provides lock and unlock commands for requesting and releasing locks. If a lock request
cannot be granted, the
process is suspended until the request can be granted. If transactions do not follow restrictive
locking rules, deadlock can occur. A deadlock can cause the loss of an entire file; therefore, it
is critical to have a recovery system in place to alleviate this problem.
The deadlock problem can be solved either by preventing deadlocks or by detecting them after
they occur and taking steps to resolve them. Deadlocks can be prevented by placing restrictions on
the way locks are requested. They can be detected by examining the status of locks. After they
are detected, deadlocks can be resolved by aborting a transaction and rescheduling it. Methods
for selecting the best transaction to abort have also been developed.
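Detecting deadlocks by "examining the status of locks" is commonly done with a wait-for graph: an edge T1 -> T2 means transaction T1 is waiting for a lock T2 holds, and a cycle in the graph means deadlock. A minimal sketch (illustrative only):

```python
# Sketch: deadlock detection via a depth-first cycle search
# in the wait-for graph.

def has_deadlock(wait_for):
    """wait_for: dict mapping each transaction to the transactions it waits on."""
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in on_stack:                  # back edge => cycle => deadlock
                return True
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True: T1 and T2 wait on each other
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False: T2 can finish, then T1
```

Once a cycle is found, the system picks a victim transaction in the cycle, aborts it, and reschedules it.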
A synchronization problem can occur in a distributed data base environment, such as a
client/server network. Data bases can become out of sync when data from one data base fails
to be updated on other data bases. When updates fail to occur, users at some locations may use
data that is not current with data at other locations. Distributed data bases provide different
types of updating mechanisms. In a two-phase commit update process, network nodes must be
online and receive data simultaneously before updates can occur. A newer update method called
data replication enables updates to be stored until nodes are online and ready to receive. Update
methods must ensure currency in all network data bases.
Security Procedures
A data base usually contains information that is vital to an organization's survival. A secure
data base environment, with physical and logical security controls, is essential during recovery
procedures.
Physical Security.
In some distributed environments, many physical security controls, such as the use of security
badges and cipher locks, are not feasible and the organization must rely more heavily on logical
security measures. In these cases, many organizational members may have data processing
needs that do not involve a data base but require the use of computer peripherals.
Logical Security.
Logical security prevents unauthorized users from invoking DBMS functions. The primary means
of implementing this type of security is the use of passwords to prevent access to files, records,
data elements, and DBMS utilities. Passwords should be checked to ensure that they are
designated in an intelligent, logical manner.
Security Logs.
Each time an unauthorized user attempts to access the data base, it should be recorded in a
security log. Entries in this log should consist of user ID, terminal or port number, time, date, and
type of infraction. With this information, it is possible to investigate any serious breaches of
security. From the data base administrator's standpoint, evidence that the DBMS is detecting
security violations and that a consistent procedure is used to follow them up should be sufficient.
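The record layout suggested above can be sketched as follows (field names are illustrative; a real DBMS writes such audit entries internally):

```python
# Sketch of a security-log entry: user ID, terminal, time/date,
# and type of infraction, as described above.
from datetime import datetime, timezone

def log_violation(log, user_id, terminal, infraction):
    log.append({
        "user_id": user_id,
        "terminal": terminal,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "infraction": infraction,
    })

security_log = []
log_violation(security_log, "u042", "tty7", "unauthorized table access")
print(security_log[0]["infraction"])  # unauthorized table access
```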
Personnel Control
Data base recovery involves ensuring that only authorized users are allowed access and that no
subsequent misuse of information occurs. These controls are usually reestablished when a system
becomes operational. When operations cease or problems occur, however, controls often become
inoperative.
The three primary classes of data base users are data base administrator, applications and systems
programmers, and end users--and each has a unique view of the data. The DBMS must be flexible
enough to present data appropriately to each class of user and maintain the proper controls to
inhibit abuse of the system, especially during recovery, when controls may not be fully
operational.

Data Base Administrator.
The data base administrator is responsible for ensuring that the data base retains its integrity and is
accountable if the data base becomes compromised, no matter what circumstances arise. This
individual has ultimate power over the schema that the organization has implemented. Any
modifications or additions to this schema must be approved by the data base administrator.
Permission to use subschema (i.e., logical views) is given to end users and programmers only after
their intentions are fully known and are consistent with organizational goals.
Because the data base administrator has immediate and unrestricted access to almost every piece
of valuable organizational information, an incompetent employee in this position can expose the
organization to enormous risk, especially during DBMS recovery. Therefore, an organization
should have controls in place to ensure the appointment of a qualified data base administrator.
The data base administrator must ensure that appropriate procedures are followed during DBMS
recovery. The data base administrator should also validate and verify the system once it has been
recovered before allowing user access so that if controls are not functioning or access problems
continue, users will not be affected.

Applications and Systems Programmers.


After recovery, programmers must access the data base to manipulate and report on data
according to some predetermined specification or to assess whether data loss has occurred. Each
application should have a unique subschema with which to work. After recovery, the data base
administrator validates the subschema's organization to ensure that it is operating properly and
allowing the application to receive only the data necessary to perform its tasks. Systems
programmers must be controlled in a slightly different manner than applications programmers.
They must have the freedom to perform their tasks but be constrained from altering production
programs or system utility programs in a fraudulent
manner.
End Users.
End users are defined as all organizational members not included in the previous categories
who need to interact with the data base through DBMS utilities or application programs. Data
elements of the data base generally originate from end users. Each data element should be
assigned to an end user. The end user is then responsible for defining the element's access
and security rules. Every other user who wishes to use this data element must confer with the
responsible end user. If access is granted, the data base administrator must implement any
restrictions placed on the request through the DBMS.
Assigning ownership of specific data elements to end users discourages the corruption of data
elements, thereby enhancing data base integrity. Reviewers should ensure that this process exists
and is appropriately reinstituted after the recovery process has been completed and operational
approval has been provided by the data base administrator.
After recovery, the data base administrator should ensure that all forms of security practices and
procedures are reinstated. These processes are a part of data base security.

Object Oriented Database (OODB)

Object Oriented Database (OODB) provides all the facilities associated with object oriented
paradigm. It enables us to create classes, organize objects, structure an inheritance hierarchy and
call methods of other classes. Besides these, it also provides the facilities associated with
standard database systems. However, object oriented database systems have not yet replaced the
RDBMS in commercial business applications. Following are the two different approaches for
designing an object-oriented database:

• Designed to store, retrieve and manage objects created by programs written in some object-
oriented language (OOL) such as C++ or Java.

Although a relational database can be used to store and manage objects, it does not understand
objects as such. Therefore, a middle layer called an object manager or object-oriented layer
software is required to translate objects into tuples of a relation.

• Designed to provide object-oriented facilities to users of non-object-oriented programming
languages such as C or Pascal.

The user will create classes, objects, inheritance and so on and the database system will store and
manage these objects and classes. This second approach, thus, turns non-OOPLs into OOPLs. A
translation layer is required to map the objects created by user into objects of the database
system.

Advantages of OODBMS

Enriched modeling capabilities

The object-oriented data model allows the 'real world' to be modeled more closely. The object,
which encapsulates both state and behavior, is a more natural and realistic representation of real-
world objects. An object can store all the relationships it has with other objects, including many-
to- many relationships, and objects can be formed into complex objects that the traditional data
models cannot cope with easily.

Extensibility

OODBMSs allow new data types to be built from existing types. The ability to factor out
common properties of several classes and form them into a superclass that can be shared with
subclasses can greatly reduce redundancy within a system and, as we stated at the start of this
chapter, is regarded as one of the main advantages of object orientation. Further, the reusability of
classes promotes faster development and easier maintenance of the database and its applications.

Capable of handling a large variety of data types

Unlike traditional databases (such as hierarchical, network or relational), object-oriented
databases are capable of storing different types of data, for example pictures, voice and video,
in addition to text, numbers and so on.

Removal of impedance mismatch

A single language interface between the Data Manipulation Language (DML) and the
programming language overcomes the impedance mismatch. This eliminates many of the
inefficiencies that occur in mapping a declarative language such as SQL to an imperative language
such as C. Most OODBMSs provide a DML that is computationally complete compared with
SQL, the standard language of RDBMSs.

More expressive query language

Navigational access from the object is the most common form of data access in an OODBMS.
This is in contrast to the associative access of SQL (that is, declarative statements with selection
based on one or more predicates). Navigational access is more suitable for handling parts
explosion, recursive queries, and so on.

Support for schema evolution

The tight coupling between data and applications in an OODBMS makes schema evolution more
feasible.

Support for long-duration, transactions

Current relational DBMSs enforce serializability on concurrent transactions to maintain database
consistency. OODBMSs use a different protocol to handle the types of long-duration transaction
that are common in many advanced database application.

Applicability to advanced database applications

There are many areas where traditional DBMSs have not been particularly successful, such as
Computer-Aided Design (CAD), Computer-Aided Software Engineering (CASE), Office
Information Systems (OIS), and Multimedia Systems. The enriched modeling capabilities of
OODBMSs have made them suitable for these applications.

Improved performance

There have been a number of benchmarks suggesting that OODBMSs provide significant
performance improvements over relational DBMSs. One set of results showed an average 30-fold
performance improvement for the OODBMS over the RDBMS.

Disadvantages of OODBMSs

There are following disadvantages of OODBMSs:

Lack of universal data model: There is no universally agreed data model for an OODBMS, and
most models lack a theoretical foundation. This disadvantage is seen as a significant drawback,
and is comparable to pre-relational systems.

Lack of experience: In comparison to RDBMSs, the use of OODBMSs is still relatively limited.
This means that we do not yet have the level of experience that we have with traditional systems.
OODBMSs are still very much geared towards the programmer, rather than the naïve end-user.
There is also resistance to the acceptance of the technology. While the OODBMS is limited to a
small niche market, this problem will continue to exist.

Lack of standards: There is a general lack of standards for OODBMSs. We have already
mentioned that there is no universally agreed data model. Similarly, there is no standard object-
oriented query language.

Competition: Perhaps one of the most significant issues facing OODBMS vendors is the
competition posed by the RDBMS and the emerging ORDBMS products. These products have an
established user base with significant experience available, SQL is an approved standard, the
relational data model has a solid theoretical foundation, and relational products have many
supporting tools to help both end-users and developers.

Query optimization compromises encapsulation: Query optimization requires an
understanding of the underlying implementation to access the database efficiently. However, this
compromises the concept of encapsulation.

Locking at object level may impact performance: Many OODBMSs use locking as the basis for
their concurrency control protocol. However, if locking is applied at the object level, locking of an
inheritance hierarchy may be problematic, as well as impacting performance.

Complexity: The increased functionality provided by the OODBMS (such as the illusion of a
single-level storage model, pointer swizzling, long-duration transactions, version management, and
schema evolution) makes the system more complex than that of traditional DBMSs. This
complexity leads to products that are more expensive and more difficult to use.

Lack of support for views: Currently, most OODBMSs do not provide a view mechanism,
which, as we have seen previously, provides many advantages such as data independence,
security, reduced complexity, and customization.

Lack of support for security: Currently, OODBMSs do not provide adequate security
mechanisms. The user cannot grant access rights on individual objects or classes.
If OODBMSs are to expand fully into the business field, these deficiencies must be rectified.

OBJECT RELATIONAL DBMS

Relational DBMSs are currently the dominant database technology, while the OODBMS has
become the favored system for financial and telecommunications applications. Although the
OODBMS market is still small, the OODBMS continues to find new application areas, such as
the World Wide Web. Some industry analysts expect the market for OODBMSs to grow at
over 50% per year, a rate faster than the total database market. However, their sales are unlikely
to overtake those of relational systems, because of the wealth of businesses that find RDBMSs
acceptable, and because businesses have invested so much money and resources in their
development that change is prohibitive. This has encouraged vendors to extend the relational
model with object-oriented features, and this is the approach that has been taken by many
extended relational DBMSs, although each has implemented a different combination of features.
Thus there is no single extended relational model; rather, there are a variety of these models,
whose characteristics depend upon the way and the degree to which extensions were made.
However, all the models share the same basic relational tables and query language, all
incorporate some concept of 'object', and some have the ability to store methods (or procedures or
triggers) as well as data in the database.

In a four-quadrant view of the database world, as illustrated in the figure, in the lower-left quadrant
are those applications that process simple data and have no requirements for querying the data.
These types of application, for example standard text-processing packages such as Word,
WordPerfect, and FrameMaker, can use the underlying operating system to obtain the essential
DBMS functionality of persistence. In the lower-right quadrant are those applications that process
complex data but again have no significant requirements for querying the data. For these types of
application, for example computer-aided design packages, an OODBMS may be an appropriate
choice of DBMS.

In the top-left quadrant are those applications that process simple data and also have
requirements for complex querying. Many traditional business applications fall into this quadrant
and an RDBMS may be the most appropriate DBMS.

Finally, in the top-right quadrant are those applications that process complex data and have
complex querying requirements. This represents many of the advanced database applications, and
for these applications an ORDBMS may be the appropriate choice of DBMS.

Advantages and Disadvantages of ORDBMSs

ORDBMSs can provide appropriate solutions for many types of advanced database applications.
However, there are also disadvantages.

Advantages of ORDBMSs

There are following advantages of ORDBMSs:

Reuse and Sharing: The main advantages of extending the Relational data model come from
reuse and sharing. Reuse comes from the ability to extend the DBMS server to perform standard
functionality centrally, rather than have it coded in each application.

Increased Productivity: ORDBMSs provide increased productivity both for the developer and
for the end user.

Use of experience in developing RDBMSs: Another obvious advantage is that the extended
relational approach preserves the significant body of knowledge and experience that has gone into
developing relational applications. This is a significant advantage, as many organizations would
find it prohibitively expensive to change. If the new functionality is designed appropriately, this
approach should allow organizations to take advantage of the new extensions in an evolutionary
way without losing the benefits of current database features and functions.

Disadvantages of ORDBMSs

The ORDBMS approach has the obvious disadvantages of complexity and associated increased
costs. Further, proponents of the relational approach believe the essential simplicity and purity
of the relational model are lost with these types of extension.

ORDBMS vendors are attempting to portray object models as extensions to the relational model
with some additional complexities. This potentially misses the point of object orientation,
highlighting the large semantic gap between these two technologies. Object applications are
simply not as data-centric as relational-based ones.

PARALLEL DATABASE


 A parallel database system seeks to improve performance through parallelization of


various operations, such as loading data, building indexes and evaluating queries.
 Although data may be stored in a distributed fashion, the distribution is governed solely
by performance considerations. Parallel databases improve processing and input/output
speeds by using multiple CPUs and disks in parallel.
 Centralized and client–server database systems are not powerful enough to handle
such applications.
 In parallel processing, many operations are performed simultaneously, as opposed to
serial processing, in which the computational steps are performed sequentially.
 Parallel databases can be roughly divided into two groups; the first group of
architectures is the multiprocessor architecture, the alternatives of which are the
following:

 Shared memory architecture, where multiple processors share the main memory space.
 Shared disk architecture, where each node has its own main memory, but all nodes
share mass storage, usually a storage area network. In practice, each node usually also
has multiple processors.
 Shared nothing architecture, where each node has its own mass storage as well as
main memory.

Distributed Database Architecture


A distributed database system allows applications to access data from local and remote
databases. In a homogenous distributed database system, each database is an Oracle
Database. In a heterogeneous distributed database system, at least one of the databases is not
an Oracle Database. Distributed databases use client/server architecture to process information
requests.
It contains the following database systems:
 Homogenous Distributed Database Systems
 Heterogeneous Distributed Database Systems
 Client/Server Database Architecture
Homogenous Distributed Database Systems
A homogenous distributed database system is a network of two or more Oracle Databases that
reside on one or more machines. The figure below illustrates a distributed system that connects
three databases: hq, mfg, and sales. An application can simultaneously access or modify the data
in several databases in a single distributed environment. For example, a single query from a
Manufacturing client on the local database mfg can retrieve joined data from the products table on
the local database and the dept table on the remote hq database.

For a client application, the location and platform of the databases are transparent. You can also
create synonyms for remote objects in the distributed system so that users can access them with
the same syntax as local objects. For example, if you are connected to database mfg but want to
access data on database hq, creating a synonym on mfg for the remote dept table enables you to
issue this query:
SELECT * FROM dept;

Homogeneous Distributed Database

An Oracle Database distributed database system can incorporate Oracle Databases of different
versions. All supported releases of Oracle Database can participate in a distributed database
system. Nevertheless, the applications that work with the distributed database must understand the
functionality that is available at each node in the system. A distributed database application cannot
expect an Oracle7 database to understand the SQL extensions that are only available with Oracle
Database.
Heterogeneous Distributed Database Systems
In a heterogeneous distributed database system, at least one of the databases is a non-Oracle
Database system. To the application, the heterogeneous distributed database system appears as a

single, local, Oracle Database. The local Oracle Database server hides the distribution and
heterogeneity of the data.

The Oracle Database server accesses the non-Oracle Database system using Oracle Heterogeneous
Services in conjunction with an agent. If you access the non-Oracle Database data store using an
Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you
include a Sybase database in an Oracle Database distributed system, then you need to obtain a
Sybase-specific transparent gateway so that the Oracle Database in the system can
communicate with it.
Alternatively, you can use generic connectivity to access non-Oracle Database data stores so
long as the non-Oracle Database system supports the ODBC or OLE DB protocols.
Heterogeneous Services
Heterogeneous Services (HS) is an integrated component within the Oracle Database server and
the enabling technology for the current suite of Oracle Transparent Gateway products. HS
provides the common architecture and administration mechanisms for Oracle Database gateway
products and other heterogeneous access facilities. Also, it provides upwardly compatible
functionality for users of most of the earlier Oracle Transparent Gateway releases.
Transparent Gateway Agents
For each non-Oracle Database system that you access, Heterogeneous Services can use a
transparent gateway agent to interface with the specified non-Oracle Database system. The agent
is specific to the non-Oracle Database system, so each type of system requires a different agent.
The transparent gateway agent facilitates communication between Oracle Database and non-
Oracle Database systems and uses the Heterogeneous Services component in the Oracle
Database server. The agent executes SQL and transactional requests at the non-Oracle Database
system on behalf of the Oracle Database server.
Client/Server Database Architecture
A database server is the Oracle software managing a database, and a client is an application that
requests information from a server. Each computer in a network is a node that can host one or
more databases. Each node in a distributed database system can act as a client, a server, or both,
depending on the situation.
An Oracle Database Distributed Database System

A client can connect directly or indirectly to a database server. A direct connection occurs when
a client connects to a server and accesses information from a database contained on that server.
For example, if you connect to the hq database and access the dept table on this database, as in
the figure, you can issue the following:

SELECT * FROM dept;

This query is direct because you are not accessing an object on a remote database.
In contrast, an indirect connection occurs when a client connects to a server and then accesses
information contained in a database on a different server. For example, if you connect
to the hq database but access the emp table on the remote sales database, as in the figure above,
you can issue the following:
SELECT * FROM emp@sales;

PARALLEL VS. DISTRIBUTED DATABASE :

A parallel database system seeks to improve performance through parallelization of various
operations, such as data loading, index building and query evaluation. Although data may be
stored in a distributed fashion in such a system, the distribution is governed solely by
performance considerations.

In a distributed database system, data is physically stored across several sites, and each site is

typically managed by a DBMS capable of running independent of the other sites. In contrast to
parallel databases, the distribution of data is governed by factors such as local ownership and
increased availability.

PDB & DDB Comparison:

1. System Components
- Distributed DBMS consists of many Geo-distributed, low-bandwidth link
connected, autonomic sites.
- Parallel DBMS consists of tightly coupled, high-bandwidth link connected, non-autonomic
nodes.

2. Component Role
- Sites in Distributed DBMS can work independently to handle local transactions or work
together to handle global transactions.
- Nodes in Parallel DBMS can only work together to handle global transactions.

3. Design Purposes
= Distributed DBMS is for:
- Sharing Data
- Local Autonomy
- High Availability
= Parallel DBMS is for:
- High Performance
- High Availability

But both PDB&DDB need to consider the following problems:

1. Data Distribution (Placement & Replication);

2. Query Parallelization (Distributed Evaluation). Also, since many parallel systems now consist
of networks of workstations, the difference between parallel and distributed databases is
becoming smaller.

DATA WAREHOUSING

A data warehouse is a collection of data marts representing historical data from different
operations in the company. This data is stored in a structure optimized for querying and data
analysis. Table design, dimensions and organization should be consistent
throughout a data warehouse so that reports or queries across the data warehouse are consistent.
A data warehouse can also be viewed as a database for historical data from different functions
within a company. The term Data Warehouse was coined by Bill Inmon in 1990, who defined it
in the following way: "A warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision making process". He
defined the terms in the sentence as follows:
Subject-Oriented: Data that gives information about a particular subject instead of about a
company's ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of sources and
merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular time period.
Non-volatile: Data is stable in a data warehouse. More data is added but data is never
removed. This enables management to gain a consistent picture of the business. It is a single,
complete and consistent store of data obtained from a variety of different sources made
available to end users in what they can understand and use in a business context. It can be
• Used for decision Support
• Used to manage and control business
• Used by managers and end-users to understand the business and make judgments

Benefits of data warehousing

• Data warehouses are designed to perform well with aggregate queries running on large
amounts of data.
• The structure of data warehouses is easier for end users to navigate, understand and query
against unlike the relational databases primarily designed to handle lots of transactions.
• Data warehouses enable queries that cut across different segments of a company's operation.
E.g. production data could be compared against inventory data even if they were originally
stored in different databases with different structures.
• Queries that would be complex in very normalized databases could be easier to build and
maintain in data warehouses, decreasing the workload on transaction systems.
• Data warehousing is an efficient way to manage and report on data that is from a variety
of sources, non-uniform and scattered throughout a company.
• Data warehousing is an efficient way to manage demand for lots of information from lots of users.
• Data warehousing provides the capability to analyze large amounts of historical data for
nuggets of wisdom that can provide an organization with competitive advantage.

Data Warehouse Characteristics

• A data warehouse can be viewed as an information system with the following attributes:

– It is a database designed for analytical tasks

– Its content is periodically updated


– It contains current and historical data to provide a historical perspective of information

Data warehouse admin and management

The management of data warehouse includes,

• Security and priority management

• Monitoring updates from multiple sources

• Data quality checks

• Managing and updating meta data

• Auditing and reporting data warehouse usage and status

• Purging data

• Replicating, sub setting and distributing data

• Backup and recovery

• Data warehouse storage management which includes capacity planning, hierarchical


storage management and purging of aged data, etc.

DESIGN OF DATA WAREHOUSE

The following nine-step method is followed in the design of a data warehouse:

1. Choosing the subject matter

2. Deciding what a fact table represents

3. Identifying and conforming the dimensions

4. Choosing the facts

5. Storing pre calculations in the fact table

6. Rounding out the dimension table

7. Choosing the duration of the db

8. The need to track slowly changing dimensions

9. Deciding the query priorities and query models

Technical considerations

A number of technical issues are to be considered when designing a data warehouse


environment. These issues include:

• The hardware platform that would house the data warehouse

• The dbms that supports the warehouse data

• The communication infrastructure that connects data marts, operational systems and end users

• The hardware and software to support meta data repository

• The systems management framework that enables admin of the entire environment

Implementation considerations

The following logical steps needed to implement a data warehouse:

• Collect and analyze business requirements

• Create a data model and a physical design

• Define data sources

• Choose the db tech and platform

• Extract the data from operational db, transform it, clean it up and load it into the warehouse

• Choose db access and reporting tools

• Choose db connectivity software

• Choose data analysis and presentation s/w

• Update the data warehouse

Access tools

Data warehouse implementation relies on selecting suitable data access tools. The best way to
choose these is based on the type of data that can be selected using the tool and the kind of access
it permits for a particular user. The following lists the various types of data that can be accessed:

• Simple tabular form data

• Ranking data

• Multivariable data

• Time series data

• Graphing, charting and pivoting data

• Complex textual search data

• Statistical analysis data

• Data for testing of hypothesis, trends and patterns

• Predefined repeatable queries

• Ad hoc user specified queries

• Reporting and analysis data

• Complex queries with multiple joins, multi level sub queries and sophisticated search criteria

Data extraction, clean up, transformation and migration

Proper attention must be paid to data extraction, which represents a success factor for a data
warehouse architecture. When implementing a data warehouse, the following selection
criteria, which affect the ability to transform, consolidate, integrate and repair the data, should be
considered:

• Timeliness of data delivery to the warehouse

• The tool must have the ability to identify the particular data that can be read by the conversion tool

• The tool must support flat files, indexed files since corporate data is still in this type

• The tool must have the capability to merge data from multiple data stores

• The tool should have a specification interface to indicate the data to be extracted

• The tool should have the ability to read data from data dictionary

• The code generated by the tool should be completely maintainable

• The tool should permit the user to extract the required data

• The tool must have the facility to perform data type and character set translation

• The tool must have the capability to create summarization, aggregation and derivation of records

• The data warehouse database system must be able to load data directly from these tools

Data placement strategies

– As a data warehouse grows, there are at least two options for data placement. One is to put
some of the data in the data warehouse onto another storage medium.

– The second option is to distribute the data in the data warehouse across multiple servers.

User levels

The users of data warehouse data can be classified on the basis of their skill level in accessing
the warehouse. There are three classes of users:

Casual users: are most comfortable retrieving information from the warehouse in predefined
formats and running pre-existing queries and reports. These users do not need tools that allow
for building standard and ad hoc reports.

Power Users: can use pre defined as well as user defined queries to create simple and ad hoc
reports. These users can engage in drill down operations. These users may have the experience
of using reporting and query tools.

Expert users: These users tend to create their own complex queries and perform standard
analysis on the info they retrieve. These users have the knowledge about the use of query and
report tools

Benefits of data warehousing

Data warehouse usage includes,


– Locating the right info

– Presentation of info

– Testing of hypothesis

– Discovery of info

– Sharing the analysis

The benefits can be classified into two:

• Tangible benefits (quantified / measureable):It includes,

– Improvement in product inventory

– Decrement in production cost

– Improvement in selection of target markets

– Enhancement in asset and liability management

• Intangible benefits (not easy to quantified): It includes,

– Improvement in productivity by keeping all data in a single location and eliminating
rekeying of data

– Reduced redundant processing

– Enhanced customer relations

ARCHITECTURE OF DATA WAREHOUSING

The data in a data warehouse comes from operational systems of the organization as well as
from other external sources. These are collectively referred to as source systems. The data
extracted from source systems is stored in an area called the data staging area, where the data is
cleaned, transformed, combined and de-duplicated to prepare it for use in the data warehouse. The
data staging area is generally a collection of machines where simple activities like sorting and
sequential processing takes place. The data staging area does not provide any query or
presentation services. As soon as a system provides query or presentation services, it is
categorized as a presentation server. A presentation server is the target machine on which the
data is loaded from the data staging area organized and stored for direct querying by end users,
report writers and other applications. The three different kinds of systems that are required for a
data warehouse are:

1. Source Systems
2. Data Staging Area
3. Presentation servers

The data travels from source systems to presentation servers via the data staging area. The entire
process is popularly known as ETL (extract, transform, and load) or ETT (extract, transform,
and transfer). Oracle’s ETL tool is called Oracle Warehouse Builder (OWB) and MS SQL
Server’s ETL tool is called Data Transformation Services (DTS).
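
The source-to-presentation flow just described can be sketched, in a deliberately simplified form, as three small functions. The table and field names below are invented for illustration and do not correspond to any particular ETL tool:

```python
# Minimal sketch of the ETL flow: extract from a source system,
# transform in the staging area, load into the presentation server.
# All table and field names are hypothetical.

def extract(source_rows):
    """Pull only the decision-relevant fields from the source system."""
    return [{"name": r["emp_name"], "dept": r["dept_code"]} for r in source_rows]

def transform(rows):
    """Standardize formats in the staging area."""
    return [{"name": r["name"].upper(), "dept": r["dept"].strip()} for r in rows]

def load(warehouse, rows):
    """Append cleansed rows to the presentation server's table."""
    warehouse.setdefault("employee_dim", []).extend(rows)

source = [{"emp_name": "smith", "dept_code": " 10 "}]
warehouse = {}
load(warehouse, transform(extract(source)))
print(warehouse)
```

Real tools such as OWB or DTS perform the same three steps at scale, with scheduling, logging and error handling around them.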

A typical architecture of a data warehouse is shown below:

Each component and the tasks performed by them are explained below:

1. OPERATIONAL DATA
The data for the data warehouse is supplied from:

(i) The data from the mainframe systems in the traditional network and hierarchical
format.
(ii) Data can also come from the relational DBMS like Oracle, Informix.
(iii) In addition to these internal data, operational data also includes external data
obtained from commercial databases and databases associated with supplier and
customers.

2. LOAD MANAGER
The load manager performs all the operations associated with extraction and loading data into
the data warehouse. These operations include simple transformations of the data to prepare the
data for entry into the warehouse. The size and complexity of this component will vary between
data warehouses and may be constructed using a combination of vendor data loading tools
and custom built programs.

3. WAREHOUSE MANAGER
The warehouse manager performs all the operations associated with the management of data in
the warehouse. This component is built using vendor data management tools and custom built
programs. The operations performed by warehouse manager include:
(i) Analysis of data to ensure consistency
(ii) Transformation and merging the source data from temporary storage into data
warehouse tables
(iii) Create indexes and views on the base table.
(iv) Denormalization
(v) Generation of aggregation
(vi) Backing up and archiving of data
In certain situations, the warehouse manager also generates query profiles to determine which
indexes and aggregations are appropriate.

4. QUERY MANAGER
The query manager performs all operations associated with management of user queries. This
component is usually constructed using vendor end-user access tools, data warehousing
monitoring tools, database facilities and custom built programs. The complexity of a query
manager is determined by facilities provided by the end-user access tools and database.

5. DETAILED DATA
This area of the warehouse stores all the detailed data in the database schema. In most cases
detailed data is not stored online but is aggregated to the next level of detail. However, the
detailed data is added regularly to the warehouse to supplement the aggregated data.

6. LIGHTLY AND HIGHLY SUMMARIZED DATA


This area of the data warehouse stores all the predefined lightly and highly summarized
(aggregated) data generated by the warehouse manager. This area of the warehouse is transient
as it will be subject to change on an ongoing basis in order to respond to the changing query
profiles. The purpose of the summarized information is to speed up the query performance. The
summarized data is updated continuously as new data is loaded into the warehouse.
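
A minimal sketch of such pre-computed summaries, using invented data, might look like this: the warehouse manager aggregates the detail rows once, and subsequent queries read the small summary table instead of scanning the detail:

```python
# Sketch of pre-computed (lightly summarized) data: aggregate detail
# rows once, then answer queries from the summary table. Data invented.
from collections import defaultdict

detail = [
    {"region": "north", "sales": 100},
    {"region": "north", "sales": 250},
    {"region": "south", "sales": 300},
]

def summarize(rows):
    """Lightly summarized data: total sales per region."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["sales"]
    return dict(totals)

summary = summarize(detail)   # refreshed as new data is loaded
print(summary)
```

A query for total northern sales now reads one summary row rather than re-aggregating every detail row, which is exactly the speed-up the summarized area provides.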

7. ARCHIVE AND BACK UP DATA


This area of the warehouse stores detailed and summarized data for the purpose of archiving
and back up. The data is transferred to storage archives such as magnetic tapes or optical disks.

8. META DATA
The data warehouse also stores all the Meta data (data about data) definitions used by all
processes in the warehouse. It is used for a variety of purposes including:
(i) The extraction and loading process – Meta data is used to map data sources
to a common view of information within the warehouse.
(ii) The warehouse management process – Meta data is used to automate the
production of summary tables.
(iii) As part of Query Management process Meta data is used to direct a query to the
most appropriate data source.

The structure of the Meta data will differ for each process, because the purpose is different. More
about Meta data will be discussed in later Lecture Notes.

9. END-USER ACCESS TOOLS


The principal purpose of data warehouse is to provide information to the business managers for
strategic decision-making. These users interact with the warehouse using end user access tools.
The examples of some of the end user access tools can be:
(i) Reporting and Query Tools
(ii) Application Development Tools
(iii) Executive Information Systems Tools
(iv) Online Analytical Processing Tools
(v) Data Mining Tools

THE E T L (EXTRACT TRANSFORMATION LOAD) PROCESS


In this section we discuss the four major processes of the data warehouse. They are
extract (data from the operational systems and bring it to the data warehouse), transform (the
data into the internal format and structure of the data warehouse), cleanse (to make sure it is of
sufficient quality to be used for decision making) and load (the cleansed data is put into the data
warehouse).
The four processes from extraction through loading are often referred to collectively as Data Staging.

EXTRACT
Some of the data elements in the operational database can reasonably be expected to be useful
in decision making, but others are of less value for that purpose. For this reason, it is necessary
to extract the relevant data from the operational database before bringing it into the data
warehouse. Many commercial tools are available to help with the extraction process; Data
Junction is one such commercial product. The user of one of these tools typically has an
easy-to-use windowed interface by which to specify the following:

(i) Which files and tables are to be accessed in the source database?
(ii) Which fields are to be extracted from them? This is often done internally by a
SQL SELECT statement.
(iii) What are those to be called in the resulting database?
(iv) What is the target machine and database format of the output?
(v) On what schedule should the extraction process be repeated?
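
As a rough sketch of steps (i)–(iii), the field selection can be driven by an SQL SELECT
statement. The table and field names below are invented for illustration, and Python's built-in
sqlite3 module stands in for the operational source database:

```python
import sqlite3

# In-memory stand-in for an operational source database (hypothetical schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, "
            "amount REAL, internal_note TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                [(1, "Asha", 100.0, "picked"), (2, "Ben", 250.0, "packed")])

def extract(conn, table, fields):
    """Pull only the decision-relevant fields via an SQL SELECT."""
    rows = conn.execute(f"SELECT {', '.join(fields)} FROM {table}")
    return [dict(zip(fields, row)) for row in rows.fetchall()]

# internal_note has no decision-making value, so it is left behind.
extracted = extract(src, "orders", ["order_id", "customer", "amount"])
print(extracted[0])  # → {'order_id': 1, 'customer': 'Asha', 'amount': 100.0}
```

A real extraction tool would additionally let the user choose the target format and a repeat
schedule; those concerns are outside this sketch.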

TRANSFORM

The operational databases can be based on any set of priorities, which keep changing
with the requirements. Therefore those who develop a data warehouse based on these databases are
typically faced with inconsistency among their data sources. The transformation process deals with
rectifying any such inconsistency.
One of the most common transformation issues is attribute naming inconsistency. It is
common for a given data element to be referred to by different names in different
databases: Employee Name may be EMP_NAME in one database and ENAME in another. Thus one
set of data names is picked and used consistently in the data warehouse. Once all the data
elements have the right names, they must be converted to common formats. The conversion may
encompass the following:

(i) Characters must be converted from ASCII to EBCDIC or vice versa.
(ii) Mixed text may be converted to all uppercase for consistency.
(iii) Numerical data must be converted into a common format.
(iv) Data formats have to be standardized.
(v) Measurements may have to be converted (e.g., Rs to $).
(vi) Coded data (Male/Female vs. M/F) must be converted into a common format.
All these transformation activities can be automated, and many commercial products are available
to perform the tasks; DataMAPPER from Applied Database Technologies is one such comprehensive
tool.
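
The renaming and recoding steps can be sketched as follows. The name map and code map below are
invented for illustration; a real tool would load such mappings from its configuration:

```python
# One canonical name is picked for each attribute (mapping is hypothetical).
NAME_MAP = {"EMP_NAME": "employee_name", "ENAME": "employee_name",
            "SEX": "gender", "GENDER": "gender"}
# Coded data (Male/Female vs. M/F) is converted to one common coding.
CODE_MAP = {"M": "MALE", "F": "FEMALE", "Male": "MALE", "Female": "FEMALE"}

def transform(record):
    out = {}
    for src_name, value in record.items():
        name = NAME_MAP.get(src_name, src_name.lower())
        if isinstance(value, str):
            # Recode, then uppercase mixed text for consistency.
            value = CODE_MAP.get(value, value).upper()
        out[name] = value
    return out

# Two sources with different names and codings yield one warehouse format.
print(transform({"EMP_NAME": "Smith", "SEX": "M"}))
# → {'employee_name': 'SMITH', 'gender': 'MALE'}
print(transform({"ENAME": "Jones", "GENDER": "Female"}))
# → {'employee_name': 'JONES', 'gender': 'FEMALE'}
```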

CLEANSING
Information quality is the key consideration in determining the value of information. The
developer of the data warehouse is not usually in a position to change the quality of the underlying
historical data, though a data warehousing project can put a spotlight on data quality issues and
lead to improvements for the future. It is therefore usually necessary to go through the data
entered into the data warehouse and make it as error-free as possible. This process is known as
data cleansing.

Data cleansing must deal with many types of possible errors. These include missing data and
incorrect data at one source, and inconsistent or conflicting data when two or more sources are
involved. Several algorithms are used to clean the data; these will be discussed in the
coming lecture notes.
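
The missing-data and incorrect-data checks can be sketched as a simple filter. The field names
and the plausibility range below are invented for illustration:

```python
def cleanse(records):
    """Split records into clean rows and rejects needing manual review."""
    clean, rejected = [], []
    for rec in records:
        if not rec.get("customer_id"):                      # missing data
            rejected.append(rec)
        elif not 0 <= rec.get("amount", -1) <= 1_000_000:   # incorrect data
            rejected.append(rec)
        else:
            clean.append(rec)
    return clean, rejected

clean, rejected = cleanse([
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "",   "amount": 50.0},   # missing key field
    {"customer_id": "C2", "amount": -7.0},   # implausible amount
])
print(len(clean), len(rejected))  # → 1 2
```

Conflicting data between two sources needs more elaborate resolution rules (e.g., trusting the
more recent or more authoritative source), which this sketch does not attempt.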

LOADING
Loading often implies physical movement of the data from the computer(s) storing the source
database(s) to the one that will store the data warehouse database, assuming they are different. This
takes place immediately after the extraction phase. The most common channel for data movement
is a high-speed communication link. For example, Oracle Warehouse Builder is the API from Oracle that
provides the features to perform the ETL task on an Oracle data warehouse.
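
A minimal sketch of the load step, again using sqlite3 as a stand-in for the warehouse database
(the fact table schema is hypothetical); the cleansed rows are inserted in a single transaction:

```python
import sqlite3

# In-memory stand-in for the warehouse database (hypothetical fact table).
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales_fact (order_id INTEGER, amount REAL)")

def load(conn, rows):
    """Bulk-insert cleansed rows in one transaction (all-or-nothing)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO sales_fact (order_id, amount) VALUES (:order_id, :amount)",
            rows)

load(wh, [{"order_id": 1, "amount": 100.0}, {"order_id": 2, "amount": 250.0}])
print(wh.execute("SELECT COUNT(*), SUM(amount) FROM sales_fact").fetchone())
# → (2, 350.0)
```

The all-or-nothing transaction matters in practice: a half-loaded fact table would silently skew
any aggregate query run against it.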

DATA MINING

Architecture of a Data Mining System

The architecture of a typical data mining system may have the following major components:

• Database, data warehouse, or other information repository: This is one or a set of
databases, data warehouses, spreadsheets, or other kinds of information repositories. Data
cleaning and data integration techniques may be performed on the data.

• Database or data warehouse server: The database or data warehouse server is responsible
for fetching the relevant data, based on the user's data mining request.
• Knowledge base: This is the domain knowledge that is used to guide the search, or evaluate
the interestingness of resulting patterns. Such knowledge can include concept hierarchies,
used to organize attributes or attribute values into different levels of abstraction.
Knowledge such as user beliefs, which can be used to assess a pattern's interestingness
based on its unexpectedness, may also be included. Other examples of domain knowledge are
additional interestingness constraints or thresholds, and metadata (e.g., describing data
from multiple heterogeneous sources).
• Data mining engine: This is essential to the data mining system and ideally consists of a set
of functional modules for tasks such as characterization, association, classification, cluster
analysis, and evolution and deviation analysis.
• Pattern evaluation module: This component typically employs interestingness measures
and interacts with the data mining modules so as to focus the search towards interesting
patterns. It may use interestingness thresholds to filter out discovered patterns.
Alternatively, the pattern evaluation module may be integrated with the mining module,
depending on the implementation of the data mining method used. For efficient data mining,
it is highly recommended to push the evaluation of pattern interestingness as deep as possible
into the mining process so as to confine the search to only the interesting patterns.
• Graphical user interface: This module communicates between users and the data mining
system, allowing the user to interact with the system by specifying a data mining query or
task, providing information to help focus the search, and performing exploratory data mining
based on the intermediate data mining results. In addition, this component allows the user to
browse database and data warehouse schemas or data structures, evaluate mined patterns, and
visualize the patterns in different forms.
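
The interplay between the knowledge base (which supplies an interestingness threshold) and the
pattern evaluation module can be sketched as a simple filter. The pattern structure and the
threshold value below are invented for illustration:

```python
def evaluate_patterns(patterns, threshold):
    """Pattern evaluation: keep patterns meeting the interestingness threshold."""
    return [p for p in patterns if p["interestingness"] >= threshold]

mined = [{"pattern": "bread -> milk", "interestingness": 0.9},
         {"pattern": "bread -> soap", "interestingness": 0.2}]

# The threshold would come from the knowledge base in a full system.
print(evaluate_patterns(mined, threshold=0.5))
# → [{'pattern': 'bread -> milk', 'interestingness': 0.9}]
```

As the text notes, pushing this evaluation deep into the mining engine itself (rather than
filtering afterwards, as here) makes mining more efficient because uninteresting candidate
patterns are never fully generated.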

Functions of Data Mining

Data mining identifies facts or suggests conclusions based on sifting through the data to
discover either patterns or anomalies. Data mining has five main functions:

• Classification: infers the defining characteristics of a certain group (such as
customers who have been lost to competitors).

• Clustering: identifies groups of items that share a particular characteristic. (Clustering
differs from classification in that no predefined characteristic is given.)

• Association: identifies relationships between events that occur at one time (such as the
contents of a shopping basket).
• Sequencing: similar to association, except that the relationship exists over a period of
time (such as repeat visits to a supermarket or use of a financial planning product).

• Forecasting: estimates future values based on patterns within large sets of data (such
as demand forecasting).
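
The association function, for example, can be illustrated by counting how often item pairs
co-occur in shopping baskets. This is only a minimal sketch; production systems use dedicated
algorithms such as Apriori:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(baskets, min_support):
    """Association sketch: count co-occurring item pairs across baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    # Keep only pairs seen in at least min_support baskets.
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"]]
print(frequent_pairs(baskets, min_support=2))
# → {('bread', 'milk'): 2, ('eggs', 'milk'): 2}
```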

Data Mining Applications

The areas where data mining has been applied recently include:

• Science
o astronomy
o bioinformatics
o drug discovery, ...
• Business
o advertising
o customer modeling and CRM (Customer Relationship Management)
o e-commerce
o fraud detection
o health care, ...
o investments
o manufacturing
o sports/entertainment
o telecom (telephone and communications)
o targeted marketing
• Web
o search engines, bots, ...
• Government
o anti-terrorism efforts
o law enforcement
o profiling tax cheaters

One of the most important and widespread business applications of data mining is Customer
Modeling, also called Predictive Analytics. This includes tasks such as:

• predicting attrition or churn, i.e., finding which customers are likely to terminate service
• targeted marketing:
o customer acquisition - finding which prospects are likely to become customers
o cross-sell - for a given customer and product, finding which other product(s) they are
likely to buy
• credit risk - identifying the risk that a customer will not pay back the loan or credit card
• fraud detection - is this transaction fraudulent?
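
As an illustration only, churn prediction can be reduced to a toy rule-based score. The field
names, rules, and point values below are all invented; production systems learn such models
statistically from historical customer data:

```python
def churn_points(customer):
    """Toy churn score: more points means higher attrition risk (rules are hypothetical)."""
    points = 0
    if customer["months_since_last_purchase"] > 6:
        points += 5   # long inactivity treated as the strongest signal here
    if customer["support_complaints"] >= 3:
        points += 3
    if customer["tenure_months"] < 12:
        points += 2   # assume newer customers churn more often
    return points

risk = churn_points({"months_since_last_purchase": 8,
                     "support_complaints": 4,
                     "tenure_months": 24})
print(risk)  # → 8
```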

The largest users of customer analytics are industries such as banking, telecom, and retail,
where businesses with large numbers of customers make extensive use of these technologies.

