Introduction To DBMS July 2018

DBMS, or Database Management System, is a collection of inter-related data and programs designed to store and retrieve data efficiently, addressing the need for optimized storage and fast retrieval of large amounts of data. It is widely used across applications such as telecom, banking, education, and online shopping, and it offers advantages over traditional file systems, including reduced data redundancy and improved data integrity. However, a DBMS also has disadvantages, such as high implementation cost and complexity. Concepts like keys, normalization, and data abstraction are fundamental to its operation.


Introduction to DBMS

DBMS stands for Database Management System. We can break it down as DBMS
= Database + Management System. A database is a collection of data, and a
management system is a set of programs to store and retrieve that data. Based on
this, we can define a DBMS as a collection of inter-related data and a
set of programs to store & access that data in an easy and effective manner.
What is the need for a DBMS?
Database systems are developed for large amounts of data. When dealing
with huge amounts of data, two things require optimization: storage
of data and retrieval of data.
Storage: According to the principles of database systems, the data is stored in such
a way that it occupies far less space, because redundant data (duplicate data) is
removed before storage. Let's take a layman's example to understand this:
In a banking system, suppose a customer has two accounts, a savings
account and a salary account. Say the bank stores the savings account data in
one place (these places are called tables; we will learn about them later) and the salary
account data in another place. If customer information such as
name and address is stored in both places, that is simply a waste of
storage (redundancy/duplication of data). To organize the data in a better way, the
customer information should be stored in one place and both accounts should be linked to
that information somehow. This is exactly what we achieve in a DBMS.
Fast retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed.
Database systems ensure that the data is retrieved as quickly as possible.
Database Applications – DBMS
Applications where we use Database Management Systems are:
• Telecom: A database keeps track of information regarding calls made,
network usage, customer details etc. Without database systems it would be
hard to maintain such a huge amount of data, which keeps updating
every millisecond.
• Industry: Whether it is a manufacturing unit, a warehouse or a distribution
centre, each one needs a database to keep records of the ins and outs. For
example, a distribution centre should keep track of the product units
supplied into the centre as well as the products delivered out of it
each day; this is where a DBMS comes into the picture.
• Banking system: For storing customer information, tracking day-to-day credit and
debit transactions, generating bank statements etc. All this work is
done with the help of database management systems.
• Education sector: Database systems are frequently used in schools and
colleges to store and retrieve data regarding student details, staff details,
course details, exam details, payroll data, attendance details, fee details etc.
There is a huge amount of inter-related data that needs to be stored and
retrieved in an efficient manner.

• Online shopping: You must be aware of online shopping websites such
as Amazon, Flipkart etc. These sites store product information, your
addresses and preferences, and credit details, and provide you a relevant list of
products based on your query. All of this involves a database management
system.

I have mentioned only a few applications; this list would never end if we started
mentioning all the DBMS applications.
Advantages of DBMS over file system
Drawbacks of File system:
• Data isolation: Because data is scattered in various files, and files may be
in different formats, writing new application programs to retrieve the
appropriate data is difficult.
• Duplication of data – redundant data
• Dependency on application programs – changing the files would lead to changes
in the application programs.

There are several advantages of a database management system over a file system.
A few of them are as follows:
• No redundant data – redundancy is removed by data normalization
• Data consistency and integrity – data normalization takes care of this too
• Security – each user has a different set of access rights
• Privacy – limited access
• Easy access to data
• Easy recovery
• Flexibility
Disadvantages of DBMS:
• DBMS implementation cost is high compared to the file system
• Complexity: database systems are complex to understand
• Performance: database systems are generic, making them suitable for
various applications. However, this generality can hurt their performance for some
applications
View of Data in DBMS
Abstraction is one of the main features of database systems. Hiding irrelevant
details from users and providing an abstract view of the data helps make
user-database interaction easy and efficient.
To understand the view of data, you need a basic knowledge of data
abstraction and of instances & schemas, both covered in the sections below.
Data Abstraction in DBMS
Database systems are made up of complex data structures. To ease user
interaction with the database, the developers hide internal, irrelevant details from users.
This process of hiding irrelevant details from users is called data abstraction.
We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how the data is
actually stored in the database. You can get the complex data structure details at this
level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It
describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's
interaction with the database system.
Example: Let's say we are storing customer information in a customer table. At the
physical level, these records are described as blocks of storage (bytes, gigabytes,
terabytes etc.) in memory. These details are often hidden from the programmers.
At the logical level, these records are described as fields and attributes along
with their data types, and their relationships with one another can be logically
implemented. Programmers generally work at this level because they are
aware of these details of the database system.
At the view level, a user just interacts with the system through a GUI and enters
details on the screen; they are not aware of how the data is stored or what data is
stored. Such details are hidden from them.
Instance and schema in DBMS

Definition of schema:

The design of a database is called the schema. A schema is of three types: physical
schema, logical schema and view schema.

The design of a database at the physical level is called the physical schema; how the data
is stored in blocks of storage is described at this level.

The design of a database at the logical level is called the logical schema. Programmers and
database administrators work at this level. Here the data is described as certain types of
records stored in data structures; internal details such as the implementation of those
data structures are hidden at this level (they are available at the physical level).

The design of a database at the view level is called the view schema. It generally describes
the end-user interaction with the database system.

Definition of instance: The data stored in the database at a particular moment of time
is called an instance of the database. The database schema defines the variable declarations
in the tables that belong to a particular database; the values of these variables at a moment
in time form the instance of that database.

Keys in DBMS

Keys play an important role in a relational database; they are used for identifying unique
rows in a table. They also establish relationships among tables.

Types of keys in DBMS

Primary Key – A primary key is a column or set of columns in a table that uniquely
identifies tuples (rows) in that table.

Super Key – A super key is a set of one or more columns (attributes) that uniquely
identify rows in a table.

Candidate Key – A super key with no redundant attribute is known as a candidate
key.

Alternate Key – Out of all candidate keys, only one gets selected as the primary key;
the remaining keys are known as alternate or secondary keys.

Composite Key – A key that consists of more than one attribute to uniquely
identify rows (also known as records & tuples) in a table is called a composite key.

Foreign Key – Foreign keys are columns of a table that point to the primary
key of another table. They act as a cross-reference between tables.

Primary key in DBMS

Definition: A primary key is a column or set of columns in a table that uniquely
identifies tuples (rows) in that table.

Example:
Student Table
Stu _Id Stu _Name Stu _Age
101 Steve 23
102 John 24
103 Robert 28
104 Carl 22

In the above Student table, the Stu _Id column uniquely identifies each row of the
table.

• We denote the primary key by underlining the column name.
• The value of the primary key should be unique for each row of the table.
A primary key column cannot contain duplicate values.
• A primary key column should not contain nulls.
• A primary key is not necessarily a single column; more than one column can
together form the primary key of a table. For e.g. {Stu _Id, Stu _Name}
could collectively play the role of primary key in the above table, but that does
not make sense because Stu _Id alone is enough to uniquely identify rows,
so why make things complex. That said, we should choose more than one
column as the primary key only when no single column can play the role
of primary key.

How to choose a primary key?

There are two ways: either create a column and let the database automatically assign
numbers in increasing order to each row, or choose a column yourself, making sure
that it contains no duplicates and no nulls. For e.g. in the above Student table,
the Stu _Name column cannot be a primary key, as more than one person can have the
same name; similarly, the Stu _Age column cannot play the primary key role, as more
than one person can have the same age.
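To see these rules enforced in practice, here is a minimal sketch using SQLite through Python's built-in sqlite3 module; the table and values mirror the Student table above, with column names adapted to SQL style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stu_id   INTEGER PRIMARY KEY,   -- unique, identifies each row
        stu_name TEXT NOT NULL,
        stu_age  INTEGER
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Steve", 23), (102, "John", 24), (103, "Robert", 28), (104, "Carl", 22)],
)

# A duplicate Stu_Id violates the primary key constraint and is rejected.
try:
    conn.execute("INSERT INTO student VALUES (101, 'Duplicate', 30)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

The database itself guarantees uniqueness, so application code never needs to check for duplicate ids by hand.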

Super key in DBMS

Definition: A super key is a set of one or more columns (attributes) that uniquely
identify rows in a table. Often people get confused between super keys and
candidate keys, so we will also discuss the candidate key a little here.
How is a candidate key different from a super key?
The answer is simple: candidate keys are selected from the set of super keys, and the only
thing we take care of while selecting a candidate key is that it should not have any
redundant attributes. That is why they are also termed minimal super keys.

Let’s take an example to understand this: Employee table

Emp _SSN Emp _Number Emp _Name


123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert

Super keys:

• {Emp _SSN}
• {Emp _Number}
• {Emp _SSN, Emp _Number}
• {Emp _SSN, Emp _Name}
• {Emp _SSN, Emp _Number, Emp _Name}
• {Emp _Number, Emp _Name}

All of the above sets are able to uniquely identify rows of the employee table.

Candidate Keys:
As I stated above, they are the minimal super keys with no redundant attributes.

• {Emp _SSN}
• {Emp _Number}

Only these two sets are candidate keys; all the other sets have redundant
attributes that are not necessary for unique identification.

Super key vs Candidate Key

I have been getting a lot of comments regarding the confusion between super keys
and candidate keys, so let me give a clear explanation.
1. First, understand that all candidate keys are super keys. This is
because the candidate keys are chosen from among the super keys.
2. How do we choose candidate keys from the set of super keys? We look for those
keys from which we cannot remove any field. In the above example, we have not
chosen {Emp_SSN, Emp_Name} as a candidate key because {Emp_SSN} alone can
identify a unique row in the table, which makes Emp_Name redundant.

Primary key:
The primary key is selected from the set of candidate keys by the database designer.
So either {Emp _SSN} or {Emp _Number} can be the primary key.
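The distinction can be made concrete with a small hypothetical helper that tests whether a column set uniquely identifies the rows of the Employee table above. Note a caveat: this checks only one instance of the table; whether a set really is a super key is a property of the schema, not of sample data.

```python
# Hypothetical helper: a column set is a super key for these rows if no
# two rows agree on all of its columns.
def is_super_key(rows, columns):
    seen = set()
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values in seen:
            return False          # two rows share these values
        seen.add(values)
    return True

employees = [
    {"emp_ssn": 123456789, "emp_number": 226, "emp_name": "Steve"},
    {"emp_ssn": 999999321, "emp_number": 227, "emp_name": "Ajeet"},
    {"emp_ssn": 888997212, "emp_number": 228, "emp_name": "Chaitanya"},
    {"emp_ssn": 777778888, "emp_number": 229, "emp_name": "Robert"},
]

print(is_super_key(employees, ["emp_ssn"]))              # True: a candidate key
print(is_super_key(employees, ["emp_ssn", "emp_name"]))  # True, but emp_name is redundant
```

The second set still identifies rows uniquely, which is why it is a super key; it fails as a candidate key only because a field can be removed from it.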

Foreign key in DBMS

Definition: Foreign keys are columns of a table that point to the primary key
of another table. They act as a cross-reference between tables.

For example:
In the below example the Stu _Id column in Course _enrollment table is a foreign
key as it points to the primary key of the Student table.

Course _enrollment table:

Course _Id Stu _Id


C01 101
C02 102
C03 101
C05 102
C06 103
C07 102

Student table:

Stu _Id Stu _Name Stu _Age


101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: Practically, the foreign key has nothing to do with the primary-key tag of the
other table; if it points to a unique column (not necessarily the primary key) of
another table, it is still a foreign key. So a more precise definition of foreign
key would be: foreign keys are columns of a table that point to a candidate
key of another table.
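A minimal sketch of this cross-reference using SQLite, with the two tables above; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)")
conn.execute("""
    CREATE TABLE course_enrollment (
        course_id TEXT,
        stu_id    INTEGER REFERENCES student(stu_id)   -- foreign key
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(101, "Chaitanya", 22), (102, "Arya", 26),
                  (103, "Bran", 25), (104, "Jon", 21)])

conn.execute("INSERT INTO course_enrollment VALUES ('C01', 101)")  # ok: 101 exists

# An enrollment pointing to a non-existent student is rejected.
try:
    conn.execute("INSERT INTO course_enrollment VALUES ('C99', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```

The constraint guarantees that every Stu _Id in the enrollment table refers to a real student, which is exactly the cross-reference role described above.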

Candidate Key in DBMS

A super key with no redundant attribute is known as a candidate key. Candidate keys
are selected from the set of super keys; the only thing we take care of while selecting
a candidate key is that it should not have any redundant attributes. That is why
they are also termed minimal super keys.

For example:

Emp _Id Emp _Number Emp _Name


E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert

There are two candidate keys in the above table:


{Emp _Id}
{Emp _Number}

Note: The primary key is selected from the group of candidate keys. That
means we can have either Emp _Id or Emp _Number as the primary key.

Composite key in DBMS

A key that consists of more than one attribute to uniquely identify rows (also
known as records & tuples) in a table is called a composite key. It is also known as
a compound key.

Example: Table – Sales

Cust _Id Order _Id Product _code Product _count


C01 O001 P007 23
C02 O123 P007 19
C02 O123 P230 82
C01 O001 P890 42

Key in the above table: {cust _id, order _id, product _code}

Note that {cust _id, order _id} alone is not enough here: one order (for e.g. O001)
can contain several products, so those two columns repeat across rows.
This is a composite key as it consists of more than one attribute.
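A sketch of the Sales table in SQLite. Because an order in this data spans several products (C01/O001 appears twice), the sketch declares cust_id, order_id and product_code together as the composite primary key; the database then rejects a repeated combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        cust_id       TEXT,
        order_id      TEXT,
        product_code  TEXT,
        product_count INTEGER,
        PRIMARY KEY (cust_id, order_id, product_code)   -- composite key
    )
""")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("C01", "O001", "P007", 23),
    ("C02", "O123", "P007", 19),
    ("C02", "O123", "P230", 82),
    ("C01", "O001", "P890", 42),
])

# Repeating the same (cust_id, order_id, product_code) combination is rejected,
# even though each column value on its own already appears in the table.
try:
    conn.execute("INSERT INTO sales VALUES ('C01', 'O001', 'P007', 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```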

Alternate key in DBMS

Out of all candidate keys, only one gets selected as the primary key; the remaining
keys are known as alternate or secondary keys.

For example: Consider the below table

Emp _Id Emp _Number Emp _Name


E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert

There are two candidate keys in the above table:


{Emp _Id}
{Emp _Number}

Since we have selected Emp _Id as the primary key, the remaining key Emp _Number
would be called the alternate or secondary key.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is a process of organizing the data in a database to avoid data
redundancy, insertion anomalies, update anomalies & deletion anomalies. Let's discuss
anomalies first, then the normal forms, with examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized:
insertion, update and deletion anomalies. Let's take an example to understand them.

Example: Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes: emp _id for the employee's id, emp _name
for the employee's name, emp _address for the employee's address and emp _dept
for the department in which the employee works. At some point of time the table
looks like this:

Emp _id Emp _name Emp _address Emp _dept


101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004

The above table is not normalized. We will see the problems that we face when a
table is not normalized.

Update anomaly: In the above table we have two rows for employee Rick, as he
belongs to two departments of the company. If we want to update Rick's address,
we have to update it in both rows or the data will become inconsistent. If the
correct address gets updated in one department but not in the other, then as per the
database Rick would have two different addresses, which is not correct and leads to
inconsistent data.

Insert anomaly: Suppose a new employee joins the company who is under
training and not yet assigned to any department. We would not be able to
insert this data into the table if the emp_dept field doesn't allow nulls.

Delete anomaly: If at some point the company closes department D890, then
deleting the rows having emp_dept D890 would also delete the information of
employee Maggie, since she is assigned only to this department.

To overcome these anomalies we need to normalize the data. The next section
discusses normalization.

Normalization
Here are the most commonly used normal forms:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce & Codd normal form (BCNF)

First normal form (1NF)

As per the rule of first normal form, an attribute (column) of a table cannot hold
multiple values. It should hold only atomic values.

Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:

emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212, 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers each, so the company
stored them in the same field, as you can see in the table above.

This table is not in 1NF, as the rule says "each attribute of a table must have atomic
(single) values"; the emp_mobile values for employees Jon & Lester violate that
rule.

To make the table comply with 1NF, we should store the data like this:

emp_id emp_name emp_address emp_mobile


101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987
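The same 1NF fix can be sketched in plain Python: flatten each multivalued emp_mobile field into one row per atomic value.

```python
# Each raw record stores all mobile numbers in one multivalued field (violates 1NF).
raw = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon",      "Kanpur",    ["8812121212", "9900012222"]),
    (103, "Ron",      "Chennai",   ["7778881212"]),
    (104, "Lester",   "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF: emit one row per atomic mobile number, repeating the other fields.
rows_1nf = [
    (emp_id, name, address, mobile)
    for emp_id, name, address, mobiles in raw
    for mobile in mobiles
]

print(len(rows_1nf))  # 6 rows, matching the corrected table above
```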

Second normal form (2NF)

A table is said to be in 2NF if both the following conditions hold:

• Table is in 1NF (First normal form)


• No non-prime attribute is dependent on a proper subset of any candidate
key of the table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects
they teach. They create a table that looks like this. Since a teacher can teach more
than one subject, the table can have multiple rows for the same teacher.

teacher_id subject teacher_age


111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non - prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in
2NF because the non-prime attribute teacher_age is dependent on teacher_id alone,
which is a proper subset of the candidate key. This violates the rule for 2NF, as the
rule says "no non-prime attribute is dependent on a proper subset of any candidate
key of the table".

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40

teacher_subject table:

teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry

Now the tables comply with Second normal form (2NF).
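The decomposition above can be checked in plain Python: split the rows into the two tables and verify that joining them back on teacher_id reproduces the original data, i.e. that the decomposition is lossless.

```python
rows = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# 2NF decomposition: teacher_age depends on teacher_id alone, so it moves
# to its own table keyed by teacher_id.
teacher_details = {(tid, age) for tid, _, age in rows}     # (teacher_id, teacher_age)
teacher_subject = {(tid, subj) for tid, subj, _ in rows}   # (teacher_id, subject)

# Natural join on teacher_id reconstructs the original rows (lossless).
rejoined = {(tid, subj, age)
            for tid, subj in teacher_subject
            for tid2, age in teacher_details
            if tid == tid2}

print(rejoined == set(rows))  # True
```

Each teacher's age is now stored exactly once, so updating it can no longer leave the table inconsistent.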

Third Normal form (3NF)

A table design is said to be in 3NF if both the following conditions hold:

• Table must be in 2NF


• Transitive functional dependency of non-prime attribute on any super key
should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and
for each functional dependency X -> Y at least one of the following conditions
holds:

• X is a super key of table


• Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each


employee, they create a table named employee _details that looks like this:
Emp _id Emp _name Emp _zip Emp _state Emp _city Emp _district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp _id}, {emp _id, emp _name}, {emp _id, emp _name, emp
_zip}…so on
Candidate Keys: {emp _id}
Non-prime attributes: all attributes except emp _id are non-prime as they are not
part of any candidate keys.

Here, emp _state, emp _city & emp _district depend on emp _zip. And emp _zip
depends on emp _id, which makes the non-prime attributes (emp _state, emp
_city & emp _district) transitively dependent on the super key (emp _id). This
violates the rule of 3NF.

To make this table comply with 3NF we have to break it into two tables to
remove the transitive dependency:

employee table:

Emp _id Emp _name Emp _zip


1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999

employee_zip table:

Emp _zip Emp _state Emp _city Emp _district


282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan

Boyce Codd normal form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every
functional dependency X -> Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:

Emp _id Emp _nationality Emp _dept Dept _type Dept _no _of _emp
1001 Austrian Production and planning D001 200
1001 Austrian Stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600

Functional dependencies in the table above:


emp _id -> emp _nationality
emp _dept -> {dept _type, dept _no _of _emp}

Candidate key: {emp _id, emp _dept}

The table is not in BCNF because neither emp _id nor emp _dept alone is a super key (the candidate key is the pair), yet each appears as the left side of a functional dependency.

To make the table comply with BCNF we can break it into three tables like
this:
emp _nationality table:

Emp _id Emp _nationality


1001 Austrian
1002 American
emp_dept table:

Emp _dept Dept _type Dept _no _of _emp


Production and planning D001 200
stores D001 250
design and technical support D134 100
Purchasing department D134 600

Emp _dept _mapping table:

Emp _id Emp _dept


1001 Production and planning
1001 Stores
1002 design and technical support
1002 Purchasing department

Functional dependencies:
emp _id -> emp _nationality
emp _dept -> {dept _type, dept _no_ of _ emp}

Candidate keys:
For first table: emp _id
For second table: emp _dept
For third table: {emp _id, emp _ dept}

This is now in BCNF, as in both functional dependencies the left side is a key.

Data models in DBMS

A Data Model is the logical structure of a database. It describes the design of the
database, reflecting entities, attributes, relationships among data, constraints etc.

Types of Data Models:


Object based logical models – describe data at the conceptual and view levels.

1. E-R Model
2. Object oriented Model

Record based logical models – like object based models, they describe data at
the conceptual and view levels. These models specify the logical structure of the
database with records, fields and attributes.

1. Relational Model
2. Hierarchical Model
3. Network Model – the network model is the same as the hierarchical model except
that it has a graph-like structure rather than a tree-based structure. Unlike the
hierarchical model, this model allows each record to have more than one
parent record.

Physical Data Models – These models describe data at the lowest level of
abstraction.

E-R model in DBMS

An entity–relationship model (ER model) is a systematic way of describing and
defining a business process. An ER model is typically implemented as a database.
The main components of the E-R model are the entity set and the relationship set.

Here are the geometric shapes and their meaning in an E-R Diagram –

Rectangle: Represents Entity sets.

Ellipses: Attributes

Diamonds: Relationship Set

Lines: They link attributes to Entity Sets and Entity sets to Relationship Set

Double Ellipses: Multivalued Attributes

Dashed Ellipses: Derived Attributes

Double Rectangles: Weak Entity Sets

Double Lines: Total participation of an entity in a relationship set

A sample E-R Diagram:


Multivalued Attributes: An attribute that can hold multiple values is known as a
multivalued attribute. We represent it with a double ellipse in an E-R Diagram. E.g.
a person can have more than one phone number, so the phone number attribute is
multivalued.

Derived Attribute: A derived attribute is one whose value is dynamic and derived
from another attribute. It is represented by a dashed ellipse in an E-R Diagram. E.g.
a person's age is a derived attribute, as it changes over time and can be derived from
another attribute (date of birth).

E-R diagram with multivalued and derived attributes:

Total Participation of an Entity Set:

Total participation of an entity set means that each entity in the entity set must
have at least one relationship in the relationship set. For example, in the below
diagram each college must have at least one associated student.

Relational model in DBMS

In the relational model, data and relationships are represented by a collection of
inter-related tables. Each table is a group of columns and rows, where columns
represent the attributes of an entity and rows represent records.

Sample relational model: a Student table with 3 columns and four records.

Stu _Id Stu _Name Stu _Age


111 Ashish 23
123 Saurav 22
169 Lester 24
234 Lou 26

Course table:

Stu _Id Course _Id Course _Name


111 C01 Science
111 C02 DBMS
169 C22 Java
169 C39 Computer Networks
Here Stu _Id, Stu _Name & Stu _Age are attributes of table Student and Stu _Id,
Course _Id & Course _Name are attributes of table Course. The rows with values
are the records (commonly known as tuples).

Hierarchical model in DBMS

In the hierarchical model, data is organized into a tree-like structure where each
record has one parent record and many children. The main drawback of this model
is that it can have only one-to-many relationships between nodes.

Sample Hierarchical Model Diagram:

Example of hierarchical data represented as relational tables: The above


hierarchical model can be represented as relational tables like this:
Student Table:
Stu _Id Stu _Name Stu _Age
123 Steve 29
367 Chaitanya 27
234 Ajeet 28

Course Table:

Course _Id Course _Name Stu _Id


C01 Cobol 123
C21 Java 367
C22 Perl 367
C33 JQuery 234

DBMS languages

Database languages are used to read, update and store data in a database. There
are several such languages that can be used for this purpose; one of them is SQL
(Structured Query Language).

Types of DBMS languages:


Data Definition Language (DDL): DDL is used for specifying the database
schema. Let's take SQL, for instance, to categorize the statements that come under
DDL.

• To create a database or its objects (such as tables) – CREATE
• To alter the structure of the database – ALTER
• To drop database objects – DROP
• To remove all records from a table – TRUNCATE
• To rename database objects – RENAME

All these commands specify or update the database schema; that is why they come
under Data Definition Language.

Data Manipulation Language (DML): DML is used for accessing and
manipulating data in a database.

• To read records from table(s) – SELECT
• To insert record(s) into a table – INSERT
• To update the data in table(s) – UPDATE
• To delete records from a table – DELETE

Data Control language (DCL): DCL is used for granting and revoking user
access on a database –

• To grant access to user – GRANT


• To revoke access from user – REVOKE

In practice, data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database
language such as SQL.
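A minimal sketch of DDL and DML statements running against SQLite through Python's sqlite3 module. (SQLite has no DCL and no TRUNCATE; GRANT/REVOKE exist in server DBMSs such as MySQL or PostgreSQL.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and alter the schema.
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN stu_age INTEGER")

# DML: manipulate the data inside that schema.
conn.execute("INSERT INTO student VALUES (101, 'Steve', 23)")
conn.execute("UPDATE student SET stu_age = 24 WHERE stu_id = 101")
rows = conn.execute("SELECT stu_name, stu_age FROM student").fetchall()
print(rows)  # [('Steve', 24)]
conn.execute("DELETE FROM student")

# DCL would look like (not supported by SQLite, so shown only as a comment):
#   GRANT SELECT ON student TO some_user;
#   REVOKE SELECT ON student FROM some_user;
```

All of these statements belong to the single SQL language; DDL/DML/DCL are just categories within it, as noted above.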

Functional dependency in DBMS

The attributes of a table are said to be dependent on each other when an attribute of
the table uniquely identifies another attribute of the same table.

For example: suppose we have a student table with attributes Stu _Id, Stu _Name
and Stu _Age. Here the Stu _Id attribute uniquely identifies the Stu _Name attribute
of the student table, because if we know the student id we can tell the student name
associated with it. This is known as functional dependency and can be written as
Stu _Id -> Stu _Name; in words, Stu _Name is functionally dependent
on Stu _Id.

Formally:
If column A of a table uniquely identifies column B of the same table, this can be
represented as A -> B (attribute B is functionally dependent on attribute A).
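A small hypothetical checker makes the definition concrete: A -> B holds in a table instance when no value of A maps to two different values of B. (As with keys, data can only refute a dependency; whether it truly holds is a property of the schema.)

```python
# Hypothetical checker: does A -> B hold in these rows?
def holds(rows, a, b):
    mapping = {}
    for row in rows:
        if row[a] in mapping and mapping[row[a]] != row[b]:
            return False          # one A-value maps to two B-values
        mapping[row[a]] = row[b]
    return True

students = [
    {"stu_id": 101, "stu_name": "Steve", "stu_age": 23},
    {"stu_id": 102, "stu_name": "John",  "stu_age": 24},
    {"stu_id": 103, "stu_name": "John",  "stu_age": 28},
]

print(holds(students, "stu_id", "stu_name"))   # True: Stu_Id -> Stu_Name
print(holds(students, "stu_name", "stu_id"))   # False: two Johns with different ids
```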

Types of Functional Dependencies

Trivial functional dependency in DBMS with example

The dependency of an attribute on a set of attributes is known as a trivial functional
dependency if the set of attributes includes that attribute.

Symbolically: A ->B is trivial functional dependency if B is a subset of A.


The following dependencies are also trivial: A->A & B->B

For example: Consider a table with two columns Student _id and Student _Name.

{Student _Id, Student _Name} -> Student _Id is a trivial functional dependency as
Student _Id is a subset of {Student _Id, Student _Name}. That makes sense
because if we know the values of Student _Id and Student _Name then the value of
Student _Id can be uniquely determined.

Also, Student _Id -> Student _Id & Student _Name -> Student _Name are trivial
dependencies too.

RDBMS Concepts

RDBMS stands for relational database management system. A relational database
has the following major components: table, record/tuple, field & column/attribute.

Table:
A table is a collection of data represented in rows and columns. For e.g. the following
table stores the information of students.

Student _Id Student _Name Student _ Addr Student _Age


101 Chaitanya Dayal Bagh, Agra 27
102 Ajeet Delhi 26
103 Rahul Gurgaon 24
104 Shubham Chennai 25

Records / Tuple:
Each row of a table is known as a record, also called a tuple. For e.g. the
below row is a record.

102 Ajeet Delhi 26

Field:
The above table has four fields: Student_Id, Student_Name, Student_Addr &
Student_Age.
Column / Attribute:
Each column of a table, along with its set of values, is known as an attribute.
For e.g. the set of values of the Student_Id field is one of the four columns of
the Student table.

Student_Id
101
102
103
104

Cardinality in DBMS

In DBMS you may hear the term cardinality in two different places, and it has
two different meanings as well.

In Context of Data Models:

In terms of data modeling, cardinality refers to the relationship between two
tables. It can be of four types:

One to One – A single row of table 1 associates with a single row of table 2

One to Many – A single row of table 1 associates with more than one row of
table 2

Many to One – Many rows of table 1 associate with a single row of table 2

Many to Many – Many rows of table 1 associate with many rows of table 2

In Context of Query Optimization:

In terms of queries, cardinality refers to the uniqueness of a column in a
table. A column with all unique values has high cardinality, and a column with
mostly duplicate values has low cardinality. These cardinality scores help in
query optimization.
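In the query-optimization sense, the cardinality of a column is simply its number of distinct values; a quick sketch with made-up data:

```python
# Cardinality of a column = number of distinct values it holds.
# A hypothetical Student table, flattened to per-column value lists:
table = {
    "Student_Id":  [101, 102, 103, 104],
    "Student_Age": [27, 26, 24, 25],
    "City":        ["Agra", "Delhi", "Delhi", "Delhi"],
}

cardinality = {col: len(set(values)) for col, values in table.items()}
print(cardinality)  # {'Student_Id': 4, 'Student_Age': 4, 'City': 2}
```

An optimizer prefers indexes on high-cardinality columns such as Student_Id, since they are far more selective than a column like City.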

Transaction Management in DBMS


A transaction is a set of logically related operations. For example, when you
are transferring money from your bank account to your friend's account, the set
of operations would be like this:

Simple Transaction Example

1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend's account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account

This whole set of operations can be called a transaction. Although I have shown
only read, write and update operations in the above example, a transaction can
contain read, write, insert, update and delete operations.

Transaction failure in between the operations

Now that we understand what a transaction is, we should understand the problems
associated with it.

The main problem that can occur during a transaction is that the transaction can
fail before finishing all the operations in the set. This can happen due to
power failure, system crash etc. This is a serious problem that can leave the
database in an inconsistent state. Assume that the transaction fails after the
third operation (see the example above): the amount would be deducted from your
account but your friend would not receive it.

To solve this problem, we have the following two operations:

Commit: If all the operations in a transaction are completed successfully, then
commit those changes to the database permanently.

Rollback: If any of the operations fails, then roll back all the changes done by
the previous operations.

Even though these operations help us avoid several issues that may arise during
a transaction, they are not sufficient when two transactions are running
concurrently. To handle those problems we need to understand the database ACID
properties.
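The commit/rollback behaviour can be sketched with Python's built-in sqlite3 module (the account ids, balances, and the simulated failure are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 500), (2, 700)")
con.commit()

def transfer(con, src, dst, amount):
    try:
        con.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        # Simulate a crash between the two operations:
        if amount > 500:
            raise RuntimeError("system failure mid-transaction")
        con.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        con.commit()    # all operations succeeded: make the changes permanent
    except Exception:
        con.rollback()  # undo the partial debit

transfer(con, 1, 2, 600)  # fails mid-way and rolls back
print(con.execute("SELECT balance FROM account ORDER BY id").fetchall())
# [(500,), (700,)] -- balances unchanged, no money is lost
```

Because the debit was rolled back, the database never shows money leaving one account without arriving in the other.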
ACID properties in DBMS

To ensure the integrity of data during a transaction (a transaction is a unit
of program execution that accesses and possibly updates various data items), the
database system maintains the following properties, widely known as the ACID
properties:

• Atomicity: This property ensures that either all the operations of a
transaction are reflected in the database or none are. Let's take an example of
a banking system to understand this: suppose Account A has a balance of $400 and
B has $700, and Account A is transferring $100 to Account B. This is a
transaction with two operations: a) debiting $100 from A's balance, b) crediting
$100 to B's balance. Let's say the first operation succeeds while the second
fails; in this case A's balance would be $300 while B would have $700 instead of
$800. This is unacceptable in a banking system. Either the transaction should
fail without executing any of the operations, or it should process both. The
atomicity property ensures that.
• Consistency: To preserve the consistency of the database, the execution of a
transaction should take place in isolation (meaning no other transaction should
run concurrently while a transaction is already running). For example, account A
has a balance of $400 and is transferring $100 to accounts B and C both, so we
have two transactions here. Let's say these transactions run concurrently and
both read the $400 balance; in that case the final balance of A would be $300
instead of $200, which is wrong. If the transactions ran in isolation, the
second transaction would have read the correct balance of $300 (before debiting
$100) once the first transaction had completed successfully.
• Isolation: For every pair of transactions, one transaction should start
execution only after the other has finished. I have already discussed an
example of isolation under the consistency property above.
• Durability: Once a transaction completes successfully, the changes it has
made to the database should be permanent even if there is a system failure.
The recovery-management component of the database system ensures the
durability of transactions.

Constraints in DBMS
Constraints enforce limits on the data, or the type of data, that can be
inserted/updated/deleted in a table. The whole purpose of constraints is to
maintain data integrity during update/delete/insert operations on a table. In
this tutorial we will learn about the several types of constraints that can be
created in an RDBMS.

Types of constraints

• NOT NULL
• UNIQUE
• DEFAULT
• CHECK
• Key Constraints – PRIMARY KEY, FOREIGN KEY
• Domain constraints
• Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value.
When we don't provide a value for a particular column while inserting a record
into a table, the column takes the value NULL by default. By specifying a NOT
NULL constraint, we can be sure that a particular column cannot have NULL
values.

Example:

CREATE TABLE STUDENT (
  ROLL_NO INT NOT NULL,
  STU_NAME VARCHAR(35) NOT NULL,
  STU_AGE INT NOT NULL,
  STU_ADDRESS VARCHAR(235),
  PRIMARY KEY (ROLL_NO)
);

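The effect of NOT NULL can be demonstrated with Python's built-in sqlite3 module (a minimal sketch, with the table shortened to two columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE STUDENT (
    ROLL_NO  INT NOT NULL,
    STU_NAME VARCHAR(35) NOT NULL,
    PRIMARY KEY (ROLL_NO))""")

con.execute("INSERT INTO STUDENT VALUES (1, 'Chaitanya')")   # accepted
try:
    # STU_NAME is omitted, so it would default to NULL:
    con.execute("INSERT INTO STUDENT (ROLL_NO) VALUES (2)")
except sqlite3.IntegrityError as e:
    print(e)  # the NOT NULL constraint on STU_NAME rejects the row
```

Only the first row is stored; the second insert is rejected instead of silently storing a NULL name.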

UNIQUE Constraint in SQL

The UNIQUE constraint enforces a column or set of columns to have unique
values. If a column has a UNIQUE constraint, that particular column cannot have
duplicate values in the table.

Set UNIQUE Constraint while creating a table


For SQL Server / MS Access / Oracle:

Syntax:

CREATE TABLE <table_name>
(
  <column_name1> <data_type> UNIQUE,
  <column_name2> <data_type>,
  ....
  ....
);

Example:

Here we are setting up the UNIQUE constraint for two columns, STU_NAME &
STU_ADDRESS, which means these two columns cannot have duplicate values.

Note: the STU_NAME column has two constraints set up (both NOT NULL and
UNIQUE).

CREATE TABLE STUDENTS (
  ROLL_NO INT NOT NULL,
  STU_NAME VARCHAR(35) NOT NULL UNIQUE,
  STU_AGE INT NOT NULL,
  STU_ADDRESS VARCHAR(35) UNIQUE,
  PRIMARY KEY (ROLL_NO)
);

MySQL:

Syntax:

CREATE TABLE <table_name>
(
  <column_name1> <data_type>,
  <column_name2> <data_type>,
  ....
  ....
  UNIQUE(<column_name>)
);
Example:

Setting up constraint on STU_NAME column.

CREATE TABLE STUDENTS (
  ROLL_NO INT NOT NULL,
  STU_NAME VARCHAR(35) NOT NULL,
  STU_AGE INT NOT NULL,
  STU_ADDRESS VARCHAR(35),
  UNIQUE(STU_NAME),
  PRIMARY KEY (ROLL_NO)
);

Naming of UNIQUE Constraint:

MySQL / SQL Server / MS Access / Oracle:

CREATE TABLE STUDENTS (
  ROLL_NO INT NOT NULL,
  STU_NAME VARCHAR(35) NOT NULL,
  STU_AGE INT NOT NULL,
  STU_ADDRESS VARCHAR(35),
  CONSTRAINT stu_Info UNIQUE(STU_NAME, STU_ADDRESS),
  PRIMARY KEY (ROLL_NO)
);

Set UNIQUE Constraint on an already created table

For MySQL / Oracle / SQL Server / MS Access:

For single column and without constraint naming:

Syntax:

ALTER TABLE <table_name>
ADD UNIQUE (<column_name>);

Example:

ALTER TABLE STUDENTS
ADD UNIQUE (STU_NAME);
For multiple columns and with constraint naming:

Syntax:

ALTER TABLE <table_name>
ADD CONSTRAINT <constraint_name> UNIQUE (<column_name1>, <column_name2>, ...);

Example:

ALTER TABLE STUDENTS
ADD CONSTRAINT stu_Info UNIQUE (STU_NAME, STU_ADDRESS);

How to drop a UNIQUE Constraint

IN MySQL:

syntax:

ALTER TABLE <table_name>
DROP INDEX <constraint_name>;

Example:

ALTER TABLE STUDENTS
DROP INDEX stu_Info;

IN ORACLE / SQL Server / MS Access:

Syntax:

ALTER TABLE <table_name>
DROP CONSTRAINT <constraint_name>;

Example:

ALTER TABLE STUDENTS
DROP CONSTRAINT stu_Info;



DEFAULT:

The DEFAULT constraint provides a default value for a column when no value is
provided while inserting a record into the table.

CREATE TABLE STUDENT (
  ROLL_NO INT NOT NULL,
  STU_NAME VARCHAR(35) NOT NULL,
  STU_AGE INT NOT NULL,
  EXAM_FEE INT DEFAULT 10000,
  STU_ADDRESS VARCHAR(35),
  PRIMARY KEY (ROLL_NO)
);


CHECK:
This constraint is used to specify a range of values for a particular column of
a table. When this constraint is set on a column, it ensures that the column can
only hold values falling within the specified range.

CREATE TABLE STUDENT (
  ROLL_NO INT NOT NULL CHECK (ROLL_NO > 1000),
  STU_NAME VARCHAR(35) NOT NULL,
  STU_AGE INT NOT NULL,
  EXAM_FEE INT DEFAULT 10000,
  STU_ADDRESS VARCHAR(35),
  PRIMARY KEY (ROLL_NO)
);

In the above example we have set a CHECK constraint on the ROLL_NO column of
the STUDENT table. Now the ROLL_NO field must have a value greater than 1000.
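The CHECK constraint can be seen rejecting an out-of-range value using Python's built-in sqlite3 module (a small sketch with a shortened table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE STUDENT (
    ROLL_NO  INT NOT NULL CHECK (ROLL_NO > 1000),
    STU_NAME VARCHAR(35) NOT NULL,
    PRIMARY KEY (ROLL_NO))""")

con.execute("INSERT INTO STUDENT VALUES (1001, 'Ajeet')")  # passes the check
try:
    con.execute("INSERT INTO STUDENT VALUES (999, 'Rahul')")  # 999 is not > 1000
except sqlite3.IntegrityError as e:
    print(e)  # the CHECK constraint rejects the row
```

Only the row with ROLL_NO 1001 is stored; the out-of-range value never reaches the table.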

Key constraints:

PRIMARY KEY:

A primary key uniquely identifies each record in a table. It must have unique
values and cannot contain NULLs. In the below example the ROLL_NO field is
marked as the primary key, which means the ROLL_NO field cannot have duplicate
or NULL values.

CREATE TABLE STUDENT (
  ROLL_NO INT NOT NULL,
  STU_NAME VARCHAR(35) NOT NULL UNIQUE,
  STU_AGE INT NOT NULL,
  STU_ADDRESS VARCHAR(35) UNIQUE,
  PRIMARY KEY (ROLL_NO)
);

FOREIGN KEY:

Foreign keys are columns of a table that point to the primary key of another
table. They act as a cross-reference between tables.
Domain constraints:
Each table has a certain set of columns, and each column allows values of the
same type based on its data type; the column does not accept values of any
other data type. Domain constraints are user-defined data types, and we can
define them like this:

Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY /
FOREIGN KEY / CHECK / DEFAULT)

Mapping constraints:

Mapping constraints can be explained in terms of mapping cardinality:

Mapping Cardinality:
One to One: An entity of entity-set A can be associated with at most one entity of
entity-set B and an entity in entity-set B can be associated with at most one entity
of entity-set A.

One to Many: An entity of entity-set A can be associated with any number of
entities of entity-set B, and an entity in entity-set B can be associated with
at most one entity of entity-set A.

Many to One: An entity of entity-set A can be associated with at most one
entity of entity-set B, and an entity in entity-set B can be associated with
any number of entities of entity-set A.

Many to Many: An entity of entity-set A can be associated with any number of
entities of entity-set B, and an entity in entity-set B can be associated with
any number of entities of entity-set A.

We can put these constraints in place while creating tables in the database.

Example:

CREATE TABLE Customer (
  Customer_id int PRIMARY KEY NOT NULL,
  First_name varchar(20),
  Last_name varchar(20)
);

-- named Orders because ORDER is a reserved word in SQL
CREATE TABLE Orders (
  Order_id int PRIMARY KEY NOT NULL,
  Customer_id int,
  order_details varchar(50),
  CONSTRAINT fk_Customers FOREIGN KEY (Customer_id)
    REFERENCES Customer (Customer_id)
);

Assuming that a customer orders more than once, the above relation represents a
one-to-many relationship. Similarly, we can achieve other mapping constraints
based on the requirements.
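The one-to-many example above can be exercised with Python's built-in sqlite3 module (a sketch with made-up data; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
con.execute("""CREATE TABLE Customer (
    Customer_id INT PRIMARY KEY NOT NULL,
    First_name  VARCHAR(20))""")
con.execute("""CREATE TABLE Orders (
    Order_id      INT PRIMARY KEY NOT NULL,
    Customer_id   INT,
    order_details VARCHAR(50),
    CONSTRAINT fk_Customers FOREIGN KEY (Customer_id)
        REFERENCES Customer (Customer_id))""")

con.execute("INSERT INTO Customer VALUES (1, 'Steve')")
con.execute("INSERT INTO Orders VALUES (10, 1, 'first order')")   # OK
con.execute("INSERT INTO Orders VALUES (11, 1, 'second order')")  # one-to-many
try:
    con.execute("INSERT INTO Orders VALUES (12, 99, 'bad')")  # no customer 99
except sqlite3.IntegrityError as e:
    print(e)  # the foreign key constraint rejects the orphan order
```

One customer row can own many order rows, but an order pointing at a nonexistent customer is rejected.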

Concurrency Control in DBMS

When more than one transaction is running simultaneously, there are chances of
a conflict occurring, which can leave the database in an inconsistent state. To
handle these conflicts we need concurrency control in DBMS, which allows
transactions to run simultaneously but handles them in such a way that the
integrity of the data remains intact.

Let's take an example to understand what I'm talking about here.

Conflict Example

You and your brother have a joint bank account from which you both can withdraw
money. Now let's say you both go to different branches of the same bank at the
same time and each try to withdraw 5000 INR, while your joint account has only
a 6000 INR balance. Without concurrency control in place, you could both get
5000 INR at the same time, but once both transactions finish, the account
balance would be -4000, which is not possible and leaves the database in an
inconsistent state. We need something that controls the transactions in such a
way that they can run concurrently while maintaining the consistency of the
data, to avoid such issues.

Solution of Conflicts: Locks

A lock is a mechanism that ensures that the integrity of data is maintained.
There are two types of locks that can be placed while accessing data, so that a
concurrent transaction cannot alter the data while we are processing it:

1. Shared Lock (S)
2. Exclusive Lock (X)

1. Shared Lock (S): A shared lock is placed when we are reading data. Multiple
shared locks can be placed on the same data item, but while a shared lock is
held, no exclusive lock can be placed.

For example, when two transactions are reading Steve's account balance, let
them both read by placing shared locks, but if at the same time another
transaction wants to update Steve's account balance by placing an exclusive
lock, do not allow it until the reading is finished.

2. Exclusive Lock (X): An exclusive lock is placed when we want to both read
and write the data. This lock allows both read and write operations. Once this
lock is placed on a data item, no other lock (shared or exclusive) can be
placed on it until the exclusive lock is released.

For example, when a transaction wants to update Steve's account balance, let it
do so by placing an X lock on it, but if a second transaction wants to read the
data (S lock), don't allow it, and if another transaction wants to write the
data (X lock), don't allow that either.

So based on this we can create a table like this:

Lock Compatibility Matrix

__________________________
| | S | X |
|-------------------------
| S | True | False |
|-------------------------
| X | False | False |
--------------------------

How to read this matrix:

There are two rows. The first row says that when an S lock is placed, another S
lock can be acquired, so it is marked True, but no exclusive lock can be
acquired, so it is marked False.
The second row says that when an X lock is acquired, neither an S nor an X lock
can be acquired, so both are marked False.
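The compatibility matrix can be encoded directly as a lookup table; a small sketch (the function name is illustrative):

```python
# Lock compatibility matrix as code: (held, requested) -> compatible?
COMPATIBLE = {("S", "S"): True,  ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_acquire(held, requested):
    """Can a transaction place `requested` ('S' or 'X') on a data item,
    given the locks already `held` on it by other transactions?"""
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_acquire(["S", "S"], "S"))  # True  -- many readers may share
print(can_acquire(["S"], "X"))       # False -- a writer waits for readers
print(can_acquire(["X"], "S"))       # False -- readers wait for the writer
```

A real lock manager would additionally queue the blocked requests; this only answers the compatibility question.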

Deadlock in DBMS
A deadlock is a condition wherein two or more tasks are waiting for each other
to finish, but none of the tasks is willing to give up the resources the other
tasks need. In this situation no task ever finishes; each stays in a waiting
state forever.

Coffman conditions

Coffman stated four conditions for deadlock occurrence. A deadlock may occur if
all of the following conditions hold true:

• Mutual exclusion condition: There must be at least one resource that cannot
be used by more than one process at a time.
• Hold and wait condition: A process that is holding a resource can request
additional resources that are being held by other processes in the system.
• No preemption condition: A resource cannot be forcibly taken from a process.
Only the process holding a resource can release it.
• Circular wait condition: One process is waiting for a resource held by a
second process, the second is waiting for a third, and so on, with the last
process waiting for the first, thus forming a circular chain of waiting.

Deadlock Handling

Ignore the deadlock (Ostrich algorithm)

Did that make you laugh? You may be wondering how ignoring a deadlock can count
as deadlock handling. But you should know that the Windows you are using on
your PC takes this approach to deadlock handling, which is one reason it
sometimes hangs and you have to reboot it to get it working. Not only Windows
but UNIX also uses this approach.

The question is why? Why, instead of dealing with a deadlock, do they ignore
it, and why is this called the Ostrich algorithm?

Well! Let me answer the second question first. This is known as the Ostrich
algorithm because in this approach we ignore the deadlock and pretend that it
will never occur, just like the proverbial ostrich behavior: "to stick one's
head in the sand and pretend there is no problem."

Let's discuss why we ignore it: when deadlocks are believed to be very rare and
the cost of handling them is high, ignoring is a better solution than handling.
For example, in the operating-system case, if the time required to handle the
deadlock is higher than the time required to reboot Windows, then rebooting is
the preferred choice, considering that deadlocks are very rare in Windows.

Deadlock detection

The resource scheduler is the component that keeps track of resources allocated
to and requested by processes. Thus, if there is a deadlock, it is known to the
resource scheduler. This is how a deadlock is detected.
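Concretely, a scheduler can represent who waits for whom as a wait-for graph (an edge T1 -> T2 means T1 waits for a resource held by T2) and report a deadlock when that graph contains a cycle; a minimal sketch:

```python
def has_deadlock(wait_for):
    """wait_for maps each transaction to the transactions it waits on.
    A cycle in this wait-for graph means a deadlock."""
    def cycle_from(node, path):
        if node in path:
            return True  # we came back to a node already on this path
        path = path | {node}
        return any(cycle_from(nxt, path) for nxt in wait_for.get(node, []))
    return any(cycle_from(t, set()) for t in wait_for)

print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": []}))      # False: just a chain
print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True: circular wait
```

The cycle is exactly the circular wait condition from Coffman's list; breaking any edge of it (by terminating a process or preempting a resource) resolves the deadlock.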

Once a deadlock is detected, it is corrected by one of the following methods:

• Terminating processes involved in the deadlock: Terminating all the processes
involved in the deadlock, or terminating them one by one until the deadlock is
resolved, can be a solution, but neither approach is good. Terminating all the
processes has a high cost, and the partial work done by the processes is lost.
Terminating them one by one takes a lot of time, because each time a process is
terminated we need to check whether the deadlock has been resolved. Thus, the
best approach is to consider process age and priority when terminating
processes during a deadlock condition.
• Resource preemption: Another approach is to preempt resources and allocate
them to other processes until the deadlock is resolved.

Deadlock prevention
We have learnt that if all four Coffman conditions hold true then a deadlock
occurs, so preventing one or more of them can prevent the deadlock.

• Removing mutual exclusion: All resources must be sharable, meaning more than
one process can hold a resource at a time. This approach is practically
impossible.
• Removing hold and wait: This condition can be removed if the process acquires
all the resources it needs before starting out. Another way to remove it is to
enforce a rule that a process may request resources only when it holds none.
• Preemption of resources: Preempting resources from a process can result in
rollback, so this needs to be avoided in order to maintain the consistency and
stability of the system.
• Avoiding circular wait: This can be avoided if the resources are maintained
in a hierarchy and a process must acquire resources in increasing order of
precedence, which prevents a circular wait. Another way is to enforce a rule of
one resource per process: a process can request a new resource only once it
releases the resource it currently holds. This also avoids the circular wait.

Deadlock Avoidance

Deadlock can be avoided if resources are allocated in such a way that deadlock
never occurs. There are two algorithms for deadlock avoidance:

• Wait/Die
• Wound/Wait

Here is a table representation of resource allocation under each algorithm.
Both algorithms take process age into consideration while determining the best
possible way of allocating resources to avoid deadlock.

_________________________________________________________________________________
| Situation                        | Wait/Die             | Wound/Wait           |
|----------------------------------|----------------------|----------------------|
| Older process needs a resource   | Older process waits  | Younger process dies |
| held by younger process          |                      |                      |
| Younger process needs a resource | Younger process dies | Younger process      |
| held by older process            |                      | waits                |
---------------------------------------------------------------------------------
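The two schemes boil down to simple timestamp comparisons (a smaller timestamp means an older transaction; the function names here are illustrative):

```python
# `requester` asks for a resource held by `holder`; both are identified
# by their start timestamps (smaller timestamp = older transaction).

def wait_die(requester_ts, holder_ts):
    # Older requester waits; younger requester dies (is rolled back).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    # Older requester wounds (rolls back) the younger holder;
    # younger requester waits.
    return "wound holder" if requester_ts < holder_ts else "wait"

# Older transaction (ts=1) requests a resource held by a younger one (ts=5):
print(wait_die(1, 5))    # wait
print(wound_wait(1, 5))  # wound holder
# Younger transaction requests a resource held by an older one:
print(wait_die(5, 1))    # die
print(wound_wait(5, 1))  # wait
```

In both schemes only the younger transaction is ever rolled back, and since it keeps its original timestamp on restart, it eventually becomes "old enough" to proceed, so no transaction starves.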

One of the most famous deadlock avoidance algorithms is the Banker's algorithm.
