DATABASE MANAGEMENT SYSTEM
A database management system (or DBMS) is essentially nothing more than a computerized data-
keeping system. Users of the system are given facilities to perform several kinds of operations on such a
system for either manipulation of the data in the database or the management of the database
structure itself.
Database
Data stored and accessed electronically from a computer system. Where databases are more complex
they are often developed using formal design and modeling techniques
The database is a collection of inter-related data which is used to retrieve, insert and delete the data
efficiently. It is also used to organize the data in the form of a table, schema, views, and reports, etc.
For example: The college Database organizes the data about the admin, staff, students and faculty etc.
Database Management System
Database management system is a software which is used to manage the database. For example:
MySQL, Oracle, etc are a very popular commercial database which is used in different applications.
DBMS provides an interface to perform various operations like database creation, storing data in it,
updating data, creating a table in the database and a lot more.
It provides protection and security to the database. In the case of multiple users, it also maintains data
consistency.
Tasks possible in DBMS
Data Definition: It is used for creation, modification, and removal of definition that defines the
organization of data in the database.
Data Updation: It is used for the insertion, modification, and deletion of the actual data in the database.
Data Retrieval: It is used to retrieve the data from the database which can be used by applications for
various purposes.
User Administration: It is used for registering and monitoring users, maintain data integrity, enforcing
data security, dealing with concurrency control, monitoring performance and recovering information
corrupted by unexpected failure.
Characteristics of DBMS
It uses a digital repository established on a server to store and manage the information.
It can provide a clear and logical view of the process that manipulates data.
DBMS contains automatic backup and recovery procedures.
It contains ACID properties which maintain data in a healthy state in case of failure.
It can reduce the complex relationship between data.
It is used to support manipulation and processing of data.
It is used to provide security of data.
It can view the database from different viewpoints according to the requirements of the user.
Advantages of DBMS
Controls database redundancy: It can control data redundancy because it stores all the data in one
single database file and that recorded data is placed in the database.
Data sharing: In DBMS, the authorized users of an organization can share the data among multiple
users.
Easily Maintenance: It can be easily maintainable due to the centralized nature of the database system.
Reduce time: It reduces development time and maintenance need.
Backup: It provides backup and recovery subsystems which create automatic backup of data from
hardware and software failures and restores the data if required.
Multiple user interface: It provides different types of user interfaces like graphical user interfaces,
application program interfaces
Disadvantages of DBMS
Cost of Hardware and Software: It requires a high speed of data processor and large memory size to run
DBMS software.
Size: It occupies a large space of disks and large memory to run them efficiently.
Complexity: Database system creates additional complexity and requirements.
Higher impact of failure: Failure is highly impacted the database because in most of the organization, all
the data stored in a single database and if the database is damaged due to electric failure or database
corruption then the data may be lost forever
DBMS vs. File System
File System Approach
File based systems were an early attempt to computerize the manual system. It is also called a
traditional based approach in which a decentralized approach was taken where each department stored
and controlled its own data with the help of a data processing specialist. The main role of a data
processing specialist was to create the necessary computer file structures, and also manage the data
within structures and design some application programs that create reports based on file data
DBMS:
A database approach is a well-organized collection of data that are related in a meaningful way which
can be accessed by different users but stored only once in a system. The various operations performed
by the DBMS system are: Insertion, deletion, selection, sorting etc.
DBMS Architecture
The DBMS design depends upon its architecture. The basic client/server architecture is used to deal with
a large number of PCs, web servers, database servers and other components that are connected with
networks.
The client/server architecture consists of many PCs and a workstation which are connected via the
network.
DBMS architecture depends upon how users are connected to the database to get their request done.
Types of DBMS Architecture
Database architecture can be seen as a single tier or multi-tier. But logically, database architecture is of
two types like: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
In this architecture, the database is directly available to the user. It means the user can directly sit on
the DBMS and uses it.
Any changes done here will directly be done on the database itself. It doesn’t provide a handy tool for
end users.
The 1-Tier architecture is used for development of the local application, where programmers can
directly communicate with the database for the quick response.
2-Tier Architecture
The 2-Tier architecture is same as basic client-server. In the two-tier architecture, applications on the
client end can directly communicate with the database at the server side. For this interaction, API’s like:
ODBC, JDBC are used.
The user interfaces and application programs are run on the client-side.
The server side is responsible to provide the functionalities like: query processing and transaction
management.
To communicate with the DBMS, client-side application establishes a connection with the server side.
3-Tier Architecture
The 3-Tier architecture contains another layer between the client and server. In this architecture, client
can’t directly communicate with the server.
The application on the client-end interacts with an application server which further communicates with
the database system.
End user has no idea about the existence of the database beyond the application server. The database
also has no idea about any other user beyond the application.
The 3-Tier architecture is used in case of large web application.
Data model Schema and Instance
The data which is stored in the database at a particular moment of time is called an instance of the
database.
The overall design of a database is called schema.
A database schema is the skeleton structure of the database. It represents the logical view of the entire
database.
A schema contains schema objects like table, foreign key, primary key, views, columns, data types,
stored procedure, etc.
A database schema can be represented by using the visual diagram. That diagram shows the database
objects and relationship with each other.
A database schema is designed by the database designers to help programmers whose software will
interact with the database. The process of database creation is called data modeling.
A schema diagram can display only some aspects of a schema like the name of record type, data type,
and constraints. Other aspects can’t be specified through the schema diagram. For example, the given
figure neither show the data type of each data item nor the relationship among various files.
In the database, actual data changes quite frequently. For example, in the given figure, the database
changes whenever we add a new grade or add a student. The data at a particular moment of time is
called the instance of the database.
Data Independence
Data independence can be explained using the three-schema architecture.
Data independence refers characteristic of being able to modify the schema at one level of the database
system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
Logical data independence refers characteristic of being able to change the conceptual schema without
having to change the external schema.
Logical data independence is used to separate the external level from the conceptual view.
If we do any changes in the conceptual view of the data, then the user view of the data would not be
affected.
Logical data independence occurs at the user interface level.
2. Physical Data Independence
Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
If we do any changes in the storage size of the database system server, then the Conceptual structure of
the database will not be affected.
Physical data independence is used to separate conceptual levels from the internal levels.
Physical data independence occurs at the logical interface level
Database Language
A DBMS has appropriate languages and interfaces to express database queries and updates.
Database languages can be used to read, store and update the data in the database.
Types of Database Language
Data Definition Language
o DDL stands for Data Definition Language. It is used to define database
structure or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the
number of tables and schemas, their names, indexes, columns in each table,
constraints, etc.
Here are some tasks that come under DDL
Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to comment on the data dictionary.
These commands are used to update the database schema that’s why they
come under Data definition language.
Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.
Here are some tasks that come under DML:
Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete all records from a table.
Merge: It performs UPSERT operation, i.e., insert or update operations.
Call: It is used to call a structured query language or a Java subprogram.
Explain Plan: It has the parameter of explaining data.
Lock Table: It controls concurrency.
3. Data Control Language
DCL stands for Data Control Language. It is used to retrieve the stored or
saved data.
The DCL execution is transactional. It also has rollback parameters.
Here are some tasks that come under DCL:
Grant: It is used to give user access privileges to a database.
Revoke: It is used to take back permissions from the user.
There are the following operations which have the authorization of Revoke:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
Transaction Control Language
TCL is used to run the changes made by the DML statement. TCL can be
grouped into a logical transaction.
Here are some tasks that come under TCL:
Commit: It is used to save the transaction on the database.
Rollback: It is used to restore the database to original since the last Commit.
Keys
Keys play an important role in the relational database.
It is used to uniquely identify any record or row of data from the table. It is
also used to establish and identify relationships between tables.
Types of key:
Primary Key
It is the first key which is used to identify one and only one instance of an
entity uniquely. An entity can contain multiple keys as we saw in PERSON
table. The key which is most suitable from those lists become a primary key.
In the EMPLOYEE table, ID can be primary key since it is unique for each
employee. In the EMPLOYEE table, we can even select License_Number and
Passport_Number as primary key since they are also unique.
For each entity, selection of the primary key is based on requirement and
developers.
Candidate Key
A candidate key is an attribute or set of an attribute which can uniquely identify a
tuple.
The remaining attributes except for primary key are considered as a candidate key.
The candidate keys are as strong as the primary key.
Super Key
Super key is a set of an attribute which can uniquely identify a tuple. Super
key is a superset of a candidate key.
Foreign key
Foreign keys are the column of the table which is used to point to the primary
key of another table.
In a company, every employee works in a specific department, and employee
and department are two different entities. So we can’t store the information
of the department in the employee table. That’s why we link these two tables
through the primary key of one table.
We add the primary key of the DEPARTMENT table, Department_Id as a new
attribute in the EMPLOYEE table.
Now in the EMPLOYEE table, Department_Id is the foreign key, and both the
tables are related.
Generalization
Generalization is like a bottom-up approach in which two or more entities of
lower level combine to form a higher level entity if they have some attributes
in common.
In generalization, an entity of a higher level can also combine with the entities
of the lower level to form a further higher level entity.
Generalization is more like subclass and superclass system, but the only
difference is the approach. Generalization uses the bottom-up approach.
In generalization, entities are combined to form a more generalized entity,
i.e., subclasses are combined to make a superclass.
Specialization
Specialization is a top-down approach, and it is opposite to Generalization. In
specialization, one higher level entity can be broken down into two lower level
entities.
Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
Normally, the superclass is defined first, the subclass and its related attributes
are defined next, and relationship set are then added.
Aggregation
In aggregation, the relation between two entities is treated as a single entity.
In aggregation, relationship with its corresponding entities is aggregated into
a higher level entity.
Normalization
Normalization is the process of organizing the data in the database.
Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them
using relationship.
The normal form is used to reduce redundancy from the database table.
Types of Normal Forms
First Normal Form (1NF)
A relation will be 1NF if it contains an atomic value.
It states that an attribute of a table cannot hold multiple values. It must hold
only single-valued attribute.
First normal form disallows the multi-valued attribute, composite attribute,
and their combinations.
Second Normal Form (2NF)
In the 2NF, relational must be in 1NF.
In the second normal form, all non-key attributes are fully functional
dependent on the primary key
Third Normal Form (3NF)
A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.
3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.
If there is no transitive dependency for non-prime attributes, then the relation
must be in third normal form.
A relation is in third normal form if it holds atleast one of the following
conditions for every non-trivial function dependency X → Y.
X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Fourth normal form (4NF)
A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
valued dependency.
For a dependency A → B, if for a single value of A, multiple values of B exists,
then the relation will be a multi-valued dependency.
Fifth normal form (5NF)
A relation is in 5NF if it is in 4NF and not contains any join dependency and
joining should be lossless.
5NF is satisfied when all the tables are broken into as many tables as possible
in order to avoid redundancy.
5NF is also known as Project-join normal form (PJ/NF).
Boyce Codd normal form (BCNF)
BCNF is the advance version of 3NF. It is stricter than 3NF.
A table is in BCNF if every functional dependency X → Y, X is the super key of
the table.
For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
SQL
SQL stands for Structured Query Language. It is used for storing and
managing data in relational database management system (RDMS).
It is a standard language for Relational Database System. It enables a user to
create, read, update and delete relational databases and tables.
All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use
SQL as their standard database language.
SQL allows users to query the database in a number of ways, using English-
like statements.
Rules:
SQL follows the following rules:
Structure query language is not case sensitive. Generally, keywords of SQL
are written in uppercase.
Statements of SQL are dependent on text lines. We can use a single SQL
statement on one or multiple text line.
Using the SQL statements, you can perform most of the actions in a database.
SQL depends on tuple relational calculus and relational algebra.
SQL process
When an SQL command is executing for any RDBMS, then the system figure
out the best way to carry out the request and the SQL engine determines that
how to interpret the task.
In the process, various components are included. These components can be
optimization Engine, Query engine, Query dispatcher, classic, etc.
All the non-SQL queries are handled by the classic query engine, but SQL
query engine won’t handle logical files.
Characteristics of SQL
SQL is easy to learn.
SQL is used to access data from relational database management systems.
SQL can execute queries against the database.
SQL is used to describe the data.
SQL is used to define the data in the database and manipulate it when needed.
SQL is used to create and drop the database and table.
SQL is used to create a view, stored procedure, function in a database.
SQL allows users to set permissions on tables, procedures, and views.
Advantages of SQL
There are the following advantages of SQL:
High speed
Using the SQL queries, the user can quickly and efficiently retrieve a large
amount of records from a database.
No coding needed
In the standard SQL, it is very easy to manage the database system. It doesn’t
require a substantial amount of code to manage the database system.
Well defined standards
Long established are used by the SQL databases that are being used by ISO
and ANSI.
Portability
SQL can be used in laptop, PCs, server and even some mobile phones.
Interactive language
SQL is a domain language used to communicate with the database. It is also
used to receive answers to the complex questions in seconds.
Multiple data view
Using the SQL language, the users can make different views of the database
structure.