1 DMS(313302) CM-3-K
Unit - I
Introduction To Database System
1.1 Database concepts:-Data, Database, Database management system,
File system Vs DBMS, Applications of DBMS, Data Abstraction, Data
Independence, Database Schema, The Codd’s rules, Overall
structure of DBMS.
Concept of Data, Database, DBMS:
Data -Data is the information which has been translated into a form that is more
convenient to process or move.
Database- The collection of related data is termed as Database which is organized in
such a way that it can be easily retrieved and managed.
A Database Management System (DBMS) is System software which manages the data.
It can perform various tasks like creation, retrieval, insertion, modıfication and deletion
of data to manage it in a systematic way as per requirement.
File Processing System
File Processing System is a Computer based system in which all the information is stored
in various computer files.
It is useful but as the requirement of data processing and the size of data increases,
the drawback of systems comes.
Disadvantages of Traditional File Processing System
1. Data Redundancy
Sometimes as per requirement same data may be stored in multiple files.
Consider an employee having record in both Employee and Team files. The name and
address of employee is stored in both of these files.
Means the data get duplicated. If such data increases, it leads to higher storage
and access cost. This duplication of data in various files is termed as data redundancy.
In traditional file system, it is very difficult to avoid this data redundancy.
2. Data inconsistency
When data is to be updated the data redundancy may lead to data inconsistency.
Data inconsistency occurs when data is not updated in all the files simultaneously.
For example if the designation of employee get changed, then the respective
changes should be made in both Employee and team file. If for some reason, it is not
done, then it leads to data inconsistency.
Because for the sample employee, we may get different information which may
create problems in the processing of data.
3. Limited Data Sharing
It is difficult to share data in traditional file system. Each application has its own private
files and users have little choice to share the data outside their own applications.
To share data, we have to write complex programs.
4. Difficulty in Accessing Data
The need of data access varies time to time. Means different types of information is
needed at different situations.
For example just consider that we want to retrieve the data of employees who do not
have taken any leave throughout the quarter. In such case we have two options.
We can access the data by manual method or we have to write an application
program to retrieve such customized data, both the options are not convenient as both of
them leads to wastage of time.
If we do it, then also it may be possible that after some time we may require data
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
2 DMS(313302) CM-3-K
with some another filter criteria. The data retrieval for customized information becomes
difficult because the conventional files system does not provide any Efficient and
convenient way to retrieve the data
5. Data Dependence
In the files, data is stored in some specific format tab, semicolon or comma. If the
format of any of the file is changed, then we have to make changes in program which
processes the file.
But sometimes there may be many programs related with the same file. In such
case changes in all such programs should be done. Missing changes in single program
may lead to failure of whole application.
6. Poor Data Control
The Traditional File System does not have centralized data control, the data is de-
centralized or distributed. In this system the same field may have different names in files
of different departments of an organization.
This situation may lead to different meaning of same data field in different context or
same meaning for different fields. This cause poor data control.
7. Problem of Security
It is very difficult to enforce security checks and access rights in a traditional file
system. To the file we can set password protection. But what if we have to give access to
only few records in the file? For example, in our database system, the project manager
should be able to see all the data regarding teams under him The team leader should be
able to see data about his specific team. But payment details of one project manager
should not be accessible to his team members or any another project manager. In the
conventional file processing system, the the application programs are added in ad hoc
manner (for specific purpose) which makes it difficult to enforce security constraints.
8. Concurrency Problems
Concurrency means access of same data by multiple users at the same time. This is very
important aspect as it leads to increase in performance of a system and faster response.
Many advanced systems allow the concurrent access and manipulation of data. For
example in our system, consider a record of an employee is accessed and updated by
multiple users simultaneously at a time. This may lead to inconsistency of data, if the
concurrency is not controlled in a proper manner. In another example if multiple
transactions are make updations on a same bank account, then it may show incorrect
balance, if any other transactions try to access balance amount in between. It is very
difficult to implement concurrency control mechanism on file processing system, which
leads to incorrect or wrong data retrieval.
9. Poor Data Modeling of Real World
It is difficult for File Processing System to represent the complex data and interfile
relationships. This results in poor data modelling properties. That means the real world
applications are difficult to implement using File Processing System.
10. Data isolation
It is difficult to store the entire data in a single file. It is distributed in different files as per
the category. These files may be in different formats because of which it becomes
difficult to write application programs to access the desired data from these files.
11. Integrity Problems
Every enterprise has its own constraints while maintain data in the files. Suppose in
employee files the employee ID must start with 'E. Such constraints can be added while
writing application programs. But later on if any new constraints are introduced by the
enterprise, then it becomes difficult to add these constraints again. The File processing
system does not provide any functionality to handle this situation.
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
3 DMS(313302) CM-3-K
12. Atomicity Problem
Failure in a computer system may occur any time. When failure occurs, if any transaction
is in its midway then it may lead to some incorrect data updation in the system.
Consider another example of bank transaction where some amount is transferred from
account A to account B. Initially the balance from account A is accessed and debited by
Rs. 1000. Then we are going to credit it in account B. But before that system crash occurs
which halts the transaction.
Now this situation leads to incorrect data updation in the balance of account A. In file
processing system, it very difficult to handle such situation to maintain the atomicity of
database. The purpose of Database Management System is to solve all these problems
and give functionality to store and manage data in efficient and convenient way.
Advantages of DBMS over File Processing System
1. Controlling Data Redundancy
In File Processing System the different applications has separate files for data storage. In
this case, the duplicated copies of the same data are created at many places. In DBMS, all
the data of an organization is integrated into a single database. The data is recorded at
only one place in the database and it is not duplicated. For example, the Employee file
and the Team file contain several items that are identical. When they are converted into
database, the data is integrated into a single database so that multiple copies of the same
data are reduced to-single copy. Controlling the data redundancy helps to save storage
space. Similarly, it is useful for retrieving data from database using queries.
2. Data Consistency
The data consistency is obtained by controlling the data redundancy, If a data item
appears only once, any update to its value has to be performed only once and the
updated value is immediately available to all users.
For example if there is change in designation of employee, then the changes are made in
single centralized file which is
available to all the users.
3. Sharing of Data
In DBMS, data can be easily shared by different applications. The database administrator
manages the data and gives rights to users to access the data.
Multiple users can be authorized to access the same data simultaneously. The remote
users can also share same data.
4. Data Independence
In DBMS we can completely separate the data structure of database and programs or
applications which are used to access the data. This is called as data independence. If any
changes are made in structure of database then there is no need to make changes in the
programs. For example you can modify the size or data type of a data items (fields of a
database table) without making any change in application program.
5. Data Control
The DBMS provides centralized data storage. Hence keeping control on data is very much
easy as compared to Traditional File Processing System. As data is common for all the
application, no possibility of any confusion or complication.
6. Security
In DBMS the different users can have different levels of access to data based on their
roles. In the college database, students will have access to their own data only, while
their teachers will have access to data of all the students whom they are teaching. Class
teacher will be able to see the reports of all the students in that class, but not other
classes. The principal will have access to entire data.
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
4 DMS(313302) CM-3-K
Similarly, in a banking system, individual operator and clerk will have limited access to
the data while the bank manager can access the entire data. All these levels of security
and access are not allowed in file system.
7. Control over Concurrency
In a computer file-based system, if multiple users are accessing data simultaneously, it is
possible that it may lead to some irrelevant data generation. For example, if update
operation is executed by both the users on the same record, then value updated by one
may get overwrite by other. Mostly the database management systems have sub-systems
to control the concurrency so that accuracy is maintained in transaction recording.
8. Data Modeling of Real World
The DBMS has many functionalities are provided to represent the complex data and
interfile relationships. This helps to map the database with real world applications.
Disadvantages of DBMS
1. Increased costs
2. Complexity
3. Size
4. Frequent upgrade/replacement cycle
5. Higher impact of a failure
6. Performance
Application of DBMS
1. Telecom
2. Banking
3. Industry
4. E-Commerce
5. Airlines
6. Education Systems
7. Railway Reservation System
8. Library Management System
9. Social Media Sites
Three Level Architecture for Database system
A database system is collection of related data and system software which
manages the data. The data is generally stored in a detailed and complex manner. It is
important to provide an abstract view of data to the user.
Data Abstraction
Extracting the important data by ignoring the remaining irrelevant details is
known as abstraction. This process of hiding irrelevant details from user is called data
abstraction. The complexity of database can be hiding from user by using different level
of abstraction.
Different Levels of Data Abstraction
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
5 DMS(313302) CM-3-K
Physical Level –This is the lowest level in the three level architecture. The physical level
describes how data is actually stored in the database. In the lowest level, this data is
stored in the external hard drives like hard disk, magnetic tapes etc.
Logical Level- This is the next higher level of abstraction which is used to describe what
data the database stores, and what relationships exist in between the data items.
Database administrators use the logical level of abstraction to decide what information
to keep in a database.
View Level- It is the highest level of data abstraction. This level describes the user
interaction with database system. End user interacts with system with the help of GUI
and enters the details at the screen at view level. User is not aware of how the data is
stored and what data is stored; such details are hidden from them.
Data Abstraction: Different levels of Data abstraction, Instance and
Schema, Data Independence
Instance and Schema
Instance-The data is stored in the database at particular moment is called as instance of
the database.
Schema- The design of a database is called the schema.
Types of
Schema
Physical Logical View
Schema Schema Schema
Types of Schema
Physical Schema-is the design of database at physical level i.e. How the data
stored in the blocks of storage in this level.
Logical Schema –is the design of database at logical level developers and
database administrator work at this level.
View Schema –refers to design of database at view level. This usually describes
the end user interaction with database system.
Types of Data
Independence
Physical Data Logical Data
Independence Independence
Data Independence
The ability to modify schema definition in one level without affecting schema definition
in the next higher level is called data independence.
Physical Data Independence –It is ability to modify the physical schema without
causing application programs to be rewritten. we change the physical storage/level
without affecting the conceptual or external view of the data
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
6 DMS(313302) CM-3-K
Logical Data Independence - It is ability to modify the logical schema without causing
application programs to be rewritten. Modification at this level are necessary whenever
the logical structure of the database is altered.
Overall Structure of DBMS
DDL Compiler: It converts the DDL commands into set of table containing metadata
stored in data dictionary.
DML Compiler: It receives the DML commands from application program and converts
DML commands into object code for understanding of database.
Query Optimizer: It optimized the object code to execute query in best way and then
send to store data manager.
Storage Data Manager: It is a program module which is responsible for storing,
retrieving and updating data in the database. It receives the request from query
optimizer to machine understandable form.it makes the actual request inside the
database.
Data Files: It stores the database itself.
Compiled DML: The DML Compiler converts the high level Queries into low level file
access commands known as Compiled DML.
Data Dictionary: It stores metadata, i.e. file, storage , attributes , access path , etc.
E.F. Codd’s Rules for RDBMS
Rule 0: Foundation Rule
Foundation rule states that the system must be capable to manage their database
systems through their relational capabilities. All other twelve rules are derived from this
foundation rule. Remaining twelve rule are as follows:
Rule 1: Information Representation
All the information in the database must be stored in standard form of
tables.Table is considered as best format to store and manage the data.
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
7 DMS(313302) CM-3-K
Rule 2: Systematic treatment of null values
In a database the NULL values must be given a systematic and uniform treatment.
In number of cases we may have to set null values on place of data. For Example, data is
missing, data is not known, or data is not applicable. Null Value is different from empty
character or string, string of a blank character, and it is also different from zero value or
any number.
Rule 3: The guaranteed access rule
In a data base every single data element must be accessible logically with a
combination of table-name, primary-key (row value), and attribute-name (column
value). There should not be requirement of any other entity to access
the data.
Rule 4: Active online catalog
The description of the structure of entire database must be stored in an online
catalog, known as data dictionary. Data dictionary can be accessed by authorized users.
Metadata should be maintained for all the data in the database.
The system must support an online, inline, relational catalog that is accessible to
authorized users by means of their regular query language. That is nothing but, just like
ordinary data the database description is represented at the logical level, so that is
possible for authorized users to apply the same language of regular data to its
interrogation.
Rule 5: The comprehensive data sub language rule
Database should not be directly accessible. It should always be accessible by using
some strong query language. This rule illustrates that the system should support at least
one relational language. The language -
(a) Has a linear syntax
(b) Can be used both interactively and within application programs (c) Supports
data definition operations (including view definitions), data manipulation
operations (update as well as retrieval), security and integrity constraints, and
transaction management operations (begin, commit, and rollback).
Rule 6: View updating rule
Views are the virtual tables created using queries to show the partial or complete
view of the table. All views that are theoretically updatable must also be updatable by
the system The rule states that we should be able to make changes in views.
Rule 7: High-level Insert, Update, and Delete Rule
High Level means multiple rows from multiple columns are affected by the single
query. This rule states that a database must support insertion, updation, and deletion.
This must not be limited for a single row, that is, it must also support union, minus and
intersection operations to yield sets of data records.
Example
Suppose employees got 5% hike in a year. Then their salary has to be updated to
reflect the new salary. Since this is the annual hike given to the employees, this increment
is applicable for all the employees. Hence, the query should not be written for updating
the salary one by one for thousands of employee. A single query should be strong enough
to update the entire employee's salary at a time.
Rule 8: Physical data independence
Changes in the physical level (i.e. format of data storage or container of data like
array or linked list) must not require any change to an application.
Rule 9: Logical data independence
The user's view (application) should not be dependent upon logical data in a
database. That means changes in logical data must not affect the applications which uses
it. For example, if two tables are merged or one is split into two different tables, there
should be no impact or change on the user application.
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
8 DMS(313302) CM-3-K
Rule 10: Integrity independence
The integrity constraints must be specified independently from application
programs and stored in the catalog. We should be able to make changes in integrity
constraints independently without the need of any change in the application.
Rule 11: Distribution independence
The distribution of database at various locations should not be visible to end user. Users
should always get the impression that the data is located at one site only. That means
even though the data is distributed, it should not affect the speed of access and
performance of data compared to centralized database.
Rule 12: The non-subversion rule
If a relational system has a low-level (single-record-at-a-time) language, that low
level cannot be used to subvert or bypass the integrity rules and constraints expressed in
the higher level relational language.
Means anyhow those integrity rules and constraints must be followed, we cannot
violet them by using any back door option.
1.2 Architecture: Two tier and Three tier architecture of database.
What is Database Architecture?
DBMS architecture helps in design, development, implementation, and
maintenance of a database. A database stores critical information for a business.
Selecting the correct Database Architecture helps in quick and secure access to this data.
• 1 tier Architecture
• 2-tier Architecture
• 3-tier Architecture
1 tier Architecture
The simplest of Database Architecture are 1 tier where the Client, Server, and Database
all reside on the same machine. Anytime you install a DB in your system and access it to
practise SQL queries it is 1 tier architecture. But such architecture is rarely used in
production.
2-tier Architecture
A two-tier architecture is a database architecture where
1. Presentation layer runs on a client (PC, Mobile, Tablet, etc)
2. Data is stored on a Server.
An application interface which is called ODBC (Open Database Connectivity) an API
which allows the client-side program to call the DBMS. Today most of the DBMS offers
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
9 DMS(313302) CM-3-K
ODBC drivers for their DBMS. 2 tier architecture provides added security to the DBMS as
it is not exposed to the end user directly.
Example -MS- Access. Standalone Applications Library Management Systems
In the above 2-teir architecture we can see that one server is connected with clients 1,
2m and 3. This architecture provides Direct and faster communication.
3 tier Architecture
3-tier schema is an extension of the 2-tier architecture. 3-tier architecture has following
layers
1. Presentation layer (your PC, Tablet, Mobile, etc.)
2. Application layer (server)
3. Database Server
This DBMS architecture contains an Application layer between the user and the DBMS,
which is responsible for communicating the user's request to the DBMS system and send
the response from the DBMS to the user.
The application layer(business logic layer) also processes functional logic, constraint,
and rules before passing data to the user or down to the DBMS.
Three tier architecture is the most popular DBMS architecture.
The goal of Three-teir architecture is:
To separate the user applications and physical database
Proposed to support DBMS characteristics
Program-data independence
Support of multiple views of the data
Example of Three-teir Architecture
E-commerce Websites
Online Banking Systems
Learning Management Systems (LMS)
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
10 DMS(313302) CM-3-K
1.3 Data Models:- Hierarchical, Networking, Relational Data Models
Data Models
The process of analysis of data object and their relationships to other data objects
is known as data modeling. t is the conceptual representation of data in database. It is
the first step in database designing. Data models define how data is connected to each
other and how they are processed and stored inside the system. A data model provides a
way to describe the design of a database at the physical, logical and view levels.
Types Data Models
1.Record Based Logical Model/Relational Model-
Developed by E.F.Codd. Tables are used and also known as relations.
Record are known as tuples and fields are known as attributes.
Key-Every record must have a unique identification or key.
Advantages-
Support SQL
Flexible
Stud_ID Stud_Name DOB
101 Prajakta 03/03/1995
102 Rakesh 13/01/1996
103 Rahul 16/08/1995
2. Hierarchical Model
➢ Organized into a tree structure.
➢ Parent-child relationship.
➢ Data is stored in the form of Record.
➢ Record is collection of fields and it contains only one value.
➢ One parent can have many child nodes but one child node can have only one
parent node.
Advantages-
Simple to Understand.
Database Integrity
Efficient
B C
D E F
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke
11 DMS(313302) CM-3-K
3.Network Database Model
➢ Extended type of Hierarchical data Model but any child can have multiple
parent.
➢ No need of Parent-child relationship.
➢ Allows multiple Records linked in same file.
➢ Advantages-
1. Design is Simple.
2. Capability to handle various relationship.
3. Easy to access.
S.K.B.P. Polytechnic, Computer Technology Prof.D.R.Cholke