DBMS Unit1
DBMS Unit1
Data:
Facts that can be recorded.
It is raw or unprocessed form.
You cannot take decision based on data.
Data has no meaning.
Example(s):
Text, numbers, images, videos etc….
25, Karthik, Karimnagar.
Processing
Information
Example(s):
1
Database (db): Collection of similar or related data or
The related information when placed in an organized form makes a database or
organized collection of related information is known as database.
Example:
Various types of databases can be Traditional databases (text & numbers), Multimedia
databases (audio, video, movies, speech), Geographically Information System (satellite
images), Real time databases (store).
Data ware house: Data ware house is a kind of database in which the data is going to be
very huge and historical.
Example: Data stored about past 100 years of a company .stock market rates.
***Good decisions require good information that is derived from raw facts (Data).
Database Management systems: It is the software system that allows the user to define
and maintain the database and provide control access to the database or
DBMS is a collection of interrelated data and a set of programs to access data. The
primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.
2
Summary:
Functionalities:
1. Define: Specifying the data type, structure and constraints for the data to be
stored.
2. Construct: Process of storing data on some storage medium.
3. Manipulate: Querying the database to retrieve specific data, updating database
and generating reports.
4. Share: Allows multiple users and programs to access the database concurrently.
3
Database System Application:
A Historical Perspective:
4
1960s:
Charles Bachman designed first general – purpose DBMS, was called integrated
data store.
IDS formed the basis for the network data model.
Late 1960s:
1970:
1980s:
The relational model consolidated its position as the dominant DBMS paradigm,
and database systems continued to gain widespread use,
The SQL query language for relational databases, developed as part of IBM’s
project, is now the standard query language.
Several vendors (e.g IBMS’s DB2, Oracle 8, Informix) extended their systems
with the ability to store new data types such as images, videos and to ask more
complex queries.
Specialized systems have been developed by numerous vendors for creating data
warehouses.
Present era:
DBMS are entered the Internet age. While the first generation of websites stored
their data in operating system files, the use of a database accessed through a web
browser is becoming wide spread.
Queries are generated through web-accessible forms and answers are formatted
using a markup language such as HTML.
Database Management Systems
5
File Systems vs a DBMS
1. Data Redundancy:
There are no methods to validate the insertion of duplicate data in the system. Any user
can enter any data. File system doesn’t validate for the kind of data being entered nor
doesn’t validate for previous existence of the same data in the same file. Duplicate data in
the system is not appreciated as it is a waste of space and always lead to confusion and
misleading of data. When there are duplicate data in the file and if we need to update or
delete the record, we might end up in updating / deleting one of the records leaving the
other record in the file.
2. Data inconsistency:
For Example student and student_report files have student’s address in it, and there was a
change request for one particular student’s address. The program search only student file
for the address and updated it correctly. There is another program which prints the
student’s report and mails it to the address mentioned in the student_report file. There is a
mismatch in the actual address and his report is sent to his old address. This mismatch in
different copies of same data is called data inconsistency.
3. Data Isolation:
Imagine we have to generate a single report of student, who is studying in particular
class, his study report, his library book details and hostel information. All these
information are stored in different files. How do we get all these details in one report?
We have to write a program. But before writing the program , the programmer should
find out which all files have the information needed, what is the format of each file, how
to search data in each file etc. Once all these analysis is done, he writes a program. If
there is 2-3 files involved, programming would be bit simple. Imagine if there is lot many
files involved in it? It would require lot of effort for the programmer. Since all the data
are isolated from each other in different files, programming becomes difficult.
4. Security:
Each file can be password protected. But what if we have to give access to only few
records in the file? For Example, user has to be given access to view only their bank
account details in the file. This is very difficult in the file system.
1. Hierarchical Model.
2. Network Model
3. Entity – Relationship Model
4. Relational Model
Hierarchical Model:
This database model organizes data into a tree-like-structure, with a single root, to which all the
other data is linked. The hierarchy starts from the Root data, and expands like a tree, adding
child nodes to the parent nodes. In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships like index of a book, recipes etc.
It has one-to-many relationship between two different types of data, for example, one department
can have many courses, many professors and many students.
This is an extension of the Hierarchical model. In this model data is organized more like a graph,
and are allowed to have more than one parent node.
In this database model data is more related as more relationships are established in this database
model. Also, as the data is more related, hence accessing the data is also easier and fast. This
database model was used to map many-to-many data relationships.
This model is good to design a database, which can then be turned into tables in relational model.
Let's take an example, If we have to design a School Database, then Student will be an entity
with attributes name, age, address etc. As Address is generally complex, it can be another
entity with attributes street name, pincode, city etc, and there will be a relationship between
them.
To simplify user's interaction with the system, the complexity is hidden from the database users
through several levels of abstraction.
View Level:
Example: If we have a login-id and password in a university system, then as a student, we can
view our marks, attendance, fee structure, etc. But the faculty of the university will have a
Example: Let us take an example where we use the relational model for storing the data. We
have to store the data of a student, the columns in the student table will be student_name, age,
mail_id, roll_no etc. We have to define all these at this level while we are creating the database.
Though the data is stored in the database but the structure of the tables like the student table,
teacher table, books table, etc are defined here in the conceptual level or logical level. Also, how
the tables are related to each other is defined here. Overall, we can say that we are creating a
blueprint of the data at the conceptual level.
Physical Level:
It tells the actual location of the data that is being stored by the user. The Database
Administrators (DBA) decides that which data should be kept at which particular disk drive, how
the data has to be fragmented, where it has to be stored etc. They decide if the data has to be
centralized or distributed. Though we see the data in the form of tables at view level the data
Data Independence
Data independence is ability to modify a schema definition in one level without affecting a
schema definition in the next higher level. or Capacity of changing the schema at one level
without effecting the another level.
Physical Data Independence refers to the characteristic of changing the physical level without
affecting the logical level or conceptual level. Using this property we can easily change the
storage device of the database without affecting the logical schema.
Due to Physical independence, any of the below change will not affect the conceptual layer.
It refers to the characteristics of changing the logical level without affecting the external or
view level. This also helps in separating the logical level from the view level. If we do any
changes in the logical level then the user view of the data remains unaffected. The changes in
the logical level are required whenever there is a change in the logical structure of the
database.
Due to Logical independence, any of the below change will not affect the external layer.
Application programmers:
As its name shows, application programmers are the one who writes application programs that
uses the database. These application programs are written in programming languages like PHP,
Java etc. These programs meet the user requirement and made according to user requirements.
Retrieving information, creating new information and changing existing information is done by
these application programs.
Sophisticated Users:
They are database developers, who write SQL queries to select/insert/delete/update data. They do
not use any application or programs to request the database. They directly interact with the
database by means of query language like SQL. These users will be scientists, engineers,
analysts who thoroughly study SQL and DBMS to apply the concepts in their requirement.
Database Administrators:
The life cycle of database starts from designing, implementing to administration of it. A database
for any kind of requirement needs to be designed perfectly so that it should work without any
issues. Once all the design is complete, it needs to be installed. Once this step is complete, users
start using the database. The database grows as the data grows in the database. When the
database becomes huge, its performance comes down. Also accessing the data from the database
becomes challenge. There will be unused memory in database, making the memory inevitably
huge. These administration and maintenance of database is taken care by database Administrator
(DBA).A DBA has many responsibilities. A good performing database is in the hands of DBA.
DBA is responsible for installing a new DBMS server for the new projects. He is also
responsible for upgrading these servers as there are new versions comes in the market or
requirement. If there is any failure in up gradation of the existing servers, he should be
able revert the new changes back to the older version, thus maintaining the DBMS
working.
Performance tuning:
Since database is huge and it will have lots of tables, data, constraints and indices, there
will be variations in the performance from time to time. Also, because of some designing
issues or data growth, the database will not work as expected. It is responsibility of the
DBA to tune the database performance. He is responsible to make sure all the queries and
programs works in fraction of seconds.
Sometimes, users using oracle would like to shift to SQL server or MySQL. It is the
responsibility of DBA to make sure that migration happens without any failure, and there
is no data loss.
Proper backup and recovery programs needs to be developed by DBA and has to be
maintained him. This is one of the main responsibilities of DBA. Data/objects should be
backed up regularly so that if there is any crash, it should be recovered without much
effort and data loss.
Security:
DBA is responsible for creating various database users and roles, and giving them
different levels of access rights.
DBA should be properly documenting all his activities so that if he quits or any new
DBA comes in, he should be able to understand the database without any effort. He
should basically maintain all his installation, backup, recovery, security methods. He
should keep various reports about database performance.
Data Interpreter: It interprets DDL statements and records them in tables containing metadata.
DML Compiler: The DML commands such as insert, update, delete, retrieve from the
application program are sent to the DML compiler for compilation into object code for database
access. The object code is then optimized in the best way to execute a query by the query
optimizer and then send to the data manager.
Query Evaluation Engine: It executes low-level instructions generated by the DML compiler.
Buffer Manager: It is responsible for retrieving data from disk storage into main memory. It
enables the database to handle data sizes that are much larger than the size of main memory.
File Manager: It manages the allocation of space on the disk storage and the data structure used
to represent information stored on disk.
Authorization and Integrity Manager: Checks the authority of users to access data and
satisfaction of the integrity constraints.
Transaction Manager: It ensures that the database remains in a consistent state despite the
system failures and that concurrent transaction execution proceeds without conflicting.
Data dictionary: Data Dictionary, which stores metadata about the database, in particular the
schema of the database such as names of the tables, names of attributes of each table, length of
attributes, and number of rows in each table.
Indices: An index is a small table having two columns in which the 1 st column contains a copy
of the primary or candidate key of a table and the second column a set of pointers holding the
address of the disk block.
These statistics provide the optimizer with information about the state of the tables that will be
accessed by the SQL statement that is being optimized. The types of statistical information
stored in the system catalog include:
Information about tables including the total number of rows, information about
compression, and total number of pages.
Information about columns including number of discrete values for the column and the
distribution range of values stored in the column.
Information about table spaces including the number of active pages.
Current status of the index including whether an index exists or not, the organization of
the index (number of leaf pages and number of levels), the number of discrete values for
the index key, and whether the index is clustered.
Information about the table space and partitions.
1. Entity sets
2. Attributes
3. Relationship sets
Types of Entities:
Entity set: Entity set is a group of similar entities that share the same properties i.e
it represents schema / structure.
Symbol:
Student
Attributes: Attributes are the units that describe the characteristics / properties of
entities.
Types of Attributes:
Example: DOB
Name
Student
Example:
L_name F_name
Name
Student Roll_No
Single attribute: Single attribute can have only one value at an instance of time.
Represented by oval
Example:
DOB
Student
Multi-valued attribute: Multi-valued attribute can have more than one value at an
instance of time.
Represented by double oval.
Student
Stored attribute: Stored attribute is an attribute which are physically stored in the
database.
Example:
DOB
Student
Derived Attribute: Derived attribute are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in the
database.
Represented by dotted oval.
Example:
DOB
Student age
Example:
teaches Student
Teacher
Since
works Department
Employee
for
Case Study:
E1. . D1
E2. .
. D2
E3. .
E4. . D3
E5. .
. D4
E6.
N Works 1
employe Department
for
Figure - 1
1. Participation constraints
Total participation is represented using a double line between the entity set and
relationship set.
Example:
Works 1
Employee for
Department
Partial participation: It specifies that each entity in the entity set may or may not
participate in the relationship instance in that relationship set. It is also called as
optional participation.
Partial participation is represented using a single line between the entity set and
relationship set.
N
Works 1
employee for
Department
One to one
One to many
Many to one
Many to many
1. One to one:
Example: Every dept should have a hod and only one employee manages a
dept and an employee can manage only one dept.
E1. . D1
E2. .
. D2
E3. .
E4. . D3
E5. .
E6. . D4
N
Works 1
employee Department
for
1 N
Has
Dept Employee
Example: Every emp is supported to work at least one project and he can
work on many projects, a project is supported to have many emp and a
project should have at least one emp.
E1 . P1
.
E2 . .
. P2
E3 .
.
E4 . . P3
E5 . .
. P4
E6 . .
.
M N
Employee Works project
on
Roll No Name
Student
Strong entity: An entity type is strong if its existence does not depend on some
other entity type. Such entity is called Strong Entity.
Representation
Example:
Weak Entity:
A weak entity is an entity set that does not have sufficient attributes for Unique
Identification of its records. Simply a weak entity is nothing but an entity which
does not have a key attribute.
It contains a partial key called as discriminator which helps in identifying a
group of entities from the entity set.
Discriminator is represented by underlining with a dashed line.
Representation
Example:
Example:
If we have a person entity type who has attributes such as Name, Phone_no,
Address. Now, suppose we are making software for a shop then the person can be
Generalization:
It is the process of extracting shared characteristics from two or more classes and
combining them into a generalized super class, shared characteristics can be
attributes.
Reverse of Specialization
Bottom – up approach.
We can see that the three attributes i.e. Name, Phone, and Address are common
here. When we generalize these two entities, we form a new higher-level entity
type Person. The new entity type formed is a generalized entity. We can see in the
below E-R diagram that after generalization the new generalized entity Person
contains only the common attributes i.e. Name, Phone, and Address. Employee
entity contains only the specialized attribute like Employee_id and Salary.
Similarly, the Customer entity type contains only specialized attributes like
Customer_id, Credit, and Email. So from this example, it is also clear that when
we go from bottom to top, it is a Generalization and when we go from top to
bottom it is Specialization. Hence, we can say that Generalization is the reverse of
Specialization.
Representation:
Example:
Drawbacks:
After aggregation
While identifying the attributes of an entity set, it is sometimes not clear whether a
property should be modeled as an attribute or as an entity set. For example,
consider adding address information to the Employees entity set. One option is to
use an attribute address. This option is appropriate if we need to record only one
address per employee, an alternate is to create an entity set called Addresses and to
record associations between employees and address using a relationship (say,
Has_address). This more complex alternative is necessary in two situations.