DBS Unit 1
What is a Database?
A database is a collection of inter-related data, organized so that the data
can be retrieved, inserted, and deleted efficiently. The data is organized in
the form of tables, schemas, views, reports, etc.
For example: A college database organizes the data about the admin,
staff, students, and faculty.
Using the database, you can easily retrieve, insert, and delete this
information.
Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it
stores all the data in one single database instead of in separate,
duplicated copies.
o Data sharing: In a DBMS, the authorized users of an organization can share
the data among multiple users.
o Easy maintenance: It is easy to maintain due to the centralized
nature of the database system.
o Reduces time: It reduces development time and maintenance needs.
o Backup: It provides backup and recovery subsystems, which create
automatic backups of data to protect against hardware and software failures
and restore the data if required.
o Multiple user interfaces: It provides different types of user interfaces, such
as graphical user interfaces and application program interfaces.
Disadvantages of DBMS
o Cost of hardware and software: It requires a high-speed
processor and a large amount of memory to run the DBMS software.
o Size: It occupies a large amount of disk space and memory to run
efficiently.
o Complexity: A database system creates additional complexity and
requirements.
o Higher impact of failure: A failure has a high impact because,
in most organizations, all the data is stored in a single database. If the
database is damaged due to a power failure or database corruption, the
data may be lost forever.
What is Data?
Data is a collection of distinct small units of information. It can take a
variety of forms, such as text, numbers, media, and bytes, and it can be
stored on paper or in electronic memory.
The word 'data' originates from the word 'datum', which means 'a single piece
of information'. 'Data' is the plural of 'datum'.
What is Database?
A database is an organized collection of data, so that it can be easily
accessed and managed.
You can organize data into tables, rows, and columns, and index it to make it
easier to find relevant information.
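As a rough illustration of tables, rows, columns, and an index, here is a minimal sketch using Python's built-in sqlite3 module. The student table, its columns, and the sample rows are hypothetical and chosen only for this example.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# A table with three columns; each inserted tuple becomes one row.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "DBS"), (2, "Ben", "OS"), (3, "Chen", "DBS")],
)

# An index on the course column helps the DBMS find matching rows quickly.
conn.execute("CREATE INDEX idx_student_course ON student (course)")

dbs_students = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM student WHERE course = ? ORDER BY name", ("DBS",)
    )
]
```

With only three rows the index makes no visible difference; it starts to matter as the table grows.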
Database handlers create a database in such a way that a single set of
software programs provides access to the data for all users.
Many dynamic websites on the World Wide Web nowadays
are backed by databases. For example, a website that checks the
availability of rooms in a hotel is a dynamic website that
uses a database.
There are many database systems available, such as MySQL, Sybase, Oracle,
MongoDB, Informix, PostgreSQL, and SQL Server.
Evolution of Databases
Databases have been evolving for more than 50 years, from flat-file
systems to relational and object-relational systems, and have gone
through several generations.
The Evolution
File-Based
1968 was the year when file-based databases were introduced. In file-based
databases, data was maintained in a flat file. Though files have many
advantages, they also have several limitations.
One of the major advantages is that the file system has various access
methods, e.g., sequential, indexed, and random.
The diagram below represents the hierarchical data model. Small circles represent objects.
In this model, files are related as owners and members, as in the common network model.
This model also had some limitations, such as system complexity and being difficult to design and
maintain.
Relational Database
1970 - Present: This is the era of the relational database and database management. In 1970,
the relational model was proposed by E.F. Codd.
The relational database model has two main terminologies: instance and schema.
An instance is a table together with the rows it holds at a given moment.
The schema specifies the structure, such as the name of the relation and the name and type of each column.
This model uses mathematical concepts such as set theory and predicate
logic.
Like the file system, this model also had some limitations, such as complex
implementation, lack of structural independence, and difficulty handling
many-to-many relationships.
During the era of the relational database, many more models were introduced,
such as the object-oriented model and the object-relational model.
Cloud database
A cloud database lets you store, manage, and retrieve
structured and unstructured data via a cloud platform. This data is accessible
over the Internet. Cloud databases are also called database as a service
(DBaaS) because they are offered as a managed service.
Lower costs
Automated
Increased accessibility
You can access your cloud-based database from any location, anytime. All
you need is just an internet connection.
Data Independence
These commands are used to update the database schema, which is why they
come under Data Definition Language (DDL).
2. Data Manipulation Language (DML)
DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.
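The difference between DDL and DML can be sketched with Python's built-in sqlite3 module. The student table and its values are hypothetical. SQLite has no user accounts, so DCL statements such as GRANT and REVOKE are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or change the schema.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")

# DML: statements that read or change the rows stored under that schema.
conn.execute("INSERT INTO student (id, name, age) VALUES (1, 'Asha', 20)")
conn.execute("UPDATE student SET age = 21 WHERE id = 1")
rows = conn.execute("SELECT id, name, age FROM student").fetchall()
conn.execute("DELETE FROM student WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```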
o DCL stands for Data Control Language. It is used to control access to the
stored data by granting and revoking privileges.
o The DCL execution is transactional. It also has rollback parameters.
The following operations have the authorization of Revoke:
In this section, we will learn about the ACID properties: what these
properties stand for and what each property is
used for. We will also understand the ACID properties with the help of some
examples.
ACID Properties
The term ACID stands for atomicity, consistency, isolation, and durability:
1) Atomicity
The term atomicity means that a transaction is treated as a single,
indivisible unit. If any operation is performed on the data, it should either
be executed completely or not be executed at all. The operation should not
break in between or execute partially.
Example: Remo has account A with $30, from which he
wishes to send $10 to Sheero's account B. Account B already holds $100,
so when $10 is transferred to account B, the sum
will become $110. Two operations take place: the
$10 that Remo wants to transfer is debited from his
account A, and the same amount is credited to Sheero's account B.
Now suppose the debit operation executes
successfully, but the credit operation fails. Then Remo's
account A holds $20, while Sheero's account B
still holds $100, as before.
In the above diagram, it can be seen that after debiting $10 from account A,
the amount is still $100 in account B. So, it is not an atomic transaction.
The below image shows that both debit and credit operations are done
successfully. Thus the transaction is atomic.
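The transfer between accounts A and B can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a banking implementation: the account table, the transfer function, and the fail_before_credit flag (which simulates the failed credit) are assumptions made for this example.

```python
import sqlite3

def transfer(conn, source, target, amount, fail_before_credit=False):
    """Move `amount` from `source` to `target` as one all-or-nothing unit."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            if fail_before_credit:
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()

failed = transfer(conn, "A", "B", 10, fail_before_credit=True)
after_failure = balances(conn)   # the debit was rolled back, nothing changed
succeeded = transfer(conn, "A", "B", 10)
after_success = balances(conn)   # both the debit and the credit were applied
```

Because the debit and the credit sit in the same transaction, the failed attempt leaves A at $30 rather than $20.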
2) Consistency
The word consistency means that the value should always remain preserved. In a DBMS, the
integrity of the data should be maintained, which means that if a change is made to the database, it
should remain preserved. In the case of transactions, the integrity of the data is
essential so that the database remains consistent before and after the transaction. The data should
always be correct.
Example:
In the above figure, there are three accounts, A, B, and C, where A makes a transaction T, one
transfer at a time, to both B and C. Two operations take place, i.e., debit and credit. Account A
first transfers $50 to account B, and the amount in account A read by B before the
transaction is $300. After the successful transaction T, the available amount in B becomes $150. Now, A
transfers $20 to account C, and this time the value read by C is $250 (which is correct, as a debit of
$50 has already been made for B). The debit and credit operations from account A to C are
completed successfully. The transaction is done successfully, and the value is
read correctly. Thus, the data is consistent. If the value read by both B and C were $300, the
data would be inconsistent, because the second read would not reflect the debit that had
already been executed.
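The example can be replayed as a small sketch with Python's built-in sqlite3 module. The starting balances of A ($300) and B ($100) follow from the example; the starting balance of C is not given, so $0 is an assumption, as are the table and function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
# A = 300 and B = 100 follow from the example; C = 0 is an assumption.
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 100), ("C", 0)]
)
conn.commit()

def balance(name):
    query = "SELECT balance FROM account WHERE name = ?"
    return conn.execute(query, (name,)).fetchone()[0]

def total():
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

def transfer(source, target, amount):
    # Debit and credit are committed together, so the total never changes.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )

total_before = total()
read_by_b = balance("A")    # value of A read before the first transfer
transfer("A", "B", 50)
read_by_c = balance("A")    # value of A read before the second transfer
transfer("A", "C", 20)
total_after = total()
```

The second read sees $250, and the sum of all balances is the same before and after, which is one simple way to check that the transfers left the data consistent.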
3) Isolation
The term 'isolation' means separation. In a DBMS, isolation is the property
that transactions which occur concurrently should not affect one another.
In short, the result should be the same as if one transaction began only
after the other had completed. It means that if two operations are
being performed at the same time, they must not affect the value of
one another. In the case of transactions, when two or more transactions
occur simultaneously, consistency should be maintained. Any
changes that occur in a particular transaction will not be seen by other
transactions until the change is committed.
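The rule that uncommitted changes stay invisible to others can be sketched with two connections to the same SQLite file, using Python's built-in sqlite3 module. This assumes SQLite's default journal mode; other database systems and isolation levels behave differently. The table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile

def read_balance(conn):
    # fetchall() consumes the whole result, so the reader holds no lock afterwards.
    rows = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
    return rows[0][0]

def isolation_demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "bank.db")
        writer = sqlite3.connect(path)   # plays the first transaction
        reader = sqlite3.connect(path)   # plays a concurrent transaction
        try:
            writer.execute(
                "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)"
            )
            writer.execute("INSERT INTO account VALUES ('A', 30)")
            writer.commit()

            # The writer changes the balance but has not committed yet.
            writer.execute("UPDATE account SET balance = 20 WHERE name = 'A'")
            seen_before_commit = read_balance(reader)

            writer.commit()
            seen_after_commit = read_balance(reader)
        finally:
            writer.close()
            reader.close()
    return seen_before_commit, seen_after_commit
```

The reader keeps seeing the old balance until the writer commits, and sees the new balance afterwards.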
4) Durability
Durability ensures the permanency of something. In a DBMS,
durability ensures that after the successful execution of an
operation, its data becomes permanent in the database. The durability of the data
should be such that even if the system fails or crashes, the
database still survives. However, if data gets lost, it becomes the responsibility of
the recovery manager to ensure the durability of the database. To
commit the values, the COMMIT command must be used every time we
make changes.
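The role of COMMIT can be sketched with Python's built-in sqlite3 module. Closing and reopening the database file only stands in for a restart here; it does not simulate a real crash. The table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile

def durability_demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "bank.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
        conn.execute("INSERT INTO account VALUES ('A', 30)")
        conn.commit()

        conn.execute("UPDATE account SET balance = 20 WHERE name = 'A'")
        conn.commit()    # committed: this change must survive

        conn.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
        conn.close()     # closed without COMMIT: this change is discarded

        # Reopen the file, as a restarted application would.
        reopened = sqlite3.connect(path)
        rows = reopened.execute("SELECT balance FROM account").fetchall()
        reopened.close()
    return rows[0][0]
```

Only the committed balance of $20 is found after reopening; the later, uncommitted update is gone.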
Therefore, the ACID properties of a DBMS play a vital role in maintaining the
consistency and availability of data in the database.
This was a brief introduction to the ACID properties in DBMS. These
properties are also discussed in the transaction section.
Components of ER Diagram
1. Entity:
An entity may be any object, class, person, or place. In the ER diagram, an
entity is represented as a rectangle.
a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak
entity doesn't contain any key attribute of its own. The weak entity is
represented by a double rectangle.
2. Attribute
The attribute is used to describe the property of an entity. An ellipse is used to
represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a
student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a primary
key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and the ellipses of its component attributes are
connected to that ellipse.
c. Multivalued Attribute
An attribute can have more than one value. Such attributes are known as multivalued attributes.
A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from
another attribute like date of birth.
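The four kinds of attributes can be sketched as a small Python data structure. The Student and Name classes and their field names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:
    """Composite attribute: made up of smaller attributes."""
    first: str
    last: str

@dataclass
class Student:
    student_id: int                                    # key attribute
    name: Name                                         # composite attribute
    date_of_birth: date
    phone_numbers: list = field(default_factory=list)  # multivalued attribute

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)

student = Student(1, Name("Asha", "Rao"), date(2000, 6, 15), ["555-0100", "555-0101"])
```

Since the age is recomputed from the date of birth each time, it never goes stale, which is why a derived attribute does not need to be stored.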
3. Relationship
A relationship is used to describe the relation between entities. A diamond or
rhombus is used to represent the relationship.
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship,
it is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry
one female.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance
of the entity on the right are associated with the relationship, it is known
as a one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is
made by only one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance
of the entity on the right are associated with the relationship, it is known as a
many-to-one relationship.
For example, a student enrolls in only one course, but a course can have
many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one
instance of the entity on the right are associated with the relationship, it is
known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have
many employees.
Notation of ER diagram
A database can be represented using notations. In an ER diagram, many
notations are used to express the cardinality. These notations are as follows:
Fig: Notations of ER diagram
Generalization is a bottom-up process. For example, consider three sub-entities:
Car, Truck, and Motorcycle. These three entities can be generalized into one
superclass named Vehicle.
Specialization is a process of identifying subsets of an entity that share
some distinguishing characteristic. It is a top-down approach in which one entity
is broken down into lower-level entities.
In the above example, the Vehicle entity can be a Car, Truck, or Motorcycle.
Category or Union
A relationship of one subclass with more than one superclass.
For example, Owner is a subset of two superclasses: Vehicle and House.
Aggregation
Represents the relationship between a whole object and its components.
Mapping Constraints
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity
in E2, and an entity in E2 is associated with at most one entity in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of
entities in E2, and an entity in E2 is associated with any number of entities in
E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of
entities in E2, and an entity in E2 is associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one
entity in E2, and an entity in E2 is associated with any number of entities in
E1.
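The four definitions can be turned into a small Python check. This is a sketch that classifies a given list of (E1, E2) pairs; the function name and the sample pairs are hypothetical. Note that it only describes the pairs it is shown, whereas a mapping constraint is a rule that all future data must also obey.

```python
from collections import defaultdict

def classify_mapping(pairs):
    """Classify a relationship, given as (e1, e2) pairs, by its cardinality."""
    partners_of_e1 = defaultdict(set)
    partners_of_e2 = defaultdict(set)
    for e1, e2 in pairs:
        partners_of_e1[e1].add(e2)
        partners_of_e2[e2].add(e1)

    # "At most one" checks, taken directly from the definitions.
    e1_at_most_one = all(len(p) <= 1 for p in partners_of_e1.values())
    e2_at_most_one = all(len(p) <= 1 for p in partners_of_e2.values())

    if e1_at_most_one and e2_at_most_one:
        return "one-to-one"
    if e2_at_most_one:
        return "one-to-many"   # an E1 entity may have many E2 partners
    if e1_at_most_one:
        return "many-to-one"   # an E2 entity may have many E1 partners
    return "many-to-many"
```

For example, classify_mapping([("scientist1", "invention1"), ("scientist1", "invention2")]) reports a one-to-many mapping.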
Keys
o Keys play an important role in the relational database.
o A key is used to uniquely identify any record or row of data in a table. It is
also used to establish and identify relationships between tables.
1. Primary key
o It is the first key used to identify one and only one instance of an entity
uniquely. An entity can contain multiple keys, as we saw in the PERSON table.
The key which is most suitable among them becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we could also select License_Number or
Passport_Number as the primary key since they are also unique.
o For each entity, the choice of primary key depends on the requirements and
the developers.
2. Candidate key
o A candidate key is an attribute or minimal set of attributes that can uniquely
identify a tuple.
o Apart from the primary key, the remaining attributes that can uniquely identify
a tuple are considered candidate keys. The candidate keys are as strong as the
primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key.
The other unique attributes, like SSN, Passport_Number, License_Number, etc.,
are considered candidate keys.
3. Super Key
A super key is an attribute set that can uniquely identify a tuple. A super key is
a superset of a candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same, but their
EMPLOYEE_ID can't be the same. Hence, this combination can also be a super key.
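The difference between a super key and a candidate key can be sketched in Python. The sample rows and function names are hypothetical. The check only inspects the rows it is given: sample data can show that an attribute set is not a key, but whether it really is a key is decided by the schema designer.

```python
from itertools import combinations

def is_super_key(rows, attributes):
    """True if no two rows share the same values for `attributes`."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values in seen:
            return False
        seen.add(values)
    return True

def candidate_keys(rows):
    """Return the minimal super keys (candidate keys) for the given rows."""
    columns = sorted(rows[0])
    found = []
    # Try small attribute sets first, so every key that is found is minimal.
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in found)
            if not contains_smaller_key and is_super_key(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical rows: two employees who share a name.
employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Ravi", "Passport_Number": "P100"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ravi", "Passport_Number": "P200"},
]
```

Here (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone is already unique.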
4. Foreign key
o Foreign keys are the columns of a table used to point to the primary key of
another table.
o Every employee works in a specific department in a company, and employee
and department are two different entities. So we can't store the department's
information in the employee table. That's why we link these two tables
through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and the two tables
are related through it.
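The link between EMPLOYEE and DEPARTMENT can be sketched with Python's built-in sqlite3 module. Department_Id comes from the text; the other column names, the sample rows, and the helper function are hypothetical. SQLite enforces foreign keys only after the PRAGMA shown below is executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE department (Department_Id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        Department_Id INTEGER REFERENCES department (Department_Id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")

def insert_employee_rejected(employee_id, name, department_id):
    """Try to insert an employee; return True if the DBMS rejects the row."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)",
            (employee_id, name, department_id),
        )
        return False
    except sqlite3.IntegrityError:
        return True
```

An employee pointing at department 99, which does not exist, is rejected, while one pointing at department 10 is accepted.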
5. Alternate key
There may be one or more attributes or combinations of attributes that
uniquely identify each tuple in a relation. These attributes or combinations of
attributes are called the candidate keys. One key is chosen as the
primary key from these candidate keys, and the remaining candidate keys, if
any, are termed alternate keys. In other words, the total number of
alternate keys is the total number of candidate keys minus one (the primary
key). An alternate key may or may not exist. If there is only one candidate
key in a relation, it does not have an alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a
composite key. This key is also known as Concatenated Key.
For example, in the employee relation, we assume that an employee may be
assigned multiple roles, and an employee may work on multiple projects
simultaneously. So the primary key will be composed of all three attributes,
namely Emp_ID, Emp_role, and Proj_ID, in combination. These attributes
act as a composite key since the primary key comprises more than one
attribute.
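A composite key over Emp_ID, Emp_role, and Proj_ID can be sketched with Python's built-in sqlite3 module. The table name employee_assignment, the assign function, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_assignment (
        Emp_ID INTEGER,
        Emp_role TEXT,
        Proj_ID INTEGER,
        PRIMARY KEY (Emp_ID, Emp_role, Proj_ID)
    )
""")

def assign(emp_id, role, proj_id):
    """Insert an assignment; return False if the same combination already exists."""
    try:
        conn.execute(
            "INSERT INTO employee_assignment VALUES (?, ?, ?)",
            (emp_id, role, proj_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The same employee can appear with another role or another project, but repeating the exact combination of all three values is rejected.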
Generalization
o Generalization is a bottom-up approach in which two or more entities of
a lower level combine to form a higher-level entity if they have some attributes
in common.
o In generalization, an entity of a higher level can also combine with the
entities of the lower level to form a further higher-level entity.
o Generalization is like a subclass and superclass system; the only
difference is the approach. Generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity,
i.e., subclasses are combined to make a superclass.
For example, Faculty and Student entities can be generalized to create a
higher-level entity, Person.
Specialization
o Specialization is a top-down approach, and it is the opposite of generalization. In
specialization, one higher-level entity can be broken down into two or more
lower-level entities.
o Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related
attributes are defined next, and relationship sets are then added.
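The subclass and superclass idea behind generalization and specialization can be sketched with Python classes. Person, Student, and Faculty come from the example above; the attributes name, phone, roll_no, and department are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Higher-level entity: holds the attributes the lower-level entities share."""
    name: str
    phone: str

@dataclass
class Student(Person):
    """Lower-level entity: inherits name and phone, adds its own attribute."""
    roll_no: int

@dataclass
class Faculty(Person):
    department: str

people = [Student("Asha", "555-0100", 42), Faculty("Dr. Rao", "555-0101", "CS")]
names = [person.name for person in people]
```

Read bottom-up, Student and Faculty are generalized into Person; read top-down, Person is specialized into Student and Faculty.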
Aggregation
In aggregation, the relation between two entities is treated as a single entity.
In aggregation, a relationship with its corresponding entities is aggregated into
a higher-level entity.
For example: The relationship in which the Center entity offers the Course
entity acts as a single entity, which is in a relationship with another entity,
Visitor. In the real world, if a visitor visits a coaching center, he will never
enquire about the course only or just about the center; instead, he will
enquire about both.
Reduction of ER diagram to Table
The database can be represented using the notations, and these notations
can be reduced to a collection of tables.
There are some points for converting the ER diagram to the table:
Using these rules, you can convert the ER diagram to tables and columns
and assign the mapping between the tables. Table structure for the given ER
diagram is as below:
Figure: Table structure
1. One-to-one (1:1)
2. One-to-many (1:M)
3. Many-to-many (M:N)
1. One-to-one
o In a one-to-one relationship, one occurrence of an entity relates to only one
occurrence in another entity.
o A one-to-one relationship rarely exists in practice.
o For example: if an employee is allocated a company car then that car can
only be driven by that employee.
o Therefore, employee and company car have a one-to-one relationship.
2. One-to-many
o In a one-to-many relationship, one occurrence in an entity relates to many
occurrences in another entity.
o For example: An employee works in one department, but a department has
many employees.
o Therefore, department and employee have a one-to-many relationship.
3. Many-to-many
o In a many-to-many relationship, many occurrences in an entity relate to
many occurrences in another entity.
o As with a one-to-one relationship, the many-to-many relationship rarely
exists in practice.
o For example: At the same time, an employee can work on several projects,
and a project has a team of many employees.
o Therefore, employee and project have a many-to-many relationship.
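A common way to store a many-to-many relationship is a third table holding one row per pairing. The sketch below uses Python's built-in sqlite3 module; the table works_on, the column names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: one row per (employee, project) pairing.
    CREATE TABLE works_on (
        employee_id INTEGER REFERENCES employee (id),
        project_id  INTEGER REFERENCES project (id),
        PRIMARY KEY (employee_id, project_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

def projects_of(employee_name):
    rows = conn.execute("""
        SELECT p.title FROM project p
        JOIN works_on w ON w.project_id = p.id
        JOIN employee e ON e.id = w.employee_id
        WHERE e.name = ? ORDER BY p.title
    """, (employee_name,))
    return [title for (title,) in rows]

def team_of(project_title):
    rows = conn.execute("""
        SELECT e.name FROM employee e
        JOIN works_on w ON w.employee_id = e.id
        JOIN project p ON p.id = w.project_id
        WHERE p.title = ? ORDER BY e.name
    """, (project_title,))
    return [name for (name,) in rows]
```

One employee maps to several projects and one project maps to several employees, which is exactly the many-to-many case.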