Evolution of Database Systems
Evolution of Database Systems
UNIT –I
                            Database System Applications
A Historical Perspective
             The concept of a database was made possible by the emergence of direct access
storage media such as magnetic disks, which became widely available in the mid-1960s;
earlier systems relied on sequential storage of data on magnetic tape. The subsequent
development of database technology can be divided into three eras based on data model or
structure: navigational, SQL/relational, and post-relational.
            The two main early navigational data models were the hierarchical model and
the CODASYL model (network model). These were characterized by the use of pointers
(often physical disk addresses) to follow relationships from one record to another.
             The relational model, first proposed in 1970 by Edgar F. Codd, departed from
this tradition by insisting that applications should search for data by content, rather than by
following links. The relational model employs sets of ledger-style tables, each used for a
different type of entity. Only in the mid-1980s did computing hardware become powerful
enough to allow the wide deployment of relational systems (DBMSs plus applications). By
the early 1990s, however, relational systems dominated in all large-scale data
processing applications, and as of 2018 they remain dominant: IBM Db2, Oracle, MySQL,
and Microsoft SQL Server are the most searched DBMS. The dominant database language,
standardized SQL for the relational model, has influenced database languages for other data
models.
           The next generation of post-relational databases in the late 2000s became known
as NoSQL databases, introducing fast key–value stores and document-oriented databases. A
competing "next generation" known as NewSQL databases attempted new implementations
that retained the relational/SQL model while aiming to match the high performance of
NoSQL compared to commercially available relational DBMSs.
1960s, Navigational DBMS
             The introduction of the term database coincided with the availability of direct-
 access storage (disks and drums) from the mid-1960s onwards. The term represented a
 contrast with the tape-based systems of the past, allowing shared interactive use rather than
 daily batch processing. The Oxford English Dictionary cites a 1962 report by the System
 Development Corporation of California as the first to use the term "data-base" in a specific
 technical sense.
                    It doesn’t provide backup and              It provides backup and recovery of data even if it is
Backup and Recovery recovery of data if it is lost.            lost.
                       There is less data consistency in the   There is more data consistency because of the
Consistency            file system.                            process of normalization.
                        File systems provide less security in DBMS has more security mechanisms as compared
Security Constraints    comparison to DBMS.                   to file systems.
Data models are used to describe how the data is stored, accessed, and updated in a DBMS.
 There are many types of data models that are used in the industry.
 Types of Data Models in DBMS
 1) Hierarchical model
                In hierarchical model, data is organized into a tree like structure with each
  record is having one parent record and many children. The main drawback of this model is
  that, it can have only one to many relationships between nodes.
              Lets say we have few students and few courses and a course can be assigned to a
  single student only, however a student take any number of courses so this relationship
  becomes one to many.
2) Network model
   The network model is a very complex database model, so the user must be very familiar with
    the overall structure of the database.
   Updating the database is a quite difficult and boring task. We need the help of the application
    programs that are being used to navigate the data.
    3) Entity-Relationship Model
    Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
     among them. While formulating real-world scenario into the database model, the ER Model
     creates entity set, relationship set, general attributes and constraints.
    ER Model is best used for the conceptual design of a database.
    ER Model is based on −
     Entitiesand their attributes.
     Relationships among entities.
Advantages
1.           SIMPLE : It is simple to draw an ER diagram when we know entities and
 relationships.
2.           EASY TO UNDERSTAND : The design of ER is very logical and hence they
 are easy to design and understand.
3.           INTEGRATED : The ER Model can be easily integrated with relational
 model.
4.           USEFUL IN DECISION MAKING : By drawing an ER-Diagram we come to
 know what kind of attributes and relationship exist between them.
5.           EASY CONVERSION : It can be easily converted to other type of models.
Disadvantages
1.           LOSS OF INFORMATION: While drawing an ER Model some of the
 information can be hidden or lost.
2.           NO REPRESENTATION FOR DATA MANIPULATION: It is not possible
 to represent data manipulation(commands like insert(),delete(),alter(),update()) in ER
 model.
3.           DATA INCONSISTENCY: Due to improper Normalization some data
 inconsistency may occur so, while creating an ER diagram at least it should be in third
 normal form.
4) Relational Model
The relational model in DBMS is an abstract model used to organize and manage the data
 stored in a database. It stores data in two-dimensional inter-related tables, also known as
 relations in which each row represents an entity and each column represents the properties of
 the entity.
The main highlights of this model are −
 Data is stored in tables called relations.
 Relations can be normalized.
 In normalized relations, values saved are atomic values.
 Each row in a relation contains a unique value.
 Each column in a relation contains values from a same domain.
Relational Model Concepts
   Relational Schema : It is the logical blueprint of the relation i.e., it describes the design
    and the structure of the relation. It contains the table name, its attributes, and their types:
For our Student relation example, the relational schema will be:
   Relational Instance : It is the collection of records present in the relation at a given time.
        Relation Key : It is an attribute or a group of attributes that can be used to uniquely
         identify an entity in a table or to determine the relationship between two tables. Relation
         keys can be of 6 different types:
1.                      Candidate Key
2.                      Super Key
3.                      Composite Key
4.                      Primary Key
5.                      Alternate Key
6.                      Foreign Key
Highlights:
     Relational models make use of some rules to ensure the accuracy and accessibility of the
        data. These rules or constraints are known as Relational Integrity Constraints.
     These constraints are checked before performing any operation like insertion, deletion, or
       updation on the data present in a relational database. These constraints include:
      Domain Constraint : It specifies that every attribute is bound to have a value that lies
       inside a specific range of values.
         Key Constraint : It states that every relation must contain an attribute or a set of
          attributes (Primary Key) that can uniquely identify a tuple in that relation. This key can
          never be NULL or contain the same value for two different tuples.
         Referential Integrity Constraint : It is defined between two inter-related tables. It
          states that if a given relation refers to a key attribute of a different or same table.
When we notice any unexpected behavior while working with the relational databases, there
may be a presence of too much redundancy in the data stored in the database. This can
cause anomalies in the DBMS and it can be of various types such as:
 InsertionAnomalies: It is the inability to insert data in the database due to the absence of
other data. For example: Suppose we are dividing the whole class into groups for a project
and the GroupNumber attribute is defined so that null values are not allowed. If a new student
is admitted to the class but not immediately assigned to a group then this student can't be
inserted into the database.
 Deletion Anomalies - It is the accidental loss of data in the database upon deletion of any
other data element. For example: Suppose, we have an employee relation that contains the
details of the employee along with the department they are working in. Now, if a department
has one employee working in it and we remove the information of this employee from the
table, there will be the loss of data related to the department also. This can lead to data
inconsistency.
Advantages
Disadvantages
     The  performance of the relational model depends upon the number of relations present in
      the database.
     Hence, as the number of tables increases, the requirement of physical memory increases.
     The structure becomes complex and there is a decrease in the response time for the
      queries.
     Because of all these factors, the cost of implementing a relational database increase.
5) Object Oriented Data Model
The foundation of any object-oriented programming language is based on the concept of
 objects and classes. This enables the user to achieve abstraction, inheritance, polymorphism,
 and encapsulation. A similar concept is used in the Object Oriented Model in DBMS.
The Object-Oriented Model in DBMS or OODM is the data model where data is stored in the
 form of objects. This model is used to represent real-world entities. The data and data
 relationship is stored together in a single entity known as an object in the Object Oriented
 Model. The Object-Oriented Database Management System is built on top of Object Oriented
 Model.
As we have discussed earlier, we can use the Object Oriented Model in DBMS to store real-
world entities. Here, we can store pictures, audio, video, and other types of data.
Example
Object attribute- The objects have certain characteristics. These are known as
 the attributes of the object.
Inheritance- It is the ability of the object within the class hierarchy to inherit the
  attributes and methods of the classes above it. A new class can be derived from an
  existing class, the new class has the attributes and methods described in the
  existing class and also has its attributes and methods. This helps in code
  reusability.
   It is used to add semantic content that can be well understood by the user.
   It is used to make the code resemble real-world objects.
   Database integrity can be achieved.
   Structural and database independence is created.
           We can store pictures, audio, video, and other types of data, which was previously
            impossible to store earlier.
The basic goal for the Object-relational database is to bridge the gap between relational
 databases and the object-oriented modeling techniques used in programming languages such
 as Java, C++, Visual Basic .NET or C#. However, a more popular alternative for achieving
 such a bridge is to use a standard relational database systems with some form of object-
 relational mapping (ORM) software. Whereas traditional RDBMS or SQL-DBMS products
focused on the efficient management of data drawn from a limited set of datatypes (defined
by the relevant language standards), an objectrelational DBMS allows software developers to
integrate their own types and the methods that apply to them into the DBMS.
          It can be used to represent data from some data sources that aren’t bound by a
           schema.
          It offers a versatile format for data sharing between various DBs.
          Viewing structured data as semi-structured can be beneficial for browsing purposes.
          It is simple to alter the schema.
          It’s possible that the data transfer format is portable.
Disadvantages
The main disadvantage of using a semi-structured data model is that queries cannot be
 performed as quickly as they can in a more limited structure, such as the relational model.
In a semi-structured DB, records are typically stored with unique IDs referenced with pointers
 to their disc location.
It makes navigational or path-based queries very efficient, but it is inefficient for searching
 multiple records (as is common in SQL) because it must seek around the disc following
 pointers.
8) Flat Data Model
A flat database is a simple database system in which each database is represented as a single
table in which all of the records are stored as single rows of data, which are separated by
delimiters such as tabs or commas. The table is usually stored and physically represented as a
simple text file.
Because of the limitations of flat databases, they are not unsuitable for most software
applications in which there is a need to represent and store complex business relationships.
However, some application developers still use flat files in order to reduce the cost and
complexity of integrating a relational database.
Unlike relational databases, flat databases cannot represent complex relationships between
 entities. They also have no way of enforcing constraints between data.
For example, in an application used by a commercial bank, it is a good idea to ensure that, at
 the time of creation, a new account must be linked to an existing customer. In a relational
 database this is easily enforced using the concept of foreign keys to ensure that customer IDs
 are filled in while creating an account, and also that said customer IDs already exist in
 another table. This is not possible with flat databases.
Another limitation of flat databases vis-a-vis relational databases is the former’s lack of query
 and indexing capability.
Some real-life examples of flat databases are contact lists in a mobile phone and the storage of
 a high-scores list in a simple video game.
                               Data Abstraction in DBMS
While sending an email to some have you ever thought of where the email is physically stored
, what data model is used ? The answer is No. What we need to know is that we have to send
an email to that particular email address. This is called Data abstraction.
Now let's look at the levels of data abstractions in DBMS and discuss them in detail.
1. Physical or Internal Level
It is the lowest level of abstraction for DBMSs, defining how data is stored, data structures for
 storing data, and database access mechanisms.
Developers or database application programmers decide how to store data in the database. It is
 complex to understand.
2. Logical or Conceptual Level
The logical level is the next higher level or intermediate level. It explains what data is stored
 in the database and how those data are related. It seeks to explain the complete or entire data
 by describing what tables should be constructed and what the linkages between those tables
 should be. It is less complex than the physical level.
3. View or External Level
This is the top level. There are various views at the view level, with each view defining only a
 portion of the total data. It also facilitates user engagement by providing a variety of views or
 numerous views of a single database. All users have access to the view level. This is the
 easiest and most simple level.
                                    Data Independence
The primary goal of data abstractions in DBMS is to obtain data independence in order to save
 time and money when modifying or altering a database.
Data independence is known as the ability to change the scheme without affecting the
 programs and applications to be rewritten.
Data Independence is mainly of two types :
1. Physical level independence
It refers to the ability to change the physical schema without changing the conceptual or
 logical schema, which is done for optimization purposes.
2. Logical level independence
This feature is referred to as the ability to change the logical schema without changing the
 external schema or application program.
Any modifications to the conceptual representation of the data would not affect the user's
 perception of the data.
 DML Compiler: It processes the DML statements into low level instruction (machine
  language), so that they can be executed.
 DDL Interpreter: It processes the DDL statements into a set of table containing meta
  data (data about data).
 Embedded DML Pre-compiler: It processes DML statements embedded in an
  application program into procedural calls.
 Query Optimizer: It executes the instruction generated by DML Compiler.
2. Storage Manager: Storage Manager is a program that provides an interface between
  the data stored in the database and the queries received. It is also known as Database
  Control System. It maintains the consistency and integrity of the database by applying
  the constraints and executing the DCL statements. It is responsible for updating,
  storing,      deleting,      and      retrieving     data     in      the      database.
  It contains the following components –
 Authorization Manager: It ensures role-based access control, i.e,. checks whether
  the particular person is privileged to perform the requested operation or not.
 Integrity Manager: It checks the integrity constraints when the database is modified.
   File Manager: It manages the file space and the data structure used to represent
    information in the database.
   Buffer Manager: It is responsible for cache memory and the transfer of data between
    the secondary storage and main memory.
   Data Dictionary: It contains the information about the structure of any database
    object. It is the repository of information that governs the metadata.
   Indices: It provides faster retrieval of data item.
                   INTRODUCTION TO DATABASE DESIGN
What is an ER Diagram?
An Entity Relationship Diagram (ER Diagram) pictorially explains the relationship between
entities to be stored in a database. Fundamentally, the ER Diagram is a structural design of
the database. It acts as a framework created with specialized symbols for the purpose of
defining the relationship between the database entities. ER diagram is created based on three
principal components: entities, attributes, and relationships.
The following diagram showcases two entities - Student and Course, and their relationship.
 The relationship described between student and course is many-to-many, as a course can be
 opted by several students, and a student can opt for more than one course. Student entity
 possesses attributes - Stu_Id, Stu_Name & Stu_Age. The course entity has attributes such as
 Cou_ID & Cou_Name.
What is an ER Model?
An Entity-Relationship Model represents the structure of the database with the help of a
diagram. ER Modelling is a systematic process to design a database as it would require you to
analyze all data requirements before implementing your database.
History of ER models
Peter Chen proposed ER Diagrams in 1971 to create a uniform convention that can be used as
 a conceptual modeling tool. Many models were presented and discussed, but none were
 suitable. The data structure diagrams offered by Charles Bachman also inspired his model.
   ER Diagram helps you conceptualize the database and lets you know which fields need to
    be embedded for a particular entity
   ER Diagram gives a better understanding of the information to be stored in a database
   It reduces complexity and allows database designers to build databases quickly
   It helps to describe elements using Entity-Relationship models
   It allows users to get a preview of the logical structure of the database
           Entities
                 Strong Entity
                 Weak Entity
           Attributes
                 Key Attribute
                 Composite Attribute
                 Multi-valued Attribute
                   Derived Attribute
              Relationships
                   One-to-One Relationships
                   One-to-Many Relationships
                   Many-to-One Relationships
                   Many-to-Many Relationships
Entities
An entity can be either a living or non-living component.
For example, in a student study course, both the student and the course are entities.
Weak Entity
An entity that makes reliance over another entity is called a weak entity
In the example below, school is a strong entity because it has a primary key attribute - school
 number.
Unlike school, the classroom is a weak entity because it does not have any primary key and
the room number here acts only as a discriminator.
Attribute
An attribute exhibits the properties of an entity.
Key Attribute
Key attribute uniquely identifies an entity from an entity set.
For example: For a student entity, the roll number can uniquely identify a student from a set of
 students.
Composite Attribute
An attribute that is composed of several other attributes is known as a composite attribute.
An oval showcases the composite attribute, and the composite attribute oval is further
connected with other ovals.
Multi-valued Attribute
Some attributes can possess over one value, those attributes are called multivalued attributes.
The double oval shape is used to represent a multivalued attribute.
Derived Attribute
An attribute that can be derived from other attributes of the entity is known as a derived
attribute.
Relationship
The diamond shape showcases a relationship in the ER diagram.
In the example below, both the student and the course are entities, and study is the relationship
 between them.
One-to-One Relationship
When a single element of an entity is associated with a single element of another entity, it is
called a one-to-one relationship.
For example, a student has only one identification card and an identification card is given to
 one person.
One-to-Many Relationship
When a single element of an entity is associated with more than one element of another entity,
it is called a one-to-many relationship
For example, a customer can place many orders, but an order cannot be placed by many
 customers.
Many-to-One Relationship
When more than one element of an entity is related to a single element of another entity, then
it is called a many-to-one relationship.
For example, students have to opt for a single course, but a course can have many students.
Many-to-Many Relationship
When more than one element of an entity is associated with more than one element of another
entity, this is called a many-to-many relationship.
For example, you can assign an employee to many projects and a project can have many
 employees.
      First,
            identify all the Entities. Embed all the entities in a rectangle and label them
      properly.
      Identify relationships between entities and connect them using a diamond in the middle,
      illustrating the relationship. Do not connect relationships with each other.
      Connect    attributes for entities and label them properly.
      Eradicate   any redundant entities or relationships.
      Make     sure your ER Diagram supports all the data provided to design the database.
      Effectively   use colors to highlight key areas in your diagrams.
Conclusion
ER Diagram in DBMS is widely used to describe the conceptual design of databases. It helps
 both users and database developers to preview the structure of the database before
 implementing the database.
Entity
An entity in DBMS (Database Management System) is a real-world object that has certain
properties called attributes that define the nature of the entity.
Entities are divided into two categories. These two categories of an entity are tangible entities
 and non-tangible entities.
Tangible Entities: Tangible entities are the entities that physically exist in the real world. For
 example, the entity of cars, the entity of books, etc.
Non-Tangible Entities: Non-tangible entities are entities that do not physically exist in the
real world. For example, email id, social media account, etc.
    In ER diagram,
            Attributes are associated with an entity set.
            Attributes describe the properties of entities in the entity set.
            Based on the values of certain attributes, an entity can be identified uniquely.
Symbols Used-
Example-
In this ER diagram,
 Two   strong entity sets “Student” and “Course” are related to each other.
 Student  ID and Student name are the attributes of entity set “Student”.
 Student ID is the primary key using which any student can be identified uniquely.
 Course ID and Course name are the attributes of entity set “Course”.
 Course ID is the primary key using which any course can be identified uniquely.
 Double line between Student and relationship set signifies total participation.
 It suggests that each student must be enrolled in at least one course.
 Single line between Course and relationship set signifies partial participation.
 It suggests that there might exist some courses for which no enrollments are made.
Weak Entity Set-
               A weak entity set is an entity set that does not contain sufficient attributes to
                uniquely identify its entities.
               In other words, a primary key does not exist for a weak entity set.
               However, it contains a partial key called as a discriminator.
               Discriminator can identify a group of entities from the entity set.
               Discriminator is represented by underlining with a dashed line.
NOTE-
      The combination of discriminator and primary key of the strong entity set makes it
       possible to uniquely identify all entities of the weak entity set.
      Thus, this combination serves as a primary key for the weak entity set.
      Clearly, this primary key is not formed by the weak entity set completely.
Symbols Used-
Example-
In this ER diagram,
         One strong entity set “Building” and one weak entity set “Apartment” are related to each
          other.
         Strong entity set “Building” has building number as its primary key.
         Door number is the discriminator of the weak entity set “Apartment”.
         This is because door number alone can not identify an apartment uniquely as there may be
          several other buildings having the same door number.
         Double line between Apartment and relationship set signifies total participation.
         It suggests that each apartment must be present in at least one building.
         Single line between Building and relationship set signifies partial participation.
         It suggests that there might exist some buildings which has no apartment.
Thus,
Primary key of Apartment = Primary key of Building + Its own discriminator= Building
 number + Door number
Differences between Strong entity set and Weak entity set-
         Strong entity set                           Weak entity set
         A single rectangle is used for the           A double rectangle is used for the
        representation of a strong entity set.       representation of a weak entity set.
         A diamond symbol is used for the             A double diamond symbol is used for the
        representation of the relationship that      representation of the identifying relationship
        exists between the two strong entity         that exists between the strong and weak
        sets.                                        entity set.
         Total participation may or may not           Total participation always exists in the
        exist in the relationship.                   identifying relationship.
Note-
In ER diagram, weak entity set is always present in total participation with the identifying
 relationship set.
Relationship Set
Relationship Example:
‘Enrolled in’ is a relationship that exists between entities Student and Course.
Relationship Set:
A relationship set is a set of relationships of same type.
The number of entity sets that participate in a relationship set is termed as the degree of that
 relationship set. Thus,
On the basis of degree of a relationship set, a relationship set can be classified into the
following types-
Unary relationship set is a relationship set where only one entity set participates in a
relationship set.
Example:
Binary relationship set is a relationship set where two entity sets participate in a relationship
set.
Example:
Ternary relationship set is a relationship set where three entity sets participate in a relationship
 set.
Example:
N-ary relationship set is a relationship set where ‘n’ entity sets participate in a relationship set.
Specialization:
 In specialization, an entity is divided into sub-entities based on their characteristics. It is a
 top-down approach where higher level entity is specialized into two or more lower level
 entities. For Example, EMPLOYEE entity in an Employee management system can be
 specialized into DEVELOPER, TESTER etc. as shown in Figure 2. In this case, common
 attributes like E_NAME, E_SAL etc. become part of higher entity (EMPLOYEE) and
 specialized attributes like TES_TYPE become part of specialized entity (TESTER).
Aggregation                                                                                  –
An ER diagram is not capable of representing relationship between an entity and a
relationship which may be required in some scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher level entity. Aggregation is an abstraction
through which we can represent relationships as higher level entity sets.
For Example, Employee working for a project may require some machinery. So, REQUIRE
 relationship is needed between relationship WORKS_FOR and entity MACHINERY. Using
 aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
 aggregated into single entity and relationship REQUIRE is created between aggregated
 entity and MACHINERY.