DBMS - Unit I
DBMS - Unit I
UNIT – I
                                                         1
1.1 A HISTORICAL PERSPECTIVE
❖ From the earliest days of computers, storing and manipulating data is the major application focus.
❖ The first general purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and
    was called the Integrated Data Store.
❖ It formed the basis for the network data model, and strongly influenced database systems
    through the 1960s.
❖ In the late 1960s, IBM developed the Information Management System (IMS) DBMS, IMS formed the basis
    for the hierarchical data model.
❖ In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new data representation framework
    called the relational data model.
❖ This is a big change in the development of database systems, it ignited rapid development of several DBMSs
    based on the relational model along with a rich body of theoretical results.
❖ Database systems matured as an academic discipline, and the popularity of relational DBMSs changed the
    commercial landscape. Their benefits were widely recognized, and the use of DBMSs for managing corporate
    data became standard practice.
❖ In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database
    systems continued to gain widespread use.
❖ The SQL query language for relational databases, developed as part of IBM's System R project, is now the
    standard query language. SQL was standardized in the late 1980s.
❖ In the late 1980s and the 1990s, advances and researches have been made in many areas of database systems
    to support complex analysis of data from all parts of an enterprise.
❖ Several vendors (e.g., IBM's DB2, Oracle 8, Informix UDS) have extended their systems with the ability to
    store new data types such as images and text, and with the ability to ask more complex queries.
❖ Specialized systems have been developed by numerous vendors for creating data warehouses, consolidating
    data from several databases, and for carrying out specialized analysis.
❖ Enterprise resource planning (ERP) and management resource planning (MRP) packages have been developed.
❖ Most significantly DBMSs have entered into the Internet Age.
❖ The use of a DBMS to store data that is accessed through a Web browser is becoming          widespread.
❖ Queries are generated through Web-accessible forms and answers are formatted using a markup language such
    as HTML, in order to be easily displayed in a browser.
❖ All the database vendors are adding features to their DBMS aimed at making it more suitable for deployment
    over the Internet.
                                                         2
 ❖ Database management continues to gain importance as more and more data is brought on-line, and made ever
      more accessible through computer networking.
 ❖ Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital
      libraries, and a host of scientific projects such as the human genome mapping effort and NASA's Earth
      Observation System project.
 ❖ Companies are using DBMS concept in their decision-making processes and mine their data repositories for
      useful information about their businesses.
 1.2 FILE SYSTEMS VERSUS A DBMS
 1.2.1 File Processing System
 ❖ In case of file-processing system, The system stores permanent records in various files with the support of
      operating system.
 ❖ Many application programs are used to manipulate (add, Extract, update, delete) the stored data.
 ❖ Ex:- Consider part of a savings-bank enterprise that keeps information about all customers and savings
      accounts.
 ❖ To allow users to manipulate the information, the system has a number of application programs
      • A program to debit or credit an account
      • A program to add a new account
      • A program to find the balance of an account
 ❖ System programmers wrote these application programs to meet the needs of the bank.
 ❖ New application programs are added to the system as the need arises.
 ❖ For example, suppose that the savings bank decides to generate monthly statements. As a result, the bank
      creates new permanent files that contain information about to generate monthly statements.
 ❖ Thus, as time goes by, the system acquires more files and more application programs.
 ❖ Keeping organizational information in a file-processing system has a number of major disadvantages:
 1.2.2 File Processing System Disadvantages
1.   Data redundancy and inconsistency        2. Difficulty in accessing data. 3. Data isolation
4.   Integrity problems 5. Atomicity problems         6. Concurrent-access anomalies               7.    Security
problems
 1. Data redundancy and inconsistency: -
 ❖ Since different programmers create the files and application programs over a long period, the various files are
      likely to have different formats and the programs may be written in several programming languages. Moreover,
      the same information may be duplicated in several places (files).
 ❖ For example, the address and telephone number of a particular customer may appear in a file that consists of
      savings-account records and in a file that consists of checking-account records. This redundancy leads to
                                                          3
     higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the
     same data may no longer agree. For example, a changed customer address may be reflected in savings-account
     records but not elsewhere in the system.
2. Difficulty in accessing data: -
❖ Suppose that one of the bank officers needs to find out the names of all customers who live within a particular
     postal-code area. The officer asks the data-processing department to generate such a list. Because the designers
     of the original system did not anticipate this request, there is no application program on hand to meet it. There
     is, however, an application program to generate the list of all customers. The bank officer has now two choices:
     either obtain the list of all customers and extract the needed information manually or ask a system programmer
     to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such
     a program is written, and that, several days later, the same officer needs to trim that list to include only those
     customers who have an account balance of $10,000 or more. As expected, a program to generate such a list
     does not exist. Again, the officer has the preceding two options, neither of which is satisfactory.
❖ The point here is that conventional file-processing environments do not allow needed data to be retrieved in a
     convenient and efficient manner. More responsive data-retrieval systems are required for general use.
3. Data isolation: -
❖ Because data are scattered in various files, and files may be in different formats, writing new application
     programs to retrieve the appropriate data is difficult.
4. Integrity problems: -
❖ The data values stored in the database must satisfy certain types of consistency constraints.
❖ For example, the balance of a bank account may never fall below a prescribed amount (say, 1000Rs).
     Developers enforce these constraints in the system by adding appropriate code in the various application
     programs. However, when new constraints are added, it is difficult to change the programs to enforce them.
     The problem is compounded when constraints involve several data items from different files.
5. Atomicity problems: -
❖ A computer system, like any other mechanical or electrical device, is subject to failure.
❖ In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that existed
     prior to the failure.
❖ Consider a program to transfer Rs 5000 from account A to account B. If a system failure occurs during the
     execution of the program, it is possible that the Rs5000 was removed from account A but was not credited to
     account B, resulting in an inconsistent database state. Clearly, it is essential to database consistency that either
     both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic—it must happen
     in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
6. Concurrent-access anomalies: -
                                                            4
❖ For the sake of overall performance of the system and faster response, many systems allow multiple users to
    update the data simultaneously. In such an environment, interaction of concurrent updates may result in
    inconsistent data.
❖ Consider bank account A, containing Rs 10000. If two customers withdraw funds (say Rs 5000 and Rs1000
    respectively) from account A at about the same time, the result of the concurrent executions may leave the
    account in an incorrect (or inconsistent) state.
❖ Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by
    the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both
    read the value 10000, and write back Rs5000 and Rs 9000, respectively. Depending on which one writes the
    value last, the account may contain Rs 5000 or Rs 9000, rather than the correct value of Rs 4000.
❖ To guard against this possibility, the system must maintain some form of supervision. But supervision is
    difficult to provide because data may be accessed by many different application programs that have not been
    coordinated previously.
7. Security problems: -
❖ Not every user of the database system should be able to access all the data.
❖ For example, in a banking system, payroll personnel need to see only that part of the database that has
    information about the various bank employees.
❖ They do not need access to information about customer accounts. But, since application programs are added
    to the system in an ad hoc manner, enforcing such security constraints is difficult.
1.2.3 DBMS
❖ DBMS is a piece of software that is designed to make the preceding tasks easier.
❖ By storing data in a DBMS, we can use the DMBS features to manage the data in an efficient manner.
❖ DBMS supports huge volume of data and more number of users.
1.2.4   Advantages of DBMS
❖ Using a DBMS to manage data has many advantages:
1. Data independence: -
❖ Application programs should be as independent as possible from details of data representation and storage.
❖ The DBMS can provide an abstract view of the data to insulate application code from such details.
2. Efficient data access: -
❖ A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently.
❖ This feature is especially important if the data is stored on external storage devices.
3. Data integrity and security: -
❖ If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data.
                                                         5
❖ For example, before inserting salary information for an employee, the DBMS can check that the department
    budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to different
    classes of users.
4. Data administration:-
❖ When several users share the data, centralizing the administration of data can offer significant improvements.
❖ Experienced professionals who understand the nature of the data being managed, and how different groups of
    users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-
    tuning the storage of the data to make retrieval efficient.
5. Concurrent access and crash recovery: -
❖ A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being
    accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.
6. Reduced application development time: -
❖ Clearly, the DBMS supports many important functions that are common to many applications accessing data
    stored in the DBMS.
❖ This, in conjunction with the high-level interface to the data, facilitates quick development of applications.
    Such applications are also likely to be more robust than applications developed from scratch because many
    important tasks are handled by the DBMS instead of being implemented by the application.
1.3 THE DATA MODEL
❖ Underlying the structure of a database is the data model
❖ Data model is a collection of conceptual tools for describing data, data relationships, data semantics (meaning
    and use of specific pieces of data), and consistency constraints.
❖ To illustrate the concept of a data model, we use two data models to describe the design of a database at the
    logical level they are 1. The E-R Model,          2. Relation Model
1.3.1 The Entity-Relationship Model
❖ The entity-relationship (E-R) data model is based on a perception of a real world that consists of a collection
    of basic objects, called entities, and of relationships among these objects.
❖ An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example,
    each person is an entity, and bank accounts can be considered as entities.
❖ Entities are described in a database by a set of attributes.
❖ For example, the attributes account-number and balance may describe one particular account in a bank.
❖ Attributes customer-name; customer-street address and customer-city may describe a customer entity.
❖ An extra attribute customer-id is used to uniquely identify customers (since it may be possible to have two
    customers with the same name, street address, and city).
Note: - In India we use Aadhar number as unique number.
                                                          6
❖ A relationship is an association among several entities. For example, a depositor relationship associates a
    customer with each account that he has.
❖ The set of all entities of the same type and the set of all relationships of the same type are termed an entity set
    and relationship set, respectively.
❖ The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram, which
    is built up from the following components:
   Percentage Table
                                            Roll Number         %
                                            13601A501           94
                                            13601A502           90
                                            13601A503           91
                                            13601A504           88
                                                          8
                                            Figure 1.2 Network Data model
4. Hierarchical data model: -
❖ A hierarchical model represents the data in a tree-like structure in which there is a single parent for each record.
                                                          9
                                   Figure 1.4: Levels of Abstraction in a DBMS
❖ The database description consists of a schema at each of these three levels of abstraction
i) Physical Schema: -
❖ Also called as Physical level or internal level.
❖ The physical schema describes the database design at the physical level
❖ It is the lowest level of abstraction for DBMS which defines how the data is actually stored.
❖ It defines data-structures to store data and access methods used by the database.
❖ Actually, it is decided by developers or database application programmers how to store the data in the database.
❖ The entire database is described in this level that is physical or internal level.
❖ It is a very complex level to understand.
❖ We must decide what file organizations to use to store the relations, and create auxiliary data structures called
   indexes to speed up data retrieval operations.
❖ We create and uses indexes for faster retrieval of the data.
ii) Conceptual schema: -
❖ The conceptual schema also called the logical schema or logical level
❖ It describes what data is stored in the database and what relationship exists among those data.
❖ It is less complex than the physical level.
❖ Logical level is used by developers or database administrators (DBA). So, overall, the logical level contains
   tables (fields and attributes) and relationships among table attributes.
iii) External schema: -
❖ It is the highest level and also called as view level.
❖ A view is conceptually a relation, but the records in a view are not stored in the DBMS.
❖ Rather, they are computed using a definition for the view, in terms of relations stored in the DBMS.
❖ There are different levels of views and every view only defines a part of the entire data.
                                                           10
❖ External schemas, allow data access to be customized (and authorized) at the level of individual users or groups
   of users. i.e. it can be used by all users (all levels' users).
❖ It also simplifies interaction with the user and it provides many views or multiple views of the same database.
❖ This level is the least complex and easy to understand.
❖ For example, a user can interact with a system using GUI that is view level and can enter details at GUI or
   screen and the user does not know how data is stored and what data is stored, this detail is hidden from the user.
❖ Each external schema consists of a collection of one or more views and relations from the conceptual schema.
1.5 DATA INDEPENDENCE
❖ It is the ability to modify the scheme without affecting the programs and the application to be rewritten.
❖ Data is separated from the programs, so that the changes made to the data will not affect the program execution
   and the application.
❖ The data independence provides the database in simple structure
❖ We know the main purpose of the three levels of data abstraction is to achieve data independence.
❖ If the database changes and expands over time, it is very important that the changes in one level should not
   affect the data at other levels of the database.
❖ This would save time and cost required when changing the database.
❖ There are two levels of data independence based on three levels of abstraction. They are
   i)        Physical Data Independence                  ii) Logical Data Independence
i) Physical Data Independence: -
❖ Physical Data Independence means changing the physical level without affecting the logical level or conceptual
    level.
❖ Using this property, we can change the storage device of the database without affecting the logical schema.
❖ The changes in the physical level may include changes using the following −
   i)        A new storage device like magnetic tape, hard disk, etc.
   ii)       A new data structure for storage.
   iii)      A different data access method or using an alternative files organization technique.
   iv)       Changing the location of the database.
ii) Logical Data Independence: -
❖ Logical view of data is the user view of the data.
❖ It presents data in the form that can be accessed by the end users.
❖ Codd’s Rule of Logical Data Independence says that users should be able to manipulate the logical View of
    data without any information of its physical storage.
❖ Software or the computer program is used to manipulate the logical view of the data.
                                                             11
❖ Database administrator is the one who decides what information is to be kept in the database and how to use
    the logical level of abstraction.
❖ The changes in the logical level may include
    i)      Change the data definition.
   ❖ Adding, deleting, or updating any new attribute, entity or relationship in the database.
                                                       12
                                              Figure 1.5: The Structure of DBMS
❖        The above figure shows the structure of a typical DBMS based on the relational data model.
❖        The DBMS accepts SQL commands generated from a variety of user interfaces, produces query evaluation
         plans, executes these plans against the database, and returns the answers.
❖        The Query evaluation Engine consists
    i)        Plan Executor      ii) Parser        iii) Operator Evaluator        iv) Optimizer
1) Plan Executor: - It prepares the execution plans. An execution plan is a blueprint for
    evaluating a query, and is usually represented as a tree of relational operators
2) Parser: - It translates a textual format into a combination of internal binary structures that can
    be easily manipulated by the optimizer.
3) Optimizer: - A query optimizer is a critical database management system (DBMS) component that analyzes
Structured Query Language (SQL) queries and determines efficient execution mechanisms. It picks the lowest cost
evaluation plan from among the alternatives.
4) Operator evaluator: - Which executes low-level instructions generated by the DML compiler.
❖ DML means Data Manipulation Language commands - Insert, Update, Delete, Select etc..
❖ DDL means Data Definition Language commands - Create, Alter, Drop, Truncate
Files and Access methods: - Supports in storing the records in the form of files (pages), organizing the files,
allocating the indexes, fetching the files and also keeps the track of files.
Buffer Manager: - It is a critical part of the database system, it enables the database to handle data sizes that are
much larger than the size of main memory.
          A Buffer Manager is responsible for allocating space to the buffer (space in main memory) in order to store
data into the buffer. It is responsible for fetching data from disk storage into main memory, and deciding what data
to cache in main memory.
Disk Space Manager: - The lowest layer of the DBMS software deals with management of space on disk, where
the data is stored. Higher layers allocate, de-allocate, read, and write pages through this layer.
❖ The DBMS supports concurrency and crash recovery by carefully scheduling user requests
     and maintaining a log of all changes to the database.
❖ The concurrency control mechanism consists
     i) Transaction Manager ii) Lock Manager
i) Transaction Manager: - A transaction manager is a part of an application that controls the
coordination of transactions over one or more resources.
          The transaction manager is responsible for creating transaction objects and managing them.
ii) Lock Manager: - The lock manager keeps track of requests for locks and grants locks on database objects when
they become available.
                                                              13
The Recovery manager: - It is responsible for maintaining a log, and restoring the system to a consistent state
after a crash.
           The disk space manager, buffer manager, and file and access method layers must interact with these
components.
                                                           14
❖     In this step we must consider typical expected workloads that our database must support and further refine
      the database design to ensure that it meets desired performance criteria.
❖     This step involve in building indexes on some tables and clustering some tables, or it may involve a
      substantial redesign of parts of the database schema obtained from the earlier design steps.
6) Security Design:
❖     In this step, we identify different user groups and different roles played by various users (e.g., the
      development team, the customer support representatives, and the product manager).
❖     For each role and user group, we must identify the parts of the database that they must be able to access and
      must not be able to access.
    Note: The ER model is most relevant to the first three steps:
1.8 ENTITIES, ATTRIBUTES, AND ENTITY SETS
❖ An entity is an object in the real world that is distinguishable from other objects.
❖ It is often useful to identify a collection of similar entities.
❖ Such a collection is called an entity set.
❖ An entity is described using a set of attributes.
❖ All entities in a given entity set have the same attributes; this is essentially what we mean by similar.
❖ For each attribute associated with an entity set, we must identify a domain of possible values.
❖ For each entity set, we choose a key.
❖ A key is a minimal set of attributes whose values uniquely identify an entity in the set.
❖ There could be more than one candidate key; if so, we designate one of them as the primary key.
❖ Each attribute in the primary key is underlined
❖ The Employees entity set with attributes ssn, name, and lot is shown in Figure.
5. Derived Attributes:
❖ An attribute which is derived from another attribute is called as a ‘derived attribute.
Example: 1.‘Age’ attribute is derived from another attribute ‘Date’.
2. Average_salary in a department should not be saved directly in the database, instead it can be derived
6. Stored Attributes:
❖ An attribute which is not derived from another attribute is called as a ‘stored attribute.
❖ Stored attributes are those attributes that are stored in the physical database
Example: In the above example,’ Date’ is a stored attribute.
1.9 RELATIONSHIPS AND RELATIONSHIP SETS
❖ A relationship is an association among two or more entities.
❖ A set of similar relationships can be collected into a relationship set.
                                                          16
                                     Figure: 1.7 The works_in relationship set
❖ Here each relationship indicates a department in which an employee works.
❖ Note that several relationship sets might involve the same entity sets. For example, we could also have a
   Manages relationship set involving Employees and Departments.
❖ A relationship can also have descriptive attributes.
❖ Descriptive attributes are used to record information about the relationship.
❖ An instance of a relationship set is a set of relationships.
❖ Can be thought of as a `snapshot' of the relationship set at some instant of time.
❖ An instance of the Works_In relationship set is shown in the below figure
                                                         17
❖ For the simplicity purpose, since value is shown beside each relationship
❖ As another example of an ER diagram, suppose that each department has offices in several locations and we
    want to record the locations at which each employee works.
❖ This relationship is ternary because we must record an association between an employee, a department, and a
    location.
❖ The ER diagram for this variant of Works In, which we call Works In2, is shown in Figure
                                                          18
❖ If an entity set plays more than one role, the role indicator concatenated with an attribute name from the entity
    set gives us a unique name for each attribute in the relationship set.
❖ For example, the Reports To relationship set has attributes corresponding to the ssn of the supervisor and the
   ssn of the subordinate, and the names of these attributes are supervisor ssn and subordinate ssn.
1.10 ADDITIONAL FEATURES OF THE ER MODEL
❖ We now look at some of the constructs in the ER model that allow us to describe some subtle (delicate)
    properties of the data.
❖ The expressiveness of the ER model is a big reason for its widespread use.
1.10.1 Key Constraints
❖ Key constraints are used to apply some condition on the relationship that exist between the entities.
❖ Consider the Works In relationship shown in Figure 1.7 An employee can work in several departments, and a
    department can have several employees.
❖ Now consider another relationship set called Manages between the Employees and Departments entity sets
    such that each department has at most one manager, although a single employee is allowed to manage more
    than one department.
❖ The restriction that each department has at most one manager is an example of a key constraint, and it implies
    that each Departments entity appears in at most one Manages relationship in any allowable instance of
    Manages.
❖ This restriction is indicated in the ER diagram of Figure 1.11 by using an arrow from Departments to Manages.
    The arrow states that given a Departments entity, we can uniquely determine the Manages relationship in which
    it appears.
❖ A relationship set like Manages is sometimes said to be one-to-many, to indicate that one employee can be
    associated with many departments (in the capacity of a manager), whereas each department can be associated
    with at most one employee as its manager.
                                                         19
❖ In contrast, the Works In relationship set, in which an employee is allowed to work in several departments and
    a department is allowed to have several employees, is said to be many-to-many.
❖ If we add the restriction that each employee can manage at most one department to the Manages relationship
    set, which would be indicated by adding an arrow from Employees to Manages in Figure 1.11, we have a one-
    to-one relationship set.
1.10.2 Key Constraints for Ternary Relationships
❖ If an entity set E has a key constraint in a relationship set R, each entity in an instance of E appears in at most
     one relationship in (a corresponding instance of) R. To indicate a key constraint on entity set E in relationship
     set R, we draw an arrow from E to R.
❖ In Figure 1.12, we show a ternary relationship with key constraints.
                                                         20
                                          Figure 1.13 An Instance of Works In3
❖ An instance of the Works In3 relationship set is shown in Figure 2.9. Notice that each department can be
    associated with several employees and locations, and each location can be associated with several departments
    and employees; however, each employee is associated with a single department and location.
1.10.3 Participation Constraints
❖ The participation constraint specifies the number of instances of an entity can participate in a relationship set.
❖ If all the instances of an entity participates in the relationship then it is said to be total participation
❖ Example:- Every employees works at least in any one department
❖ If the relationship is total, the two are connected by a thick line
❖ A participation that is not total is said to be partial. Connected with normal line.
❖ Example: - The participation of the entity set Employees in Manages is partial, since not every employee gets
    to manage a department. If the participation of an entity set
❖ The presence of an arrow indicates a key constraint.
❖    It breaks an entity into multiple entities from higher level (super class) to lower level (subclass).
❖    Example: - The class vehicle can be specialized into Car, Truck and Motorcycle ( Top Down Approach)
     Hence, vehicle is the super class and Car, Truck, Motorcycle are subclasses.
❖    All three of these inherit attributes from vehicle.
❖    It shows the relationships between the classes in the form of an organization chart.
                                                           22
                                          Figure 1.16 Class Hierarchy
                                                       23
❖    Aggregation refers to the process by which entities are combined to form a single meaningful entity.
❖    The specific entities are combined because they do not make sense on their own.
❖    In aggregation, relationship with its corresponding entities is aggregated into a higher level entity.
❖    For example: Center entity offers the Course entity act as a single entity in the relationship which is in a
     relationship with another entity visitor.
❖    In the real world, if a visitor visits a coaching center then he will never enquiry about the Course only or just
     about the Center instead he will ask the enquiry about both.
25