Database Engineering
Unit 1
Introduction to DBMS
                       Introduction
•   Data
•   raw, unorganized facts that need to be processed.
•   Data can be something simple and seemingly
    random and useless until it is organized.
•   Information
•   When data is processed, organized, structured or
    presented in a given context so as to make it useful,
•   it is called information.
                       Introduction
•   Database
    •   collection of related data
•   Database Management System (DBMS)
    •   software that manages and controls access to the
        database.
•   Database application
    •   A program that interacts with the database at
        some point in its execution.
                        Introduction
•   Database Management System (DBMS) -
•   Collection of interrelated data   - database
•   Set of programs to access those data
•   Primary Goal –
•   store and retrieve data in convenient and efficient manner
•   Define structure of information
•   Mechanism for manipulation
•   Safety against system crash and unauthorized access
•   Avoid anomalous result while sharing
          Database System Applications
•   Banking: all transactions
•   Airlines: reservations, schedules
•   Universities: registration, grades
•   Sales: customers, products, purchases
•   Manufacturing: production, inventory, orders, supply chain
•   Human resources: employee records, salaries, tax
    deductions
          Purpose of Database System
    Database Systems versus Traditional File based
                      Systems
•     Early days, database applications were built on top of file
      systems
•     Drawbacks of using file systems to store data:
•     Data redundancy and inconsistency
     • Multiple file formats, duplication of information in
       different files
•     Difficulty in accessing data
     • Need to write a new program to carry out each new task
•     Data isolation — multiple files and formats
•   Integrity problems
    • Integrity constraints (e.g. account balance < 0) become
      part of program code
    • Hard to add new constraints or change existing ones
•   Atomicity of updates
    • Failures may leave database in an inconsistent state with
      partial updates carried out
    • E.g. transfer of funds from one account to another should
      either complete or not happen at all
•   Concurrent access by multiple users
    • Concurrent acces needed for performance
    • Uncontrolled concurrent accesses can lead to
      inconsistencies
    • E.g. two people reading a balance and updating it at the
      same time
•   Security problems
•   Database Approach
    All the above limitations of the file-based approach
    can be attributed to two factors:
    (1) the definition of the data is embedded in the
    application programs, rather than being
    stored separately and independently;
•
    (2) there is no control over the access and
    manipulation of data beyond that imposed by
    the application programs.
•   To become more effective, a new approach was
    required.
•   What emerged were the database and the Database
    Management System (DBMS).
    History of Database Management Systems
•   DBMS has its roots in the 1960s
•   Apollo moon-landing project
•   Which was initiated in response to President
    Kennedy’s objective of landing a man on the moon
•   At that time there was no system available that would
    be able to handle and manage the vast amounts of
    information that the project would generate.
•   North American Aviation (NAA, now Rockwell
    International), the prime contractor for the project,
    developed software known as GUAM (Generalized
    Update Access Method).
•   hierarchical structure
•   In the mid-1960s, IBM joined NAA to develop GUAM
    into what is now known as IMS (Information
    Management System).
•   In the mid-1960s, another significant development
    was the emergence of IDS (Integrated
    Data Store) from General Electric.
•   This work was headed by one of the early pioneers
    of database systems, Charles Bachmann.
•   This development led to a new type of database
    system known as the network DBMS
•   The network database was developed partly to
    address the need to represent more complex data
    relationships than could be modeled with hierarchical
    structures, and partly to impose a database standard.
•   To help establish such standards, the Conference on
    Data Systems Languages (CODASYL), comprising
    representatives of the US government and the world
    of business and commerce, formed a List Processing
    Task Force in 1965, subsequently renamed the Data
    Base Task Group (DBTG) in 1967.
•   The terms of reference for the DBTG were to define
    standard specifications for an environment that
    would allow database creation and data
    manipulation.
•   A draft report was issued in 1969 and the first
    definitive report in 1971.
•   The DBTG proposal identified three components:
•   the network schema – the logical organization of the
    entire database as seen by the
    DBA – which includes a definition of the database
    name, the type of each record, and
    the components of each record type;
•   the subschema – the part of the database as seen
    by the user or application program;
•   a data management language to define the data
    characteristics and the data structure, and
    to manipulate the data.
•   for standardization, the DBTG specified three distinct
    languages:
    Data Definition Language (DDL), which enables the
    DBA to define the schema;
    a subschema DDL, which allows the application
    programs to define the parts of the database they
    require;
    Data Manipulation Language (DML), to manipulate
    the data.
    Advantages and Disadvantages of DBMSs
•   Advantages
•   Control of data redundancy
    •   traditional file-based systems waste space by
        storing the same information in more than one file.
•   Data consistency
    •   By eliminating redundancy, we reduce the risk of
        inconsistencies occurring.
    •   If a data item is stored only once in the database,
        any update to it has to be performed only once
        and the new value is available immediately to all.
•   More information from the same amount of data
    •   With the integration of the operational data, it may
        be possible for the organization to derive
        additional information from the same data.
•   Sharing of data
    •   database belongs to the entire organization and
        can be shared by all authorized users.
•   Improved data integrity
    •   Database integrity refers to the validity and
        consistency of stored data.
    •   Integrity is expressed in terms of constraints,
    •   which are consistency rules that the database is
        not permitted to violate.
    •   Constraints may apply to data items within a
        single record or they may apply to relationships
        between records.
•   Improved security
    •   Database security is the protection of the
        database from unauthorized users.
•   Enforcement of standards
    •   integration allows the DBA to define and enforce
        the necessary standards.
    •   These may include departmental, organizational,
        national, or international standards.
•   Economy of scale
    •   Combining all the organization’s operational data
        into one database, and creating a set of
        applications that work on this one source of data,
        can result in cost savings.
•   Improved data accessibility and responsiveness
    •   As a result of integration, data that crosses
        departmental boundaries is directly
        accessible to the end-users. This provides a
        system with potentially much more functionality
•   Increased productivity
    •   DBMS provides many of the standard functions
        that the programmer would normally have to write
        in a file-based application.
    •   At a basic level, the DBMS provides all the low-
        level file-handling routines that are typical in
        application programs.
•   Increased concurrency
    •   Many DBMSs manage concurrent database
        access and ensure problems cannot occur.
•   Improved backup and recovery services
    •   modern DBMSs provide facilities to minimize the
        amount of processing that is lost following a
        failure.
•   Disadvantages
•   Complexity
    •   The provision of the functionality we expect of a
        good DBMS makes the DBMS an extremely
        complex piece of software.
    •   Database designers and developers, the data and
        database administrators, and end-users must
        understand this functionality to take full
        advantage of it.
    •   Failure to understand the system can lead to bad
        design decisions, which can have serious
        consequences for an organization.
•   Size
    •   The complexity and breadth of functionality makes
        the DBMS an extremely large piece of software,
        occupying many megabytes of disk space and
        requiring substantial amounts of memory to run
        efficiently.
•   Cost of DBMSs
    •   The cost of DBMSs varies significantly, depending
        on the environment and functionality provided.
    •   a single-user DBMS for a personal computer may
        only cost US$100.
•   Additional hardware costs
    •   The disk storage requirements for the DBMS and
        the database may necessitate the purchase of
        additional storage space.
    •   To achieve the required performance, it may
        be necessary to purchase a larger machine
•   Cost of conversion
    •   In some situations, the cost of the DBMS and
        extra hardware may be insignificant
        compared with the cost of converting existing
        applications to run on the new DBMS and
        hardware.
    •   This cost also includes the cost of training staff to
        use these new systems
•   Performance
    •   a file-based system is written for a specific
        application, such as invoicing.
    •   As a result, performance is generally very good.
    •   However, the DBMS is written to be more
        general, to cater for many applications rather than
        just one.
    •   The effect is that some applications may not run
        as fast as they used to.
•   Higher impact of a failure
    •   The centralization of resources increases the
        vulnerability of the system.
    •   Since all users and applications rely on the
        availability of the DBMS, the failure of certain
        components can bring operations to a halt.
                       View of Data
•   Complex data structures used for efficiency
•   Hides certain details of how data are stored and maintained
•   Provides abstract view
•   Hides complexity
•   3 Levels of Data Abstraction
•   Physical Level
•   Logical Level
•   View Level
                     Physical Level
•   Lowest level of abstraction
•   Describes how data are actually stored
•   Describes complex low level data structures in detail
•   Blocks of consecutive storage locations- words, byte etc
•   Administrator may be aware of certain details of the
    physical organization
                      Logical Level
•   Next higher level of abstraction
•   Describes what data are stored in the database
•   And what relationship exists among those data
•   Describes entire database in terms of a small number of
    relatively simple structures
•   Do not bother with complex physical storage
•   Database administrator uses logical level
•   Who decides what information to keep in database
•   Each record is described by a type definition
•   Programmers also work at this level
                        View Level
•   Highest level of abstraction
•   Describes only part of entire database
•   Exists to simplify interaction with the system
•   System may provide many views for the same database
•   Set of application programs that hide details of the data
    types
•   Views also provides security mechanisms to prevent users
    from accessing certain parts of database
•   type customer = record
                name : string;
                street : string;
                city : integer;
                 end;
               Instances and Schemas
•   Database changes over time as information is inserted and
    deleted
•   Instance of the database –
•   Collection of information stored in the database at a
    particular moment
•   Schema –
•   The overall design of the database
•   Schemas are changed infrequently
              Instances and Schemas
•   Similar to types and variables in programming languages
•   Schema – the logical structure of the database
•   e.g. the database consists of information about a set of
    customers and accounts and the relationship between them
•   Analogous to type information of a variable in a program
•   In DBMS, Schema is the overall Design of the
    Database.
•   Instance is the information stored in the Database at
    a particular moment.
•   Database schema corresponds to the variable declaration
•   The value of the variable in a program at a point in time
    correspond to an instance of a database schema
•   Database systems have several schemas
•   Partitioned according to levels of abstraction
•   Physical Schema – describes database design at physical
    level
•   Logical Schema – describes database design at logical
    level
•   Several schemas at view level, called as sub-schemas
•   Describes different views of database
•   Physical Data Independence –
•   Logical schema is most important
•   the ability to modify the physical schema without changing
    the logical schema
•   Applications depend on the logical schema
•   In general, the interfaces between the various levels and
    components should be well defined so that changes in
    some parts do not seriously influence others.
                       Data Models
•   Underlying the structure of a database is data model
•   A collection of conceptual tools for describing data, data
    relationships, data semantics, and consistency constraints
•   The Entity-Relationship Model (E-R)
•   Relational Model
                  Relational Model
•   uses a collection of tables to represent both data and
    the relationships among those data.
•   Each table has multiple columns, and each column
    has a unique name.
•   relational model is an example of a record-based
    model.
•   Record-based models are so named because the
    database is structured in fixed-format records of
    several types.
•   Each table contains records of a particular type.
•   Each record type defines a fixed number of fields, or
    attributes.
•   The columns of the table correspond to the attributes
    of the record type.
•   The relational data model is the most widely used
    data model,
•   a vast majority of current database systems are
    based on the relational model.
•   The relational model is at a lower level of abstraction
    than the E-R model.
•   Database designs are often carried out in the E-R
    model, and then translated to the relational model
                Other Data Models
•   object-oriented data model
•   is another data model that has seen increasing
    attention.
•   The object-oriented model can be seen as extending
    the E-R model with notions of encapsulation,
    methods (functions), and object identity.
•   The object-relational data model combines
    features of the object-oriented data model and
    relational data model.
•   Semi-structured data models permit the specification
    of data where individual data items of the same type
    may have different sets of attributes.
•   This is in contrast with the data models mentioned
    earlier,
•   where every data item of a particular type must have
    the same set of attributes.
•   The extensible markup language (XML) is widely
    used to represent semistructured data.
•   Historically, two other data models, the network data
    model and the hierarchical data model, preceded
    the relational data model.
•   These models were tied closely to the underlying
    implementation,
•   and complicated the task of modeling data.
•   As a result they are little used now, except in old
    database code that is still in service in some places.
        The Entity-Relationship Model (E-R)
•   Collection of basic objects – entities
•   And relationships among these objects
•   Entity – things or objects in the real world
    • Distinguishable from other objects
    • E. g. person, bank account
•   Entities are described in database by a set of attributes
•   E. g. bank account – account numbers and balance
•   Some attributes used to uniquely identify the entities
•   Relationship – association among several entities
•   An entity set is a set of entities of the same type that
    share the same properties, or attributes.
•   The set of all persons who are customers at a given
    bank, for example, can be defined as the entity set
    customer
•   Entity sets do not need to be disjoint.
•   For example, it is possible to define the entity set of
    all employees of a bank (employee) and the entity
    set of all customers of the bank (customer).
•   A person entity may be an employee entity, a
    customer entity, both, or neither.
•   An entity is represented by a set of attributes.
•   Attributes are descriptive properties possessed by
    each member of an entity set.
•   Each entity has a value for each of its attributes
•   Some attributes are used to uniquely identify entities
•   For each attribute, there is a set of permitted values,
    called the domain, or value set, of that attribute.
•   A database thus includes a collection of entity sets,
    each of which contains any number of entities of the
    same type.
•   an attribute of an entity set is a function that maps
    from the entity set into a domain.
•   Since an entity set may have several attributes,
•   each entity can be described by a set of (attribute,
    data value) pairs,
•   one pair for each attribute of the entity set.
•   E.g. (customer-id, 677-89-9011)
•   (customer-name, Hayes)
•   An attribute can be characterized by the following
    attribute types
       Simple and composite attributes
•   Simple attributes- they are not divided into
    subparts.
•   Composite attributes- can be divided into subparts
    (other attributes).
•   E.g. attribute name could be structured as a
    composite attribute consisting of first-name, middle-
    initial, and last-name
•   if a user will wish to refer to an entire attribute on
    some occasions, and only a component of the
    attribute on other occasions.
•   composite attribute may appear as a hierarchy
    Single-valued and multivalued attributes
•    single value for a particular entity attribute
•    There may be instances where an attribute has a set
     of values for a specific entity.
•    employee entity set with the attribute phone-number.
•    An employee may have zero, one, or several phone
     numbers
•    and different employees may have different numbers
     of phones.
•    This type of attribute is said to be multivalued.
•    appropriate, upper and lower bounds may be placed
     on values in a multivalued attribute.
                   Derived attribute
•   The value for this type of attribute can be derived
    from the values of other related attributes or entities.
•   the customer entity set has an attribute loans-held,
•   which represents how many loans a customer has
    from the bank.
•   derive the value for this attribute by counting the
    number of loan entities associated with that
    customer.
•   The value of a derived attribute is not stored, but is
    computed when required.
•   An attribute takes a null value when an entity does
    not have a value for it.
•   The null value may indicate “not applicable”—the
    value does not exist for the entity.
•   For example, one may have no middle name.
•   Null can also designate that an attribute value is
    unknown.
•   Components of E-R diagram
•   Rectangle – represents entity sets
•   Ellipses – represents attributes
•   Diamonds – represents relationships among entity sets
•   Lines – link attributes to entity sets and entity sets to
    relationship
•   Double Ellipse – Multivalued Attributes
•   Dashed Ellipse – Derived Attributes
•   Double lines - indicate total participation of an entity
    in a relationship set
•   Each entity is labeled with name
                 Relationship Sets
•   A relationship is an association among several
    entities.
•   a relationship that associates customer Hayes with
    loan L-15.
•   This relationship specifies that Hayes is a customer
    with loan number L-15.
•   A relationship set is a set of relationships of the
    same type.
•   If E1, E2, . . .,En are entity sets, then a relationship
    set R is a subset of
The association between entity sets is referred to as
   participation;
the entity sets E1, E2, . . .,En participate in relationship
   set R
•   The function that an entity plays in a relationship is
    called that entity’s role.
                     Constraints
•   constraints to which the contents of a database must
    conform
•   Mapping Cardinalities
•   Mapping cardinalities, or cardinality ratios,
•   express the number of entities to which another
    entity can be associated via a relationship set.
•   E.g. each account must belong to only one customer
•   For a binary relationship set R between entity sets A
    and B,
•   the mapping cardinality must be one of the following:
•   One to one :-
•   An entity in A is associated with at most one entity in
    B, and
•   an entity in B is associated with at most one entity in
    A.
•   E.g. One Customer – one loan account
•   One to many :-
•   An entity in A is associated with any number (zero or
    more) of entities in B.
•   An entity in B, can be associated with at most one
    entity in A.
•   E.g. One Customer – many loan accounts
•   Many to one :-
•   An entity in A is associated with at most one entity in
    B.
•   An entity in B, however, can be associated with any
    number (zero or more) of entities in A.
•   E.g. 3 Customers - joint loan account
•   Many to many :-
•   An entity in A is associated with any number (zero or
    more) of entities in B, and
•   an entity in B is associated with any number (zero or
    more) of entities in A.
•   E.g. Many customers many loan accounts
            Participation Constraints
•   The participation of an entity set E in a relationship
    set R is said to be total
•   if every entity in E participates in at least one
    relationship in R.
•   If only some entities in E participate in relationships
    in R,
•   the participation of entity set E in relationship R is
    said to be partial.
•   For example, we expect every loan entity to be
    related to at least one customer through the borrower
    relationship.
•   Therefore the participation of loan in the relationship
    set borrower is total.
•   In contrast, an individual can be a bank customer
    whether or not she has a loan with the bank.
•   Hence, it is possible that only some of the customer
    entities are related to the loan entity set through the
    borrower relationship,
•   and the participation of customer in the borrower
    relationship set is therefore partial.
                           Keys
•   must have a way to specify how entities within a
    given entity set are distinguished.
•   individual entities are distinct the difference among
    them must be expressed in terms of their attributes
•   The values of the attribute values of an entity must
    be such that they can uniquely identify the entity.
•   no two entities in an entity set are allowed to have
    exactly the same value for all attributes
•   A key allows us to identify a set of attributes that
    suffice to distinguish entities from each other.
•   A superkey is a set of one or more attributes that,
    taken collectively,
•   allow us to identify uniquely an entity in the entity set.
•   For example, the customer-id attribute of the entity
    set customer is sufficient to distinguish one customer
    entity from another.
•   Thus, customer-id is a superkey.
•   Similarly, the combination of customer-name and
    customer-id is a superkey for the entity set customer.
•   The customer-name attribute of customer is not a
    superkey, because several people might have the
    same name.
•   candidate keys :-
•   superkey may contain extraneous attributes.
•   If K is a superkey, then so is any superset of K.
•   superkeys for which no proper subset is a superkey.
•   Such minimal superkeys are called candidate keys.
•   several distinct sets of attributes could serve as a
    candidate key.
•   Suppose that a combination of customer-name and
    customer-street is sufficient to distinguish among
    members of the customer entity set.
•   Then, both {customer-id} and {customer-name,
    customer-street} are candidate keys.
•   Although the attributes customerid and customer-
    name together can distinguish customer entities,
•   their combination does not form a candidate key,
•   since the attribute customer-id alone is a candidate
    key
•   primary key denote a candidate key that is chosen
    by the database designer
•   as the principal means of identifying entities within an
    entity set.
•   A key (primary, candidate, and super) is a property of
    the entity set, rather than of the individual entities.
•   Any two individual entities in the set are prohibited
    from having the same value on the key attributes at
    the same time.
•   Candidate keys must be chosen with care
•   The primary key should be chosen such that its
    attributes are never, or very rarely, changed.
•   For instance, the address field of a person should not
    be part of the primary key,
•   since it is likely to change.
•   Social-security numbers, on the other hand, are
    guaranteed to never change.
•   The primary key of a relational table uniquely
    identifies each record in the table.
•   It can either be a normal attribute that is guaranteed
    to be unique (such as Social Security Number in a
    table with no more than one record per person)
•   or it can be generated by the DBMS (such as a
    globally unique identifier, or GUID, in Microsoft SQL
    Server).
•   Primary keys may consist of a single attribute or
    multiple attributes in combination.
Sample – ER Diagrams
                  Weak Entity Sets
•   An entity set may not have sufficient attributes to
    form a primary key.
•   Such an entity set is termed a weak entity set.
•   An entity set that has a primary key is termed a
    strong entity set
•   Consider the entity set payment, which has the three
    attributes: payment-number, payment-date, and
    payment-amount.
•   each payment entity is distinct starting from 1
•   But payments for different loans may share the same
    payment number.
•   Thus, this entity set does not have a primary key;
•   it is a weak entity set.
•   For a weak entity set to be meaningful, it must be
    associated with another entity set, called the
    identifying or owner entity set.
•   the weak entity set is said to be existence
    dependent on the identifying entity set.
•   The identifying entity set is said to own the weak
    entity set that it identifies.
•   The relationship associating the weak entity set with
    the identifying entity set is identifying relationship.
•   The identifying relationship is many to one
•   from the weak entity set to the identifying entity set,
    and the participation of the weak entity set in the
•   In our example, the identifying entity set for payment
    is loan,
•   and a relationship loan-payment that associates
    payment entities with their corresponding loan
    entities is the identifying relationship.
•   The primary key of a weak entity set is formed by the
    primary key of the identifying entity set, plus the
    weak entity set’s discriminator.
•   In the case of the entity set payment, its primary key
    is {loan-number, payment-number},
•   In E-R diagrams, a doubly outlined box indicates a
    weak entity set,
•   and a doubly outlined diamond indicates the
    corresponding identifying relationship.
•   In Figure the weak entity set payment depends on
    the strong entity set loan via the relationship set loan-
    payment.
•   The figure also illustrates the use of double lines to
    indicate total participation—
•   The participation of the (weak) entity set payment in
    the relationship loan-payment is total
             Extended E-R Features
•   Extended E-R features –
•   Specialization
•   Generalization
•   higher- and lower-level entity sets,
•   attribute inheritance
•   aggregation
                    Specialization
•   An entity set may include subgroupings of entities
    that are distinct in some way from other entities in the
    set.
•   For instance, a subset of entities within an entity set
    may have attributes that are not shared by all the
    entities in the entity set.
•   The E-R model provides a means for representing
    these distinctive entity groupings.
•   Consider an entity set person, with attributes name,
    street, and city.
•   A person may be further classified as one of the
    following:
    • Customer
    • employee
•   Each of these person types is described by a set of
    attributes that includes all the attributes of entity set
    person plus possibly additional attributes.
•   For example, customer entities may be described
    further by the attribute customer-id
•   whereas employee entities may be described further
    by the attributes employee-id and salary.
•   The process of designating subgroupings within an
    entity set is called specialization.
•   The specialization of person allows us to distinguish
    among persons according to whether they are
    employees or customers.
•   Another example, the bank wishes to divide accounts
    into two categories, Current account and savings
    account.
•   In terms of an E-R diagram, specialization is depicted
    by a triangle component labelled ISA, as Figure
    shows.
•   The label ISA stands for “is a”
•   represents, for example, that a customer “is a”
    person.
•   The ISA relationship may also be referred to as a
    superclass-subclass relationship.
•   Higher- and lower-level entity sets are depicted as
    regular entity sets—that is, as rectangles containing
    the name of the entity set.
                   Generalization
•   The refinement from an initial entity set into
    successive levels of entity subgroupings represents a
    top-down design process in which distinctions are
    made explicit.
•   The design process may also proceed in a bottom-
    up manner, in which multiple entity sets are
    synthesized into a higher-level entity set on the basis
    of common features.
•   customer entity set with the attributes name, street,
    city, and customer-id,
•   and an employee entity set with the attributes name,
    street, city, employee-id, and salary.
•   There are similarities between the customer entity
    set and the employee entity set -they have several
    attributes in common.
•   This commonality can be expressed by
    generalization
•   which is a containment relationship that exists
    between a higher-level entity set and one or more
    lower-level entity sets.
•   Person is the higher-level entity set and customer
    and employee are lower-level entity sets.
•   Higher- and lower-level entity sets also may be
    designated by the terms superclass and subclass,
    respectively.
•   The person entity set is the superclass of the
    customer and employee subclasses.
•   For all practical purposes, generalization is a simple
    inversion of specialization.
•   apply both processes, in combination, in the course
    of designing the E-R Diagram
•   Differences in the two approaches may be
    characterized by their starting point and overall goal.
•   Specialization stems from a single entity set;
•   it emphasizes differences among entities within the
    set by creating distinct lower-level entity sets.
•   These lower-level entity sets may have attributes, or
    may participate in relationships, that do not apply to
    all the entities in the higher-level entity set.
•   Generalization proceeds from the recognition that a
    number of entity sets share some common features
•   (namely, they are described by the same attributes
    and participate in the same relationship sets).
•   On the basis of their commonalities, generalization
    synthesizes these entity sets into a single, higher-
    level entity set.
•   Generalization is used to emphasize the similarities
    among lower-level entity sets and to hide the
    differences
•   it also permits an economy of representation in that
    shared attributes are not repeated.
               Attribute Inheritance
•   A crucial property of the higher- and lower-level
    entities created by specialization and generalization
    is attribute inheritance.
•   The attributes of the higher-level entity sets are said
    to be inherited by the lower-level entity sets.
•   For example, customer and employee inherit the
    attributes of person.
•   customer is described by its name, street, and city
    attributes, and additionally a customer-id attribute;
•   employee is described by its name, street, and city
    attributes, and additionally employee-id and salary
•   A lower-level entity set (or subclass) also inherits
    participation in the relationship sets in which its
    higher-level entity (or superclass) participates.
•   The officer, teller, and secretary entity sets can
    participate in the works-for relationship set, since the
    superclass employee participates in the works-for
    relationship.
•   Attribute inheritance applies through all tiers of lower-
    level entity sets.
•   Single Inheritance, Multiple Inheritance
       Constraints on Generalizations
•   One type of constraint involves determining which
    entities can be members of a given lower-level entity
    set.
•   Such membership may be one of the following:
•   Condition-defined-
•   In condition-defined lower-level entity sets,
    membership is evaluated on the basis of whether or
    not an entity satisfies an explicit condition or
    predicate.
•   E.g. Account – account type saving
•   User-defined- User-defined lower-level entity sets
    are not constrained by a membership condition;
•   rather, the database user assigns entities to a given
    entity set.
•   For instance, let us assume that, after 3 months of
    employment, bank employees are assigned to one of
    four work teams
•   A second type of constraint relates to whether or not
    entities may belong to more than one lower-level
    entity set within a single generalization.
•   The lower-level entity sets may be one of the
    following:
•   Disjoint- A disjointness constraint requires that an
    entity belong to no more than one lower-level entity
    set.
•   Overlapping-In overlapping generalizations, the
    same entity may belong to more than one lower-level
    entity set within a single generalization.
•   A final constraint, the completeness constraint on a
    generalization or specialization,
•   specifies whether or not an entity in the higher-level
    entity set must belong to at least one of the lower-
    level entity sets within the generalization/
    specialization
•   This constraint may be one of the following:
•   Total generalization or specialization -Each higher-
    level entity must belong to a lower-level entity set
•   Partial generalization or specialization- Some
    higher-level entities may not belong to any lower-
    level entity set
•   Partial generalization is the default.
•   We can specify total generalization in an E-R
    diagram by using a double line to connect the box
    representing the higher-level entity set to the triangle
    symbol.
•   (This notation is similar to the notation for total
    participation in a relationship.)
•   The account generalization is total:
•   All account entities must be either a savings account
    or a current account
•   The work team entity sets illustrate a partial
    specialization.
•   Since employees are assigned to a team only after 3
    months on the job,
•   some employee entities may not be members of any
    of the lower-level team entity sets.
                     Aggregation
•   One limitation of the E-R model is that it cannot
    express relationships among relationships.
•   consider the ternary relationship works-on, between
    a employee, branch, and job
•   suppose we want to record managers for tasks
    performed by an employee at a branch;
•   we want to record managers for (employee, branch,
    job) combinations.
•   One alternative for representing this relationship is to
    create a quaternary relationship manages between
    employee, branch, job, and manager.
    Design of an E-R Database Schema
•   The E-R data model gives us much flexibility in
    designing a database schema to model a given
    enterprise.
•   how a database designer may select from the wide
    range of alternatives.
•   Among the designer’s decisions are:
                   Design Phases
•   The initial phase of database design, is to
    characterize fully the data needs of the prospective
    database users.
•   Next, the designer chooses a data model, and by
    applying the concepts of the chosen data model,
    translates these requirements into a conceptual
    schema of the database.
•   The designer reviews the schema to confirm that all
    data requirements are indeed satisfied and are not in
    conflict with one another.
•   In a specification of functional requirements,
    users describe the kinds of operations (or
    transactions) that will be performed on the data.
•   The process of moving from an abstract data model
    to the implementation of the database proceeds in
    two final design phases.
•   In the logical-design phase, the designer maps the
    high-level conceptual schema onto the
    implementation data model of the database system
    that will be used.
•   The designer uses the resulting system specific
    database schema in the subsequent physical-
    design phase, in which the physical features of the
    database are specified.
    Reduction of an E-R Schema to Tables
•   represent a database that conforms to an E-R
    database schema by a collection of tables.
•   For each entity set and for each relationship set in
    the database, there is a unique table to which we
    assign the name of the corresponding entity set or
    relationship set.
•   Each table has multiple columns, each of which has
    a unique name.
•   Tabular Representation of Strong Entity Sets
•   Let E be a strong entity set with descriptive attributes
    a1, a2, . . . , an.
•   We represent this entity by a table called E with n
    distinct columns,
•   each of which corresponds to one of the attributes of
    E.
•   Each row in this table corresponds to one entity of
    the entity set E.
•   E.g. loan with attributes loan number and amount
•   Tabular Representation of Weak Entity Sets
•   Let A be a weak entity set with attributes a1, a2, . . . ,
    am.
•   Let B be the strong entity set on which A depends.
•   Let the primary key of B consist of attributes b1, b2, .
    . . , bn.
•   We represent the entity set A by a table called A with
    one column for each attribute of the set:
•   {a1, a2, . . . , am} ∪ {b1, b2, . . . , bn}
•   loan
•   payment
•   Tabular Representation of Relationship Sets
•   Let R be a relationship set, let a1, a2, . . . , am be the
    set of attributes formed by the union of the primary
    keys of each of the entity sets participating in R,
•   and let the descriptive attributes (if any) of R be b1,
    b2, . . . , bn.
•   We represent this relationship set by a table called R
    with one column for each attribute of the set:
•   {a1, a2, . . . , am} ∪ {b1, b2, . . . , bn}
•   E.g. relationship - borrower
•   Customer – primary key customer id
•   Loan – primary key loan number
•   Redundancy of Tables
•   Combination of Tables
•   Composite Attributes
•   Multivalued Attributes
•   Tabular Representation of Generalization
•   Tabular Representation of Aggregation
              Redundancy of Tables
•   Specifically in case of weak entity set
•   Table used for Identifying relationship is redundent
•   E. g. Every(loan-number,payment-number)
    combination in loan-payment would also be present
    in the payment table.
•   Thus, the loan-payment table is redundant.
             Combination of Tables
•   Combine table for account-branch with the table for
    account and require only the following two tables:
•   account, with attributes account-number, balance,
    and branch-name
•   branch, with attributes branch-name, branch-city, and
    assets
•   No need of table account-branch
Tabular Representation of Generalization
•   Create a table for the higher-level entity set.
•   For each lower-level entity set, create a table that
    includes a column for each of the attributes of that
    entity set
•   plus a column for each attribute of the primary key of
    the higher-level entity set.
Tabular Representation of Aggregation
•   The table for the relationship set manages between
    the aggregation of works-on and the entity set
    manager includes
•   a column for each attribute in the primary keys of the
    entity set manager and the relationship set works-on.
           Database System Structure
•   A database system is partitioned into modules
•   that deal with each of the responsibilities of the
    overall system.
•   The functional components of a database system
    can be broadly divided into
•   the storage manager and
•   the query processor components.
•   storage manager is a program module
•   that provides the interface between the low level data
    stored in the database and the application programs
    and queries submitted to the system.
•   The storage manager is responsible for the
    interaction with the file manager.
•   The raw data are stored on the disk using the file
    system, which is usually provided by a conventional
    operating system.
•   The storage manager translates the various DML
    statements into low-level file-system commands.
•   the storage manager is responsible for storing,
    retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager
    tests for the satisfaction of integrity constraints and
    checks the authority of users to access data.
• Transaction manager
    ensures that the database remains in a consistent
    (correct) state despite system failures, and
    concurrent transaction executions proceed without
    conflicting.
• File manager -
manages the allocation of space on disk storage and the
data structures used to represent information stored on
disk.
• Buffer manager
   responsible for fetching data from disk storage into
   main memory, and deciding what data to cache in
   main memory.
   The buffer manager is a critical part of the database
   system, since it enables the database to handle data
   sizes that are much larger than the size of main
   memory.
•   The storage manager implements several data
    structures as part of the physical system
    implementation:
• Data files, which store the database itself.
• Data dictionary,
   which stores metadata about the structure of the
   database, in particular the schema of the database.
• Indices, which provide fast access to data items that
    hold particular values.
             The Query Processor
• The query processor components include
• DDL interpreter, which interprets DDL statements and
   records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a
   query language into an evaluation plan consisting of
   low-level instructions that the query evaluation
   engine understands.
• Query evaluation engine, which executes low-level
   instructions generated by the DML compiler.
Application Architectures
•   Database applications are usually partitioned into two
    or three parts
•   In a two-tier architecture, the application is
    partitioned into a component that resides at the client
    machine,
•   which invokes database system functionality at the
    server machine through query language statements.
•   Application program interface standards like JDBC
    are used for interaction between the client and the
    server.
•   in a three-tier architecture, the client machine acts
    as merely a front end and does not contain any direct
    database calls.
•   Instead, the client end communicates with an
    application server, usually through a forms
    interface.
•   The application server in turn communicates with a
    database system to access data.
    Roles in the Database Environment
•   A primary goal of a database system is to retrieve
    information from and store new information in the
    database.
•   People who work with a database can be categorized
    as
•   database users or
•   database administrators.
    Database Users and User Interfaces
•   There are four different types of database-system
    users,
•   differentiated by the way they expect to interact with
    the system.
•   Different types of user interfaces have been
    designed for the different types of users.
•   Naive users
•   are unsophisticated users who interact with the
    system by invoking one of the application programs
    that have been written previously.
•   For example, a bank teller who needs to transfer $50
    from account A to account B invokes a program
    called transfer.
•   Application programmers
•   are computer professionals who write application
    programs.
•   Application programmers can choose from many
    tools to develop user interfaces.
•   Sophisticated users
•   interact with the system without writing programs.
•   Instead, they form their requests in a database query
    language.
•   They submit each such query to a query processor,
    whose function is to break down DML statements into
    instructions that the storage manager understands.
•   Analysts who submit queries to explore data in the
    database fall in this category.
•   Specialized users
•   are sophisticated users who write specialized
    database applications that do not fit into the
    traditional data-processing framework.
•   Among these applications are computer-aided design
    systems, knowledge base and expert systems,
    systems that store data with complex data types (for
    example, graphics data and audio data), and
    environment-modeling systems.
            Database Administrator
•   One of the main reasons for using DBMSs is to have
    central control of both the data and the programs that
    access those data.
•   A person who has such central control over the
    system is called a database administrator (DBA).
•   The functions of a DBA include:
•   Schema definition. The DBA creates the original
    database schema by executing a set of data
    definition statements in the DDL.
•   Storage structure and access-method definition.
•   Schema and physical-organization modification.
    The DBA carries out changes to the schema and
    physical organization to reflect the changing needs of
    the organization, or to alter the physical organization
    to improve performance.
•   Granting of authorization for data access.
•   By granting different types of authorization, the
    database administrator can regulate which parts of
    the database various users can access.
•   Routine maintenance. Examples of the database
    administrator’s routine maintenance activities are:
•   Periodically backing up the database,
•   Ensuring that enough free disk space is available
•   Monitoring jobs running on the database and
    ensuring that performance is not degraded
            Transaction Management
•   A transaction is a collection of operations that
    performs a single logical function in a database
    application.
•   Each transaction is a unit of both atomicity and
    consistency
•   ACID
•   Concurrency control
     Structure of Relational Databases
•   Collection of tables - relation
•   Columns- attributes
•   Rows- record- tuple
•   Domain – possible values
•   Database Schema
•   Keys-
•   Underline the attributes forming primary key
         Relational Query Languages
           The Relational Algebra
•   relational algebra is a procedural query language.
•   It consists of a set of operations that take one or two
    relations as input
•   and produce a new relation as their result.
•   The fundamental operations in the relational algebra
    are select, project, union, set difference, Cartesian
    product, and rename.
•   Fundamental Operations
•   select, project, and rename operations are called
    unary operations,
•   because they operate on one relation.
•   The other three operations operate on pairs of
    relations and are called binary operations.
•   union, set difference, Cartesian product
                 The Select Operation
•   select operation selects tuples that satisfy a given
    predicate.
•   use the lowercase Greek letter sigma (σ) to denote
    selection.
•   The predicate appears as a subscript to σ.
•   The argument relation is in parentheses after the σ.
•   Thus, to select those tuples of the loan relation
    where the branch is “Perryridge,” write
•   σbranch-name =“Perryridge” (loan)
•   We can find all tuples in which the amount is more
    than $1200 by writing
•   σamount>1200 (loan)
•   allow comparisons using =, =, <, ≤, >, ≥ in the
    selection predicate.
•   we can combine several predicates into a larger
    predicate by using the connectives and (∧), or (∨),
    and not (¬).
•   to find those tuples pertaining to loans of more than
    $1200 made by the Perryridge branch, we write
•   σbranch-name =“Perryridge”∧ amount>1200 (loan)
              The Project Operation
•   Suppose we want to list all loan numbers and the
    amount of the loans,
•   but do not care about the branch name.
•   The project operation allows us to produce this
    relation.
•   The project operation is a unary operation that
    returns its argument relation, with certain attributes
    left out.
•   any duplicate rows are eliminated.
•   Projection is denoted by the uppercase Greek letter
    pi (Π).
•   We list those attributes that we wish to appear in the
    result as a subscript to Π.
•   The argument relation follows in parentheses.
•   Thus, we write the query to list all loan numbers and
    the amount of the loan as
•   Πloan-number, amount(loan)
    Composition of Relational Operations
•   the result of a relational operation is itself a relation
•   Consider query “Find those customers who live in
    Harrison.”
•   We write:
•   Πcustomer-name (σcustomer-city =“Harrison” (customer))
•   instead of giving the name of a relation as the
    argument of the projection operation,
•   we give an expression that evaluates to a relation.
               The Union Operation
•   a query to find the names of all bank customers who
    have either an account or a loan or both.
•   To answer this query, we need the information in the
    depositor relation and in the borrower relation
•   We know how to find the names of all customers with
    a loan in the bank:
•   Πcustomer-name (borrower )
•   We also know how to find the names of all customers
    with an account in the bank:
•   Πcustomer-name (depositor)
•   To answer the query,
•   we need the union of these two sets;
•   we need all customer names that appear in either or
    both of the two relations.
•   We find these data by the binary operation union,
    denoted, as in set theory, by ∪.
•   So the expression needed is
•   Πcustomer-name (borrower ) ∪ Πcustomer-name (depositor)
         The Set Difference Operation
•   The set-difference operation, denoted by −, allows
    us to find tuples that are in one relation but are not in
    another.
•   The expression r − s produces a relation containing
    those tuples in r but not in s.
•   We can find all customers of the bank who have an
    account but not a loan by writing
•   Πcustomer-name (depositor) − Πcustomer-name (borrower )
•   we must ensure that set differences are taken
    between compatible relations.
•   Therefore, for a set difference operation r − s to be
    valid,
•   we require that the relations r and s be of the same
    type
•   and the domains of the ith attribute of r and the ith
    attribute of s be the same
      The Cartesian-Product Operation
•   The Cartesian-product operation, denoted by a
    cross (×),
•   allows us to combine information from any two
    relations.
•   We write the Cartesian product of relations r1 and r2
    as r1 × r2.
•   for r = borrower × loan is
•   (borrower.customer-name, borrower.loan-number,
    loan.loan-number, loan.branch-name, loan.amount)
•   σbranch-name =“Perryridge”(borrower × loan)
           The Natural-Join Operation
•   It is often desirable to simplify certain queries that
    require a Cartesian product.
•   Usually, a query that involves a Cartesian product
    includes a selection operation on the result of the
    Cartesian product.
• Πcustomer-name, loan.loan-number, amount
(σborrower .loan-number =loan.loan-number (borrower × loan))
            The Tuple Relational Calculus
•   In relational-algebra expression-provide a sequence
    of procedures that generates the answer to query.
•   tuple relational calculus, by contrast, is a
    nonprocedural query language.
•   It describes the desired information without giving a
    specific procedure for obtaining that information.
•   A query in the tuple relational calculus is expressed
    as {t | P(t)}
•   it is the set of all tuples t such that predicate P is true
    for t
                  Example Queries
•   we want to find the branch-name, loan-number, and
    amount for loans of over $1200:
•   {t | t ∈ loan ∧ t[amount] > 1200}
•   use t[A] to denote the value of tuple t on attribute A,
    and use t ∈ r to denote that tuple t is in relation r.
•   Suppose that we want only the loan-number
    attribute, rather than all attributes of the loan relation.
•   To write this query in the tuple relational calculus, we
    need to write an expression for a relation on the
    schema (loan-number).
•   We need those tuples on (loan-number) such that
    there is a tuple in loan with the amount > 1200
•   To express this request, we need the construct “there
    exists” from mathematical logic.
•   The notation ∃ t ∈ r (Q(t))
•   means “there exists a tuple t in relation r such that
    predicate Q(t) is true.”
•   Using this notation, we can write the query “Find the
    loan number for each loan of an amount greater than
    $1200” as
• {t | ∃ s ∈ loan (t[loan-number] = s[loan-number]
∧ s[amount] > 1200)}
•   In English, we read the preceding expression as
•   “The set of all tuples t such that there exists a tuple s
    in relation loan for which the values of t and s for the
    loan-number attribute are equal, and the value of s
    for the amount attribute is greater than $1200.”
•   Consider the query “Find the names of all customers
    who have a loan from the Perryridge branch.”
•   {t | ∃ s ∈ borrower (t[customer-name] =
    s[customer-name] ∧ ∃ u ∈ loan (u[loan-number] =
    s[loan-number ] ∧ u[branch-name] =
    “Perryridge”))}
• Tuple variable u ensures that the customer is a
  borrower at the Perryridge branch.
• Tuple variable s is restricted to pertain to the
  same loan number as s.
•   To find all customers who have a loan, an account, or
    both at the bank
•   {t | ∃ s ∈ borrower (t[customer-name] = s[customer-
    name]) ∨ ∃ u ∈ depositor (t[customer-name] =
    u[customer-name])}
•   If we now want only those customers who have both
    an account and a loan at the Bank
•   {t | ∃ s ∈ borrower (t[customer-name] = s[customer-
    name]) ∧ ∃ u ∈ depositor (t[customer-name] =
    u[customer-name])}
                 Formal Definition
•   A tuple-relational-calculus expression is of the form
•   {t | P(t)}
•   where P is a formula.
•   Several tuple variables may appear in a formula.
       The Domain Relational Calculus
•   A second form of relational calculus, called domain
    relational calculus,
•   uses domain variables that take on values from an
    attributes domain, rather than values for an entire
    tuple.
•   The domain relational calculus is closely related to
    the tuple relational calculus.
                   Formal Definition
•   An expression in the domain relational calculus is of
    the form
•   {< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)}
•   where x1, x2, . . . , xn represent domain variables.
•   P represents a formula composed of atoms,
•   An atom in the domain relational calculus has one of
    the following forms:
•   < x1, x2, . . . , xn > ∈ r, where r is a relation on n
    attributes and x1, x2, . . . , xn are domain variables or
    domain constants.
                  Example Queries
•   Find the loan number, branch name, and amount for
    loans of over $1200:
•   {< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}
•   Find all loan numbers for loans with an amount
    greater than $1200:
•   {< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)
•   Find the names of all customers who have a loan
    from the Perryridge branch and find the loan amount:
•   {< c,a > | ∃ l (< c, l >∈ borrower ∧ ∃ b (< l, b, a > ∈
    loan ∧ b = “Perryridge”))}
•   Find the names of all customers who have a loan, an
    account, or both at the Perryridge branch:
•   {< c > | ∃ l (< c, l >∈ borrower ∧∃b, a (< l, b, a >∈ loan
    ∧ b = “Perryridge”)) ∨∃a (< c,a >∈ depositor ∧∃b, n (<
    a, b, n >∈ account ∧ b = “Perryridge”))}