Unit 1: Introduction to DBMS
By :
        Mrs. Suman Madan
    suman.madan@jimsindia.org
     Database Management System (DBMS)
   Collection of interrelated data
   Set of programs to access the data
   DBMS contains information about a particular enterprise
   DBMS provides an environment that is both convenient and efficient to
    use.
   Database Applications:
    • Banking: all transactions
    • Airlines: reservations, schedules
    • Universities: registration, grades
    • Sales: customers, products, purchases
    • Manufacturing: production, inventory, orders, supply chain
    • Human resources: employee records, salaries, tax deductions
   Databases touch all aspects of our lives
    Purpose of Database System
In the early days, database applications were
  built on top of file systems.
Drawbacks of using file systems to store data:
   Data redundancy and inconsistency - duplication of
    information in different files.
     • Uncontrolled duplication of data is undesirable for
       following reasons:
            Duplication costs time and money to enter data more than
             once.
            It takes additional storage space thus again increasing
             associated costs. It can be avoided by sharing data files.
            It may lead to data inconsistency.
   Difficulty in accessing data
      Need to write a new program to carry out each new
       task.
Drawbacks of file systems (cont.)
•   Data isolation — multiple files and formats.
    • When data is isolated in separate files, it is more difficult to
      access data and to ensure that data is correct.
    • Also, the structure of file depends on application
      programming language. Thus the direct incompatibility of
      such files makes it difficult to process jointly.
•   Integrity problems
     Integrity constraints (e.g. account balance > 0) become part
      of program code
     Hard to add new constraints or change existing ones.
Drawbacks of file systems (cont.)
•   Atomicity of updates
     Failures may leave database in an inconsistent state with
      partial updates carried out.
     E.g. transfer of funds from one account to another should
      either complete or not happen at all.
•   Concurrent access by multiple users
     Concurrent accessed needed for performance.
     Uncontrolled concurrent accesses can lead to inconsistencies
         E.g. two people reading a balance and updating it at the same
          time.
•   Security problems
Database systems offer solutions to all the above
problems
            Levels of Abstraction
   Physical level/Internal level : It describes how a
    record (e.g., customer) is stored.
   Logical level/Conceptual level: It describes data
    stored in database, and the relationships among
    the data.
                type customer = record
                              name : string;
                              street : string;
                              city : integer;
                           end;
   View level/ External Level: Application programs
    hide details of data types. Views can also hide
    information (e.g., salary) for security purposes.
                   Views of Data
External
view: User &
Data
Designer
Conceptual
Schema:
Data
Designer
Physical
Storage:
DBA
               An architecture for a database system
                   Views of Data
   Physical level/Internal level : The physical
    representation of the database on the computer.
    This level describes how the data is stored in the
    database.
       • It includes :
           Where the data is located
           File structures
           Access methods
           Indexes.
      The physical schema is managed by the DBA.
                     Views of Data
   Logical level/Conceptual level: The community view of
    the database. This level describes what data is stored
    in the database and the relationships among the data.
        • What are the entities and Relationships in
          organization.
        • What information these entities and relationships
          should store in database.
        • What integrity constraints/business rules it should
          have?
        • It consists of the schemas we have described with
          CREATE TABLE statements.
                    Views of Data
   View level/ External Level: The users view of the
    database. This level describes that part of the
    database that is relevant to each user.
       • Each external schema is a combination of base
         tables and views, tailored to the needs of a single
         user.
       • It is managed by the data designer and the user.
           Advantages of DBMS
   Control of data redundancy
   Data consistency
   More information from the same amount of data
   Sharing of data
   Improved data integrity
   Improved security
   Enforcement of standards
   Economy of scale
   Balance of conflicting requirements
   Improved data accessibility and responsiveness
   Increased productivity
   Improved maintenance through data independence
             Disadvantages of DBMS
   Complexity – provision of the functionality we expect from
    DBMS makes it extremely complex.
   Size – complexity and breadth of functionality makes DBMS
    extremely large piece of software.
   Cost of DBMSs – it varies significantly depending on the
    environment & functionality provided.
   Additional hardware costs – to achieve required
    performance, it is necessary to procure large memory.
   Performance – DBMS is written to be more general, to cater
    for many applications rather than just one.
   Higher impact of failure – Centralization of resources
    increases the vulnerability of the system. Since all users &
    applications rely on DBMS, the failure of one component
    can bring operations to a halt.
          Components of the DBMS
1.   Hardware – DBMS and the applications require hardware to
     run.
     •  It can range from PC to mainframe or network of
        computers.
     •  It depends on the organization’s requirements and the
        DBMS used.
2.   Software – It comprises of following :
     •  DBMS software itself
     •  Application program
     •  Operating System including network software
3.   Data - Most important component from end-user’s point
     of view.
     •  It acts as a bridge between the machine components
        and the human component.
     •  The database contains both operational data and the
        metadata.
        Components of the DBMS
4.   Procedures – It refers to the instructions and rules that
     govern the design and use of the database.
     •  Users of the system require documented procedures
        on how to use/run the system.
     •  It may consists of instructions like
        •   Log on to the DBMS
        •   Use a particular DBMS facility or application
            program.
        •   Start and stop DBMS
        •   Make backup copies of the database
        •   Handle H/W and S/W failures.
        •   Change structure of table to improve performance
5.   People – i.e USERS
            Database Administrator
   Coordinates all the activities of the database system;
    the database administrator has a good understanding
    of the enterprise’s information resources and needs:
   Database administrator’s duties include:
    •   Schema definition
    •   Storage structure and access method definition
    •   Schema and physical organization modification
    •   Granting user authority to access the database
    •   Specifying integrity constraints
    •   Acting as liaison with users
    •   Monitoring performance and responding to changes in
        requirements
                         Database Users
   Users are differentiated by the way they expect to interact with the
    system.
     1.    Application programmers: They are the developers who interact
           with the database by means of DML queries. These DML queries
           are written in the application programs like C, C++, JAVA, Pascal
           etc. These queries are converted into object code to communicate
           with the database.
          For example, writing a C program to generate the report of
          employees who are working in particular department will involve a
          query to fetch the data from database. It will include a embedded SQL
          query in the C Program.
                    Database Users
2.    Sophisticated users : They are database developers, who write SQL
      queries to select/insert/delete/update data. They do not use any
      application or programs to request the database. They directly
      interact with the database by means of query language like SQL.
     These users will be scientists, engineers, analysts who thoroughly
     study SQL and DBMS to apply the concepts in their requirement. In
     short, we can say this category includes designers and developers of
     DBMS and SQL.
3.    Specialized users: These are also sophisticated users, but they write
      special database application programs. They are the developers who
      develop the complex programs to the requirement.
                Database Users
4.   Stand-alone Users - These users will have stand –alone
     database for their personal use. These kinds of database will
     have readymade database packages which will have menus and
     graphical interfaces.
5.   Naive users: These are the users who use the existing
     application to interact with the database. For example, online
     library system, ticket booking systems, ATMs etc which has
     existing application and users use them to interact with the
     database to fulfill their requests.
Overall System Structure
                  indices   Statistical data          disk storage
     Data files                     Data dictionary
                             Data Models
   An integrated collection of concepts for describing and manipulating the
    following in an organization:
    •  Data
    •  Data relationships
    •  Data semantics
    •  Data constraints
   Facilitate interaction among the designer, the applications programmer, and the
    end user
   A data model can be thought of as comprising 3 components:
    •   Structural Part – consisting of set of rules according to which databases can
        be structured.
    •   Manipulative part – defining the types of operations that are allowed on the
        data (includes insertion, retrieval or updating)
    •   Set of integrity rules – for ensuring that data is accurate.
                  Data Model Categories
   Object-based data models
    ◦   Entity-relationship model
    ◦   Object-oriented model
    ◦   Semantic model
    ◦   Functional model
   Record-based logical models
    ◦   Relational model (e.g., SQL/DS, DB2)
    ◦   Network model
    ◦   Hierarchical model (e.g., IMS)
   Physical Data Models
    ◦   Unifying model
    ◦   Frame Memory
                      Hierarchical Model
    A hierarchical data model is a data model in which the data is organized
    into a tree-like structure.
   The structure allows repeating information using parent/child relationships:
    each parent can have many children but each child only has one parent.
   All attributes of a specific record are listed under an entity type.
                         Hierarchical Model
•   ADVANTAGES:
    1. Simplicity : Data naturally have hierarchical relationship in most of the practical situations.
       Therefore, it is easier to view data arranged in manner. This makes this type of database more
       suitable for the purpose.
    2. Security :These database system can enforce varying degree of security feature unlike flat-file
       system.
    3. Database Integrity: Because of its inherent parent-child structure, database integrity is highly
       promoted in these systems.
    4. Efficiency: The hierarchical database model is a very efficient, one when the database contains a large
        number of I: N relationships (one-to-many relationships) and when the users require large number of
        transactions, using data whose relationships are fixed.
•   DISADVANTAGES:
     1. Complexity of Implementation: The actual implementation of a hierarchical database depends on the
         physical storage of data. This makes the implementation complicated.
     2. Difficulty in Management: The movement of a data segment from one location to another cause all the
         accessing programs to be modified making database management a complex affair.
     3. Complexity of Programming: Programming a hierarchical database is relatively complex because the
         programmers must know the physical path of the data items.
     4. Poor Portability: The database is not easily portable mainly because there is little or no standard existing for
         these types of database.
     5. Database Management Problems: If you make any changes in the database structure of a hierarchical
         database, then you need to make the necessary changes in all the application programs that access the
         database. Thus, maintaining the database and the applications can become very difficult.
     6. Lack of structural independence: Structural independence exists when the changes to the database structure
         does not affect the DBMS's ability to access data. Hierarchical database systems use physical storage paths to
         navigate to the different data segments. So, the application programs should have a good knowledge of the
         relevant access paths to access the data. So, if the physical structure is changed the applications will also have
         to be modified. Thus, in a hierarchical database the benefits of data independence are limited by structural
         dependence.
     7. Programs Complexity: Due to the structural dependence and the navigational structure, the application
         programs and the end users must know precisely how the data is distributed physically in the database in
         order to access data. This requires knowledge of complex pointer systems, which is often beyond the grasp of
         ordinary users (users who have little or no programming knowledge).
     8. Operational Anomalies: Hierarchical model suffers from the Insert anomalies, Update anomalies and
         Deletion anomalies, also the retrieval operation is complex and asymmetric, thus hierarchical model is not
         suitable for all the cases.
     9. Implementation Limitation: Many of the common relationships do not conform to the l:N format required by
         the hierarchical model. The many-to-many (N:N) relationships, which are more common in real life are very
         difficult to implement in a hierarchical model.
                                 Network Model
   The network model is a database model conceived as a flexible way of representing objects and their
    relationships.
   Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and
    relationship types are arcs, is not restricted to being a hierarchy or lattice.
   The network model replaces the hierarchical model with a graph thus allowing more general
    connections among the nodes. The main difference of the network model from the hierarchical model
    is its ability to handle many to many relationships. In other words it allow a record to have more than
    one parent.
                             Network Model
•   ADVANTAGES:
    1.) Conceptual simplicity-Just like the hierarchical model,the network model is also
        conceptually simple and easy to design.
    2.) Capability to handle more relationship types-The network model can handle the one to
        many and many to many relationships which is real help in modeling the real life situations.
    3.) Ease of data access-The data access is easier and flexible than the hierarchical model.
    4.) Data integrity- The network model does not allow a member to exist without an owner.
    5.) Data independence- The network model is better than the hierarchical model in isolating the
        programs from the complex physical storage details.
    6.) Database standards
•   DISADVANTAGES:
    1.) System complexity- All the records are maintained using pointers and hence the whole
        database structure becomes very complex.
    2.) Operational Anomalies- The insertion, deletion and updating operations of any record
        require large number of pointers adjustments.
    3.) Absence of structural independence-structural changes to the database is very difficult.
                Relational Model
                                                       Attributes
   Example of tabular data in the relational model
    Customer-   customer-   customer-   customer-   account-
    id          name        street      city        number
192-83-7465     Johnson      Alma       Palo Alto   A-101
019-28-3746     Smith        North      Rye         A-215
192-83-7465     Johnson      Alma       Palo Alto   A-201
321-12-3123     Jones        Main       Harrison    A-217
019-28-3746     Smith        North      Rye         A-201
A Sample Relational Database
             Schemas and Instances
   Instances and schemas are similar to types and variables in
    programming languages.
   Schema – the logical structure of the database. This means overall
    description of the database.
    •   e.g., the database consists of information about a set of customers and
        accounts and the relationship between them.
    •   Analogous to type information of a variable in a program.
    •   Physical schema: database design at the physical level.
    •   Logical schema: database design at the logical level.
    •   The schema is specified during the database design process and is not
        expected to change frequently.
                Schemas and Instances
   Instance – the actual content of the database i.e. actual data at a particular
    point in time .
    •   Analogous to the value of a variable.
    •   The actual data in the database may change very frequently.
   Thus many database instances can correspond to the same database
    schema.
   The schema is sometimes called as intension of the database.
   The instance is sometimes called as extension or state of the database.
      Three-Schema Architecture
•   Defines DBMS schemas at three levels:
    • Internal schema at the internal level to describe physical
      storage structures and access paths. Typically uses a
      physical data model.
    • Conceptual schema at the conceptual level to describe the
      structure and constraints for the whole database for a
      community of users. Uses a conceptual or an
      implementation data model.
    • External schemas at the external level to describe the
      various user views. Usually uses the same data model as the
      conceptual level.
      Three-Schema Architecture
   Mappings among schema levels are needed to
    transform requests and data. Programs refer to
    an external schema, and are mapped by the
    DBMS to the internal schema for execution.
   Mappings between internal and conceptual
    schemas is known I/C mapping.
   Mappings between external and conceptual
    schemas is known E/C mapping.
                   Data Independence
   The disjointing of data descriptions from the application programs (or user-
    interfaces) that uses the data is called data independence.
   Data independence is one of the main advantages of DBMS.
   The three-schema architecture provides the concept of data independence,
    which means that upper-levels are unaffected by changes to lower-levels.
   The interfaces between the various levels and components should be well
    defined so that changes in some parts do not seriously influence others.
   There are two kinds of data independence.
    •   Physical Data Independence
    •   Logical Data Independence
         Types of Data Independence
   Physical Data Independence – the ability to modify the physical
    schema without changing the logical schema
    •   Applications depend on the logical schema
    •   In general, the interfaces between the various levels and
        components should be well defined so that changes in some parts
        do not seriously influence others.
   Logical Data Independence – the ability to modify the conceptual
    schema without changing the application program
    •   Usually done when logical structure of database is altered.
    •   Logical data independence is harder to achieve as the application
        programs are usually heavily dependent on the logical structure of
        the data. An analogy is made to abstract data types in
        programming languages.
        Entity-Relationship Model
   An entity-relationship (ER) diagram is a specialized
    graphic that illustrates the interrelationships
    between entities in a database.
   ER diagrams often use symbols to represent three
    different types of information, namely, entities,
    attributes and relationships.
              How do we start ERD
   Define Entities: These are usually nouns used in
    descriptions of the system, in the discussion of business
    rules, or in documentation; identified in the narrative.
   Define Relationships: These are usually verbs used in
    descriptions of the system or in discussion of the
    business rules (entity ______ entity); identified in the
    narrative.
   Add attributes to the relations: These are determined by
    the queries, and may also suggest new entities, e.g.
    grade; or they may suggest the need for keys or
    identifiers.
                  ER Diagram Symbols
   Entity
       An entity is an object or concept about which you want to store
        information.
   Key attribute
       A key attribute is the unique, distinguishing characteristic of the
        entity. For example, an employee's social security number might
        be the employee's key attribute.
                  ER Diagram Symbols
   Relationships
       Relationships illustrate how two entities share information in the
        database structure.
   Weak Entity
       A weak entity is an entity that must defined by a foreign key
        relationship with another entity as it cannot be uniquely identified
        by its own attributes alone.
                 ER Diagram Symbols
   Derived attribute
       A derived attribute is based on another attribute. For example, an
        employee's monthly salary is based on the employee's annual
        salary.
   Multivalued attribute
       A multivalued attribute can have more than one value. For
        example, an employee entity can have multiple skill values.
           How to express Relationships
   Cardinality specifies how many instances of an entity relate to one instance
    of another entity.
   Ordinality is also closely linked to cardinality.
   While cardinality specifies the occurrences of a relationship, ordinality
    describes the relationship as either mandatory or optional.
   In other words, cardinality specifies the maximum number of relationships
    and ordinality specifies the absolute minimum number of relationships.
    When the minimum number is zero, the relationship is usually called
    optional and when the minimum number is one or more, the relationship is
    usually called mandatory.
Relationship Symbols
              a
      Entity-Relationship Model
Example of schema in the entity-
 relationship model
    Entity Relationship Model (Cont.)
   E-R model of real world
    ◦   Entities (objects)
         E.g. customers, accounts, bank branch
    ◦   Relationships between entities
         E.g. Account A-101 is held by customer Johnson
         Relationship set depositor associates customers with
          accounts
   Widely used for database design
    ◦   Database design in E-R model usually converted to
        design in the relational model (coming up next) which is
        used for storage and processing
Data Definition Language (DDL)
   Specification notation for defining the database
    schema
    ◦   E.g.
         create table account (
                 account-number char(10),
                 balance        integer)
   DDL compiler generates a set of tables stored in
    a data dictionary
   Data dictionary contains metadata (i.e., data
    about data)
    ◦   database schema
    ◦   Data storage and definition language
         language in which the storage structure and access
          methods used by the database system are specified
         Usually an extension of the data definition language
        Data Manipulation Language (DML)
   Language for accessing and manipulating the data
    organized by the appropriate data model
    ◦   DML also known as query language
   Two classes of languages
    ◦   Procedural – user specifies what data is required and how
        to get those data
    ◦   Nonprocedural – user specifies what data is required
        without specifying how to get those data
   SQL is the most widely used query language
Entity Sets
   A database can be modeled as:
    ◦   a collection of entities,
    ◦   relationship among entities.
   An entity is an object that exists and is
    distinguishable from other objects.
    ◦   Example: specific person, company, event, plant
   Entities have attributes
    ◦   Example: people have names and addresses
   An entity set is a set of entities of the same type
    that share the same properties.
    ◦   Example: set of all persons, companies, trees, holidays
Entity Sets ‘customer ‘ and ‘loan’
customer-id customer- customer- customer-    loan- amount
              name    street    city        number
Attributes
    An entity is represented by a set of attributes,
     that is descriptive properties possessed by all
     members of an entity set.
          Example:
               customer = (customer-id, customer-name,
                  customer-street, customer-city)
               loan = (loan-number, amount)
 Domain – the set of permitted values for each
     attribute
    Attribute types:
     ◦   Simple and composite attributes.
     ◦   Single-valued and multi-valued attributes
          E.g. multivalued attribute: phone-numbers
     ◦   Derived attributes
          Can be computed from other attributes
          E.g. age, given date of birth
Composite Attributes
    Degree of a Relationship Set
   Refers to number of entity sets that participate in
    a relationship set.
   Relationship sets that involve two entity sets are
    binary (or degree two). Generally, most
    relationship sets in a database system are binary.
   Relationship sets may involve more than two
    entity sets.
     H E.g. Suppose employees of a bank may have jobs (responsibilities) at
       multiple branches, with different jobs at different branches. Then there
       is a ternary relationship set between entity sets employee, job and
       branch
   Relationships between more than two entity sets
    are rare. Most relationships are binary.
             Mapping Cardinalities
   Express the number of entities to which another
    entity can be associated via a relationship set.
   Most useful in describing binary relationship sets.
   For a binary relationship set the mapping
    cardinality must be one of the following types:
    ◦   One to one
    ◦   One to many
    ◦   Many to one
    ◦   Many to many
           Mapping Cardinalities
          One to one                        One to many
Note: Some elements in A and B may not be mapped to any
elements in the other set
         Mapping Cardinalities
         Many to one                      Many to many
Note: Some elements in A and B may not be mapped to any
elements in the other set
Mapping Cardinalities affect ER Design
n Can make access-date an attribute of account, instead of a
  relationship attribute, if each account can have only one customer
   n I.e., the relationship from account to customer is many to one,
      or equivalently, customer to account is one to many
                      E-R Diagrams
n Rectangles represent entity sets.
n Diamonds represent relationship sets.
n Lines link attributes to entity sets and entity sets to relationship sets.
n Ellipses represent attributes
   n Double ellipses represent multivalued attributes.
   n Dashed ellipses denote derived attributes.
n Underline indicates primary key attributes (will study later)
E-R Diagram With Composite, Multivalued,
         and Derived Attributes
Relationship Sets with Attributes
             Cardinality Constraints
   We express cardinality constraints by drawing
    either a directed line (), signifying “one,” or an
    undirected line (—), signifying “many,” between
    the relationship set and the entity set.
   E.g.: One-to-one relationship:
    ◦   A customer is associated with at most one loan via the
        relationship borrower
    ◦   A loan is associated with at most one customer via
        borrower
      One-To-Many Relationship
   In the one-to-many relationship a loan is
    associated with at most one customer via
    borrower, a customer is associated with several
    (including 0) loans via borrower
     Many-To-One Relationships
   In a many-to-one relationship a loan is
    associated with several customers via borrower,
    a customer is associated with at most one loan
    via borrower
     Many-To-Many Relationship
   A customer is associated with several (possibly
    0) loans via borrower
   A loan is associated with several customers via
    borrower
       Alternative Notation for
          Cardinality Limits
n Cardinality limits can also express participation constraints
                             Keys
   A super key of an entity set is a set of one or more
    attributes whose values uniquely determine each
    entity.
   A candidate key of an entity set is a minimal super
    key
    ◦   Customer-id is candidate key of customer
    ◦   account-number is candidate key of account
   Although several candidate keys may exist, one of
    the candidate keys is selected to be the primary
    key.
                          Design Issues
   Use of entity sets vs. attributes
       Choice mainly depends on the structure of the enterprise being
        modeled, and on the semantics associated with the attribute in
        question.
   Use of entity sets vs. relationship sets
       Possible guideline is to designate a relationship set to describe
        an action that occurs between entities
   Binary versus n-ary relationship sets
       Although it is possible to replace any nonbinary (n-ary, for n >
        2) relationship set by a number of distinct binary relationship
        sets, a n-ary relationship set shows more clearly that several
        entities participate in a single relationship.
   Placement of relationship attributes
                       Weak Entity Sets
   An entity set that does not have a primary key is
    referred to as a weak entity set.
   The existence of a weak entity set depends on the
    existence of a identifying entity set
    ◦    it must relate to the identifying entity set via a total, one-
        to-many relationship set from the identifying to the weak
        entity set
    ◦   Identifying relationship depicted using a double diamond
                Weak Entity Sets
   The discriminator (or partial key) of a weak entity
    set is the set of attributes that distinguishes
    among all the entities of a weak entity set.
   The primary key of a weak entity set is formed by
    the primary key of the strong entity set on which
    the weak entity set is existence dependent, plus
    the weak entity set’s discriminator.
                 Weak Entity Sets
   We depict a weak entity set by double rectangles.
   We underline the discriminator of a weak entity set
    with a dashed line.
   Primary key for payment – (loan-number, payment-
    number)
         Weak Entity Sets (Cont..)
   Note: the primary key of the strong entity set is
    not explicitly stored with the weak entity set,
    since it is implicit in the identifying relationship.
   If loan-number were explicitly stored, payment
    could be made a strong entity, but then the
    relationship between payment and loan would
    be duplicated by an implicit relationship defined
    by the attribute loan-number common to
    payment and loan
     More Weak Entity Set Examples
   In a university, a course is a strong entity and a
    course-offering can be modeled as a weak entity
   The discriminator of course-offering would be
    semester (including year) and section-number (if
    there is more than one section)
   If we model course-offering as a strong entity we
    would model course-number as an attribute.
    Then the relationship with course would be implicit
    in the course-number attribute
                      Specialization
   Top-down design process; we designate sub groupings
    within an entity set that are distinctive from other entities
    in the set.
   These sub groupings become lower-level entity sets that
    have attributes or participate in relationships that do not
    apply to the higher-level entity set.
   Depicted by a triangle component labeled IS A (E.g.
    customer “is a” person).
   Attribute inheritance – a lower-level entity set inherits all
    the attributes and relationship participation of the higher-
    level entity set to which it is linked.
Specialization Example
                Generalization
   A bottom-up design process – combine a
    number of entity sets that share the same
    features into a higher-level entity set.
   Specialization and generalization are simple
    inversions of each other; they are represented in
    an E-R diagram in the same way.
   The terms specialization and generalization are
    used interchangeably.
                    Generalization (cont.)
 NoOfPassengers           LicensePlateNo               NoOfAxies             LicensePlateNo
MaxSpeed                                              Tonnage
                   CAR                Price                              TRUCK           Price
VehicleID                                             VehicleID
                       VehicleID              Price   LicensePlateNo
                                      VEHICLE
             NoOfPassengers                   d              NoOfAxies
            MaxSpeed
                              CAR                     TRUCK              Tonnage
           Generalization (Cont..)
   Generalization suppresses the difference among
    several entity types, identifying their common
    features, and generalize them into a single
    superclass of which the original types are special
    subclasses.
   The decision as to which process, generalization
    or specialization, is more appropriate in a
    particular situation is often subjective.
Specialization and Generalization
   Can have multiple specializations of an entity set based on
    different features.
   E.g. permanent-employee vs. temporary-employee, in
    addition to officer Vs secretary Vs teller
   Each particular employee would be
    ◦ a member of one of permanent-employee or temporary-
      employee,
    ◦ and also a member of one of officer, secretary, or teller
   The IS A relationship also referred to as superclass - subclass
    relationship
                 Design Constraints on a
               Specialization/Generalization
   Constraint on which entities can be members of a
    given lower-level entity set.
    ◦   condition-defined
         E.g. all customers over 65 years are members of senior-citizen
          entity set; senior-citizen ISA person.
    ◦   user-defined
   Constraint on whether or not entities may belong to
    more than one lower-level entity set within a single
    generalization.
    ◦   Disjoint
         an entity can belong to only one lower-level entity set
         Noted in E-R diagram by writing disjoint next to the ISA
          triangle
    ◦   Overlapping
         an entity can belong to more than one lower-level entity set
               E-R Design Decisions
   The use of an attribute or entity set to represent an
    object.
   Whether a real-world concept is best expressed by an
    entity set or a relationship set.
   The use of a ternary relationship versus a pair of binary
    relationships.
   The use of a strong or weak entity set.
   The use of specialization/generalization – contributes to
    modularity in the design.
   The use of aggregation – can treat the aggregate entity
    set as a single unit without concern for the details of its
    internal structure.
E-R Diagram for a Banking Enterprise
Summary of Symbols Used in E-R
           Notation
Reduction of an E-R Schema to Tables
    Primary keys allow entity sets and relationship sets to be
     expressed uniformly as tables which represent the contents
     of the database.
    A database which conforms to an E-R diagram can be
     represented by a collection of tables.
    For each entity set and relationship set there is a unique table
     which is assigned the name of the corresponding entity set or
     relationship set.
    Each table has a number of columns (generally corresponding
     to attributes), which have unique names.
    Converting an E-R diagram to a table format is the basis for
     deriving a relational database design from an E-R diagram.
Representing Entity Sets as Tables
    A strong entity set reduces to a table with the
     same attributes.
                Composite Attributes
   Composite attributes are flattened out by creating
    a separate attribute for each component attribute
    ◦   E.g. given entity set customer with composite attribute
        name with component attributes first-name and last-
        name the table corresponding to the entity set has two
        attributes :
                name.first-name and
              name.last-name
               Multivalued Attributes
   A multivalued attribute M of an entity E is
    represented by a separate table EM
    ◦   Table EM has attributes corresponding to the primary key
        of E and an attribute corresponding to multivalued
        attribute M
    ◦   E.g. Multivalued attribute dependent-names of employee
        is represented by a table
           employee-dependent-names( employee-id, dname)
    ◦   Each value of the multivalued attribute maps to a
        separate row of the table EM
         E.g., an employee entity with primary key Charles and
          dependents Johnson and Jumi maps to two rows:
            (John, Johnson) and (John, Jumi)
 Representing Weak Entity Sets
n A weak entity set becomes a table that includes a
  column for the primary key of the identifying strong
  entity set
    Representing Relationship Sets as
                             Tables
    A many-to-many relationship set is represented as a table with
    columns for the primary keys of the two participating entity sets, and
    any descriptive attributes of the relationship set.
   E.g.: table for relationship set borrower
           Redundancy of Tables
n Many-to-one and one-to-many relationship sets that are
  total on the many-side can be represented by adding an
  extra attribute to the many side, containing the primary
  key of the one side
n E.g.: Instead of creating a table for relationship account-
  branch, add an attribute branch to the entity set account
      Redundancy of Tables (Cont.)
   For one-to-one relationship sets, either side can be chosen to
    act as the “many” side
    ◦ That is, extra attribute can be added to either of the tables
      corresponding to the two entity sets
   If participation is partial on the many side, replacing a table
    by an extra attribute in the relation corresponding to the
    “many” side could result in null values
   The table corresponding to a relationship set linking a weak
    entity set to its identifying strong entity set is redundant.
    ◦ E.g. The payment table already contains the information that
      would appear in the loan-payment table (i.e., the columns
      loan-number and payment-number).
Representing Specialization as Tables
   Method 1:
    ◦   Form a table for the higher level entity
    ◦   Form a table for each lower level entity set, include primary key of
        higher level entity set and local attributes
          table   table attributes
        personname, street, city
        customer        name, credit-rating
        employee        name, salary
    ◦   Drawback: getting information about, e.g., employee requires
        accessing two tables
Representing Specialization as Tables
 Method 2:   (Cont.)
     ◦   Form a table for each entity set with all local and inherited attributes
         table   table attributes
         personname, street, city
         customer        name, street, city, credit-rating
         employee        name, street, city, salary
     ◦   If specialization is total, table for generalized entity (person) not
         required to store information
          Can be defined as a “view” relation containing union of
            specialization tables
          But explicit table may still be needed for foreign key constraints
     ◦   Drawback: street and city may be stored redundantly for persons
         who are both customers and employees
End of Unit 1
Thanks…….