Database Management System
(DBMS)
            Chapter 1: Introduction
                    Dr. Pramod Kumar Singh
                  Professor, Department of CSE
  ABV-Indian Institute of Information Technology and Management
             Morena Link Road, Gwalior, MP - 474015
Database Management System (DBMS)
 ✓ A DBMS is a collection of interrelated data and a set of programs to access that data.
 ✓ The collection of data, known as database, contains information about a particular enterprise.
 ✓ The primary goal of a DBMS is to provide an environment to store and retrieve database
   information that is both convenient and efficient to use.
    ❖ Database systems are designed to manage large bodies of information.
    ❖ Management of data involves both defining structures for storage of information and providing
       mechanisms for the manipulation of information.
    ❖ Additionally, the database system must ensure the safety of the information stored, despite
       system crashes or attempts at unauthorized access.
    ❖ If data are to be stored among several users, the system must avoid possible anomalous results.
Database Management System (DBMS)
 ✓ Database Applications:
    ❖ Banking: transactions
    ❖ Airlines: reservations, schedules
    ❖ Universities: registration, grades
    ❖ Sales: customers, products, purchases
    ❖ Online retailers: order tracking, customized recommendations
    ❖ Manufacturing: production, inventory, orders, supply chain
    ❖ Human resources: employee records, salaries, tax deductions
 ✓ The above is only an indicative list. Databases form an essential part of every enterprise today and
   touch all aspects of our lives.
 ✓ The use of databases grew in all enterprises since 1960s. However, very few people use to interact
   with the database system directly. A direct user access to the databases increased sharply since the
   Internet revolution in late 1990s. Most of our interactions to the Internet consists of database access,
   e.g., online purchase, online banking, etc.
Purpose of Database System
 ✓ If we do not use a database system, we will use typical file processing system, which is supported by
   the conventional operating systems, to manage the data.
 ✓ This system
    ❖ Stores permanent records in various files and
    ❖ Needs various application programs to extract records from, and add records to, the
        appropriate files.
 ✓ In this system …
     ❖ Typically system programmers write application programs to meet needs of the organization.
     ❖ New application programs are required to be written to handle new rules in the organization.
 ✓ Thus, as the time goes by, the system requires more files and more application programs.
Drawbacks of Using File Systems to Store data
 ✓ Data redundancy and inconsistency
    ❖ As different programmers create the files and application programs over a long period of time,
       the various files may have different structures and programs may be written in several
       programming languages.
    ❖ The same information may be duplicated at several places (files) – data redundancy. It leads to
       higher storage and access cost.
    ❖ The duplication (redundancy) may lead to data inconsistency as various copies of the same
       data may not agree.
 ✓ Difficulty in accessing data
    ❖ Need to write a new program to carry out each new task
    ❖ The conventional file processing environment do not allow needed data to be retrieved in a
        convenient and efficient manner. More responsive data-retrieval systems are required for
        general use.
Drawbacks of Using File Systems to Store Data (Contd)
 ✓Data isolation — multiple files and formats
    ❖ As data is scattered in various files and files may be in different formats, writing new
       application programs to retrieve appropriate data is difficult.
 ✓Integrity problems
    ❖ The data stored in the database must satisfy certain types of consistency constraints.
    ❖ These constraints (e.g., account balance > 0) are “buried” in program code rather than being
       stated explicitly. Adding new constraints or change existing ones in programs is difficult.
    ❖ It is more tough when constraints involve several data items from different files.
 ✓Atomicity of updates
    ❖ Failures may leave database in an inconsistent state with partial updates carried out.
       ➢Example: Transfer of funds from one account to another should either complete or not
       happen at all.
Drawbacks of Using File Systems to Store Data (Contd)
 ✓ Concurrent access by multiple users
    ❖ Concurrent access is needed for performance.
    ❖ Uncontrolled concurrent accesses can lead to inconsistencies
       ➢ Example: Two people reading a balance (say 100) and updating it by withdrawing money
           (say 50 each) at the same time.
    ❖ To guard against this possibility, the system must maintain some form of supervision, which is
      difficult to provide as the data may be accessed by many different application programs that
      have not been considered previously.
 ✓ Security problems
    ❖ Every user of the database system should not be able to access all the data. However, since
       application programs are added to the file processing system in an ad hoc manner, it is hard to
       provide such security constraints.
                      Database systems offer solutions to all the above problems.
View of Data
 ✓ Along with ease and efficiency a major purpose of the database system is to provide users with an
   abstract view of data. That is the system hides certain details of how the data are stored and
   maintained.
 Levels of Abstraction
 ✓ For the system to be useable, it must retrieve the data efficiently.
 ✓ The need of efficiency has led designers to use complex data structures to represent data in the
   database.
 ✓ Since many users are novice, developers hide the complexity from users through several levels of
   abstraction to simplify the user interaction with the system.
Level of Abstraction
 Physical level
 ✓The lowest level of abstraction describes how the data are actually stored. It describes complex low-
 level data structures in detail.
 Logical level
 ✓The next higher level of abstraction describes what data are stored in the database, and what
 relationships exist among those data. Thus, it describes the entire database in terms of a small number
 of relatively simple structures.
 ✓Though the implementation of simple structures at this level may involve complex physical-level
 structures, the user at this level does not need to know this complexity. It is known as physical data
 independence.
 View level
 ✓The highest level of abstraction hides details of data types using application programs. It provides a
 simplified interaction between the user and the system. The system may provide many views for the
 same database for security purposes.
View of Data (Level of Abstraction)
        The three levels of data abstraction
                                               Users
                                               Database Administrator and Programmers
                                               Database Designer
Instances and Schemas
 ✓ Schema – the logical structure of the database
        ➢ Example: The database consists of information about a set of customers and accounts and
          the relationship between them
    ❖ Analogous to the variable declarations along with associated type definitions in a program
        ➢ Physical schema: database design at the physical level
        ➢ Logical schema: database design at the logical level
        ➢ Subschemas: several schemas at the view level that describe different views of the database.
 ✓ Instance – the actual content of the database at a particular point in time
     ❖ Analogous to the value of a variable
 ✓ Physical Data Independence – the ability to modify the physical schema without changing the logical
   schema
     ❖ Applications depend on the logical schema
     ❖ In general, the interfaces between the various levels and components should be well defined so
       that changes in some parts do not seriously influence others.
Data Models
 ✓ Underlying the structure of a database is the data model. It is a collection of conceptual tools for
   describing data, data relationships, data semantics, and consistency constraints. It provides a way to
   describe the design of a database at the physical, logical, and the view levels.
 ✓ There are different data models:
    ❖ Relational model
    ❖ Entity-Relationship data model (mainly for database design)
    ❖ Object-based data models (Object-oriented and Object-relational)
    ❖ Semi-structured data model (XML)
    ❖ Other older models: (these models were tied closely to the underlying implementation and
      complicated the task of modeling data. As a result, they are extinct, now.)
        ➢ Network model
        ➢ Hierarchical model
Database Architecture
 The architecture of a database system is greatly influenced by the underlying computer system on
 which the database is running:
     ✓ Centralized
     ✓ Client-server (one server machine executes work on behalf of multiple client machines.)
     ✓ Parallel (exploits parallel computer architecture)
     ✓ Distributed (span multiple geographically separated machines)
Client-Server Database Architecture
 The database applications are usually partitioned into two or three parts.
  Application resides at client machine which             The client machine acts as merely a front end
  invokes database functionality at server machine        and does not contain any direct database
  through query language. Application program             calls. The client usually communicate with
  interface standards like ODBC and JDBC are              application server through a form interface
  used.                                                   and application server communicates with
                                                          data server. Good for large applications and
                                                          www.
Database Systems Internals
Database Systems Internals (Contd)
 Functional components of a database are broadly divided into:
 ✓ Storage manager
   ❖ It is very important as it requires a movement of data to and from disks because databases are
      too large to fit in primary memory.
   ❖ The database system should structure the data in such a way that the need of data movement
      between primary and secondary memory should be minimized.
 ✓ Query processor
   ❖ It helps to simplify and facilitate access to data.
   ❖ It allows users to obtain good performance while working at view level without burdened with
      understanding of physical level details of the implementation of the system.
   ❖ It translates updates and queries written in a nonprocedural language at the logical level into an
      efficient sequence of operations at the physical level.
Storage Management
 ✓ Storage manager is a program module that provides the interface between the low-level data stored
   in the database and the application programs and queries submitted to the system.
 ✓ Its components are
      ❖ Authorization and integrity manager: checks integrity constraints and authority of users
      ❖ Transaction manager: ensures that database remains consistent even after system failure
      ❖ File manager: allocation of space on disk storage and data structures used to represent the
         information on the disk
      ❖ Buffer manager: responsible for fetching data and deciding which data to cache in main
         memory.
Storage Management (Conted)
 ✓ The storage manager is responsible for the following tasks:
    ❖ Interaction with the file manager
    ❖ Efficient storing, retrieving and updating of data
 ✓ It implements several data structures as part of the physical system implementation.
    ❖ Data files: stores the database itself
    ❖ Data dictionary: stores metadata about the structure of the database
    ❖ Indices: provides fast access to the data items.
Query Processing
 1. Parsing and translation
 2. Optimization
 3. Evaluation
Query Processing (Contd)
 ✓ Alternative ways of evaluating a given query
    ❖ Equivalent expressions
    ❖ Different algorithms for each operation
 ✓ Cost difference between a good and a bad way of evaluating a query can be enormous
 ✓ Need to estimate the cost of operations
    ❖ Depends critically on statistical information about relations which the database must maintain
    ❖ Need to estimate statistics for intermediate results to compute cost of complex expressions
Database Users and Administrators
 ✓ People who work with database can be categorized as
    ❖ database users
    ❖ database administrators
 ✓ Database users may be further subcategorized as
    ❖ Naïve users
    ❖ Applications programmers
    ❖ Sophisticated Users
Database Users and Administrators (Contd)
 ✓ Naïve Users: unsophisticated users who interact with the database system by invoking application
   programs which are written previously.
 ✓ Application Programmers: computer programmers who write application programs.
 ✓ Sophisticated Users: Neither use previously written programs nor write programs. Rather they
   interact by database query language or by using tools. The data analysts fall in this category.
Database Users and Administrators (Contd)
                                         Database Administrator
 Has central control over both the data and the programs that access those data. The functions of a
 database administrator (DBA) are
  ❖ Schema definition: creates original database schema by executing data definition statements in the
     DDL.
  ❖ Storage structure and access-method definition:
  ❖ Schema and physical-organization modification: change in schema and physical organization to
     reflect changing needs of the organization.
  ❖ Granting of authorization for data access: regulate which parts of the database various users can
     access.
  ❖ Routine maintenance: periodically back up of database, ensuring enough free disk space, and
     monitoring to ensure that performance does not degrade.
Database Languages
 A database provides
    ✓ Data Manipulation Language (DML):
       express database queries and updates.
    ✓ Data definition language (DDL):
       specify database schema
 In practice, DML and DDL are not two separate languages; instead, they simply form parts of a single
 database language, such as widely used SQL language.
Data Manipulation Language (DML)
 ✓ Language for accessing and manipulating the data (retrieval, insertion, deletion and modification)
   organized by the appropriate data model. It is also known as query language.
 ✓ Two classes of languages
    ❖ Procedural – user specifies what data is required and how to get those data
    ❖ Declarative (nonprocedural) – user specifies what data is required without specifying how to get
      those data
 ✓ Declarative DMLs are easier to learn and use than the procedural DMLs. However, the database
   system has to figure out an efficient means of accessing data.
 ✓ Query language and data manipulation language are used synonymously. SQL is the most widely
   used query language.
Data Definition Language (DDL)
 ✓ Specify the database schema and additional properties of the data.
 ✓ DDL interpreter generates a set of table templates stored in a data dictionary.
 ✓ Data dictionary contains metadata (i.e., data about data)
    ❖ Database schema and
    ❖ Consistency (integrity) constraints
         ➢ Domain constraints: most elementary form of integrity constraint, the values that an
            attribute can take.
         ➢ Referential integrity: a set of attributes in one relation must appear in some other relation.
         ➢ Assertions: a condition that the database must always satisfy.
         ➢ Authorization: differentiate among users for the type of access they are permitted.
            ▪ Read: read but no modification
            ▪ Insert: insertion but no modification
            ▪ Update: modification but no deletion
            ▪ Deletion: allows deletion
History of Database Systems
 ✓ 1950s and early 1960s:
    ❖ Data processing using magnetic tapes for storage
        ➢ Tapes provided only sequential access
    ❖ Punched cards for input
 ✓ Late 1960s and 1970s:
    ❖ Hard disks allowed direct access to data
    ❖ Network and hierarchical data models in widespread use
    ❖ Ted Codd defines the relational data model
        ➢ Would win the ACM Turing Award for this work
        ➢ IBM Research begins System R prototype
        ➢ UC Berkeley begins Ingres prototype
    ❖ High-performance (for the era) transaction processing
History of Database Systems (Contd)
 ✓ 1980s:
    ❖ Research relational prototypes evolve into commercial systems
        ➢ SQL becomes industrial standard
    ❖ Parallel and distributed database systems
    ❖ Object-oriented database systems
 ✓ 1990s:
    ❖ Large decision support and data-mining applications
    ❖ Large multi-terabyte data warehouses
    ❖ Emergence of Web commerce
 ✓ Early 2000s:
    ❖ XML and XQuery standards
    ❖ Automated database administration
 ✓ Later 2000s:
    ❖ Giant data storage systems
        ➢ Google BigTable, Yahoo PNuts, Amazon, ..