DBMS Unit 1
DBMS Unit 1
B.Tech CSE 3rd year (unit one) DBMS (Dr. A.P.J. Abdul Kalam Technical University)
                                                    UNIT 1
Introduction: Overview, Database System vs File System, Database System Concept and Architecture, Data
Model Schema and Instances, Data Independence and Database Language and Interfaces, Data Definitions
Language, DML, Overall Database Structure.
Data Modeling Using the Entity Relationship Model: ER Model Concepts, Notation for ER Diagram,
Mapping Constraints, Keys, Concepts of Super Key, Candidate Key, Primary Key, Generalization,
Aggregation, Reduction of an ER Diagrams to Tables, Extended ER Model, Relationship of Higher Degree.
Data
     Data refers to pieces of informa on that are collected and can be analysed or processed.
     It can come in various forms, such as numbers, text, images, or sounds, and can be structured (like
      databases).
DBMS
A Database Management System (DBMS) is so ware that enables users to create, manage, and interact with
databases. It provides an interface for users and applica ons to access and manipulate data efficiently. Key func ons
of a DBMS include:
    1.   Data Storage: Organizes and stores data in a structured format, usually in tables.
    2.   Data Retrieval: Allows users to query the database to retrieve specific informa on.
    3.   Data Manipula on: Supports opera ons such as inser ng, upda ng, and dele ng data.
    4.   Data Security: Protects data through user authen ca on and access controls.
    5.   Data Integrity: Ensures accuracy and consistency of data through constraints and valida on rules.
    6.   Backup and Recovery: Provides mechanisms for data backup and restora on in case of failure.
Examples of popular DBMSs include MySQL, PostgreSQL, Oracle Database, and Microso SQL Server.
Structure: Data is organized in a structured format            Structure: Data is stored in files without a predefined
(tables, rows, columns) and defined by a schema.                structure; files can be text, images, etc.
Data Integrity: Enforces integrity constraints (like           Data Integrity: Lacks built-in integrity constraints; data
primary keys and foreign keys) to maintain data                consistency is managed at the applica on level.
accuracy.                                                      Data Access: Retrieval is usually done through file
Data Access: Uses a query language (like SQL) for              paths and may require custom code for complex
complex data retrieval and manipula on.                        opera ons.
Concurrency: Supports mul ple users accessing and              Concurrency: Limited support for simultaneous
modifying data simultaneously, ensuring data                   access, leading to poten al data conflicts.
consistency.                                                   Data Rela onships: Managing rela onships between
Data Rela onships: Manages rela onships between                data stored in different files can be cumbersome and
different data en es efficiently (e.g., through joins).           requires addi onal coding.
Security: Offers advanced security features, including          Security: Basic security features, o en limited to file
user roles and permissions.                                    permissions at the opera ng system level.
Backup and Recovery: Typically includes built-in tools          Backup and Recovery: Backup mechanisms are
for data backup, recovery, and transac on                      generally manual and may not provide comprehensive
management.                                                    recovery op ons
DBMS VS RDBMS (all the RDBMSs are DBMSs, but not all DBMSs are RDBMSs.)
                                  DBMS                                                         RDBMS
            -- Data stores in various formats(like                                  --Uses rela onal model (Microso SQL
              hierarchical, network etc.)                                              Server, MySql, Oracle Database)
            --May not provide SQL                                                   --Provides SQL
            --Normaliza on might not be focused.                                    --Normaliza on(to avoid redundancy)
Data Integrity
Refers to the accuracy, consistency, and reliability of data throughout its lifecycle.
Accuracy – Data should be correct and free from errors(correct details).
Consistency – Consistent across different datasets and applica ons(for instance, if a customer’s address changes,
                it should be updated in all relevant systems to ensure consistency).
Reliability – Data should be reliable and trustworthy for decision making and opera ons.
     Database Abstrac on
     It refers to the concept of hiding the complexi es of database management from users and applica ons by
     providing a simplified and higher-level view of the data. It allows users to interact with data without needing
     to understand the underlying physical storage and implementa on details. Database abstrac on is crucial
     for efficient data management, ease of use, and scalability.
      Example of Database Abstrac on - Users interact with a table called “Customers” rather than dealing
      with the underlying physical files or storage structures.
Client-Server Architecture is a fundamental design model used in networked applica ons, including
database management systems (DBMS). It organizes the interac ons between clients (users or applica ons)
and servers (which provide resources or services).
1.Two – er architecture
2.Three – er architecture
Example: Web-Based E-Commerce Pla orm
           Descrip on: A web-based e-commerce pla orm like Amazon or eBay typically uses a three-
                er architecture:
           Presenta on Tier: The web browser or mobile app that the user interacts with.
           Applica on Tier (Logic Tier): The web server or applica on server where business logic is
              implemented. This layer processes user requests, performs opera ons, and interacts with
              the database.
           Database Tier: The database server that stores and manages the data (such as product
              informa on, user accounts, and order details).
           Use Case: Users browse products, place orders, and manage their accounts through the web
              or mobile interface. The applica on er handles business logic like calcula ng prices,
              managing inventory, and processing transac ons, while the database er handles data
              storage and retrieval.
           Advantages: Enhanced scalability, maintainability, and separa on of concerns. Different ers
              can be scaled independently, and changes to one er can be made without affec ng the
              others.
           Disadvantages: More complex to design and manage; requires careful coordina on between
                ers.
Schema refers to the overall structure or design of a database. It defines how data is organized and how
rela onships between data elements are managed. Essen ally, a schema is a blueprint for the database,
specifying the tables, columns, data types, constraints, and rela onships.
Real-Life Example: E-Commerce Database Schema
Consider an e-commerce website. The schema for the database might include the following:
            1. Tables:
                    o Customers Table: Stores informa on about users who register on the website.
                             Columns: CustomerID (Primary Key), Name, Email, Address, PhoneNumber.
                    o Products Table: Contains details about items available for sale.
                             Columns: ProductID (Primary Key), ProductName, Descrip on, Price,
                                StockQuan ty.
                    o Orders Table: Records each purchase made by customers.
                             Columns: OrderID (Primary Key), CustomerID (Foreign Key), OrderDate,
                                TotalAmount.
                    o OrderItems Table: Details the individual items in each order.
     Instance refers to the actual data stored in the database at a specific point in me. It represents the
     contents of the database as defined by the schema. While the schema is the sta c structure, the instance is
     dynamic and changes as data is added, updated, or deleted.
     Real-Life Example: E-Commerce Database Instance
     Using the schema from the e-commerce example, an instance would include specific data records:
                 1. Customers Table Instance:
                         o (1, "John Doe", "john.doe@example.com", "123 Elm Street", "555-1234")
                         o (2, "Jane Smith", "jane.smith@example.com", "456 Oak Avenue", "555-5678")
                 2. Products Table Instance:
                         o (1, "Laptop", "High-performance laptop", 999.99, 50)
                         o (2, "Smartphone", "Latest model smartphone", 499.99, 100)
                 3. Orders Table Instance:
                         o (1, 1, "2024-09-10", 1499.98)
                         o (2, 2, "2024-09-11", 499.99)
                 4. OrderItems Table Instance:
                         o (1, 1, 1, 1, 999.99) (OrderID 1, ProductID 1, Quan ty 1, ItemPrice $999.99)
                         o (2, 1, 2, 1, 499.99) (OrderID 1, ProductID 2, Quan ty 1, ItemPrice $499.99)
                         o (3, 2, 2, 1, 499.99) (OrderID 2, ProductID 2, Quan ty 1, ItemPrice $499.99)
Database Models
   It refers the ways data is structured, organized and managed within a database system.
   Four models of DBMS Systems are—
   1> Hierarchical model
   2> Network model
   3> Rela onal model
   4> Object-oriented model
Hierarchical model -- It is one of the oldest models in the data model which was developed by IBM, in the 1950s.
In a hierarchical model, data are viewed as a collec on of tables, or we can say segments that form a hierarchical
rela on. In this, the data is organized into a tree-like structure where each record consists of one parent record
and many children.
Network model -- This model is the generaliza on of the hierarchical model. This model can consist of mul ple
parent segments and these segments are grouped as levels but there exists a logical associa on between the
segments belonging to any level. Mostly, there exists a many-to-many logical associa on between any of the two
segments.
Rela onal model – This model organizes data into tables (also called rela ons), which are made up of rows and
columns. Each table represents a specific en ty, like "Customers" or "Orders," and each row in a table represents
a single record. Each column contains a specific a ribute of the en ty, like "Name" or "Address."
The key feature of the rela onal model is the use of primary keys (unique iden fiers for records in a table) and
foreign keys (a ributes that create rela onships between tables). This structure allows for efficient querying and
ensures data integrity by maintaining rela onships between related tables.
Object – Oriented model – It is a type of database model that stores data as objects, similar to the way data is
represented in object-oriented programming (OOP). In this model, objects contain both data (a ributes or
proper es) and methods (func ons or behaviours) that can operate on the data.
Unlike rela onal databases, which store data in tables, an object-oriented database stores complex data as
objects, allowing for more natural representa on of real-world en es. It supports features like inheritance,
polymorphism, and encapsula on, which makes it easier to model complex rela onships and behaviours between
en es.
En ty – Object in the real world that is dis nguishable from another object.
         Example – Teacher, Student, BankAccount
En ty – Set – An en ty set is a collec on of similar en es that share common characteris cs or proper es.
Ex- In an employee database, the Employee en ty set consists of all employees in the organiza on.
            Example:
            For a Customer en ty set, possible a ributes could include:
            CustomerID (Unique iden fier for a customer)
            Name (Name of the customer)
Types of A ributes:
                1. Simple A ribute: A single-valued a ribute (e.g., CustomerName).
                2. Composite A ribute: An a ribute that can be divided into smaller sub-parts (e.g., FullName
                   can be divided into FirstName and LastName).
                3. Derived A ribute: An a ribute that can be derived from other a ributes (e.g., Age derived
                   from DateOfBirth).
                4. Mul -valued A ribute: An a ribute that can have mul ple values (e.g., PhoneNumbers could
                   have mul ple phone numbers).
Domain
The domain of an a ribute defines the set of allowable values that the a ribute can take. In simple terms, it's the
data type or range of valid values for a given a ribute.
             Example:
             For an a ribute Age, the domain could be integers between 0 and 120.
             For an a ribute Gender, the domain could be limited to values like Male, Female, or Other.
Rela onship
A rela onship represents how two or more en ty sets are associated with each other.
Unary (Degree 1)
In a rela on only one en ty set is par cipa ng then such type of rela onship is known as a unary rela onship
Binary (Degree 2)
In a rela on when two en ty sets are par cipa ng then such type of rela onship is known as a binary
rela onship.
Ternary (Degree 3)
we can say that a Ternary rela onship exists when there are three types of en ty and we call them a degree of
rela onship is 3.
N-ary (n Degree)
In the N-ary rela onship, there are n types of en ty that associates. So, we can say that an N-ary rela onship
exists when there are n types of en es.
Mapping Cardinality
Cardinality refers to the number of instances of one en ty that can be associated with the number of instances of
another en ty in a rela onship. The mapping cardinality describes how en es are related to each other based
on the rela onship type, and it plays a crucial role in designing the schema and establishing rela onships
between tables.
One to One:
One to one cardinality is represented by a 1:1 symbol. In this, there is at most one rela onship from one en ty to
another en ty.
Many to One:
In many to one cardinality mapping, from set 1, there can be mul ple sets that can make rela onships with a
single en ty of set 2. Or we can also describe it as from set 2, and one en ty can make a rela onship with more
than one en ty of set 1.
One to one Cardinality is the subset of Many to one Cardinality. It can be represented by M:1.
One to Many:
In One-to-many cardinality mapping, from set 1, there can be a maximum single set that can make rela onships
with a single or more than one en ty of set 2. Or we can also describe it as from set 2, more than one en ty can
make a rela onship with only one en ty of set 1.
One to one cardinality is the subset of One-to-many Cardinality. It can be represented by 1: M.
Many to Many:
In many, many cardinali es mapping, there can be one or more than one en ty that can associate with one or
more than one en ty of set 2. In the same way from the end of set 2, one or more than one en ty can make a
rela on with one or more than one en ty of set 1.
It is represented by M: N or N: M.
Data Independence
The ability to modify the schema defini on of a DBMS at one level, without affec ng the schema defini on of the
next higher level is called data independence.
Data independence provides flexibility in managing and organizing data, making the system more robust,
maintainable, and adaptable to changes.
There are two types of data independence:
                 1. Logical Data Independence
                 2. Physical Data Independence
Strong En ty Set
A strong en ty set (also called a regular en ty set) is an en ty set that can be uniquely iden fied by its own
a ributes. It has a primary key, which is a set of a ributes that can uniquely iden fy each en ty in the set.
Key Characteris cs:
                   Has a primary key: Every record in a strong en ty set can be uniquely iden fied by its own
                      a ributes.
                   Independent: A strong en ty does not depend on any other en ty for its iden fica on.
                
Weak En ty Set
A weak en ty set is an en ty set that cannot be uniquely iden fied by its own a ributes alone. Instead, it relies
on the primary key of another en ty (called the owner en ty) and one or more addi onal a ributes to form a
composite key that uniquely iden fies the en ty.
Key Characteris cs:
                 No primary key: A weak en ty does not have a sufficient set of a ributes to form a unique
                     iden fier on its own.
                 Dependent on a strong en ty: A weak en ty depends on a strong en ty for iden fica on.
                     The primary key of the strong en ty is combined with the weak en ty’s a ributes to form a
                     composite primary key.
                 Has a par al key: The weak en ty has a par al key (some mes called a discriminator) that,
                     when combined with the primary key of the strong en ty, forms a full key.
                 Iden fied by its rela onship: The weak en ty can be iden fied through a rela onship with
                     the strong en ty.
ER – Model
The En ty rela onship model is a high-level data model.
ER Model and ER diagrams gives the clear, visual representa on of how data is organized and how data elements
interact with one another.
The ER data model consists of a collec on of basic objects, called en                es, and of rela onships among these
objects.
                    ER diagrams represent the E-R model in a database, making them easy to convert into
                     rela ons (tables).
                    ER diagrams provide the purpose of real-world modeling of objects which makes them
                     intently useful.
                    ER diagrams require no technical knowledge and no hardware support.
                    These diagrams are very easy to understand and easy to create even for a naive user.
                    It gives a standard solu on for visualizing the data logically.
                1. Generalisa on
                2. Specialisa on
                3. Aggrega on
All the components of an ER diagram are included in an EER diagram and there are some addi ons men oned in
below points.
Addi ons are Specializa on and Generaliza on, Aggrega on, Subclass and Superclass, Higher and lower-level
en ty sets etc.
Generaliza on
It is the process of extrac ng the common a ributes or proper es of en es to sum them up and form a
superclass.
In generaliza on we combine lower-level en es to form a higher-level en ty. Thus, it’s clear that it follows a
bo om-up approach.
Example – In a bank there are two different types of accounts – Current and Savings, combine to form a super
en ty Account.
Specializa on
It is a procedure that defines a set of en es that are divided into subgroups based on their characteris cs.
While generaliza on may follow a bo om-up approach. Specializa on is opposite to that; it follows a top-down
approach rather.
Example – Employee may be decomposed to further as current employee en ty and ex-employee en ty.
Aggrega on
Rela onship set par cipa ng in rela onship then aggrega on is used.
Keys
                    Keys play an important role in the rela onal database.
                    It is used to uniquely iden fy any record or row of data from the table. It is also used to
                     establish and iden fy rela onships between tables.
                    Key can also be defined as an a ribute or set of a ributes whose values can uniquely
                     iden fy a tuple in a rela on.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON table,
passport_number, license_number, SSN are keys since they are unique for each person.
1.Super key
                o   Super key is an a ribute set that can uniquely iden fy a tuple. A super key is a superset of a
                    candidate key.
                o   All possible keys of a rela on. (SuperSet of all keys)
For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two employees
can be the same, but their EMPLYEE_ID can't be the same. Hence, this combina on can also be a key.
The super key would be EMPLOYEE-ID (EMPLOYEE_ID, EMPLOYEE-NAME), etc.
2.Primary key
                o   It is the first key used to iden fy one and only one instance of an en ty uniquely. An en ty
                    can contain mul ple keys, as we saw in the PERSON table. The key which is most suitable
                    from those lists becomes a primary key.
                o   Primary key is simply the chosen Candidate key and also primary key never contains any
                    NULL value
                o   In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
                    EMPLOYEE table, we can even select License_Number and Passport_Number as primary keys
                    since they are also unique.
                o   For each en ty, the primary key selec on is based on requirements and developers.
3.Candidate key
                o   A candidate key is an a ribute or set of a ributes that can uniquely iden fy a tuple.
                o   Except for the primary key, the remaining a ributes are considered a candidate key. The
                    candidate keys are as strong as the primary key.
                o   For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the
                    a ributes, like SSN, Passport_Number, License_Number, etc., are considered a candidate key.
4.Foreign key
                o   Foreign keys are the column of the table used to point to the primary key of another table.
                o   Every employee works in a specific department in a company, and employee and department
                    are two different en es. So we can't store the department's informa on in the employee
                    table. That's why we link these two tables through the primary key of one table.
                o   We add the primary key of the DEPARTMENT table, Department_Id, as a new a ribute in the
                    EMPLOYEE table.
                o   In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5.Alternate key
                o   There may be one or more a ributes or a combina on of a ributes that uniquely iden fy
                    each tuple in a rela on. These a ributes or combina ons of the a ributes are called the
                    candidate keys. One key is chosen as the primary key from these candidate keys, and the
                    remaining candidate key, if it exists, is termed the alternate key.
                o   In other words, the total number of the alternate keys is the total number of candidate keys
                    minus the primary key.
                o   The alternate key may or may not exist. If there is only one candidate key in a rela on, it
                    does not have an alternate key.
5.Composite key
                o   Whenever a primary key consists of more than one a ribute, it is known as a composite key.
                    This key is also known as Concatenated Key.
                o   For example, in employee rela ons, we assume that an employee may be assigned mul ple
                    roles, and an employee may work on mul ple projects simultaneously. So the primary key
                    will be composed of all three a ributes, namely Emp_ID, Emp_role, and Proj_ID in
                    combina on. So, these a ributes act as a composite key since the primary key comprises
                    more than one a ribute.
DDL is a subset of SQL used to define and manage the structure of a database.
                o   CREATE – Creates new database objects. (e.g. tables)
                o   ALTER – Modifies exis ng database objects.
                o   DROP – Deletes database objects.
                o   TRUNCATE – Deletes all rows from a table.
DQL is a subset of SQL used specifically for querying and retrieving data from a database.
                o   SELECT – Retrieves data from one or more tables.
DML is a subset of SQL used for managing and manipula ng data within a database.
                o   INSERT – Adds new records.
                o   UPDATE – Modifies exis ng records.
                o   DELETE – Removes records.
Database Interfaces
It refers to the mechanism or system through which a user, applica on, or so ware interacts with the database. It
acts as a mediator between the database engine (which manages the data) and the user or applica on that
requests, modifies, or queries the data.
The overall structure of a Database Management System (DBMS) refers to the architecture and components that
work together to manage databases effec vely.
o Query Processor
o Storage Manager
o Disk Storage
1. Query Processor
The query processing is handled by the query processor, as the name implies. It executes the user's query, to put
it simply. In this way, the query processor aids the database system in making data access simple and easy. The
query processor's primary duty is to successfully execute the query. The Query Processor transforms (or
interprets) the user's applica on program-provided requests into instruc ons that a computer can understand.
o DDL Interpreter:
Data Defini on Language is what DDL stands for. As implied by the name, the DDL Interpreter interprets DDL
statements like those used in schema defini ons (such as create, remove, etc.). This interpreta on yields a set of
tables that include the meta-data (data of data) that is kept in the data dic onary. Metadata may be stored in a
data dic onary. In essence, it is a part of the disc storage that will be covered in a later sec on of this ar cle.
o DML Compiler:
Compiler for DML Data Manipula on Language is what DML stands for. In keeping with its name, the DML
Compiler converts DML statements like select, update, and delete into low-level instruc ons or simply machine-
readable object code, to enable execu on. The op miza on of queries is another func on of the DML compiler.
Since a single ques on can typically be translated into a number of evalua on plans. As a result, some
op miza on is needed to select the evalua on plan with the lowest cost out of all the op ons. This process,
known as query op miza on, is exclusively carried out by the DML compiler. Simply put, query op miza on
determines the most effec ve technique to carry out a query.
Before the query evalua on, the embedded DML commands in the applica on program (such as SELECT, FROM,
etc., in SQL) must be pre-compiled into standard procedural calls (program instruc ons that the host language can
understand). Therefore, the DML statements which are embedded in an applica on program must be converted
into rou ne calls by the Embedded DML Pre-compiler.
o Query Op mizer:
It starts by taking the evalua on plan for the ques on, runs it, and then returns the result. Simply said, the query
evalua on engine evaluates the SQL commands used to access the database's contents before returning the
result of the query. In a nutshell, it is in charge of analysing the queries and running the object code that the DML
Compiler produces. Apache Drill, Presto, and other Query Evalua on Engines are a few examples.
2. Storage Manager:
An applica on called Storage Manager acts as a conduit between the queries made and the data kept in the
database. Another name for it is Database Control System. By applying the restric ons and running the DCL
instruc ons, it keeps the database's consistency and integrity. It is in charge of retrieving, storing, upda ng, and
removing data from the database.
o Integrity Manager:
Whenever there is any change in the database, the Integrity manager will manage the integrity constraints.
o Authoriza on Manager:
Authoriza on manager verifies the user that he is valid and authen cated for the specific query or request.
o File Manager:
All the files and data structure of the database are managed by this component.
o Transac on Manager:
It is responsible for making the database consistent before and a er the transac ons. Concurrent processes are
generally controlled by this component.
o Buffer Manager:
The transfer of data between primary and main memory and managing the cache memory is done by the buffer
manager.
3. Disk Storage
A DBMS can use various kinds of Data Structures as a part of physical system implementa on in the form of disk
storage.
It contains the metadata (data of data), which means each object of the database has some informa on about its
structure. So, it creates a repository which contains the details about the structure of the database object.
o Data Files:
o Indices:
These indices are used to access and retrieve the data in a very fast and efficient way.