0% found this document useful (0 votes)
380 views83 pages

DBMS in 5

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
380 views83 pages

DBMS in 5

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 83

Video chapters Syllabus

• Chapter-1 (Basics): Data & information, Database System vs File System, Views of Data Base, Data Independence, • UNIT-1 : INTRODUCTION Overview, Database System vs File System, Database System Concept and Architecture, Data
Instances & Schema, OLAP Vs OLTP, Types of Data Base, DBA, Architecture. Model Schema and Instances, Data Independence and Database Language and Interfaces, Data Definitions Language,
• Chapter-2 (ER Diagram): Entity, Attributes, Relationship, Degree of a Relationship, Mapping, Weak Entity set, DML, Overall Database Structure. Data Modeling Using the Entity Relationship Model: ER Model Concepts, Notation
Conversion from ER Diagram to Relational Model, Generalization, Specification, Aggregation. for ER Diagram, Mapping Constraints, Keys, Concepts of Super Key, Candidate Key, Primary Key, Generalization,
• Chapter-3 (RDBMS & Functional Dependency): Basics & Properties, Update Anomalies, Purpose of Normalization, Aggregation, Reduction of an ER Diagrams to Tables, Extended ER Model, Relationship of Higher Degree.
Functional Dependency, Closure Set of Attributes, Armstrong’s axioms, Equivalence of two FD, Canonical cover, Keys. • UNIT-2 : RELATIONAL DATA MODEL Relational Data Model Concepts, Integrity Constraints, Entity Integrity, Referential
• Chapter-4 (Normalization): 1NF, 2NF, 3NF, BCNF, Multivalued Dependency, 4NF, Lossy-Lossless Decomposition, 5NF, Integrity, Keys Constraints, Domain Constraints, Relational Algebra, Relational Calculus, Tuple and Domain Calculus.
Dependency Preserving Decomposition. Introduction on SQL: Characteristics of SQL, Advantage of SQL. SQl Data Type and Literals. Types of SQL Commands.
• Chapter-5 (Indexing): Overview of indexing, Primary indexing, Clustered indexing and Secondary Indexing, B-Tree. SQL Operators and Their Procedure. Tables, Views and Indexes. Queries and Sub Queries. Aggregate Functions. Insert,
• Chapter-6 (Relational Algebra) : Query Language, Select, Project, Union, Set Difference, Cross Product, Rename Update and Delete Operations, Joins, Unions, Intersection, Minus, Cursors, Triggers, Procedures in SQL/PL SQL.
Operator, Additional or Derived Operators. • UNIT-3 : DATA BASE DESIGN & NORMALIZATION Functional dependencies, normal forms, first, second, 3 third normal
• Chapter-7(SQL) : Introduction to SQL, Classification, DDL Commands, Select, Where, Set Operations, Cartesian forms, BCNF, inclusion dependence, loss less join decompositions, normalization using FD, MVD, and JDs, alternative
Product, Natural Join, Outer Join, Rename, Aggregate Functions, Ordering, String, Group, having, Trigger, embedded, approaches to database design.
dynamic SQL. • UNIT-4 : TRANSACTION PROCESSING CONCEPT Transaction System, Testing of Serializability, Serializability of
• Chapter-8 (Relational Calculus) : Overview, Tuple Relation Calculus, Domain Relation Calculus. Schedules, Conflict & View Serializable Schedule, Recoverability, Recovery from Transaction Failures, Log Based
• Chapter-9 (Transaction) : What is Transaction, ACID Properties, Transaction Sates, Schedule, Conflict Serializability, Recovery, Checkpoints, Deadlock Handling. Distributed Database: Distributed Data Storage, Concurrency Control,
View Serializability, Recoverability, Cascade lessness, Strict Schedule. Directory System.
• Chapter-10 (Recovery & Concurrency Control) : Log Based Recovery, Shadow Paging, Data Fragmentation, TIME • UNIT-5 : CONCURRENCY CONTROL TECHNIQUES Concurrency Control, Locking Techniques for Concurrency Control,
STAMP ORDERING PROTOCOL, THOMAS WRITE RULE, 2 phase locking, Basic 2pl, Conservative 2pl, Rigorous 2pl, Strict Time Stamping Protocols for Concurrency Control, Validation Based Protocol, Multiple Granularity, Multi Version
2pl, Validation based protocol Multiple Granularity. Schemes, Recovery with Concurrent Transaction, Case Study of Oracle.
Knowledge Gate Website Knowledge Gate Website

What is Data ? What is Information ?


• Data are characteristics or attributes, often numerical, collected through observation and can • Data becomes information when analyzed and placed in context, providing a basis for
be qualitative(descriptive) or quantitative(numerical). Any facts and figures about an entity is understanding, decision-making, and further analysis. Processed Data is called
called as Data. information.
• Data serves a crucial role in various sectors by facilitating analysis, supporting decision-making
processes, and being fundamental to research activities.

Knowledge Gate Website Knowledge Gate Website


What is Data Base? What is Data Base Management System?
• A database is a structured collection of data, facilitating easy access, • A DBMS is software facilitating efficient data storage, retrieval, and management in
management, and updates., generally stored and accessed electronically from a databases.
computer system. • Ensures data safety and integrity, while offering accessibility and concurrency control.
• Supports functions like data querying, reporting, and analytics for informed decision-
making.

Knowledge Gate Website Knowledge Gate Website

Aspect File System Database Management System View of Data Base (Data Abstraction)
Data Access
Slower data retrieval due to unstructured Structured querying capabilities allow for • Physical Level
querying capabilities. quicker data access.

Challenges in correlating data across Facilitates data integration, reducing data • Logical Level/ Conceptual Level
Data Isolation
separate files leading to data isolation. isolation issues.
• View Level
Risk of inadvertent data alterations or Features to prevent unauthorized data
Data Integrity
deletions creating integrity problems. alterations, maintaining integrity.

Potential for data inconsistency due to Supports transaction properties like


Atomicity Problem incomplete operations, leading to atomicity, ensuring operations are either
atomicity problems. completed fully or not at all.

Conflicts and inconsistencies from Advanced concurrency controls to manage


Concurrent Access
simultaneous data access/modifications, multiple users accessing the database
Anomalies
causing concurrent access anomalies. simultaneously, reducing anomalies.
Knowledge Gate Website Knowledge Gate Website
View of Data Base (Data Abstraction) View of Data Base
• Physical Level: The internal schema details data storage and access on hardware, • Logical Level/ Conceptual Level: Above the physical level, this level showcases
featuring the lowest level of data abstraction with complex structures, predominantly data as entity sets and their relationships, detailing the types and connections
managed by the database administrator. between stored data in the database.

Knowledge Gate Website Knowledge Gate Website

View of Data Base Data independence


• View Level: This is the pinnacle of data abstraction, displaying only a portion of
• Data independence is defined as the capacity to change the schema at one level of a
the entire database focusing on user-interest areas. It can represent multiple
database system without having to change the schema at the next higher level.
views of the same data, allowing users to access information through various • Types of data independence :
applications from the database. • Physical data independence: a. Physical data independence is the ability to modify
internal schema without changing the conceptual schema. b. Modification at the
physical level is occasionally necessary in order to improve performance. c. It refers to
the immunity of the conceptual schema to change in the internal schema. d.
Examples of physical data independence are reorganizations of files, adding a new
access path or modifying indexes, etc.
• Logical data independence: a. Logical data independence is the ability to modify the
conceptual schema without having to change the external schemas or application
programs. b. It refers to the immunity of the external model to changes in the
conceptual model. c. Examples of logical data independence are addition/removal of
entities.
Knowledge Gate Website Knowledge Gate Website
Instance and Schemas Aspect
OLAP (Online Analytical OLTP (Online Transaction
Processing) Processing)
• Instance of the Database:
• The collection of information stored in the database at a specific moment is known as an Designed for complex data analysis Handles daily transactional data
instance of the database. It is a snapshot of the database that contains live data at that Primary Function
and reporting. processing.
moment, showing the current state of all records and transactions.
Star or snowflake schema, Normalized schema, optimizing for
Database Design
• Database Schema: optimizing for read operations. write operations.
• The database schema refers to the overall design of the database, illustrating the logical
structure and organization of data. It defines how data is organized and how relationships Complex queries involving Simple and standard queries
between data are handled, essentially serving as the blueprint for how the database is Query Complexity aggregations and computations focusing on CRUD operations
constructed. across multiple dimensions. (Create, Read, Update, Delete).

Deals with large volumes of data Processes a high number of small


Data Volume
for historical analysis. transactions.

Slower response time due to Fast response time to support high


Response Time
Knowledge Gate Website complex queries.
Knowledge Gate Websitetransaction rates.

Types of Data Base • Temporal Database: Keeps track of changing data over time, allowing for
queries concerning time-based data. A historical trading database in the
• Commercial Database: Predominantly used in the business sector to handle
financial sector which keeps track of stock prices over time.
large volumes of transactions and customer data. A CRM system like Salesforce
which handles large volumes of customer data and transactions.
• Geological Information System (GIS): Stores, organizes, and analyzes
• Multimedia Database: Stores data types such as images, audio, and video files,
geographical data, aiding in spatial analysis and mapping projects. A system like
facilitating the management and retrieval of multimedia content. A digital asset
ArcGIS which enables the storage, analysis, and visualization of geographical
management system like Adobe Experience Manager that facilitates the storage
data.
and retrieval of multimedia content.

• Deductive Database: Utilizes logic programming to derive information from


data stored in a database, allowing for more complex and analytical queries. A
database using Datalog (a query language) which allows for complex logical
queries and information derivation.
Knowledge Gate Website Knowledge Gate Website
DBA(Database Administrator) DBMS Architecture
Database administrators hold authority over data and the programs facilitating data access. Their
roles/functions are:
• Schema Definition: DBA outlines the original database schema.
Achieved through writing definitions translated to permanent
labels in the data dictionary by the DDL compiler.
• Storage Structure and Access Method Definition: Responsible
for forming appropriate storage structures and access methods.
Achieved through writing definitions translated by the data
storage and definition language compiler.
• Schema and Physical Organization Modification: Involves altering
the database schema or physical storage organization. Changes are
implemented by writing definitions that modify the relevant
internal system tables.
• Granting of Authorization for Data Access: DBA grants varied types of data access authorization to
different database users.
• Integrity Constraint Specification: DBA implements and maintains integrity constraints to ensure data
accuracy and consistency.
Knowledge Gate Website Knowledge Gate Website

1.Query Processor: This is the component of a DBMS that interprets and executes user queries. It comprises several A Database Management System (DBMS) consists of three primary components:
sub-components including:
1. DML Compiler: Processes Data Manipulation Language (DML) statements into low-level instructions that can
be executed.
1. Internal Level: Concerns the physical storage of data in databases, overseeing data storage
2. DDL Interpreter: Processes Data Definition Language (DDL) statements into metadata tables. on hardware devices, and managing low-level aspects like data compression and indexing.
3. Embedded DML Pre-compiler: Translates DML statements embedded in application programs into
procedural calls. 2. Conceptual Level: Represents the logical layout of the database, detailing the schema with
4. Query Optimizer: Determines the most efficient way to execute a query by evaluating different query plans. tables and attributes and their interrelations. It's independent of specific DBMS
2.Storage Manager: Also known as the Database Control System, it is responsible for managing the data stored in
the database, ensuring its consistency and integrity. It includes the following sub-components:
implementations, focusing on organizing and connecting data elements.
1. Authorization Manager: Manages access controls and privileges.
2. Integrity Manager: Ensures that data modifications adhere to integrity constraints. 3. External Level: Embodies the user interface of the database, facilitating data access and
3. Transaction Manager: Manages concurrent access to the database and maintains database consistency interaction through user-friendly views and interfaces tailored to various user groups.
during transactions.
4. File Manager: Manages file space and data structures representing information in the database.
5. Buffer Manager: Manages data cache and data transfer between main memory and secondary storage.
3.Disk Storage: Represents the storage aspect of a DBMS, encompassing the following components:
1. Data Files: Files where the actual data is stored.
2. Data Dictionary: Repository containing information about the structure and characteristics of database
objects.
3. Indices: Data structures that facilitate faster data retrieval.
Knowledge Gate Website Knowledge Gate Website
ER Diagram ER diagram for bank
• Developed by Dr. Peter Chen in 1976, this conceptual level
method, grounded in real-world perceptions, facilitates
diagrammatic data representation, simplifying
comprehension for non-technical users.

• The E-R data model, central to database design, encapsulates


entities and their attributes within an enterprise schema,
serving as a clear, standardized tool for translating real-world
enterprise interactions into a conceptual schema.

Knowledge Gate Website Knowledge Gate Website

ER diagram for University System ER diagram for University System

Knowledge Gate Website Knowledge Gate Website


ER diagram for Marketing Company ENTITY
• An entity is a thing or an object in the real world that is distinguishable from other object
based on the values of the attributes it possesses.

• An entity may be concrete, such as a person or a book, or it may be abstract, such as a course,
a course offering, or a flight reservation.

Knowledge Gate Website Knowledge Gate Website

• Types of Entity • In ER diagram we cannot represent an entity, as entity is an instant not schema,
and ER diagram is designed to understand schema
• Tangible - Entities which physically exist in real world. E.g. - Car, Pen, locker
• In a relational model entity is represented by a row or a tuple or a record in a
• Intangible - Entities which exist logically. E.g. – Account, video. table.

Knowledge Gate Website Knowledge Gate Website


• ENTIY SET- Collection of same type of entities that share the same properties ATTRIBUTES
or attributes. • Attributes are the units defines and describe properties and characteristics of entities.

• In an ER diagram an entity set is represented by a rectangle • Attributes are the descriptive properties possessed by each member of an entity set.
for each attribute there is a set of permitted values called domain.
• In a relational model it is represented by a separate table

Knowledge Gate Website Knowledge Gate Website

• In an ER diagram attributes are represented by ellipse or oval connected to rectangle. Types of Attributes
• Single valued- Attributes having single value at any instance of time for an entity. E.g. –
• While in a relational model they are represented by independent column. e.g. Aadhar no, dob.
Instructor (ID, name, salary, dept_name)
• Multivalued - Attributes which can have more than one value for an entity at same time. E.g.
- Phone no, email, address.
• A multivalued attribute is represented by a double ellipse in an ER diagram and by an
independent table in a relational model.
• Separate table for each multivalued attribute, by taking mva and pk of main table as fk in
new table

Knowledge Gate Website Knowledge Gate Website


Customer Customer
Customer ID First Name Surname Telephone Number
Customer ID First Name Surname Telephone Number
123 Pooja Singh 555-861-2025, 192-122-1111
123 Rabri Devi Singh 555-861-2025, 192-122-1111 456 San Zhang (555) 403-1659 Ext. 53; 182-929-2929
(555) 403-1659 Ext. 53; 182-929- 789 John Doe 555-808-9633
456 Imarti Devi Zhang
2929
789 Barfi Devi Doe 555-808-9633
जुगाड़ technology

Customer

Customer ID First Name Surname Telephone Number1 Telephone Number2

123 Pooja Singh 555-861-2025 192-122-1111


(555) 403-1659 Ext.
456 San Zhang 182-929-2929
53
789 John Doe 555-808-9633

Knowledge Gate Website Knowledge Gate Website

Customer
• Simple - Attributes which cannot be divided further into sub parts. E.g. Age
Customer ID First Name Surname Telephone Number
123 Rabri Devi Singh 555-861-2025, 192-122-1111

456 Imarti Devi Zhang (555) 403-1659 Ext. 53; 182-929-2929


• Composite - Attributes which can be further divided into sub parts, as simple
789 Barfi Devi Doe 555-808-9633 attributes. A composite attribute is represented by an ellipse connected to an
ellipse and in a relational model by a separate column.

Customer Name Customer Phone Number


Customer ID Telephone Number
Customer ID First Name Surname 123 555-861-2025
123 Pooja Singh 123 192-122-1111
456 (555) 403-1659 Ext. 53
456 San Zhang
456 182-929-2929
789 John Doe 789 555-808-9633

Knowledge Gate Website Knowledge Gate Website


• Stored - Main attributes whose value is permanently stored in database. E.g. Descriptive attribute - Attribute of relationship is called descriptive attribute.
date_of_birth
• An attribute takes a null value when an entity does not have a value for it. The null
• Derived -The value of these types of attributes can be derived from values of other value may indicate “not applicable”— that is, that the value does not exist for the
Attributes. E.g. - Age attribute can be derived from date_of_birth and Date attribute. entity.

Knowledge Gate Website Knowledge Gate Website

Relationship / Association Relationship / Association


• Is an association between two or more entities of same or different entity set.

• In ER diagram we cannot represent individual relationship as it is an instance or


data.

Knowledge Gate Website Knowledge Gate Website


• In an ER diagram it is represented by a diamond, while in relational model • Every relationship type has three components.
sometimes through foreign key and other time by a separate table.
• Name

• Degree

• Structural constraints (cardinalities ratios, participation)

Knowledge Gate Website Knowledge Gate Website

• NAME- every relation must have a unique name. Degree of a relationship/relationship set
• Means number of entities set(relations/tables) associated(participate) in the
relationship set.
• Most of the relationship sets in a data base system are binary.
• Occasionally however relationship sets involve more than two entity sets.
• Logically, we can associate any number of entity set in a relationship called N-ary
Relationship.

Knowledge Gate Website Knowledge Gate Website


• Unary Relationship - One single entity set participate in a Relationship, means • Binary Relationship - Two entity sets participate in a Relationship. It is
two entities of the same entity set are related to each other. most common Relationship.
• These are also called as self -referential Relationship set.

• E.g.- A member in a team maybe supervisor of another member in team.

Knowledge Gate Website Knowledge Gate Website

• Ternary Relationship - When three entities participate in a Relationship. E.g. • Quaternary Relationship - When four entities participate in a Relationship.
The University might need to record which teachers taught which subjects in
which courses.

Knowledge Gate Website Knowledge Gate Website


• N-ary relationship – where n number of entity set are associated Structural constraints (Cardinalities Ratios, Participation)

• An E-R enterprise schema may define certain constraints to which the contents
of a database must conform.

• But the most common relationships in ER models are Binary.

Knowledge Gate Website Knowledge Gate Website

MAPPING CARDINALITIES / CARNINALITY RATIOS One to One (1:1) Relationship - An entity in A is associated with at most one entity
in B, and an entity in B is associated with at most one entity in A.
• Express the number of entities to which another entity can be associated via a
relationship set. Four possible categories are-
• One to One (1:1) Relationship.
• One to Many (1: M) Relationship.
• Many to One (M: 1) Relationship.
• Many to Many (M: N) Relationship.

E.g.- The directed line from relationship set advisor to both entities set indicates that ‘an instructor
may advise at most one student, and a student may have at most one advisor’.

Knowledge Gate Website Knowledge Gate Website


One to Many (1: M) Relationship - An entity in A is associated with any number Many to One (M: 1) Relationship - An entity in A is associated with at most one
(zero or more) of entities in B. An entity in B, however, can be associated with at entity in B. An entity in B, however, can be associated with any number (zero or
most one entity in A. more) of entities in A.

E.g.- This indicates that student may have many instructors but an instructor can
advise at most one student.
E.g.- This indicates that an instructor may advise many students, but a student may
have at most one advisor.

Knowledge Gate Website Knowledge Gate Website

Many to Many(M:N) Relationship - An entity in A is associated with any number • PARTICIPATION CONSTRAINTS- it defines participations of entities of an entity
(zero or more) of entities in B, and an entity in B is associated with any number type in a relationship.
(zero or more) of entities in A.
• Partial participation

• Total Participation

E.g.- This indicates a student may have many advisors and an instructor may advise
many students.

Knowledge Gate Website Knowledge Gate Website


STRONG AND WEAK ENTITY SET • For a weak entity set to be meaningful and converted into strong entity set, it must be
• An entity set is called strong entity set, if it has a primary key, all the tuples in the set associated with another strong entity set called the identifying or owner entity set i.e.
are distinguishable by that key. weak entity set is said to be existence dependent on the identity set.

• An entity set that does not process sufficient attributes to form a primary key is called a • The identifying entity set is said to own weak entity set that it identifies.
weak entity set. • A weak entity set may participate as owner in an identifying relationship with another
• It contains discriminator attributes (partial key) which contain partial information about weak entity set.
the entity set, but it is not sufficient enough to identify each tuple uniquely.
Represented by double rectangle.

Knowledge Gate Website Knowledge Gate Website

• The relationship associating the weak entity set with the identifying entity set is called REASONS TO HAVE WEAK ENTITY SET
the identifying relationship (double diamonds).
• The identifying relationship is many to one from the weak entity set to identifying • Weak entities reflect the logical structure of an entity being dependent on
entity set, and the participation of the weak entity set in relationship is always total. another.
• The primary key of weak entity set will be the union of primary key and discriminator • Weak entity can be deleted automatically when their strong entity is deleted.
attributes. • Without weak entity set it will lead to duplication and consequent possible
inconsistencies.

Knowledge Gate Website Knowledge Gate Website


Conversion From ER Diagram To Relational Model Generalization
• Entity Set • Involves merging two lower-level entities to create a higher-level entity.
• Convert every strong, weak entity set into a separate table. In weak entity set we make it • A bottom-up approach that builds complexity from simpler components.
dependent onto one strong entity set (identifying or owner entity set).
• Relationship • Highlights similarities among lower-level entity sets while hiding differences.
• If Unary: No separate table is required, add a new column as fk which refer the pk of the same • Leads to a simplified, structured data representation, aiding in database design
table. and querying processes.
• if 1:1 No separate table is required, take pk of one side and put it as fk on other side, priority must
be given to the side having total participation.
• if 1:n or n:1 No separate table is required, modify n side by taking pk of 1 side a foreign key on n
side.
• If m-n Separate table is required take pk of both table and declare their combination as a pk of
new table
• (3 or More) Take the pk of all participating entity sets as fk and declare their combinations as pk in
the new table.
• Attributes
• Multivalued-A separate table must be taken for all multivalued attributes, where we take pk of the
main table as fk and declare combination of fk and multivalued attribute are pk in the new table.
• Composite Attributes-A separate column must be taken for all simple attributes of the composite
Knowledge Gate Website Knowledge Gate Website
attribute.

Specialization Aggregation
• A process where a higher-level entity is broken down into more specific, lower-
level entities. • A concept wherein relationships are abstracted to form higher-level
• This top-down approach delineates complexity into simpler components. entities, enabling a more organized representation of complex
• Acts as the converse of the generalization process, focusing on differentiating relationships.
properties rather than similarities.

Knowledge Gate Website Knowledge Gate Website


ADVANTAGES OF E-R DIGRAM RELATIONAL DATABASE MANAGEMENT SYSTEM
• Constructs used in the ER diagram can easily be transformed into relational • A relational database management system (RDBMS),
tables. conceptualized by Edgar F. Codd in 1970, serves as the foundation
for most contemporary commercial and open-source database
• It is simple and easy to understand with minimum training. applications.
• Central to its design is the utilization of tables for data storage,
DISADVANTAGE OF E-R DIGRAM where it maintains and enforces specific data relationships,
• Loss of information content marking a significant evolution in database design.

• Limited constraints representation


• It is overly complex for small projects

Knowledge Gate Website Knowledge Gate Website

BASICS OF RDBMS • Table (Relation) - A Relation is a set of tuples/rows/entities/records.

• Domain (set of permissible value in particular column) is a set of atomic values. • Tuple - Each row of a Relation/Table is called Tuple.
• By atomic we mean that each value in the domain is indivisible as far as the formal • Arity/Degree - No. of columns/attributes of a Relation. E.g. - Arity is 5 in Table Student.
relational model is concerned.
• A common method of specifying a domain is to specify a data type from which the • Cardinality - No of rows/tuples/record of a Relational instance. E.g. - Cardinality is 4 in table
data values forming the domain are drawn. Student. NAME ID CITY COUNTRY HOBBY
NISHA 1 AGRA INDIA PLAYING
• E.g. Names: The set of character strings that represent names of persons. Rows/Tuples/Record/ NIKITA 2 DELHI INDIA DANCING
Cardinality AJAY 3 AGRA INDIA CHESS
ARPIT 4 PATNA INDIA READING
Domain/ NAME ID CITY COUNTRY HOBBY
NISHA 1 AGRA INDIA PLAYING
Field/ NIKITA 2 DELHI INDIA DANCING
Column/ AJAY 3 AGRA INDIA CHESS
ARPIT 4 PATNA INDIA READING
Arity/
Degree Knowledge Gate Website Knowledge Gate Website
Properties of Relational tables Problems in relational database
1. Cells contains atomic values • Update Anomalies- Anomalies that cause redundant work to be done during
insertion into and Modification of a relation and that may cause accidental loss
2. Values in a column are of the same kind of information during a deletion from a relation
• Insertion Anomalies
3. Each row is unique
• Modification Anomalies
4. No two tables can have the same name in a relational schema.
• Deletion Anomalies
5. Each column has a unique name

6. The sequence of rows is insignificant

7. The sequence of columns is insignificant.


Knowledge Gate Website Knowledge Gate Website

• Insertion anomalies: An independent piece of information cannot be recorded • Modification anomalies: The update of a piece of information must occur at
into a relation unless an irrelevant information must be inserted together at the multiple locations.
same time.
Roll no name Age Br_code Br_name Br_hod_name
Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc
1 A 19 101 Cs Abc 2 B 18 101 Cs Abc
2 B 18 101 Cs Abc 3 C 20 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr
4 D 20 102 Ec Pqr

Knowledge Gate Website Knowledge Gate Website


• Deletion Anomalies: The deletion of a piece of information unintentionally Roll no name Age Br_code Br_name Br_hod_name
removes other information. 1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
Br_name Br_hod_name 4 D 20 102 Ec Pqr
Roll no name Age Br_code
1 A 19 101 Cs Abc PK FK PK
2 B 18 101 Cs Abc Roll no name Age Br_code Br_code Br_name Br_hod_name
3 C 20 101 Cs Abc 1 A 19 101 101 Cs Abc
4 D 20 102 Ec Pqr 2 B 18 101 102 ec Pqr
3 C 20 101
4 D 20 102

Knowledge Gate Website Knowledge Gate Website

Purpose of Normalization Conclusion


• Normalization may be simply defined as refinement process. Which includes creating 1. Like every paragraph must have a single idea similarly every table must have a
tables and establishing relationships between those tables according to rules designed
single idea and if a table contains more than one idea then that table must be
both to protect data and make the database more flexible by eliminating two factors;
decomposed until each table contains only one idea.
• Redundancy
• Inconsistent Dependency
2. We need some tools to approach this decomposition or normalization on large
• With out normalization data base system may be inaccurate, slow and inefficient and database which contains a number of table, and that tool is functional
they might not produce the data we expect. A series of normal form tests that can dependencies.
be carried out on individual relation schemas so that the relational database can
be normalized to any desired degree.
• 1NF>>>2NF>>3NF>>BCNF

Knowledge Gate Website Knowledge Gate Website


Roll no name Age Br_code Br_name Br_hod_name
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr

Roll no name Age Br_code Br_code Br_name Br_hod_name Functional Dependency


1 A 19 101 101 Cs Abc
2 B 18 101 102 ec Pqr
3 C 20 101
4 D 20 102

Knowledge Gate Website Knowledge Gate Website

Br_code Br_hod_name

Br_code Br_hod_name

Knowledge Gate Website Knowledge Gate Website


Functional Dependency कोई बताता नह ,ीं इसक feel आ जात है FUNCTIONAL DEPENDENCY
• A formal tool for analysis of relational schemas.
X Y Z
• In a Relation R, if ‘α’ ⊑ R AND ‘β’ ⊑ R, then 1 4 2
attribute or a Set of attribute ‘α’ Functionally
1 4 3
derives an attribute or set of attributes ‘β’, iff
2 6 3
each ‘α’ value is associated with precisely one ‘β’
value. 3 2 2

• For all pairs of tuples t1 and t2 in R such that


• If T1[α] = T2[α]
• Then, T1[β] = T2[β]

Knowledge Gate Website Knowledge Gate Website

Q Consider the following relation instance, which of the following • Trivial Functional dependency - If β is a subset of α, then the functional
dependency doesn’t hold dependency α → β will always hold.
X Y Z
A) A → b
जजसका होना न होना बराबर हो 1 4 2
A B C 1 4 3
2 6 3
1 2 3 3 2 2
B) BC → A
4 2 3
C) B → C 5 3 3

D) AC → B
Knowledge Gate Website Knowledge Gate Website
ATTRIBUTES CLOSURE/CLOSURE ON ATTRIBUTE SET/ CLOSURE SET OF ATTRIBUTES ARMSTRONG’S AXIOMS
• Attribute closure of an attribute set can be defined as set of attributes which can
1. An axiom or postulate is a statement that is taken to be true, to
be functionally determined from F. serve as a premise or starting point for further reasoning and
• DENOTED BY F+ arguments.
Q R(ABCDE)
R (A, B, C, D) R(ABCDEFG) A → BC 2. Armstrong's axioms are a set of axioms (or, more
precisely, inference rules) used to infer all the functional
A→B A→B CD → E dependencies on a relational database. They were developed
B→C BC → DE B→D
by William W. Armstrong in his 1974 paper.
3. The axioms are sound in generating only functional dependencies
AB → D AEG → G E→A in the closure of a set of functional dependencies (denoted as F+)
when applied to that set (denoted as F).
A+ = (AC)+ =? (B)+ =

Knowledge Gate Website Knowledge Gate Website

Armstrong Axioms From these rules, we can derive these secondary rules-

• Union: If X → Y and X → Z, then X → YZ


• Reflexivity: If Y is a subset of X, then X → Y

• Decomposition: If X → YZ, then X → Y and X → Z

• Augmentation: If X → Y, then XZ → YZ
• Pseudo transitivity: If X → Y and WY → Z, then WX → Z

• Transitivity: If X → Y and Y → Z, then X → Z • Composition: If X → Y and Z → W, then XZ → YW

Knowledge Gate Website Knowledge Gate Website


Why Armstrong axioms refers to the Sound and Complete Equivalence of Two FD sets-
• By sound, we mean that given a set of functional dependencies F specified on a
Two FD sets F1 and F2 are equivalent if –
relation schema R, any dependency that we can infer from F by using the
primary rules of Armstrong axioms holds in every relation state r of R that
F1+ = F2+
satisfies the dependencies in F.
Or
• By complete, we mean that using primary rules of Armstrong axioms repeatedly
to infer dependencies until no more dependencies can be inferred results in the
F1 ⊑ F2+ and F2 ⊑ F1+
complete set of all possible dependencies that can be inferred from F.

Knowledge Gate Website Knowledge Gate Website

Q Consider the following set of fd R(ACDEH) To find the MINIMAL COVER / CANONICAL COVER / IRREDUCIBLE SET

F G • A canonical cover (also known as a minimal cover) for a set of functional dependencies in a database is
a minimal set of functional dependencies that is equivalent to the original set, but with redundant
A→C A → CD dependencies and extraneous attributes removed. It is used in the normalization process of database
design to simplify the set of functional dependencies and to find a good set of relations.

AC → D • There may be any following type of redundancy in the set of functional dependencies: -
E → AH • Complete production may be Redundant.

E → AD • One or more than one attributes may be redundant on right hand side of a production.

• One or more than one attributes may be redundant on Left hand side of a production.
E→H

Knowledge Gate Website Knowledge Gate Website


Q R(ABCD) R(A,B,C)
A→B
A→B B→C
A→C
C→B
AB → B
D → ABC AB → C
AC → B
AC → D

Knowledge Gate Website Knowledge Gate Website

Key Super key


• Set of attributes using which we can identify each tuple uniquely is called Super
key.
• Let X be a set of attributes in a Relation R , if X+(Closure of X) determines all
attributes of R then X is said to be Super key of R .
• There should be at least one Super key in every relation.

Knowledge Gate Website Knowledge Gate Website


Candidate key Prime attribute - Attributes that are member of at least one
• Minimal set of attributes using which we can identify each tuple uniquely is candidate Keys are called Prime attributes.
called Candidate key. A super key is called candidate key if it’s No proper subset
is a super key. Also called as MINIMAL SUPER KEY.

• There should be at least one candidate key.

Knowledge Gate Website Knowledge Gate Website

Primary key Foreign Keys


• One of the candidate keys is selected by database administrator as a Primary • A foreign key is a column or group of columns in a relational database table that
means to identify tuple is called primary Key. Primary Key attribute are not refers the primary key of the same table or some other table to represent
allowed to have Null values. Exactly one Primary Key per table in RDMS. relationship.

• Candidate key which are not chosen as primary key is alternate key. • The concept of referential integrity is derived from foreign key theory.

Knowledge Gate Website Knowledge Gate Website


Roll no name Age Br_code Br_name Br_hod_name PK FK
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
Roll no CR
3 C 20 101 Cs Abc
1 1
4 D 20 102 Ec Pqr
2 2
PK FK PK 3 1
4 2
Roll no name Age Br_code Br_code Br_name Br_hod_name 5 1
1 A 19 101 101 Cs Abc 6 2
2 B 18 101 102 ec Pqr 7 1
3 C 20 101 8 2
4 D 20 102 9 1
10 2
11 1
12 2
13 1
14 2
15 null
Knowledge Gate Website Knowledge Gate Website

• Composite key – Composite key is a key composed of more than one


column sometimes it is also known as concatenated key.

• Secondary key – Secondary key is a key used to speed up the search


and retrieval contrary to primary key, a secondary key does not
necessary contain unique values.
NORMAL FORM
Knowledge Gate Website Knowledge Gate Website
FIRST NORMAL FORM Prime attribute: - A attribute is said to be prime if it is part of any of the candidate key
• 1NF is the initial step of database normalization.
Non-Prime attribute: - A attribute is said to be non-prime if it is not part of any of the candidate
• Implications of first normal form key
• Atomic Values: Each cell in a table contains indivisible, atomic values. Means a Relation
should not contain any multivalued or composite attributes. Eg R(ABCD)
• Unique Columns: Each column must have a distinct name to identify the data it contains. AB→CD
Here candidate key is AB so, A and B are prime attribute, C and D are non-prime attributes.
• Primary Key: A table in 1NF should have a primary key that uniquely identifies each record.
• Eliminating Duplicates: Duplicate rows are removed to prevent data redundancy.

Knowledge Gate Website Knowledge Gate Website

PARTIAL DEPENDENCY- When a non – prime attribute is dependent only on a part (Proper SECOND NORMAL FORM
subset) of candidate key then it is called partial dependency. (PRIME > NON-PRIME) • Relation R is in 2NF if,
• R should be in 1 NF.
Full DEPENDENCY- When a non – prime attribute is dependent on the entire candidate key then • R should not contain any Partial dependency. (that is every non-prime
it is called Full dependency. attribute should be fully dependent upon candidate key)

e.g. R(ABCD)
AB→D
A→C

Knowledge Gate Website Knowledge Gate Website


Q R(A, B, C) B→C
A B C A B B C
a 1 X A 1 1 X
b 2 Y B 2 2 Y
a 3 Z A 3 3 z
C 3 Z C 3
D 3 Z D 3
E 3 Z E 3

Knowledge Gate Website Knowledge Gate Website

TRANSITIVE DEPENDENCY – A functional dependency from non-Prime attribute to THIRD NORMAL FORM
non-Prime attribute is called transitive
• Let R be the relational schema, it is said to be in 3 NF
E.g.- R(A, B, C, D) with A as a candidate key • R should be in 2NF
A→B • It must not contain any transitive dependency
B→C [ transitive dependency]
C→D [transitive dependency]

Knowledge Gate Website Knowledge Gate Website


THIRD NORMAL FORM DIRECT DEFINATION R(A, B, C) A B C A B B C
• A relational schema R is said to be 3 NF if every functional dependency A 1 P A 1 1 P
in R from α-->β, either α is super key or β is the prime attribute A→B B 2 Q B 2 2 Q
C 2 Q C 2 3 R
B→C D 2 Q D 2 4 S
E 3 R E 3
F 3 R F 3
G 4 S G 4

Knowledge Gate Website Knowledge Gate Website

BCNF (BOYCE CODD NORMAL FORM) R(A, B, C) A B C A B C B


• A relational schema R is said to be BCNF if every functional A C B A B B C
dependency in R from AB → C
B B C B B C B
• α→β
C→B B A D B A D A
• α must be a super key
A A E A A E A
C C B C C
D C B D C
E C B e C
F C B f c

Knowledge Gate Website Knowledge Gate Website


Some important note points on Normalization: Q Consider the universal relational schema R (A, B, C, D, E, F, G, H, I, J) and a set of
following functional dependencies.
• A Relation with two attributes is always in BCNF. F = {AB → C, A → DE, B → F, F → GH, D → IJ} determine the keys for R ?
Decompose R into 2nd normal form.
• A Relation schema R consist of only prime attributes then R is always in 3NF, but
may or may not be in BCNF.

Knowledge Gate Website Knowledge Gate Website

Q Find the normal form of relation R(A, B, C, D, E) having FD set F= {A → B, BC → E, Multivalued Dependency
ED → A}.
• Denoted by, A →→ B, Means, for every value of A, there may exist more than
one value of B.
• E.g. S_name → → Club_name

S_Name Club_name
Kamesh Dance
Kamesh Guitar

Knowledge Gate Website Knowledge Gate Website


• A trivial multivalued dependency X→→Y is one where either Y is E.g. let the constraint specified by MVD in relation Student as
S_name → → Club_name
S_name → → P_no
a subset of X, or X and Y together form the whole set of attributes
S_Name Club_name
of the relation. Kamesh Dance
S_Name
Kamesh
P_no
123
Kamesh Guitar Kamesh 789

S_name → → Club_name
S_name → → Club_name
S_name → → P_no S_Name Club_name P_no
Kamesh Dance 123
S_Name Club_name S_Name Club_name P_no
Kamesh Dance Kamesh Guitar 123
Kamesh Dance 123
Kamesh Guitar Kamesh Dance 789
Kamesh Guitar 123
Kamesh Guitar 789
Kamesh Dance 789
Kamesh Guitar 789

NOTE: The above Student schema is in BCNF as no functional dependency holds on EMP, but
still redundancy due to MVD.

Knowledge Gate Website Knowledge Gate Website

• Each row indicates that a given restaurant can Restaurant


Restaurant Delivery Permutations
Variety Delivery Area
• If we have two or more multivalued independent attributes in the same relation
deliver a given variety. The table has no non-key Chatora Sweets Samosa Hatibagan Market schema, we get into a problem of having to repeat every value of one of the attributes
Chatora Sweets Samosa Chandni Chowk
attributes because its only key is {Restaurant,
Chatora Sweets Samosa Koramangala
with every value of the other attribute to keep the relation state consistent and to
Variety, Delivery Area}. Therefore, it meets all Chatora Sweets Dosa Hatibagan Market maintain the independence among the attributes involved. This constraint is specified
normal forms up to BCNF. Chatora Sweets Dosa Chandni Chowk by a multivalued dependency.
Chatora Sweets Dosa Koramangala
Delivery Areas By Restaurant Varieties By Restaurant
• If we assume, however, that Variety offered by a Moolchand Ladoo Koramangala
Restaurant Delivery Area Restaurant Pizza Variety
Moolchand Dosa Koramangala
restaurant are not affected by delivery area (i.e. a Thaggu Samosa Hatibagan Market Chatora Sweets Hatibagan Market Chatora Sweets Samosa
restaurant offers all Variety it makes to all areas it Thaggu Samosa Chandni Chowk Chatora Sweets Chandni Chowk Chatora Sweets Dosa
Thaggu Ladoo Hatibagan Market Chatora Sweets Koramangala Moolchand Ladoo
supplies), then it does not meet 4NF. The Thaggu Ladoo Chandni Chowk Moolchand Koramangala Moolchand Dosa
problem is that the table features two non-trivial Thaggu Hatibagan Market Thaggu Samosa
Thaggu Chandni Chowk Thaggu Ladoo
multivalued dependencies on the {Restaurant}
attribute (which is not a super key). The
dependencies are:
o {Restaurant} →→ {Variety}
o {Restaurant} →→ {Delivery Area}
Knowledge Gate Website Knowledge Gate Website
• A relation is in 4NF iff Lossy/Lossless-Dependency Preserving Decomposition
• It is in BCNF
• There must not exist any non-trivial multivalued dependency. • Because of a normalization a table is Decomposed into two or more tables, but
during this decomposition we must ensure satisfaction of some properties out of
• Each MVD is decomposed in separate table, where it becomes trivial MVD. which the most important is lossless join property / decomposition.

Knowledge Gate Website Knowledge Gate Website

r
• if we decompose a table r into two tables r1 and r2 because of normalization then
at some later stage if we want to join(combine) (natural join) these tables r1 and A B C
r2, then we must get back the original table r, without any extra or less tuple. 1 a p
But some information may be lost during retrieval of original relation or table. 2 b q
For e.g. 3 a r
r
r1 r2
A B C
A B B C
1 a p
1 a a p
2 b q
2 b b q
3 a r A B C
3 a a r
r1 r2
A B B C

Knowledge Gate Website Knowledge Gate Website


• Decomposition is lossy if R1 ⋈ R2 ⊃ R • Decomposition is lossless if R1 ⋈ R2 = R "The decomposition of relation R into R1 and R2
is lossless when the join of R1 and R2 yield the same relation as in R." which guarantees
that the spurious (extra or less) tuple generation problem does not occur with respect
• Decomposition is lossy if R ⊃ R1 ⋈ R2 to the relation schemas created after decomposition.

• This property is extremely critical and must be achieved at any cost.

• lossless Decomposition / NonAdditive Join Decomposition

Knowledge Gate Website Knowledge Gate Website

A B C D E How to check for lossless join decomposition using FD set, following conditions must hold:
A 122 1 W A • Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R must be
E 236 4 X B either in R1 or in R2. Att(R1) U Att(R2) = Att(R)
A 199 1 Y C
B 213 2 Z D • Intersection of Attributes of R1 and R2 must not be NULL. Att(R1) ∩ Att(R2) ≠ Φ

• Common attribute must be a key for at least one relation (R1 or R2)
• Att(R1) ∩ Att(R2) → (R1) or Att(R1) ∩ Att(R2) → (R2)

Knowledge Gate Website Knowledge Gate Website


Q R (A, B, C, D) Q R(ABCDE)(NF) R1(AB) R2(BC) R3(ABCD) R4(EG)
A→ B, B→C, C→D, D→A
R1(A, B), R2(B, C) AND R3(C, D)
A→BC
C→DE
D→E

Knowledge Gate Website Knowledge Gate Website

5 NF/Project-Join Normal Form Dependency Preserving Decomposition

• Let relation R be decomposed into Relations R1, R2, R3…………. RN with their respective
• A Relational table R is said to be in 5th normal form if functional Dependencies set as F1, F2, F3…………. FN, then the Decomposition is Dependency
• it is in 4 NF Preserving iff
• it cannot be further non-loss decomposed • {F1 ∪ F2 ∪ F3 ∪ F4………. ∪ FN }+ = F+

Knowledge Gate Website • Dependency preservation property, although


Knowledge desirable, is sometimes sacrificed.
Gate Website
X = (PQRS) Indexing
F = {QR → S, R → P, S → Q}
Relational databases are based on set theory.
Y = (PR) and Z = (QRS) • In set theory, the order of elements is unimportant, similarly in database tables.
• However, in practical implementation, element order in tables is often specified.
• Various operations such as search, insertion, and deletion are influenced by the order
of elements in the tables.
• Elements in a table can be stored in two ways: sorted (ordered) or unsorted
(unordered).

Knowledge Gate Website Knowledge Gate Website

File organization/ organization of records in a file Unordered file organization


Ordered file organization
• Records are typically added at the end of the file, without following any specific order.
• All the records in the file are ordered on some search key field.
• This insertion method allows only linear search, resulting in slower search times.
• Here binary search is possible. (give example of book page searching)
• Despite slow searches, maintenance including insertion and deletion is simpler.
• Maintenance (insertion & deletion) is costly, as it requires reorganization of entire file.
• No reorganization of the entire file is needed, making maintenance easier.
• Notes that we will get binary search only if we are using that key for searching on which
• If file is unordered then no of block assesses required to reach correct block which contain the
indexing is done, otherwise it will behave as unsorted file.
desired record is O(n), where n is the number of blocks.
• if file is unordered then no of block assesses required to reach correct block which contain
the desired record is O(log2n), where n is the number of blocks.

Knowledge Gate Website Knowledge Gate Website


• Indexes are supplementary structures in databases, aiding in swift record • The size of index file is way smaller than that of the main file, as index file record
retrieval. contain only two columns key (attribute in which searching is done) and block pointer
• They enable quick data access based on particular attributes identified for (base address of the block of main file which contains the record holding the key),
indexing. while main file contains all the columns.
• This technique is similar to the index sections seen in books.
• Indexes provide secondary pathways to access records without changing their
physical position in the main file.

Knowledge Gate Website Knowledge Gate Website

Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B 1. Indexes can be established on any relation field, be it primary key or
= 1024 B. File records are of fixed size and are unspanned with record length R = 100 B. non-key.
Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long,
Implement primary indexing? 2. Each attribute can have a dedicated index file, meaning multiple
index files may exist for one main file.
3. Index files are always organized, allowing for the utilization of binary
search advantages, irrespective of the main file's order.
4. Indexing accelerates data retrieval time but also introduces space
overhead for storing the index file.
5. The correct block in the main file can be located with log2(number of
blocks in index file) + 1 accesses.

Knowledge Gate Website Knowledge Gate Website


TYPES OF INDEXING PRIMARY INDEXING

Index
• Main file is always sorted according to primary key.
• Indexing is done on Primary Key, therefore called as primary indexing
Single
Multilevel • Index file have two columns, first primary key and second anchor pointer (base
level
address of block)
Primary Clustering Secondary Simple • It is an example of Sparse Indexing.
B tree B+ tree
Index Index index multilevel
• Here first record (anchor record) of every block gets an entry in the index file
• No. of entries in the index file = No of blocks acquired by the main file.
• In single-level indexing, an index file is created for the main file, marking the end
of the indexing process.
• Multiple-level indexing, on the other hand, involves creating an index for the
index file and continually repeating this procedure until only a single block
remains.
Knowledge Gate Website Knowledge Gate Website

CLUSTERED INDEXING Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B
= 1024 B. File records are of fixed size and are unspanned with record length R = 100 B.
• Main file will be ordered on some non-key attributes Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long,
Implement Secondary indexing?
• No of entries in the index file = no of unique values of the attribute on which
indexing is done.

• It is the example of Sparse as well as dense indexing

Knowledge Gate Website Knowledge Gate Website


SECONDARY INDEXING Dense Vs Sparse
• Most common scenarios, suppose that we already have a primary indexing on primary key,
but there is frequent query on some other attributes, so we may decide to have one more • Dense Index In dense index, there is an entry in the index file for every search key value
index file with some other attribute. in the main file. This makes searching faster but requires more space to store index
records itself. Note that it is not for every record, it is for every search key value.
• Main file is ordered according to the attribute on which indexing is done(unordered).
Sometime number of records in the main file > number of search keys in the main file,
• Secondary indexing can be done on key or non-key attribute. for example if search key is repeated.
• No of entries in the index file is same as the number of entries in the main file.
• Sparse Index-If an index entry is created only for some records of the main file, then it
• It is an example of dense indexing. is called sparse index. No. of index entries in the index file < No. of records in the main
file. Note: - dense and sparse are not complementary to each other, sometimes it is
possible that a record is both dense and sparse.

Knowledge Gate Website Knowledge Gate Website

B tree Insertion in B-TREE


• A B-tree of order m if non-empty is an m-way search tree in which.
• A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the root node is full with
• The root has at least zero child nodes and at most m child nodes. m – 1 search key values and we attempt to insert another entry in the tree, the root node splits into two nodes at
• The internal nodes except the root have at least celling(m/2) child nodes and at most m level 1.
child nodes.
• The number of keys in each internal node is one less than the number of child nodes and • Only the middle value is kept in the root node, and the rest of the values are split evenly between the other two
these keys partition the subtrees of the nodes in a manner similar to that of m-way search nodes. When a non-roof node is full and a new entry is inserted into it, that node is split into two nodes at the
same level, and the middle entry is moved to the parent node along with two pointers to the new split nodes.
tree.
• All leaf nodes are on the same level(perfectly balanced). • If the parent node is full, it is also split. Splitting can propagate all the way to the root node, creating a new level if
the root is split.
Root Internal except Root Leaf
Rules MAX MIN
Rules MAX MIN Rules MAX MIN
CHILD m 0
CHILD m ⌈m/2⌉ CHILD 0 0
DATA m-1 1
DATA m-1 ⌈m/2⌉ - 1 DATA m-1 ⌈m/2⌉ - 1

Knowledge Gate Website Knowledge Gate Website


Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4 Query Language
insert them into an empty b-tree of order = 3. • After designing a data base, that is ER diagram followed by conversion in relational model
followed by normalization and indexing, now next task is how to store, retrieve and modify
data in the data base.
• So here we will be concentrating more on the retrieval part. Query languages are used for this
purpose. Query languages, data query languages or database query languages (DQLs)
are computer languages using which user request some information from the database. A well
known example is the Structured Query Language (SQL).

Knowledge Gate Website Knowledge Gate Website

Procedural Query Language Non-Procedural Query Language


• Here users instruct the system to performs a sequence of operations on the data base in order • In nonprocedural language, the user describes the desired information without giving a
to compute the desired result. specific procedure for obtaining that information.

• Means user provides both what data to be retrieved and how data to be retrieved. e.g. • What data to be retrieved e.g. Relational Calculus. Tuple relational calculus, Domain relational
Relational Algebra. calculus are declarative query languages based on mathematical logic

Knowledge Gate Website Knowledge Gate Website


• Relational Algebra (Procedural) and Relational Calculus (non-procedural) are mathematical RELATIONAL ALGEBRA
system/ query languages which are used for query on relational model.
• RA like any other mathematical system provides a number of operators and use relations
• RA and RC are not executed in any computer they provide the fundamental mathematics on
(tables) as operands and produce a new relation as their result.
which SQL is based.
• SQL (structured query language) works on RDBMS, and it includes elements of both procedural
or non-procedural query language.
3+4
Relational model RDBMS
RA, RC SQL Operand Operator Operand
Algo Code
Conceptual Reality A⊕B
Theoretical Practical
Chess Battle Field
Knowledge Gate Website Knowledge Gate Website

• Every operator in the RA accepts (one or two) relation/table as input arguments • It also does not consider duplicity by default as it is based on set theory. Same
and returns always a single relation instance as the result without a name. query is written in RA and SQL the result may be different as SQL considers
duplication.

• As it is pure mathematics no use of English keywords. Operators are represented


using symbols.

* Knowledge Gate Website Knowledge Gate Website


BASIC / FUNDAMENTAL OPERATORS DERIVED OPERATORS
• The fundamental operations in the relational algebra are select, project, union,
set difference, Cartesian product, and Rename. • There are several other operations namely: set intersection, natural
join, and assignment.

Name Symbol
Select
(σ) Name Symbol DERIVED FROM
Project
(∏) Join (⋈) (Χ)
Union
(∪) (−)
Set difference
(−)
Intersection (∩) A∩B=A-(A-B)
Cross product
(Χ) Division (÷) (X,-, ∏)
Rename
(ρ) Assignment (=)Knowledge Gate Website
Knowledge Gate Website

• The select, project, and rename operations are called unary operations, because Relational schema - A relation schema R, denoted by R (A1, A2, ..., An), is made up
they operate on one relation. of a relation name R and a list of attributes, A1, A2, ..., An. Each attribute Ai is the
name of a role played by some domain D in the relation schema R. It is use to
• Union, Cartesian product and set difference operate on pairs of relations and describe a Relation.
are, therefore, called binary operations. E.g. Schema representation of Table Student is as –
STUDENT (NAME, ID, CITY, COUNTRY, HOBBY).
• Relational algebra also provides the framework for query optimization.
Relational Instance - Relations with its data at particular instant of time.

Knowledge Gate Website Knowledge Gate Website


The Project Operation (Vertical Selection) Q Write a RELATIONAL ALGEBRA query to find the name of all customer without duplication
having bank account?
• Main idea behind project operator is to select desired columns.
• The project operation is a unary operation that returns its
argument relation, with certain attributes left out. Q Write a RELATIONAL ALGEBRA query to find all the details of bank branches?

• Projection is denoted by the uppercase Greek letter pi (Π). Minimum number of


columns selected can be 1, Maximum selected Columns can be n - 1.

• Πcolumn_name (table_name)

Knowledge Gate Website Knowledge Gate Website

Q Write a RELATIONAL ALGEBRA query to find the name of all customer without duplication The Select Operation (Horizontal Selection)
having bank account? • The select operation selects tuples that satisfy a given predicate/Condition p.
Πcustomer_name (depositor)
• Lowercase Greek letter sigma (σ) is used to denote selection.
Q Write a RELATIONAL ALGEBRA query to find all the details of bank branches?
• It is a unary operator.
(branch)
• Eliminates only tuples/rows.

• σ condition (table_name)

Knowledge Gate Website Knowledge Gate Website


Q Write a RELATIONAL ALGEBRA query to find all account_no where balance is less the 1000? Q Write a RELATIONAL ALGEBRA query to find all account_no where balance is less the 1000?
Πaccount_number(σ balance<1000 (account))
Q Write a RELATIONAL ALGEBRA query to find branch name which is situated in Delhi and having
Q Write a RELATIONAL ALGEBRA query to find branch name which is situated in Delhi and having
assets less than 1,00,000?
assets less than 1,00,000?
Πbranch_name(σ (branch_city = delhi) ^ (assets < 1000) (branch))

Knowledge Gate Website Knowledge Gate Website

• Commutative in Nature, σp1 ^ p2(r) = σp2 ^ p1(r) = σp1( σp2(r)) = σp2( σp1(r)) The Union Operation
• It is a binary operation, denoted, as in set theory, by ∪.
• We allow comparisons using =, ≠, <, >, ≤ and ≥ in the selection predicate. • Written as, Expression1 ∪ Expression2, r ∪ s = {t | t ∈ r or t ∈ s}

• Using the connectives and (∧), or (∨), and not (¬), we can combine several predicates into a • For a union operation r ∪ s to be valid, we require that two conditions hold:
larger predicate.
• The relations r and s must be of the same arity. That is, they must have the same number
• Minimum number of tuples selected can be 0, Maximum selected tuples can be all. of attributes.

• The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.

Knowledge Gate Website Knowledge Gate Website


• Some points to remember Q Write a RELATIONAL ALGEBRA query to find all the customer name who have a loan or an
account or both?
• Deg (R ∪ S) = Deg(R) = Deg(S)
Q Write a RELATIONAL ALGEBRA query to find all the customer name who have a loan but do not
• Max (ӀRӀ, ӀSӀ) <= ӀRUSӀ <= (ӀRӀ+ӀSӀ) have an account?

Knowledge Gate Website Knowledge Gate Website

Q Write a RELATIONAL ALGEBRA query to find all the customer name who have a loan or an The Set-Difference Operation
account or both?
Πcustomer_name(depositor)) U Πcustomer_name(borrower)) • The set-difference operation, denoted by −, allows us to find tuples that are in
one relation but are not in another. It is a binary operator.
• The expression r − s produces a relation containing those tuples in r but not in s.
Q Write a RELATIONAL ALGEBRA query to find all the customer name who have a loan but do not
have an account? • We must ensure that set differences are taken between compatible relations.
Πcustomer_name(borrower)) - (Πcustomer_name(depositor)))
• For a set-difference operation r − s to be valid, we require that the relations r
and s be of the same arity, and that the domains of the ith attribute of r and the
ith attribute of s be the same, for all i.
• 0 <= ӀR - SӀ <= ӀRӀ

Knowledge Gate Website Knowledge Gate Website


The Cartesian-Product Operation The Cartesian-Product Operation
• The Cartesian-product operation, denoted by a cross (×), allows us to combine • The Cartesian-product operation, denoted by a cross (×), allows us to combine
information from any two relations. information from any two relations.
• It is a binary operator; we write the Cartesian product of relations R1 and R2 as R1 × R2. • It is a binary operator; we write the Cartesian product of relations R1 and R2 as R1 × R2.
• Cartesian-product operation associates every tuple of R1 with every tuple of R2. • Cartesian-product operation associates every tuple of R1 with every tuple of R2.
• R1 Χ R2 = {rs | r ∈ R1 and s ∈ R2}, contains one tuple <r, s> (concatenation of tuples r • R1 Χ R2 = {rs | r ∈ R1 and s ∈ R2}, contains one tuple <r, s> (concatenation of tuples r
and s) for each pair of tuples r ϵ R1, s ϵ R2. and s) for each pair of tuples r ϵ R1, s ϵ R2.
R1 X R2 R1 X R2
A R1.B R2.B C A R1.B R2.B C
1 P Q X
R1 R2 R1 R2 1 P R Y
A B B C A B B C 1 P S Z
Q X Q X 2 Q Q X
1 P 1 P 2 Q R Y
2 Q R Y 2 Q R Y 2 Q S Z
3 R S Z 3 R S Z 3 R Q X
3 R R Y
Knowledge Gate Website Knowledge3 GateRWebsite
S Z

The Cartesian-Product Operation • R1 Χ R2 returns a relational instance whose schema contains all the fields of R1 (in
• The Cartesian-product operation, denoted by a cross (×), allows us to combine order as they appear in R1) and all fields of R2 (in order as they appear in R2).
information from any two relations.
• It is a binary operator; we write the Cartesian product of relations R1 and R2 as R1 × R2. • If R1 has m tuples and R2 has n tuples the result will be having = m*n tuples.
• Cartesian-product operation associates every tuple of R1 with every tuple of R2.
• R1 Χ R2 = {rs | r ∈ R1 and s ∈ R2}, contains one tuple <r, s> (concatenation of tuples r • Same attribute name may appear in both R1 and R2, we need to devise a naming
and s) for each pair of tuples r ϵ R1, s ϵ R2. schema to distinguish between these attributes.
R1 X R2
A R1.B R2.B C
1 P Q X
R1 R2 1 P R Y
A B B C 1 P S Z
Q X 2 Q Q X
1 P 2 Q R Y
2 Q R Y 2 Q S Z
3 R S Z 3 R Q X
3 R R Y
Knowledge3 GateRWebsite
S Z Knowledge Gate Website
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along
with account balance, who have an account in the bank? with account balance, who have an account in the bank?

Πcustomer_name,balance (σ account.account_number=depositor.account_number (account X depositor))

Knowledge Gate Website Knowledge Gate Website

Rename Operation Additional/Derived Relational-Algebra Operations


• The results of relational algebra are also relations but • If we restrict ourselves to just the fundamental operations, certain common
without any name. This Query do not change the name queries are lengthy to express. Therefore, we use additional operations.
of the table in the original data base, but create a new
copy of the table. • These additional operations do not add any power to the algebra.
• The rename operation allows us to rename the output relation.
• They are used to simplify the queries.
It is denoted with small Greek letter rho ρ. Where the result
of expression E is saved with name of x.
• ρx(A1, A2, A3, A4,…..AN)(E)

• ρLearner(Student)

• ρLearner(Stu_ID, User_Name, Age)(Student(Roll_No, Name, Age))


Knowledge Gate Website Knowledge Gate Website
Set-Intersection Operation Q Write a RELATIONAL ALGEBRA query to find all the customer name
• We will be using ∩ symbol to denote set intersection. who have both a loan and an account?

• r ∩ s = r − (r − s)
Πcustomer_name(depositor)) U Πcustomer_name(borrower))

• Set intersection is not a fundamental operation and does not add any
power to the relational algebra.

• r ∩ s = {t | t ∈ r and t ∈ s}

• 0 <= Ӏ R∩S Ӏ <= min (ӀRӀ, ӀSӀ)

Knowledge Gate Website Knowledge Gate Website

DIVISION
The Natural-Join Operation
*
• The natural join is a binary operation that allows us to combine certain
• In general, the DIVISION operation is applied when we have query like student
selections and a Cartesian product into one operation. who have completed both database1 and data base2 tasks.

• The natural join is associative in nature, the natural join is a Lossy operator.

R1 R2
A B B C R1 ⋈ R2
1 P Q X A B C
2 Q R Y 2 Q X
3 R S Z 3 R Y

Knowledge Gate Website Knowledge Gate Website


πStudent(R) − {πStudent[(πStudent(R) × S) − πStudent,Task(R)]}

Knowledge Gate Website Knowledge Gate Website

Introduction to SQL Overview of the SQL Query Language


• Structured Query Language is a domain-specific language (not general purpose) used in 1. IBM developed the original version of SQL, originally called Sequel (Structured English Query
programming and design for managing data held in a relational database management Language), as part of the System R project in the early 1970s.
system (RDBMS).
2. The Sequel language has evolved since then, and its name has changed to SQL (Structured Query
• Although we refer to the SQL language as a “query language,” it can do much more than just Language) (some other company has trademark on the word sequel). SQL has clearly established itself
query a database. It can define the structure of the data base, modify data in the database, as the standard relational database language.
specify security constraints and number of other tasks.
3. In 1986, the American National Standards Institute (ANSI) and the International Organization for
Standardization (ISO) published an SQL standard, called SQL-86.
• Originally based upon relational algebra(procedural) and tuple relational calculus (Non-
procedural) mathematical model. 4. The next version of the standard was SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008,
SQL:2011, SQL: 2016, SQL: 2019and most recently SQL:2023.

Knowledge Gate Website Knowledge Gate Website


Classification of database languages 3. Data Control Language (DCL) :
1. Data Definition Language (DDL) : 1. a. It is the component of SQL statement that control access to data and to the database.
1. a. DDL is set of SQL commands used to create, modify and delete database structures but not data. 2. b. Commit, rollback command are used in DCL. GRANT and REVOKE statement
2. b. They are used by the DBA to a limited extent, a database designer, or application developer.
3. c. Create, drop, alter, truncate are commonly used DDL command. CREATE, ALTER, DROP, TRUNCATE, 4. Data Query Language (DQL) :
COMMENT, GRANT, REVOKE statement 1. a. It is the component of SQL statement that allows getting data from the database and imposing
ordering upon it.
2. Data Manipulation Language (DML) : 2. b. It includes select statement. SELECT statement
1. a. A DML is a language that enables users to access or manipulates data as organized by the appropriate
data model.
2. b. There are two types of DMLs :
5. View Definition Language (VDL) :
1. i. Procedural DMLs : It requires a user to specify what data are needed and how to get those data. 1. 1. VDL is used to specify user views and their mapping to conceptual schema.
2. ii. Declarative DMLs (Non-procedural DMLs) : It requires a user to specify what data are needed without 2. 2. It defines the subset of records available to classes of users.
specifying how to get those data. 3. 3. It creates virtual tables and the view appears to users like conceptual level.
3. c. Insert, update, delete, query are commonly used DML commands. INSERT, UPDATE, DELETE statement 4. 4. It specifies user interfaces. SQL is a DML language.

Knowledge Gate Website Knowledge Gate Website

CREATE TABLE table_name ( • list of some common data types supported by SQL along with a brief description of each:
column1 data_type [constraints],
• Numeric Data Types:
column2 data_type [constraints],
• 1. `INT`: For storing integer values.
column3 data_type [constraints], • 2. `SMALLINT`: A smaller range of integers compared to INT.
... • 3. `BIGINT`: For storing larger integers.
); • 4. `DECIMAL(p, s)`: For storing exact numerical values, where `p` is the precision and `s` is
the scale.
• 5. `FLOAT`: For storing floating-point numbers.
CREATE TABLE Students ( • 6. `REAL`: A data type that can store floating-point numbers, generally with less precision
StudentID INT PRIMARY KEY, compared to FLOAT.
FirstName VARCHAR(50),
LastName VARCHAR(50), • String Data Types:
• 7. `VARCHAR(n)`: Variable-length character string, where `n` is the maximum length.
Age INT,
• 8. `CHAR(n)`: Fixed-length character string, where `n` is the length.
Email VARCHAR(100) • 9. `TEXT`: For storing long text strings.
);
Knowledge Gate Website Knowledge Gate Website
Adding a New Column Renaming a Column CREATE TABLE Orders (
ALTER TABLE Employees
ALTER TABLE Employees RENAME COLUMN PhoneNumber TO ContactNumber;
OrderID INT PRIMARY KEY,
ADD PhoneNumber VARCHAR(15); CustomerID INT,
Renaming a table OrderDate DATE,
Dropping a Column FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)
ALTER TABLE Employees ALTER TABLE Employees
RENAME TO Staff; );
DROP COLUMN PhoneNumber;

DROP TABLE table_name;


Modifying an Existing Column ALTER TABLE Orders
ALTER TABLE Employees ADD FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID);
MODIFY COLUMN PhoneNumber VARCHAR(20);

ALTER TABLE Employees


ALTER COLUMN PhoneNumber VARCHAR(20);

Knowledge Gate Website Knowledge Gate Website

INSERT INTO table_name (column1, column2, column3, ...) DELETE FROM table_name
VALUES (value1, value2, value3, ...); WHERE condition;

INSERT INTO Students (StudentID, FirstName, LastName, Age) DELETE FROM Students
VALUES WHERE StudentID = 1;
(1, 'Amit', 'Sharma', 20),
(2, 'Priya', 'Gupta', 22),
(3, 'Ravi', 'Kumar', 19);
DELETE FROM table_name;

Knowledge Gate Website Knowledge Gate Website


Select A1, A2,..., An (Column name)
Basic Structure of SQL Queries from r1, r2,... , rm (Relation/table name)
Where P; (Condition)
• For any SQL as query, input and output both are relations. Number of relations inputs to a
query will be at least one, but output will always be a single relation without any name unless 1. It is to be noted only select and from are mandatory clauses, and if not required then it is not essential to write
specified, but columns will have names from input tables. where. If the where clause is omitted, the predicate P is true.

• The basic structure of an SQL query consists of three clauses: select, from, and where. The 2. SQL in general is not case sensitive i.e. it doesn’t matter whether we write query in upper or lower case.
query takes it’s input the relations listed in the from clause, operates on them as specified in
3. In practice, duplicate elimination is time-consuming. Therefore, SQL allows duplicates in relations as well as in
the where and select clauses, and then produces a relation as the result without any name the results of SQL expressions. In those cases where we want to force the elimination of duplicates, we insert
unless specified. the keyword distinct after select, will discuss in detail later. SQL allows us to use the keyword all to specify
explicitly that duplicates are not removed, Since duplicate retention is the default, we shall not use all in our
• A typical SQL query has the form. examples.

Select A1, A2,..., An (Column name)


from r1, r2,... , rm (Relation/table name)
Where P; (Condition)
Knowledge Gate Website Knowledge Gate Website

Select Clause Q Write a SQL query to find all the details of bank branches?
• The function of Select clause in SQL is more or less same as that of ‘∏’ projection in the relational
algebra. It is used to pick the column required in result of the query out of all the columns in Q Write a SQL query to find each loan number along with loan amount?
relation/table. (Vertical filtering)
• Select A1, A2,..., An (Column name) Q Write a SQL query to find the name of all customer without duplication having bank account?
• We can use ‘*’ to specify that we need all columns
• Select * Q Write a SQL query to find all account_no and balance with 6% yearly interest added to it?
• The select clause may also contain arithmetic expressions involving the operators +, -, / and *
operating on constants or attributes of tuples. however, that it does not result in any change to the
relation/table.

Knowledge Gate Website Knowledge Gate Website


Q find all the details of bank branches? Select Clause with where clause
Select *
from branch 1. Where clause in SQL is same as ‘σ’ sigma of relational algebra where we specify the
conditions/Predicate (horizontal filtering).
Q find each loan number along with loan amount? 2. Where clause can have expressions involving the comparison operators <, <, >, >=, <= and <>. SQL
Select loan_number, amount allows us to use the comparison operators to compare strings and arithmetic expressions.
from loan 3. SQL allows the use of the logical connectives and, or, and not in the where clause.
4. SQL includes a between comparison operator to simplify where clauses that specify that a value be
Q find the name of all customer without duplication having bank account? less than or equal to some value and greater than or equal to some other value.
Select distinct customer_name
5. Similarly, we can use the not between comparison operator.
from depositor

Q find all account_no and balance with 6% yearly interest added to it?
Select account_number, balance*1.06
from account

Knowledge Gate Website Knowledge Gate Website

Q Write a SQL query to find all account_no where balance is less the 1000? Q find all account_no where balance is less the 1000?
Select account_number
Q Write a SQL query to find branch name which is situated in Delhi and having assets less than 1,00,000? from account
Where balance < 1000
Q Write a SQL query to find branch name and account_no which has balance greater than equal to 1,000 but less
than equal to 10,000? Q Write a SQL query to find branch name which is situated in Delhi and having assets less than 1,00,000?
Select branch_name
from branch
Where branch_city = ‘delhi’ and assets < 1,00,000

Q Write a SQL query to find branch name and account_no which has balance greater than equal to 1,000 but less
than equal to 10,000?
Select branch_name, account_number
from account
Where balance between 1000 and 10000

Select branch_name, account_number


from account
Where balance >= 1000 and balance<=10000

Knowledge Gate Website Knowledge Gate Website


Set Operation Q Write a SQL query to find all the customer name
a) who have a loan or an account or both ?
1. The SQL operations union, intersect, and except/minus operate on relations and corresponds to
the mathematical set-theory operations ∪, ∩ and – respectively. b) who have both a loan and an account?
2. The union operation automatically eliminates duplicates, unlike the select clause, If we want to
retain all duplicates, we must write union all in place of union. c) who have a loan but do not have an account?

3. The intersect operation automatically eliminates duplicates. If we want to retain all duplicates,
we must write intersect all in place of intersect.

4. If we want to retain duplicates, we must write except all in place of except.

Knowledge Gate Website Knowledge Gate Website

Q Write a SQL query to find all the customer name


a) who have a loan or an account or both ?
Queries on Multiple Relations
Select customer_name • The from clause by itself defines a Cartesian product of the relations listed in the clause.
From depositor Cartesian product of two relations, which concatenates each tuple of the first relation with
Union
Select customer_name
every tuple of the second
From borrower • Since the same attribute name may appear in both r1 and r2, we prefix the name of the
b) who have both a loan and an account? relation from which the attribute originally came, before the attribute name.For those
Select customer_name attributes that appear in only one of the two schemas, we shall usually drop the relation-
From depositor name prefix. This simplification does not lead to any ambiguity.
intersect
• Cartesian Product is commutative in nature R1 X R 2
Select customer_name
From borrower A R1.B R2.B C
1 P Q X
c) who have a loan but do not have an account? R1 R2 1 P R Y
Select customer_name A B B C 1 P S Z
From borrower 2 Q Q X
1 P Q X 2 Q R Y
Except 2 Q R Y 2 Q S Z
Select customer_name 3 R Q X
3 R S Z 3 R R Y
From depositor
Knowledge Gate Website Knowledge Gate Website
3 R S Z
Q Write a RELATIONAL ALGEBRA query to find the name of all the customers along Q find the name of all the customers with account balance, who have an account
with account balance, who have an account in the bank? in the bank?

Select customer_name, balance


From account, depositor
Where account.account_number = depositor.account_number

Knowledge Gate Website Knowledge Gate Website

Natural Join Q Write a SQL query to find the name of all the customers along with account
• To make the life of an SQL programmer easier for this common case, SQL supports an operation called
balance, who have an account in the bank?
the natural join. The natural join operation like cartesian product operates on two relations and
produces a relation as the result.
• Natural join considers only those pairs of tuples with the same value on those attributes that appear in
the schemas of both relations. Notice that we do not repeat those attributes that appear in the
schemas of both relations; rather they appear only once.
• Notice also the order in which the attributes are listed: first the attributes common to the schemas of
both relations, second those attributes unique to the schema of the first relation, and finally, those
attributes unique to the schema of the second relation. Commutative in nature.

R1 R2
A B B C R1 ⋈ R2
1 P Q X A B C
2 Q R Y 2 Q X
3 R S Z 3 R Y
Knowledge Gate Website Knowledge Gate Website
Q Write a SQL query to find the name of all the customers along with account Outer Join
balance, who have an account in the bank? • The problem with natural join or join or inner join is only those values that appears in both
relations will manage to reach final table, but if some value is explicitly in table one or in
Select customer_name, balance second table then that information will be lost, and that will be loss of information.
From account natural join depositor
• The outer join operation works in a manner similar to the join operations we have already
studied, but preserve those tuples that would be lost in a join, by creating tuples in the result
containing null values.
• There are in fact three forms of outer join:
• The left outer join (left join) preserves tuples only in the relation named before (to the
left of) the left outer join operation.
• The right outer join (right join) preserves tuples only in the relation named after (to the
right of) the right outer join operation.
• The full outer join preserves tuples in both relations.

Knowledge Gate Website Knowledge Gate Website

R1 R2 R1 * R2 R1 ⋈ R2
Alias Operation/rename
A B B C A R1.B R2.B C A B C • SQL aliases are used to give a table, or a column in a table, a temporary name. Just create a
1 P Q X 1 P Q X 2 Q X new copy but do not change anything in the data base. An alias only exists for the duration of
2 Q R Y 1 P R Y 3 R Y
3 R S Z 1 P S Z
the query.
2 Q Q X • Aliases are often used to make column names more readable.
2 Q R Y
2 Q S Z • It uses the as clause, taking the form: old-name as new-name. The as clause can appear in
3 R Q X both the select and from clauses.
3 R R Y
3 R S Z

R1 ⟕ R2 R1 ⟖R2 R1 ⟗ R2
A B C A B C A B C
1 P null 2 Q X 1 P null
2 Q X
2 Q X 3 R Y
3 R Y
3 R Y null S Z null S Z

Knowledge Gate Website Knowledge Gate Website


Q Write a SQL query to find the account_no along and balance with 8% interest, as Q Write a SQL query to find the account_no along and balance with 8% interest, as
Account, total_balance? Account, total_balance?

Select account_number, balance*1.06 as total_balance


From account

Knowledge Gate Website Knowledge Gate Website

Q Write a SQL query to find the loan_no with maximum loan amount? Aggregate Functions
Select balance • Aggregate functions are functions that take a collection (a set or multiset) of values as input
From account and return a single value. SQL offers five built-in aggregate functions:
Except • Average: avg
Select A.balance • Minimum: min
From account as A, account as B • Maximum: max
• Total: sum
Where A.balance <B.balance
• Count: count
• The input to sum and avg must be a collection of numbers, but the other operators can
operate on collections of nonnumeric data types, such as strings, as well. Count is the only
aggregate function which can work with null, all other aggregate functions simply ignore null.
• We use the aggregate function count frequently to count the number of tuples in a relation.
The notation for this function in SQL is count (*).

Knowledge Gate Website Knowledge Gate Website


Q find the number of accounts in the bank? Q Consider a table along with two query?
Select count(*)
from account Select avg (balance)
from account
Q find the average balance of every account in the banks from south_delhi branch? Account_no balance Branch_name
Select avg(balance) Abc123 100 N_delhi
from account Pqr123 500 S_mumbai
Where branch_name = ‘south_delhi’ Wyz123 null S_delhi

Select sum(balance)/count(balance)
from account

Knowledge Gate Website Knowledge Gate Website

Ordering the Display of Tuples String Operations


• SQL offers the user some control over the order in which tuples in a relation are displayed. • SQL specifies strings by enclosing them in single quotes, for example, ’Computer’. The SQL
The order by clause causes the tuples in the result of a query to appear in sorted order. standard specifies that the equality operation on strings is case sensitive; as a result the
expression ‘Computer’ = ’computer’ evaluates to false.
Q find all the branch_name which are situated in Delhi in alphabetic order? • However, some database systems, such as MySQL and SQL Server, do not distinguish
Select distinct branch_name uppercase from lowercase when matching strings; as a result, would evaluate to true on
from branch these databases. This default behavior can, however, be changed, either at the database level
where branch_city = 'Delhi’ or at the level of specific attributes.
Order by branch_name aesc;
• SQL also permits a variety of functions on character strings, such as concatenating, extracting
Select distinct branch_name substrings, finding the length of strings, converting strings to uppercase and lowercase,
from branch removing spaces at the end of the string and so on. There are variations on the exact set of
where branch_city = 'Delhi’ string functions supported by different database systems.
Order by branch_name desc;

Knowledge Gate Website Knowledge Gate Website


• Pattern matching can be performed on strings, using the operator like. We Q find all the branch name who have exactly 5 character in their name ?
describe patterns by using two special characters:
• Percent (%): The % character matches any substring.
Q find all the customer name who have ‘kumar’ in their name ?
• Underscore (_): The _ character matches any character.

• ’%Comp%’ matches any string containing “Comp” as a substring, for example,


’Intro to Computer Science’, and ’Computational Biology’.
• ’_ _ _’ matches any string of exactly three characters.
• ’_ _ _%’ matches any string of at least three characters.

Knowledge Gate Website Knowledge Gate Website

Q find all the branch name who have exactly 5 character in their name ? • We define the escape character for a like comparison using the escape keyword. To illustrate,
Select branch_name consider the following patterns, which use a backslash (\) as the escape character:
from branch
where branch_name like ‘_ _ _ _ _ _’ • like ’ab\%cd%’ escape ’\’ matches all strings beginning with “ab%cd”.

Q find all the customer name who have ‘kumar’ in their name ? • like ’ab\\cd%’ escape ’\’ matches all strings beginning with “ab\cd”.
Select customer_name
from customer
where customer_name like ‘%kumar%’

Knowledge Gate Website Knowledge Gate Website


Group by clause Q find the average account balance of each branch?
Select branch_name, avg(balance)
1. There are circumstances where we would like to work on a set of tuples in a relation rather
than working on the whole table as one unit. from account
Group by branch_name
2. The attribute or attributes given in the group by clause are used to form groups. Tuples with
the same value on all attributes in the group by clause are placed in one group.

Knowledge Gate Website Knowledge Gate Website

Q find the branch name of Gwalior city with average balance more than 1500? • Trigger Definition
Select branch.branch_name, avg(balance) • A trigger is a special procedure that automatically activates in response to modifications on
from branch, account a table or view in a database, aiding in maintaining data integrity.
Where branch.branch_name = account.branch_name and branch_city = ‘gwalior’
Group by branch_name • Usage of Triggers
Having avg(balance) > 1500 • Triggers help maintain database integrity by automatically enforcing conditions or
modifying data during database operations. They are crucial for applying business rules,
auditing database alterations, and supporting data replication, ensuring compliance before
any data insertions, updates, or deletions.

Knowledge Gate Website Knowledge Gate Website


• Embedded SQL Relational Calculus
• Embedded SQL is a method where SQL statements are incorporated directly into a
procedural programming language, such as C or Java. It allows the programmers to
• Relational calculus is non-procedural query language, where we have
integrate SQL queries within their code, facilitating the interaction between the to define what to get and not how to get it
database and the application. Embedded SQL statements are static and defined at
compile time.
• Dynamic SQL
Relational
• Dynamic SQL, on the other hand, enables the construction of SQL statements
Calculus
dynamically at runtime. It allows the creation of more flexible and adaptable
applications, where SQL statements can be generated and executed based on
changing conditions or user inputs. This makes it possible to create more complex
and adaptable database operations, though it might be more susceptible to SQL Tuple Domain
injection attacks if not handled carefully. relational relational
calculus calculus

Knowledge Gate Website Knowledge Gate Website

Tuple Relational Calculus Student(Roll No, Name, Branch)


• TRC is based on specifying a number of tuple variables. Each tuple variable usually range over
a particular database relation, meaning that the variable may takes as its value from any
Q Find the details of all computer science students?
individual tuple of a relation.
SQL: select * from student where branch = CSE
• A simple tuple relational calculus query is of the form.
• {t| Condition(t)} RA: {σ branch = CSE (Student)}
• Where t is a tuple variable and condition(t) is a conditional expression involving. The result of TRC: {t | Student(t) ∩ t. branch = CSE}
such a query is the set of all tuple t that satisfy condition (t).
• We use t[A] or t.A to denote the value of tuple t on attribute A. We use t ∈ r or r(t) to denote DRC:
that tuple t is in relation r.

Knowledge Gate Website Knowledge Gate Website


Student(Roll No, Name, Branch) Q Find all the details of loan for amount over 1200?
Q Find the Roll No of all computer science students?
{t| t ∈ loan ∧ t[amount] > 1200] }
SQL: select Roll No from student where branch = CSE
RA: {∏roll no (σ branch = CSE (Student))}
TRC: {t. Roll No | Student(t) ∩ t. branch = CSE}
DRC:

Knowledge Gate Website Knowledge Gate Website

Q Find the loan number for each loan of amount over 1200? Q Find the name of all the customers who have a loan from Noida branch?
{t| ∃s ∈ loan (t[loan number] = s[loan number]) ∧ s[amount] > 1200] } {t| ∃s ∈ borrower (t[customer name] = s[customer name])
∧ ∃u ∈ loan (u[customer name] = s[customer name])
∧ u[branch name] = ‘Noida’ }

Knowledge Gate Website Knowledge Gate Website


Q Find the name of all the customers who have a loan or account or both at the bank? Q Find the name of all the customers who have a loan and account both at the bank?
{t| ∃s ∈ borrower (t[customer name] = s[customer name]) {t| ∃s ∈ borrower (t[customer name] = s[customer name])
V ∃u ∈ depositor (t[customer name] = u[customer name]) } ∧ ∃u ∈ depositor (t[customer name] = u[customer name]) }

Knowledge Gate Website Knowledge Gate Website

Q Find the name of all the customers who have a loan from the bank and do not Expressive Power of Languages
have a account?
• The tuple relational calculus restricted to safe expressions is equivalent in expressive
{t| ∃u ∈ depositor borrower (t[customer name] = u[customer name]) power to the basic relational algebra (with the operators ∪, −, ×, and , but without
the extended relational operations such as generalized projection and aggregation
∧ ¬∃s ∈ borrower (t[customer name] = s[customer name]) } (G)).

• For every relational-algebra expression using only the basic operations, there is an
equivalent expression in the tuple relational calculus, and for every tuple-relational-
calculus expression, there is an equivalent relational algebra expression.

Knowledge Gate Website Knowledge Gate Website


Safety of Expressions Domain Relational Calculus
• A tuple-relational-calculus expression may generate an infinite relation. • Domain calculus differs from the tuple calculus in the type of variables used in
• Example: {t |¬ (t ∈ instructor )} formulas: rather than having variables range over tuples.

• There are infinitely many tuples that are not in instructor. Most of these tuples • The variable range over single values from domains of attributes. To form a relation of
contain values that do not even appear in the database. We do not want to degree n for a query result, we must have n of these domain variables. One for each
allow such expressions. attribute.
• An expression of the domain calculus is of the form
• (x1, x2, …, xn | COND(x1, x2, …, xn, xn+1, xn+2, …, x n+m ))
• Where x1, x2, …, xn are domain variables that range over domains and COND is a
condition of the domain relational calculus.

Knowledge Gate Website Knowledge Gate Website

Student(Roll No, Name, Branch) Student(Roll No, Name, Branch)


Q Find the details of all computer science students?
SQL: select * from student where branch = CSE
Q Find the Roll No of all computer science students?
RA: {σ branch = CSE (Student)} SQL: select Roll No from student where branch = CSE
TRC: {t | Student(t) ∩ t. branch = CSE}
RA: {∏sname (σ branch = CSE (Student))}
DRC: {(Roll No, Name, Branch) | Student (Roll no, Name, Branch) ∩ branch = CSE}
TRC: {t. Roll No | Student(t) ∩ t. branch = CSE}
DRC: {(Roll No) | Student (Roll no, Name, Branch) (Name, Branch) ∩ branch = CSE}

Knowledge Gate Website Knowledge Gate Website


Instructor(instructorID, name, dept name, and salary) Q Find the branch name, loan number and amount for loan of amount over 1200?
Example: Find all details of instructors whose salary is greater than $80,000 { < l, b, a > | < l, b, a > ∈ loan ∧ a > 1200] }

{< i, n, d,s > | < i, n, d,s > ∈ instructor ∧ s > 80000}

Knowledge Gate Website Knowledge Gate Website

Q Find the loan number for each loan of amount over 1200? Q Find the name of all the customers who have a loan from Noida branch with loan amount?
{ < c, a > | ∃l (< c, l > ∈ borrower ∧ ∃b (< l, b, a > ∈ loan ∧ b = ‘Noida’ ))}
{ < l > | ∃b, a < l, b, a > ∈ loan ∧ a > 1200] }

Knowledge Gate Website Knowledge Gate Website


Q Find the name of all the customers who have a loan or account or both at the bank? Safety of Expressions
{ < c > | ∃l (< c, l > ∈ borrower ∧ ∃b, a (< l, b, a > ∈ loan ∧ b = ‘Noida’)) • Similar to TRC we can have an unsafe expression that may generate infinite
V ∃a (< c, a > ∈ depositor ∧ ∃b, a (< a, bn, bal > ∈ account ∧ bn = ‘Noida’))} relations.
• An expression such as
• {< i, n, d,s > | ¬ (< i, n, d,s > ∈ instructor )}
• is unsafe, because it allows values in the result that are not in the domain of the
expression.

Knowledge Gate Website Knowledge Gate Website

Expressive Power of Languages TRANSACTION


T1
• When the domain relational calculus is restricted to safe expressions, it is equivalent in • Why we study transaction?
Read(A)
expressive power to the tuple relational calculus restricted to safe expressions. • According to general computation principle (operating system)
we may have partially executed program, as the level of A = A-100
• All three of the following are equivalent:
atomicity is instruction i.e. either an instruction is executed Write(A)
o The basic relational algebra (without the extended relational-algebra operations)
completely or not
o The tuple relational calculus restricted to safe expressions Read(B)
o The domain relational calculus restricted to safe expressions • But in DBMS view, user perform a logical work(operation)
which is always atomic in nature i.e. either operation is
B = B+100
• Domain relational calculus also does not have any equivalent of the aggregate operation, but execute or not executed, there is no concept like partial Write(B)
it can be extended to support aggregation, and extending it to handle arithmetic expressions execution. For example, Transaction T1 which transfer 100
is straightforward. units from account A to B.

• In this transaction if a failure occurs after Read(B) then the


final statue of the system will be inconsistent as 100 units are
debited from account A but not credited in account B, this will
generate inconsistency. Here for ‘consistency’ before (A + B) ==
Knowledge Gate Website after (A + B)”. Knowledge Gate Website
What is transaction • As here we are only concerned with DBMS so we well only two basic operation
on database.
• To remove this partial execution problem, we increase the level of atomicity and
bundle all the instruction of a logical operation into a unit called transaction. • READ (X) - Accessing the database item x from disk (where database stored data)
to memory variable also name as X.
• So formally ‘A transaction is a Set of logically related instructions to perform a
logical unit of work’. • WRITE (X) - Writing the data item from memory variable X to disk.

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)
Knowledge Gate Website Knowledge Gate Website

Desirable properties of transaction • Transactions should possess several properties, often called the ACID
properties; to provide integrity and consistency of the data in the
• Now as the smallest unit which have atomicity in DBMS view is transaction, so if database. The following are the ACID properties:
want that our data should be consistent then instead of concentrating on data
base, we must concentrate on the transaction for our data to be consistent.

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)

Knowledge Gate Website Knowledge Gate Website


• Atomicity - A transaction is an atomic unit of processing; it should • Consistency - A transaction should be consistency preserving, meaning that if it
either be performed in its entirety or not performed at all. is completely executed from beginning to end without interference from other
transactions, it should take the database from one consistent state to another.

T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)

Knowledge Gate Website Knowledge Gate Website

• Isolation - A transaction should appear as though it is being executed in isolation • Durability - The changes applied to the database by a
from other transactions, even though many transactions are executing
concurrently.
committed transaction must persist in the database.
• That is, the execution of a transaction should not be interfered with by any other
transactions executing concurrently.

Knowledge Gate Website Knowledge Gate Website


Transaction states Transaction states
• ACTIVE - It is the initial state. Transaction remains in this state while it is • PARTIALLY COMMITTED - After the final statement of a transaction has been
executing operations. executed, the state of transaction is partially committed as it is still possible that
it may have to be aborted (due to any failure) since the actual output may still be
temporarily residing in main memory and not to disk.

Knowledge Gate Website Knowledge Gate Website

Transaction states Transaction states


• FAILED - After the discovery that the transaction can no longer proceed (because • ABORTED - A transaction is said to be in aborted state when the when the
of hardware /logical errors). Such a transaction must be rolled back. transaction has been rolled back and the database has been restored to its state
prior to the start of execution.

Knowledge Gate Website Knowledge Gate Website


Transaction states Why we need concurrent execution
• COMMITTED - A transaction enters committed state after successful completion • Concurrent execution is necessary because-
of a transaction and final updation in the database. • It leads to good database performance , less weighting time.
• Overlapping I/O activity with CPU increases throughput and response time.

Knowledge Gate Website Knowledge Gate Website

PROBLEMS DUE TO CONCURRENT EXECUTION OF TRANSACTION Solution is Schedule


• But interleaving of instructions between transactions may also lead to many problems • When two or more transaction executed together or one after another then
that can lead to inconsistent database. they can be bundled up into a higher unit of execution called schedule.

• Sometimes it is possible that even though individual transaction are satisfying the acid
properties even though the final statues of the system will be inconsistent.

Knowledge Gate Website Knowledge Gate Website


• Serial schedule - A serial schedule consists of sequence of instruction belonging to • Non-serial schedule - A schedule in which sequence of instructions of a transaction
different transactions, where instructions belonging to one single transaction appear appear in the same order as they appear in individual transaction but the instructions
together. Before complete execution of one transaction another transaction cannot be may be interleaved with the instructions of different transactions i.e. concurrent
started. Every serial schedule lead database into consistent state. execution of transactions takes place.

Knowledge Gate Website Knowledge Gate Website

• Conclusion of schedules
• We do not have any method to proof that a schedule is consistent, but from the
above discussion we understand that a serial schedule will always be consistent. On the basis of
SERIALIZABILITY
• So if somehow we proof that a non-serial schedule will also have same effects as of
a serial schedule than we can proof that, this particular non-serial schedule will
also be consistent.
Conflict
serializable

View
serializable

Knowledge Gate Website Knowledge Gate Website


T1 T2 T1 T2 T1 T2 T1 T2 Conflict equivalent – if one schedule can be converted to another schedule by swapping
R(A) R(B) R(A) W(A) of non- conflicting instruction then they are called conflict equivalent schedule.
R(B) R(A) W(A) R(A)

T1 T2 T1 T2
T1 T2 T1 T2 T1 T2 T1 T2
R(A) W(B) R(A) W(A) R(A) R(B)
W(B) R(A) W(A) R(A)
A=A-50 B=B+50
R(B) R(A)
T1 T2 T1 T2 T1 T2 T1 T2
B=B+50 A=A-50
R(A) R(A) W(A) W(A)
R(A) R(A) W(A) W(A) R(B) R(B)
B=B+50 B=B+50
R(A) R(A)
A=A+10 A=A+10
Knowledge Gate Website Knowledge Gate Website

SERIALIZABILITY CONFLICT SERIALIZABLE


• The schedules which are conflict equivalent to a serial schedule are called conflict serializable
• Conflicting instructions - Let I and J be two consecutive instructions belonging to two different schedule.
transactions Ti and Tj in a schedule S, the possible I and J instruction can be as-
• I= READ(Q), J=READ(Q) ->Non-conflicting • If a schedule S can be transformed into a schedule S’ by a series of swaps of non- conflicting
• I= READ(Q), J=WRITE(Q) ->Conflicting instructions, we say that S and S’ are conflict equivalent. A schedule S is conflict serializable, if
• I= WRITE(Q), J=READ(Q) ->Conflicting it is conflict equivalent to a serial schedule.
• I= WRITE(Q), J=WRITE(Q) ->Conflicting

• So, the instructions I and J are said to be conflicting, if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.

Knowledge Gate Website Knowledge Gate Website


Q Consider the following schedule for transactions T1, T2 and T3: what is the correct Procedure for determining conflict serializability of a schedule
serialization of the above?
• It can be determined using PRECEDENCE GRAPH method:
• A precedence graph consists of a pair G (V, E)
• V= set of vertices consisting of all the transactions participating in the schedule.
• E= set of edges consists of all edges Ti → Tj, for which one of the following conditions holds:
• Ti executes write(Q) before Tj executes read(Q)
• Ti executes read(Q) before Tj executes write(Q)
• Ti executes write(Q) before Tj executes write(Q)

• If an edge Ti → Tj exists in the precedence graph, then in any serial schedule S’ equivalent to S,
Ti must appear before Tj.
• If the precedence graph for S has no cycle, then schedule S is conflict serializable, else it is
not.

Knowledge Gate Website Knowledge Gate Website

VIEW SERIALIZABLE • Two schedules S and S’ are view equivalent, if they satisfy following conditions –

• If a schedule is not conflict serializable, still it can be consistent, so let us study a weaker form • For each data item Q, if the transaction Ti reads the initial value of Q in schedule S , then then the
of serializability called View serializability, and even if a schedule is view serializable still it can transaction Ti must, in schedule S’ ,also read the initial value of Q.
be consistent. • If a transaction Ti in schedule S reads any data item Q, which is updated by transaction Tj, then a
transaction Ti must in schedule S’ also read data item Q updated by transaction Tj in schedule S’.
• If a schedule is conflict serializable then it will also be view serializable, so we must check view
serializability only if a schedule is not conflict serializable. • For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule
S, then the same transaction must also perform final write(Q) in schedule S’.

Knowledge Gate Website Knowledge Gate Website


View Serializable

A schedule S is view serializable, if it is view equivalent to a serial schedule. On the basis of On the basis of
SERIALIZABILITY RECOVERABILITY

Conflict
Recoverable
serializable

View
Cascadeless
serializable

Strict

Knowledge Gate Website Knowledge Gate Website

NON- RECOVERABLE Vs RECOVERABLE SCHEDULE CASCADING ROLLBACK Vs CASCADELESS SCHEDULE


• A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a data item • It is a phenomenon, in which even if the schedule is recoverable, the a single transaction failure leads
previously written by Ti, then the commit or abort operation of Ti appears before Tj. Such a to a series of transaction rollbacks, is called cascading rollback. Cascading rollback is undesirable, since
schedule is called Non- Recoverable schedule. it leads to undoing of a significant amount of work. Uncommitted reads are not allowed in cascade less
schedule.
• A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a data item
previously written by Ti, then the commit or abort of Ti must appear before Tj. Such a schedule • A schedule in which for each pair of transactions T and T such that if T reads a data item previously
i j, j
is called Recoverable schedule. written by Ti then the commit or abort of Ti must appear before read operation of Tj. Such a schedule
is called cascade less schedule.
S S S S
T1 T2 T1 T2 T1 T2 T3 T1 T2 T3
R(X) R(X) R(X) R(X)
W(X) W(X)
W(X) W(X) R(X) C
R(X) R(X) W(X) R(X)
R(X) W(X)
C C C C
C C C R(X)
C C
Knowledge Gate Website Knowledge Gate Website
Strict Schedule
• A schedule in which for each pair of transactions Ti and Tj, such that if Tj reads a data item
previously written by Ti then the commit or abort of Ti must appear before read and write
operation of Tj.

S1 S2 S3
T1 T2 T1 T2 T1 T2
R(a) R(a) R(a)
W(a) W(a) R(b)
W(a) C W(a)
C W(a) W(b)
R(a) R(a) C
C C R(a)
C

Knowledge Gate Website Knowledge Gate Website

Log based Recovery Aspect Deferred Database Modification Immediate Database Modification

• The system log, or transaction log, records all the changes or activities happening in the Write Operation Timing Only at the commit point. As soon as changes occur.
database, ensuring that transactions are durable and can be recovered in the event of a
failure.
Reduced, as changes are batched
Increased, as each change triggers
• Different types of log records represent various stages or actions in a transaction. and written at once during the
I/O Operations immediate write operations, leading
• <Ti start>: This log entry indicates that the transaction Ti has started its execution. commit, saving on intermediate I/O
to more frequent I/O operations.
• < Ti, Xj, V1, V2>: This log entry documents a write operation. It states that transaction Ti has operations.
changed the value of data item Xj from V1 to V2.
• < Ti commit>: This log entry marks the successful completion of transaction Ti.
• < Ti abort>: This log entry denotes that the transaction Ti has been aborted, either due to
an error or a rollback operation. Simpler, as uncommitted changes More complex, as it may require
are not reflected in the database, undoing changes from uncommitted
Recovery Complexity
• Before any write operation modifies the database, a log record of that operation needs to be making rollback easier in case of transactions that have been written
created. This is to ensure that in case of a failure, the system can restore the database to a failures. to the database.
consistent state using the log records.
Knowledge Gate Website Knowledge Gate Website
Shadow Paging Recovery Technique • Commit: Upon transaction commit, the shadow page table is discarded, and the current page
table becomes the new committed state of the database. The database atomically switches to
In the shadow paging recovery technique, the database maintains two page tables during a using the current page table, ensuring that changes are installed all at once.
transaction: the current page table (reflecting the state before the transaction began) and the
shadow page table (which tracks changes made during the transaction). Here's how it operates: • Recovery: In case of a system failure before the transaction commits, the database can easily
recover by discarding the current page table and reverting back to the shadow page table,
• Initialization: When a transaction begins, the database creates a shadow copy of the page thereby restoring the database to its state before the transaction began.
table. The database pages themselves are not duplicated; only the page table entries are
duplicated.

• Modifications: As the transaction progresses, any changes are reflected in the current page
table, while the shadow page table retains the original state. If a page is modified, a new copy
of the page is created, and the current page table is updated to point to this new version,
thereby ensuring that the shadow page table still points to the original unmodified page.

Knowledge Gate Website Knowledge Gate Website

Data fragmentation Distributed database


• Data fragmentation in the context of a Database Management System (DBMS) refers to the process of • A distributed database is a database in which storage devices are not all attached to a
breaking down a database into smaller, manageable pieces, and distributing these pieces across a common processor. It may be stored in multiple computers, located in the same
network of computers. This is a significant component in distributed database systems.
physical location, or dispersed over a network of interconnected computers.
• Horizontal Fragmentation: In DBMS, horizontal fragmentation divides a table into sections based
on rows. Each fragment contains a subset of rows, usually segmented based on certain conditions
or attributes. This can enhance performance by facilitating parallel processing and quicker data • Data Replication
retrieval, especially useful in distributed database systems. • Advantages: Increased Availability, Improved Performance, Enhanced Reliability
• Vertical Fragmentation: This fragmentation type divides a database table by columns, with • Disadvantages: Storage Costs, Maintenance Complexity, Write Complexity.
different sets of columns stored in separate fragments. This approach is employed when different
users require access to varied data attributes, promoting efficient data handling and minimizing • Data Fragmentation
data transfer times, which is advantageous in scenarios where queries only necessitate a subset of • Advantages: Efficient Data Access, Distributed Processing, Localized Management.
attributes.
• Disadvantages: Complexity, Dependency on Network, Reconstruction Issues.
• Hybrid Fragmentation: Hybrid or mixed fragmentation in DBMS combines both horizontal and
vertical strategies, optimizing data storage and retrieval for complex access patterns. This method
can potentially offer performance improvements by utilizing the merits of both horizontal and
vertical fragmentation, and is typically implemented in databases with complex, multifaceted data
access requirements.
Knowledge Gate Website Knowledge Gate Website
CONCURRENCY CONTROL • Time stamping based method: - Where before entering the system, a specific order is decided
among the transaction, so in case of a clash we can decide which one to allow and which to
Here we will study those protocol which guarantee to generate schedule which stop.
always satisfy desirable properties like conflict serializability. Along with we desire
the following properties from schedule generating protocols • Lock based method: - where we ask a transaction to first lock a data item before using it. So
that no different transaction can use a data at the same time, removing any possibility of
• Concurrency should be as high as possible, as this is our ultimate goal conflict.
because of which we are making all the effort. • 2 phase locking
• The time taken by a transaction should also be less. 1. Basic 2pl
2. Conservative 2pl
• Easy to understand and implement. 3. Rigorous 2pl
4. Strict 2pl
• Graph based protocol

• Validation based protocol – Majority of transactions are read only transactions, the rate of
conflicts among the transaction may be low, thus many of transaction, if executed without the
supervision of a concurrency control scheme, would nerveless leave the system in a consistent
Knowledge Gate Website state. Knowledge Gate Website

TIME STAMP ORDERING PROTOCOL • Time stamp for transaction,

• Basic idea of time stamping is to decide the order between the transaction • With each transaction ti, in the system, we associate a unique fixed timestamp, denoted by
TS(ti).
before they enter in the system using a stamp (time stamp), in case of any
conflict during the execution order can be decided using the time stamp. • This timestamp is assigned by database system to a transaction at time transaction enters
into the system.
• Let’s understand how this protocol works, here we have two idea of
timestamping, one for the transaction, and other for the data item. • If a transaction has been assigned a timestamp TS(ti) and a new transaction tj , enters into
the system with a timestamp TS(tj), then always TS(ti) <TS(tj).

Knowledge Gate Website Knowledge Gate Website


• Two things are to be noted • Time stamp with data item, in order to assure such scheme, the protocol maintains for each
data item Q two timestamp values:
1. First time stamp of a transaction remain fixed throughout the execution 1. W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q)
successfully.
2. Second it is unique means no two transaction can have the same timestamp.
2. R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q)
successfully.

• These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

Knowledge Gate Website Knowledge Gate Website

• Suppose a transaction Ti request a read(Q) • Suppose that transaction Ti issues write(Q).


1. If TS(Ti ) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and
1. If TS(Ti ) < W-timestamp(Q), then Ti needs to read a value of Q that was already the system assumed that that value would never be produced. Hence, the write operation is
overwritten. Hence, the read operation is rejected, and Ti is rolled back. rejected, and Ti is rolled back.

2. If TS(Ti )≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is 2. If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this
set to the maximum of R-timestamp(Q) and TS(Ti). write operation is rejected, and Ti is rolled back.

3. If TS(Ti ) ≥ R-timestamp(Q), then the write operation is executed, and W-timestamp(Q) is set to
max(W-timestamp(Q), TS(Ti )).

4. If TS(Ti ) ≥ W-timestamp(Q), then the write operation is executed, and W-timestamp(Q) is set to
max(W-timestamp(Q), TS(Ti )).

Knowledge Gate Website Knowledge Gate Website


Conflict View Recoverability Cascadelessness Deadlock THOMAS WRITE RULE
Serializability Serializability Freedom

Time Stamp Ordering YES YES NO NO YES • Thomas write is an improvement in time stamping protocol, which makes some modification
and may generate those protocols that are even view serializable, because it allows greater
Thomas Write Rule potential concurrency.
Basic 2PL
• It is a Modified version of the timestamp-ordering protocol in which Blind write operations
Conservative 2PL may be ignored under certain circumstances.
Rigorous 2PL
• The protocol rules for read operations remain unchanged. while for write operation, there is
Strict 2PL slightly change in Thomas write rule than timestamp ordering protocol.

Knowledge Gate Website Knowledge Gate Website

When Ti attempts to write data item Q, • This modification is valid as the any transaction with TS(Ti ) < W-timestamp(Q), the value
written by this transaction will never be read by any other transaction performing Read(Q)
ignoring such obsolete write operation is considerable.
• if TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value
of {Q}. Rather than rolling back Ti as the timestamp ordering protocol would • Thomas' Write Rule allows greater potential concurrency. Allows some view-serializable
have done, this {write} operation can be ignored. schedules that are not conflict serializable.

Knowledge Gate Website Knowledge Gate Website


Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom
Lock Based Protocols
Time Stamp Ordering YES YES NO NO YES • To ensure isolation is to require that data items be accessed in a mutually
exclusive manner i.e. while one transaction is accessing a data item, no other
Thomas Write Rule NO YES NO NO YES transaction can modify that data item. Locking is the most fundamental
Basic 2PL approach to ensure this.
Conservative 2PL
Rigorous 2PL • Lock based protocols ensure this requirement. Idea is first obtain a lock on the
Strict 2PL desired data item then if lock is granted then perform the operation and then
unlock it.

Knowledge Gate Website Knowledge Gate Website

• In general, we support two modes of lock because, to provide better concurrency. Lock –Compatibility Matrix
• Conclusion shared is compatible only with shared while exclusive is not compatible either with
• Shared mode shared or exclusive.
• If transaction Ti has obtained a shared-mode lock (denoted by S) on any data item
• To access a data item, transaction Ti must first lock that item, if the data item is already locked
Q, then Ti can read, but cannot write Q, any other transaction can also acquire a
by another transaction in an incompatible mode, or some other transaction is already waiting
shared mode lock on the same data item(this is the reason we called this shared in non-compatible mode, then concurrency control manager will not grant the lock until all
mode). incompatible locks held by other transactions have been released. The lock is then granted.

• Exclusive mode
• If transaction Ti has obtained an exclusive-mode lock (denoted by X) on any data
item Q, then Ti can both read and write Q, any other transaction cannot acquire
either a shared or exclusive mode lock on the same data item. (this is the reason we
called this exclusive mode)

Knowledge Gate Website Knowledge Gate Website


• Lock based protocol do not ensure serializability as granting and releasing of lock do not follow • If we do not use locking, or if we unlock data items too soon after reading or writing them, we
any order and any transaction any time may go for lock and unlock. Here in the example may get inconsistent states, as there exists a possibility of dirty read. On the other hand, if we
below we can see, that even this transaction in using locking but neither it is conflict do not unlock a data item before requesting a lock on another data item, concurrency will be
serializable nor independent from deadlock. poor.
T1 T2
• We shall require that each transaction in the system follow a set of rules, called a locking
LOCK-X(A)
protocol, indicating when a transaction may lock and unlock each of the data items for e.g. 2pl
READ(A)
WRITE(A) or graph based locking.
UNLOCK(A)
LOCK-S(B) • Locking protocols restrict the number of possible schedules.
READ(B)
UNLOCK(B)
LOCK-X(B)
READ(B)
WRITE(B)
UNLOCK(B)
LOCK-S(A)
READ(A)
UNLOCK(A)
Knowledge Gate Website Knowledge Gate Website

Two phase locking protocol(2PL) • Initially a transaction is in growing phase and acquires lock as needed and in between can
• The protocol ensures that each transaction issue lock and unlock requests in two phases, note perform operation reach to lock point and once a transaction releases a lock, it can issue no
that each transaction will be 2 phased not schedule. more lock requests i.e. it enters the shrinking phase.

• Growing phase- A transaction may obtain locks, but not release any locks.
T1 T2
• Shrinking phase- A transaction may release locks, but may not obtain any new locks. LOCK-X(A)
READ(A)
WRITE(A)
LOCK-S(B)
READ(B)
LOCK-X(B)
READ(B)
WRITE(B)
LOCK-S(A)
READ(A)
UNLOCK(B)
UNLOCK(A)
UNLOCK(B)
Knowledge Gate Website Knowledge Gate Website
UNLOCK(A)
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom Properties
• 2PL ensures conflict serializability, and the ordering of transaction over lock points is itself a
Time Stamp Ordering YES YES NO NO YES serializability order of a schedule in 2PL.
Thomas Write Rule NO YES NO NO YES
• If a schedule is allowed in 2PL protocol then definitely it is always conflict serializable. But it is
Basic 2PL YES YES NO NO NO not necessary that if a schedule is conflict serializable then it will be generated by 2pl.
Conservative 2PL Equivalent serial schedule is based on the order of lock points.

Rigorous 2PL • View serializability is also guaranteed.


Strict 2PL
• Does not ensure freedom from deadlock

• May cause non-recoverability.

• Cascading rollback may occur.

Knowledge Gate Website Knowledge Gate Website

Conservative 2PL Conflict View Recoverability Cascadelessness Deadlock


Serializability Serializability Freedom
• The idea is there is no growing phase transaction start directly from lock point,
i.e. transaction must first acquire all the required locks then only it can start Time Stamp Ordering YES YES NO NO YES
execution. If all the locks are not available then transaction must release the Thomas Write Rule NO YES NO NO YES
acquired locks and must wait. Basic 2PL YES YES NO NO NO
• Shrinking phase will work as usual, and transaction can unlock any data item Conservative 2PL YES YES NO NO YES
anytime. Rigorous 2PL
Strict 2PL
• we must have a knowledge in future to understand what is data required so
that we can use it

Knowledge Gate Website Knowledge Gate Website


RIGOROUS 2PL Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom
• Requires that all locks be held until the transaction commits.
Time Stamp Ordering YES YES NO NO YES
• This protocol requires that locking be two phase and also all the locks taken be
held by transaction until that transaction commit.
Thomas Write Rule NO YES NO NO YES
Basic 2PL YES YES NO NO NO
• Hence there is no shrinking phase in the system. Conservative 2PL YES YES NO NO YES
Rigorous 2PL YES YES YES YES NO
Strict 2PL

Knowledge Gate Website Knowledge Gate Website

STRICT 2PL Conflict View Recoverability Cascadelessness Deadlock


Serializability Serializability Freedom
• that all exclusive-mode locks taken by a transaction be held until that transaction commits.
This requirement ensures that any data written by an uncommitted transaction are locked in
Time Stamp Ordering YES YES NO NO YES
exclusive mode until the transaction commits, preventing any other transaction from reading Thomas Write Rule NO YES NO NO YES
the data.
Basic 2PL YES YES NO NO NO
• This protocol requires that locking be two phase and also that exclusive –mode locks taken by Conservative 2PL YES YES NO NO YES
transaction be held until that transaction commits.
Rigorous 2PL YES YES YES YES NO
• So it is simplified form of rigorous 2pl
Strict 2PL YES YES YES YES NO

Knowledge Gate Website Knowledge Gate Website


Multiple Granularity Multiple Granularity

IS IX S X IS IX S X
IS IS true true true false
IX IX true true false false
S S true false true false
X X false false false false

Knowledge Gate Website Knowledge Gate Website

Validation-Based Protocols • Read phase. During this phase, the system executes transaction Ti. It reads the values of the
various data items and stores them in variables local to Ti. It performs all write operations on
• In cases where a majority of transactions are read-only transactions, the rate of conflicts temporary local variables, without updates of the actual database.
among transactions may be low.
• Validation phase. The validation test (described below) is applied to trans- action Ti. This
• Thus, many of these transactions, if executed without the supervision of a concurrency-control determines whether Ti is allowed to proceed to the write phase without causing a violation of
scheme, would nevertheless leave the system in a consistent state. serializability. If a transaction fails the validation test, the system aborts the transaction.

• A concurrency-control scheme imposes overhead of code execution and possible delay of • Write phase. If the validation test succeeds for transaction Ti, the temporary local variables
transactions. It may be better to use an alternative scheme that imposes less overhead. that hold the results of any write operations performed by Ti are copied to the database.
Read-only transactions omit this phase.
• A difficulty in reducing the overhead is that we do not know in advance which transactions
will be involved in a conflict. To gain that knowledge, we need a scheme for monitoring the
system.

• The validation protocol requires that each transaction Ti executes in two or three different
phases in its lifetime, depending on whether it is a read-only or an update transaction. The
phases are, in order:
Knowledge Gate Website Knowledge Gate Website

You might also like