DBMS Notes
DBMS Notes
www.knowledgegate.ai
www.knowledgegate.ai
Database
• Introduction: “Database Management Systems (DBMS) is a highly scoring and crucial subject for campus placements,
particularly with top recruiters like TCS, Infosys, Wipro, Cognizant, and Accenture. After Data Structures, DBMS requires
relatively less preparation time, featuring straightforward numerical and objective questions. With growing opportunities in
Data Science, AI, and Data Analytics, mastering database fundamentals is essential for a successful IT career.”
• Who Should Take This Course: This database course is specially designed by Sanchit Sir, who has more than 10 years of
experience in the computer science field. It is suitable for all B.Tech & other students, including those from CS and non-CS
branches. Sanchit Sir teaches each topic from the basics, so no previous knowledge is needed. He makes sure to cover every
important topic required for placements and avoids teaching topics that aren’t necessary. As a result, the course is efficient,
complete, sufficient, and focused, helping you save time while preparing effectively for your placement exams.
www.knowledgegate.ai
www.knowledgegate.ai
What is Data
• Data are facts and figures collected through observation. Technically, data is a set of values
(qualitative or quantitative) related to people, objects, or events. A single value from data is
called datum. Any facts or details about something or someone are called data.
• Example:
• Student name, roll number, and exam marks (Rahul, 101, 85 marks).
• Daily temperatures recorded (22°C, 24°C, 25°C).
www.knowledgegate.ai
www.knowledgegate.ai
Q Data typically refers to raw facts and figures that have not yet been processed to reveal any
meaning. Which of the following accurately represents data? (TCS)
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q Information is created from data through a specific process. Which of the following processes
converts data into information? (Infosys)
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q A database is a systematically organized collection of data. Which of the
following best defines a database? (Wipro)
(A) An unstructured file system
(B) A collection of tables without relationships
(C) Interrelated data organized for easy retrieval and management
(D) A single spreadsheet used for analysis
access. www.knowledgegate.ai
What is Data Base Management System?
• A Database Management System (DBMS) is software that helps users create, maintain, and manage databases
efficiently. It provides functionalities like data storage, retrieval, security, and concurrency control.
• Example: MySQL, Oracle, and Microsoft SQL Server are popular DBMS software used by companies to handle
their data securely and efficiently.
www.knowledgegate.ai
www.knowledgegate.ai
Q The primary function of a Database Management System (DBMS) is best described by which of
the following? (Cognizant)
(A) Creating user interfaces for applications
(B) Managing databases and controlling data access effectively
(C) Writing programming code for software applications
(D) Operating system management
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B,Explanation: DBMS effectively manages and controls data access.
Problems With File System
www.knowledgegate.ai
Q Which of the following is a significant disadvantage of the file system compared to a
database system? (Accenture)
(A) High levels of data security
(B) Reduced data redundancy
(C) Increased risk of data inconsistency
(D) Efficient concurrent data access
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q Concurrency in database systems enables multiple users to access data simultaneously
without conflicts. Which real-world scenario illustrates concurrency best? (HCL)
(A) A single user editing a document alone
(B) Multiple customers booking flight tickets simultaneously
(C) Backing up a database at midnight
(D) Encrypting data files periodically
www.knowledgegate.ai
• Hierarchical Database: Definition: Stores data in a tree-like structure with parent-child relationships; each child has only one
parent. Industrial Example: IBM’s Information Management System (IMS). Commercial Use: Telecom and banking sectors use it
for clear parent-child data structures, like customer accounts and their transactions.
• Network Database: Definition: Stores data in a network of linked records, allowing multiple relationships, like a graph
structure. Industrial Example: Integrated Data Store (IDS) by Honeywell/Bull. Commercial Use: Manufacturing and logistics
companies historically used it to manage complex interconnected data, like products and suppliers.
• NoSQL Database: Definition: Handles large amounts of unstructured or semi-structured data, offering flexibility and scalability
without fixed tables. Industrial Example: MongoDB, Cassandra, Redis.
Commercial Use: Social media (like Facebook), e-commerce (like Amazon),
and real-time analytics platforms use NoSQL to manage rapidly changing,
vast data efficiently.
www.knowledgegate.ai
www.knowledgegate.ai
Q MongoDB is categorized under which type of database? (Capgemini)
(A) Relational database
(B) Hierarchical database
(C) Network database
(D) NoSQL database
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Answer: C, Explanation: Physical view shows the actual storage format.
Instance and Schemas
• Instance: An instance of a database is the actual collection of data present at a particular point of time. It’s
essentially a snapshot of the database showing what data currently exists. Example: At the end of a semester,
the database instance might include all current students’ grades, attendance records, and exam results for that
specific semester.
• Schema: A database schema defines the overall logical structure and design of the database. It describes how
data is organized, including tables, relationships, attributes, and constraints, but does not include the actual
data. Example: A student database schema would specify tables like Students (StudentID, Name, Address),
Courses (CourseID, CourseName), and Grades (StudentID, CourseID, Grade), along with their relationships and
constraints, even before entering actual student data.
www.knowledgegate.ai
www.knowledgegate.ai
Q A database instance represents which of the following? (Tech Mahindra)
(A) Logical structure of database
(B) Database snapshot at a specific time
(C) User access control policies
(D) Database backup procedures
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B, Explanation: Instance refers to database state at a moment in time.
Q The schema of a database refers to: (Mindtree)
(A) Data entries at a specific time
(B) Physical storage devices
(C) Structural definition and organization of data
(D) Database management software tools
www.knowledgegate.ai
www.knowledgegate.ai
Answer: C, Explanation: Schema outlines the structure and organization of data.
OLAP Vs OLTP
• In real-world applications, databases handle two very different types of tasks:
• Daily transactions, which must be quick and accurate, like banking or ticket booking.
• Analytical tasks, where historical data is analyzed to make decisions, like sales analysis or trend prediction.
• These two tasks have entirely different requirements. Using a single database for both makes the system slow and inefficient.
Therefore, databases are divided into two specialized categories: OLTP (Online Transaction Processing) for daily operations,
and OLAP (Online Analytical Processing) for analyzing data and making decisions.
Basis of Difference OLTP (Online Transaction Processing) OLAP (Online Analytical Processing)
Used for analyzing large volumes of
Handles daily transactions efficiently
Definition historical data to make strategic
and accurately.
decisions.
Ensuring fast and smooth daily Analyzing data to help businesses make
Main Purpose
operations. informed decisions.
Short, simple queries involving small Long, complex queries involving large
Type of Queries
datasets. historical datasets.
Commercial ATM operations, online shopping Amazon analyzing customer purchases to
Example checkout. suggest new products.
Customers, operational employees
Typical Users Business analysts, executives, managers.
www.knowledgegate.ai (e.g., cashier, bank teller).
www.knowledgegate.ai
Q OLTP systems are primarily designed for handling which type of database operations? (LTI)
(A) Complex analytical queries
(B) Daily operational transactions
(C) Large-scale historical data analysis
(D) Multidimensional data aggregations
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B, Explanation: OLTP focuses on daily operational transactions.
DBA
• Who is a DBA: A Database Administrator is the person responsible for managing, securing, and maintaining databases to
ensure they run smoothly and efficiently.
• Commercial Example: Large companies like TCS, Infosys, Oracle, and Amazon employ
specialized DBAs to handle vast amounts of sensitive data securely and efficiently.
www.knowledgegate.ai
www.knowledgegate.ai
Q What is one of the primary responsibilities of a Database Administrator (DBA)? (Deloitte)
(A) Writing application software
(B) Ensuring database security and integrity
(C) Installing operating systems
(D) Performing hardware maintenance
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B,Explanation: DBA ensures database security and data integrity.
Data Warehousing
• Why Do We Need Data Warehousing: Businesses gather data from various sources, making it challenging to analyze it
efficiently if stored separately. To overcome this, data warehousing emerged in the 1980s–90s, enabling organizations to
collect and analyze historical data from multiple sources at a single place, improving decision-making processes.
• What is a Data Warehouse: A Data Warehouse is a centralized storage system that integrates large volumes of historical data
from different sources, specifically structured to support analysis, reporting, and decision-making.
• Example (Simple & Commercial): A university combines attendance, marks, and student feedback into one database (data
warehouse) to analyse overall student performance. Companies like Amazon and Flipkart use data warehouses to study
customer buying behaviour, making effective product recommendations.
• Importance:
• Helps organizations make informed, data-driven decisions.
• Enables efficient analysis of historical trends, improving business strategies.
www.knowledgegate.ai
www.knowledgegate.ai
Data Mining
• Why Do We Need Data Mining: Companies collect massive amounts of data every day. But simply storing data isn’t enough.
Businesses need ways to discover hidden patterns and valuable insights from this data. Data mining, popular since the 1990s,
helps organizations analyse large datasets effectively to predict future trends and make better decisions.
• What is Data Mining: Data mining is the process of discovering hidden patterns, relationships, and useful information from
large databases to support decision-making and predict future behavior.
• Example (Simple & Commercial): A supermarket notices that customers buying bread also often buy milk, so they place these
items together to increase sales. Banks (like HDFC or ICICI) use data mining to detect suspicious transactions or potential fraud
based on unusual customer activity.
www.knowledgegate.ai
www.knowledgegate.ai
Data Science
• Why Do We Need Data Science: In today’s digital world, massive amounts of data are generated every day. Businesses need
to analyze and use this data effectively to make smarter decisions and predictions. This need gave rise to Data Science, an
interdisciplinary field that emerged prominently in the 2000s, combining statistics, computing, and domain expertise to
extract valuable insights from data.
• What is Data Science: Data Science is the field of extracting meaningful insights from data using scientific methods,
algorithms, and statistical techniques to solve real-world problems.
• Example (Simple & Commercial): Netflix analyzes viewers’ watching history to recommend movies and series that match
individual tastes. Swiggy and Zomato use Data Science to predict delivery times and optimize routes, ensuring faster
deliveries.
• In short:
• Data Warehouse stores data.
• Data Mining finds hidden patterns.
• Data Science combines all these to predict and solve problems.
www.knowledgegate.ai
www.knowledgegate.ai
E-R DIAGRAM/MODEL
• Introduction:
• The Entity-Relationship (E-R) Diagram was introduced by Dr. Peter Chen in 1976. It
provides a simple, visual way to design databases by representing real-world scenarios.
• Benefits:
• Easy to understand as it uses simple diagrams.
• Removes confusion by clearly showing data structure and interactions.
• Suitable even for non-technical users, making database design intuitive and
straightforward.
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q: Entity-Relationship diagrams, widely used in database design, were introduced
in 1976. Who proposed this concept initially? (TCS)
(A) Edgar F. Codd
(B) Dr. Peter Chen
(C) Charles Bachman
(D) James Gosling
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B, Explanation: ER diagrams were introduced by Dr. Peter Chen.
Q: Why are ER diagrams considered beneficial in designing databases, especially when dealing
with non-technical stakeholders? (Infosys)
(A) They are highly technical.
(B) They use programming languages.
(C) They provide an intuitive visual representation of data.
(D) They require extensive technical training.
www.knowledgegate.ai
www.knowledgegate.ai
Answer: C, Explanation: ER diagrams simplify database concepts visually, making them easy to understand.
ENTITY
• An entity is a thing or an object in the real world that is distinguishable from other object based on the values of the
attributes it possesses. An entity may be concrete, such as a person or a book, or it may be abstract, such as a course, a
course offering, or a flight reservation.
• Types of Entity
• Tangible - Entities which physically exist in real world. E.g. - Car, Pen, locker
• Intangible - Entities which exist logically. E.g. – Account, video.
• In ER diagram we cannot represent an entity, as entity is an instant not schema, and ER diagram is designed to
understand schema.
• In a relational model entity is represented by a row or a tuple or a record in a table.
www.knowledgegate.ai
www.knowledgegate.ai
• ENTIY SET- Collection of same type of entities that share the same properties or attributes.
• In an ER diagram an entity set is represented by a rectangle
• In a relational model it is represented by a separate table
www.knowledgegate.ai
www.knowledgegate.ai
Q: An entity within an ER diagram is typically defined as which of the following?
(Wipro)
(A) A specific type of database query
(B) A uniquely identifiable object or concept
(C) An action or event
(D) A property of an object
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B, Explanation: Entities represent identifiable objects or concepts.
Q: What is the correct definition of an entity set in ER diagrams? (Cognizant)
(A) A single attribute
(B) A set of different relationships
(C) A group of similar entities
(D) A complex database operation
www.knowledgegate.ai
www.knowledgegate.ai
Answer: C,Explanation: An entity set is a collection of entities of the same type.
ATTRIBUTES
• Attributes are the units defines and describe properties and characteristics of entities. Attributes are the
descriptive properties possessed by each member of an entity set. for each attribute there is a set of permitted
values called domain.
• In an ER diagram attributes are represented by ellipse or oval connected to rectangle. While in a relational model
they are represented by independent column. e.g. Instructor (ID, name, salary, dept_name)
www.knowledgegate.ai
www.knowledgegate.ai
Types of Attributes
• Single valued- Attributes having single value at any instance of time for an entity. E.g. – Aadhar no, dob.
• Multivalued - Attributes which can have more than one value for an entity at same time. E.g. - Phone no, email,
address.
• A multivalued attribute is represented by a double ellipse in an ER diagram and by an independent table in a
relational model.
• Separate table for each multivalued attribute, by taking mva and pk of main table as fk in new table
www.knowledgegate.ai
www.knowledgegate.ai
Customer
Customer ID First Name Surname Telephone Number
123 Pooja Singh Singh 555-861-2025, 192-122-1111
(555) 403-1659 Ext. 53; 182-929-
456 San Zhang Zhang
2929
789 John Doe Doe 555-808-9633
www.knowledgegate.ai
www.knowledgegate.ai
• Simple - Attributes which cannot be divided further into sub parts. E.g. Age. Composite - Attributes which can
be further divided into sub parts, as simple attributes. A composite attribute is represented by an ellipse
connected to an ellipse and in a relational model by a separate column.
• Stored - Main attributes whose value is permanently stored in database. E.g. date_of_birth. Derived -The value
of these types of attributes can be derived from values of other Attributes. E.g. - Age attribute can be derived
from date_of_birth and Date attribute.
www.knowledgegate.ai
www.knowledgegate.ai
Q: Which type of attribute can hold multiple values for a single entity in ER modeling?
(Accenture)
(A) Simple attribute
(B) Composite attribute
(C) Derived attribute
(D) Multi-valued attribute
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Degree of a relationship/relationship set
• Means number of entities set(relations/tables) associated(participate) in the relationship set. Most of the
relationship sets in a data base system are binary. Occasionally however relationship sets involve more than two
entity sets. Logically, we can associate any number of entity set in a relationship called N-ary Relationship.
www.knowledgegate.ai
www.knowledgegate.ai
Structural constraints (Cardinalities Ratios, Participation)
• An E-R enterprise schema may define certain constraints to which the contents of a database must conform.
Express the number of entities to which another entity can be associated via a relationship set. Four possible
categories are- One to One (1:1) Relationship, One to Many (1: M) Relationship, Many to One (M: 1)
Relationship, Many to Many (M: N) Relationship.
www.knowledgegate.ai
www.knowledgegate.ai
Participation Constraints
• Participation constraint specifies whether the existence of an entity depends on its being related to another entity via the
relationship type. These constraints specify the minimum and maximum number of relationship instances that each entity
must/can participate in.
• Max cardinality – it defines the maximum no of times an entity occurrence participating in a relationship. Min cardinality - it
defines the minimum no of times an entity occurrence participating in a relationship.
www.knowledgegate.ai
www.knowledgegate.ai
Q What is a binary relationship in an ER diagram context? (Capgemini)
(A) A relationship involving exactly one entity
(B) A relationship involving exactly two entities
(C) A relationship involving exactly three entities
(D) A relationship with unlimited entities
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B, Explanation: Binary relationships connect exactly two entities.
Q Which type of participation constraint implies that every entity must participate
in the relationship? (IBM)
(A) Partial participation
(B) Complete participation
(C) Optional participation
(D) Selective participation
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B,Explanation: Complete participation means all entities must participate in the relationship.
STRONG AND WEAK ENTITY SET
• A strong entity set is one that has its own primary key, allowing each entity (tuple) to be uniquely identified. A weak entity set, however,
doesn’t have sufficient attributes for a primary key and thus relies on another strong entity set (identifying or owner entit y set) for unique
identification. Weak entity sets contain discriminator attributes (partial keys), which, combined with the primary key of the identifying entity
set, uniquely identify each tuple.
• In ER diagrams, weak entity sets are shown as double rectangles, and their relationship with identifying entities (identifying relationship) is
represented by double diamonds. This relationship is always many-to-one, with total participation from the weak entity set.
• Weak entity sets are important because they accurately represent entities that logically depend on others. They help maintain database
consistency by automatically deleting dependent entities when the strong entity is deleted, avoiding data duplication and inconsistency
issues.
www.knowledgegate.ai
www.knowledgegate.ai
Q A weak entity set in an ER diagram is defined as: (Tech Mahindra)
(A) An entity set without attributes
(B) An entity set that cannot exist without a strong entity
(C) An entity set with its own independent existence
(D) An entity set always used for temporary storage
www.knowledgegate.ai
www.knowledgegate.ai
Answer: B,Explanation: Weak entities depend on strong entities for their existence.
Q Strong entities in ER diagrams differ from weak entities primarily because strong entities:
(Mindtree)
(A) Require another entity for their existence
(B) Lack attributes entirely
(C) Can independently exist and have a unique identifier
(D) Can never participate in relationships
www.knowledgegate.ai
www.knowledgegate.ai
Conversion From ER Diagram To Relational Model
• Entity Set
• Convert every strong entity set into a separate table.
• Convert every weak entity set into a separate table, by making it dependent into one strong entity set
(identifying or owner entity set).
• Relationship(Unary)
• No separate table is required, add a new column as fk which refer the pk of the same table.
• Relationship(Binary)
• 1:1 No separate table is required, take pk of one side and put it as fk on other side, priority must be given to the side
having total participation.
• Conversion of 1-1 relationship(binary)
• No separate table is required, take pk of one side as pk on other side, priority must be given to the side having total
participation.
• Conversion of 1-n or n-1 relationship (binary)
• No separate table is required, modify n side by taking pk of 1 side a foreign key on n side.
www.knowledgegate.ai
www.knowledgegate.ai
• Relationship(3 or More)
• Take the pk of all participating entity sets as fk and declare their combinations as pk in the new table.
• Multivalued Attributes
• A separate table must be taken for all multivalued attributes, where we take pk of the main table as fk and
declare combination of fk and multivalued attribute are pk in the new table.
• Composite Attributes
• A separate column must be taken for all simple attributes of the composite attribute.
www.knowledgegate.ai
www.knowledgegate.ai
Q: During ER diagram conversion to relational tables, how is a Many-to-Many relationship
typically handled? (LTI)
(A) Ignored completely
(B) Represented by adding extra attributes
(C) Represented by creating an additional junction table
(D) Represented by combining entities into a single table
www.knowledgegate.ai
Answer: C, Explanation: Weak entities are converted into separate tables that include the key
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
• ADVANTAGES OF E-R DIGRAM
• Constructs used in the ER diagram can easily be transformed into relational tables.
• It is simple and easy to understand with minimum training.
www.knowledgegate.ai
www.knowledgegate.ai
RELATIONAL DATABASE MANAGEMENT SYSTEM
• Relational Database Management System (RDBMS) is a database system based on the
relational model, originally introduced by Edgar F. Codd in 1970, who is considered the
father of modern relational database design. Most popular databases today, both
commercial and open-source (like MySQL, Oracle, and Microsoft SQL Server), follow this
relational model. An RDBMS stores data efficiently in tables and uses clearly defined
relationships among these tables to ensure data consistency and integrity.
www.knowledgegate.ai
www.knowledgegate.ai
BASICS OF RDBMS
• Domain (set of permissible value in particular column) is a set of atomic values. By atomic we mean that each
value in the domain is indivisible as far as the formal relational model is concerned. A common method of
specifying a domain is to specify a data type from which the data values forming the domain are drawn.
• E.g. Names: The set of character strings that represent names of persons.
• Table (Relation) - A Relation is a set of tuples/rows/entities/records.
• Tuple - Each row of a Relation/Table is called Tuple.
• Arity/Degree - No. of columns/attributes of a Relation. E.g. - Arity is 5 in Table Student.
• Cardinality - No of rows/tuples/record of a Relational instance. E.g. - Cardinality is 4 in table Student.
Rows/Tuples/Record/
NAME ID CITY COUNTRY HOBBY
Domain/ Cardinality
Field/ NISHA 1 AGRA INDIA PLAYING
Column/ NIKITA 2 DELHI INDIA DANCING
Arity/ AJAY 3 AGRA INDIA CHESS
Degree ARPIT 4 PATNA INDIA READING
www.knowledgegate.ai
www.knowledgegate.ai
Q) In a relational schema, each tuple is divided into fields called[TCS NINJA].
(a) Relations
(b) Domains
(c) Queries
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Properties of Relational tables
1. Cells contains atomic values
2. Values in a column are of the same kind
3. Each row is unique
4. No two tables can have the same name in a relational schema.
5. Each column has a unique name
6. The sequence of rows is insignificant
7. The sequence of columns is insignificant.
www.knowledgegate.ai
www.knowledgegate.ai
Problems in relational database
• Update Anomalies- Anomalies that cause redundant work to be done during insertion into and Modification of
a relation and that may cause accidental loss of information during a deletion from a relation
• Insertion anomalies: An independent piece of information cannot be recorded into a relation unless an
irrelevant information must be inserted together at the same time.
• Modification anomalies: The update of a piece of information must occur at multiple locations.
• Deletion Anomalies: The deletion of a piece of information unintentionally removes other information.
Roll no name Age Br_code Br_name Br_hod_name
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr
www.knowledgegate.ai
www.knowledgegate.ai
Roll no name Age Br_code Br_name Br_hod_name
1 A 19 101 Cs Abc
2 B 18 101 Cs Abc
3 C 20 101 Cs Abc
4 D 20 102 Ec Pqr
PK FK PK
www.knowledgegate.ai
www.knowledgegate.ai
Purpose of Normalization
• Normalization is a systematic process of organizing database tables clearly and efficiently. Its main goal is to remove
redundancy (duplicate data) and inconsistent dependencies, ensuring data accuracy and improving database performance.
• Without normalization, databases become slow, inefficient, and can produce incorrect results due to repeated or conflicting
data. Normalization solves this by ensuring each table stores data about only one idea or concept. If a table contains multiple
ideas, it is decomposed into smaller, clearly-focused tables.
• To perform normalization effectively, especially in larger databases, we rely on a powerful concept called Functional
Dependencies. These dependencies help identify how data items relate, guiding us in properly structuring tables for clarity,
accuracy, and efficiency.
www.knowledgegate.ai
www.knowledgegate.ai
FUNCTIONAL DEPENDENCY
• A formal tool for analysis of relational schemas. X Y Z
• In a Relation R, if ‘α’ ⊑ R AND ‘β’ ⊑ R, then attribute or a Set of attribute ‘α’ Functionally 1 4 2
derives an attribute or set of attributes ‘β’, iff each ‘α’ value is associated with precisely
one ‘β’ value. 1 4 3
• For all pairs of tuples t1 and t2 in R such that 2 6 3
• If T1[α] = T2[α]
• Then, T1[β] = T2[β] 3 2 2
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following relation instance, which of the following
dependency doesn’t hold
A) A → b A B C
1 2 3
B) BC → A
4 2 3
C) B → C 5 3 3
D) AC → B
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following dependency doesn’t hold good?
A) A → BC A B C D E
a 2 3 4 5
B) DE → C 2 a 3 4 5
a 2 3 6 5
C) C → DE a 2 3 6 6
D) BC → A
www.knowledgegate.ai
www.knowledgegate.ai
• Trivial Functional dependency - If β is a subset of α, then the functional dependency α → β
will always hold.
www.knowledgegate.ai
www.knowledgegate.ai
Q) A functional dependency of the form x —> y is trivial if. [Amcat].
(A) y ⊆ x
(B) y ⊂ x
(C) x ⊂ y
(D) x ⊂y and y⊂ x
www.knowledgegate.ai
Ans=A. www.knowledgegate.ai
ATTRIBUTES CLOSURE/CLOSURE ON ATTRIBUTE SET/ CLOSURE SET OF ATTRIBUTES
• Attribute closure (also known as closure of attribute set) is the set of all attributes that can be functionally
determined from a given set of attributes using the provided functional dependencies (FDs). It includes
attributes that can be directly determined by the given dependencies, as well as those that can be logically
derived. If we have an attribute set “X,” its attribute closure is represented as X⁺.
www.knowledgegate.ai
www.knowledgegate.ai
ARMSTRONG’S AXIOMS
• Armstrong’s Axioms are a set of fundamental rules or principles used to find all possible
functional dependencies in a relational database. Introduced by William W. Armstrong in
1974, these axioms act as basic inference rules, helping us logically determine new
dependencies from existing ones. Armstrong’s Axioms are considered “sound,” meaning they
correctly generate only valid functional dependencies within the closure (denoted as F⁺) of a
given set of dependencies (denoted as F).
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following is not Armstrong’s Axiom? [Asked in Accenture]
a) Reflexivity rule
b) Transitivity rule
c) Pseudotransitivity rule
d) Augmentation rule
www.knowledgegate.ai
Ans=C www.knowledgegate.ai
Q Consider a relation R(A, B, C), which of the following statements is not true
according to inference rules for functional dependencies? [Asked in TCS NQT]
a) If A → B and B → C, then A → C
b) If AB → C, then A → B and B → C
c) If A → B and A → C, then A → BC
d) If A → B, then AC → BC
www.knowledgegate.ai
Ans: B www.knowledgegate.ai
Key
• For identifying the uniqueness of a tuple, we take help of an attribute or set of attributes. Uniqueness of tuple
is needed as a Relation is a Set and Set disallows duplicity of elements. Various Keys used in database System
are as follows-
www.knowledgegate.ai
www.knowledgegate.ai
Super key
• A Super Key is a set of one or more attributes in a relation that uniquely identifies each tuple (row) within that
relation. In other words, no two tuples can have the same values for these attributes.
• Formally, if we have a set of attributes X in a relation R, and if the attribute closure of X (denoted as X⁺) includes
all the attributes of R, then X is called a Super Key.
• Every relation must have at least one super key that doesn’t contain null values. For a relation with ‘n’ attributes,
the maximum number of possible super keys is 2^n - 1. The largest super key possible includes all attributes of
the relation.
• A super key can contain attributes that individually allow null values, but to uniquely identify each tuple, at least
one attribute within the super key must always have a NOT NULL constraint. In simpler terms, a super key cannot
be entirely null because, if it were, it wouldn’t be able to uniquely identify records. At least one attribute within a
super key must always have valid (non-null) data to maintain uniqueness and integrity.
www.knowledgegate.ai
www.knowledgegate.ai
Q. Consider a relation R(A,B,C,D,E) with the following functional dependencies:
ABC → DE and
D → AB
The number of super keys of R is: [Hexaware]
(A) 2
(B) 7
(C) 10
(D) 12
www.knowledgegate.ai
Answer : C www.knowledgegate.ai
• Candidate key: A candidate key is the smallest set of attributes required to uniquely identify every tuple (row)
within a relation. It is also known as a minimal super key, meaning no subset of a candidate key can act as a
super key. Every relation must have at least one candidate key with a NOT NULL constraint to maintain data
integrity. Attributes that form part of any candidate key are known as prime attributes.
• Primary key: A primary key is one of the candidate keys selected by the database administrator to uniquely
identify tuples in a relation. Attributes chosen as the primary key cannot contain NULL values, ensuring each
row remains distinct and clearly identifiable. Each table can have at most one primary key. Any candidate keys
not selected as the primary key are called alternate keys.
• Composite key – Composite key is a key composed of more than one column sometimes it is also known as
concatenated key.
• Secondary key – Secondary key is a key used to speed up the search and retrieval contrary to primary key, a
secondary key does not necessary contain unique values.
www.knowledgegate.ai
www.knowledgegate.ai
Q) The subset of super key is a candidate key under what condition? [Amcat].
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following options are correct regarding these three keys (Primary Key, Super Key,
and Candidate Key) in a database? [Asked in E-Litmus]
I. Minimal super key is a candidate key
II. Only one candidate key can be a primary key
III. All super keys can be a candidate key
IV. We cannot find a primary key from the candidate key
a) I and II
b) II and III
c) I and III
d) II and IV
www.knowledgegate.ai
www.knowledgegate.ai
Q Class (course id, title, dept name, credits, sec id, semester, YEAR, building, room NUMBER, capacity, TIME slot id)
The SET OF functional dependencies that we require TO hold ON class are:
course id->title, dept name, credits
building, room number->capacity
course id, sec id, semester, year->building, room NUMBER, TIME slot id
A candidate KEY FOR this schema IS {course id, sec id, semester, YEAR}
Consider the above conditions. Which of the following relation holds? [Asked in TCS NQT]
a) Course id-> title, dept name, credits
d) Cannot be determined
www.knowledgegate.ai
www.knowledgegate.ai
Q) A __________ is a property of the entire relation, rather than of the individual
tuples in which each tuple is unique. [Amcat]
(a) Rows
(b) Key
(c) Attribute
(d) Fields
www.knowledgegate.ai
www.knowledgegate.ai
Foreign Keys
• A foreign key is a column or group of columns in a relational database table that refers the primary key of the
same table or some other table to represent relationship.The concept of referential integrity is derived from
foreign key theory.
www.knowledgegate.ai
www.knowledgegate.ai
Q) An attribute in a relation is a foreign key if the __________ key from
one relation is used as an attribute in that relation. [Amcat].
(a) Candidate
(b) Primary
(c) Super
(d) Sub
www.knowledgegate.ai
www.knowledgegate.ai
Q) A __________ integrity constraint, requires that the values appearing in
specified attributes of any tuple in the referencing relation also appear in specified
attributes of at least one tuple in the referenced relation.[Amcat].
(a) Referential
(b) Referencing
(c) Specific
(d) Primary
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following statements S1 and S2 about the relational data model:
● S1: A relation scheme can have at most one foreign key.
● S2: A foreign key in a relation scheme R cannot be used to refer to tuples of R.
Which one of the following choices is correct?[Asked in Hexaware]
www.knowledgegate.ai
www.knowledgegate.ai
Q) Which is correct difference between primary key and foreign key.
[Tech Mahindra]
(a) A table can have multiple primary key and single foreign key.
(b) A primary key cannot ignore NULL value but foreign key can.
(c) A primary key can have duplicate value but foreign does not.
(d) None of the above.
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following table consisting of two attributes A and B, where ‘B’ is the A B
foreign key referring the candidate key ‘A’ with on – delete cascade option. When we 8 9
delete the tuple (3 2), we need to delete few tuples additionally in order to preserve the 4 6
referential integrity. The number of tuples that are remaining in the table when we 7 6
delete (3 2) and additional tuples if necessary is ______ 3 2
6 5
5 1
1 2
2 3
www.knowledgegate.ai
www.knowledgegate.ai
Q) A combination of two or more columns used to identify particular rows in a
relation is _____ . [Amcat]
(a) Record
(b) Field
www.knowledgegate.ai
www.knowledgegate.ai
Q) Consider the tables store and sales given below.[infosys]
Store_id City Region Productid Desc Store_id
Which is the best primary key for the sales table from the following?
(A) desc
(B) {productid,store_id}
(C) productid
(D) store_id
www.knowledgegate.ai
Answer : C www.knowledgegate.ai
Q R(ABCD)
A→B A B C D
B→C
C→A
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCDEFGHIJ)
AB → C A B C D E F G H I J
A → DE
B→F
F → GH
D → IJ
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCDE)
A→B A B C D E
BC → E
DE → A
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCD)
AB → CD A B C D
C→A
D→B
www.knowledgegate.ai
www.knowledgegate.ai
Q R(WXYZ)
Z→W W X Y Z
Y → XZ
XW → Y
www.knowledgegate.ai
www.knowledgegate.ai
Q Suppose relation R(A,B,C,D,E) has the following functional dependencies: [Asked
in Tech Mahindra]
A→B A B C D E
B→C
BC → A
A→D
E→A
D→E
www.knowledgegate.ai
Q Given a relation R(A,B,C,D,E,F) and set of functional dependency (FD)
F = {A →BC, C → E, E → F, F → AB,}. How many candidate keys does the relation R
have?
a) 1
A B C D E F
b) 3
c) 4
d) 5
www.knowledgegate.ai
www.knowledgegate.ai
• Normalization (or decomposition of relations) is a structured process used in database design to minimize
redundancy and avoid data inconsistencies. It involves breaking down larger tables into smaller, clearly focused
tables while preserving all essential relationships and information.
• To perform normalization, we primarily use tools such as functional dependencies and candidate keys. Functional
dependencies help us analyze the relationships among attributes, guiding us through the normalization steps.
Using functional dependencies, we can effectively normalize a database schema up to Boyce-Codd Normal Form
(BCNF).
• Normalization proceeds through a series of standard forms, each improving database efficiency and clarity:
• First Normal Form (1NF) → Second Normal Form (2NF) → Third Normal Form (3NF) → Boyce-Codd Normal
Form (BCNF).
www.knowledgegate.ai
www.knowledgegate.ai
FIRST NORMAL FORM
• A table (relation) is said to be in First Normal Form (1NF) if every cell contains only a single (atomic) value. In
other words, attributes should not hold multiple values or composite (combined) values within one cell.
• To satisfy 1NF:
• Each row must be unique; no two rows should be identical.
• The table must have a primary key to uniquely identify each row.
• Each column (attribute) should have a distinct and meaningful name.
• The order of rows and columns does not matter in a relational table.
www.knowledgegate.ai
www.knowledgegate.ai
Q Relations produced from E - R Model will always be in ________. [Asked in Tech
Mahindra]
(a) 1 NF
(b) 2 NF
(c) 3 NF
(d) 4 NF
www.knowledgegate.ai
www.knowledgegate.ai
• Prime attribute: - A attribute is said to be prime if it is part of any of the candidate key
• Non-Prime attribute: - A attribute is said to be non-prime if it is not part of any of the candidate key
• R(ABCD)
• AB→CD
• Here candidate key is AB so, A and B are prime attribute, C and D are non-prime attributes.
• PARTIAL DEPENDENCY- When a non – prime attribute is dependent only on a part (Proper subset) of
candidate key then it is called partial dependency. (PRIME > NON-PRIME)
• TOTAL DEPENDENCY- When a non – prime attribute is dependent on the entire candidate key then it is
called total dependency.
• e.g. R(ABCD) AB→D, A→C
www.knowledgegate.ai
www.knowledgegate.ai
SECOND NORMAL FORM
• Relation R is in 2NF if,
• R should be in 1 NF.
• R should not contain any Partial dependency. (that is every non-prime attribute should be fully dependent
upon candidate key)
R(A, B, C) B→C
A B C A B B C
a 1 X A 1 1 X
b 2 Y B 2 2 Y
a 3 Z A 3 3 z
C 3 Z C 3
D 3 Z D 3
E 3 Z E 3
www.knowledgegate.ai
www.knowledgegate.ai
TRANSITIVE DEPENDENCY – A functional dependency from non-Prime attribute to non-Prime
attribute is called transitive
www.knowledgegate.ai
www.knowledgegate.ai
THIRD NORMAL FORM
• Let R be the relational schema, it is said to be in 3 NF
• R should be in 2NF
• It must not contain any transitive dependency
• A relational schema R is said to be 3 NF if every functional dependency in R from α-->β, either α is super key or β
is the prime attribute
A B C A B B C
A 1 P A 1 1 P
B 2 Q B 2 2 Q
C 2 Q C 2 3 R
D 2 Q D 2 4 S
E 3 R E 3
F 3 R F 3
G 4 S G 4
www.knowledgegate.ai
www.knowledgegate.ai
Q Which normal form is considered as adequate for usual database
design?(Infosys)
(A) 2NF
(B) 3NF
(C) 4NF
(D) 5NF
www.knowledgegate.ai
www.knowledgegate.ai
Q Third normal form is based on the concept of ______. (TCS)
www.knowledgegate.ai
www.knowledgegate.ai
R(A, B, C) A B C A B C B
AB → C A C B A B B C
B B C B B C B
C→B B A D B A D A
A A E A A E A
C C B C C
D C B D C
E C B e C
F C B f c
www.knowledgegate.ai
www.knowledgegate.ai
BCNF (BOYCE CODD NORMAL FORM)
A relational schema R is said to be BCNF if every functional dependency in R from
Α→β
α must be a super key
E.g.- R (A, B, C, D)
{
AB->C
C->D
D->A
} Candidate key= {AB}, {DB}, {CB}
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABC) (AB, BC)
AB → C A B C
C→A
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCD)(AB)
AB → C A B C D
B→D
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCDE)(CE)
CE → D A B C D E
D→B
C→A
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCDE)(AE)
A→B A B C D
B→E
C→D
www.knowledgegate.ai
www.knowledgegate.ai
Q A relation is in ____________ if an attribute of a composite key is dependent on
an attribute of other composite key. [Asked in Amcat]
a) 2NF
b) 3NF
c) BCNF
d) 1NF
www.knowledgegate.ai
www.knowledgegate.ai
Q In 2NF [Asked in Tech Mahindra]
www.knowledgegate.ai
www.knowledgegate.ai
Q. Amit has some knowledge about database normalization. He has created a table
"customer”, which has the following characteristics.
1) Table has transitive dependencies.
2) There are no partial dependencies in the table.
3) There is no column with redundant data in the table In which normal form is the
table.[Amcat].
(a) 1NF
(b) 2NF
(c) 3NF
(d) BCNF
www.knowledgegate.ai
www.knowledgegate.ai
Q) Third normal form is inadequate in situations where the relation :
[Amcat].
www.knowledgegate.ai
www.knowledgegate.ai
Q) Consider the relation given below and find the maximum normal form applicable to them :-
R(A, B) with productions { A --> B }
R(A, B) with productions { B --> A }
R(A, B) with productions {A —> B, B --> A }
R(A, B, C) with productions {A -->B, B --> A, AB --> C } [Amcat]
www.knowledgegate.ai
Q) What is normalization? [Hexaware].
(1). Minimizing redundancy.
(2). Minimizing insertion, deletion and update anomalies.
(a)1
(c) 2
www.knowledgegate.ai
www.knowledgegate.ai
Q. A table has fields F1, F2, F3, F4, and F5, with the following functional dependencies:
F1->F3
F2->F4
(F1,F2)->F5
in terms of normalization, this table is [Asked in Wipro NLTH]
(A) 1NF
(B) 2NF
(C) 3NF
www.knowledgegate.ai
Answer : A www.knowledgegate.ai
Q. Let R(A,B,C,D,E,P,G) be a relational schema in which the following FDs are known to hold:
AB->CD
DE->P
C->E
P->C
B->G
The relation schema R is [Asked in Amcat 2020]
(A) in BCNF
www.knowledgegate.ai
Answer : D www.knowledgegate.ai
Q. In which normal form an attribute cant hold multiple values : (INFYTQ)
(A) 1NF
(B) 2NF
(C) 3NF
(D) BCNF
www.knowledgegate.ai
Answer : A www.knowledgegate.ai
Q. A relation in which every non-key attribute is fully functionally dependent on
the primary key and which has no transitive dependencies, is said to be in
_____ [Asked in InfyTQ]
(A) BCNF
(B) 1NF
(C) 2NF
(D) 3NF
www.knowledgegate.ai
Answer : B www.knowledgegate.ai
Q. Consider a relation R(A, B, C), which of the following statements is not true
according to inference rules for functional dependencies? [Asked in TCS
NQT 2021]
(D) If A → B, then AC → BC
www.knowledgegate.ai
Answer : B www.knowledgegate.ai
Q. Tables in second normal form (2NF): [Asked in L&T InfoTech (LTI) 2021]
(D) Have all non key fields depend on the whole primary key
www.knowledgegate.ai
Answer : A www.knowledgegate.ai
2 NF DECOMPOSTION
Q R(ABCDEFGHIJ)
A B C D E F G H I J
AB → C
AD → GH
BD → EF
A→I
H→J
www.knowledgegate.ai
www.knowledgegate.ai
BCNF DECOMPOSITION
Q R(ABCD)
A B C D
AB → C
BC → D
www.knowledgegate.ai
www.knowledgegate.ai
Q) Consider the schema R(S,T,U,V) and the dependencies S→T, T→U, U→V, V→S.
Let R= {R1,R2} such that R1∩R2=Φ. Then the decomposition is :[Amcat].
www.knowledgegate.ai
Lossy/Lossless-Dependency Preserving Decomposition
• When a table is decomposed into multiple tables during normalization, it’s essential to ensure the decomposition is done
properly. The most important property to check is called the lossless join property (also called non-additive join
decomposition).
• In simple terms, after decomposition, if we later perform a natural join of these smaller tables, we should get exactly the
original table back, without gaining or losing any tuples (rows). If after joining, we get additional or fewer tuples than the
original, then the decomposition is called lossy.
www.knowledgegate.ai
www.knowledgegate.ai
• Lossy decomposition occurs if:
• R1 ⋈ R2 produces more tuples (extra) than R, or fewer tuples (missing) than R.
• (Formally, if R1 ⋈ R2 ⊃ R or R ⊃ R1 ⋈ R2)
• Lossless decomposition occurs if:
• R1 ⋈ R2 = R (exactly same tuples, no more, no less)
• To verify lossless decomposition using functional dependencies, these conditions must hold true:
• The union of attributes in decomposed relations R1 and R2 should exactly cover all attributes of the original relation R.
• Att(R1) ∪ Att(R2) = Att(R)
• Intersection (common attributes) between R1 and R2 should not be empty.
• Att(R1) ∩ Att(R2) ≠ Φ
• The common attribute(s) between R1 and R2 must form a candidate key (or at least a key) for at least one of these relations.
• Att(R1) ∩ Att(R2) → Att(R1) or Att(R1) ∩ Att(R2) → Att(R2)
• These conditions guarantee the decomposition is lossless and preserve the correctness of database information.
www.knowledgegate.ai
www.knowledgegate.ai
A B C D E
A 122 1 W A
E 236 4 X B
A 199 1 Y C
B 213 2 Z D
www.knowledgegate.ai
www.knowledgegate.ai
Q R (A, B, C, D)
AB→CD, D→A
www.knowledgegate.ai
www.knowledgegate.ai
Q R(ABCDE)(NF) R1(AB) R2(BC) R3(ABCD) R4(EG)
A → BC
C → DE
D→E
www.knowledgegate.ai
www.knowledgegate.ai
Q R (A,B,C,D) is a relation. Which of the following does not have a lossless join
dependency preserving BCNF decomposition? [Asked in CoCubes]
a) A->B, B->CD
c) AB->C, C->AD
d) A->BCD
www.knowledgegate.ai
Answer : D www.knowledgegate.ai
Q Inst_dept (ID, name, salary, dept name, building, budget) is decomposed into
instructor (ID, name, dept name, salary)
department (dept name, building, budget)
This comes under [Asked in L&T Infotech (LTI)]
a) Lossy decomposition
b) Lossless-join decomposition
c) None
d) Both
www.knowledgegate.ai
Answer : B www.knowledgegate.ai
Indexing (Need for Indexing)
• Theoretically, relational databases are based on set theory, where the order of elements doesn’t matter. However, practically,
when we store data, the order in which records are stored significantly impacts the efficiency of database operations such as
searching, insertion, and deletion.
• Records in large files can be stored either in an ordered (sorted) or unordered (unsorted) manner. If records are stored
unordered, new records are typically inserted at the end, making insertions and deletions easy, but searches become slow, as
only linear search is possible. On the other hand, ordered storage of records enables faster search methods, such as binary
search—similar to quickly locating a page number in a book—but makes insertions and deletions costly due to necessary
reorganization.
• As databases grow larger, occupying numerous blocks on storage, accessing records becomes slow and inefficient if we
depend solely on the direct storage method (ordered or unordered). To address these issues, indexing is introduced.
www.knowledgegate.ai
www.knowledgegate.ai
• Indexing creates additional data structures (indexes) based on specific key attributes, allowing rapid data retrieval
without scanning the entire database. This significantly enhances the speed and efficiency of database
operations, particularly searches, while balancing the costs of insertion and deletion. Indexing in database
systems is similar to what we see in books.
www.knowledgegate.ai
www.knowledgegate.ai
• An index provides a secondary access path, giving us an alternate, faster way to access records without affecting how records
are physically stored in the main file.
• Typically, the size of an index file is much smaller compared to the main file because the index file stores only two pieces of
information: the search key (the attribute used for searching) and the block pointer (the address of the block in the main file
containing that record). In contrast, the main file stores complete records with all attributes.
• Indexes can be created on any attribute of the table, including both primary keys and non-key attributes. Additionally, multiple
indexes can be designed for the same main file, each index based on a different attribute.
• Index files are always stored in sorted order (ordered), regardless of whether the main file is ordered or unordered. This
ordered structure allows efficient searching methods, such as binary search.
• Indexing significantly reduces search time, making queries faster, but introduces extra storage overhead due to the additional
space required by the index files.
• The number of accesses needed to search and locate the correct block in the main file using an index is approximately
log₂(number of blocks in the index file) + 1.
www.knowledgegate.ai
www.knowledgegate.ai
Q Data which improves the performance and accessibility of the
database are called:
(a) Indexes
www.knowledgegate.ai
TYPES OF INDEXING
Index
Single
Multilevel
level
• Single level index means we create index file for the main file,
and then stop the process.
• Multiple level index means, we further index the index file and
keep repeating the process until we get one block.
www.knowledgegate.ai
www.knowledgegate.ai
PRIMARY INDEXING
• Main file is always sorted according to primary key.
• Indexing is done on Primary Key, therefore called as primary indexing
• Index file have two columns, first primary key and second anchor pointer (base address of block)
• It is an example of Sparse Indexing.
• Here first record (anchor record) of every block gets an entry in the index file
• No. of entries in the index file = No of blocks acquired by the main file.
www.knowledgegate.ai
www.knowledgegate.ai
CLUSTERED INDEXING
• Main file will be ordered on some non-key attributes
• No of entries in the index file = no of unique values of the attribute on which indexing is done.
• It is the example of Sparse as well as dense indexing
www.knowledgegate.ai
www.knowledgegate.ai
Q A clustering index is created when_______.
(A) primary key is declared and ordered
www.knowledgegate.ai
www.knowledgegate.ai
Q An index is clustered, if
(a) it is on a set of fields that form a candidate key
(b) it is on a set of fields that include the primary key
(c) the data records of the file are organized in the same order as the data entries of the
index
(d) the data records of the file are organized not in the same order as the data entries of
the index
www.knowledgegate.ai
www.knowledgegate.ai
Q A clustering index is defined on the fields which are of type (GATE-2008) (1
Marks)
a) non-key and ordering
www.knowledgegate.ai
www.knowledgegate.ai
SECONDARY INDEXING
• Secondary indexing is used when we already have a primary index based on a primary key, but there are frequent queries
involving other attributes as well. To speed up these queries, we create additional indexes called secondary indexes.
• Unlike primary indexing, the main file does not need to be ordered based on the attribute used for secondary indexing—it
can be unordered. Secondary indexes can be created on both key and non-key attributes.
• In secondary indexing, the number of entries in the index file typically matches the total number of records in the main file.
Thus, secondary indexing is usually an example of dense indexing, having an entry for every single record in the main file.
www.knowledgegate.ai
www.knowledgegate.ai
• Indexing can be classified based on various criteria. One important criterion is how frequently index entries are created,
leading to two types: Dense Index and Sparse Index. Dense and sparse indexes are not mutually exclusive or complementary.
• A Dense Index contains an index entry for every single search key value present in the main file. This means each distinct
value used for searching is listed in the index file. Dense indexing typically allows very quick data retrieval because each
search key is explicitly mapped, but it requires more storage space due to a higher number of entries. It’s important to
understand that a dense index has an entry for each distinct search key value, not necessarily each individual record.
Therefore, if a search key is repeated, the dense index might still contain fewer entries than the total number of records.
• In contrast, a Sparse Index contains index entries only for some selected records, usually at regular intervals. As a result,
the sparse index file always has fewer entries than the number of records in the main file, which means it takes up less
space. Searching with a sparse index can be slightly slower compared to dense indexing since some additional steps
might be needed to locate a specific record.
• It’s worth noting clearly that Dense and Sparse Indexing approaches are not necessarily complementary or mutually
exclusive. Depending on the search key used and practical requirements, both indexing methods might coexist within the
same database system. Sparse indexing is more suitable for ordered files, where it’s easy to locate records between two
index entries, whereas dense indexing works effectively with both ordered and unordered files.
www.knowledgegate.ai
www.knowledgegate.ai
MULTILEVEL INDEXING
• Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost
level so small that it can be saved in a single disk block-0, which can easily be accommodated anywhere in the
main memory.
www.knowledgegate.ai
www.knowledgegate.ai
Reason to have B tree and B+ tree
• After studying indexing in detail, we understand that index files are always sorted and frequently searched. As databases
grow larger, the index files themselves become so large that we sometimes need to index these indexes, known as multi-level
indexing. Thus, we require a suitable data structure to efficiently support searching, insertion, and deletion operations, all
while maintaining sorted order.
• Although many data structures exist—like arrays, linked lists, stacks, queues, and binary trees—each has limitations in
efficiently supporting frequent insertions, deletions, and quick searches simultaneously, especially for large, sorted datasets.
This led to the development of specialized tree structures called B-trees and B+ trees, specifically designed for indexing large
database files.
• B-trees and B+ trees are dynamic and can efficiently handle an increasing or decreasing number of records. They inherently
support multi-level indexing because their height remains minimal even as data grows, ensuring very fast searches. These
trees are self-balancing, maintaining perfect balance automatically, which significantly improves database search and update
performance.
www.knowledgegate.ai
www.knowledgegate.ai
B tree
• A B-tree of order m if non-empty is an m-way search tree in which.
• The root has at least zero child nodes and at most m child nodes.
• The internal nodes except the root have at least celling(m/2) child nodes and at most m child nodes.
• The number of keys in each internal node is one less than the number of child nodes and these keys partition the
subtrees of the nodes in a manner similar to that of m-way search tree.
• All leaf nodes are on the same level(perfectly balanced).
• Very less internal fragmentation, memory utilization is very good. Less number of nodes(blocks) are used and height is
also optimized, so access will be very fast. Difficulty of traversing the key sequentially. Means B-TREE do not hold good
for range-based queries of database.
www.knowledgegate.ai
www.knowledgegate.ai
B+ tree
www.knowledgegate.ai
www.knowledgegate.ai
Q Which one of the following statements is NOT correct about the B + tree data structure used for
creating an index of a relational database table?
(a) Each leaf node has a pointer to the next leaf node
www.knowledgegate.ai
www.knowledgegate.ai
Q B+ Trees are considered BALANCED because
a) the lengths of the paths from the root to all leaf nodes are all equal
b) the lengths of the paths from the root to all leaf nodes differ from each other by at
most 1
c) the number of children of any two non-leaf sibling nodes differ by at most 1
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q) The primary indexes, secondary indexes and cluster index are all types
of.[Amcat].
www.knowledgegate.ai
Ans=A www.knowledgegate.ai
Q) In case the indices values are larger, index is created for these values of the
index. This is called.[Amcat].
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) Which one is true about clustered index.[Amcat].
www.knowledgegate.ai
Ans=C www.knowledgegate.ai
Q) Which of the following is true for B+ trees:[Hexaware]
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Q. Which of the following statements is FALSE with respect to B-tree and B+ trees? [Asked in
InfyTQ]
(A) Deletion operation is easier in B-tree but complex in the case of B+ trees.
(B) In B+ trees, data records are stored only in the leaf nodes but in B trees data records are
stored both in leaf and internal nodes.
(C) Search keys are repeated in the case of B+ trees but not in the case of B trees.
www.knowledgegate.ai
Answer : A www.knowledgegate.ai
Query Language
• After designing a database—starting with ER diagrams, converting them into relational models, normalizing, and indexing—the next important
task is how to efficiently store, retrieve, and modify data. For interacting with databases, we use special languages called query languages.
Although they can perform multiple operations, our main focus will be on data retrieval.
• Query languages, also known as Data Query Languages (DQLs), are computer languages that allow users to request information from a
database. The most widely-used example of such a language is Structured Query Language (SQL).
• There are two main types of query languages:
• Procedural Query Languages: In procedural query languages, users explicitly instruct the system on “what” data to retrieve and also
“how” exactly to retrieve it. Relational Algebra is a key example of a procedural language, where the user specifies step -by-step
operations to obtain the required result.
• Non-Procedural (Declarative) Query Languages: In non-procedural languages, users specify only “what” data they want, without
mentioning the exact steps or methods for retrieving it. Relational Calculus (including Tuple Relational Calculus and Domain Relational
Calculus) is an example of a declarative query language based on mathematical logic.
• Relational Algebra and Relational Calculus provide fundamental mathematical foundations for querying relational databases. They are
theoretical and not directly executed by computer systems, but SQL is practically built upon these foundations.
• Structured Query Language (SQL), which operates on RDBMS, combines both procedural and non-procedural features, making it powerful and
flexible for practical database operations.
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q ) What is the language used by most of the DBMSs for helping their users to access data?[TCS
NINJA]
(c) SQL
(d) 4GL
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Introduction to SQL
• Structured Query Language (SQL) is currently the most popular database query language used in relational database
management systems (RDBMS). While we often refer to SQL as just a “query language,” it can actually perform many
additional tasks. These include defining database structures, modifying data, managing security constraints, and handling
various database-related operations.
• SQL was originally developed by IBM as “SEQUEL” (Structured English Query Language) during the System R research project
in the early 1970s. Later, due to trademark reasons, its name changed from SEQUEL to SQL. Over time, SQL became the
widely accepted standard language for relational databases.
• The SQL language combines concepts from both relational algebra (procedural) and relational calculus (non-procedural),
giving it both flexibility and power. Its standards have evolved through multiple revisions, starting with SQL-86, and
continuing with updates such as SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016, SQL:2016, and most
recently, SQL:2023.
www.knowledgegate.ai
www.knowledgegate.ai
Parts of SQL
• Data Definition Language (DDL): This includes commands used for defining database structures like tables, modifying or deleting these
structures, and specifying constraints on data (e.g., NOT NULL constraint for primary keys). It also includes commands to def ine database
views and control authorization (user permissions, like granting read/write access).
• Data Manipulation Language (DML): DML consists of commands used for retrieving (querying) data, as well as inserting, updatin g, or deleting
records within a database.
• Transaction Control Language (TCL): TCL includes commands that manage database transactions, ensuring data integrity during o perations
(e.g., COMMIT, ROLLBACK, SAVEPOINT).
• Embedded and Dynamic SQL: This part specifies how SQL commands can be integrated (embedded) into general-purpose programming
languages such as C, C++, and Java, enabling programmers to use SQL commands directly in their applications.
www.knowledgegate.ai
www.knowledgegate.ai
Q Which one is correct?
(a) DML includes a query language based on both relation algebra and tuple calculus
(d) DML includes a query language based on none of the relational algebra and tuple calculus
www.knowledgegate.ai
www.knowledgegate.ai
Q) The language associated with a database management system that is employed by end users
and programmers to manipulate data in the database is the: [Amcat].
Ans=C.
www.knowledgegate.ai
www.knowledgegate.ai
Q DBMS provides the facility of accessing data from a database through?
(a) DDL
(b) DML
(c) DBA
(d) Schema
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following is/are correct?
(a) An SQL query automatically eliminates duplicates
(b) An SQL query will not work if there are no indexes on the relations
www.knowledgegate.ai
www.knowledgegate.ai
Q Which is the subset of SQL commands used to manipulate Oracle Database
structures, including tables? [Asked in TCS NQT]
a) Data Definition Language(DDL)
c) Both of above
d) None
www.knowledgegate.ai
www.knowledgegate.ai
Q What is a view? [Asked in E-Litmus ]
D) None of these
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following statement is correct regarding the difference between TRUNCATE,
DELETE and DROP command? [ Asked in Infosys]
I. DELETE operation can be rolled back but TRUNCATE and DROP operations cannot be rolled
back.
II. TRUNCATE and DROP operations can be rolled back but DELETE operations cannot be rolled
back.
III. DELETE is an example of DML, but TRUNCATE and DROP are examples of DDL.
IV. All are an example of DDL.
A) I and III
B) II and III
C) II and IV
www.knowledgegate.ai
D) II and IV www.knowledgegate.ai
Q) Which of the following is generally used for performing tasks like creating the structure of the
relations, deleting relation?[Tcs Ninja].
(b) Query
www.knowledgegate.ai
Ans=D www.knowledgegate.ai
Q) ROLLBACK in a database is ____ statement.[Wipro].
(a) DDL
(b) DML
(c) DCL
(d) TCL
www.knowledgegate.ai
Ans=D. www.knowledgegate.ai
Q) Three DDL commands: [Amcat]
Ans=A.
www.knowledgegate.ai
www.knowledgegate.ai
Q) Major Language of databases are__________. [Amcat]
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) DQL uses select command which is used to.[Amcat].
www.knowledgegate.ai
Ans=A. www.knowledgegate.ai
Basic Structure of SQL Queries
• Every SQL query reads at least one relation (table) listed in the FROM clause.
• It always returns one relation as its result; by default this result has no name, but its columns keep the names
they had in the input tables.
• Mandatory and Optional Clauses WHERE <condition> is optional. If you omit it, the condition is treated as true
(all rows qualify).
www.knowledgegate.ai
www.knowledgegate.ai
• Case-Insensitivity
• SQL keywords are not case-sensitive (select, SELECT, and SeLeCt are equivalent).
• Duplicates in Results
• In the pure relational model, a relation is a set (no duplicates).
• SQL keeps duplicates by default because removing them can be slow.
• Add DISTINCT right after SELECT to drop duplicates when needed.
• The keyword ALL can be written after SELECT to keep duplicates explicitly, but it is unnecessary because this
is already the default.
www.knowledgegate.ai
www.knowledgegate.ai
Select Clause
• Purpose – SELECT chooses which columns appear in the result, just like projection (π) in relational algebra; this is
vertical filtering of a table.
• Column order – Columns are returned in the same order they are listed after SELECT.
• Always required – SQL queries must start with SELECT, even if you need every column; in relational algebra
projection is optional, but SQL keeps syntax consistent.
• All columns quickly – Write SELECT * when you need every column without listing them one-by-one.
• Expressions allowed – You may include arithmetic expressions (price * 0.9, marks + 5) alongside column names;
these calculate values for display only and do not alter the stored data.
• Select A1, A2,..., An (Column name)
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find all the details of bank branches?
Q Write a SQL query to find each loan number along with loan amount?
Q Write a SQL query to find the name of all customer without duplication having bank account?
Q Write a SQL query to find all account_no and balance with 6% yearly interest added to it?
www.knowledgegate.ai
www.knowledgegate.ai
Q Given SQL statement is example of: [Asked in Hexaware 2018]
www.knowledgegate.ai
www.knowledgegate.ai
Q In SQL, which command is used to SELECT only one copy of each set
of duplicable rows [ Asked in Accenture 2020]
A) SELECT DISTINCT
B) SELECT UNIQUE
C) SELECT DIFFERENT
www.knowledgegate.ai
www.knowledgegate.ai
Select Clause with where clause
• Where clause in SQL is same as ‘σ’ sigma of relational algebra where we specify the conditions/Predicate (horizontal filtering)
• Where clause can have expressions involving the comparison operators <, <, >, >=, <= and <>. SQL allows us to use the
comparison operators to compare strings and arithmetic expressions.
• SQL allows the use of the logical connectives and, or, and not in the where clause.
• SQL includes a between comparison operator to simplify where clauses that specify that a value be less than or equal to
some value and greater than or equal to some other value.
• Similarly, we can use the not between comparison operator.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find all account_no where balance is less the 1000?
Q Write a SQL query to find branch name which is situated in Delhi and having assets less than 1,00,000?
Q Write a SQL query to find branch name and account_no which has balance greater than equal to 1,000 but less
than equal to 10,000?
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following three SQL queries (Assume the data in the people table):
(a) Select Name from people where Age > 21;
(b) Select Name from people where Height > 180;
(c) Select Name from people where (Age > 21) or (Height > 180);
If the SQL queries (a) and (b) above, return 10 rows and 7 rows in the result set respectively, then
what is one possible number of rows returned by the SQL query ?
(a) 3
(b) 7
(c) 10
(d) 21
www.knowledgegate.ai
www.knowledgegate.ai
Q What is the output of the below query : [Asked in Infosys]
Select NAME
from Student
where BRANCH <> ‘CSE’
www.knowledgegate.ai
www.knowledgegate.ai
Q Find the cities name with the condition and temperature from table 'whether' where
condition = sunny or cloudy but temperature >= 60. [Asked in TCS NQT]
A) SELECT city, temperature, condition FROM weather WHERE condition = 'cloudy' AND condition = 'sunny' OR temperature >= 60
B) SELECT city, temperature, condition FROM weather WHERE condition = 'cloudy' OR condition = 'sunny' OR temperature >= 60
C) SELECT city, temperature, condition FROM weather WHERE condition = 'sunny' OR condition = 'cloudy' AND temperature >= 60
D) SELECT city, temperature, condition FROM weather WHERE condition = 'sunny' AND condition = 'cloudy' AND temperature >= 60
www.knowledgegate.ai
www.knowledgegate.ai
Set Operation
• The SQL operations union, intersect, and except/minus operate on relations and corresponds to the
mathematical set-theory operations ∪, ∩ and – respectively.
• The union operation automatically eliminates duplicates, unlike the select clause, If we want to retain all
duplicates, we must write union all in place of union.
• The intersect operation automatically eliminates duplicates. If we want to retain all duplicates, we must write
intersect all in place of intersect.
• If we want to retain duplicates, we must write except all in place of except.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find all the customer name
a) who have a loan or an account or both ?
b) who have both a loan and an account?
c) who have a loan but do not have an account?
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following ORACLE relations:
One (x, y) = {<2, 5>, <1, 6>, <1, 6>, <1, 6>, <4, 8>, <4, 8>}
Two (x, y) = {<2, 55>, <1, 1>, <4, 4>, <1, 6>, <4, 8>, <4, 8>, <9, 9>, <1, 6>}
Consider the following two SQL queries SQ1 and SQ2:
SQ1: SELECT * FROM One EXCEPT (SELECT * FROM Two);
SQ2: SELECT * FROM One EXCEPT ALL (SELECT * FROM Two);
For each of the SQL queries, what is the cardinality (number of rows) of the result obtained when
applied to the instances above?
www.knowledgegate.ai
Q Consider the below given query
What should the database manager write in place of command to get distinct values from table1
and table2 of column_name(s)? [Asked in Cognizant]
a) SET DIFFERENCE
b) DISTINCT
c) UNION
d) UNION ALL
www.knowledgegate.ai
www.knowledgegate.ai
Q The UNION SQL clause can be used with_____________ [Asked in Wipro
NLTH]
www.knowledgegate.ai
Q Suppose now that R(A,B) and S(A,B) are two relations with r and s tuples, respectively (again,
not necessarily distinct). If m is the number of (not necessarily distinct) tuples in the result of the
SQL query:
R intersect S;
Then which of the following is the most restrictive, correct condition on the value of m? [Asked
in L&T Infotech (LTI)]
a) m = min(r,s)
b) 0 <= m <= r + s
www.knowledgegate.ai
www.knowledgegate.ai
Queries on Multiple Relations
• Need for multi-table queries – Many real questions combine data from more than one table, not just a single
relation.
• FROM creates a Cartesian product – Listing two tables (r1, r2) forms every possible pair of their rows; filters in
WHERE are then used to pick the matching pairs you actually need.
• Attribute name clashes – If both tables have a column with the same name, qualify it with the table name or an
alias (r1.id, r2.id) to avoid confusion.
R1 R2 R1 * R 2
A B B C
1 P Q X
2 Q R Y
3 R S Z
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the name of all the customers with account balance, who have an account in the bank?
Q Write a SQL query to find the name of all the customers, who have a loan in the bank of less than 1000 or loan
from north_delhi branch?
Q Write a SQL query to find the name of the customer who have an account in the branch situated in Delhi?
www.knowledgegate.ai
www.knowledgegate.ai
Q) The given Query can be replaced with ____________:[Tcs Ninja].
SELECT name
FROM instructor1
WHERE salary <= 100000 AND salary >= 90000;
a.
SELECT name
FROM instructor1
WHERE salary BETWEEN 100000 AND 90000
b.
SELECT name
FROM instructor|
WHERE salary BETWEEN 90000 AND 100000;
c.
SELECT name
FROM instructor1
WHERE salary BETWEEN 90000 AND 100000;
d.
SELECT name
FROM instructor!
WHERE salary <= 90000 AND salary>=100000;
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) If table named Employee has 50 records and Employee2 has 0 records .what will be the
output of following query?[IBM].
Select e1.*
from Employee e1,Employee e2
where e1.empno=e2.empno;
(a) 4 Rows.
(b) 6 Rows.
(c) 2 Rows.
(d) None of the .
www.knowledgegate.ai
Ans=D. www.knowledgegate.ai
Q Consider the set of relations shown below and the SQL query that follows
Students: (Roll_number, Name, Date_of_birth)
(C) Names of students who have got an A grade in at least one of the
courses taught by Korth
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
R1 R2
A B B C
1 P Q X
2 Q R Y
3 R S Z
R1 ⋈ R2
R1 * R2 A B C
A R1.B R2.B C
1 P Q X
1 P R Y
1 P S Z
2 Q Q X
2 Q R Y
2 Q S Z
3 R Q X
3 R R Y
3
www.knowledgegate.ai R S Z
www.knowledgegate.ai
Q Write a SQL query to find the name of all the customers along with account balance, who have an account in the
bank?
Q Write a SQL query to find the name of the customer who have an account in the branch situated in Delhi?
www.knowledgegate.ai
www.knowledgegate.ai
Q) The given Query can also be replaced with_______:[Cognizant].
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Outer Join
• Inner Join vs. Outer Join:
• Inner joins (NATURAL JOIN) keep only the rows that match in both tables. Rows with no match are discarded.
• Outer joins keep the matching rows and also preserve the non-matching rows by filling the missing side with NULL.
• Why Outer Join?
• Prevents information loss when one table has rows with no counterpart in the other table.
• Types of Outer Join
• LEFT OUTER JOIN (LEFT JOIN) – keeps all rows from the left table; unmatched right-table columns are filled with NULL.
• RIGHT OUTER JOIN (RIGHT JOIN) – keeps all rows from the right table; unmatched left-table columns are filled with
NULL.
• FULL OUTER JOIN – keeps all rows from both tables; NULLs appear wherever a match is missing on either side.
www.knowledgegate.ai
www.knowledgegate.ai
R1 R2
R1 ⋈ R2 R1 ⟗ R2
A B B C
A B C A B C
1 P Q X
2 Q X
2 Q R Y
3 R Y
3 R S Z
R1 * R2 R1 ⟕ R2
A R1.B R2.B C A B C
1 P Q X
1 P R Y
1 P S Z
2 Q Q X
2 Q R Y
2 Q S Z R1 ⟖R2
3 R Q X
3 R R Y A B C
3 R S Z
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following two tables and four queries in SQL.
Query 1: SELECT B.isbn, S.copies Query 2: SELECT B.isbn, S.copies
Book (isbn, bname) FROM Book B INNER JOIN Stock S FROM Book B LEFT OUTER JOIN Stock S
Stock (isbn, copies) ON B.isbn = S.isbn; ON B.isbn = S.isbn;
Which one of the queries above is certain to have an output that is a superset of the outputs of
the other three queries?
(a) Query 1
(b) Query 2
(c) Query 3
(d) Query 4
www.knowledgegate.ai
www.knowledgegate.ai
Q Suppose ORACLE relation R(A, B) currently has tuples {(1, 2), (1, 3), (3, 4)} and relation S(B, C)
currently has {(2, 5), (4, 6), (7, 8)}.
Consider the following two SQL queries SQ1 and SQ2 :
SQ1: Select * From R Full Join S On R.B = S.B;
SQ2: Select * From R Inner Join S On R.B = S.B;
The numbers of tuples in the result of the SQL query SQ1 and the SQL query SQ2 are given by:
[Asked in CoCubes]
(a) 2 and 6 respectively
www.knowledgegate.ai
(d) 4 and 2 respectively
www.knowledgegate.ai
Q Which of the following is true ?
I. Implementation of self-join is possible in SQL with table alias.
III. Natural join and outer join operations are equivalent. [Asked in Cognizant ]
D) Only I is correct.
www.knowledgegate.ai
www.knowledgegate.ai
Q) Select the correct query/queries for cross join:[Amcat]
www.knowledgegate.ai
Ans=A. www.knowledgegate.ai
Q) Which of the join operations do not preserve non matched tuples.[Amcat].
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) The SQL join which does Cartesian product is called. [Amcat].
www.knowledgegate.ai
Ans=D. www.knowledgegate.ai
Q) Which are the join types in join condition:[Amcat].
Ans=D.
www.knowledgegate.ai
www.knowledgegate.ai
Q) Which join is to be used between two tables A and B when the resultant table needs rows from A and B that
matches the condition and rows from A that does not match the condition.[Amcat]
www.knowledgegate.ai
Ans=A. www.knowledgegate.ai
Q) SQL the statement select * from R, S is equivalent to.[Amcat].
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Q Consider the tables project and allocation given below:[infosys].
Projectid Projectna Emepiede Empne Projectid
(a) 7
(b) 5
(c) 3
(d) 6
www.knowledgegate.ai
Answer : C www.knowledgegate.ai
Q) Ravi Sastri is a good coach. He wants to join 2 relational tables related to sports. He also
wants that all the matched and unmatched tuples of the right table should be present in the
result but only the matched rows of the left table should be present in the result after the join
operation. Which of the following joins he should use.[asked in Infytq].
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) Which of the following type of Joins you will use when you have to join a table with
itself?[L&T InfoTech].
Ans=C.
www.knowledgegate.ai
www.knowledgegate.ai
Alias Operation/rename
• SQL aliases are used to give a table, or a column in a table, a temporary name. Just create a new copy but do
not change anything in the data base.
• It uses the as clause, taking the form: old-name as new-name. The as clause can appear in both the select and
from clauses.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the account_no along and balance with 8% interest, as
Account, total_balance?
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the loan_no with maximum loan amount?
www.knowledgegate.ai
www.knowledgegate.ai
Q The SQL expression
Select distinct T.branch_name
from branch T, branch S
where T.assets > S.assets and S.branch_city=”Mumbai” finds the names of (NET-SEP-2013)
(A) All branches that have greater assets than some branch located in Mumbai.
(B) All branches that have greater assets than all branches in Mumbai.
(C) The branch that has greatest asset in Mumbai.
(D) Any branch that has greater assets than any branch in Mumbai.
www.knowledgegate.ai
www.knowledgegate.ai
Aggregate Functions
• Aggregate functions are functions that take a collection (a set or multiset) of values as input and return a single
value. SQL offers five built-in aggregate functions:
• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count
• The input to sum and avg must be a collection of numbers, but the other operators can operate on collections
of nonnumeric data types, such as strings, as well.
• We use the aggregate function count frequently to count the number of tuples in a relation. The notation for
this function in SQL is count (*).
• Count is the only aggregate function which can work with null, all other
aggregate functions simply ignore null.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the number of accounts in the bank?
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the average balance of every account in the banks
from south_delhi branch?
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider a table along with two query?
Select sum(balance)/count(balance)
from account
www.knowledgegate.ai
www.knowledgegate.ai
Q In SQL, __________ is an Aggregate function.
(a) SELECT
(b) CREATE
(c) AVG
(d) MODIFY
www.knowledgegate.ai
www.knowledgegate.ai
Q) Which of the following is not a function of aggregate function?[Hexaware]
(a) Avg
(b) Count
(c) Create
(d) Max
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q Which of the following is aggregate function in SQL? [Asked in Wipro NLTH]
A) Avg
B) Select
C) Ordered by
D) distinct
www.knowledgegate.ai
www.knowledgegate.ai
Q Count function in SQL returns the number of [Asked in TCS NQT]
A) values.
B) distinct values.
C) groups.
D) columns.
www.knowledgegate.ai
www.knowledgegate.ai
Q Given the following Employee relational database. What does the following expression
result ? [Asked in TCS NQT]
Select MAX(Salary)
from Employee
where Salary < ( Select MAX(Salary) from Employee)
a) 3000
b) 4000
.
c) 7000
d) 8000
www.knowledgegate.ai
www.knowledgegate.ai
Q Table employees has 10 records. It has a non-NULL SALARY column which is also
UNIQUE. The SQL statement [ Asked in Wipro NLTH]
SELECT COUNT(*)
FROM EMPLOYEE
WHERE SALARY > ALL (SELECT SALARY FROM EMPLOYEE);
a) 10
b) 9
c) 5
d) 0
www.knowledgegate.ai
www.knowledgegate.ai
Ordering the Display of Tuples
• SQL offers the user some control over the order in which tuples in a relation are displayed.
The order by clause causes the tuples in the result of a query to appear in sorted order.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find all the branch_name which are situated in Delhi in alphabetic order?
www.knowledgegate.ai
www.knowledgegate.ai
Q) In the following Query, which of the following can be placed in the Query's blank portion to
display the salary from highest to lowest amount, and sorting the employs name
alphabetically?[Tcs Ninja].
SELECT *
FROM instructor
ORDER BY salary ____, name ___;
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
String Operations
• SQL specifies strings by enclosing them in single quotes, for example, ’Computer’. The SQL standard specifies that the
equality operation on strings is case sensitive; as a result the expression ‘Computer’ = ’computer’ evaluates to false.
• However, some database systems, such as MySQL and SQL Server, do not distinguish uppercase from lowercase when
matching strings; as a result, would evaluate to true on these databases. This default behavior can, however, be changed,
either at the database level or at the level of specific attributes.
• SQL also permits a variety of functions on character strings, such as concatenating, extracting substrings, finding the length
of strings, converting strings to uppercase and lowercase, removing spaces at the end of the string and so on. There are
variations on the exact set of string functions supported by different database systems.
• A single quote character that is part of a string can be specified by using two single
quote characters; for example, the string It’s right can be specified by It”s right.
www.knowledgegate.ai
www.knowledgegate.ai
• Pattern matching can be performed on strings, using the operator like. We describe patterns
by using two special characters:
• Percent (%): The % character matches any substring.
• Underscore (_): The _ character matches any character.
• ’%Comp%’ matches any string containing “Comp” as a substring, for example, ’Intro to
Computer Science’, and ’Computational Biology’.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find all the branch name who have exactly 5 character in their name ?
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find all the customer name who have ‘kumar’ in their name ?
www.knowledgegate.ai
www.knowledgegate.ai
• For patterns to include the special pattern characters (that is, % and_), SQL allows the specification of an escape character.
• The escape character is used immediately before a special pattern character to indicate that the special pattern character is
to be treated like a normal character.
• We define the escape character for a like comparison using the escape keyword. To illustrate, consider the following
patterns, which use a backslash (\) as the escape character:
• like ’ab\%cd%’ escape ’\’ matches all strings beginning with “ab%cd”.
• like ’ab\\cd%’ escape ’\’ matches all strings beginning with “ab\cd”.
• SQL allows us to search for mismatches instead of matches by using the not like comparison operator. Some databases
provide variants of the like operation which do not distinguish lower and upper case.
• SQL:1999 also offers a similar to operation, which provides more powerful pattern matching than the like operation; the
syntax for specifying patterns is similar to that used in Unix regular expressions.
www.knowledgegate.ai
www.knowledgegate.ai
Q) Ready the Query carefully:
In the above-given Query, which of the following can be placed in the Query's blank portion to
select the "dept_name" that also contains Computer Science as its ending string?
(a) &
(b) _
(c) %
(d) $
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q Consider a “CUSTOMERS” database table having a column “CITY” filled with all the names of
Indian cities (in capital letters). The SQL statement that finds all cities that have “GAR”
somewhere in its name, is:
(a) Select * from customers where city = ‘%GAR%’
www.knowledgegate.ai
www.knowledgegate.ai
Q Write the SQL statement for the query : “ Find the name of the students whose name contains letter ‘D’.
www.knowledgegate.ai
www.knowledgegate.ai
Group by clause
1. There are circumstances where we would like to work on a set of tuples in a relation rather
than working on the whole table as one unit.
2. The attribute or attributes given in the group by clause are used to form groups. Tuples with
the same value on all attributes in the group by clause are placed in one group.
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the average account balance of each branch?
www.knowledgegate.ai
www.knowledgegate.ai
Q Write a SQL query to find the branch name of Gwalior city with average balance more than
1500?
www.knowledgegate.ai
www.knowledgegate.ai
Q
a) 0
b) 1
c) 2
d) 3
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following ORACLE relations: A B C B C D
R (A, B, C) = {<1, 2, 3>, <1, 2, 0>, <1, 3, 1>, <6, 2, 3>, <1, 4, 2>, <3, 1, 4> } 1 2 3 2 3 7
S (B, C, D) = {<2, 3, 7>, <1, 4, 5>, <1, 2, 3>, <2, 3, 4>, <3, 1, 4>}. 1 2 0 1 4 5
Consider the following two SQL queries SQ1 and SQ2 : 1 3 1 1 2 3
SQ1 : SELECT R⋅B, AVG (S⋅B) 6 2 3 2 3 4
FROM R, S 1 4 2 3 1 4
WHERE R⋅A = S⋅C AND S⋅D < 7 3 1 4
GROUP BY R⋅B;
www.knowledgegate.ai
www.knowledgegate.ai
Q The relation scheme given below is used to store information about the employees of a company, where empId is
the key and deptId indicates the department to which the employee is assigned. Each employee is assigned to
exactly one department.
emp(empId, name, gender, salary, deptId)
The above query gives, for each department in the company, the number of female employees whose salary is
greater than the average salary of [Asked in E-Litmus]
www.knowledgegate.ai
www.knowledgegate.ai
Q The STUDENT information in a university stored in the relation STUDENT (Name, SEX, Marks, DEPT_Name)
Consider the following SQL Query
SELECT DEPT_Name
from STUDENT
where SEX = 'M’
group by DEPT_Name
having avg (Marks)>SELECT avg (Marks) from STUDENT.
B) The average marks of male students is more than the average marks of
students in the University
C) The average marks of male students is more than the average marks of
male students in the University
D) The average marks of students is more than the average marks of male
students in the University
www.knowledgegate.ai
www.knowledgegate.ai
Nested Subqueries
A subquery is a select-from- where expression that is nested within another query. A common use of sub-queries is to perform
• Tests for set membership
• make set comparisons
• determine set cardinality, by nesting subqueries in the where clause.
2. The in connective tests for set membership, where the set is a collection of values produced by a select clause.
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following relation:
Works (emp_name, company_name, salary) Here, emp_name is primary key. Consider the following SQL query
Select emp_name
From works T
where salary > (select avg (salary)
from works S
where T.company _ name = S.company _ name)
The above query is for following :
(a) Find the highest paid employee who earns more than the average salary of all employees of his company.
(b) Find the highest paid employee who earns more than the average salary of all the employees of all the companies.
(c) Find all employees who earn more than the average salary of all employees of all the companies.
(d) Find all employees who earn more than the average salary of all employees of their company.
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the query :
SELECT student_name
FROM student_data
WHERE rollno (SELECT rollno
FROM student_marks
WHERE SEM1_MARK=SEM2_MARK);
Which of the following is true?
(A) It gives the name of the student whose marks in semester 1 and semester 2 are same.
(B) It gives all the names and roll nos of those students whose marks in semester 1 and semester 2 are same.
(C) It gives the names of all the students whose marks in semester 1 and semester 2 are same.
(D) It gives roll numbers of all students whose marks in semester 1 and semester 2 are same.
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the query :
SELECT student_name
FROM students
WHERE class_name = (SELECT class_name
FROM students
WHERE math_marks=100);
what will be the output ?
(A) the list of names of students with 100 marks in mathematics
(B) the names of all students of all classes in which at least one student has 100 marks in mathematics
(C) the names of all students in all classes having 100 marks in mathematics
(D) the names and class of all students whose marks in mathematics is 100
www.knowledgegate.ai
www.knowledgegate.ai
[Asked in Cognizant ]
a) 6
b) 4
c) 5
d) 3
www.knowledgegate.ai
www.knowledgegate.ai
Q Which SQL operator tests if any of the sub-query value meets the condition ? [Asked in E-Litmus]
a) LIKE operator
b) SOME operator
c) OR operator
d) IN operator
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the set of relations shown below and the SQL query that follows
Students: (Roll_number, Name, Date_of_birth)
(C) Names of students who have got an A grade in at least one of the
courses taught by Korth
www.knowledgegate.ai
SELECT *
FROM EMP
WHERE eno NOT IN (SELECT manager FROM EMP);
a) 0
b) 1
c) 3
d) error
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider a relation book (title, price) which contains the titles and prices of different books. Assuming that no two
books have the same price, what does the following SQL query list? Select title from book as B
where (select count (*)
from book as T
where T.price > B.price) < 5
(a) Titles of the six most expensive books.
www.knowledgegate.ai
www.knowledgegate.ai
Test for Empty Relations
Select branch_name
from branch as a
Where assets <= 1,00,000 and
exists(Select *
from Branch as b
where branch_city = ‘Gwalior’
and a.branch_name = b.branch_name)
• SQL includes a feature for testing whether a subquery has any tuples in its result. The exists construct returns the value true if
the argument subquery is nonempty.
• The above query also illustrates a feature of SQL where a correlation name from an outer query, can be used in a subquery in
the where clause. A subquery that uses a correlation name from an outer query is called a correlated subquery.
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the table Student(stuid, name, course, marks).
Which one of the following two queries is correct to find the highest marks student in course 5 ?
Q.1 Select S.stuid
From student S
Where not exists (select ∗
from student e
where e course = ‘5’ and e marks ≥ s marks)
Q.2 Select s.stu.id
From student S
Where s ⋅ marks > any (select distinct marks
from student S
where s ⋅ course = 5)
(A) Q.1
(B) Q.2
www.knowledgegate.ai
Q Consider the relational database with the following four schemas and their
respective instances. Student(sNo, sName, dNo) Dept(dNo, dName)
Course(cNo, cName, dNo) Register(sNo, cNo)
SQL Query:
SELECT * FROM Student AS S WHERE NOT EXIST
(SELECT cNo FROM Course WHERE dNo = “D01”
EXCEPT
SELECT cNo FROM Register WHERE sNo = S.sNo)
The number of rows returned by the above SQL query is___________.
ANS:- ( 2 )
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q) Which of the operation are not specifies in triggers.[Amcat].
(a) ALTER
(b) UPDATE
(c) INSERT
(d) DELETE
www.knowledgegate.ai
Ans=A. www.knowledgegate.ai
Q) In SQL, which of the following commands is used to add a column to a table in a
database?[Tcs Ninja].
www.knowledgegate.ai
Ans=A. www.knowledgegate.ai
Q) Which of the following is CORRECT about DELETE command?.[Accenture].
d) None of these.
www.knowledgegate.ai
Ans=B&C. www.knowledgegate.ai
Q) Consider the following two commands, C1 and C2, on the relation R from an SQL database:[Tcs Ninja].
(a) I only
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) What is correct about DELETE command and TRUNCATE command? [Accenture].
(a) 1 &2
(b) 1 &4
(c) 2&3
(d) All of the above.
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Q) Which one is correct syntax for Insert Statement.[Amcat].
(c) Insert Columns(Col1, Col2,Col3) VALUE (Val1, Val2,Val3) Into table name;
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Q) Which one is correct syntax for Update Statement.[Amcat].
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
www.knowledgegate.ai
www.knowledgegate.ai
Q) The CREATE TRIGGER statement is used to create the trigger. THE _____ clause specifies the table
name on which the trigger is to be attached. The ______ specifies that this is an AFTER INSERT
trigger.[Amcat].
www.knowledgegate.ai
Ans=B. www.knowledgegate.ai
Q) What is the error in the following code of PL/SQL: [Amcat].
declare
a integer :=5;
b integer :=10;
c integer;
begin
c:=a+b;
dbms _ output .put _line ('sum='||c);
end;
(c) C:=A+B;
www.knowledgegate.ai
Q) How can we specifies a row-level trigger.[Amcat].
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) Anil is designing a database of motor vehicles. It has one base entity ‘Vehicles’ which is
classified into two sub-entities, 2 -wheeler and 4-wheeler. He further breaks them down into
more entities. What is the process being used here? [Amazon].
(a) Generalization
(b) Specialization
(c) Aggregation
(d) Segregation
www.knowledgegate.ai
www.knowledgegate.ai
Q) Metadata about relations are stored in a structure known as.[Amcat].
www.knowledgegate.ai
Ans=C. www.knowledgegate.ai
Q) Data dictionary tell DBMS.[Amcat].
www.knowledgegate.ai
Ans=D. www.knowledgegate.ai
TRANSACTION
• According to general computation principle (operating system) we may have partially executed program, as the level of
atomicity is instruction i.e. either an instruction is executed completely or not. But in DBMS view, user perform a logical
work(operation) which is always atomic in nature i.e. either operation is execute or not executed, there is no concept like
partial execution. For example, Transaction T1 which transfer 100 units from account A to B.
• In this transaction if a failure occurs after Read(B) then the final statue of the system will be inconsistent as 100 units are
debited from account A but not credited in account B, this will generate inconsistency. Here for ‘consistency’ before (A + B) ==
after (A + B)”.
• To remove this partial execution problem, we increase the level of atomicity and bundle all the instruction of a logical
operation into a unit called transaction. So formally ‘A transaction is a Set of logically related instructions to perform a logical
unit of work’.
T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
www.knowledgegate.ai
Write(B)
www.knowledgegate.ai
• As here we are only concerned with DBMS so we well only two basic operation on database
• READ (X) - Accessing the database item x from disk (where database stored data) to memory
variable also name as X.
• WRITE (X) - Writing the data item from memory variable X to disk.
www.knowledgegate.ai
www.knowledgegate.ai
Q A transaction can include following basic database access operations:
(A) Read_item(X)
(B) Write_item(X)
www.knowledgegate.ai
www.knowledgegate.ai
Desirable properties of transaction
• Now as the smallest unit which have atomicity in DBMS view is transaction, so if want that our data should be consistent then
instead of concentrating on data base, we must concentrate on the transaction for our data to be consistent.
• Transactions should possess several properties, often called the ACID properties; to provide integrity and consistency of the
data in the database. The following are the ACID properties:
T1
Read(A)
A = A-100
Write(A)
Read(B)
B = B+100
Write(B)
www.knowledgegate.ai
www.knowledgegate.ai
• Atomicity - A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all.
It is the responsibility of recovery control manager / transaction control manager of DBMS to ensure atomicity.
• Consistency - A transaction should be consistency preserving, meaning that if it is completely executed from beginning to end
without interference from other transactions, it should take the database from one consistent state to another. The definition
of consistency may change from one system to another. The preservation of consistency of database is the responsibility of
programmers(users) or the DBMS modules that enforces integrity constraints.
• Isolation - A transaction should appear as though it is being executed in isolation from other transactions, even though many
transactions are executing concurrently. That is, the execution of a transaction should not be interfered with by any other
transactions executing concurrently. The isolation property of database is the responsibility of concurrency control manager
of database.
• Durability - The changes applied to the database by a committed transaction must persist in the database. These changes
must not be lost because of any failure. It is the responsibility of recovery control manager of DBMS.
www.knowledgegate.ai
www.knowledgegate.ai
Q NOT a part of the ACID properties of database transactions?
(a) Atomicity
(b) Consistency
(c) Isolation
(d) Deadlock-freedom
www.knowledgegate.ai
www.knowledgegate.ai
Q All Oracle transactions obey the basic properties of a database transaction. What
is the name of the following property? ‘All tasks of a transaction are performed or
none of them are. There are no partial transactions.’ [Asked in Wipro NLTH]
a) Atomicity
b) Durability
c) Consistency
d) Isolation
www.knowledgegate.ai
www.knowledgegate.ai
Q) Sunil is working on a database management system. He wants to program transactions such
that a transaction is completed only if the entire database operations are completed and
committed, otherwise the transaction is aborted and rolled back. Which of the following
database characteristics is he trying to implement?[Amazon].
(a) Atomicity
(b) Consistency
(c) Isolation
(d) Durability
www.knowledgegate.ai
www.knowledgegate.ai
Transaction states
• ACTIVE - It is the initial state. Transaction remains in this state while it
is executing operations.
• PARTIALLY COMMITTED - After the final statement of a transaction has
been executed, the state of transaction is partially committed as it is
still possible that it may have to be aborted (due to any failure) since
the actual output may still be temporarily residing in main memory and
not to disk.
• FAILED - After the discovery that the transaction can no longer proceed
(because of hardware /logical errors). Such a transaction must be rolled
back.
• ABORTED - A transaction is said to be in aborted state when the when
the transaction has been rolled back and the database has been
restored to its state prior to the start of execution.
• COMMITTED - A transaction enters committed state after successful
completion of a transaction and final updation in the database.
www.knowledgegate.ai
www.knowledgegate.ai
Q) Rollback is used to restore the data_________ [Amcat].
www.knowledgegate.ai
Ans: B www.knowledgegate.ai
Q) A transaction completes its execution is said to be.[Amcat].
(a) Saved
(b) Loaded
(c) Rolled
(d) Committed
www.knowledgegate.ai
Ans: d www.knowledgegate.ai
Q if the transaction is in which of the state that we can guarantee that
data base is in consistent state
a) aborted b) committed
www.knowledgegate.ai
www.knowledgegate.ai
Why we need concurrent execution
• Concurrent execution is necessary because-
• It leads to good database performance , less weighting time.
• Overlapping I/O activity with CPU increases throughput and response time.
www.knowledgegate.ai
www.knowledgegate.ai
PROBLEMS DUE TO CONCURRENT EXECUTION OF TRANSACTION
• But interleaving of instructions between transactions may also lead to many problems that can lead to
inconsistent database.
• Sometimes it is possible that even though individual transaction are satisfying the acid properties even though
the final statues of the system will be inconsistent.
www.knowledgegate.ai
www.knowledgegate.ai
Lost update problem / Write - Write problem-
• If there is any two write operation of different transaction on same data value,
and between them there is no read operations, then the second write over
writes the first write.
T1 T2
Read(A)
Write(A)
Write(A)
Commit
Commit
www.knowledgegate.ai
www.knowledgegate.ai
Dirty read problem/ Read -Write problem
• In this problem, the transaction reads a data item updated by another
uncommitted transaction, this transaction may in future be aborted
or failed.
• The reading transactions end with incorrect results.
T1 T2
Read(A)
Write(A)
Read(A)
Commit
Abort
www.knowledgegate.ai
www.knowledgegate.ai
Unrepeatable read problem
• When a transaction tries to read a value of a data item twice, and another transaction updates
the data item in between, then the result of the two read operation of the first transaction
will differ, this problem is called, Non-repeatable read problem
T1 T2
Read(A)
Read(A)
Write(A)
Read(A)
www.knowledgegate.ai
www.knowledgegate.ai
Phantom read problem
T1 T2
Read(A)
Delete(A)
Read(A)
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following scenarios may lead to an irrecoverable error in a database
system?
(A) A transaction writes a data item after it is read by an uncommitted transaction
www.knowledgegate.ai
www.knowledgegate.ai
Solution is Schedule
• When two or more transaction executed together or one after another then they can be
bundled up into a higher unit of execution called schedule.
• A schedule of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions.
Operations from different transactions can be interleaved in the schedule S.
• However, schedule for a set of transaction must contain all the instruction of those
transaction, and for each transaction Ti that participates in the schedule S, the operations of Ti
in S must appear in the same order in which they occur in Ti.
www.knowledgegate.ai
www.knowledgegate.ai
• Serial schedule - A serial schedule consists of sequence of instruction belonging to
different transactions, where instructions belonging to one single transaction appear
together. Before complete execution of one transaction another transaction cannot be
started.
www.knowledgegate.ai
www.knowledgegate.ai
• So the number of schedules for n different transaction T1,T2,T3,---------TN where
each transaction conations n1, n2, n3, -----------,n4 respectively will be.
www.knowledgegate.ai
www.knowledgegate.ai
• Conclusion of schedules
• We do not have any method to proof that a schedule is consistent, but from the above
discussion we understand that a serial schedule will always be consistent, so if somehow
we proof that a non-serial schedule will also have same effects as of a serial schedule that
we get a proof that, this particular non-serial schedule will also be consistent “find those
schedules that are logically equal to serial schedules”.
www.knowledgegate.ai
www.knowledgegate.ai
On the basis of On the basis of
SERIALIZABILITY RECOVERABILITY
Conflict
Recoverable
serializable
View
Cascadeless
serializable
Result
Strict
Equivalent
www.knowledgegate.ai
www.knowledgegate.ai
T1 T2 T1 T2 T1 T2 T1 T2
R(A) R(B) R(A) W(A)
R(B) R(A) W(A) R(A)
T1 T2 T1 T2 T1 T2 T1 T2
R(A) W(B) R(A) W(A)
W(B) R(A) W(A) R(A)
T1 T2 T1 T2 T1 T2 T1 T2
R(A) R(A) W(A) W(A)
R(A) R(A) W(A) W(A)
www.knowledgegate.ai
www.knowledgegate.ai
SERIALIZABILITY
• Conflicting instructions - Let I and J be two consecutive instructions belonging to two different
transactions Ti and Tj in a schedule S, the possible I and J instruction can be as-
• I= READ(Q), J=READ(Q) ->Non-conflicting
• I= READ(Q), J=WRITE(Q) ->Conflicting
• I= WRITE(Q), J=READ(Q) ->Conflicting
• I= WRITE(Q), J=WRITE(Q) ->Conflicting
• So, the instructions I and J are said to be conflicting, if they are operations by different
transactions on the same data item, and at least one of these instructions is a write operation.
www.knowledgegate.ai
www.knowledgegate.ai
Q In the context of concurrency control, a given pair of operations in a schedule is called conflict
schedule if
a. At least one of the operations is write operation
b. Both the operations are performed on the same data item
c. Both the operations are performed by different transactions
d. Both the operations are performed on different data items
Choose the correct answer from the options given below:
www.knowledgegate.ai
Ans:B www.knowledgegate.ai
CONFLICT SERIALIZABLE
• The schedules which are conflict equivalent to a serial schedule are called conflict serializable
schedule.
www.knowledgegate.ai
www.knowledgegate.ai
Q Let ri(z) and wi(z) denote read and write operations respectively on a data item z by a transaction Ti. Consider the
following two schedules.
• S1:r1(x)r1(y)r2(x)r2(y)w2(y)w1(x) S1 S2
• S2:r1(x)r2(x)r2(y)w2(y)r1(y)w1(x) T1 T2 T1 T2
R(X) R(X)
Which one of the following options is correct?
R(Y) R(X)
a) S1 is conflict serializable, and S2 is not conflict serializable R(X) R(Y)
b) S1 is not conflict serializable, and S2 is conflict serializable R(Y) W(Y)
c) Both S1 and S2 are conflict serializable W(Y) R(Y)
d) Neither S1 nor S2 is conflict serializable W(X) W(X)
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following schedule for transactions T 1, T2 and T3:
Which one of the schedules below is the correct serialization of the above?
(A) T1->>T3->>T2
(B) T2->>T1->>T3
(C) T2->>T3->>T1
(D) T3->>T1->>T2
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the transactions T1, T2, and T3 and the schedules S1 and S2 given below.
T1: r1(X); r1(Z); w1(X); w1(Z)
S1
T2: r2(Y); r2(Z); w2(Z) T1 T2 T3
T3: r3(Y); r3(X); w3(Y) R(X)
R(Y)
S1: r1(X); r3(Y); r3(X); r2(Y); r2(Z);w3(Y); w2(Z); r1(Z); w1(X); w1(Z) R(X)
R(Y)
R(Z)
S2: r1(X); r3(Y); r2(Y); r3(X); r1(Z);r2(Z); w3(Y); w1(X); w2(Z); w1(Z) W(Y)
W(Z)
R(Z)
Which one of the following statements about the schedules is TRUE? W(X)
W(Z)
(A) Only S1 is conflict-serializable.
(B) Only S2 is conflict-serializable. S2
(C) Both S1 and S2 are conflict-serializable. T1 T2 T3
R(X)
(D) Neither S1 nor S2 is conflict-serializable. R(Y)
R(Y)
R(X)
R(Z)
R(Z)
W(Y)
W(X)
W(Z)
W(Z)
www.knowledgegate.ai
www.knowledgegate.ai
VIEW SERIALIZABLE
• Why we need it – View serializability captures schedules that are not conflict-serializable yet still behave correctly.
• Relation to conflict serializability – Every conflict-serializable schedule is view-serializable, so we test view serializability only if
the conflict test fails.
• Role of a blind write – A non-conflict-serializable schedule without any blind write can never be view-serializable; having at
least one blind write merely makes it possible.
• How to test – List all serial orders and see if the schedule matches one on first read, read-from, and final write; a match
means the schedule is view-serializable.
• Complexity wise finding the schedule is view serializable or not is a NP- complete problem.
www.knowledgegate.ai
www.knowledgegate.ai
Check Conflict
Serializability
Check For
Not View
View
www.knowledgegate.ai
Serializable
Serializability www.knowledgegate.ai
• Two schedules S and S’ are view equivalent, if they satisfy following conditions –
• For each data item Q, if the transaction Ti reads the initial value of Q in schedule S , then then the
transaction Ti must, in schedule S’ ,also read the initial value of Q.
• If a transaction Ti in schedule S reads any data item Q, which is updated by transaction Tj, then a
transaction Ti must in schedule S’ also read data item Q updated by transaction Tj in schedule S’.
• For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule
S, then the same transaction must also perform final write(Q) in schedule S’.
www.knowledgegate.ai
www.knowledgegate.ai
View Serializable
www.knowledgegate.ai
www.knowledgegate.ai
View Serializable
www.knowledgegate.ai
item, are called Blind updation or BLIND WRITE.
Q A Schedule that is not conflict serializable and contains at least one blind write
then the schedule is
a) Always view serializable
b) Always non-serializable
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following statements (s) is/are TRUE?
S1: All view serializable schedules are also conflict serializable.
S2: All conflict serializable schedules are also view serializable.
S3: If a schedule is not conflict serializable then it is not view serializable
S4: If a schedule is not view serializable then it is not conflict serializable.
a) S1 and S2 only
b) S2 and S3 only
c) S2 and S4 only
d) S1 and S3 only
www.knowledgegate.ai
www.knowledgegate.ai
• NON- RECOVERABLE SCHEDULE: A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a
data item previously written by Ti, then the commit or abort operation of Ti appears before Tj. Such a schedule is
called Non- Recoverable schedule.
• RECOVERABLE SCHEDULE: A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a data
item previously written by Ti, then the commit or abort of Ti must appear before Tj. Such a schedule is called
Recoverable schedule.
S S
T1 T2 T1 T2
R(X) R(X)
W(X) W(X)
R(X) R(X)
C C
C C
www.knowledgegate.ai
www.knowledgegate.ai
• CASCADING ROLLBACK: It is a phenomenon, in which a single transaction failure leads to a series of transaction rollbacks, is
called cascading rollback. Even if the schedule is recoverable the, the commit of transaction may lead lot of transaction to
rollback. Cascading rollback is undesirable, since it leads to undoing of a significant amount of work. Uncommitted reads are
not allowed in cascade less schedule.
• CASCADELESS SCHEDULE: To avoid cascading rollback, cascade less schedule are used. A schedule in which for each pair of
transactions Ti and Tj, such that if Tj reads a data item previously written by T i then the commit or abort of Ti must appear
before read operation of Tj. Such a schedule is called cascade less schedule.
S S
T1 T2 T3 T1 T2 T3
R(X) R(X)
W(X) W(X)
R(X) C
W(X) R(X)
R(X) W(X)
C C
C R(X)
www.knowledgegate.ai
C C
www.knowledgegate.ai
Check Dirty Read
Not Cascadleless
Both Recoverable and
Check order of Cascadless
commit
If order of commit is
same in which dirty
read is done,
www.knowledgegate.ai
recoverable otherwise
www.knowledgegate.ai
not
Q) A single transaction failure may result into a set of transaction rollbacks is known to be.
[Amcat].
www.knowledgegate.ai
Ans: d www.knowledgegate.ai
CONCURRENCY CONTROL
• Now we understood that if there is a schedule how to check whether it will work correctly or not i.e. weather it will maintain
the consistency of the data base or not. (conflict serializability, view serializability, recoverability and cascade less). Now we
will understand those protocol which guarantee how to design those schedules which ensure conflict serializability or other
properties.
• There are different approach or idea to ensure conflict serializability which is the most important property. So first we must
understand what is the possibility of conflict between two instruction and if somehow, we manage than the generated
schedule will always be conflict serializable.
• If we remember, two instructions are conflicting if and only if three things happen simultaneously
• Belong to different transactions
• Must operate on same data value
• At least one of them should be a write instruction
• if we think sufficiently, there will be no way to change any of these three conditions.
But actual problem is not that two instructions are trying to access same data base
but, they are trying to do that at same time. So point if we somehow by any
technique manage that two transaction do not access same data at same time then
ensuring conflict serializability will be easy.Now question is how to approach conflict
serializability, there are Three popular approaches to go forwards.
www.knowledgegate.ai
www.knowledgegate.ai
• Time stamping based method: - where before entering the system, a specific order is decided among the
transaction, so in case of a clash we can decide which one to allow and which to stop.
• Lock based method: - where we ask a transaction to first lock a data item before using it. So that no different
transaction can use a data at the same time, removing any possibility of conflict.
• 2 phase locking
• Basic 2pl
• Conservative 2pl
• Rigorous 2pl
• Strict 2pl
• Graph based protocol
www.knowledgegate.ai
www.knowledgegate.ai
Goals of a Protocol: - We desire the following properties from schedule generating protocols
• Concurrency should be as high as possible, as this is our ultimate goal because of which we are making all the
effort.
www.knowledgegate.ai
www.knowledgegate.ai
Q) Serializability of schedules can be ensured through a mechanism called.[Amcat].
www.knowledgegate.ai
www.knowledgegate.ai
Q) Characteristics of Good Concurrency Protocol. [Amcat]
(b) Its storage mechanisms and computational methods should be modest to minimize overhead.
(c) It must enforce some constraints on the structure of atomic actions of transactions.
www.knowledgegate.ai
www.knowledgegate.ai
TIME STAMP ORDERING PROTOCOL
• Basic idea of time stamping is to decide the order between the transaction before they enter in the system using
a stamp (time stamp), in case of any conflict during the execution order can be decided using the time stamp.
• Each transaction T gets a unique, fixed stamp TS(T) when it starts (system-clock value ⇒ always increasing &
unique). Serial-order rule: if TS(T₁) < TS(T₂), the executed schedule must be equivalent to T₁ before T₂.
• For every data item Q the protocol keeps two stamps:
• W-TS(Q) – largest stamp of any transaction that successfully wrote Q
• R-TS(Q) – largest stamp of any transaction that successfully read Q
www.knowledgegate.ai
www.knowledgegate.ai
• Suppose a transaction Ti request a read(Q)
• If TS(Ti ) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the
read operation is rejected, and Ti is rolled back.
• If TS(Ti )≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum
of R-timestamp(Q) and TS(Ti).
www.knowledgegate.ai
www.knowledgegate.ai
Properties
• Time stamp ordering protocol ensures conflict serializability. Because conflicting operations are processed in timestamp order,
since all the arcs in the precedence graph are of the form thus, there will be no cycles in the precedence graph.
• As we know that view is liberal form conflict so view serializability also holds good
• As there is a possibility of dirty read, and no restriction on when to commit, so can be irrecoverable and may suffer from
cascading rollback.
• At the time of request, here either we allow or we reject, so there is no idea of deadlock.
• If a schedule is not conflict serializable then it is not allowed by time stamp ordering scheme.
• But it is not necessary that all conflict serializable schedule generated by time stamping.
www.knowledgegate.ai
www.knowledgegate.ai
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom
www.knowledgegate.ai
www.knowledgegate.ai
Lock Based Protocols
• To ensure isolation is to require that data items be accessed in a mutually exclusive manner i.e. while one
transaction is accessing a data item, no other transaction can modify that data item. Locking is the most
fundamental approach to ensure this. Lock based protocols ensure this requirement. Idea is first obtain a lock on
the desired data item then if lock is granted then perform the operation and then unlock it. In general, we
support two modes of lock because, to provide better concurrency.
• Shared mode: If transaction Ti has obtained a shared-mode lock (denoted by S) on any data item Q, then Ti can
read, but cannot write Q, any other transaction can also acquire a shared mode lock on the same data item(this is
the reason we called this shared mode).
• Exclusive mode: If transaction Ti has obtained an exclusive-mode lock (denoted by X) on any data item Q, then Ti
can both read and write Q, any other transaction cannot acquire either a shared or exclusive mode lock on the
same data item. (this is the reason we called this exclusive mode)
www.knowledgegate.ai
www.knowledgegate.ai
Lock –Compatibility Matrix
• Conclusion shared is compatible only with shared while exclusive is not compatible either with shared
or exclusive.
• To access a data item, transaction Ti must first lock that item, if the data item is already locked by
another transaction in an incompatible mode, or some other transaction is already waiting in non-
compatible mode, then concurrency control manager will not grant the lock until all incompatible locks
held by other transactions have been released. The lock is then granted.
www.knowledgegate.ai
www.knowledgegate.ai
Q) If a transaction obtains an exclusive lock on a row, it means that the transaction
wants to ……. that row.[Amcat].
(a) Select
(b) Update
(c) View
(d) Read
www.knowledgegate.ai
www.knowledgegate.ai
Q) A _________ lock is also called a Read-only lock. [Amcat].
www.knowledgegate.ai
www.knowledgegate.ai
• Lock based protocol do not ensure serializability as granting and T1 T2
releasing of lock do not follow any order and any transaction any time LOCK-X(A)
may go for lock and unlock. Here in the example below we can see, that READ(A)
even this transaction in using locking but neither it is conflict serializable WRITE(A)
nor independent from deadlock. UNLOCK(A)
LOCK-S(B)
• If we do not use locking, or if we unlock data items too soon after READ(B)
reading or writing them, we may get inconsistent states, as there exists a UNLOCK(B)
possibility of dirty read. On the other hand, if we do not unlock a data LOCK-X(B)
item before requesting a lock on another data item, concurrency will be READ(B)
poor. WRITE(B)
UNLOCK(B)
• We shall require that each transaction in the system follow a set of rules,
LOCK-S(A)
called a locking protocol, indicating when a transaction may lock and
READ(A)
unlock each of the data items for e.g. 2pl or graph based locking. UNLOCK(A)
• Locking protocols restrict the number of possible schedules.
www.knowledgegate.ai
www.knowledgegate.ai
T1 T2
Two phase locking protocol(2PL) LOCK-X(A)
• The protocol ensures that each transaction issue lock READ(A)
WRITE(A)
and unlock requests in two phases, note that each LOCK-S(B)
transaction will be 2 phased not schedule. READ(B)
LOCK-X(B)
• Growing phase- A transaction may obtain locks, but READ(B)
not release any locks. WRITE(B)
LOCK-S(A)
READ(A)
• Shrinking phase- A transaction may release locks, but
UNLOCK(B)
may not obtain any new locks. UNLOCK(A)
UNLOCK(B)
• Initially a transaction is in growing phase and acquires UNLOCK(A)
lock as needed and in between can perform operation
reach to lock point and once a transaction releases a
lock, it can issue no more lock requests i.e. it enters the
shrinking phase.
www.knowledgegate.ai
www.knowledgegate.ai
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom
Rigorous 2PL
Strict 2PL
www.knowledgegate.ai
www.knowledgegate.ai
Properties
• 2PL ensures conflict serializability, and the ordering of transaction over lock points is itself a serializability order of
a schedule in 2PL.
• If a schedule is allowed in 2PL protocol then definitely it is always conflict serializable. But it is not necessary that
if a schedule is conflict serializable then it will be generated by 2pl. Equivalent serial schedule is based on the
order of lock points.
• View serializability is also guaranteed.
• Does not ensure freedom from deadlock
• May cause non-recoverability.
• Cascading rollback may occur.
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following concurrency protocol ensures both conflict serializability
and freedom from deadlock?
(1) 2 - phase Locking (2) Time stamp - ordering
www.knowledgegate.ai
www.knowledgegate.ai
Q) A system is in a ______ state if there exists a set of transactions such that every transaction in
the set is waiting for another transaction in the set. [Amcat].
(a) Idle
(b) Waiting
(c) Deadlock
(d) Ready
www.knowledgegate.ai
www.knowledgegate.ai
• Conservative 2PL: The idea is there is no growing phase transaction start directly from lock
point, i.e. transaction must first acquire all the required locks then only it can start execution.
If all the locks are not available then transaction must release the acquired locks and must
wait. Shrinking phase will work as usual, and transaction can unlock any data item anytime.
we must have a knowledge in future to understand what is data required so that we can use it
www.knowledgegate.ai
www.knowledgegate.ai
Q In conservative two phase locking protocol, a transaction?
www.knowledgegate.ai
www.knowledgegate.ai
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom
www.knowledgegate.ai
www.knowledgegate.ai
• RIGOROUS 2PL: Requires that all locks be held until the transaction commits. This protocol requires that locking
be two phase and also all the locks taken be held by transaction until that transaction commit. Hence there is no
shrinking phase in the system.
• STRICT 2PL: that all exclusive-mode locks taken by a transaction be held until that transaction commits. This
requirement ensures that any data written by an uncommitted transaction are locked in exclusive mode until the
transaction commits, preventing any other transaction from reading the data. This protocol requires that locking
be two phase and also that exclusive –mode locks taken by transaction be held until that transaction commits. So
it is simplified form of rigorous 2pl
www.knowledgegate.ai
www.knowledgegate.ai
Q In a Rigorous 2 phase protocol
a) All shared locks held by the transaction are related after the transaction is committed
b) All exclusive locks held by the transaction are released after the transaction is committed
c) All locks held by the transaction are released after the transaction is committed
d) All locks held by the transaction are released before the transaction is committed
www.knowledgegate.ai
www.knowledgegate.ai
Conflict View Recoverability Cascadelessness Deadlock
Serializability Serializability Freedom
www.knowledgegate.ai
www.knowledgegate.ai
Q Consider the following two statements about database transaction schedules:
I. Strict two-phase locking protocol generates conflict serializable schedules that are also
recoverable.
II. Timestamp-ordering concurrency control protocol with Thomas Write Rule can generate view
serializable schedules that are not conflict serializable.
Which of the above statements is/are TRUE?
(a) Both I and II (b) Neither I nor II
(c) II only (d) I only
www.knowledgegate.ai
www.knowledgegate.ai
Q Which of the following statement is/are correct
a) Every conflict serializable schedule allowed under 2PL protocol is allowed by basic time
stamping protocol.
b) Every schedule allowed under basic time stamping protocol is allowed by Thomas-write rule
c) Every schedule allowed under Thomas-write rule is allowed by basic time stamping protocol
d) none
www.knowledgegate.ai
www.knowledgegate.ai