CSE 444 Practice Problems


Uploaded by

Yaikob Kebede


DBMS Architecture

1. Data Independence

(a) What is physical data independence?

Solution:
Physical data independence is a property of a DBMS that ensures decoupling between the physical
layout of data and applications which access it. In other words, with physical independence,
applications are insulated from changes in physical storage details: changes in how the data is
stored do not cause application changes.

(b) What properties of the relational model facilitate physical data independence?
Solution:
Declarative query language or set-at-a-time query language.
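As an illustration of parts (a) and (b), here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the index name idx_r_b are hypothetical. The application states its query declaratively, so changing the physical design (here, adding an index) does not require changing the query, and the result stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER, c INTEGER)")
# Hypothetical sample data, chosen only for illustration.
conn.executemany("INSERT INTO R VALUES (?, ?, ?)",
                 [(i, i % 7, i * 2) for i in range(100)])

# The application says WHAT it wants, not HOW to fetch it.
QUERY = "SELECT a FROM R WHERE b = 3 ORDER BY a"

before = conn.execute(QUERY).fetchall()

# Change the physical design: add an index on R(b).
conn.execute("CREATE INDEX idx_r_b ON R (b)")

# The same query string still works and returns the same rows.
after = conn.execute(QUERY).fetchall()
print(before == after)
```

The DBMS is free to answer the second execution with the new index; the application code is unaffected either way.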

(c) What is logical data independence?
Solution:
Logical data independence is a property of a DBMS that ensures decoupling between the logical
structure of data and applications that operate on it. With this property, changes in the logical
data layout, such as tables, rows, and columns, do not require applications to be changed.

(d) How can one provide a high level of logical data independence with the relational model?
Solution:
By defining views.
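The following sketch, again with Python's sqlite3 module, shows one way a view can shield an application from a logical schema change. The Person tables, the view name PersonView, and the data are all hypothetical. The application queries only the view; when the underlying table is split in two, the view is redefined and the application query is left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)",
                 [(1, "Ann", "Seattle"), (2, "Bob", "Tacoma")])
conn.execute("CREATE VIEW PersonView AS SELECT id, name, city FROM Person")

# The application is written against the view only.
APP_QUERY = "SELECT name, city FROM PersonView ORDER BY id"
before = conn.execute(APP_QUERY).fetchall()

# Logical schema change: split Person into two tables.
conn.execute("DROP VIEW PersonView")
conn.execute("CREATE TABLE PersonName (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE PersonCity (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO PersonName SELECT id, name FROM Person")
conn.execute("INSERT INTO PersonCity SELECT id, city FROM Person")
conn.execute("DROP TABLE Person")

# Redefine the view over the new tables; the application query is unchanged.
conn.execute("""
    CREATE VIEW PersonView AS
    SELECT n.id AS id, n.name AS name, c.city AS city
    FROM PersonName AS n JOIN PersonCity AS c ON n.id = c.id
""")
after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```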

2. High-Level DBMS Architecture
You should know the key components of a relational DBMS. Please see the lecture notes for an overview.
You should also be able to discuss what you implemented in the labs. For example, we could ask you
for a high-level description of your buffer manager.

3. Data Storage and Indexing
You should be able to show what happens when one adds/removes data to/from a B+ Tree. Please
see the web quizzes for some good examples.

4. Data Storage and Indexing
Suppose we have a relation R(a,b,c,d,e) and there are at least 1000 distinct values for each of the
attributes.
Consider each of the following query workloads, independently of each other. If it is possible to speed it
up significantly by adding up to two additional indexes to relation R, specify for each index (1) which
attribute or set of attributes form the search key of the index, (2) if the index should be clustered or
unclustered, (3) if the index should be a hash-based index or a B+-tree. You may add at most two
new indexes. If adding a new index would not make a significant difference, you should say so. Give a
brief justification for your answers.

(a) 100,000 queries have the form: select * from R where b <?
10,000 queries have the form: select * from R where c =?

Solution:
Since we need efficient range queries on R(b), we definitely want a clustered B+-tree index on
R(b).
For the second query, an index on R(c) will help. It can be either a B+-tree or a hash-based index
since the queries look up specific key values. The index must be unclustered since the index on R(b)
is clustered. That is fine, however: each query looks up a single key value and, because there are
many distinct values in the relation, each lookup is likely to match only a few tuples.

(b) 100,000 queries have the form: select * from R where b <? and c =?
10,000 queries have the form: select * from R where d =?
1,000 queries have the form: select * from R where a =?

Solution:
For the first query, a clustered B+-tree index on R(c,b) would be most helpful since we could use
it to look up all data items that match both the given value of c and the range on b.
Since we can add only one more index, we favor the more frequent of the two remaining queries and
add an index on R(d). As in the question above, this index must be unclustered and can be either a
B+-tree or a hash-based index.

(c) 100,000 queries have the form: select a, c from R where b <?
10,000 queries have the form: select * from R where d <?

Solution:
Since both queries are range-selection queries, we need clustered indexes for both of them, but we
cannot have more than one such index. However, we can have a covering index.
We thus recommend:
• A clustered, B+-tree index on R(d).
• An unclustered, B+-tree index on R(b,a,c). This is also a covering index in that we only need
to use the index to answer the query. We don’t need to touch the data.
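The covering-index idea can be checked with a small sketch in Python's sqlite3 module. The data and the index name idx_r_bac are hypothetical, and SQLite is only a stand-in for the DBMS in the question: it does not let us declare clustered versus unclustered indexes, so the sketch does not model the clustered index on R(d). It only shows that a query touching just b, a, and c can be answered from an index on R(b,a,c) alone, while select * cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER)")
# Hypothetical data with many distinct values per attribute.
conn.executemany("INSERT INTO R VALUES (?, ?, ?, ?, ?)",
                 [(i, (i * 3) % 1000, i + 1, i % 500, i) for i in range(2000)])
conn.execute("CREATE INDEX idx_r_bac ON R (b, a, c)")

def plan_details(sql):
    """Return the human-readable part of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (10,)).fetchall()
    return " ".join(row[3] for row in rows)

# Only b, a, and c are needed: the index alone can answer the query.
covered_details = plan_details("SELECT a, c FROM R WHERE b < ?")
# All columns are needed: the base table must be read as well.
star_details = plan_details("SELECT * FROM R WHERE b < ?")

print(covered_details)
print(star_details)
```

The first plan description should mention a covering index, while the second should not, since d and e are not stored in the index.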

5. Relational Algebra and Query Processing
Consider three tables R(a,b,c), S(d,e,f), and T(g,h,i).

(a) Consider the following SQL query:

SELECT R.b
FROM R, S, T
WHERE R.a = S.d
AND S.e = T.g
AND T.h > 21
AND S.f < 50
GROUP BY R.b
HAVING count(*) > 2
For each of the following relational algebra expressions, indicate if it is a correct translation of
the above query or not.
i. π_{R.b}(σ_{TOTAL>2}(γ_{R.b, count(*)→TOTAL}(σ_{T.h>21 AND S.f<50}(R ⋈_{R.a=S.d} (S ⋈_{S.e=T.g} T)))))

CORRECT INCORRECT
Solution:
CORRECT.

ii. π_{R.b}(σ_{TOTAL>2}(γ_{R.b, count(*)→TOTAL}(R ⋈_{R.a=S.d} (σ_{S.f<50}(S) ⋈_{S.e=T.g} σ_{T.h>21}(T)))))

CORRECT INCORRECT
Solution:
CORRECT.

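iii. π_{R.b}(σ_{TOTAL>2}(σ_{T.h>21 AND S.f<50}(γ_{R.b, count(*)→TOTAL}(R ⋈_{R.a=S.d} (S ⋈_{S.e=T.g} T)))))

CORRECT INCORRECT
Solution:
INCORRECT. The selection on T.h and S.f is applied after the grouping, but the output of γ contains
only R.b and TOTAL, so those attributes are no longer available.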

(b) For the following SQL query, show an equivalent relational algebra expression. You can give the
expression in the same format as we used above or you can draw it in the form of an expression
tree or logical query plan.
SELECT R.b
FROM R, S
WHERE R.a = S.d
AND R.b NOT IN (SELECT R2.b FROM R as R2, T WHERE R2.b = T.g)
Solution:
π_{R.b}(R ⋈_{R.a=S.d} S) − π_{R.b}(R ⋈_{R.b=T.g} T)
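The rewrite of NOT IN as a set difference can be sanity-checked on a small instance. The sketch below uses Python's sqlite3 module with hypothetical data, and expresses the difference with SQL's EXCEPT. It assumes set semantics and no NULL values in R.b or T.g; with duplicates or NULLs the two forms can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (a INTEGER, b INTEGER, c INTEGER);
CREATE TABLE S (d INTEGER, e INTEGER, f INTEGER);
CREATE TABLE T (g INTEGER, h INTEGER, i INTEGER);
-- Hypothetical sample data.
INSERT INTO R VALUES (1, 10, 0), (2, 20, 0), (3, 30, 0), (4, 40, 0);
INSERT INTO S VALUES (1, 0, 0), (2, 0, 0), (3, 0, 0);
INSERT INTO T VALUES (20, 0, 0), (40, 0, 0);
""")

# The original SQL query from the question.
original = conn.execute("""
    SELECT R.b FROM R, S
    WHERE R.a = S.d
      AND R.b NOT IN (SELECT R2.b FROM R AS R2, T WHERE R2.b = T.g)
""").fetchall()

# The relational algebra solution, written with EXCEPT for set difference.
algebra = conn.execute("""
    SELECT R.b FROM R, S WHERE R.a = S.d
    EXCEPT
    SELECT R.b FROM R, T WHERE R.b = T.g
""").fetchall()

print(set(original) == set(algebra))
```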

(c) A user just connected to a database server and submitted a SQL query in the form of a string.
Give four important steps involved in evaluating that SQL query and the order in which they
are performed. You only need to name the steps. No need to explain them.
Solution:

i. Query parsing: the DBMS parses the SQL string into an internal tree representation of the
query. Typically, a variety of checks are also performed at that time including syntax checks,
authorization checks, simple integrity constraint checks, etc.
ii. Query rewrite: the parse tree is converted into an initial logical query plan. During that step,
views are also rewritten and queries are flattened if possible.
iii. Query optimization: the query optimizer searches for an efficient physical query plan to execute
the query.
iv. Query execution: the DBMS actually executes the query and returns the results to the user.

(d) What is the difference between a logical and a physical query plan?

Solution:
A logical query plan is an extended relational algebra tree. A physical query plan is a logical query
plan with extra annotations that specify the (1) access path to use for each relation (whether to use
an index or a file scan and which index to use), the (2) implementation to use for each relational
operator, and (3) how the operators should be executed (pipelined execution, intermediate result
materialization, etc.).

6. Relational Algebra and Query Processing
Consider four tables R(a,b,c), S(d,e,f), T(g,h), U(i,j,k).

(a) Consider the following SQL query:

SELECT R.b, avg(U.k) as avg
FROM R, S, T, U
WHERE R.a = S.d
AND S.e = T.g
AND T.h = U.i
AND U.j = 5
AND (R.c + S.f) < 10
GROUP BY R.b
Draw a logical plan for the query. You may choose any plan as long as it is correct (i.e., no need to
worry about efficiency).

Solution:
Many solutions were possible including:

π R.b, avg
└─ γ R.b, avg(U.k) → avg
   └─ σ (U.j = 5) ∧ (R.c + S.f < 10)
      └─ ⋈ T.h = U.i
         ├─ ⋈ S.e = T.g
         │  ├─ ⋈ R.a = S.d
         │  │  ├─ R
         │  │  └─ S
         │  └─ T
         └─ U

(b) Consider the following two physical query plans. Give two reasons why plan B may be faster
than plan A. Explain each reason.

Plan A:

(On the fly) π A.d, B.d
└─ (On the fly) σ B.c < 2000
   └─ (Index nested loop) ⋈ A.b = B.b
      ├─ (Use B+ tree index) σ a > 10
      │  └─ A (B+ tree index on A.a)
      └─ B (B+ tree index on B.b)

Plan B:

(On the fly) π A.d, B.d
└─ (Hash join) ⋈ A.b = B.b
   ├─ σ a > 10
   │  └─ (File scan) A
   └─ σ B.c < 2000
      └─ (File scan) B

Solution:
The following are three possible reasons why plan B could be faster than plan A:
• If the selection predicate (a > 10) is not very selective, i.e. many tuples of A satisfy it, and
the index on A.a is unclustered, a file scan of A may be faster than using the index.
• If there are lots of matches, a hash join may be faster than an index nested loop join. The hash
join reads its input only once (unless the input is too large). The index nested loop join may end
up reading the same pages of B multiple times.
• Pushing the selection (B.c < 2000) down can get rid of lots of B tuples before the join,
reducing the cost of the join.

7. Operator Algorithms
Relation R has 90 pages. Relation S has 80 pages. Explain how a DBMS could efficiently join these
two relations given that only 11 pages can fit in main memory at a time. Your explanation should be
detailed: specify how many pages are allocated in memory and what they are used for; specify what
exactly is written to disk and when.

(a) Present a solution that uses a hash-based algorithm.

Solution:
The algorithm proceeds as follows:
• First, we split R into partitions:
– Allocate one page for the input buffer.
– Allocate 10 pages for the output buffers: one page per partition.
– Read in R one page at a time. Hash each tuple into one of 10 buckets. As the pages of the
different buckets fill up, write them to disk. Once we have processed all of R, write the
remaining incomplete pages to disk. At the end of this step, we have 10 partitions of R on
disk. Assuming uniform data distribution, each partition comprises 9 pages.
• Then, we split S into partitions the same way we split R (we must use the same hash function).
• For each pair of partitions that match:
– Allocate one page for the input buffer.
– Allocate one page for the output buffer.
– Read one 9-page partition of R into memory. Create a hash table for it using a different
hash function than above.
– Read the corresponding S partition into memory one page at a time. Probe the hash table
and output matches.
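The steps above can be sketched in Python as follows. This is a simplified in-memory simulation, not a real buffer manager: "disk" partitions are plain lists, tuples stand in for pages, and the function names are hypothetical. Python's built-in hash plays the role of the partitioning hash function, and the dict used in the second phase plays the role of the second, different hash function.

```python
from collections import defaultdict

MEMORY_PAGES = 11
NUM_PARTITIONS = 10           # 1 input buffer + 10 output buffers = 11 pages
R_PAGES, S_PAGES = 90, 80

# Assuming uniform distribution, each R partition has ceil(90 / 10) = 9 pages.
r_partition_pages = -(-R_PAGES // NUM_PARTITIONS)
# Join phase: one R partition + 1 input buffer + 1 output buffer.
join_phase_pages = r_partition_pages + 2

def partition(relation, key_index):
    """Phase 1: hash every tuple into one of NUM_PARTITIONS partitions."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for tup in relation:
        partitions[hash(tup[key_index]) % NUM_PARTITIONS].append(tup)
    return partitions

def partitioned_hash_join(r, s, r_key, s_key):
    """Join r and s on r[r_key] == s[s_key] using partition-then-probe."""
    r_parts = partition(r, r_key)
    s_parts = partition(s, s_key)   # same hash function as for R
    output = []
    for r_part, s_part in zip(r_parts, s_parts):
        # Phase 2: build an in-memory hash table on the R partition...
        table = defaultdict(list)
        for r_tup in r_part:
            table[r_tup[r_key]].append(r_tup)
        # ...then stream the matching S partition and probe the table.
        for s_tup in s_part:
            for r_tup in table.get(s_tup[s_key], []):
                output.append(r_tup + s_tup)
    return output

# Hypothetical usage: join on the first attribute of each relation.
r = [(i, f"r{i}") for i in range(50)]
s = [(i % 25, f"s{i}") for i in range(40)]
result = partitioned_hash_join(r, s, 0, 0)
print(len(result))
```

Only tuples that land in the same partition can match, which is why each pair of partitions can be joined independently within the 11-page budget.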

(b) Present a solution that uses a sort-based algorithm.

Solution:
The algorithm proceeds as follows:
• First, we sort R:
– Read 10 pages' worth of R tuples into memory, sort them, and write them to disk.
– Repeat for the next 10 pages until all R tuples have been processed.
– Now we have 9 sorted runs of 10 pages each on disk.
– Allocate one page per run and one page for the output.
– Merge the runs in sorted order and output the result into a file.
• Sort S the same way that we sorted R above.
• Read S and R one page at a time. Merge them and output matches.
Note that we can be more efficient if we use a priority queue and output runs of length twice
the size of memory in the first step of the algorithm. Then, we can join R and S together while
merging their individual runs.
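A minimal Python sketch of the basic version (sort each relation with 10-page runs, then merge-join) is shown below. It is an in-memory simulation under stated assumptions: TUPLES_PER_PAGE is a hypothetical value, runs are Python lists rather than files, and the function names are made up for illustration. It does not implement the priority-queue optimization mentioned in the note.

```python
import heapq

MEMORY_PAGES = 11
RUN_PAGES = 10          # pages sorted per run, as in the solution above
TUPLES_PER_PAGE = 4     # hypothetical page capacity

def make_sorted_runs(tuples, key):
    """Pass 0: sort RUN_PAGES pages at a time and 'write' each run out."""
    run_size = RUN_PAGES * TUPLES_PER_PAGE
    return [sorted(tuples[i:i + run_size], key=key)
            for i in range(0, len(tuples), run_size)]

def external_sort(tuples, key):
    """Sort via runs, then one merge pass (one input page per run + one output page)."""
    runs = make_sorted_runs(tuples, key)
    assert len(runs) + 1 <= MEMORY_PAGES, "too many runs for a single merge pass"
    return list(heapq.merge(*runs, key=key))

def merge_join(r_sorted, s_sorted, r_key, s_key):
    """Join two inputs that are already sorted on their join keys."""
    output, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_key(r_sorted[i]), s_key(s_sorted[j])
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the group of S tuples sharing this key value.
            j_end = j
            while j_end < len(s_sorted) and s_key(s_sorted[j_end]) == rk:
                j_end += 1
            # Pair every R tuple with this key against that S group.
            while i < len(r_sorted) and r_key(r_sorted[i]) == rk:
                output.extend(r_sorted[i] + s_tup for s_tup in s_sorted[j:j_end])
                i += 1
            j = j_end
    return output

def first(tup):
    return tup[0]

# Hypothetical data: 90 pages of R and 80 pages of S at 4 tuples per page.
r = [((i * 37) % 100, i) for i in range(90 * TUPLES_PER_PAGE)]
s = [((i * 53) % 120, i) for i in range(80 * TUPLES_PER_PAGE)]
r_runs = make_sorted_runs(r, first)
result = merge_join(external_sort(r, first), external_sort(s, first), first, first)
print(len(r_runs), len(result))
```

With these sizes, R produces 9 runs and S produces 8, so each merge pass fits in the 11 available pages.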
