0% found this document useful (0 votes)
5 views5 pages

Unit 3

The document covers the process of Query Processing and Optimization in Database Management Systems, detailing the steps of parsing, optimization, and evaluation involved in extracting data. It discusses measures for query cost estimation, including the evaluation of disk accesses and execution time, as well as techniques like pipelining and materialization to enhance performance. The content is structured around the unit's focus on the architecture and functioning of query processing systems in the context of a database course.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views5 pages

Unit 3

The document covers the process of Query Processing and Optimization in Database Management Systems, detailing the steps of parsing, optimization, and evaluation involved in extracting data. It discusses measures for query cost estimation, including the evaluation of disk accesses and execution time, as well as techniques like pipelining and materialization to enhance performance. The content is structured around the unit's focus on the architecture and functioning of query processing systems in the context of a database course.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

Lecture Notes

Session: 2024-25

Course Name :Database Management System Semester: 4th


Course Code: CSPC250T​ ​ ​ ​ ​

UNIT No. : 3
Query Processing and Optimization
Query Processing and Optimization process, measures of query cost estimation in query optimization,
pipelining and Materialization, Structure of query evaluation plans.

3.1 Query Processing and Optimization process

Query Processing is the activity performed in extracting data from the database. In query processing, it
takes various steps for fetching the data from the database. The steps involved are:

1.​ Parsing and translation


2.​ Optimization
3.​ Evaluation

The query processing works in the following way:


3.1.1 Parsing

During the parse call, the database performs the following checks: Syntax check, Semantic check, and
Shared pool check, after converting the query into relational algebra because certain activities for data
retrieval are included in query processing. First, high-level database languages like SQL are used to
translate the user queries that have been provided. It is transformed into expressions that can be applied
further at the file system’s physical level. Following this, the queries are actually evaluated along with a
number of query-optimizing transformations. Consequently, a computer system must convert a query
into a language that is readable and understandable by humans before processing it. Therefore, the best
option for humans is SQL or Structured Query Language.
Parser performs the following checks (refer to the detailed diagram):
Syntax check: concludes SQL syntactic validity.
Example:
SELECT * FORM employee

3.1.2 Optimization

During the optimization stage, the database must perform a hard parse at least for one unique DML
statement and perform optimization during this parse. This database never optimizes DDL unless it
includes a DML component such as a subquery that requires optimization. It is a process in which
multiple query execution plans for satisfying a query are examined and the most efficient query plan is
satisfied for execution. The database catalog stores the execution plans and then the optimizer passes the
lowest-cost plan for execution.

3.1.3 Evaluation
For this, with addition to the relational algebra translation, it is required to annotate the translated
relational algebra expression with the instructions used for specifying and evaluating each operation.
Thus, after translating the user query, the system executes a query evaluation plan.

Query Evaluation Plan

○​ In order to fully evaluate a query, the system needs to construct a query evaluation plan.
○​ The annotations in the evaluation plan may refer to the algorithms to be used for the particular
index or the specific operations.
○​ Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation
primitives carry the instructions needed for the evaluation of the operation.
○​ Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
○​ A query execution engine is responsible for generating the output of the given query. It takes the
query execution plan, executes it, and finally makes the output for the user query.

3.2 Measures of query cost estimation in query optimization


Query Cost is a cost in which the enhancer considers what amount of time your query will require
(comparative with absolute clump time). Then the analyzer attempts to pick the most ideal query plan by
taking a glance at your inquiry and insights of your information, attempting a few execution designs, and
choosing the most inexpensive of them.
The measures of query cost in DBMS can be done by creating a framework that can make numerous
designs for an inquiry. It tends to be finished by the means of contrasting every conceivable arrangement
as far as their assessed cost. For working out the net assessed cost of any arrangement, the expense of
every activity inside an arrangement ought to be set in a deterministic and consolidated cost to get the
net assessed cost of the query assessment plan.
Example: We utilize the number of square exchanges that is basically the block from the disk and the
quantity of the disk seeks to appraise the expense of a query assessment plan. Assuming that the disk
subsystem takes a normal of tT seconds to move a square of information and has a normal block access
time (disk lookup time in addition to rotational idleness) of tS seconds, then, at that point, an activity that
moves b obstructs and performs S looks for would take b ∗ tT + S ∗ tS seconds. The upsides of tT and tS
should be aligned for the disk framework utilization, however, normal qualities for top-end disk today
would be tS = 4 milliseconds and tT = 0.1 milliseconds, expecting a 4-kilobyte block size and an
exchange pace of 40 megabytes each second.
tT – time to transfer one block
tS – time for one to seek
Cost for b block transfers plus S seeks
b * tT + S * tS

The expense assessment of a query assessment plan is determined by keeping in mind the different assets
that follow as:
The number of disk accesses.
Time of Execution taken by the CPU to execute a query.
The involved Communication costs in either distributed or parallel database systems.
To gauge the expense of a query assessment plan, utilizing the number of blocks moved from the disk,
and the quantity of disk seeks looks for. For the most part, for assessing the expense, we must consider
the most pessimistic scenario that could occur which is the worst-case scenario. The clients accept that at
first, the information is perused from the disk as it were. However, there should be an opportunity that
the data is now present in the principle memory also called the main memory. Notwithstanding, the
clients generally disregard this impact, and because of this, the genuine expense of execution comes out
lesser than the assessed esteem.
The reaction time, i.e., the time that is expected so as to execute the arrangement could be utilized for
assessing the expense of the query assessment plan. Yet, because of the accompanying reasons, it
becomes hard to work out the reaction time without really executing the query assessment plan. As soon
as the query starts the process of execution, the time of reaction becomes reliant upon the substance put
away in support. In any case, this data is so hard to recover that when the query is in its upgraded mode,
or it isn’t accessible.
Whenever there is a framework with the various presence of disks available, the time of reaction relies
upon a cross-examination that is ready “what is the way that accesses gets to be circulated among the
disk available?”. It is hard to gauge without having itemized information on the information format
present over the disk. Thus, rather than limiting the time of reaction for any query assessment plan, the
analyzers observe that it is better to lessen the wholesome asset utilization of the query plan. Along with
these lines to appraise the expense of a query assessment plan, it is great to limit the assets utilized for
getting to the disk or utilization of the additional assets.
3.3 Pipelining
Database system processing in a satisfactory manner encompasses providing fast responses to data
retrieval and manipulation tasks, with two of the keywords being performance and responsiveness. A
concept that acts as the foundational element in improving batch processing performance is called
"pipeline." In this article, the network of rungs or pipes that organize the fetching of data from queries
for display will be discussed, (their structure, functioning, pros, and cons).
Pipelining in Query Processing
Pipelining in the query treatment means the method is based on the approach of splitting the query
processor into multiple mini-processes, which help to perform parallel tasks and, as a result, increase the
efficiency of the queries. The pipeline architecture organizes the series of operations that are performed
on the data in a way that the output of the current stage becomes the input for the next stage, therefore
ensuring that the data resulting from as many stages as possible is directly loaded into the computation
of the next stages that relieve the overall performance of the system.

Components of Pipeline in Query Processing


The pipeline in query processing typically consists of the following components:
Parsing and Optimization: In this stage, the selected query is parsed to locate the specific elements like
tables, columns, conditions, and so on. The nearest query optimizer must step in here and select the most
appropriate execution plan among several based on specific data access plans such as distribution of
indexes and use of join algorithms.
Execution: When the query has been optimized, this process is seen by execution phase of the query,
where each operation indicated in the execution plan is performed. Processing data may involve such
activities as disk accessing, performing join and aggregation operations, and applying the filtering
operations.
Result Generation: At the upcoming stage of the pipeline, the production output is made, and it is based
on which operations have been carried out earlier. This includes, but is not limited to, allocating,
grouping, or arranging the data as the query may direct.
Functioning of Pipeline in Query Processing
The piped pathways work on the same principle of pipelined execution, wherein the different stages of
query processing overlap to maximize the number of result rows fetched and to minimize the end-user
latency. With the data flowing through the pipeline, each stage is working alongside the others like a
team, working on the query data in a streaming manner, not having to wait for the entire query to
complete before moving downstream. Benefiting from this pipelined process resource are the CPU,
memory, and I/O that consequently inspire performance and response times.

3.4 Materialization
Materialization is an easy approach for evaluating multiple operations of the given query and storing the
results in the temporary relations. The result can be the output of any join condition, selection condition,
and many more. Thus, materialization is the process of creating and setting a view of the results of the
evaluated operations for the user query. It is similar to the cache memory where the searched data get
settled temporarily. We can easily understand the working of materialization through the pictorial
representation of the expression. An operator tree is used for representing an expression.
The materialization uses the following approach for evaluating operations of the given expression:
●​ In the operator tree, we begin from the lowest-level operations (at the bottom of the tree) in the
expression. The inputs to the lowest level operations are stored in the form of relations in the
database. For example, suppose we want to fetch the name of the student as 'John' from the
'Student' relation.​
The relation expression will be:​
σ name= "John" (Student)​
In this example, there is only one operation of selecting the name from the given relation. Also,
this operation is the lowest-level operation. So, we will begin by evaluating this selection
operation.
●​ Now, we will use an appropriate algorithm which is suitable for evaluating the operation. Like in
our example, we will use an appropriate selection algorithm for retrieving the name from the
Student relation.
●​ Then, store the result of the operation in the temporary relations.
●​ We use these temporary relations for evaluating the next-level operation in the operator tree. The
result works as an input for every next level up in the tree.
●​ Repeat these steps until all operators at the root of tree will be evaluated, and the final result of
the expression will be generated.
We also call the described evaluation as Materialized evaluation because the result of one operation is
materialized and used in the evaluation of next operation and so on.

You might also like