15 QueryOptimization
15 QueryOptimization
Query optimization
DB
 MG
                                   1
                                       DBMS Architecture
SQL INSTRUCTION
OPTIMIZER
                                    CONCURRENCY CONTROL
      MANAGEMENT OF ACCESS
            METHODS
      Index Files
                         System    DATABASE
                         Catalog
       Data Files
                                                            2
DB
 MG
                                           Query optimizer
                                                              3
DB
 MG
                                           Query optimizer
                                                                4
DB
 MG
                           Query optimizer
             SQL
            QUERY
      LEXICAL, SYNTACTIC
        AND SEMANTIC
           ANALYSIS
                                       5
DB
 MG
         Lexical, syntactic and semantic analysis
                                                                   6
DB
 MG
         Lexical, syntactic and semantic analysis
      Output
         Internal representation in (extended) relational
        algebra
      Why relational algebra?
         It explicitly represents the order in which operators
         are applied
            It is procedural (different from SQL)
         There is a corpus of theorems and properties
            exploited to modify the initial query tree
                                                             7
DB
 MG
                                    Query optimizer
                  SQL
                 QUERY
           LEXICAL, SYNTACTIC
             AND SEMANTIC               DATA
                ANALYSIS                DICTIONARY
        INTERNAL REPRESENTATION
      BASED ON RELATIONAL ALGEBRA
              ALGEBRAIC
             OPTIMIZATION
                                                     8
DB
 MG
                                 Algebraic optimization
                                                              9
DB
 MG
                                    Query optimizer
                  SQL
                 QUERY
           LEXICAL, SYNTACTIC
             AND SEMANTIC               DATA
                ANALYSIS                DICTIONARY
        INTERNAL REPRESENTATION
      BASED ON RELATIONAL ALGEBRA
              ALGEBRAIC
             OPTIMIZATION
             COST BASED
             OPTIMIZATION
                                                     10
DB
 MG
                                Cost based optimization
                                                              11
DB
 MG
                                 Cost based optimization
      Output
        Access program in executable format
           It exploits the internal structures of the DBMS
        Set of dependencies
           conditions on which the validity of the query plan
           depends
               e.g., the existence of an index
                                                                12
DB
 MG
                                             Query optimizer
                           SQL
                          QUERY
                   LEXICAL, SYNTACTIC
                     AND SEMANTIC                       DATA
                        ANALYSIS                        DICTIONARY
                INTERNAL REPRESENTATION
              BASED ON RELATIONAL ALGEBRA
                        ALGEBRAIC
                       OPTIMIZATION
                                                                         13
DB
 MG
      ACCESS PROGRAM              SET OF DEPENDENCIES
                                     Execution modes
      Compile and go
        Compilation and immediate execution of the
        statement
        No storage of the query plan
        Dependencies are not needed
                                                     14
DB
 MG
                                        Execution modes
                                                          15
DB
 MG
 Database Management Systems
Algebraic optimization
DB
 MG
                                     16
                                  Algebraic optimization
                           SQL
                          QUERY
                   LEXICAL, SYNTACTIC
                     AND SEMANTIC                       DATA
                        ANALYSIS                        DICTIONARY
                INTERNAL REPRESENTATION
              BASED ON RELATIONAL ALGEBRA
                        ALGEBRAIC
                       OPTIMIZATION
                                                                         17
DB
 MG
      ACCESS PROGRAM              SET OF DEPENDENCIES
                                 Algebraic optimization
                                                              18
DB
 MG
                                            Transformations
  1. Atomization of selection
            sF1 Ʌ F2 (E) ≡ sF2 (sF1 (E)) ≡ sF1 (sF2 (E))
                                                           19
DB
 MG
                                            Transformations
  1. Atomization of selection
            sF1 Ʌ F2 (E) ≡ sF2 (sF1 (E)) ≡ sF1 (sF2 (E))
  2. Cascading projections
            pX(E) ≡ pX (pX,Y(E))
                                                           20
DB
 MG
                                            Transformations
  1. Atomization of selection
            sF1 Ʌ F2 (E) ≡ sF2 (sF1 (E)) ≡ sF1 (sF2 (E))
  2. Cascading projections
            pX(E) ≡ pX (pX,Y(E))
  3. Anticipation of selection with respect to join
     (pushing selection down)
            sF (E1    E2) ≡ E1    (sF (E2))
            F is a predicate on attributes in E2 only
                                                           21
DB
 MG
                                              Transformations
                                                               22
DB
 MG
                                           Transformations
                                                               23
DB
 MG
                                           Transformations
                                                               24
DB
 MG
                                            Transformations
                                                               25
DB
 MG
                                          Transformations
                                                        26
DB
 MG
                                            Transformations
                                                        27
DB
 MG
                                            Transformations
                                                           28
DB
 MG
                                           Transformations
  9. Other properties
           sF1 V F2(E) ≡ (sF1 (E))  (sF2 (E))
           sF1 Ʌ F2(E) ≡ (sF1 (E))  (sF2 (E))
                                                      29
DB
 MG
                                        Transformations
                                                    30
DB
 MG
                                            Example
      Tables
          EMP (Emp#, ………, Dept#, Salary)
          DEPT (Dept#, DName,……………)
SQL query
                                                31
DB
 MG
              Example: Algebraic transformations
                                                                  32
DB
 MG
              Example: Algebraic transformations
Prop #1
                                                                  33
DB
 MG
              Example: Algebraic transformations
Prop #1
Prop #5
                                                                  34
DB
 MG
      Example: Algebraic transformations
Prop #3
                                             35
DB
 MG
              Example: Algebraic transformations
Prop #3
Prop #2 and #4
                                                                  36
DB
 MG
                                       Example: Query tree
pDept# pDept#,DName
sSalary>1000 DEPT
EMP
                                                          37
DB
 MG
                            Example: Cardinalities
                                                   38
DB
 MG
 Database Management Systems
DB
 MG
                                      39
                              Cost based optimization
                           SQL
                          QUERY
                   LEXICAL, SYNTACTIC
                     AND SEMANTIC                       DATA
                        ANALYSIS                        DICTIONARY
                INTERNAL REPRESENTATION
              BASED ON RELATIONAL ALGEBRA
                        ALGEBRAIC
                       OPTIMIZATION
                                                                         40
DB
 MG
      ACCESS PROGRAM              SET OF DEPENDENCIES
                                Cost based optimization
      It is based on
        Data profiles
           statistical information describing data distribution for
           tables and intermediate relational expressions
        Approximate cost formulas for access operations
           Allow evaluating the cost of different alternatives for
           executing a relational operator
                                                                 41
DB
 MG
 Database Management Systems
Data profiles
DB
 MG
                                 42
                                                 Table profiles
                                                                43
DB
 MG
                                              Table profiles
                                                                44
DB
 MG
                                                    Data profiles
                                                                      45
DB
 MG
 Database Management Systems
Access operators
DB
 MG
                                  46
                                                        Query tree
pDept# pDept#,DName
sSalary>1000 DEPT
EMP
                                                              47
DB
 MG
                                             Query tree
                                                     48
DB
 MG
                                           Sequential scan
                                                        49
DB
 MG
                                                 Sorting
                                                     50
DB
 MG
                                    Predicate evaluation
                                                           51
DB
 MG
                                                               B+-tree versus bitmap
                                                     Bitmap VS B-Tree
                                 600
                                 500
               Disk space (MB)
400
300
200
100
                                  0
                                       0   5    10   15   20   25        30   35   40   45
                                                                    NK
                                                          B-Tree     Bitmap
      B-tree                                   NRLen(Pointer)
      Bitmap                                   NR  NK  1 bit                Courtesy of Golfarelli, Rizzi,
                                                                              ”Data warehouse, teoria e
      Len(Pointer) = 48 bit                                                  pratica della progettazione”,
                                                                              McGraw Hill 2006          52
DB
 MG
                                      Predicate evaluation
                                                             53
DB
 MG
                             Example: Predicate evaluation
      5      F        Y       Piemonte       0        1         1
                                             1        1         0
                                             0        0         0
                                             0        0         0
                                             1        1         1
DB                                       RID 5                      54
 MG
                                         Predicate evaluation
                                                            55
DB
 MG
                                              Join operation
                                                               56
DB
 MG
                                                 Nested loop
a a
                        internal         a
                        or direct scan
                            join
                         attribute                         57
DB
 MG
                                                Nested loop
                                                            58
DB
 MG
                                                   Nested loop
      Efficient when
         inner table is small and fits in memory
            optimized scan
         join attribute in the inner table is indexed
            index scan
      Execution cost
         The nested loop join technique is not symmetric
         The execution cost depends on which table takes
         the role of inner table
                                                           59
DB
 MG
                                               Merge scan
                               right
      Left table       left    scan        Right table
                       scan
                   A                   A
                   a                   a
                                       b
                                       d
                   b
                   b
                   c
                                       e
                   e
                             join
                          attribute                      60
DB
 MG
                                                  Merge scan
                           j                 j
                           p                 z
                               Join
                                                                62
DB
 MG
                               Attribute
                                                      Hash join
                                                               63
DB
 MG
                                      Bitmapped join index
                                                             65
DB
 MG
                                           Bitmapped join
                                                          66
DB
 MG
                            Example: Bitmapped join
                                                       67
DB
 MG
                                           Bitmapped join
2 0 … 1 0 0 1 1
3 0 … 0 1 0 OR 1 = 1
4 1 … 0 0 0 0 0
  …     …    …     …     …         1                0            1
                                   …             …            …
                                                             68
DB
 MG
                                                     Bitmapped join
                                                                         69
DB
 MG
                                                Group by
      Sort based
         Sort on the group by attributes
         Next compute aggregate functions on groups
      Hash based
         Hash function on the group by attributes
         Next sort each bucket and compute aggregate
         functions
      Materialized views may be exploited to improve
      the performance of aggregation operations
                                                       70
DB
 MG
 Database Management Systems
DB
 MG
                                      71
                              Cost based optimization
      Inputs
         Data profiles
         Internal representation of the query tree
      Output
         “Optimal” query execution plan
         Set of dependencies
      It evaluates the cost of different alternatives for
         reading each table
         executing each relational operator
      It exploits approximate cost formulas for access
      operations
                                                            72
DB
 MG
                  General approach to optimization
                                                           73
DB
 MG
                 General approach to optimization
                                                         74
DB
 MG
                                                     Example
      Given 3 tables
         R, S, T
      Compute the join
                            R      S     T
      Execution alternatives
         4 join techniques to evaluate (for both joins)
         3 join orders
         In total, at most
             4 * 4 * 3 = 48 different alternatives
                                                          75
DB
 MG
                                                                                                Example
                                         R             1
                                                           S   2
                                                                   T
 R             1
                   S         2
                                 T           S         1
                                                           T   2
                                                                   R            R       1
                                                                                            T          2
                                                                                                           S
R INNER S INNER
T INNER T OUTER
                                                                                                               76
DB
 MG                     LEAF NODE
                       Best execution plan selection
                                                       77
DB
 MG
                       Best execution plan selection
                                                             78
DB
 MG