19/03/12
Machine Learning:
Algorithms and Applications
Floriano Zini
Free University of Bozen-Bolzano
Faculty of Computer Science
Academic Year 2011-2012
Lecture 4: 19th March 2012
    Evolutionary computing
These slides are mainly taken from A.E. Eiben and J.E. Smith,
Introduction to Evolutionary Computing
                                                                      1
                                                         19/03/12
 The Main EC Metaphor
    EVOLUTION                       PROBLEM SOLVING
     Environment                    Problem
           Individual               Candidate Solution
             Fitness                Quality
   Fitness → chances for survival and reproduction
   Quality → chance for seeding new solutions
   Fitness in nature: observed, 2ndary, EC: primary
 Motivations for EC
  Developing,
             analyzing, applying problem solving
  methods a.k.a. algorithms is a central theme in
  mathematics and computer science
  Time   for thorough problem analysis decreases
  Complexity    of problems to be solved increases
  Consequence:ROBUST PROBLEM SOLVING
  technology needed
                                                               2
                                                     19/03/12
  Evolutionary machine learning
     Wehave corresponding sets of inputs &
      outputs and seek model that delivers correct
      output for every known input
  Modelling example: load applicant creditibility
  British
         bank evolved
  creditability model to
  predict loan paying
  behavior of new
  applicants
  Evolving:   prediction
  models
  Fitness:
           model accuracy
  on historical data
                                                           3
                                                              19/03/12
Modelling example: mushroom edibility
  Classify
           mushrooms as
  edible or not edible
  Evolving:   classifications
  models
  Fitness:
           classification
  accuracy on training
  set of edible and not
  edible mushrooms
  EC metaphor
   Apopulation of individuals exists in an environment
    with limited resources
   Competition   for those resources causes selection of
    those fitter individuals that are better adapted to the
    environment
   Theseindividuals act as seeds for the generation of
    new individuals through recombination and mutation
   The
       new individuals have their fitness evaluated and
    compete (possibly also with parents) for survival.
   Over time Natural selection causes a rise in the
    fitness of the population
                                                                    4
                                                           19/03/12
                General scheme of EAs
                     Parent selection
                                         Parents
Intialization
                                           Recombination
                                           (crossover)
                 Population
                                            Mutation
Termination
                                         Offspring
                    Survivor selection
    EA scheme in pseudo-code
                                                                 5
                                                                           19/03/12
  Main EA components
     Representation
     Population
     Evaluation
     Selection    (parent selection, survivor selection)
     Variation    (mutation, recombination)
     Initialization
     Termination      condition
  Representation
                                                          Genotype space
Phenotype space                Encoding
                           (representation)                R0c01cd
                                                        B0c01cd
                                                          G0c01cd
                                      Decoding
                              (inverse representation)
In order to find the global optimum, every feasible solution must be
represented in genotype space
                                                                                 6
                                                                                   19/03/12
  Population
  Role:
        holds the candidate solutions of the
  problem as individuals (genotypes)
  Formally,a population is a multiset of individuals,
  i.e. repetitions are possible
  Population
             is the basic unit of evolution, i.e., the
  population is evolving, not the individuals
  Selection         operators act on population level
  Variation         operators act on individual level
  Evaluation (fitness) function
  A.k.a.        quality function or objective function
  Role:
       Represents the task to solve, the requirements to adapt to
         (can be seen as “the environment”)
       enables selection (provides basis for comparison)
             e.g., some phenotypic traits are advantageous, desirable, e.g.
               big ears cool better, these traits are rewarded by more offspring
               that will expectedly carry the same trait
  Assigns
          a single real-valued fitness to each phenotype
  which forms the basis for selection
       So the more discrimination (different values) the better
  Typically        we talk about fitness being maximised
       Some problems may be best posed as minimisation
         problems, but conversion is trivial
                                                                                         7
                                                                      19/03/12
  Selection
 Role:
   Identifies individuals
       to become parents
       to survive
   Pushes  population towards higher fitness
   Usually probabilistic
       high quality solutions more likely to be selected than low
         quality
       but not guaranteed
       even worst in current population usually has non-zero
         probability of being selected
   This
       stochastic nature can aid escape from local
   optima
  Selection mechanism example
Example: roulette wheel selection
                                                       1/6 = 17%
                                                       B
         fitness(A) = 3                         A             C
         fitness(B) = 1                    3/6 = 50%    2/6 = 33%
         fitness(C) = 2
In principle, any selection mechanism can be used for
parent selection as well as for survivor selection
                                                                            8
                                                                     19/03/12
  Survivor selection
  A.k.a. replacement
  Most EAs use fixed population size so need a way
    of going from (parents + offspring) to next
    generation
  Often deterministic (while parent selection is
    usually stochastic)
       Fitness based : e.g., rank parents+offspring and take
         best
       Age based: make as many offspring as parents and
         delete all parents
  Sometimes a combination of stochastic and
  deterministic (elitism)
  Variation operators
  Role:      to generate new candidate solutions
  Usually
          divided into two types according to their
  arity (number of inputs):
       Arity 1 : mutation operators
       Arity >1 : recombination operators
           Arity   = 2 typically called crossover
           Arity   > 2 is formally possible, seldomly used in EC
  There
        has been much debate about relative
  importance of recombination and mutation
       Nowadays most EAs use both
       Variation operators must match the given representation
                                                                           9
                                                                      19/03/12
      Mutation
    Role: causes small, random variance
    Acts on one genotype and delivers another
    Element of randomness is essential and differentiates it from
      other unary heuristic operators
       before            1 1 1 1 1 1 1
        after            1 1 1 0 1 1 1
      Recombination
    Role: merges information from parents into offspring
    Choice of what information to merge is stochastic
    Most offspring may be worse, or the same as the parents
    Hope is that some are better by combining elements of
      genotypes that lead to good traits
                       Parents
           cut                           cut
      1 1 1 1 1 1 1               0 0 0 0 0 0 0
      1 1 1 0 0 0 0               0 0 0 1 1 1 1
                      Offspring
                                                                           10
                                                                    19/03/12
  Initialisation / Termination
  Initialisation   usually done at random
      Need to ensure even spread and mixture of possible allele
        values
      Can include existing solutions, or use problem-specific
        heuristics, to “seed” the population
  Termination      condition checked every generation
      Reaching some (known/hoped for) fitness
      Reaching some maximum allowed number of generations
      Reaching some minimum level of diversity
      Reaching some specified number of generations without
        fitness improvement
  Example: the 8-queens problem
  Place 8 queens on an 8x8 chessboard in
  such a way that they cannot check each other
                                                                         11
                                                            19/03/12
 The 8-queens problem: representation
Phenotype:
a board configuration
Genotype:                                 Obvious mapping
a permutation of
the numbers 1 - 8            1 3 5 2 6 4 7 8
 The 8-queens problem: fitness evaluation
   Penalty
          of one queen: the number of queens
   she can check
   Penalty
           of a configuration: the sum of
   penalties of all queens
   Note:   penalty is to be minimized
   Fitness
         of a configuration: inverse penalty to
   be maximized
                                                                 12
                                                               19/03/12
  The 8-queens problem: mutation
   Small variation in one permutation, e.g.:
   •  swapping values of two randomly chosen positions,
    1 3 5 2 6 4 7 8                     1 3 72 6 4 58
  The 8-queens problem: recombination
Combining two permutations into two new
permutations:
•  choose random crossover point
•  copy first parts into children
•  create second part by inserting values from other parent:
   • in the order they appear there
   • beginning after crossover point
   • skipping values already in child
   13526 47 8                           13542 87 6
   87654 32 1                           87624 13 5
                                                                    13
                                                                  19/03/12
The 8-queens problem: selection
  Parent    selection:
      Pick 5 parents and take best two to undergo
        crossover
  Survivor    selection (replacement)
      When inserting a new child into the population,
        choose an existing member to replace by:
      sorting the whole population by decreasing fitness
      enumerating this list from high to low
      replacing the first with a fitness lower than the given
        child
8 Queens Problem: summary
                 Note that is only one possible
           set of choices of operators and parameters
                                                                       14
                                                                                 19/03/12
 Typical behavior of an EA
  Stages in optimizing on a 1-dimensional fitness landscape
                                         Early stage:
                                         quasi-random population distribution
                                         Mid-stage:
                                         population arranged around/on hills
                                         Late stage:
                                         population concentrated on high hills
 Typical run: progression of fitness
         Best fitness in population
                                      Time (number of generations)
Typical run of an EA shows so-called “anytime behavior”
                                                                                      15
                                                                                                                          19/03/12
                             Typical run: progression of fitness
                                  Best fitness in population
                                                                   Progress in 2nd half
                                                                    Progress in 1st half
                                                                           Time (number of generations)
                                                Are long runs beneficial?
                                                             It depends on how much you want the last bit of progress
                                                             May be better to do more short runs
                      Is it worth expending effort on smart initialization?
Best fitness in population
                             F                                       F: fitness after smart initialisation
                                                                      T: time needed to reach level F after random
                                                                      initialisation
                                                               T
                                                                         Time (number of generations)
                   •             Answer: it depends.
                                  • Possibly good, if good solutions/methods exist
                                  • Care is needed
                                                                                                                               16
                                                                                                    19/03/12
          EAs as problem solvers: Goldberg view (1989)
Performance of methods on problems
                                                                 Special, problem tailored method
                                        Evolutionary algorithm
                                                                                 Random search
                                                      Scale of “all” problems
                                       EAs as problem solvers: Michalewicz view (1996)
                                                                                  EA 4
  Performance of methods on problems
                                                     EA 2
                                                                                     EA 3
                                         EA 1
                                                       Scale of “all” problems
                                                                                                         17
                                                        19/03/12
        Genetic Algorithms
GA Quick Overview
  Developed:       USA in the 1970’s
  Early   names: J. Holland, K. DeJong, D. Goldberg
  Holland’s
           original GA is now known as the
  simple genetic algorithm (SGA)
  Other   GAs use different:
      Representations
      Mutations
      Crossovers
      Selection mechanisms
                                                             18
                                                       19/03/12
 Representation
 SGA technical summary tableau
Representation       Binary Strings
Recombination        N-point or uniform
Mutation             Bitwise bit-flipping with fixed
                     probability
Parent selection     Fitness-Proportionate
Survivor selection   All children replace parents
Speciality           Emphasis on crossover
                                                            19
                                                    19/03/12
SGA reproduction cycle
1.    Select parents for the mating pool
      (size of mating pool = population size)
2.    Shuffle the mating pool
3.    Apply crossover for each consecutive pair
      with probability pc, otherwise copy parents
4.    Apply mutation for each offspring (bit-flip
      with probability pm independently for each
      bit)
5.    Replace the whole population with the
      resulting offspring
SGA operators: 1-point crossover
  Choose  a random point on the two parents
  Split parents at this crossover point
  Create   children by exchanging tails
  Pc   typically in range (0.6, 0.9)
                                                         20
                                                                   19/03/12
SGA operators: mutation
  Altereach gene independently with a
    probability pm
  pm       is called the mutation rate
        Typically between 1/pop_size and
          1/chromosome_length
SGA operators: Selection
Main idea: better individuals get higher chance
  Chances proportional to fitness
  Implementation:           roulette wheel technique
        Assign to each individual a part of the roulette wheel
          Spin the wheel n times to select n individuals
                 B
   C         2/6 = 33%                        Fitness(A)   3
1/6 = 17%
                                              Fitness(B)   2
             A                                Fitness(C)   1
          3/5 = 50 %
                                                                        21
                                                        19/03/12
An example after Goldberg ‘89 (1)
  Simpleproblem: max x2 over {0,1,…,31}
  GA approach:
      Representation: binary code, e.g., 01101 ↔ 13
      Population size: 4
      1-point xover, bitwise mutation
      Roulette wheel selection
      Random initialization
  We    show one generational cycle done by hand
X2 example: selection
                                                             22
                        19/03/12
X2 example: Crossover
X2 example: Mutation
                             23
                                                                   19/03/12
 The simple GA
  Has     been subject of many (early) studies
       still often used as benchmark for novel GAs
  Shows      many shortcomings, e.g.,
       Representation is too restrictive
       Mutation & crossover operators only applicable for bit-
         string & integer representations
       Selection mechanism sensitive for converging
         populations with close fitness values
       Generational population model (step 5 in SGA repr.
         cycle) can be improved with explicit survivor selection
 Alternative Crossover Operators
  Performance
              with 1 Point Crossover depends on
 the order that variables occur in the
 representation
     more likely to keep together genes that are near each
       other
     Can never keep together genes from opposite ends of
       string
     This is known as Positional Bias
     Can be exploited if we know about the structure of our
       problem, but this is not usually the case
                                                                        24
                                                                 19/03/12
    n-point crossover
   Choose   n random crossover points
   Split along those points
   Glue parts, alternating between parents
   Generalisation    of 1 point (still some positional bias)
   Uniform crossover
  Assign    'heads' to one parent, 'tails' to the other
  Flip   a coin for each gene of the first child
  Make    an inverse copy of the gene for the second child
  Inheritance    is independent of position
                                                                      25
                                                                    19/03/12
      Crossover OR mutation?
   Decade  long debate: which one is better /
      necessary / main-background
   Answer        (at least, rather wide agreement):
          it depends on the problem, but
          in general, it is good to have both
          both have another role
          mutation-only-EA is possible, xover-only-EA would not
            work
      Crossover OR mutation? (cont’d)
Exploration: Discovering promising areas in the search
space, i.e. gaining information on the problem
Exploitation: Optimising within a promising area, i.e.
using information
There is co-operation AND competition between them
  Crossover is explorative, it makes a big jump to an
 
area somewhere “in between” two (parent) areas
  Mutation is exploitative, it creates random small
 
diversions, thereby staying near (in the area of ) the
parent
                                                                         26
                                                       19/03/12
   Crossover OR mutation? (cont’d)
  Only  crossover can combine information from two
    parents
  Only mutation can introduce new information
    (alleles)
  To hit the optimum you often need a ‘lucky’
    mutation
                                                            27