Genetic Algorithms

I. Introduction and Overview

What Are Genetic Algorithms?

Genetic algorithms are part of evolutionary computing, a rapidly growing area of
artificial intelligence.  Inspired by Darwin's theory of evolution, Genetic
Algorithms (GAs) are computer programs that create an environment in which
populations of candidate solutions compete and only the fittest survive - a kind
of evolution on a computer.

Genetic Algorithms are a search method that can be used both for solving
problems and for modelling evolutionary systems.  Because a GA is heuristic (it
estimates a solution), you will not know whether the solution it finds is exact.
Most real-life problems are like that: you estimate a solution rather than
calculate it exactly.

Many problems have no practical formula for computing the solution, either
because the problem is too complex or because calculating the solution exactly
would take too long.  One example is space optimization - finding the best way
to place objects of varying size into a room so that they take up as little
space as possible.  For such problems a heuristic method is a more feasible
approach.

GAs differ from other heuristic methods in several ways. One important
difference is that a GA works on a population of possible solutions, whereas
other heuristic methods (such as A*) iterate over a single solution.
Another difference is that GAs are probabilistic rather than deterministic.
      
What Are They Used For?

Optimization

Optimization is the art of selecting the best alternative from a given set of
options.  In any optimization problem there is an objective function (or
objective) that depends on a set of variables.  Reaching an "optimum" does not
necessarily mean reaching a maximum; it means the best value of the objective
function.

GAs are excellent for tasks requiring optimization and are highly effective in
any situation where many inputs (variables) interact to produce a large number
of possible outputs (solutions).  Some example areas are:
  • Optimization: Data fitting, clustering, trend spotting, path finding, ordering.
  • Management: Distribution, scheduling, project management, courier routing, container packing, task assignment, timetables.
  • Financial: Portfolio balancing, budgeting, forecasting, investment analysis and payment scheduling.
  • Engineering: Structural design (e.g. beam sizes), electrical design (e.g. circuit boards), mechanical design (e.g. optimizing weight, size & cost), process control, network design (e.g. computer networks).
  • R & D: Curve and surface fitting, neural network connection matrices, function optimization, fuzzy logic, population modeling, molecular modeling and drug design.
Advantages and Disadvantages

A GA has a number of advantages. It can quickly scan a vast solution set.
Bad proposals do not affect the end solution negatively; they are simply
discarded. The inductive nature of the GA means that it does not need to know
any rules of the problem - it works by its own internal rules. This makes GAs
very useful for complex or loosely defined problems.

GAs have drawbacks too, of course. The great advantage of GAs - that they find
a solution through evolution - is also their biggest disadvantage. Evolution is
inductive; in nature life does not evolve towards a good solution, it evolves
away from bad circumstances. This can cause a species to evolve into an
evolutionary dead end. Likewise, GAs risk finding a suboptimal solution.

Consider this example: a GA must find the highest point in a landscape. 
The algorithm will favor solutions that lie on a peak (a hill or whatever).
As the individuals gradually begin to propose solutions in the same
vicinity (somewhere on the peak), they begin to look alike. In the end
you may have individuals that are almost identical. The best of these
suggests the top of the peak as the solution. But what if there is another,
higher peak at the other end of our map? It is too hard for the individuals
to venture away from their current peak. The ones that do will be eliminated,
because their solution is worse than the ones we have. An individual might
get "lucky", but that would mean its "genes" are very different from the rest
of the population, so this is unlikely. In other words, the algorithm produces
a suboptimal solution - and we may not even know it. 

Genetic Algorithms (GAs) were invented by John Holland and developed by him and
his students and colleagues.  Holland's book "Adaptation in Natural and
Artificial Systems" was published in 1975.
        

II. Biological Background

Chromosomes


All living organisms consist of cells, and each cell contains the same set of
chromosomes. Chromosomes are strings of DNA and serve as a model for the whole
organism. A chromosome consists of genes, which are blocks of DNA. Each gene
encodes a particular protein; basically, each gene encodes a trait, for example
eye color. The possible settings for a trait (e.g. blue, brown) are called
alleles. Each gene has its own position in the chromosome, called its locus.

A complete set of genetic material (all chromosomes) is called the genome. A
particular set of genes in the genome is called the genotype.

Reproduction

During reproduction, the first thing that occurs is recombination (or
crossover). Genes from the parents combine in some way to create a whole new
chromosome. The newly created offspring can then be mutated. Mutation means
that the elements of DNA are changed slightly. These changes are mainly caused
by errors in copying genes from the parents.

Fitness

The fitness of an organism is measured by the success of the organism in 
its life. 

Evolution

Evolution is a process that each and every species in nature is subjected to.
In evolution, each species faces the problem of searching for beneficial
adaptations to a rapidly changing environment.
       

III. Search Space

Search space

To solve a problem we look for the best solution among all possible ones.
The space of all feasible solutions is called the search space (also the state
space). Each point in the search space represents one feasible solution, and
each feasible solution can be "marked" by its value, or fitness, for the
problem. The desired solution is one point (or more) among the feasible
solutions - that is, one point in the search space.

Looking for a solution is therefore equivalent to looking for an extreme
(minimum or maximum) in the search space. The whole search space may be known
by the time the problem is being solved, but usually only a few points in it
are known; other points are generated in the process of finding a solution.
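As a minimal illustration (a toy example of my own, not from the text above): take the search space of all 8-bit strings, 256 points in total, and mark each point with a fitness equal to the number of ones it contains. A Python sketch:

     # Toy search space: all 8-bit strings, fitness = number of ones.
     # This is an illustrative assumption, not a problem discussed in the text.
     from itertools import product

     search_space = [''.join(bits) for bits in product('01', repeat=8)]   # 256 feasible solutions

     def fitness(chromosome):
         """Mark a point in the search space with its value."""
         return chromosome.count('1')

     best = max(search_space, key=fitness)   # exhaustive search works here,
     print(best, fitness(best))              # but not for realistically large spaces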

       

IV. Genetic Algorithm

Basic Description
    
     Start with a set of possible solutions (represented by chromosomes) -
     the population. Solutions from one population are taken and used to
     form a new population, in the hope that the new population will be
     better than the old one. Solutions are selected to reproduce by mating
     (crossover) according to their fitness - the more suitable they are,
     the more chances they have to produce offspring. The cycle is repeated
     until some condition is satisfied.
     
Outline of the Genetic Algorithm
  1. Generate a random population of n chromosomes (suitable solutions for the problem).
  2. Establish a method to evaluate the fitness f(x) of each chromosome x in the population.
  3. Create a new population by repeating the following steps until the new population is complete:
    • Selection - select chromosomes from the population according to some fitness-based scheme.
    • Crossover - form new offspring by crossing over the selected parents.
    • Mutation - with some mutation probability, mutate the new offspring at each locus (position in the chromosome).
  4. Use the newly generated population for a further run of the algorithm (a minimal sketch of this loop follows below).
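A minimal Python sketch of this outline. The fitness function, population size and rates are placeholder assumptions; selection, crossover and mutation here are the simple binary versions described later in this text.

     import random

     POP_SIZE, CHROM_LEN, GENERATIONS = 20, 16, 50      # placeholder parameters
     CROSSOVER_RATE, MUTATION_RATE = 0.7, 0.01

     def fitness(chrom):
         # placeholder objective: count of 1-bits (step 2: a way to evaluate f(x))
         return sum(chrom)

     def select(population):
         # roulette-wheel (fitness-proportionate) selection
         total = sum(fitness(c) for c in population)
         pick = random.uniform(0, total)
         running = 0
         for c in population:
             running += fitness(c)
             if running >= pick:
                 return c
         return population[-1]

     def crossover(p1, p2):
         # single-point crossover, applied with probability CROSSOVER_RATE
         if random.random() < CROSSOVER_RATE:
             point = random.randint(1, CHROM_LEN - 1)
             return p1[:point] + p2[point:]
         return p1[:]

     def mutate(chrom):
         # flip each bit with a small probability
         return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in chrom]

     # step 1: random initial population of n chromosomes
     population = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
                   for _ in range(POP_SIZE)]
     for _ in range(GENERATIONS):            # step 4: repeat with the new population
         # step 3: build a new population by selection, crossover and mutation
         population = [mutate(crossover(select(population), select(population)))
                       for _ in range(POP_SIZE)]
     print(max(population, key=fitness))     # best chromosome found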

V. Operators of a GA

A. Encoding a Chromosome
   The chromosome should contain information about the solution it represents.

   1. Binary Encoding
      One way of encoding is a binary string.  The chromosome could look like this:
     Chromosome 1 
   1101100100110110    
     Chromosome 2
   1101011000011110   
      Each bit in the string can represent some characteristic of the solution, or it
      could represent whether or not some particular characteristic is present. Another
      possibility is that the chromosome contains just 4 numbers, where each number
      is represented by 4 bits (so the highest representable number is 15).
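      For instance, a small sketch of the 4-numbers-in-4-bits idea just mentioned:

     # Decode a 16-bit chromosome into four 4-bit numbers (each 0..15).
     chromosome = "1101100100110110"          # Chromosome 1 from above
     numbers = [int(chromosome[i:i + 4], 2) for i in range(0, 16, 4)]
     print(numbers)                           # [13, 9, 3, 6]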

  2. Permutation Encoding

      Permutation encoding can be used in ordering problems, such as the travelling
      salesman problem (TSP) or a task-ordering problem.  Every chromosome is a string
      of numbers that represents a sequence. In the TSP each number would represent
      a city to be visited (a small sketch of scoring such a chromosome follows the
      examples below).
     Chromosome 1 
 1 4 7 9 6 3 5 0 2 8    
     Chromosome 2
 9 3 2 5 8 1 6 0 4 7     
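      As a sketch, a permutation chromosome can be scored as a TSP tour; the city coordinates below are made-up placeholders, not data from the text.

     import math

     cities = [(0, 0), (1, 5), (2, 3), (5, 2), (6, 6),   # placeholder coordinates,
               (8, 3), (7, 0), (4, 7), (3, 1), (9, 5)]   # one per city 0..9

     def tour_length(chromosome):
         """Total length of the closed tour visiting cities in the given order."""
         return sum(math.dist(cities[a], cities[b])
                    for a, b in zip(chromosome, chromosome[1:] + chromosome[:1]))

     print(tour_length([1, 4, 7, 9, 6, 3, 5, 0, 2, 8]))  # fitness of Chromosome 1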
 3.  Value Encoding

     Direct value encoding can be used in problems where complicated values, such as
     real numbers, are needed and where binary encoding would not suffice.
     While value encoding is very good for some problems, it is often necessary to
     develop specific crossover and mutation techniques for these chromosomes
     (a small mutation sketch follows the examples below).
     Chromosome 1 
 A B E D B C A E D D    
     Chromosome 2
 N W W N E S S W N N      
     In chromosome 1 above, A could represent a particular task, B another, and so on.
     In chromosome 2, N could be north, S south, etc., so the chromosome could
     represent a path through a maze.
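     A brief sketch of a value-encoded chromosome of real numbers with a problem-specific (Gaussian) mutation; the rate and noise level here are illustrative assumptions.

     import random

     chromosome = [1.23, -0.42, 5.33, 0.07, 2.89]        # real-valued genes

     def value_mutate(chrom, rate=0.1, sigma=0.5):
         """Add small Gaussian noise to each gene with probability `rate`."""
         return [g + random.gauss(0, sigma) if random.random() < rate else g
                 for g in chrom]

     print(value_mutate(chromosome))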
4. Tree Encoding
     Tree encoding is used when programs or expressions themselves are to evolve.
     In tree encoding every chromosome is a tree of some objects, such as
     functions or commands in a programming language.  LISP is often used for
     this because programs in LISP can be represented directly in this form and
     easily parsed as a tree.
     
     Example of a problem: finding a function from given input and output values.
     Task: Find a function that gives the best output for all given inputs.
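     As a rough sketch (Python rather than LISP, but the same idea of a chromosome that is an expression tree), assuming nested tuples of the form (operator, left, right) and placeholder input/output samples:

     # A chromosome as an expression tree: ('+', 'x', ('*', 'x', 'x')) means x + x*x
     import operator
     OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

     def evaluate(tree, x):
         """Recursively evaluate an expression tree for a given input x."""
         if tree == 'x':
             return x
         if isinstance(tree, (int, float)):
             return tree
         op, left, right = tree
         return OPS[op](evaluate(left, x), evaluate(right, x))

     chromosome = ('+', 'x', ('*', 'x', 'x'))
     # fitness: how well the tree reproduces given input/output pairs (placeholder data)
     samples = [(0, 0), (1, 2), (2, 6), (3, 12)]
     error = sum((evaluate(chromosome, x) - y) ** 2 for x, y in samples)
     print(error)                             # 0.0 here: x + x*x fits these samples exactly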


     More Later (I hope).

Reproduction

Reproduction is a process in which individual strings (chromosomes) are copied
according to their fitness.  Intuitively one can think of the fitness function
as some measure of profit, utility or goodness that is to be maximized.  Copying 
strings according to their fitness or goodness means that strings with a higher
value have a higher probability of contributing one or more offspring in the next 
generation.

This operator is an artificial version of natural selection - a Darwinian
survival of the fittest among string creatures.  In natural populations fitness
is determined by a creature's ability to survive predators, pestilence and
other obstacles to adulthood and subsequent reproduction.  In the artificial
setting of a GA, the objective function is the final arbiter of the
string-creature's life or death.

There are many methods for selecting the best chromosomes: roulette wheel
selection, Boltzmann selection, tournament selection, rank selection,
steady-state selection and others.

Roulette Wheel

     Simple reproduction allocates offspring strings using a roulette wheel
     with slots sized according to fitness. This is a way of choosing members
     from the population of chromosomes in proportion to their fitness: the
     better the fitness of a chromosome, the greater the chance it will be
     selected. It is not guaranteed, however, that the fittest member goes on
     to the next generation.

     Imagine a roulette wheel in which every chromosome in the population
     occupies a slot sized according to its fitness. The wheel is "spun" and
     a marble falls into one of the slots; the chromosome in that slot is
     selected to mate. The wheel is spun once for each chromosome that is to
     be chosen.
Steady State

In a steady-state genetic algorithm one member of the population is changed at
a time. To perform selection, a member of the population is chosen according to
its fitness; it is copied and the copy is mutated. A second member of the
population is then selected and replaced by the mutated string. In crossover,
two members of the population are chosen and a single child is created, which
then replaces a member of the population. Any selection method can be used to
choose the individual for mutation or the parents. There are a number of
different replacement strategies:
  • replace the worst member (often gives too rapid convergence)
  • replace a randomly chosen member
  • select the member to be replaced using negative (inverse) fitness.
The major difference between steady-state and generational GAs is that for each P members of the population generated by the generational GA there are 2P selections. Consequently the selection strength and genetic drift of a steady-state GA are twice those of the generational GA. The steady-state GA therefore appears twice as fast, although it can lose out in the long term because it does not explore the landscape as well as the generational GA.

Tournament

In tournament selection n individuals are selected at random and the fittest of them is chosen. The most common form is binary tournament selection, where just two individuals compete.

Elitism

The best chromosome (or a few of the best chromosomes) is copied unchanged into the next generation; the rest of the new population is chosen in the classical way. Elitism can rapidly increase the performance of a GA because it prevents the loss of the best solution found so far. A variation is to eliminate an equal number of the worst solutions, i.e. for each "best chromosome" carried over, a "worst chromosome" is deleted.

Rank Selection

Roulette-wheel selection has problems when the fitnesses differ greatly. For example, if the best chromosome's fitness takes up 90% of the roulette wheel, the other chromosomes have only a slim chance of being selected. Rank selection first ranks the population and then gives every chromosome a fitness based on this ranking: the worst has fitness 1, the second worst 2, and so on up to the best, which has fitness N (the number of chromosomes in the population).
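Brief sketches of tournament selection and rank-based fitness assignment, under the same placeholder-fitness assumption as the earlier roulette example:

     import random

     def tournament_select(population, fitnesses, n=2):
         """Pick n individuals at random and return the fittest (n=2 is binary tournament)."""
         contenders = random.sample(range(len(population)), n)
         best = max(contenders, key=lambda i: fitnesses[i])
         return population[best]

     def rank_fitness(fitnesses):
         """Replace raw fitnesses by ranks: worst gets 1, best gets N."""
         order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
         ranks = [0] * len(fitnesses)
         for rank, i in enumerate(order, start=1):
             ranks[i] = rank
         return ranks

     print(rank_fitness([90.0, 3.0, 4.0, 3.0]))   # [4, 1, 3, 2]: best gets 4, worst gets 1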

Crossover

Single Point Crossover

Randomly choose a crossover point. For offspring 1, copy everything from parent 1 before the crossover point and everything from parent 2 after that point into the new chromosome. For offspring 2, do the reverse.
     Chromosome 1 
 A B C D E F G H I J    
     Chromosome 2
 0 1 2 3 4 5 6 7 8 9    
     Offspring 1 

 A B C 3 4 5 6 7 8 9 
     Offspring 2
 
 0 1 2 D E F G H I J
       The crossover point above was 3.
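       A small sketch of single-point crossover on list chromosomes (the crossover point here is chosen at random rather than fixed at 3):

     import random

     def single_point_crossover(parent1, parent2):
         """Return two offspring produced by cutting both parents at one random point."""
         point = random.randint(1, len(parent1) - 1)
         child1 = parent1[:point] + parent2[point:]
         child2 = parent2[:point] + parent1[point:]
         return child1, child2

     p1 = list("ABCDEFGHIJ")
     p2 = list("0123456789")
     print(single_point_crossover(p1, p2))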


Two Point Crossover

        Same as above except this time two crossover points are randomly chosen.
     Chromosome 1 
 A B C D E F G H I J    
     Chromosome 2
 0 1 2 3 4 5 6 7 8 9    
     Offspring 1 

 A B 2 3 4 5 6 7 8 J 
     Offspring 2
 
 0 1 C D E F G H I 9
        Here the crossover points were at 2 and 9.
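        And the two-point version, as a sketch under the same assumptions:

     import random

     def two_point_crossover(parent1, parent2):
         """Exchange the segment between two random crossover points."""
         a, b = sorted(random.sample(range(1, len(parent1)), 2))
         child1 = parent1[:a] + parent2[a:b] + parent1[b:]
         child2 = parent2[:a] + parent1[a:b] + parent2[b:]
         return child1, child2

     print(two_point_crossover(list("ABCDEFGHIJ"), list("0123456789")))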
 Uniform Crossover

       A certain number of genes are randomly selected to be "swapped" between the parents.
     Chromosome 1 
 A B C D E F G H I J    
     Chromosome 2
 0 1 2 3 4 5 6 7 8 9
    
     Offspring 1

 0 B C D E 5 G H I 9
     Offspring 2 

 A 1 2 3 4 F 6 7 8 J 
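     A sketch of uniform crossover, where each gene is swapped independently; the 0.5 swap probability is a common, assumed choice rather than something fixed by the text.

     import random

     def uniform_crossover(parent1, parent2, swap_prob=0.5):
         """Swap each gene between the parents independently with probability swap_prob."""
         child1, child2 = list(parent1), list(parent2)
         for i in range(len(parent1)):
             if random.random() < swap_prob:
                 child1[i], child2[i] = parent2[i], parent1[i]
         return child1, child2

     print(uniform_crossover(list("ABCDEFGHIJ"), list("0123456789")))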
 Specific Crossover

        A specific crossover designed for a specific problem.
        For example, with a permutation encoding, one crossover point is
        selected; genes are copied from parent 1 up to that point, then the
        second parent is scanned and, if a number is not yet included in the
        offspring, it is added. The process repeats until the offspring is
        complete.
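        A sketch of that permutation crossover, using the permutation chromosomes from earlier:

     import random

     def permutation_crossover(parent1, parent2):
         """Copy parent 1 up to a random point, then append parent 2's genes in order,
         skipping any number already included."""
         point = random.randint(1, len(parent1) - 1)
         child = parent1[:point]
         child += [gene for gene in parent2 if gene not in child]
         return child

     p1 = [1, 4, 7, 9, 6, 3, 5, 0, 2, 8]
     p2 = [9, 3, 2, 5, 8, 1, 6, 0, 4, 7]
     print(permutation_crossover(p1, p2))    # e.g. with point 4: [1, 4, 7, 9, 3, 2, 5, 8, 6, 0]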
Arithmetic Crossover

     Some arithmetic operation is performed on the two parent strings to create
     a new string. In the example below the operation was a bitwise AND.
     Chromosome 1 
 1 0 1 1 1 0 1 1 1 0 1   
     Chromosome 2
 1 1 0 0 1 0 0 0 1 1 1    
     Offspring 1 
 1 0 0 0 1 0 0 0 1 0 1  
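     A one-line sketch of the AND example above:

     p1 = [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
     p2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1]
     offspring = [a & b for a, b in zip(p1, p2)]   # bitwise AND, gene by gene
     print(offspring)                              # [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1]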

Mutation

A typical mutation is to flip a bit with a small probability, for example 1% of the bits processed. How mutation is performed depends on the encoding, just as crossover does, and it can easily be overdone.
     Binary Encoding
     Chromosome
 1 1 0 0 1 0 0 0 1 1 1
     Offspring
 1 0 0 0 1 0 0 0 1 1 1

     Permutation Encoding
     Chromosome
 9 4 6 2 3 1 0 5 7 8
     Offspring
 9 6 4 2 3 1 0 5 7 8
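     Sketches of the two mutations shown above: a bit flip for binary encoding and a swap of two positions for permutation encoding (the 1% rate is the one mentioned above):

     import random

     def bit_flip_mutation(chromosome, rate=0.01):
         """Flip each bit independently with a small probability."""
         return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]

     def swap_mutation(chromosome):
         """Swap two randomly chosen positions (keeps the permutation valid)."""
         child = chromosome[:]
         i, j = random.sample(range(len(child)), 2)
         child[i], child[j] = child[j], child[i]
         return child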
A Genetic Algorithm By Hand