Genetic Algorithms : what they are , how they work , knapsack and travels sales man problem

Gourine ayoub
12 min readMar 17, 2021

Introduction

in this article you learn about genetic algorithms where they come from, what they are , how to use them in your problem solving and how they are used to get a solution to some famous problems like the knapsack problem and the travel-salesman problem.

To understand the genetic algorithms in a good way you need first to understand some other technical words and concepts that the GA is based on.

Optimization problems:

wiki : In mathematics, computer science and economics, an optimization problem is the problem of finding the best solution from all feasible solutions.

My definition : basically an optimization problems is where we have a problems that has different solution, those solution may varies in terms of efficiency based on what we are judging those solution ( success/efficiency factor different from one problem to another) and since we are always interested in finding the best solution that we can, we call this an optimization problem.

Evolutionary Computing and evolutionary problems :

this is a filed in computer since where we are interested in studding the behavior of algorithms and there evolving based on Darwinian principles of natural selection , you don't need to know those principals cause they will be discussed in the GA steps , all what you need to know is that GE is inspired from the natural selection that happens in the real word (this article does not discuss if you agree with the Darwinian principals or not it just to clear the fact that the GA methods are inspired from them).

Genetic Algorithms :

what they are ?

  • Genetic algorithms (GA) are adaptive methods used to solve search and optimization problems , they are inspired from the genetic process of the natural organisms so in a way the algorithms mimics the real organism genetics process.
  • The GA follows the principals of naturals selection that goes by “the strong(best/fittest) ones will survive” , and by that we have many generation of solution and the solution that will last will be the more fittest and more better as our problem solution.
  • In a way we can say that the solutions will evolve during the GA process .
  • GA implementation can be different from a problem to another and the factors that we jugged based on them the efficiency or the fitness of the solution is also different from one problem to another.
  • GA starts from an Initial Population (a number of solution fixed in the size) of possible solution and from them it will create the next generation solutions by applying some steps like crossover and mutation which are steps that happened in the naturals selection process of leaving organisms.

How the GA works ?

in order to understand how the GA works you need to understand this principals that the GA is based on and/or they are a part of it’s process :

  • An initial population is the first set of possible solutions or you can say the first generation of solution.
  • A solution is represented like a real genome so we can manipulate it , one representation will be a binary chain a “string of 0 and 1” (you will understand more in the real problems solving example).
  • Calculating the fitness : calculating the fitness of a solution means giving it a rate of successes that will determine it’s rank between other solutions , each GA has a proper fitness function that will take a solution as an input and return a fitness rate, this function need to be programmed based on how you are evaluating the solutions.
  • The Selection process : this process is where we select some solution over the other based on the fitness but also with a factor of randomness always there.
  • The crossover process : in this process will try to change between the solution like taking a part from solution A and a part from solution b and form a new solution, remember the solutions are represented like an organism genome so we can apply operation like switching parts, adding or removing parts .
  • The mutation process : in this process we change one part of the same solution like switching 0 bit to 1 or the inverse.
  • The steps of crossover and mutation has a randomness factor which means it’s likely that they will not happened to all parent solutions (previous solutions) , they have a probability of accruing.
  • Best solution will likely get selected over and over.
  • The crossover tend to be more important then mutation cause it simply helps to rapidly more explore the wide search area where the mutation explore in the same gene by changing one bit of it.
  • The result of crossover solutions be called offspring's.

The General process of the GA :

Generating initial population :

In this step a first generation of possible solution will be generated.

Calculating the fitness using the fitness function :

In this step you need to create a function that takes as an input a solution and give back a fitness rate that is calculated based on some factors that you define as an example a solution of finding a short path will check the distance as a factor maybe also the traffic, factors differentiate from one problem to another and defining a good fitness function will effect a lot your GA results.

Selection process :

In this step the generated of solution that wan introduced in the initial population phase , will go through a selection process where some solution will be picked based on the fitness gave by the fitness function, on way of doing this is to randomly pick two solutions and make them go against each other in a tournament like environment where the fittest solution will be picked from each two solutions.

Crossover process :

In this step we process using the selected solution and and from two different solution we will get different parts so that a part of one solution A will get with another part of other solution B . resulting in what we call offspring’s.

Mutation process :

In this step we take those offspring’s and make a change in the same solution by changing one it’s part , like changing a 0 to 1 or the inverse if it’s a binary string or changing an letter in if it’s a chain of letters.

after that the cycle will repeat we pick the best solution after the mutation and do the process again and again resulting each time in a new generation of solutions

When the GA algorithm stops ?

The GA will stops when it reaches a convergence state where the new generation of solution looks similar to the previous generation.“A gene is said to have converged when 95% of the population share the same value. The population is said to have converged when all of the genes have converged.”

Famous problems where the GE can be used :

The Travel sales man problem:

Now, You who are reading this article it’s for sure that you are leaving somewhere in this world and any where you are leaving I am pretty sure that there is other places around you and the country that you are leaving in have districts or zones you can call it anything you want basically it’s other places around you , imagine that you want to make a trip and you want to visit a place called X and to go to that place you have to go through other zones/districts , for the sake of the problem lets say that where you live there is 5 zones A B C D and X and you live in the zone A , the zones/districts are in a map and of course between each zone and other there is a known distance , so your goal here is to go to zone X starting from where you live zone A so we have Start = A and Goal = X and also we have these information :

knowing that each zones that have a distance between them are adjacent in the map so basically to go to zone X you can take this roads :

A →B → D → X : 230 km OR A → C → D→ X: 480 km

and you want to take the route with less distance so obviously you will take the rout A →B → D → X that has a less distance

and that’s it that's the travel sales man problem it’s just that the maps get bigger so we have more zones and that results in a very big number or combination and probabilities that needs a long long time to get calculated and that necessitate some algorithms like the GA to solve this optimization problem and give us a good or the best rout to take.

The solution for this problem can be modeled like a binary chain 01… where each bit 0/1 indicates the visiting(1) or non-visiting(0) of the zone(A/B..) or with binary string indicating the taking route ABC… .

we will not go through each step solving this problem with GA instead we will give just indications , but don;t worry we will go throw all the steps in the next problem (the knapsack problem).

the fitness function in this problems can be calculated based on factors like the distance and some other things like traffic and the safety of the route , this will change based on the goal that you want your solution to achieve.

The cross over and mutation will change parts in the binary or letter chain resulting in new generation of solutions.

The knapsack problem :

Imagine that’s it’s your very very lucky day and you walk into a shop suddenly the owner tells you I will give you this bag and you have this X items let’s say ( TV, phone, PS, tablet ,PC, ) and he tells you that you can take any thing from those items and put it in the bag and take it for free , you start getting excited and you are about to do that , suddenly he says wait but there is one condition the weight of the bag has to not exceed X kg so be careful , now you are in a situation where you have to pick the most valuable things based on what you think and make it in the bag so it does not exceed the X size and take it , and that’s it that's the knapsack problem and now we will see how we can solve this problems with the GA process.

Note that there is many variant of the knapsack problem where different constrains are applied like the number of items in each solution the weight and other constrains.

Solving the knapsack problem with GA :

Initial population :

we will start with X possible solutions and we can say that X=10 (this can change in your own modeling ) so we have 10 different solution each solution with 100 different items named from item0 … to item99 each item has some characteristic like:

  • Value : how much it worth in terms of money
  • Weight : how much it weights in term of Kg.
  • Shape: the shape of the item and how it will fit in the bag.

we can add more characteristic like surface and other things , this characteristic are important cause they will be used in the fitness function.

each solution will be an array of size 100, with 1 and 0 where the 1 indicates the existing of item[index] in the solution and 0 for the inverse

example: solution1 : |0 |1| 1| 0| ..….. this can be translated to :

item0 not found | item1 found | item2 found | item3 not found …..

The fitness function :

this part here is up to the programmer and it’s so important to make a good fitness function so the GA can give better results as an example we can say that our fitness function will be :

  • Input : one possible solution.
  • Output : a rate of fitness from -1 to 1 (this is based on how you want to model your solution).
  • Process :
  1. get the difference between the weight of the solution if it’s lower a value from 0 to one will be assigned (the greater the wightValue means the weight is more close the the wight limit) if the weight is greater a value from -1 to 0 will be assigned (the greater the wightValue means the weight is more close to be equal to limit).
  2. get the value of each of item in the solution and some them up , a PriceValue will be assigned.
  3. based on the shape of each item and how they fit together we can also give a ShapeValue where maybe we can use a 3d dimension visualizer an based on the space left empty in the bag we can set the ShapeValue.
  4. a final process will be use those three variables in a formula to get the fitness rate , you should note that this solution is just an example you can change the factor based on the problems and also change how you evaluate the solution and the items inside the solution to best fit your problem.

Selection process:

  • we calculate each solution fitness in the 10 solutions and get the fitness rate.
  • each time we make pick randomly 2 solutions and in a tournament style we take the fittest until we have the number that we want .

Cross Over process:

  • from the selected solutions we take tow by tow and then we do:
  1. take part from solution 1. ex : part from 0… to 25 and from 90..to 100
  2. take a part from solution 2 ex: part from 25 … to 90.
  3. the new solution will have both parts from solution 1 and solution2 resulting in a new solution called an offspring’s.
  4. repeat the process with other solution until we got the number that we want.

Mutation process:

  • From the offspring’s generated by the cross over process we change parts in each solution that we have simply bu changing a bit or more from 0 to 1 or 1 to 0 and that will mean removing or adding an item to some of items in that’s solution.

Repeating the process :

  • we will repeat the same process until we reach a state of convergence where the new resulted generation are some very similar to the previous one and some time we can reach a full convergence state where the new generation are the same as the parents .

Strength and Weakness of GA :

  • The GA is robust and flexible and can be applied to solve many problems.
  • a GA will find a good acceptable quickly solution but it will not always find the best solution.
  • GA can and most of the time is hybridized with other techniques to give a better solution.
  • The main weakness of the GA is in it’s convergence cause ones a highly fit gen (has a good fitness rate) dominates the generation the possibility of another better solution to appear is eliminated and the new generation will start to only look like the parents and that the convergence cause the cross over will only change parts between the parents so the randomness will be only in the mutation which is not that great rate of randomness so it results in a week search or exploitation of solutions once the convergence appears.
  • GA can be used with Neural nets to get better solutions.
  • The GA is best used in the numerical function optimization.
  • The sarcastic thing is that the flexibility of the GA is both and advantage cause it can be used to solve more problems but also a weakness points cause the coder need to do a really good work in defining the fitness function and make it very meaningful for the GA.

Note :

if you want a detail research about the GA you may want to read this 15 page article : https://www.researchgate.net/publication/2383112_Genetic_Algorithms_Overview

note : some of the illustration i made theme and some other are just downloaded from other sources.

conclusion :

  • GA is a great thing to learn and it really gives you a new perspective on how to solve optimization problems.
  • know your problems and understand theme before you decide what algorithm to use on it cause that will make a big difference.
  • GA in most cases is used with other techniques to get the best solutions.
  • the way how you value your solution (the fitness function) has a great importance in the coding of the GA implementation for your problem.
  • nature is a great place and till now we still inspire things from it in every aspect of our world.

--

--