Since we now know that optimization algorithms are in fact search algorithms, why don’t we simply compute the function’s value at every possible input and search for the maximum or minimum? That approach is actually good enough if your input space (domain) is limited or contains only a few points.
But what if the domain is the entire 2-D plane, which is infinite, or simply too large to enumerate? Evaluating the function at every possible input is certainly not feasible there.
But do we need to search the entire space in such cases?
Why don’t we first consider an example?
What is the minimum of the function x^2 - 5?
Now wait, before you skip this post, dismissing it as too simple to be intellectually challenging, I want you to write down the steps you took before you screamed out the answer in your mind.
The standard calculus method for finding this value dictates the following steps:
- Differentiate the function with respect to the unknown
- Equate this to zero to get a value of x.
- Substitute this back in the function to get the value of the optimum (minima/maxima)
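The three steps above can be traced concretely in a few lines of Python for f(x) = x^2 - 5:

```python
# Tracing the calculus recipe for f(x) = x^2 - 5:
# 1. Differentiate with respect to x:  f'(x) = 2x
# 2. Equate to zero and solve:         2x = 0  =>  x = 0
# 3. Substitute back into f:           f(0) = 0**2 - 5 = -5

def f(x):
    return x**2 - 5

x_star = 0.0          # the only point where f'(x) = 2x vanishes
minimum = f(x_star)   # the optimum (minimum) value: -5
```

Note how steps 1 and 2 happen on paper; the code merely records their result and evaluates the function once at the single candidate point.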
So what have we done here? The 1-D space that forms this function’s domain certainly contains an infinite number of points. Why didn’t we consider all of them?
The reason is that we used a property of the function that lets us shrink the search set from infinitely many points down to just a few: the inputs x where the tangent is parallel to the x-axis, or in other words where the gradient is zero.
But there are a few things we took for granted while performing these steps:
- We either knew the gradient of this function or it was easy to calculate. But what if the function had many unknowns and lots of terms instead of just one or two variables?
- We actually knew the function whose optimum we wanted to find. In real-life applications you cannot always ‘logically’ write down a function or equation for the quantity you want to optimize.
Okay, this last point may not make much sense to you right now, but look up real-life applications where optimization heuristics such as swarm algorithms or evolutionary algorithms have been used, and you will see that they were not always “explicitly” optimizing a mathematical function. Of course there is some mathematical formulation underlying the whole thing, but it is not readily visible.
So what would you do in such cases? I know what you are thinking: there are so many techniques and tools out there to solve these complex functions, why bother?
Well, you are right, provided you don’t worry much about the cost and time they would use up. These are precisely the correct/classical approaches that we spoke about in the last post.
Of course there are tools out there, but if they were perfect, the world would be a happier place.
Solving complex functions sometimes involves solving ordinary or partial differential equations, which takes a lot of time, and that is assuming a correct/classical approach for solving the function even exists. Sometimes these tools themselves employ heuristics, so we can’t really call them exact approaches at all.
Still not convinced? Then try minimizing this function by hand (*evil laugh*).
Recommended variable values are a = 20, b = 0.2 and c = 2π (d is the number of dimensions).
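Those parameter values (a = 20, b = 0.2, c = 2π over d dimensions) match the well-known Ackley benchmark function, so assuming that is the function pictured, here is a minimal Python implementation you could hand to an optimizer:

```python
import math

def ackley(x, a=20.0, b=0.2, c=2 * math.pi):
    """Ackley benchmark function; global minimum 0 at the origin."""
    d = len(x)
    mean_sq = sum(xi * xi for xi in x) / d
    mean_cos = sum(math.cos(c * xi) for xi in x) / d
    return (-a * math.exp(-b * math.sqrt(mean_sq))
            - math.exp(mean_cos) + a + math.e)
```

The many cosine ripples are what make this function a nightmare for pure calculus: setting the gradient to zero yields a huge number of stationary points, almost all of them local minima.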
So let’s not digress any more. What if I told you that you do not need to know the gradient of the function? And when I say you don’t need the gradient, I also mean you don’t need to know the function at all: you can make do with input-value pairs. The exact formulation need not be known, only the function’s values at the sampled inputs.
Heuristic optimization algorithms have a lot to offer in this regard. Evolutionary algorithms and swarm algorithms, for example, can perform this optimization (search) without ever knowing the gradient of the function.
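To see how far input-value pairs alone can take you, here is the crudest possible gradient-free method, plain random search; a sketch for illustration, not something the post itself prescribes:

```python
import random

def random_search(f, dim, iters=10000, lo=-5.0, hi=5.0):
    """Minimize f using only (input, value) pairs: no gradient, no formula.

    f is a black box: we may evaluate it at points, nothing more.
    """
    best_x = [random.uniform(lo, hi) for _ in range(dim)]
    best_val = f(best_x)
    for _ in range(iters):
        x = [random.uniform(lo, hi) for _ in range(dim)]
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

On the toy problem from earlier, `random_search(lambda p: p[0]**2 - 5, dim=1)` lands very close to the true minimum of -5, even though the routine never sees the expression x^2 - 5, only its outputs. Smarter heuristics (evolutionary, swarm) replace the blind sampling with guided sampling, but keep the same black-box interface.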
Optimization heuristics get you to an answer that may not be perfectly accurate, but arrives much faster. It is a speed-for-accuracy trade-off you are willing to make: in most cases you do not need the exact answer, and something close enough will do.
“Heuristic” itself means something that may not be perfect but is good enough for immediate goals, so such algorithms mostly come without a proof of correctness. Still, with sufficient restrictions on the search space, these algorithms perform very well in practice.
Yes, if you want a perfectly accurate answer and computation time and cost are of no concern to you, there are many linear and non-linear optimization techniques available that employ complex mathematical machinery. But as I said, most applications simply can’t afford the computation time or do not require such accuracy.
For example, your biometric face-recognition app has to optimize matching scores before it decides whether yours is the ‘genuine’ face or not. Are you willing to wait an hour before gaining access?
So in this post we have covered the following ideas:
- The difference between a classical approach and a heuristic approach.
- When, and why, a classical approach is infeasible.
- When to use a heuristic optimization algorithm, and what trade-offs we are willing to make.
Lots of research has been done in this field and it will surely continue to be a hot topic in mathematics. In the next post, we will talk about some of this research and discuss some assumptions we make while using these heuristic algorithms.
And by the way, Particle Swarm Optimization takes a second or less to calculate the minimum of that function in the image.
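For the curious, a bare-bones Particle Swarm Optimization sketch really is short. This is my own minimal version with conventional parameter choices (inertia w = 0.7, cognitive and social weights c1 = c2 = 1.5), not the exact implementation behind that timing:

```python
import random

def pso(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5):
    """Minimize black-box f with a minimal particle swarm."""
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm's best so far

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                # pull toward the particle's own best and the swarm's best
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Only function evaluations are used, never gradients, which is exactly why a swarm can chew through a rippled landscape like the one pictured in well under a second.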