Should I use a Gradient Based Optimization Algorithm or a Direct Search One like a Genetic Algorithm?
Foreword
This is one of the most common questions optimization beginners ask us at our Nexus trainings, and probably one of the most difficult to answer. The truth is that there is no single correct answer, or rather, there is a multitude of possible correct answers. In this post I will share my personal standpoint, but first let me put things in perspective.
Numerical Optimization
Generally speaking, Numerical Optimization Algorithms are algorithms designed to solve optimization problems, i.e. to maximize or minimize a real function by systematically choosing input values from within a defined domain. Bottom line: optimization means finding the “best available” values of some objective function given a defined domain and constraints.
Any Optimization Algorithm searches for the optimal solution through a series of iterative evaluations and comparisons of the objective function, varying the so-called design variables within the domain of interest.
There are two main ways an Optimization Algorithm defines its iterations, i.e. the way it selects the next solutions to be analyzed given a current set of results:
- Direct Search: the algorithm selects the next solutions by looking only at the objective (and constraint) values of the pre-existing evaluations. Values are compared and a new set of evaluations is scheduled to form the next iteration.
- Gradient Based Search: the algorithm selects the next solutions by looking at objective (and constraint) values and their derivatives. The set of evaluations required to move to the next iteration is chosen using information from both function values and derivatives, usually by estimating the search direction in the domain of interest that maximizes the objective (or constraint) improvement.
General considerations
Generally speaking, these are the main pros and cons of Direct Search and Gradient Based optimization algorithms:

Direct Search:
- works even with non-continuous domains and objective (constraint) functions
- generally more robust than Gradient Based, i.e. results are less dependent on the starting point
- less sensitive to numerical noise

Gradient Based:
- requires continuous and differentiable (first order) objective (constraint) functions
- generally converges faster, requiring far fewer evaluations than Direct Search
- results and convergence strongly depend on the starting point
- convergence is sensitive to numerical noise and derivative approximations
Starting from the above considerations, here is my (personal) list of Golden Rules for choosing between Gradient Based and Direct Search algorithms:
- the problem has discrete design variables, non-continuous or non-differentiable objective (or constraint) functions → use Direct Search
- I know my problem is likely to have many possible local optima → use Direct Search
- I know my problem is noisy, i.e. objective and constraint functions suffer from numerical rounding, noise, etc. → seriously consider using Direct Search
- attempt Gradient Based in all other cases, especially if derivatives are known in closed form, i.e. do not require finite differences to be computed
A Few Examples
The Rosenbrock function
In this section, we will compare three different optimization algorithms to minimize the so-called Rosenbrock function in a 2D domain X=[-2.5, 2.5], Y=[-2.5, 2.5]. The Rosenbrock function is defined as:
F(X,Y) = 100*(Y - X^2)^2 + (X - 1)^2

and has a global minimum at X=1, Y=1, where F(1,1) = 0.0.
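As a quick sanity check, the definition above can be coded in a few lines of plain Python (no external libraries needed):

```python
def rosenbrock(x, y):
    """Rosenbrock function: F(X,Y) = 100*(Y - X^2)^2 + (X - 1)^2."""
    return 100.0 * (y - x**2) ** 2 + (x - 1.0) ** 2

# The global minimum sits at (1, 1), where the function is exactly zero
print(rosenbrock(1.0, 1.0))   # 0.0
print(rosenbrock(-1.0, 2.5))  # 229.0, far from the optimum
```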
The rationale of this example is to compare the number of function evaluations required by different algorithms to converge to the optimal solution.
The BFGS algorithm is a Gradient Based algorithm designed to solve unconstrained nonlinear optimization problems. It uses a hill-climbing technique that seeks a stationary point using an approximation of the Hessian matrix of the problem (more info Here).
Results for the Rosenbrock function starting from the initial point X=-1, Y=2.5 are reported below. The implementation of the BFGS algorithm available within Nexus converged to the global optimum in 46 iterations, corresponding to 106 function evaluations.
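The same experiment can be sketched with SciPy's BFGS implementation as a stand-in for Nexus; note that iteration and evaluation counts will differ from the figures above, since line-search and finite-difference settings are not the same:

```python
from scipy.optimize import minimize

def rosenbrock(v):
    x, y = v
    return 100.0 * (y - x**2) ** 2 + (x - 1.0) ** 2

# BFGS with finite-difference gradients, started from (-1, 2.5)
res = minimize(rosenbrock, x0=[-1.0, 2.5], method="BFGS")
print(res.x)     # close to the global optimum (1, 1)
print(res.nfev)  # number of function evaluations actually spent
```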
The Nelder-Mead algorithm is a direct search method (more info Here). It iterates by defining and modifying a simplex, i.e. a polytope of n+1 vertices in n dimensions, where n represents the number of design variables. The simplex is modified across iterations, attempting to get closer to the minimum. Results for the Rosenbrock function are reported below. The implementation of the Nelder-Mead algorithm available within Nexus converged to the global optimum in 28 iterations, corresponding to 104 function evaluations.
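The same run can be reproduced with SciPy's Nelder-Mead (again, counts will not match the Nexus numbers exactly):

```python
from scipy.optimize import minimize

def rosenbrock(v):
    x, y = v
    return 100.0 * (y - x**2) ** 2 + (x - 1.0) ** 2

# Nelder-Mead uses only function values: no derivatives are ever requested
res = minimize(rosenbrock, x0=[-1.0, 2.5], method="Nelder-Mead")
print(res.x, res.nfev)
```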
Genetic Algorithms are a family of Direct Search methods that mimic the natural evolutionary process by maintaining and evolving a population of solutions across iterations. The best individuals have the best chances of combining with each other to generate the next generation (more info Here).
When applied to the Rosenbrock function, the single objective Genetic Algorithm available in Nexus converged to the optimal solution in 84 iterations, each with 24 members. The total number of function evaluations was 1096.
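SciPy does not ship a genetic algorithm, but its differential evolution optimizer is a closely related population-based direct search and illustrates the same trade-off: no derivatives needed, but a much larger evaluation budget than BFGS:

```python
from scipy.optimize import differential_evolution

def rosenbrock(v):
    x, y = v
    return 100.0 * (y - x**2) ** 2 + (x - 1.0) ** 2

# A population evolves inside the 2D box [-2.5, 2.5] x [-2.5, 2.5]
res = differential_evolution(rosenbrock, bounds=[(-2.5, 2.5), (-2.5, 2.5)], seed=42)
print(res.x, res.nfev)
```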
The Schwefel function
In this second example, we will compare two different optimization algorithms to minimize the so-called Schwefel function in a 2D domain X=[-512, 512], Y=[-512, 512]. The Schwefel function is defined as:
F(X,Y) = 838 + X*sin( abs(X)^0.5 ) + Y*sin( abs(Y)^0.5 )
and has a global minimum at X=-420.56, Y=-420.75, where F(X,Y)=0.0611.
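The definition above translates directly to Python; note that the value at the minimum is close to zero but not exactly zero, because the 838 offset is a rounded constant:

```python
import math

def schwefel(x, y):
    """2D Schwefel function as defined above, with the 838 offset."""
    return 838.0 + x * math.sin(math.sqrt(abs(x))) + y * math.sin(math.sqrt(abs(y)))

# Evaluate near the global minimum: the result is a small positive number
print(schwefel(-420.97, -420.97))
```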
The Schwefel function is multimodal, with multiple local minima. The rationale here is to compare the robustness of different algorithms, i.e. their capability to converge to the global optimum instead of the closest local one.
In this example, the BFGS Algorithm fails to converge to the global optimum when started from an arbitrary starting point. As shown below, searches started from X=[-300,-300] and X=[-310, 100] converge to two local minima:
- X=[-300,-300] → X=[-203.81, -203.81], F(X)=434.31 requiring 4 iterations and 18 function evaluations
- X=[-310, 100] → X=[-420.95, 302.53], F(X)=118.47 requiring 7 iterations and 12 function evaluations
The global optimum is reached by selecting X=[-310,-310] as the starting point, converging in 5 iterations and 9 function evaluations.
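The start-point dependence is easy to reproduce with SciPy's BFGS as a stand-in for the Nexus implementation; the exact minima found and the iteration counts may differ slightly from the figures above:

```python
import numpy as np
from scipy.optimize import minimize

def schwefel(v):
    return 838.0 + sum(xi * np.sin(np.sqrt(abs(xi))) for xi in v)

# Each start descends into whichever basin of attraction contains it:
# only the third one leads to the global optimum
for x0 in ([-300.0, -300.0], [-310.0, 100.0], [-310.0, -310.0]):
    res = minimize(schwefel, x0=x0, method="BFGS")
    print(x0, "->", np.round(res.x, 2), round(res.fun, 2))
```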
In this example, the Genetic Algorithm is capable of exploring the whole design space, identifying the most promising areas of interest and focusing function evaluations there, getting very close to the theoretical optimum.
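Again using differential evolution as an evolutionary stand-in (SciPy has no genetic algorithm), the population-based search reaches the global basin without any starting point being supplied at all:

```python
import numpy as np
from scipy.optimize import differential_evolution

def schwefel(v):
    return 838.0 + sum(xi * np.sin(np.sqrt(abs(xi))) for xi in v)

# The population covers the whole [-512, 512]^2 box, so local basins do not trap it
res = differential_evolution(schwefel, bounds=[(-512, 512), (-512, 512)], seed=7)
print(np.round(res.x, 2), round(res.fun, 4))
```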
Conclusive Remarks
This post provides two examples of minimization problems and compares the behavior of different optimization algorithms, in an attempt to highlight and derive some general guidelines. When both Gradient Based and Direct Search approaches are equally possible, i.e. when the domain of interest is continuous and the function to be minimized is differentiable, Gradient Based algorithms generally converge faster (in terms of function evaluations) than Direct Search algorithms, but are more prone to getting trapped in local minima, depending on the selected starting point.