- Consider the problem of optimizing the function $f(x)$. Our general pipeline from Calculus is to compute $f'(x)$ for $x$ in our domain of interest such that $f'(x) = 0$.
Approaches
- Derivative Free Optimization [^1] involves the following steps (a sketch in code follows this list)
    - Pick a starting point $x_0$ and find the value $f(x_0)$.
    - Pick a small step size $h$.
    - Calculate the objective function at $x_0 - h$ and $x_0 + h$. Pick the appropriate point for the next step.
    - Iterate until convergence or termination.
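A minimal sketch of these steps in Python; the objective, starting point, and step size below are illustrative choices, not from the note:

```python
def derivative_free_minimize(f, x0, h=1e-2, max_iter=10_000, tol=1e-12):
    """Greedy derivative-free minimization: at each step, evaluate the
    objective at x - h and x + h and move to whichever neighbor improves."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        candidates = [(f(x - h), x - h), (f(x + h), x + h)]
        best_val, best_x = min(candidates)
        if best_val >= fx - tol:        # no neighbor improves: terminate
            break
        x, fx = best_x, best_val
    return x, fx

# Example: minimize (x - 2)^2 starting from x = 0
x_min, f_min = derivative_free_minimize(lambda x: (x - 2) ** 2, x0=0.0)
```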
- Gradient Descent / Ascent is defined as follows
    - Compute the derivative of the function $f'(x)$.
    - Pick a starting point $x_0$ and control parameter $\alpha$.
    - Use the rule $x_{n+1} = x_n + \alpha f'(x_n)$ for maximization and $x_{n+1} = x_n - \alpha f'(x_n)$ for minimization.
    - Iterate until convergence or termination.
- Gradient Descent / Ascent involves moving downhill / uphill along the direction of steepest descent / ascent.
- For the case of multidimensional data, we replace the update rule of gradient descent with $\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha \nabla f(\mathbf{x}_n)$ for minimization (with a $+$ sign for maximization).
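A short sketch of this update rule with NumPy, assuming the gradient $\nabla f$ is supplied as a function; the quadratic objective and parameter values are illustrative:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, max_iter=1_000, tol=1e-8):
    """Minimize by repeatedly applying x_{n+1} = x_n - alpha * grad_f(x_n);
    flip the sign of the step to perform gradient ascent instead."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad_f(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # update has become negligibly small
            break
    return x

# Example: minimize f(x, y) = (x - 1)^2 + (y + 3)^2, whose gradient is (2(x - 1), 2(y + 3))
grad = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] + 3)])
x_star = gradient_descent(grad, x0=[0.0, 0.0])    # approaches (1, -3)
```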
- Monte Carlo Search involves the following (see the sketch after this list)
    - Pick a sample of $n$ points or values.
    - Compute the value of the objective function at each sample point.
    - Keep the $x$ value with the largest or smallest value of the objective function.
    - Iterate as needed. Make sure to compare the value at each iteration.
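A minimal sketch in Python, assuming a one-dimensional objective sampled uniformly over a bounded interval; the bounds, sample counts, and example function are illustrative:

```python
import math
import random

def monte_carlo_minimize(f, lo, hi, n_samples=1_000, n_iters=5):
    """Draw random sample points, evaluate the objective at each one, and
    keep the x value with the smallest objective, comparing across iterations."""
    best_x, best_val = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_samples):
            x = random.uniform(lo, hi)
            val = f(x)
            if val < best_val:           # compare against the best value so far
                best_x, best_val = x, val
    return best_x, best_val

# Example: minimize sin(x) + 0.1 x^2 on [-10, 10]
x_best, f_best = monte_carlo_minimize(lambda x: math.sin(x) + 0.1 * x * x, -10, 10)
```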
- Optimization using Root Finding involves the following (see the sketch after this list)
    - Compute $f'(x)$.
    - Set the derivative to $0$ and solve using a root finding method.
    - Determine if the critical point or one of the endpoints is the extremum of interest.
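The note does not name a particular root finder; one possible sketch is Newton's method applied to $f'(x) = 0$, where the example function, interval, and starting value are illustrative:

```python
def newton_root(g, g_prime, x0, tol=1e-10, max_iter=100):
    """Find a root of g with Newton's method: x_{n+1} = x_n - g(x_n) / g'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / g_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

# Minimize f(x) = x^3 - 3x on [0, 2] by solving f'(x) = 3x^2 - 3 = 0.
f = lambda x: x ** 3 - 3 * x
f_prime = lambda x: 3 * x ** 2 - 3        # set this to zero and root-find
f_double_prime = lambda x: 6 * x          # Newton's method also needs (f')' = f''
critical = newton_root(f_prime, f_double_prime, x0=2.0)      # converges to x = 1
x_min = min([0.0, critical, 2.0], key=f)  # compare the critical point with the endpoints
```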
Considerations
- The main pitfall is local extrema, where these methods can get stuck.
- Each method has its pros and cons.
- Derivative Free Optimization does not require computing a derivative (which may not exist or may be computationally expensive), but it converges more slowly since it has to check all neighbors without something like the gradient to guide the search.
- Gradient Descent tends to be fast since it moves in the direction of steepest ascent / descent. However, it requires the computation of derivatives and a good choice of starting point and $\alpha$.
- Monte Carlo Search is better suited for noisy objective functions, higher dimensional data, and large sample spaces. It does not require computing derivatives, but it tends to be slower and more computationally expensive.
- Root Finding has a good convergence rate near the optimum. However, it requires knowing the derivative of $f$ and an appropriate choice of starting value.
Links
- Numerical Methods — An Inquiry Based Approach with Python by Sullivan
- Numerical Differential Calculus
Footnotes
[^1]: This is essentially a Greedy Algorithm.