- Consider the problem of optimizing the function $f(x)$. Our general pipeline from Calculus is to compute $f'(x)$ for $x$ in our domain of interest such that $f'(x) = 0$.
Approaches
- Derivative Free Optimization [^1] involves the following steps (a sketch in code follows this list)
    - Pick a starting point $x_0$ and find the value $f(x_0)$.
    - Pick a small step size $h$.
    - Calculate the objective function at $x_0 - h$ and $x_0 + h$. Pick the appropriate point for the next step.
    - Iterate until convergence or termination.
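A minimal sketch of these steps in Python; the objective, starting point, and step size below are illustrative choices, not from the note:

```python
def derivative_free_minimize(f, x0, h=1e-2, max_iter=10_000, tol=1e-12):
    """Greedy derivative-free minimization: at each step, evaluate the
    objective at x - h and x + h and move to whichever neighbor improves."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        candidates = [(f(x - h), x - h), (f(x + h), x + h)]
        best_val, best_x = min(candidates)
        if best_val >= fx - tol:        # no neighbor improves: terminate
            break
        x, fx = best_x, best_val
    return x, fx

# Example: minimize (x - 2)^2 starting from x = 0
x_min, f_min = derivative_free_minimize(lambda x: (x - 2) ** 2, x0=0.0)
```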
- Gradient Descent / Ascent is defined as follows
    - Compute the derivative of the function $f'(x)$.
    - Pick a starting point $x_0$ and control parameter $\alpha$.
    - Use the rule $x_{n+1} = x_n + \alpha f'(x_n)$ for maximization and $x_{n+1} = x_n - \alpha f'(x_n)$ for minimization.
    - Iterate until convergence or termination.
- Gradient Descent / Ascent involves moving downhill / uphill along the direction of steepest descent / ascent.
- For the case of multidimensional data, we replace the update rule of gradient descent with $\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha \nabla f(\mathbf{x}_n)$ for minimization (with a $+$ sign for maximization).
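A short sketch of this update rule with NumPy, assuming the gradient $\nabla f$ is supplied as a function; the quadratic objective and parameter values are illustrative:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, max_iter=1_000, tol=1e-8):
    """Minimize by repeatedly applying x_{n+1} = x_n - alpha * grad_f(x_n);
    flip the sign of the step to perform gradient ascent instead."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad_f(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # update has become negligibly small
            break
    return x

# Example: minimize f(x, y) = (x - 1)^2 + (y + 3)^2, whose gradient is (2(x - 1), 2(y + 3))
grad = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] + 3)])
x_star = gradient_descent(grad, x0=[0.0, 0.0])    # approaches (1, -3)
```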
- Monte Carlo Search involves the following (see the sketch after this list)
    - Pick a sample of $n$ points or values.
    - Compute the value of the objective function at each sample point.
    - Keep the $x$ value with the largest or smallest value of the objective function.
    - Iterate as needed. Make sure to compare the value at each iteration.
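A minimal sketch in Python, assuming a one-dimensional objective sampled uniformly over a bounded interval; the bounds, sample counts, and example function are illustrative:

```python
import math
import random

def monte_carlo_minimize(f, lo, hi, n_samples=1_000, n_iters=5):
    """Draw random sample points, evaluate the objective at each one, and
    keep the x value with the smallest objective, comparing across iterations."""
    best_x, best_val = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_samples):
            x = random.uniform(lo, hi)
            val = f(x)
            if val < best_val:           # compare against the best value so far
                best_x, best_val = x, val
    return best_x, best_val

# Example: minimize sin(x) + 0.1 x^2 on [-10, 10]
x_best, f_best = monte_carlo_minimize(lambda x: math.sin(x) + 0.1 * x * x, -10, 10)
```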
- Optimization using Root Finding involves the following (see the sketch after this list)
    - Compute $f'(x)$.
    - Set the derivative to $0$ and solve using a root finding method.
    - Determine if the critical point or one of the endpoints is the extremum of interest.
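The note does not name a particular root finder; one possible sketch is Newton's method applied to $f'(x) = 0$, where the example function, interval, and starting value are illustrative:

```python
def newton_root(g, g_prime, x0, tol=1e-10, max_iter=100):
    """Find a root of g with Newton's method: x_{n+1} = x_n - g(x_n) / g'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / g_prime(x)
        x -= step
        if abs(step) < tol:
            return x
    return x

# Minimize f(x) = x^3 - 3x on [0, 2] by solving f'(x) = 3x^2 - 3 = 0.
f = lambda x: x ** 3 - 3 * x
f_prime = lambda x: 3 * x ** 2 - 3        # set this to zero and root-find
f_double_prime = lambda x: 6 * x          # Newton's method also needs (f')' = f''
critical = newton_root(f_prime, f_double_prime, x0=2.0)      # converges to x = 1
x_min = min([0.0, critical, 2.0], key=f)  # compare the critical point with the endpoints
```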
Considerations
- The main pitfall is local extrema, where these methods can get stuck.
- Each method has its pros and cons.
- Derivative Free Optimization does not require computing a derivative (which may not exist or may be computationally expensive), but it converges more slowly since it has to check all neighbors without something like the gradient to guide the search.
- Gradient Descent tends to be fast since it moves in the direction of steepest ascent / descent. However, it requires the computation of derivatives and a good choice of starting point and $\alpha$.
- Monte Carlo Search is better suited for noisy objective functions, higher dimensional data, and large sample spaces. It does not require computing derivatives, but it tends to be slower and more computationally expensive.
- Root Finding has a good convergence rate near the optimum. However, it requires knowing the derivative of $f$ and an appropriate choice of starting value.
Links
- Numerical Methods — An Inquiry Based Approach with Python by Sullivan
- Numerical Differential Calculus
Footnotes
[^1]: This is essentially a Greedy Algorithm.