• Consider the problem of optimizing a function $f(x)$. Our general pipeline from Calculus is to compute $x^*$ for $x$ in our domain of interest such that $f'(x^*) = 0$.
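    • For instance, for $f(x) = x^2 - 4x$ we get $f'(x) = 2x - 4 = 0$, so $x^* = 2$; since $f''(2) = 2 > 0$, this critical point is a minimum.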

Approaches

  • Derivative Free Optimization 1 involves the following steps (see the sketch after this list)
    • Pick a starting point $x_0$ and find the value $f(x_0)$.
    • Pick a small step size $h$.
    • Calculate the objective function at $x_0 - h$ and $x_0 + h$. Pick whichever point improves the objective as the next step.
    • Iterate until convergence or termination.
  • Gradient Descent / Ascent is defined as follows (see the sketch after this list)

    • Compute the derivative $f'(x)$ of the function.
    • Pick a starting point $x_0$ and control parameter (learning rate) $\alpha$.
    • Use the update rule
      $x_{n+1} = x_n + \alpha f'(x_n)$ for maximization and
      $x_{n+1} = x_n - \alpha f'(x_n)$ for minimization.
    • Iterate until convergence or termination.
  • Gradient Descent / Ascent involves moving downhill / uphill along the direction of steepest descent / ascent.

  • For the case of multidimensional data, we replace the update rule of gradient descent with $\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha \nabla f(\mathbf{x}_n)$ (with a $+$ sign for ascent).

  • Monte Carlo Search involves the following (see the sketch after this list)

    • Pick a random sample of points or values.
    • Compute the value of the objective function at each sample point.
    • Keep the point with the largest or smallest value of the objective function.
    • Iterate as needed, comparing against the best value found so far at each iteration.
  • Optimization using Root Finding (see the sketch after this list)

    • Compute $f'(x)$.
    • Set the derivative to $0$ and solve $f'(x) = 0$ using a root finding method.
    • Determine if the critical point or one of the endpoints is the extremum of interest.
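
  A minimal sketch of the derivative free (greedy neighbor) search above; the objective, starting point, and step size in the example are assumptions chosen for illustration.

  ```python
  def derivative_free_minimize(f, x0, h=1e-2, max_iters=10_000):
      """Greedy neighbor search: repeatedly step toward whichever of
      x - h or x + h has the smaller objective value."""
      x, fx = x0, f(x0)
      for _ in range(max_iters):
          left, right = f(x - h), f(x + h)
          if left < fx:        # downhill to the left
              x, fx = x - h, left
          elif right < fx:     # downhill to the right
              x, fx = x + h, right
          else:                # neither neighbor improves: terminate
              break
      return x, fx

  # Example (assumed objective): minimize f(x) = (x - 3)^2
  x_min, f_min = derivative_free_minimize(lambda x: (x - 3) ** 2, x0=0.0)
  ```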
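
  A minimal gradient descent sketch covering both the scalar rule and the multidimensional update $\mathbf{x}_{n+1} = \mathbf{x}_n - \alpha \nabla f(\mathbf{x}_n)$; the objective, its gradient, and the choices of $x_0$ and $\alpha$ below are assumptions for illustration.

  ```python
  import numpy as np

  def gradient_descent(grad, x0, alpha=0.1, max_iters=1000, tol=1e-8):
      """Minimize by stepping against the gradient:
      x_{n+1} = x_n - alpha * grad(x_n); flip the sign of the step for ascent."""
      x = np.asarray(x0, dtype=float)
      for _ in range(max_iters):
          x_new = x - alpha * np.asarray(grad(x))
          if np.linalg.norm(x_new - x) < tol:  # converged: step is tiny
              return x_new
          x = x_new
      return x

  # Example (assumed objective): f(x, y) = (x - 1)^2 + (y + 2)^2 with
  # gradient (2(x - 1), 2(y + 2)); the minimizer is (1, -2).
  x_star = gradient_descent(lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] + 2)]),
                            x0=[0.0, 0.0])
  ```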
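
  A minimal Monte Carlo search sketch; the sampling interval, sample sizes, and objective are assumptions for illustration.

  ```python
  import math
  import random

  def monte_carlo_minimize(f, lo, hi, n_samples=1000, n_rounds=10, seed=0):
      """Sample random points in [lo, hi], keep the point with the smallest
      objective value, and carry the best point across rounds."""
      rng = random.Random(seed)
      best_x, best_fx = None, float("inf")
      for _ in range(n_rounds):
          for _ in range(n_samples):
              x = rng.uniform(lo, hi)
              fx = f(x)
              if fx < best_fx:  # compare against the best value seen so far
                  best_x, best_fx = x, fx
      return best_x, best_fx

  # Example (assumed objective): minimize f(x) = x^2 + sin(5x) on [-2, 2]
  x_best, f_best = monte_carlo_minimize(lambda x: x**2 + math.sin(5 * x), -2.0, 2.0)
  ```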
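
  A minimal sketch of optimization via root finding, here using bisection on the derivative (Newton's method or any other root finder would work the same way); the objective $f(x) = x^3 - 3x$, its derivative, and the bracket $[0, 2]$ are assumptions for illustration.

  ```python
  def bisect_root(g, a, b, tol=1e-10, max_iters=200):
      """Find a root of g on [a, b], assuming g(a) and g(b) have opposite signs."""
      ga, gb = g(a), g(b)
      assert ga * gb <= 0, "root must be bracketed"
      for _ in range(max_iters):
          m = 0.5 * (a + b)
          gm = g(m)
          if abs(gm) < tol:
              return m
          if ga * gm <= 0:      # root lies in [a, m]
              b, gb = m, gm
          else:                 # root lies in [m, b]
              a, ga = m, gm
      return 0.5 * (a + b)

  # Example (assumed objective): f(x) = x^3 - 3x on [0, 2], so f'(x) = 3x^2 - 3.
  # The critical point is x = 1; comparing f(0), f(1), f(2) shows x = 1 is the
  # interior minimum, while the endpoint x = 2 gives the maximum on this interval.
  critical_x = bisect_root(lambda x: 3 * x**2 - 3, 0.0, 2.0)
  ```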

Considerations

  • The main pitfall is local extrema, where these methods can get stuck.
  • Each method has its pros and cons.
    • Derivative Free Optimization does not require computing a derivative (which may not exist or may be computationally expensive), but it converges more slowly since it has to check all neighbors without anything like the gradient to guide the search.
    • Gradient Descent tends to be fast since it moves in the direction of steepest ascent / descent. However, it requires the computation of derivatives and a good choice of starting point $x_0$ and control parameter $\alpha$.
    • Monte Carlo Search is better suited for noisy objective functions, higher dimensional data, and large sample spaces. It does not require computing derivatives, but it tends to be slower and more computationally expensive.
    • Root Finding has a good convergence rate near the optimum. However, it requires knowing the derivative of $f$ and an appropriate choice of starting value.

Links

Footnotes

  1. This is essentially a Greedy Algorithm.