• The confidence score represents the probability that the output of the model is correct.
  • A loss function is a way to quantify the accuracy of a certain model. More specifically, it associates a prediction with a score that denotes how far it is from the ground truth.

Cross-Entropy Loss

  • The cross-entropy loss takes a probability vector as input.

  • Let:
    $\hat{y}$ be the input vector representing an estimate,
    $y$ be a vector representing the ground truth, and
    $n$ be the number of labels that we have (i.e., the dimension of $y$ and $\hat{y}$).

    The function is defined as:

    $$L(\hat{y}, y) = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$
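A minimal sketch of this definition in NumPy (the clipping constant `eps` is an assumption to avoid $\log(0)$, not part of the formula, and the example vectors are made up):

```python
import numpy as np

def cross_entropy_loss(y_hat, y, eps=1e-12):
    # y_hat: predicted probability vector (entries in [0, 1], summing to 1)
    # y: ground-truth vector (e.g., one-hot)
    # eps guards against log(0) for zero-probability predictions.
    y_hat = np.clip(y_hat, eps, 1.0)
    return -np.sum(y * np.log(y_hat))

y = np.array([0.0, 1.0, 0.0])       # ground truth: class 1
y_hat = np.array([0.1, 0.7, 0.2])   # model's estimated probabilities
print(cross_entropy_loss(y_hat, y)) # -log(0.7) ≈ 0.357
```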

Interpretation

  • It derives from the cross entropy $H(p, q) = -\sum_{i} p_i \log(q_i)$. We may see this by noting that the “actual probabilities” $p$ are those represented in our label vector $y$, and the “believed probabilities” $q$ are those of $\hat{y}$.
  • The cross-entropy loss not only minimizes the model’s error, but it also minimizes the cross entropy between the label distribution and the predicted one, and by extension the amount of information we need to communicate the correct label. In doing so, we also minimize the degree of uncertainty that the model has about its predictions.
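For a one-hot label with true class $c$, the sum collapses to a single term, which makes this reading concrete: the loss is exactly the information (in nats) needed to encode the true label under the model’s belief:

$$L(\hat{y}, y) = -\sum_{i=1}^{n} y_i \log(\hat{y}_i) = -\log(\hat{y}_c)$$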

Relation to Softmax

  • Suppose the estimate $\hat{y}$ is produced by applying the Softmax function to a vector of logits $z$. We get the following equations:

    $$\hat{y}_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \qquad L(\hat{y}, y) = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$

    Now consider the partial derivative with respect to any $z_i$ (using the fact that $\sum_{j} y_j = 1$):

    $$\frac{\partial L}{\partial z_i} = \hat{y}_i - y_i$$

    So the gradient is simply

    $$\nabla_z L = \hat{y} - y,$$

    which means that to perform gradient descent, all we need is the difference between the expected and observed probabilities.

    Now take the second derivative. We get

    $$\frac{\partial^2 L}{\partial z_i^2} = \hat{y}_i (1 - \hat{y}_i)$$

    As it turns out, this is equal to the variance of the Softmax output: the variance of a Bernoulli variable with success probability $\hat{y}_i$.
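A quick numerical check of the gradient (a standalone NumPy sketch; the example logits and the finite-difference comparison are illustrative, not from the derivation):

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; output sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y):
    # Cross-entropy of the Softmax output against a one-hot label y.
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, 1.0, 0.1])   # logits (made-up values)
y = np.array([0.0, 1.0, 0.0])   # one-hot ground truth

analytic = softmax(z) - y       # gradient from the derivation above

# Independent check with central finite differences
eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(3)[i], y) - loss(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])

print(np.allclose(analytic, numeric))  # True
```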

Mean Squared Error

  • The Mean Squared Error, denoted $\mathrm{MSE}$, is defined as:

    $$\mathrm{MSE}(\hat{y}, y) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$

  • A convention is to multiply the above quantity by $\frac{1}{2}$, since it makes the derivative more mathematically convenient: the factor of $2$ produced by differentiating the square cancels.
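To see the convenience, differentiate the halved loss with respect to a single prediction $\hat{y}_i$; the $2$ from the square cancels the $\frac{1}{2}$:

$$\frac{\partial}{\partial \hat{y}_i} \left[ \frac{1}{2n} \sum_{j=1}^{n} (\hat{y}_j - y_j)^2 \right] = \frac{1}{n} (\hat{y}_i - y_i)$$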

Smooth L1 Loss

  • Defined, for an error term $x = \hat{y}_i - y_i$, as:

    $$\mathrm{SmoothL1}(x) = \begin{cases} \frac{1}{2} x^2 & \text{if } |x| < 1 \\ |x| - \frac{1}{2} & \text{otherwise} \end{cases}$$

  • The $-\frac{1}{2}$ term is present to make the function continuous: at $|x| = 1$, both branches evaluate to $\frac{1}{2}$.

  • This acts as a combination of L1 and L2 regression: it is quadratic (L2-like) for small errors, so the gradient shrinks as the error does, and linear (L1-like) for large errors, so outliers are penalized less aggressively. A small sketch of the piecewise definition is given below.
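A minimal NumPy sketch of the piecewise definition (the example error values are made up; the threshold of $1$ matches the formula above, though some libraries expose it as a parameter):

```python
import numpy as np

def smooth_l1(x):
    # Piecewise: quadratic near zero (L2-like), linear in the tails (L1-like).
    # The -1/2 offset makes the two branches meet at |x| = 1.
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x**2, np.abs(x) - 0.5)

errors = np.array([-3.0, -1.0, -0.2, 0.0, 0.5, 2.0])
print(smooth_l1(errors))  # [2.5, 0.5, 0.02, 0.0, 0.125, 1.5]
# Both branches agree at |x| = 1: 0.5 * 1**2 == 1 - 0.5 == 0.5
```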

Links