- The confidence score represents the (estimated) probability that the model's output is correct.
- A loss function is a way to quantify the error of a model. More specifically, it associates a prediction with a score that denotes how far the prediction is from the ground truth.
Cross-Entropy Loss
- The cross entropy loss takes in a probability vector as input.
- Let $\hat{y}$ be the input vector representing an estimate, $y$ be a vector representing the ground truth, and $N$ be the number of labels that we have (i.e., the dimension of $y$ and $\hat{y}$). The function is defined as:
$$L_{CE}(\hat{y}, y) = -\sum_{i=1}^{N} y_i \log \hat{y}_i$$
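A minimal NumPy sketch of this definition (the function name and the `eps` clamp guarding against $\log 0$ are my own additions for numerical safety, not part of the definition):

```python
import numpy as np

def cross_entropy(y_hat, y, eps=1e-12):
    """Cross entropy between a predicted probability vector y_hat
    and a ground-truth (e.g. one-hot) vector y."""
    y_hat = np.clip(y_hat, eps, 1.0)  # avoid log(0)
    return -np.sum(y * np.log(y_hat))

# One-hot ground truth over N = 3 labels; the model puts 0.7 on the true label.
y = np.array([0.0, 1.0, 0.0])
y_hat = np.array([0.2, 0.7, 0.1])
print(cross_entropy(y_hat, y))  # -log(0.7) ≈ 0.357
```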
Interpretation
- It derives from cross entropy. We may see this by noting that the “actual probabilities” are those represented in our label vector $y$, and the “believed probabilities” are those of $\hat{y}$.
- The cross entropy loss not only penalizes the model’s errors: it measures the expected amount of information we would need to communicate the correct label using a code built from the model’s believed probabilities. Minimizing it therefore also minimizes the degree of uncertainty that the model has as a result of its predictions.
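To make the information-theoretic reading concrete: cross entropy decomposes as $H(p, q) = H(p) + D_{KL}(p \| q)$, so with the labels fixed, minimizing it minimizes the KL divergence from the believed to the actual distribution. A small numerical check (the distributions are made-up illustrative values):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]                     # 0 log 0 is taken to be 0
    return -np.sum(p * np.log(p))

def kl(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.25, 0.25])      # "actual" label distribution (soft labels)
q = np.array([0.6, 0.2, 0.2])        # "believed" model distribution

cross = -np.sum(p * np.log(q))       # cross entropy H(p, q)
print(np.isclose(cross, entropy(p) + kl(p, q)))  # True
```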
Relation to Softmax
- Suppose the input to the cross entropy loss is the output of a Softmax applied to a vector of logits $z$, i.e. $\hat{y}_i = \frac{e^{z_i}}{\sum_{k=1}^{N} e^{z_k}}$. We get the following equations (using $\sum_i y_i = 1$):
$$L(z, y) = -\sum_{i=1}^{N} y_i \log \hat{y}_i = -\sum_{i=1}^{N} y_i z_i + \log \sum_{k=1}^{N} e^{z_k}$$
Now consider the partial derivative with respect to any logit $z_j$:
$$\frac{\partial L}{\partial z_j} = \hat{y}_j - y_j$$
So the gradient is simply $\hat{y} - y$, which means that to perform gradient descent, all we need is the difference between the expected and observed probabilities (see the numerical check below).
- Now take the second derivative. We get
$$\frac{\partial^2 L}{\partial z_j^2} = \hat{y}_j (1 - \hat{y}_j)$$
As it turns out, this is equal to the variance of the Softmax output $\hat{y}_j$ (the variance of a Bernoulli variable with parameter $\hat{y}_j$).
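A quick finite-difference check of the gradient formula $\hat{y} - y$ (the logits and one-hot label are arbitrary illustrative values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()

def loss(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.0, 2.0, 0.5])         # logits
y = np.array([0.0, 1.0, 0.0])         # one-hot ground truth

analytic = softmax(z) - y             # the gradient derived above

numeric = np.zeros_like(z)            # central finite differences
h = 1e-6
for j in range(len(z)):
    dz = np.zeros_like(z)
    dz[j] = h
    numeric[j] = (loss(z + dz, y) - loss(z - dz, y)) / (2 * h)

print(np.allclose(analytic, numeric))  # True
```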
Mean Squared Error
- The Mean Squared Error, denoted $L_{MSE}$, is defined as:
$$L_{MSE}(\hat{y}, y) = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$
- A convention is to multiply the above quantity by $\frac{1}{2}$, since it makes the derivative more mathematically convenient: the $\frac{1}{2}$ cancels the factor of $2$ produced by differentiating the square.
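A sketch of the $\frac{1}{2}$ convention (the vectors are made-up illustrative values); the gradient with respect to each prediction is then just $(\hat{y}_i - y_i)/N$:

```python
import numpy as np

def half_mse(y_hat, y):
    # The 1/2 convention: differentiating the square yields a factor
    # of 2 that the 1/2 cancels.
    return 0.5 * np.mean((y_hat - y) ** 2)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.3])

grad = (y_hat - y) / len(y)  # gradient of half_mse w.r.t. y_hat
print(half_mse(y_hat, y), grad)
```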
Smooth L1 Loss
- Defined as:
$$\text{smooth}_{L_1}(x) = \begin{cases} \frac{1}{2} x^2 & \text{if } |x| < 1 \\ |x| - \frac{1}{2} & \text{otherwise} \end{cases}$$
where $x$ is the residual $\hat{y} - y$.
- The $-\frac{1}{2}$ term is present to make the function continuous: both branches equal $\frac{1}{2}$ at $|x| = 1$.
- This acts as a combination of L1 and L2 regression: quadratic (L2-like) near zero, linear (L1-like) for large residuals, which makes it less sensitive to outliers than pure L2.
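A sketch of the piecewise definition above, applied elementwise to a residual vector (the sample residuals are illustrative):

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 loss of a residual x: quadratic (L2-like) for |x| < 1,
    linear (L1-like) beyond."""
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

x = np.linspace(-3, 3, 7)  # residuals [-3, -2, -1, 0, 1, 2, 3]
print(smooth_l1(x))
# Both branches give 0.5 at |x| = 1: the -1/2 term is what makes
# the two pieces meet continuously.
```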
Links
- Differential Calculus
- Optimization Algorithms in Machine Learning - how to actually minimize or maximize the loss.