Once the loss is defined, we have to compute its gradient with respect to the output neurons of the CNN in order to backpropagate it through the net and optimize the loss function by tuning the net parameters. The loss terms coming from the negative classes are zero. However, the loss gradient with respect to those negative classes is not cancelled, since the Softmax of the positive class also depends on the negative class scores.
The gradient expression will be the same for all \(C\) except for the ground truth class \(C_p\), because the score of \(C_p\) (\(s_p\)) is in the numerator.
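To make this concrete: differentiating \(-\log\left(e^{s_p} / \sum_j e^{s_j}\right)\) gives \(e^{s_p} / \sum_j e^{s_j} - 1\) for the positive class score \(s_p\) and \(e^{s_n} / \sum_j e^{s_j}\) for any negative class score \(s_n\). The following is a minimal pure-Python sketch that checks those expressions against a finite-difference estimate. The function names and the example scores are illustrative assumptions, not part of any framework.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, positive):
    # Softmax loss with a one-hot target: only the positive class term survives.
    return -math.log(softmax(scores)[positive])

def cross_entropy_grad(scores, positive):
    # Positive class: softmax - 1. Negative classes: softmax.
    probs = softmax(scores)
    return [p - 1.0 if i == positive else p for i, p in enumerate(probs)]

def numerical_grad(scores, positive, eps=1e-6):
    # Central finite differences, used only to sanity-check the analytic gradient.
    grad = []
    for i in range(len(scores)):
        up = list(scores)
        down = list(scores)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(up, positive) - cross_entropy(down, positive)
        grad.append(diff / (2 * eps))
    return grad

scores = [2.0, -1.0, 0.5]   # hypothetical CNN output scores
positive = 0                # index of the ground truth class C_p
analytic = cross_entropy_grad(scores, positive)
numeric = numerical_grad(scores, positive)
```

Note that the gradient is negative for the positive class and positive for every negative class, and that its components sum to zero, which is why the negative classes still receive a gradient even though their loss terms vanish.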
- Caffe: SoftmaxWithLoss Layer. Is limited to multi-class classification.
- PyTorch: CrossEntropyLoss. Is limited to multi-class classification.
- TensorFlow: softmax_cross_entropy. Is limited to multi-class classification.
In this Facebook work they claim that, despite being counter-intuitive, Categorical Cross-Entropy loss, or Softmax loss, worked better than Binary Cross-Entropy loss in their multi-label classification problem.
> Skip this part if you are not interested in Facebook or me using Softmax Loss for multi-label classification, which is not standard.
When Softmax loss is used in a multi-label scenario, the gradients get a bit more complex, since the loss contains an element for each positive class.
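As a rough illustration of that multi-label case, the sketch below assumes one particular formulation: the loss is the average of \(-\log\) Softmax over the \(M\) positive classes. Under that assumption, the gradient with respect to score \(s_i\) is the Softmax of \(s_i\), minus \(1/M\) if class \(i\) is positive. The function names, the \(1/M\) scaling and the example values are assumptions made for this sketch, not a definitive implementation.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multilabel_softmax_loss(scores, positives):
    # Assumed formulation: mean of -log(softmax) over the positive classes.
    probs = softmax(scores)
    return -sum(math.log(probs[p]) for p in positives) / len(positives)

def multilabel_softmax_grad(scores, positives):
    # Each positive term contributes softmax_i - [i == p]; averaging over the
    # M positives gives softmax_i - 1/M for positives and softmax_i otherwise.
    probs = softmax(scores)
    m = len(positives)
    pos = set(positives)
    return [p - (1.0 / m if i in pos else 0.0) for i, p in enumerate(probs)]

scores = [1.5, 0.2, -0.7, 2.1]  # hypothetical CNN output scores
positives = [0, 3]              # indices of the positive classes
loss = multilabel_softmax_loss(scores, positives)
grad = multilabel_softmax_grad(scores, positives)
```

With a single positive class (\(M = 1\)) this reduces to the usual multi-class Softmax loss gradient.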