Focal Loss was introduced by Lin et al

In this case, the activation function does not depend on the scores of other classes in \(C\) other than \(C_1 = C_i\). So the gradient with respect to each score \(s_i\) in \(s\) will only depend on the loss given by its binary problem.

  • Caffe: Sigmoid Cross-Entropy Loss Layer
  • Pytorch: BCEWithLogitsLoss
  • TensorFlow: sigmoid_cross_entropy
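The independence between classes described above can be checked with a few lines of plain Python. This is only a sketch, not the code of any of the layers listed: the function names and the example scores are made up for illustration, and it assumes the usual sigmoid cross-entropy gradient \(f(s_i) - t_i\) for each score.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce_loss_and_grads(scores, targets):
    """Sum of per-class binary cross-entropy losses and d(loss)/d(score_i)."""
    loss = 0.0
    grads = []
    for s, t in zip(scores, targets):
        p = sigmoid(s)
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
        grads.append(p - t)  # uses only this class's own score and label
    return loss, grads

targets = [1, 0, 1]
_, grads_a = bce_loss_and_grads([2.0, -1.0, 0.5], targets)
# Change only the score of the middle class.
_, grads_b = bce_loss_and_grads([2.0, 3.0, 0.5], targets)
```

Comparing `grads_a` and `grads_b` shows that only the middle entry changes: the gradients of the other two classes are untouched, which would not hold with a Softmax activation.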

Focal Loss

Focal Loss was introduced by Lin et al., from Facebook, in the paper Focal Loss for Dense Object Detection. They claim to improve one-stage object detectors using Focal Loss to train a detector they name RetinaNet. Focal Loss is a Cross-Entropy Loss that weighs the contribution of each sample to the loss based on the classification error. The idea is that, if a sample is already classified correctly by the CNN, its contribution to the loss decreases. With this strategy, they claim to solve the problem of class imbalance by making the loss implicitly focus on those problematic classes. Moreover, they also weight the contribution of each class to the loss for a more explicit class balancing. They use Sigmoid activations, so Focal Loss could also be considered a Binary Cross-Entropy Loss. We define it for each binary problem, with \(s_i\) denoting the score after the Sigmoid activation, as:

\[FL = -\sum_{i=1}^{C'=2} (1 - s_i)^{\gamma} t_i \log(s_i)\]

Where \((1 - s_i)^\gamma\), with the focusing parameter \(\gamma \geq 0\), is a modulating factor to reduce the influence of correctly classified samples in the loss. With \(\gamma = 0\), Focal Loss is equivalent to Binary Cross-Entropy Loss.

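The loss can also be written as:

\[FL = \begin{cases} -(1 - s_1)^{\gamma} \log(s_1) & \text{if } t_1 = 1 \\ -s_1^{\gamma} \log(1 - s_1) & \text{if } t_1 = 0 \end{cases}\]

Where we have separated the formulation for when the class \(C_i = C_1\) is positive or negative (and therefore, the class \(C_2\) is positive). As before, we have \(s_2 = 1 - s_1\) and \(t_2 = 1 - t_1\).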
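To make the effect of the modulating factor concrete, here is a minimal sketch of the per-class Focal Loss in plain Python. The function names and the example scores are illustrative, not taken from the paper or from any library, and the default \(\gamma = 2\) is just a common choice.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(score, target):
    """Plain binary cross-entropy on a raw score (sigmoid applied inside)."""
    p = sigmoid(score)
    return -math.log(p) if target == 1 else -math.log(1.0 - p)

def binary_focal_loss(score, target, gamma=2.0):
    """Focal Loss for one binary problem; gamma is the focusing parameter."""
    p = sigmoid(score)
    if target == 1:
        # Positive class: down-weight by (1 - p)^gamma when p is already high.
        return -((1.0 - p) ** gamma) * math.log(p)
    # Negative class: same expression with p replaced by 1 - p.
    return -(p ** gamma) * math.log(1.0 - p)

easy = binary_focal_loss(4.0, 1)    # positive sample, confidently correct
hard = binary_focal_loss(-4.0, 1)   # positive sample, confidently wrong
same = binary_focal_loss(0.3, 1, gamma=0.0)  # matches binary_cross_entropy(0.3, 1)
```

The easy sample contributes far less than the hard one, and with `gamma=0.0` the function returns the same value as `binary_cross_entropy`.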

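The gradient gets a bit more complex due to the inclusion of the modulating factor \((1 - s_i)^\gamma\) in the loss formulation, but it can be deduced using the Binary Cross-Entropy gradient expression. In case \(C_i\) is positive (\(t_i = 1\)), and writing the activation explicitly, the gradient with respect to the raw score is:

\[\frac{\partial}{\partial s_i} FL(f(s_i)) = (1 - f(s_i))^{\gamma} \left(\gamma f(s_i) \log(f(s_i)) + f(s_i) - 1\right)\]

Where \(f()\) is the sigmoid function. To get the gradient expression for a negative \(C_i\) (\(t_i = 0\)), we just need to replace \(f(s_i)\) with \((1 - f(s_i))\) in the expression above and flip its sign, since \(1 - f(s_i)\) decreases as \(s_i\) grows.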

Notice that, if the focusing parameter \(\gamma = 0\), the loss is equivalent to the CE Loss, and we end up with the same gradient expression.
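For readers who want to check the positive-class gradient, one way to derive it is with the product and chain rules. This sketch assumes the positive-class loss \(FL = -(1 - f(s_i))^\gamma \log f(s_i)\) and the sigmoid derivative \(f'(s_i) = f(s_i)(1 - f(s_i))\); the shorthand \(p = f(s_i)\) is introduced here only to keep the lines short.

\[
\begin{aligned}
\frac{\partial FL}{\partial s_i}
&= -\left[ -\gamma (1 - p)^{\gamma - 1}\, p (1 - p) \log p + (1 - p)^{\gamma}\, \frac{p (1 - p)}{p} \right] \\
&= \gamma (1 - p)^{\gamma}\, p \log p - (1 - p)^{\gamma + 1} \\
&= (1 - p)^{\gamma} \left( \gamma\, p \log p + p - 1 \right)
\end{aligned}
\]

Setting \(\gamma = 0\) leaves \(p - 1\), which is the Binary Cross-Entropy gradient for a positive class.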

Forward pass: Loss computation

In the forward pass, logprobs[r] stores, for each element of the batch, the sum of the binary cross entropy over the classes. The focusing_parameter is \(\gamma\), which by default is 2 and should be defined as a layer parameter in the net prototxt. The class_balances can be used to introduce different loss contributions per class, as they do in the Facebook paper.
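A minimal sketch of such a forward pass in plain Python is shown below. It reuses the names logprobs, focusing_parameter and class_balances from the text, but it is not the original layer code: representing the batch as nested lists, indexing class_balances by class position, and averaging the loss over the batch are all assumptions made for this example.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss_forward(scores, labels, class_balances, focusing_parameter=2.0):
    """scores, labels: batch x classes nested lists. Returns (mean loss, logprobs)."""
    logprobs = [0.0] * len(scores)
    for r, (row_scores, row_labels) in enumerate(zip(scores, labels)):
        for c, (s, t) in enumerate(zip(row_scores, row_labels)):
            p = sigmoid(s)
            if t == 1:
                term = -((1.0 - p) ** focusing_parameter) * math.log(p)
            else:
                term = -(p ** focusing_parameter) * math.log(1.0 - p)
            # Accumulate the weighted binary loss of class c for batch element r.
            logprobs[r] += class_balances[c] * term
    # Assumption: the reported loss is the average over the batch.
    return sum(logprobs) / len(scores), logprobs

batch_scores = [[2.0, -1.0], [0.0, 0.0]]
batch_labels = [[1, 0], [0, 1]]
loss, logprobs = focal_loss_forward(batch_scores, batch_labels, [1.0, 1.0])
```

Setting every entry of `class_balances` to 1 and `focusing_parameter` to 0 turns this into a plain sum of binary cross-entropy losses per batch element.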

Backward pass: Gradients computation
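As a rough illustration of this step, the sketch below computes the gradient of the Focal Loss with respect to every raw score in a batch, using plain Python. It is not the original layer code: the function name, the nested-list batch layout, and indexing class_balances by class position are assumptions, and the gradient is that of the summed loss, with no division by the batch size.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss_backward(scores, labels, class_balances, focusing_parameter=2.0):
    """Gradient of the summed Focal Loss w.r.t. each raw score (batch x classes)."""
    g = focusing_parameter
    delta = []
    for row_scores, row_labels in zip(scores, labels):
        row = []
        for c, (s, t) in enumerate(zip(row_scores, row_labels)):
            p = sigmoid(s)
            if t == 1:
                # Positive label: (1 - p)^g * (g * p * log(p) + p - 1)
                grad = ((1.0 - p) ** g) * (g * p * math.log(p) + p - 1.0)
            else:
                # Negative label: swap p with 1 - p and flip the sign.
                grad = (p ** g) * (p - g * (1.0 - p) * math.log(1.0 - p))
            row.append(class_balances[c] * grad)
        delta.append(row)
    return delta

delta = focal_loss_backward([[1.5, -0.7], [0.2, -2.0]], [[1, 0], [0, 1]], [1.0, 1.0])
```

With `focusing_parameter=0.0` each entry reduces to the sigmoid output minus the label, which is the usual Binary Cross-Entropy gradient.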

In the specific (and usual) case of Multi-Class classification the labels are one-hot, so only the positive class \(C_p\) keeps its term in the loss. There is only one element of the target vector \(t\) which is not zero, \(t_i = t_p\). So, discarding the elements of the summation which are zero due to the target labels, the loss reduces to the single term of the positive class.
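The reduction to a single term can be checked numerically. The sketch below assumes a Softmax activation over the scores, which this passage does not name explicitly, and the function names and example values are made up for illustration.

```python
import math

def softmax(scores):
    """Softmax with the maximum subtracted for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_full(scores, targets):
    """Cross-entropy as a sum over all classes."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

def cross_entropy_positive_only(scores, positive_index):
    """Cross-entropy keeping only the term of the positive class."""
    return -math.log(softmax(scores)[positive_index])

scores = [1.0, 2.0, 0.5]
one_hot = [0, 1, 0]  # the positive class is index 1
full = cross_entropy_full(scores, one_hot)
reduced = cross_entropy_positive_only(scores, 1)
```

Both values agree, because every term whose target is zero drops out of the sum.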

This would be the pipeline for each one of the \(C\) classes. We set \(C\) independent binary classification problems \((C' = 2)\). Then we sum up the loss over the different binary problems: we sum up the gradients of every binary problem to backpropagate, and the losses to monitor the global loss. \(s_1\) and \(t_1\) are the score and the groundtruth label for the class \(C_1\), which is also the class \(C_i\) in \(C\). \(s_2 = 1 - s_1\) and \(t_2 = 1 - t_1\) are the score and the groundtruth label of the class \(C_2\), which is not a "class" in our original problem with \(C\) classes, but a class we create to set up the binary problem with \(C_1 = C_i\). We can understand it as a background class.
