CrossEntropyLossLayer

CrossEntropyLossLayer["Index"]

represents a net layer that computes the cross-entropy loss by comparing input class probability vectors with indices representing the target class.

CrossEntropyLossLayer["Probabilities"]

represents a net layer that computes the cross-entropy loss by comparing input class probability vectors with target class probability vectors.

CrossEntropyLossLayer["Binary"]

represents a net layer that computes the binary cross-entropy loss by comparing input probability scalars with target probability scalars, where each probability represents a binary choice.

Details and Options

  • CrossEntropyLossLayer exposes the following ports for use in NetGraph etc.:
  • "Input"     real array of rank n
    "Target"    real array of rank n or integer array of rank n-1
    "Loss"      real number
  • When operating on multidimensional inputs, CrossEntropyLossLayer effectively threads over any extra array dimensions to produce an array of losses and returns the mean of these losses.
  • For CrossEntropyLossLayer["Binary"], the input and target should be scalar values between 0 and 1, or arrays of these.
  • For CrossEntropyLossLayer["Index"], the input should be a vector of probabilities {p1,…,pc} that sums to 1, or an array of such vectors. The target should be an integer between 1 and c, or an array of such integers.
  • For CrossEntropyLossLayer["Probabilities"], the input and target should be a vector of probabilities that sums to 1, or an array of such vectors.
  • For the "Index" and "Probabilities" forms, where the input array has dimensions {d1,d2,…,dn}, the final dimension dn is used to index the class. The output loss is taken to be the mean over the remaining dimensions {d1,…,dn-1}.
  • CrossEntropyLossLayer[…][<|"Input"->in,"Target"->target|>] explicitly computes the output from applying the layer.
  • CrossEntropyLossLayer[…][<|"Input"->{in1,in2,…},"Target"->{target1,target2,…}|>] explicitly computes outputs for each of the in_i and target_i.
  • When given a NumericArray as input, the output will be a NumericArray.
  • CrossEntropyLossLayer is typically used inside NetGraph to construct a training network.
  • CrossEntropyLossLayer can operate on arrays that contain "Varying" dimensions.
  • A CrossEntropyLossLayer[…] can be provided as the third argument to NetTrain when training a specific network.
  • When appropriate, CrossEntropyLossLayer is automatically used by NetTrain if an explicit loss specification is not provided. One of "Binary", "Probabilities", or "Index" will be chosen based on the final activation used for the output port and the form of any attached NetDecoder.
  • CrossEntropyLossLayer[form,"port"->shape] allows the shape of the input or target port to be specified. Possible forms for shape are:
  • "Real"                                a single real number
    "Integer"                             a single integer
    n                                     a vector of length n
    {n1,n2,…}                             an array of dimensions n1×n2×…
    "Varying"                             a vector whose length is variable
    {"Varying",n2,n3,…}                   an array whose first dimension is variable and remaining dimensions are n2×n3×…
    NetEncoder[…]                         an encoder
    NetEncoder[{…,"Dimensions"->{n1,…}}]  an encoder mapped over an array of dimensions n1×…
  • Options[CrossEntropyLossLayer] gives the list of default options to construct the layer. Options[CrossEntropyLossLayer[…]] gives the list of default options to evaluate the layer on some data.
  • Information[CrossEntropyLossLayer[…]] gives a report about the net layer.
  • Information[CrossEntropyLossLayer[…],prop] gives the value of the property prop of CrossEntropyLossLayer[…]. Possible properties are the same as for NetGraph.
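
The averaging over extra array dimensions described above can be sketched as follows. This is a minimal illustration, assuming two probability vectors over three classes; the numeric values and the variable names (layer, in, target, loss, expected) are illustrative and not taken from the original page:

```wolfram
(* "Index" form on a 2x3 input: the last dimension indexes the class *)
layer = CrossEntropyLossLayer["Index", "Input" -> {2, 3}];

in = {{0.1, 0.7, 0.2}, {0.5, 0.25, 0.25}};  (* each row sums to 1 *)
target = {2, 1};                            (* one class index per row *)

(* loss reported by the layer *)
loss = layer[<|"Input" -> in, "Target" -> target|>]

(* mean of the per-row losses -Log[p_target], approximately 0.5249 *)
expected = Mean[-Log[{0.7, 0.5}]]
```

The two values should agree up to the single-precision rounding used inside the net.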

Examples


Basic Examples  (3)

Create a CrossEntropyLossLayer object that takes a probability vector and an index:

Create a CrossEntropyLossLayer where the input is a probability vector and the target is an index:

Apply it to an input and a target:
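
A minimal sketch of these two steps, assuming a three-class problem; the probability vector and the target index are illustrative values:

```wolfram
(* input: a probability vector over 3 classes; target: a class index *)
layer = CrossEntropyLossLayer["Index", "Input" -> 3];

(* the loss is -Log of the probability assigned to the target class,
   here -Log[0.7], approximately 0.3567 *)
loss = layer[<|"Input" -> {0.1, 0.7, 0.2}, "Target" -> 2|>]
```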

Create a CrossEntropyLossLayer that operates on vectors generated from strings:

Apply it to an input and a target:

Completely correct predictions produce a loss of 0:
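
The zero-loss case can also be sketched with plain vectors rather than string encoders. This is an assumption-laden illustration: the layer shape and the input below are chosen for brevity, and the prediction puts all probability on the target class:

```wolfram
layer = CrossEntropyLossLayer["Index", "Input" -> 3];

(* all probability mass is on class 2, which is also the target,
   so the loss -Log[1.] should vanish up to rounding *)
loss = layer[<|"Input" -> {0., 1., 0.}, "Target" -> 2|>]
```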

Scope  (5)

Create a CrossEntropyLossLayer where the input and target are single probabilities:

Apply it to an input and a target:

Thread the layer over a batch of inputs:
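
A minimal sketch of the binary form on a single pair and on a batch. The port shape "Real" is taken from the shape table in Details and Options; the probabilities and targets are illustrative:

```wolfram
layer = CrossEntropyLossLayer["Binary", "Input" -> "Real"];

(* single input and target: -Log[0.8], approximately 0.2231 *)
single = layer[<|"Input" -> 0.8, "Target" -> 1.|>]

(* a batch of two pairs gives one loss per pair;
   by the binary cross-entropy formula these are -Log[0.8] and -Log[0.7] *)
batch = layer[<|"Input" -> {0.8, 0.3}, "Target" -> {1., 0.}|>]
```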

Create a CrossEntropyLossLayer where the input is a probability vector and the target is an index:

Apply it to an input and a target:

Create a CrossEntropyLossLayer where the input is a probability vector and the target is a probability vector:

Apply it to an input and a target:

Create a CrossEntropyLossLayer where the input and target are images representing matrices of binary class probabilities:

Apply the layer to an input and target:

Measure all possible losses from inputs and targets from a small set:

The input-target pairs with the smallest losses:

The input-target pairs with the largest losses:

Create a graph containing a CrossEntropyLossLayer whose input is a 3-channel image, with each color channel representing a class, and whose target is a matrix of indices giving the correct class for each pixel:

Measure the loss on an input image and a target matrix in which the areas of the image that are predominantly red, green, and blue match the indices 1, 2, and 3, respectively, in the target:

Permuting the colors makes the per-pixel distributions disagree with the target matrix and increases the loss:

Applications  (2)

CrossEntropyLossLayer["Binary"] is used automatically by NetTrain when the final activation used for an output is an ElementwiseLayer[LogisticSigmoid]. Create a network that takes a pair of numbers and produces either True or False:

Train the network to decide if the first number in the pair is greater than the second:

The training network automatically constructed by NetTrain contains a binary-type CrossEntropyLossLayer:

Show the behavior of the trained network on the plane:

Plot the underlying probability learned by the network:

CrossEntropyLossLayer["Index"] is used automatically by NetTrain when the final activation used for an output is a SoftmaxLayer. Create an artificial dataset from three normally distributed clusters:

Plot the dataset:

The training data consists of rules mapping the point to the cluster it is in:

Create a net to compute the probability of a point lying in each cluster, using a "Class" decoder to classify the input as either Red, Green or Blue:

Train the net on the data:

The training network automatically constructed by NetTrain contains an index-type CrossEntropyLossLayer:

Evaluate the net on the centers of each cluster:

Show the contours in feature space in which each class reaches a posterior probability of 0.75:

Properties & Relations  (4)

Here is the function computed by CrossEntropyLossLayer["Binary"]:

Evaluate the function on some data:

This is equivalent to the following:
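
A sketch of this comparison, assuming the standard binary cross-entropy formula with natural logarithms; binaryCE is a hypothetical helper name and the values of p and t are illustrative:

```wolfram
(* binary cross-entropy for input probability p and target probability t *)
binaryCE[p_, t_] := -(t Log[p] + (1 - t) Log[1 - p]);

(* direct evaluation, approximately 1.2629 *)
direct = binaryCE[0.8, 0.25]

(* the same pair passed through the layer *)
layer = CrossEntropyLossLayer["Binary", "Input" -> "Real"];
fromLayer = layer[<|"Input" -> 0.8, "Target" -> 0.25|>]
```

The layer result should match the direct evaluation up to single-precision rounding.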

When the target is fixed at 1, the loss is minimized as the input approaches 1:

In general, for a fixed target, the loss is minimized when the input approaches the target:

CrossEntropyLossLayer["Binary"] applied to scalar probabilities p is equivalent to CrossEntropyLossLayer["Probabilities"] applied to a vector of probabilities {p,1-p}:

Demonstrate the same by substitution:
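
A minimal sketch of both checks, with illustrative values p0 and t0 for the numeric comparison and unassigned symbols p and t for the substitution:

```wolfram
p0 = 0.8; t0 = 0.25;

(* numeric comparison of the two forms *)
binary = CrossEntropyLossLayer["Binary", "Input" -> "Real"][
   <|"Input" -> p0, "Target" -> t0|>]
probs = CrossEntropyLossLayer["Probabilities", "Input" -> 2][
   <|"Input" -> {p0, 1 - p0}, "Target" -> {t0, 1 - t0}|>]

(* symbolic substitution: the two-class "Probabilities" formula
   reduces to the "Binary" formula; p and t must have no values assigned *)
same = Simplify[
  -({t, 1 - t} . Log[{p, 1 - p}]) == -(t Log[p] + (1 - t) Log[1 - p])]
```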

Here is the function computed by CrossEntropyLossLayer["Probabilities"]:

Evaluate the function on some data:

This is equivalent to the following:
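
A sketch of this comparison, assuming the standard cross-entropy formula with natural logarithms; probCE is a hypothetical helper name and the vectors are illustrative:

```wolfram
(* cross-entropy between an input distribution p and a target distribution t *)
probCE[p_List, t_List] := -(t . Log[p]);

(* direct evaluation, approximately 0.9964 *)
direct = probCE[{0.1, 0.7, 0.2}, {0.2, 0.6, 0.2}]

(* the same pair passed through the layer *)
layer = CrossEntropyLossLayer["Probabilities", "Input" -> 3];
fromLayer = layer[<|"Input" -> {0.1, 0.7, 0.2}, "Target" -> {0.2, 0.6, 0.2}|>]
```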

CrossEntropyLossLayer["Index"] is equivalent to CrossEntropyLossLayer["Probabilities"] applied to a one-hot encoding of the target index:

Evaluating them on sparse label data gives the same result:
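
A minimal sketch of the equivalence for a single example, assuming three classes; the input vector is illustrative and the one-hot target is built with UnitVector:

```wolfram
in = {0.1, 0.7, 0.2};

(* target given as a class index *)
byIndex = CrossEntropyLossLayer["Index", "Input" -> 3][
   <|"Input" -> in, "Target" -> 2|>]

(* the same target as a one-hot probability vector {0., 1., 0.} *)
byProbs = CrossEntropyLossLayer["Probabilities", "Input" -> 3][
   <|"Input" -> in, "Target" -> N[UnitVector[3, 2]]|>]
```

Both results should be close to -Log[0.7], since only the target class contributes to the sum.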

CrossEntropyLossLayer["Index"] is typically faster to evaluate than CrossEntropyLossLayer["Probabilities"] and can use significantly less memory when the number of classes is very large.

Introduced in 2016 (11.0) | Updated in 2017 (11.1), 2018 (11.3), 2019 (12.0), 2020 (12.1)