Computes the gradient of $f(z) = \log \sum_i \exp(z_i)$ at $z = x$. Concretely,
op_softmax(x, axis = NULL, name = "")
Argument | Description |
---|---|
x | matrix or CNTK Function that outputs a tensor |
axis | axis across which to perform operation |
name | (str) the name of the Function instance in the network |
$$\mathrm{softmax}(x) = \left[ \frac{\exp(x_1)}{\sum_i \exp(x_i)} \quad \frac{\exp(x_2)}{\sum_i \exp(x_i)} \quad \ldots \quad \frac{\exp(x_n)}{\sum_i \exp(x_i)} \right]$$
with the understanding that the implementation can use equivalent formulas for efficiency and numerical stability.
The output is a vector of non-negative numbers that sum to 1 and can therefore be interpreted as probabilities for mutually exclusive outcomes as in the case of multiclass classification.
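The remark about equivalent formulas for numerical stability can be illustrated with a minimal base R sketch. It does not call CNTK and is not the library's implementation; `softmax_stable` is a hypothetical helper written only for this example. It uses the common trick of subtracting the maximum before exponentiating, which leaves the result unchanged mathematically but avoids overflow.

```r
# Hypothetical helper, not part of the CNTK API: softmax of a numeric vector.
softmax_stable <- function(x) {
  shifted <- x - max(x)  # largest entry becomes 0, so exp() cannot overflow
  e <- exp(shifted)
  e / sum(e)
}

x <- c(1000, 1001, 1002)

# Naive formula: exp(1000) overflows to Inf, so every entry is Inf / Inf = NaN.
naive <- exp(x) / sum(exp(x))

# Shifted formula: same mathematical value, finite intermediate results.
stable <- softmax_stable(x)

print(naive)
print(stable)
print(sum(stable))  # the entries are non-negative and sum to 1
```

Because the shift cancels in the ratio, `softmax_stable(c(1000, 1001, 1002))` and `softmax_stable(c(0, 1, 2))` give the same probabilities.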
If axis is given, the softmax will be computed along that axis.
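What computing the softmax along one axis means can be sketched in base R for a plain matrix. This is an illustration only: `softmax_margin` is a hypothetical helper, and its `margin` argument follows the convention of base R's `apply()` (1 = rows, 2 = columns), which is an assumption of this sketch and not a statement about how `op_softmax` numbers its axes.

```r
# Hypothetical helper, not part of the CNTK API.
# margin = 1 normalizes each row, margin = 2 normalizes each column.
softmax_margin <- function(m, margin) {
  stable <- function(v) {
    e <- exp(v - max(v))  # shift by the maximum for numerical stability
    e / sum(e)
  }
  out <- apply(m, margin, stable)
  # apply() stores each result as a column, so the row-wise case
  # must be transposed back to the shape of the input.
  if (margin == 1) t(out) else out
}

m <- matrix(c(1, 2, 3,
              1, 1, 1), nrow = 2, byrow = TRUE)

rows <- softmax_margin(m, 1)  # each row sums to 1
cols <- softmax_margin(m, 2)  # each column sums to 1

print(rowSums(rows))
print(colSums(cols))
```

The same input therefore yields different outputs depending on the axis: the constant second row becomes a uniform distribution when normalized row-wise, while column-wise normalization compares the two rows entry by entry.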