Each element of the input is independently set to 0 with probability dropout_rate, or scaled to 1 / (1 - dropout_rate) times its original value (with probability 1 - dropout_rate). Dropout is a common technique for reducing overfitting.
op_dropout(x, dropout_rate = 0, seed = 4294967293, name = "")
| Argument | Description |
| --- | --- |
| x | matrix or CNTK Function that outputs a tensor |
| dropout_rate | (float) probability that each element of x is set to zero |
| seed | (int) random seed |
| name | (str) the name of the Function instance in the network |
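A minimal usage sketch; it assumes the CNTK R bindings are attached as `cntk` and relies on the table above allowing a plain matrix for `x`:

```r
library(cntk)  # assumed package name for the CNTK R bindings

# During training, roughly half of the entries are set to 0 and the
# survivors are scaled by 1 / (1 - 0.5) = 2; the result is a CNTK Function.
y <- op_dropout(matrix(1, nrow = 2, ncol = 3), dropout_rate = 0.5)
```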
This behavior occurs only during training; during inference, dropout is a no-op. The paper that introduced dropout suggested scaling the weights at inference time instead. In CNTK's implementation this is unnecessary, because the values that are not set to 0 are already multiplied by 1 / (1 - dropout_rate) during training.
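For intuition, the following base-R sketch mimics the same inverted-dropout rule; it is an illustration of the arithmetic above, not CNTK's actual kernel:

```r
# Inverted dropout: zero each element with probability dropout_rate and
# scale the survivors by 1 / (1 - dropout_rate), so the expected value
# of every element is unchanged and inference can be a plain pass-through.
dropout_sketch <- function(x, dropout_rate = 0, training = TRUE) {
  if (!training || dropout_rate == 0) return(x)  # inference: no-op
  mask <- runif(length(x)) >= dropout_rate       # keep with prob. 1 - dropout_rate
  x * mask / (1 - dropout_rate)                  # dims of x are preserved
}

set.seed(1)
dropout_sketch(matrix(1, nrow = 2, ncol = 3), dropout_rate = 0.5)
# surviving entries equal 2 = 1 / (1 - 0.5); dropped entries are 0
```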