Layer factory function to create a convolution layer.
Convolution(filter_shape, num_filters = NULL, sequential = FALSE, activation = activation_identity, init = init_glorot_uniform(), pad = FALSE, strides = 1, sharing = TRUE, bias = TRUE, init_bias = 0, reduction_rank = 1, transpose_weight = FALSE, max_temp_mem_size_in_samples = 0, op_name = "Convolution", name = "")
Parameter | Description
---|---
filter_shape | int or list of ints – shape (spatial extent) of the receptive field, not including the input feature-map depth. E.g. (3,3) for a 2D convolution.
num_filters | int, defaults to NULL – number of filters (output feature-map depth); leave unset to denote scalar output items (the output shape will have no depth axis).
sequential | bool, defaults to FALSE – if TRUE, also convolve along the sequence dimension (see below).
activation | Function, defaults to activation_identity – optional activation function.
init | scalar or matrix or initializer, defaults to init_glorot_uniform() – initial value of the weights W.
pad | bool or list of bools, defaults to FALSE – if FALSE, the operation is only shifted over the “valid” area of the input, that is, no value outside the input is used. If TRUE, the operation is applied to all input positions, and positions outside the valid region are treated as containing zero. Use a list to specify a per-axis value.
strides | int or list of ints, defaults to 1 – stride of the operation. Use a list of ints to specify a per-axis value.
bias | bool, defaults to TRUE – whether to include a bias term.
init_bias | scalar or matrix or initializer, defaults to 0 – initial value of the bias b.
reduction_rank | int, defaults to 1 – set to 0 if the input items are scalars rather than vectors (no input feature-map depth axis; see below).
name | string, optional – the name of the Function instance in the network.
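As a sketch of how pad and strides interact (plain Python, not part of the CNTK API), the output's spatial size along one axis follows the usual convolution conventions: "valid" for pad = FALSE, "same" for pad = TRUE:

```python
import math

def conv_output_size(n, f, s, pad):
    """Spatial output size along one axis for input size n,
    filter extent f, stride s (assumed conventions, see lead-in)."""
    if pad:
        return math.ceil(n / s)        # pad=True: "same" padding
    return (n - f) // s + 1            # pad=False: "valid" area only

# A 32-pixel axis with a 5-wide filter:
print(conv_output_size(32, 5, 1, pad=False))  # valid          -> 28
print(conv_output_size(32, 5, 1, pad=True))   # same           -> 32
print(conv_output_size(32, 5, 2, pad=True))   # same, stride 2 -> 16
```

With pad = FALSE the output shrinks by the filter extent minus one; with pad = TRUE it depends only on the input size and stride.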
This implements a convolution operation over items arranged on an N-dimensional grid, such as pixels in an image. Typically, each item is a vector (e.g. pixel: R,G,B), and the result is, in turn, a vector. The item-grid dimensions are referred to as the spatial dimensions (e.g. dimensions of an image), while the vector dimension of the individual items is often called feature-map depth.
For each item, convolution gathers a window (“receptive field”) of items surrounding the item’s position on the grid, and applies a little fully-connected network to it (the same little network is applied to all item positions). The size (spatial extent) of the receptive field is given by filter_shape. E.g. to specify a 2D convolution, filter_shape should be a tuple of two integers, such as (5,5); an example for a 3D convolution (e.g. video or an MRI scan) would be filter_shape=(3,3,3); while for a 1D convolution (e.g. audio or text), filter_shape has one element, such as (3,) or just 3.
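The "little fully-connected network applied at every position" can be sketched in plain numpy (independent of CNTK, scalar items and a single output filter for brevity):

```python
import numpy as np

def naive_conv2d(x, W, b=0.0):
    """x: (H, W) grid of scalar items; W: (fh, fw) filter.
    Returns the valid-area output (no padding)."""
    fh, fw = W.shape
    H, Wd = x.shape
    out = np.empty((H - fh + 1, Wd - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = x[i:i+fh, j:j+fw]           # receptive field at (i, j)
            out[i, j] = np.sum(window * W) + b   # same "little network" everywhere
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
W = np.ones((3, 3)) / 9.0                        # 3x3 mean filter
print(naive_conv2d(x, W).shape)                  # (2, 2)
```

Each output value depends only on the window around its position, and the same weights W are reused at every position (weight sharing).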
The dimension of the input items (input feature-map depth) is not specified; it is inferred from the input. The dimension of the output items (output feature-map depth) generated for each item position is given by num_filters.
If the input is a sequence, the sequence elements are by default treated independently. To convolve along the sequence dimension as well, pass sequential = TRUE. This is useful for variable-length inputs, such as video or natural-language processing (word n-grams). Note, however, that convolution does not support sparse inputs.
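What convolving along the sequence dimension means can be sketched in plain numpy (not the CNTK API): a window of f consecutive items, an n-gram, is reduced at every position, so a variable-length sequence yields a slightly shorter output sequence:

```python
import numpy as np

def seq_conv1d(seq, W):
    """seq: (T, d) sequence of d-dim items; W: (f, d) filter.
    Returns T - f + 1 scalar outputs, one per n-gram window."""
    f, d = W.shape
    T = seq.shape[0]
    return np.array([np.sum(seq[t:t+f] * W) for t in range(T - f + 1)])

seq = np.ones((7, 4))       # length-7 sequence of 4-dim items (e.g. embeddings)
W = np.ones((3, 4))         # trigram filter
print(seq_conv1d(seq, W))   # 5 outputs, each 3 * 4 = 12.0
```

The output length T - f + 1 tracks the input length, which is why this form suits variable-length sequences.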
Both input and output items can be scalars instead of vectors. For scalar-valued input items, such as pixels of a black-and-white image, or samples of an audio clip, specify reduction_rank = 0. If the output items are scalar, leave num_filters unset (NULL).
A Convolution instance owns its weight parameter tensors W and b, and exposes them as attributes W and b. The weights will have the shape (num_filters, input_feature_map_depth, *filter_shape).
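The weight shape stated above determines the parameter count. A small Python sketch, using hypothetical sizes (64 filters over RGB input with a 5x5 receptive field; none of these numbers come from the source):

```python
num_filters = 64
input_depth = 3           # input feature-map depth, inferred from the input
filter_shape = (5, 5)     # spatial extent of the receptive field

# Shape of W as described: (num_filters, input_feature_map_depth, *filter_shape)
W_shape = (num_filters, input_depth) + filter_shape
n_params_W = num_filters * input_depth * filter_shape[0] * filter_shape[1]
n_params_b = num_filters  # assuming one bias per output feature map

print(W_shape)                   # (64, 3, 5, 5)
print(n_params_W + n_params_b)   # 4864
```

Note that the parameter count is independent of the input's spatial size, since the same weights are shared across all item positions.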