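# 17.3 Backpropagation in Convolution

## 17.3 Training the Convolutional Layer

Backpropagation through a convolutional layer has to produce two results:

1. The error term (gradient) of this layer's weight matrix
2. The error matrix that this layer passes back to the next layer in the backward direction (the layer that precedes it in the forward pass)

### 17.3.1 Computing the Gradient Matrix for Backpropagation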

$$Z = W*A+b \tag{0}$$

$$z_{11} = w_{11} \cdot a_{11} + w_{12} \cdot a_{12} + w_{21} \cdot a_{21} + w_{22} \cdot a_{22} + b \tag{1}$$

$$z_{12} = w_{11} \cdot a_{12} + w_{12} \cdot a_{13} + w_{21} \cdot a_{22} + w_{22} \cdot a_{23} + b \tag{2}$$

$$z_{21} = w_{11} \cdot a_{21} + w_{12} \cdot a_{22} + w_{21} \cdot a_{31} + w_{22} \cdot a_{32} + b \tag{3}$$

$$z_{22} = w_{11} \cdot a_{22} + w_{12} \cdot a_{23} + w_{21} \cdot a_{32} + w_{22} \cdot a_{33} + b \tag{4}$$

$$\frac{\partial J}{\partial a_{11}}=\frac{\partial J}{\partial z_{11}} \frac{\partial z_{11}}{\partial a_{11}}=\delta_{z11}\cdot w_{11} \tag{5}$$

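To compute the gradient of $J$ with respect to $a_{12}$, look at the forward formulas first: $a_{12}$ contributes to both $z_{11}$ and $z_{12}$, so the two partial derivatives must be added: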

$$\frac{\partial J}{\partial a_{12}}=\frac{\partial J}{\partial z_{11}} \frac{\partial z_{11}}{\partial a_{12}}+\frac{\partial J}{\partial z_{12}} \frac{\partial z_{12}}{\partial a_{12}}=\delta_{z11} \cdot w_{12}+\delta_{z12} \cdot w_{11} \tag{6}$$

$$
\begin{aligned}
\frac{\partial J}{\partial a_{22}}&=\frac{\partial J}{\partial z_{11}} \frac{\partial z_{11}}{\partial a_{22}}+\frac{\partial J}{\partial z_{12}} \frac{\partial z_{12}}{\partial a_{22}}+\frac{\partial J}{\partial z_{21}} \frac{\partial z_{21}}{\partial a_{22}}+\frac{\partial J}{\partial z_{22}} \frac{\partial z_{22}}{\partial a_{22}} \\
&=\delta_{z11} \cdot w_{22} + \delta_{z12} \cdot w_{21} + \delta_{z21} \cdot w_{12} + \delta_{z22} \cdot w_{11}
\end{aligned}
\tag{7}
$$

$$\delta_{out} = \delta_{in} * W^{rot180} \tag{8}$$

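- When the weights are $3\times 3$, $\delta_{in}$ needs padding=2 (two rings of zeros) so that convolving it with the weights yields a $\delta_{out}$ of the correct size.
- When the weights are $5\times 5$, $\delta_{in}$ needs padding=4 (four rings of zeros) so that convolving it with the weights yields a $\delta_{out}$ of the correct size.
- In general, when the weights are $N\times N$, $\delta_{in}$ needs padding=$N-1$, that is, $N-1$ rings of zeros.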
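The padding rule and equation (8) can be checked on the $3\times 3$ input and $2\times 2$ kernel used in equations (1) to (4). The following is a minimal NumPy sketch, not the chapter's own implementation: `conv2d_valid` and `backward_input` are hypothetical helper names, and stride 1 with no forward padding is assumed.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide w over a (stride 1, no padding, no kernel flip), as in equations (1)-(4)."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def backward_input(delta_in, w):
    """Equation (8): zero-pad delta_in with N-1 rings, then slide rot180(w) over it."""
    ph, pw = w.shape[0] - 1, w.shape[1] - 1
    padded = np.pad(delta_in, ((ph, ph), (pw, pw)), mode="constant")
    return conv2d_valid(padded, np.rot90(w, 2))

rng = np.random.default_rng(0)
w = rng.standard_normal((2, 2))          # 2x2 kernel
delta_in = rng.standard_normal((2, 2))   # dJ/dz for the 2x2 output
delta_out = backward_input(delta_in, w)  # dJ/da, same 3x3 shape as the input
print(delta_out.shape)
```

With a $2\times 2$ kernel the padding is one ring, so the padded error matrix is $4\times 4$ and the result is $3\times 3$. The elements `delta_out[0, 0]`, `delta_out[0, 1]` and `delta_out[1, 1]` correspond to equations (5), (6) and (7).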

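### 17.3.2 Restoring the Gradient Matrix When the Stride Is Not 1

1. Get the shape of the error matrix passed back from the upper layer, say $M \times N$.
2. Initialize an $(M \cdot S) \times (N \cdot S)$ zero matrix, where $S$ is the stride.
3. Place the values of the first row of the incoming error matrix in row 0 of the zero matrix, at columns 0, S, 2S, 3S, ...
4. Then place the values of the second row of the error matrix in row S of the zero matrix, at columns 0, S, 2S, 3S, ...
5. Continue in the same way for the remaining rows.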

For stride $S=2$, a $2 \times 3$ error matrix is spread out with one zero between neighboring values:

$$
\begin{bmatrix}
\delta_{11} & 0 & \delta_{12} & 0 & \delta_{13}\\
0 & 0 & 0 & 0 & 0\\
\delta_{21} & 0 & \delta_{22} & 0 & \delta_{23}
\end{bmatrix}
$$

For stride $S=3$, two zeros separate neighboring values:

$$
\begin{bmatrix}
\delta_{11} & 0 & 0 & \delta_{12} & 0 & 0 & \delta_{13}\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
\delta_{21} & 0 & 0 & \delta_{22} & 0 & 0 & \delta_{23}
\end{bmatrix}
$$
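The expansion can be written with a single strided slice assignment. This is a sketch under one stated assumption: the step list sizes the zero matrix as $(M \cdot S) \times (N \cdot S)$, while this version allocates the smallest shape that holds every value, $((M-1)S+1) \times ((N-1)S+1)$, which is the shape of the two matrices above. A larger allocation only appends trailing zero rows and columns. `expand_delta` is a hypothetical name.

```python
import numpy as np

def expand_delta(delta, stride):
    """Spread an error matrix back onto a stride-1 grid.

    Value (i, j) lands at (i * stride, j * stride); every other cell stays 0.
    """
    if stride == 1:
        return delta.copy()
    m, n = delta.shape
    out = np.zeros(((m - 1) * stride + 1, (n - 1) * stride + 1), dtype=delta.dtype)
    out[::stride, ::stride] = delta
    return out

# The integers stand in for delta_11 ... delta_23.
delta = np.array([[11, 12, 13],
                  [21, 22, 23]])
print(expand_delta(delta, 2))  # 3x5 layout
print(expand_delta(delta, 3))  # 4x7 layout
```

After this expansion, the error matrix can be treated as if it came from a stride-1 layer.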

### 17.3.3 Computing the Gradient with Multiple Kernels

$$
\begin{aligned}
z_{111} &= w_{111} \cdot a_{11} + w_{112} \cdot a_{12} + w_{121} \cdot a_{21} + w_{122} \cdot a_{22} \\
z_{112} &= w_{111} \cdot a_{12} + w_{112} \cdot a_{13} + w_{121} \cdot a_{22} + w_{122} \cdot a_{23} \\
z_{121} &= w_{111} \cdot a_{21} + w_{112} \cdot a_{22} + w_{121} \cdot a_{31} + w_{122} \cdot a_{32} \\
z_{122} &= w_{111} \cdot a_{22} + w_{112} \cdot a_{23} + w_{121} \cdot a_{32} + w_{122} \cdot a_{33}
\end{aligned}
$$

$$
\begin{aligned}
z_{211} &= w_{211} \cdot a_{11} + w_{212} \cdot a_{12} + w_{221} \cdot a_{21} + w_{222} \cdot a_{22} \\
z_{212} &= w_{211} \cdot a_{12} + w_{212} \cdot a_{13} + w_{221} \cdot a_{22} + w_{222} \cdot a_{23} \\
z_{221} &= w_{211} \cdot a_{21} + w_{212} \cdot a_{22} + w_{221} \cdot a_{31} + w_{222} \cdot a_{32} \\
z_{222} &= w_{211} \cdot a_{22} + w_{212} \cdot a_{23} + w_{221} \cdot a_{32} + w_{222} \cdot a_{33}
\end{aligned}
$$

The gradient of $J$ with respect to $a_{22}$:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{22}}&=\frac{\partial J}{\partial Z_{1}} \frac{\partial Z_{1}}{\partial a_{22}}+\frac{\partial J}{\partial Z_{2}} \frac{\partial Z_{2}}{\partial a_{22}} \\
&=\frac{\partial J}{\partial z_{111}} \frac{\partial z_{111}}{\partial a_{22}}+\frac{\partial J}{\partial z_{112}} \frac{\partial z_{112}}{\partial a_{22}}+\frac{\partial J}{\partial z_{121}} \frac{\partial z_{121}}{\partial a_{22}}+\frac{\partial J}{\partial z_{122}} \frac{\partial z_{122}}{\partial a_{22}} \\
&+\frac{\partial J}{\partial z_{211}} \frac{\partial z_{211}}{\partial a_{22}}+\frac{\partial J}{\partial z_{212}} \frac{\partial z_{212}}{\partial a_{22}}+\frac{\partial J}{\partial z_{221}} \frac{\partial z_{221}}{\partial a_{22}}+\frac{\partial J}{\partial z_{222}} \frac{\partial z_{222}}{\partial a_{22}} \\
&=(\delta_{z111} \cdot w_{122} + \delta_{z112} \cdot w_{121} + \delta_{z121} \cdot w_{112} + \delta_{z122} \cdot w_{111}) \\
&+(\delta_{z211} \cdot w_{222} + \delta_{z212} \cdot w_{221} + \delta_{z221} \cdot w_{212} + \delta_{z222} \cdot w_{211}) \\
&=\delta_{z1} * W_1^{rot180} + \delta_{z2} * W_2^{rot180}
\end{aligned}
$$

$$\delta_{out} = \sum_m \delta_{in\_m} * W^{rot180}_m \tag{9}$$
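The sum over kernels can be verified numerically. The sketch below assumes stride 1, no padding, and a surrogate objective $J(A)=\sum_m \sum_{ij} \delta_{in\_m,ij}\, z_{m,ij}$, chosen only because its derivative with respect to each $Z_m$ is exactly $\delta_{in\_m}$. All function names are hypothetical.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def backward_input(delta_in, w):
    """Zero-pad delta_in with N-1 rings, then slide rot180(w) over it."""
    ph, pw = w.shape[0] - 1, w.shape[1] - 1
    padded = np.pad(delta_in, ((ph, ph), (pw, pw)), mode="constant")
    return conv2d_valid(padded, np.rot90(w, 2))

def backward_input_multi(deltas, kernels):
    """Sum of the per-kernel results: sum_m delta_in_m * rot180(W_m)."""
    return sum(backward_input(d, w) for d, w in zip(deltas, kernels))

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 3))
kernels = rng.standard_normal((2, 2, 2))  # W_1 and W_2
deltas = rng.standard_normal((2, 2, 2))   # dJ/dZ_1 and dJ/dZ_2
delta_out = backward_input_multi(deltas, kernels)

def J(a):
    """Surrogate objective whose gradient w.r.t. Z_m is deltas[m]."""
    return sum(np.sum(d * conv2d_valid(a, w)) for d, w in zip(deltas, kernels))

# Central finite differences of J with respect to every input element.
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        step = np.zeros_like(a)
        step[i, j] = eps
        numeric[i, j] = (J(a + step) - J(a - step)) / (2 * eps)

print(np.allclose(delta_out, numeric, atol=1e-6))
```

Because the two output feature maps both read the same input, their contributions to $\partial J / \partial A$ simply add.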

### 17.3.4 Computing the Gradient with Multiple Inputs

$$
\begin{aligned}
z_{11} &= w_{111} \cdot a_{111} + w_{112} \cdot a_{112} + w_{121} \cdot a_{121} + w_{122} \cdot a_{122} \\
&+ w_{211} \cdot a_{211} + w_{212} \cdot a_{212} + w_{221} \cdot a_{221} + w_{222} \cdot a_{222}
\end{aligned}
\tag{10}
$$

$$
\begin{aligned}
z_{12} &= w_{111} \cdot a_{112} + w_{112} \cdot a_{113} + w_{121} \cdot a_{122} + w_{122} \cdot a_{123} \\
&+ w_{211} \cdot a_{212} + w_{212} \cdot a_{213} + w_{221} \cdot a_{222} + w_{222} \cdot a_{223}
\end{aligned}
\tag{11}
$$

$$
\begin{aligned}
z_{21} &= w_{111} \cdot a_{121} + w_{112} \cdot a_{122} + w_{121} \cdot a_{131} + w_{122} \cdot a_{132} \\
&+ w_{211} \cdot a_{221} + w_{212} \cdot a_{222} + w_{221} \cdot a_{231} + w_{222} \cdot a_{232}
\end{aligned}
\tag{12}
$$

$$
\begin{aligned}
z_{22} &= w_{111} \cdot a_{122} + w_{112} \cdot a_{123} + w_{121} \cdot a_{132} + w_{122} \cdot a_{133} \\
&+ w_{211} \cdot a_{222} + w_{212} \cdot a_{223} + w_{221} \cdot a_{232} + w_{222} \cdot a_{233}
\end{aligned}
\tag{13}
$$

The gradient of $J$ with respect to $a_{122}$:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{122}}&=\frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial a_{122}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial a_{122}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial a_{122}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial a_{122}} \\
&=\delta_{z_{11}} \cdot w_{122} + \delta_{z_{12}} \cdot w_{121} + \delta_{z_{21}} \cdot w_{112} + \delta_{z_{22}} \cdot w_{111}
\end{aligned}
$$

$$\delta_{out1} = \delta_{in} * W_1^{rot180} \tag{14}$$

The gradient of $J$ with respect to $a_{222}$:

$$
\begin{aligned}
\frac{\partial J}{\partial a_{222}}&=\frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial a_{222}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial a_{222}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial a_{222}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial a_{222}} \\
&=\delta_{z_{11}} \cdot w_{222} + \delta_{z_{12}} \cdot w_{221} + \delta_{z_{21}} \cdot w_{212} + \delta_{z_{22}} \cdot w_{211}
\end{aligned}
$$

$$\delta_{out2} = \delta_{in} * W_2^{rot180} \tag{15}$$
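With several input channels, the same incoming error matrix is paired with each sub-kernel in turn, giving one outgoing error matrix per channel. The sketch below assumes two $3\times 3$ input channels, one kernel made of two $2\times 2$ sub-kernels, stride 1 and no padding. The helper names are hypothetical.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def backward_input(delta_in, w):
    """Zero-pad delta_in with N-1 rings, then slide rot180(w) over it."""
    ph, pw = w.shape[0] - 1, w.shape[1] - 1
    padded = np.pad(delta_in, ((ph, ph), (pw, pw)), mode="constant")
    return conv2d_valid(padded, np.rot90(w, 2))

def backward_input_multi_channel(delta_in, w):
    """One result per input channel c: delta_in * rot180(w[c])."""
    return np.stack([backward_input(delta_in, w_c) for w_c in w])

rng = np.random.default_rng(2)
w = rng.standard_normal((2, 2, 2))      # sub-kernels W_1 and W_2
delta_in = rng.standard_normal((2, 2))  # dJ/dz for the single 2x2 output
delta_out = backward_input_multi_channel(delta_in, w)
print(delta_out.shape)                  # one 3x3 matrix per input channel
```

Here `delta_out[0][1, 1]` is $\partial J/\partial a_{122}$ and `delta_out[1][1, 1]` is $\partial J/\partial a_{222}$, matching the two expanded derivations above.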

### 17.3.5 Computing the Gradient of the Weights (Kernel)

$$
\begin{aligned}
\frac{\partial J}{\partial w_{11}} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial w_{11}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial w_{11}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial w_{11}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial w_{11}} \\
&=\delta_{z11} \cdot a_{11} + \delta_{z12} \cdot a_{12} + \delta_{z21} \cdot a_{21} + \delta_{z22} \cdot a_{22}
\end{aligned}
\tag{16}
$$

$$
\begin{aligned}
\frac{\partial J}{\partial w_{12}} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial w_{12}} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial w_{12}} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial w_{12}} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial w_{12}} \\
&=\delta_{z11} \cdot a_{12} + \delta_{z12} \cdot a_{13} + \delta_{z21} \cdot a_{22} + \delta_{z22} \cdot a_{23}
\end{aligned}
\tag{17}
$$

$$\delta_w = A * \delta_{in} \tag{18}$$
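In other words, the kernel gradient is obtained by sliding the incoming error matrix over the layer input as if it were a kernel. A minimal sketch for the $3\times 3$ input and $2\times 2$ error matrix of this section follows; `conv2d_valid` is a hypothetical helper, and stride 1 with no padding is assumed.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(3)
a = rng.standard_normal((3, 3))         # layer input A
delta_in = rng.standard_normal((2, 2))  # dJ/dz, same shape as the output z

# Gradient of the 2x2 kernel: the error matrix plays the role of the kernel.
delta_w = conv2d_valid(a, delta_in)
print(delta_w.shape)
```

The result has the same $2\times 2$ shape as the kernel. `delta_w[0, 0]` and `delta_w[0, 1]` reproduce the two expanded sums for $w_{11}$ and $w_{12}$.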

### 17.3.6 Computing the Gradient of the Bias

$$
\begin{aligned}
\frac{\partial J}{\partial b} &= \frac{\partial J}{\partial z_{11}}\frac{\partial z_{11}}{\partial b} + \frac{\partial J}{\partial z_{12}}\frac{\partial z_{12}}{\partial b} + \frac{\partial J}{\partial z_{21}}\frac{\partial z_{21}}{\partial b} + \frac{\partial J}{\partial z_{22}}\frac{\partial z_{22}}{\partial b} \\
&=\delta_{z11} + \delta_{z12} + \delta_{z21} + \delta_{z22}
\end{aligned}
\tag{19}
$$

Since $b$ is a single scalar added to every output element, its gradient is the sum of all elements of $\delta_{in}$:

$$\delta_b = \sum_{i,j} \delta_{in,ij} \tag{20}$$
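The expanded derivation above adds up every element of the incoming error matrix, which is easy to confirm numerically. The sketch assumes a squared-error objective $J=\frac{1}{2}\sum (z-y)^2$ purely for illustration. `z_without_bias` stands in for $W*A$, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
z_without_bias = rng.standard_normal((2, 2))  # stands in for W * A
y = rng.standard_normal((2, 2))               # target values
b = 0.3                                       # scalar bias shared by all outputs

def loss(bias):
    """Assumed objective: J = 1/2 * sum((z - y)^2) with z = W*A + bias."""
    z = z_without_bias + bias
    return 0.5 * np.sum((z - y) ** 2)

delta_in = (z_without_bias + b) - y  # dJ/dz for this objective
delta_b = np.sum(delta_in)           # one scalar: sum over all output positions

# Central finite difference of J with respect to b.
eps = 1e-6
numeric = (loss(b + eps) - loss(b - eps)) / (2 * eps)
print(np.isclose(delta_b, numeric, atol=1e-6))
```

The bias gradient is a single scalar regardless of the size of the output feature map.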

### 17.3.7 A Worked Example of Computing the Kernel Gradient

$$
w=\begin{pmatrix}
0 & -1 & 0 \\
0 & 2 & 0 \\
0 & -1 & 0
\end{pmatrix}
$$

$$z = x * w$$

$$loss = \frac{1}{2}(z-y)^2$$

$$\frac{\partial loss}{\partial w}=\frac{\partial loss}{\partial z}\frac{\partial z}{\partial w}=x * (z-y)$$

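```python
import numpy as np

# create_zero_array and jit_conv_2d are helpers defined elsewhere in the
# accompanying code; they are not shown in this snippet.
def train(x, w, b, y):
    # output buffer sized for the convolution of x with w
    output = create_zero_array(x, w)
    for i in range(10000):
        # forward
        jit_conv_2d(x, w, b, output)
        # delta = dloss/dz, also used to compute the loss
        delta = output - y
        m = delta.shape[0] * delta.shape[1]
        # loss: squared error averaged over the m output elements
        loss = np.sum(np.multiply(delta, delta)) / 2 / m
        print(i, loss)
        if loss < 1e-7:
            break
        # backward: convolve x with delta to get the gradient of w
        dw = np.zeros(w.shape)
        jit_conv_2d(x, delta, b, dw)
        # gradient descent step with learning rate 0.5
        w = w - 0.5 * dw / m
    return w
```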


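1. Run one forward pass with `jit_conv_2d(x, w, ...)`.
2. Compute the loss to check the stopping condition; iteration stops once the loss drops below 1e-7.
3. Compute delta.
4. Run one backward pass with `jit_conv_2d(x, delta, ...)` to obtain the gradient of w.
5. Update the kernel w.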

```
......
3458 1.0063169744079507e-07
3459 1.0031151142628902e-07
3460 9.999234418532805e-08
w_true:
[[ 0 -1  0]
 [ 0  2  0]
 [ 0 -1  0]]
w_result:
[[-1.86879237e-03 -9.97261724e-01 -1.01212359e-03]
 [ 2.58961697e-03  1.99494606e+00  2.74435794e-03]
 [-8.67754199e-04 -9.97404263e-01 -1.87580756e-03]]
w allclose: True
y allclose: True
```
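For readers who want to rerun the experiment without the chapter's helper functions, the following is a self-contained NumPy sketch of the same loop. It is not the original code: it assumes a random $20\times 20$ input instead of the original data, a zero bias, a zero-initialized kernel, and a plain Python sliding-window function (`conv2d_valid`, a hypothetical name) in place of `jit_conv_2d`. The iteration count and the learned values will therefore differ from the log above.

```python
import numpy as np

def conv2d_valid(a, w):
    """Slide w over a (stride 1, no padding, no kernel flip)."""
    kh, kw = w.shape
    out = np.zeros((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

def train(x, y, w_shape, lr=0.5, max_iter=10000, tol=1e-7):
    """Recover a kernel from (x, y) pairs by gradient descent (bias assumed 0)."""
    w = np.zeros(w_shape)
    m = y.size
    loss = np.inf
    for step in range(max_iter):
        delta = conv2d_valid(x, w) - y      # forward pass, then dloss/dz
        loss = np.sum(delta * delta) / 2 / m
        if loss < tol:
            break
        dw = conv2d_valid(x, delta)         # kernel gradient: x * delta
        w = w - lr * dw / m
    return w, loss

rng = np.random.default_rng(0)
x = rng.random((20, 20))                    # assumed stand-in for the input data
w_true = np.array([[0.0, -1.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, -1.0, 0.0]])
y = conv2d_valid(x, w_true)                 # targets produced by the true kernel

w_result, final_loss = train(x, y, w_true.shape)
print(np.round(w_result, 3))
```

The learning rate and tolerance mirror the values in the training loop above. With different input data the step size may need to be reduced for the iteration to remain stable.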


Code location: ch17, Level3