# 15.6 Implementation of Batch Normalization

### 15.6.1 Backward Propagation

Define the error received from the next layer as

$$\delta = \frac{dJ}{dZ}, \quad \delta_i = \frac{dJ}{dz_i} \tag{10}$$

#### Gradients of the Batch Normalization Parameters

$$\frac{dJ}{d\gamma} = \sum_{i=1}^m \frac{dJ}{dz_i}\frac{dz_i}{d\gamma}=\sum_{i=1}^m \delta_i \cdot n_i \tag{11}$$

$$\frac{dJ}{d\beta} = \sum_{i=1}^m \frac{dJ}{dz_i}\frac{dz_i}{d\beta}=\sum_{i=1}^m \delta_i \tag{12}$$
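Equations (11) and (12) reduce to column-wise sums over the mini-batch. A minimal NumPy sketch (the array names here are illustrative, not from the book's code):

```python
import numpy as np

m, d = 4, 3                           # batch size, feature width
rng = np.random.default_rng(0)
delta = rng.standard_normal((m, d))   # dJ/dZ from the next layer, eq. (10)
norm_x = rng.standard_normal((m, d))  # the normalized values n_i saved by forward

# Equation (11): sum of delta_i * n_i over the batch, per feature column
d_gamma = np.sum(delta * norm_x, axis=0, keepdims=True)
# Equation (12): sum of delta_i over the batch, per feature column
d_beta = np.sum(delta, axis=0, keepdims=True)

print(d_gamma.shape, d_beta.shape)    # (1, 3) (1, 3)
```

Both gradients have the same `(1, d)` shape as `gamma` and `beta`, so the parameter update is a plain element-wise subtraction.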

#### Computing the Error Matrix for the Previous Layer

In the forward pass, $x_i$ reaches $z_i$ along four dependency paths:

- $z_i \leftarrow n_i \leftarrow x_i$
- $z_i \leftarrow n_i \leftarrow \mu_B \leftarrow x_i$
- $z_i \leftarrow n_i \leftarrow \sigma^2_B \leftarrow x_i$
- $z_i \leftarrow n_i \leftarrow \sigma^2_B \leftarrow \mu_B \leftarrow x_i$

Applying the chain rule over these paths:

$$\frac{dJ}{dx_i} = \frac{dJ}{d n_i}\frac{d n_i}{dx_i} + \frac{dJ}{d \sigma^2_B}\frac{d \sigma^2_B}{dx_i} + \frac{dJ}{d \mu_B}\frac{d \mu_B}{dx_i} \tag{13}$$

For the first factor,

$$\frac{dJ}{d n_i}= \frac{dJ}{dz_i}\frac{dz_i}{dn_i} = \delta_i \cdot \gamma \tag{14}$$

or, in matrix form,

$$\frac{dJ}{d N}= \delta \cdot \gamma \tag{15}$$

where the product is element-wise, with the $\gamma$ row broadcast over every sample:

$$\delta^{(64 \times 10)} \odot \gamma^{(1 \times 10)}=R^{(64 \times 10)}$$
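NumPy performs this broadcast automatically; a quick shape check (with illustrative constant values):

```python
import numpy as np

delta = np.ones((64, 10))       # error matrix from the next layer
gamma = np.full((1, 10), 2.0)   # per-feature scale, a single row

r = delta * gamma               # element-wise product; gamma's row is
print(r.shape)                  # replicated across all 64 samples: (64, 10)
```

No explicit tiling of `gamma` is needed, because NumPy's broadcasting rules stretch the length-1 leading axis to match.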

$$\begin{aligned} \frac{dJ}{d \sigma^2_B} &= \sum_{i=1}^m \frac{dJ}{d n_i}\frac{d n_i}{d \sigma^2_B} \\ &= -\frac{1}{2}(\sigma^2_B + \epsilon)^{-3/2}\sum_{i=1}^m \frac{dJ}{d n_i} \cdot (x_i-\mu_B) \end{aligned} \tag{16}$$

$$\frac{dJ}{d \mu_B}=\sum_{i=1}^m \frac{dJ}{d n_i}\frac{d n_i}{d \mu_B} + \frac{dJ}{d\sigma^2_B}\frac{d \sigma^2_B}{d \mu_B} \tag{18}$$

$$\frac{d n_i}{d \mu_B}=\frac{-1}{\sqrt{\sigma^2_B + \epsilon}} \tag{19}$$

$$\frac{d \sigma^2_B}{d \mu_B}=-\frac{2}{m}\sum_{i=1}^m (x_i- \mu_B) \tag{20}$$

Substituting (19) and (20) into (18) gives

$$\frac{dJ}{d \mu_B}=-\frac{1}{\sqrt{\sigma^2_B + \epsilon}}\sum_{i=1}^m \delta_i \cdot \gamma - \frac{2}{m}\frac{dJ}{d \sigma^2_B}\sum_{i=1}^m (x_i- \mu_B)$$

$$\frac{d \mu_B}{dx_i} = \frac{1}{m} \tag{21}$$

Finally, substituting everything back into (13):

$$\frac{dJ}{dx_i} = \frac{\delta_i \cdot \gamma}{\sqrt{\sigma^2_B + \epsilon}} + \frac{dJ}{d\sigma^2_B} \cdot \frac{2(x_i - \mu_B)}{m} + \frac{dJ}{d\mu_B} \cdot \frac{1}{m} \tag{13}$$
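One way to sanity-check the whole derivation is a finite-difference comparison. The sketch below assumes the loss is simply $J = \sum Z$, so that $\delta$ is a matrix of ones; all names are illustrative, not taken from the book's code:

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    norm = (x - mu) / np.sqrt(var + eps)
    return gamma * norm + beta

m, d, eps = 8, 5, 1e-5
rng = np.random.default_rng(1)
x = rng.standard_normal((m, d))
gamma = rng.standard_normal((1, d))
beta = rng.standard_normal((1, d))

# Analytic gradient of J = sum(Z), so delta is all ones
delta = np.ones((m, d))
mu = x.mean(axis=0, keepdims=True)
x_mu = x - mu
std = np.sqrt(x.var(axis=0, keepdims=True) + eps)
d_norm = delta * gamma                                                   # eq (14)/(15)
d_var = -0.5 * np.sum(d_norm * x_mu, axis=0, keepdims=True) / std**3     # eq (16)
d_mu = (-np.sum(d_norm, axis=0, keepdims=True) / std
        - 2.0 / m * d_var * np.sum(x_mu, axis=0, keepdims=True))         # eqs (18)-(20)
dx = d_norm / std + d_var * 2.0 * x_mu / m + d_mu / m                    # eq (13)

# Numerical gradient by central differences
h = 1e-5
dx_num = np.zeros_like(x)
for i in range(m):
    for j in range(d):
        xp, xm = x.copy(), x.copy()
        xp[i, j] += h
        xm[i, j] -= h
        dx_num[i, j] = (bn_forward(xp, gamma, beta).sum()
                        - bn_forward(xm, gamma, beta).sum()) / (2 * h)

print(np.allclose(dx, dx_num, atol=1e-6))   # True
```

The analytic and numerical gradients agree to within floating-point noise, which is strong evidence that the signs and $1/m$ factors above are right.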

### 15.6.2 Code Implementation

#### Class Initialization

```python
class BnLayer(CLayer):
    def __init__(self, input_size, momentum=0.9):
        self.gamma = np.ones((1, input_size))   # per-feature scale, initialized to 1
        self.beta = np.zeros((1, input_size))   # per-feature shift, initialized to 0
        self.eps = 1e-5                         # stabilizer added to the variance
        self.input_size = input_size
        self.output_size = input_size           # BN keeps the feature width unchanged
        self.momentum = momentum                # decay rate of the running statistics
        self.running_mean = np.zeros((1, input_size))  # batch-mean average, used at inference
        self.running_var = np.zeros((1, input_size))   # batch-variance average, used at inference
```


#### Forward Computation

```python
    def forward(self, input, train=True):
        ......
```
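The body is elided in the text. A minimal forward pass consistent with the fields initialized above, written as a standalone function (an assumption about the book's code, not a copy of it), could look like this:

```python
import numpy as np

def bn_forward(x, gamma, beta, running_mean, running_var,
               momentum=0.9, eps=1e-5, train=True):
    """Normalize x per feature column; maintain running stats for inference.

    Returns the output z, the saved normalized values, and the
    updated running mean/variance.
    """
    if train:
        mu = x.mean(axis=0, keepdims=True)
        var = x.var(axis=0, keepdims=True)
        # exponential moving averages, consumed when train=False
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        mu, var = running_mean, running_var
    norm_x = (x - mu) / np.sqrt(var + eps)
    z = gamma * norm_x + beta
    return z, norm_x, running_mean, running_var
```

With `gamma = 1` and `beta = 0`, each output column of a training-mode call has mean approximately 0 and standard deviation approximately 1, which is exactly the normalization the derivation above assumes.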


#### Backward Propagation

```python
    def backward(self, delta_in, flag):
        ......
```
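A backward pass that follows equations (11) through (13) directly can be sketched as a standalone function; again this is a hedged reconstruction with illustrative names, not the book's exact code:

```python
import numpy as np

def bn_backward(delta_in, gamma, norm_x, x_mu, std):
    """Gradients of a batch-norm layer.

    delta_in : (m, d) error from the next layer (eq. 10)
    gamma    : (1, d) scale parameter
    norm_x   : (m, d) normalized values n_i saved by the forward pass
    x_mu     : (m, d) x - mu_B saved by the forward pass
    std      : (1, d) sqrt(var + eps) saved by the forward pass
    """
    m = delta_in.shape[0]
    d_gamma = np.sum(delta_in * norm_x, axis=0, keepdims=True)  # eq (11)
    d_beta = np.sum(delta_in, axis=0, keepdims=True)            # eq (12)
    d_norm = delta_in * gamma                                   # eq (14)/(15)
    d_var = -0.5 * np.sum(d_norm * x_mu, axis=0,
                          keepdims=True) / std**3               # eq (16)
    d_mu = (-np.sum(d_norm, axis=0, keepdims=True) / std
            - 2.0 / m * d_var * np.sum(x_mu, axis=0,
                                       keepdims=True))          # eqs (18)-(20)
    delta_out = d_norm / std + 2.0 * d_var * x_mu / m + d_mu / m  # eq (13)
    return d_gamma, d_beta, delta_out
```

A useful invariant for testing: the returned `delta_out` sums to zero along the batch axis, because the mean-subtraction in the forward pass makes the output invariant to a constant shift of the inputs.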


Because `d_norm_x` is needed several times, it is computed once up front and reused, which improves performance.

```python
self.var = np.mean(self.x_mu**2, axis=0, keepdims=True) + self.eps
self.std = np.sqrt(self.var)
```


Since `self.var` already includes `self.eps`, we have

$$\texttt{self.var} \times \texttt{self.std} = \texttt{self.var} \times \texttt{self.var}^{0.5} = \texttt{self.var}^{3/2}$$

which is exactly the $(\sigma^2_B + \epsilon)^{3/2}$ factor appearing in equation (16).
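The identity is easy to confirm numerically (values here are arbitrary):

```python
import numpy as np

var = np.array([0.5, 1.0, 2.0]) + 1e-5  # plays the role of self.var (eps included)
std = np.sqrt(var)                       # plays the role of self.std

# var * std equals var ** 1.5, the (sigma^2 + eps)^{3/2} factor of eq (16)
print(np.allclose(var * std, var ** 1.5))   # True
```

This is why the code can divide by `self.var * self.std` instead of recomputing a fractional power.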

#### Updating the Parameters

```python
    def update(self, learning_rate=0.1):
        self.gamma = self.gamma - self.d_gamma * learning_rate
        self.beta = self.beta - self.d_beta * learning_rate
```


### 15.6.3 Using the Batch Normalization Layer in Practice

#### Main Program

```python
if __name__ == '__main__':
    ......
    params = HyperParameters_4_1(
        learning_rate, max_epoch, batch_size,
        net_type=NetType.MultipleClassifier,
        init_method=InitialMethod.MSRA,
        stopper=Stopper(StopCondition.StopLoss, 0.12))

    net = NeuralNet_4_1(params, "MNIST")

    fc1 = FcLayer_1_1(num_input, num_hidden1, params)
    net.add_layer(fc1, "fc1")
    bn1 = BnLayer(num_hidden1)
    net.add_layer(bn1, "bn1")
    r1 = ActivationLayer(Relu())
    net.add_layer(r1, "r1")
    ......
```


#### Results

```
......
epoch=4, total_iteration=4267
loss_train=0.079916, accuracy_train=0.968750
loss_valid=0.117291, accuracy_valid=0.967667
time used: 19.44783306121826
save parameters
testing...
0.9663
```


Code: ch15, Level6