# 19.6 Deep Recurrent Neural Networks


### 19.6.2 Forward Computation

#### Formula Derivation

$$
h2 = s1 \cdot Q + s2_{t-1} \cdot W2 \tag{4}
$$

$$
s1 = \tanh(h1) \tag{5}
$$

$$
s2 = \tanh(h2) \tag{6}
$$

#### Code Implementation

```python
class timestep(object):
    def forward(self, x, U, V, Q, W1, W2, prev_s1, prev_s2, isFirst, isLast):
        ...
```
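A minimal NumPy sketch of one forward timestep, following equations (4) through (6). The function name and argument order are illustrative, not the book's actual `timestep.forward` signature; the first-layer line $h1 = x \cdot U + s1_{t-1} \cdot W1$ is assumed from the symmetric first-layer derivation (it is consistent with equations (17), (19), and (22) below), and the output layer $z = s2 \cdot V$ at the last timestep is omitted:

```python
import numpy as np

def forward_step(x, U, Q, W1, W2, prev_s1, prev_s2):
    """One timestep of the two-layer RNN (output layer z = s2 . V omitted)."""
    # First layer (assumed symmetric to eq. 4): h1 = x . U + s1_{t-1} . W1
    h1 = np.dot(x, U) + np.dot(prev_s1, W1)
    s1 = np.tanh(h1)                             # eq. (5)
    # Second layer, eq. (4): h2 = s1 . Q + s2_{t-1} . W2
    h2 = np.dot(s1, Q) + np.dot(prev_s2, W2)
    s2 = np.tanh(h2)                             # eq. (6)
    return h1, s1, h2, s2
```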


### 19.6.3 Backpropagation

#### Formula Derivation

At the last timestep, the error flows in from the output layer:

$$
\frac{\partial Loss}{\partial V}=\frac{\partial Loss}{\partial z}\frac{\partial z}{\partial V}=s2^{\top} \cdot dz \rightarrow dV \tag{11}
$$

$$
\begin{aligned} \frac{\partial Loss}{\partial h2} &= \frac{\partial Loss}{\partial z}\frac{\partial z}{\partial s2}\frac{\partial s2}{\partial h2} \\\\ &=(dz \cdot V^{\top}) \odot \sigma'(s2) \rightarrow dh2 \end{aligned} \tag{12}
$$

$$
\begin{aligned} \frac{\partial Loss}{\partial h1} &= \frac{\partial Loss}{\partial h2}\frac{\partial h2}{\partial s1}\frac{\partial s1}{\partial h1} \\\\ &=(dh2 \cdot Q^{\top}) \odot \sigma'(s1) \rightarrow dh1 \end{aligned} \tag{13}
$$

At all other timesteps there is no output-layer error, so the gradient arrives only through the next timestep:

$$
dz = 0 \tag{14}
$$

$$
\begin{aligned} \frac{\partial Loss}{\partial h2_t} &= \frac{\partial Loss}{\partial h2_{t+1}}\frac{\partial h2_{t+1}}{\partial s2_t}\frac{\partial s2_t}{\partial h2_t} \\\\ &=(dh2_{t+1} \cdot W2^{\top}) \odot \sigma'(s2_t) \rightarrow dh2_t \end{aligned} \tag{15}
$$

$$
dV = 0 \tag{16}
$$

$$
\begin{aligned} \frac{\partial Loss}{\partial h1_t} &= \frac{\partial Loss}{\partial h1_{t+1}}\frac{\partial h1_{t+1}}{\partial s1_t}\frac{\partial s1_t}{\partial h1_t}+\frac{\partial loss_t}{\partial h2_t}\frac{\partial h2_t}{\partial s1_t}\frac{\partial s1_t}{\partial h1_t} \\\\ &=(dh1_{t+1} \cdot W1^{\top} + dh2_t\cdot Q^{\top}) \odot \sigma'(s1_t) \rightarrow dh1_t \end{aligned} \tag{17}
$$

At the first timestep there is no previous state, so the recurrent weights receive no gradient:

$$
dW1 = 0, \quad dW2 = 0 \tag{18}
$$

At all other timesteps:

$$
\frac{\partial Loss}{\partial W1}=s1_{t-1}^{\top} \cdot dh1 \rightarrow dW1 \tag{19}
$$

$$
\frac{\partial Loss}{\partial W2}=s2_{t-1}^{\top} \cdot dh2 \rightarrow dW2 \tag{20}
$$

At every timestep:

$$
\frac{\partial Loss}{\partial Q}=\frac{\partial Loss}{\partial h2}\frac{\partial h2}{\partial Q}=s1^{\top} \cdot dh2 \rightarrow dQ \tag{21}
$$

$$
\frac{\partial Loss}{\partial U}=\frac{\partial Loss}{\partial h1}\frac{\partial h1}{\partial U}=x^{\top} \cdot dh1 \rightarrow dU \tag{22}
$$

#### Code Implementation

```python
class timestep(object):
    def backward(self, y, prev_s1, prev_s2, next_dh1, next_dh2, isFirst, isLast):
        ...
```
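A NumPy sketch of one timestep's backward pass, following equations (11) through (22). The function name and argument list are illustrative, not the book's actual `timestep.backward` signature; with tanh activations, $\sigma'$ can be computed from the cached activation as $\sigma'(s) = 1 - s^2$:

```python
import numpy as np

def backward_step(dz, V, Q, W1, W2, x, s1, s2, prev_s1, prev_s2,
                  next_dh1, next_dh2, isFirst, isLast):
    """One timestep of backpropagation through the two-layer RNN."""
    # tanh derivative written through the cached activation: sigma'(s) = 1 - s^2
    if isLast:
        dh2 = np.dot(dz, V.T) * (1 - s2 * s2)                # eq. (12)
        dV = np.dot(s2.T, dz)                                # eq. (11)
        dh1 = np.dot(dh2, Q.T) * (1 - s1 * s1)               # eq. (13)
    else:
        dh2 = np.dot(next_dh2, W2.T) * (1 - s2 * s2)         # eq. (15), dz = 0
        dV = np.zeros_like(V)                                # eqs. (14), (16)
        dh1 = (np.dot(next_dh1, W1.T)
               + np.dot(dh2, Q.T)) * (1 - s1 * s1)           # eq. (17)
    if isFirst:
        dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)      # eq. (18)
    else:
        dW1 = np.dot(prev_s1.T, dh1)                         # eq. (19)
        dW2 = np.dot(prev_s2.T, dh2)                         # eq. (20)
    dQ = np.dot(s1.T, dh2)                                   # eq. (21)
    dU = np.dot(x.T, dh1)                                    # eq. (22)
    return dh1, dh2, dU, dV, dQ, dW1, dW2
```

The per-timestep gradients `dU`, `dV`, `dQ`, `dW1`, `dW2` are then summed over all timesteps before the weight update.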


### 19.6.4 Results

- Network type: regression
- Time steps: 24
- Learning rate: 0.05
- Max epochs: 100
- Batch size: 64
- Input features: 6
- Output dimension: 1
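Collected as a plain Python dictionary for reference (the key names are illustrative, not the book's actual hyperparameter class fields):

```python
# Hyperparameters as listed above; key names are illustrative.
hp = {
    "net_type": "regression",
    "timesteps": 24,
    "learning_rate": 0.05,
    "max_epoch": 100,
    "batch_size": 64,
    "num_input": 6,    # input features
    "num_output": 1,   # output dimension
}
```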

| Prediction steps | Loss | Accuracy |
|---:|---:|---:|
| 8 | 0.001157 | 0.740684 |
| 4 | 0.000644 | 0.855700 |
| 2 | 0.000377 | 0.915486 |
| 1 | 0.000239 | 0.946411 |

#### Comparison with a Single-Layer RNN

Single-layer RNN, 4 hidden units (weights plus biases):

```
U: 6x4+4=28
V: 4x1+1= 5
W: 4x4  =16
-----------
Total:   49
```

Two-layer deep RNN, 2 hidden units per layer:

```
U: 6x2=12
Q: 2x2= 4
V: 2x1= 2
W1:2x2= 4
W2:2x2= 4
---------
Total: 26
```
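The totals can be verified with simple arithmetic (note that the single-layer counts above include biases while the two-layer counts list weights only):

```python
# Single-layer RNN, 4 hidden units: U + V + W, biases included
single = (6 * 4 + 4) + (4 * 1 + 1) + (4 * 4)
# Two-layer deep RNN, 2 hidden units per layer: U + Q + V + W1 + W2, weights only
deep = (6 * 2) + (2 * 2) + (2 * 1) + (2 * 2) + (2 * 2)
print(single, deep)  # 49 26
```

So the deep network reaches comparable capacity with roughly half the parameters of the wider single-layer one.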


ch19, Level6