First, Momentum

1. Compute dw, db.
2. Define v_dw, v_db:
\[v_{dw}=\beta v_{dw}+ (1-\beta) dw\]
\[v_{db}=\beta v_{db}+ (1-\beta) db\]
3. Update w, b (a NumPy sketch of the full step follows the equations):
\[w=w-\alpha v_{dw}\]
\[b=b-\alpha v_{db}\]
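As a concrete illustration, here is a minimal sketch of one momentum step, assuming dw and db have already been computed. The function name momentum_update and the defaults alpha = 0.01, beta = 0.9 are illustrative choices, not fixed by the formulas above.

```python
def momentum_update(w, b, dw, db, v_dw, v_db, alpha=0.01, beta=0.9):
    """One gradient-descent-with-momentum step (hypothetical helper).

    v_dw and v_db hold the exponentially weighted averages of past
    gradients and should be initialized to zeros of the same shape as w and b.
    """
    # Exponentially weighted average of the gradients
    v_dw = beta * v_dw + (1 - beta) * dw
    v_db = beta * v_db + (1 - beta) * db
    # Update the parameters with the averaged gradient instead of the raw one
    w = w - alpha * v_dw
    b = b - alpha * v_db
    return w, b, v_dw, v_db
```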
Second, RMSprop

1. Compute dw, db.
2. Define s_dw, s_db (the squaring here is element-wise):
\[s_{dw}=\beta s_{dw}+ (1-\beta) dw^2\]
\[s_{db}=\beta s_{db}+ (1-\beta) db^2\]
3. Update w, b; in practice a small ε is added to the denominator to avoid division by zero, as in Adam below (a sketch follows the equations):
\[w=w-\alpha \frac{dw}{\sqrt{s_{dw}}}\]
\[b=b-\alpha \frac{db}{\sqrt{s_{db}}}\]
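A minimal NumPy sketch of one RMSprop step under the same assumptions; rmsprop_update and its hyperparameter defaults are illustrative, and the small eps in the denominator is the usual safeguard against division by zero.

```python
import numpy as np

def rmsprop_update(w, b, dw, db, s_dw, s_db, alpha=0.001, beta=0.9, eps=1e-8):
    """One RMSprop step (hypothetical helper; hyperparameters are illustrative).

    s_dw and s_db hold exponentially weighted averages of the squared
    gradients and should be initialized to zeros of the same shape as w and b.
    """
    # Running average of the element-wise squared gradients
    s_dw = beta * s_dw + (1 - beta) * dw ** 2
    s_db = beta * s_db + (1 - beta) * db ** 2
    # Scale each gradient by the root of its running average;
    # eps keeps the denominator away from zero
    w = w - alpha * dw / (np.sqrt(s_dw) + eps)
    b = b - alpha * db / (np.sqrt(s_db) + eps)
    return w, b, s_dw, s_db
```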
Third, Adam ==momentum and RMSprop combined==

1. Compute dw, db.
2. Define v_dw, v_db, s_dw, s_db:
\[v_{dw}=\beta_{1} v_{dw}+ (1-\beta_{1}) dw\]
\[v_{db}=\beta_{1} v_{db}+ (1-\beta_{1}) db\]
\[s_{dw}=\beta_{2} s_{dw}+ (1-\beta_{2}) dw^2\]
\[s_{db}=\beta_{2} s_{db}+ (1-\beta_{2}) db^2\]
3. Bias correction (t is the iteration number):
\[v_{dw}^{correct}=\frac{v_{dw}}{1-\beta_{1}^t}\]
\[v_{db}^{correct}=\frac{v_{db}}{1-\beta_{1}^t}\]
\[s_{dw}^{correct}=\frac{s_{dw}}{1-\beta_{2}^t}\]
\[s_{db}^{correct}=\frac{s_{db}}{1-\beta_{2}^t}\]
4. Update w, b. Here ε is a very small number that keeps the denominator away from zero, usually ε = 10^-8 (a NumPy sketch of the full Adam step follows the equations).
\[w=w-\alpha \frac{v_{dw}^{correct}}{\sqrt{s_{dw}^{correct}}+\varepsilon}\]
\[b=b-\alpha \frac{v_{db}^{correct}}{\sqrt{s_{db}^{correct}}+\varepsilon}\]
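A minimal NumPy sketch of one Adam iteration, again assuming dw and db are already computed. adam_update is a hypothetical helper, and the defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8 are the commonly used values rather than something prescribed by these notes.

```python
import numpy as np

def adam_update(w, b, dw, db, v_dw, v_db, s_dw, s_db, t,
                alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t (t starts at 1); hypothetical helper.

    v_* and s_* are the momentum and RMSprop running averages and should be
    initialized to zeros of the same shape as w and b.
    """
    # Momentum part: running average of the gradients
    v_dw = beta1 * v_dw + (1 - beta1) * dw
    v_db = beta1 * v_db + (1 - beta1) * db
    # RMSprop part: running average of the element-wise squared gradients
    s_dw = beta2 * s_dw + (1 - beta2) * dw ** 2
    s_db = beta2 * s_db + (1 - beta2) * db ** 2
    # Bias correction compensates for the zero initialization of v_* and s_*
    v_dw_c = v_dw / (1 - beta1 ** t)
    v_db_c = v_db / (1 - beta1 ** t)
    s_dw_c = s_dw / (1 - beta2 ** t)
    s_db_c = s_db / (1 - beta2 ** t)
    # Combined update; eps keeps the denominator away from zero
    w = w - alpha * v_dw_c / (np.sqrt(s_dw_c) + eps)
    b = b - alpha * v_db_c / (np.sqrt(s_db_c) + eps)
    return w, b, v_dw, v_db, s_dw, s_db
```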