Trainability¶
Problems with Gradients¶
Dead ReLU: a neuron whose pre-activation value is negative outputs 0 and receives zero gradient, so its weights are not updated. It can recover if the updated outputs from the previous layer push its pre-activation back to positive.
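The zero gradient can be checked directly. Below is a minimal sketch (PyTorch, my own example, not from the notes): a neuron with a negative pre-activation produces a ReLU output of 0, and its weights and bias get exactly zero gradient.

```python
import torch

x = torch.tensor([1.0, 2.0])                       # fixed input from the previous layer
w = torch.tensor([-3.0, 0.5], requires_grad=True)  # chosen so the pre-activation is negative
b = torch.tensor(0.0, requires_grad=True)

z = w @ x + b          # pre-activation: -3*1 + 0.5*2 = -2 < 0
a = torch.relu(z)      # ReLU output is 0
loss = (a - 1.0) ** 2  # any downstream loss
loss.backward()

print(w.grad, b.grad)  # both zero: the neuron is "dead" for this input
```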
Vanishing Gradients¶
Exploding Gradients¶
Initialization¶
Exercise: what happens if we initialize all parameters to 0? No update: the activations and back-propagated gradients are all zero, so the weights never move and the symmetry between neurons is never broken.
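A small sketch of the exercise (PyTorch, my own illustration; biases are omitted so the claim holds exactly): with every weight initialized to 0, all gradients are 0 and gradient descent changes nothing.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(4, 8, bias=False),
    nn.ReLU(),
    nn.Linear(8, 1, bias=False),
)
for p in net.parameters():
    nn.init.zeros_(p)  # every weight starts at exactly 0

x = torch.randn(16, 4)
y = torch.randn(16, 1)
loss = ((net(x) - y) ** 2).mean()
loss.backward()

# Hidden activations are relu(0) = 0 and the output weights are 0, so every
# gradient vanishes and the parameters stay at 0 forever.
print([p.grad.abs().max().item() for p in net.parameters()])  # [0.0, 0.0]
```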
The variance of a node's value (before activation) grows with the number of incoming connections (its fan-in).
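A quick simulation of this claim (NumPy, my own illustration): with unit-variance inputs and unit-variance weights, the variance of a pre-activation z = w · x grows roughly linearly with the fan-in.

```python
import numpy as np

rng = np.random.default_rng(0)
for fan_in in (10, 100, 1000):
    x = rng.standard_normal((10_000, fan_in))  # inputs with variance 1
    w = rng.standard_normal(fan_in)            # weights with variance 1
    z = x @ w                                  # pre-activation of one node
    print(fan_in, z.var())                     # variance is roughly fan_in
```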
We want to keep the variance of all neurons roughly the same at initialization.
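One standard way to do this, sketched below (NumPy, my own illustration), is to scale each weight's standard deviation by 1/sqrt(fan_in) (LeCun/Xavier-style; He initialization adds a factor of sqrt(2) to compensate for ReLU), which keeps the pre-activation variance near 1 from layer to layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10_000, 512))  # layer input with variance ~1
for layer in range(5):
    fan_in = x.shape[1]
    # Scale the weights so that Var(z) = fan_in * Var(w) stays near 1.
    W = rng.standard_normal((fan_in, 512)) / np.sqrt(fan_in)
    x = x @ W                           # pre-activations (nonlinearity omitted here)
    print(layer, x.var())               # stays close to 1 at every layer
```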