Parameter Selection via Exponential Moving Average
When training a classifier via gradient descent, we update the current classifier's parameters $\theta$ via

$$\theta_{t+1} = \theta_t + \alpha\,\delta\theta_t,$$

where $\theta_t$ is the current set of parameters and $\delta\theta_t$ is the update step proposed by your favorite optimizer. Often, after $n$ iterations, we simply stop the optimization procedure (where $n$ is chosen using some sort of decision rule) and use $\theta_n$ as our trained classifier's parameters.
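To make the notation concrete, here is a minimal NumPy sketch of this loop; the `grad_loss` function is a hypothetical stand-in for whatever supplies the gradient of your loss:

```python
import numpy as np

def train(theta, grad_loss, alpha=0.01, n=1000):
    """Vanilla gradient descent: theta_{t+1} = theta_t + alpha * delta_theta_t."""
    for t in range(n):
        delta_theta = -grad_loss(theta)   # update step proposed by the optimizer
        theta = theta + alpha * delta_theta
    return theta                          # theta_n, used as the trained parameters
```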
However, we often observe empirically that applying a post-processing step improves the classifier's performance. One such example is Polyak averaging. A closely related, and quite popular, procedure is to take an exponential moving average (EMA) of the optimization trajectory $(\theta_i)_{i=0}^{n}$,

$$\theta_{\text{EMA}} = (1-\lambda)\sum_{i=0}^{n} \lambda^i\,\theta_{n-i},$$

where $\lambda \in [0, 1)$ is the decay rate (or momentum) of the EMA. It's a simple modification to the optimization procedure that often yields better generalization than simply selecting $\theta_n$, and it has also been used quite effectively in semi-supervised learning.
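In practice this sum is never materialized: it unrolls from the running update $\theta_{\text{EMA}} \leftarrow \lambda\,\theta_{\text{EMA}} + (1-\lambda)\,\theta_t$, starting from zero. A minimal NumPy sketch of the equivalence:

```python
import numpy as np

def ema_of_trajectory(thetas, lam=0.998):
    """Running EMA over a trajectory (theta_0, ..., theta_n).

    Unrolling the recursion below yields exactly
    (1 - lam) * sum_i lam**i * theta_{n-i}.
    """
    ema = np.zeros_like(thetas[0])
    for theta in thetas:
        ema = lam * ema + (1.0 - lam) * theta
    return ema
```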
Implementation-wise, the best way to apply EMA to a classifier in TensorFlow is to use the built-in tf.train.ExponentialMovingAverage function. However, the documentation doesn't provide a guide for how to cleanly use tf.train.ExponentialMovingAverage to construct an EMA classifier. Since I've been playing with EMA recently, I thought it would be helpful to write a gentle guide to implementing an EMA classifier in TensorFlow.

Understanding tf.train.ExponentialMovingAverage
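Before touching the classifier, it helps to see the core mechanics in isolation. tf.train.ExponentialMovingAverage keeps a shadow copy of each variable it is applied to. A minimal graph-mode sketch, with a trivial `tf.assign_add` standing in for a real optimizer step:

```python
import tensorflow as tf

w = tf.Variable(0.0, name='w')
step = tf.assign_add(w, 1.0)          # stand-in for an optimizer update

ema = tf.train.ExponentialMovingAverage(decay=0.998)
ema_op = ema.apply([w])               # creates a shadow variable tracking w

with tf.control_dependencies([step]):
    train_op = tf.group(ema_op)       # refresh the shadow after every step

w_ema = ema.average(w)                # the EMA (shadow) value of w
```

Each run of `train_op` advances `w` and then folds the new value into the shadow; `ema.average(w)` is what an EMA classifier should read instead of `w` itself.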
For those who wish to dive straight into the full codebase, you can find it here. For self-containedness, let's start with the code that constructs the classifier.
```python
# conv2d, dense, batch_norm, max_pool, dropout, leaky_relu, and arg_scope
# are helper layers from the linked codebase.
def classifier(x, phase, scope='class', reuse=None, internal_update=False, getter=None):
    with tf.variable_scope(scope, reuse=reuse, custom_getter=getter):
        with arg_scope([leaky_relu], a=0.1), \
             arg_scope([conv2d, dense], activation=leaky_relu, bn=True, phase=phase), \
             arg_scope([batch_norm], internal_update=internal_update):

            x = conv2d(x, 3, 1)
            x = conv2d(x, 3, 1)
            x = conv2d(x, 3, 1)
            x = max_pool(x, 2, 2)
            x = dropout(x, training=phase)
            x = conv2d
```
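Note the `getter` argument threaded through to `custom_getter` above: that hook is what will let us build the EMA classifier. The idea, sketched below under the assumption of an `ema` object as in the earlier snippet (the names here are illustrative, not from the codebase), is a getter that returns a variable's shadow whenever one exists, so calling `classifier` with `reuse=True` and this getter constructs the same graph over the averaged weights:

```python
ema = tf.train.ExponentialMovingAverage(decay=0.998)

def ema_getter(getter, name, *args, **kwargs):
    """Fetch the variable normally, then swap in its EMA shadow if one exists."""
    var = getter(name, *args, **kwargs)
    ema_var = ema.average(var)
    return ema_var if ema_var is not None else var

# Illustrative usage: same architecture, same variable names, EMA weights.
# ema_logits = classifier(x, phase, reuse=True, getter=ema_getter)
```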