MXNet is the foundation and Gluon is the high-level wrapper on top of it, much like TensorFlow and Keras. Thanks to the dynamic graph mechanism, however, the interaction between the two is far more convenient than between TensorFlow and Keras. The basic operations are very similar to PyTorch's, with quite a few added conveniences, so anyone with a PyTorch background will find it easy to get started.
Library imports:
from mxnet import ndarray as nd
from mxnet import autograd
from mxnet import gluon
import mxnet as mx
MXNet
mxnet.ndarray is the foundation of the whole scientific computing system; its API is broadly consistent with NumPy's ndarray, much as in PyTorch. But unlike PyTorch, which has several built-in data types such as Variable and Tensor, MXNet keeps just one, NDArray, and differentiation is done directly through mxnet.autograd, which is very convenient.
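A few examples of the NumPy-flavored API (a quick sketch; the values are arbitrary):

x = nd.ones((2, 3))                  # like np.ones
y = nd.random.normal(0, 1, shape=(2, 3))
print(x + y)                         # elementwise addition
print(nd.dot(x, y.T))                # matrix product, like np.dot
print(y.exp(), y.sum(axis=1))        # elementwise exp, row sums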
Automatic differentiation
x = nd.arange(4).reshape((4, 1))
# mark the variable that needs a gradient
x.attach_grad()
# computations to be differentiated must be recorded
with autograd.record():
    y = 2 * nd.dot(x.T, x)
# backpropagate from the output
y.backward()
# read the gradient
print('x.grad:', x.grad)
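As a sanity check: y = 2xᵀx, so the analytical gradient is 4x, and x.grad should come out as [0, 4, 8, 12]ᵀ:

assert (x.grad == 4 * x).asnumpy().all()   # gradient of 2 * xᵀx is 4x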
Converting an NDArray to a Python number
x.asscalar()
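A quick usage sketch (the value is arbitrary); note that asscalar() only works on NDArrays with exactly one element:

a = nd.array([3.5])
s = a.asscalar()
print(s, type(s))   # 3.5 <class 'numpy.float32'>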
Converting between NDArray and NumPy arrays
y = nd.array(x)    # NumPy array to NDArray
z = y.asnumpy()    # NDArray to NumPy array
Memory-saving addition
nd.elemwise_add(x, y, out=z)
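The out argument writes the result into z's existing memory instead of allocating a new array, which can be checked via id() (a small sketch with illustrative shapes):

x = nd.ones((3,))
y = nd.arange(3)
z = nd.zeros_like(y)
before = id(z)
nd.elemwise_add(x, y, out=z)   # result is written in place into z
assert id(z) == before         # no new NDArray was allocated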
Layer Implementation
ReLU activation
def relu(x):
    return nd.maximum(x, 0)
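For example:

x = nd.array([[-1.0, 2.0], [0.5, -3.0]])
print(relu(x))   # negative entries clamp to 0: [[0. 2.] [0.5 0.]]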
Fully connected layer
# create the variables
w = nd.random.normal(scale=1, shape=(num_inputs, 1))
b = nd.zeros(shape=(1,))
params = [w, b]
# attach gradient buffers to the variables
for param in params:
    param.attach_grad()
# the fully connected layer itself
def net(X, w, b):
    return nd.dot(X, w) + b
SGD implementation
def sgd(params, lr, batch_size):
    for param in params:
        param[:] = param - lr * param.grad / batch_size
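Putting the pieces above together, one manual training step could look like this (a sketch; X, y, and the hyperparameters are illustrative):

lr, batch_size = 0.03, 10                              # illustrative hyperparameters
X = nd.random.normal(shape=(batch_size, num_inputs))   # a dummy mini-batch
y = nd.random.normal(shape=(batch_size, 1))
with autograd.record():
    loss = (net(X, w, b) - y) ** 2 / 2                 # squared loss per example
loss.backward()                                        # implicitly sums over the batch
sgd(params, lr, batch_size)                            # batch-averaged gradient update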
Loading an in-memory dataset with Gluon
import mxnet as mx
from mxnet import autograd, nd
import numpy as np

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = nd.random.normal(scale=1, shape=(num_examples, num_inputs))
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += nd.random.normal(scale=0.01, shape=labels.shape)

from mxnet.gluon import data as gdata
batch_size = 10
dataset = gdata.ArrayDataset(features, labels)
data_iter = gdata.DataLoader(dataset, batch_size, shuffle=True)
for X, y in data_iter:
    print(X, y)
    break
[[-1.74047375  0.26071024]
 [ 0.65584248 -0.50490594]
 [-0.97745866 -0.01658815]
 [-0.55589193  0.30666101]
 [-0.61393601 -2.62473822]
 [ 0.82654613 -0.00791582]
 [ 0.29560572 -1.21692061]
 [-0.35985938 -1.37184834]
 [-1.69631028 -1.74014604]
 [ 1.31199837 -1.96280086]]
<NDArray 10x2 @cpu(0)>
[-0.14842382  7.22247267  2.30917668  2.0601418  11.89551163  5.87866735
  8.94194221  8.15139961  6.72600317 13.50252151]
<NDArray 10 @cpu(0)>
Model definition
- Create a Sequential model
- Add layers
- Initialize the model parameters
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(1))
# initialize model parameters from a normal distribution
net.collect_params().initialize(mx.init.Normal(sigma=1))
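Note that Dense(1) does not specify an input dimension: Gluon defers the actual parameter allocation until the first forward pass, when the input shape becomes known. A quick check (assuming 2-dimensional inputs, as in the dataset above):

net(nd.random.normal(shape=(10, 2)))   # first forward pass triggers shape inference
print(net[0].weight.data().shape)      # (1, 2)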
Optimizer
The wd parameter adds L2 regularization (weight decay) to the model; the update rule becomes: w = w - lr * (grad + wd * w).
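For intuition, a minimal sketch of the equivalent manual update (plain SGD without momentum; sgd_with_wd is an illustrative name, not a library function):

def sgd_with_wd(params, lr, wd, batch_size):
    for param in params:
        grad = param.grad / batch_size
        param[:] = param - lr * (grad + wd * param)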
trainer = gluon.Trainer(net.collect_params(), 'sgd', {
    'learning_rate': learning_rate, 'wd': weight_decay})
trainer.step(batch_size) must be called after each backward pass so that the parameters actually get updated. A simulated training run looks like this:
for e in range(epochs):
    for data, label in data_iter_train:
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
    train_loss.append(test(net, X_train, y_train))
    test_loss.append(test(net, X_test, y_test))
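The loop above assumes a squared-loss helper and an evaluation helper roughly like the following (illustrative sketches, not the post's actual definitions):

def square_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def test(net, X, y):
    return square_loss(net(X), y).mean().asscalar()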
Layer class API
Flatten
nn.Flatten()
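For example, merging everything except the batch axis (shapes are illustrative):

layer = gluon.nn.Flatten()
x = nd.random.normal(shape=(10, 3, 28, 28))   # e.g. a batch of images
print(layer(x).shape)                          # (10, 2352)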
Fully connected layer
gluon.nn.Dense(units, activation="relu")
The units parameter gives the number of output nodes.
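For example (the sizes are illustrative):

layer = gluon.nn.Dense(256, activation="relu")   # 256 output nodes with ReLU
layer.initialize()
x = nd.random.normal(shape=(10, 20))
print(layer(x).shape)   # (10, 256); the input dimension (20) is inferred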
Loss function class API
Cross Entropy
from mxnet.gluon import loss as gloss
loss = gloss.SoftmaxCrossEntropyLoss()
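A quick usage sketch (shapes and values are illustrative): the loss takes unnormalized scores and class-index labels and returns one loss value per example.

logits = nd.random.normal(shape=(4, 10))   # unnormalized scores for 10 classes
labels = nd.array([0, 3, 9, 1])            # ground-truth class indices
print(loss(logits, labels))                # shape (4,): one loss per example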
"MXNet" First play _ Basic operation and common layer implementation