Torch Getting Started Notes 12: Data Preprocessing

Source: Internet
Author: User

This chapter uses examples to explain the basic operations performed on the training set before formal training. Please start iTorch and run the steps from the beginning.


Import two packages. Although this chapter does not build a network, the data cannot be read unless the nn package is imported. I do not know the specific reason; I found this by searching Google. At first I did not import nn, and as a result the file could not be read.

The argument is the absolute path of the file. I have put the file download here (it costs 2 points to download, please support; if you have no points, send me a private message and I will send it to you):
http://download.csdn.net/detail/u010946556/9518759

The archive contains two data files, which are loaded into trainset and testset respectively.
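The import and load steps can be sketched as follows. This is a minimal sketch, not the author's original code: the text does not show which two packages were imported beyond nn, so the require lines are an assumption, loadIfPresent is a hypothetical helper, and the two paths are placeholders to be replaced with the absolute paths of the unpacked files.

```lua
require 'torch'
require 'nn'      -- the text reports that the data cannot be read without nn
require 'paths'

-- Hypothetical helper: load a serialized Torch file only if it exists.
function loadIfPresent(path)
  if not paths.filep(path) then
    print('file not found: ' .. path)
    return nil
  end
  return torch.load(path)
end

-- Placeholder paths (assumptions); use the absolute paths of the two unpacked files.
trainset = loadIfPresent('/path/to/train-file.t7')
testset  = loadIfPresent('/path/to/test-file.t7')
```

With the real paths filled in, trainset and testset hold whatever tables were serialized in the two files.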

Declare an array, classes, that holds the names of the picture categories in the training set. There are 10 categories in total.

Printing trainset shows that it contains 10,000 pictures with 3 channels and 32x32 pixels each, together with 10,000 corresponding labels.

To make this more intuitive, you can display one of the pictures.
itorch.image() displays a picture, and trainset.data[100] is the 100th picture.
trainset.label[100] is the label of the 100th picture. Note that labels run from 1 to 10; the 100th label is 2, and the 2nd element of the classes array is "car".
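The label lookup described above can be tried without the real files. The sketch below uses a tiny synthetic table, toyset, as a stand-in for trainset (an assumption); only "car" in position 2 of classes comes from the text, and the other nine names are assumed. itorch.image() is left out because it only works inside an iTorch notebook.

```lua
require 'torch'

-- Assumed category names; only 'car' in position 2 comes from the text.
classes = {'airplane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck'}

-- Tiny stand-in for the training set: 5 pictures, 3 channels, 32x32 pixels.
toyset = {
  data  = torch.ByteTensor(5, 3, 32, 32):fill(0),
  label = torch.ByteTensor({2, 1, 10, 4, 2}),
}

-- Labels are 1-based, so a label can index the classes array directly.
firstLabel = toyset.label[1]
print(firstLabel, classes[firstLabel])

-- A single picture is a 3-D tensor: channels x height x width.
print(toyset.data[1]:size())
```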

To use nn.StochasticGradient to train the weights, the dataset must meet two conditions: 1) the dataset must provide a size() method, and 2) the dataset must be indexable, so that dataset[i] returns sample i. To make the dataset satisfy both points, the setmetatable function is used. I will devote a separate chapter to this function when I have time; for now it is enough to know that it adds the index operator to the dataset. Next, convert the ByteTensor into a DoubleTensor. Finally, declare a size() function that returns size(1) of the dataset's own data.
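The three steps in this paragraph (index operator, type conversion, size() method) can be sketched on a small synthetic dataset. toyset and its contents are assumptions standing in for trainset; the calls themselves (setmetatable, :double(), :size(1)) are standard Lua and Torch.

```lua
require 'torch'

-- Synthetic stand-in for trainset: 4 pictures, 3 channels, 32x32 pixels.
toyset = {
  data  = torch.ByteTensor(4, 3, 32, 32):fill(7),
  label = torch.ByteTensor({1, 2, 3, 4}),
}

-- __index is called only for keys missing from the table, so toyset.data and
-- toyset.label keep working, while toyset[i] returns the pair {picture, label}.
setmetatable(toyset, {
  __index = function(t, i)
    return {t.data[i], t.label[i]}
  end
})

-- Convert the pixel data from ByteTensor to DoubleTensor.
toyset.data = toyset.data:double()

-- The number of samples is the size of the first dimension.
function toyset:size()
  return self.data:size(1)
end

print(toyset:size())
print(toyset[3][2])   -- the label of sample 3
```

Because __index fires only for missing keys, defining size() afterwards stores it in the table directly and later calls never reach the metatable.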

Printing the trainset.data tensor shows its structure (think of it as a 4-D array), and the size() function we defined returns size(1), that is, 10000. If this is not clear, here is an example to illustrate.

This is a newly initialized tensor, similar to a two-dimensional array; size(1) is 3 and size(2) is 2.
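A tensor with the shape described here can be created as follows. This is a minimal sketch; the variable name t and the call to zero() (used so the printed values are not uninitialized memory) are assumptions.

```lua
require 'torch'

-- A 2-D tensor with 3 rows and 2 columns, filled with zeros.
t = torch.Tensor(3, 2):zero()

print(t:size())               -- the size of every dimension
print(t:size(1), t:size(2))   -- the size of dimension 1 and of dimension 2
```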

All of this serves one purpose: the data previously accessed as trainset.data[33] (the picture) and trainset.label[33] (the label of that picture) can now be accessed in the single form trainset[33]. itorch.image() displays the corresponding picture.

This is the data for image No. 999, trainset[999]. If you change the call to print(trainset[999][1]), the numeric pixel values of that image are printed instead.

Next, we need a short section to explain index operations on a tensor.

An index operation on a tensor is written "[{ }]", with one "{}" inside for each dimension. In our example, trainset.data is a 4-dimensional tensor whose dimensions are the "number of pictures", the "channel of the picture", the "vertical pixels of the picture" and the "horizontal pixels of the picture". An empty "{}" selects all elements of that dimension. The code in this step selects all pictures, the 1st channel, and all pixels along both the vertical and the horizontal axis.

Printing the size of the result shows that the second dimension changes from 3 to 1.


Similarly, this time select images 100 to 105. The result is 6 pictures, 1 channel (red), 32x32 pixels.
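Both index operations can be reproduced on a synthetic tensor. In this sketch, data is a random stand-in for trainset.data (an assumption) with only 200 pictures; the names redChannel and sixReds are also illustrative.

```lua
require 'torch'

-- Synthetic stand-in for trainset.data: 200 pictures, 3 channels, 32x32 pixels.
data = torch.rand(200, 3, 32, 32)

-- {} keeps every element of a dimension; {1} keeps only index 1 of it.
redChannel = data[{ {}, {1}, {}, {} }]
print(#redChannel)   -- size: 200 x 1 x 32 x 32

-- {100, 105} keeps indices 100 through 105 inclusive, which is 6 pictures.
sixReds = data[{ {100, 105}, {1}, {}, {} }]
print(#sixReds)      -- size: 6 x 1 x 32 x 32
```

Note that a range written as {1} keeps the dimension with size 1 rather than dropping it, which is why the result is still 4-dimensional.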


Create two arrays, one for the means and one for the standard deviations. These are used to center and normalize the data. Generally speaking, centering makes the values of the matrix that represents an image fluctuate around 0 rather than around some other value, while normalization speeds up the algorithm and helps it converge better. The following explains why normalization helps.

Without normalization, convergence may look like this: it takes 9 steps to reach the optimal value.

Normalization turns the "ellipse" of the data into a "circle", so convergence is faster.
For more details, please see Andrew Ng's explanation on Coursera.
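The "ellipse" versus "circle" effect can be illustrated with plain gradient descent on a toy quadratic. This sketch is not from the original article: the function f(x, y) = 0.5 * (a*x^2 + b*y^2), the curvature values 1 and 25, the step size 2 / (a + b) and the helper stepsToConverge are all assumptions chosen only to make the difference visible.

```lua
-- Gradient descent on f(x, y) = 0.5 * (a*x^2 + b*y^2), starting from (1, 1).
-- Returns the number of steps until the point is within tol of the minimum (0, 0).
function stepsToConverge(a, b, lr, tol, maxSteps)
  local x, y, steps = 1, 1, 0
  while math.sqrt(x * x + y * y) > tol and steps < maxSteps do
    x = x - lr * a * x   -- df/dx = a*x
    y = y - lr * b * y   -- df/dy = b*y
    steps = steps + 1
  end
  return steps
end

-- "Circle": both directions have the same curvature.
circleSteps = stepsToConverge(1, 1, 2 / (1 + 1), 1e-6, 10000)

-- "Ellipse": one direction is 25 times steeper than the other.
ellipseSteps = stepsToConverge(1, 25, 2 / (1 + 25), 1e-6, 10000)

print(circleSteps, ellipseSteps)
```

In this toy setting the circular case needs far fewer steps than the elongated one, which matches the intuition given in the text.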

This is the centering and normalization operation, done with a for loop.
The loop variable i runs from 1 to 3. mean[i] stores the mean of channel i in the array just created; note that mean() is the method that computes the mean, so do not confuse it with the mean array. print() prints the computed mean of channel i, mean[i]. Calling add() with the negative of mean[i] completes the centering. Similarly, the second part computes the standard deviation of channel i, prints it as stdv[i], and calls div() to divide by stdv[i], which completes the normalization. Dividing by the standard deviation is of course not the only normalization method; other algorithms can be found with a Google search.
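The loop described above can be sketched on synthetic data as follows. Here data is a random stand-in for trainset.data after conversion to double (an assumption), and the local variable channel is introduced only for readability; mean, stdv, add() and div() follow the names in the text.

```lua
require 'torch'

torch.manualSeed(1)
-- Synthetic stand-in for trainset.data: 50 pictures, 3 channels, 32x32 pixels.
data = torch.rand(50, 3, 32, 32):mul(255)

mean = {}   -- per-channel means
stdv = {}   -- per-channel standard deviations
for i = 1, 3 do
  -- Indexing returns a view, so in-place operations on it change data itself.
  local channel = data[{ {}, {i}, {}, {} }]

  mean[i] = channel:mean()
  print('Channel ' .. i .. ', Mean: ' .. mean[i])
  channel:add(-mean[i])     -- centering: subtract the channel mean

  stdv[i] = channel:std()
  print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
  channel:div(stdv[i])      -- normalization: divide by the standard deviation
end
```

After the loop, each channel of data has a mean close to 0 and a standard deviation close to 1.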

At this point we have covered building the network model, the basic training operations, and data preprocessing. The next chapter is a summary: a complete implementation of our example, with the code attached.
