PyTorch (1) -- Data processing


Series directory:
(1) Data processing
(2) Building and customizing the network
(3) Testing your own images with a trained model
(4) Processing video data
(5) Modifying PyTorch source code to add a ConvLSTM layer
(6) Understanding gradient backpropagation
(Misc) Fascinating bugs encountered in PyTorch; PyTorch learning and usage (1)

Installing PyTorch is much easier than installing Caffe; it usually succeeds on the first try. I won't go into the installation details, since the official PyTorch site explains them thoroughly, and there is also an official Chinese translation of the documentation.
Using PyTorch is also fairly straightforward; for a tutorial, see Deep Learning with PyTorch: A 60 Minute Blitz, which is easy to follow.

To really learn a framework it is not enough to run its bundled test experiments, so I plan to reimplement Caffe's Siamese model in PyTorch to consolidate my proficiency with it.

Data preprocessing

First comes the data-processing part. PyTorch handles data through torchvision, but torchvision only implements processing for a handful of standard datasets; to handle your own project's data you have to modify and extend it.

Turning raw data into model input takes three steps: transforms.Compose(), torchvision.datasets, and torch.utils.data.DataLoader(). They can be understood as defining the processing format, processing the data, and loading the data, respectively, as in the sketch below.
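
Here is a minimal sketch of the three steps using MNIST; the root path './data' and the batch size are my own choices for illustration, not fixed by anything above.

import torch
import torchvision
import torchvision.transforms as transforms

# Step 1: define the processing format.
transform = transforms.Compose([
    transforms.ToTensor(),                 # PIL image -> FloatTensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,)),  # scale to roughly [-1, 1]
])

# Step 2: load and process the dataset with that format.
train_set = torchvision.datasets.MNIST(root='./data', train=True,
                                       transform=transform, download=True)

# Step 3: wrap it in a DataLoader for batched, shuffled iteration.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

for images, labels in train_loader:
    print(images.size(), labels.size())  # torch.Size([32, 1, 28, 28]) torch.Size([32])
    break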

The docstring of Compose() says "Composes several transforms together", i.e., it chains several image-processing operations into one. For example, to center-crop first and then convert to a tensor (PyTorch's data structure):

transforms.Compose([transforms.CenterCrop(10), transforms.ToTensor()])

To convert to a tensor first and then normalize:

transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

The exact parameters of each transform can be found in the source code, so I won't go into them here. Also note the code of Compose's __call__:

def __call__(self, img):
    for t in self.transforms:
        img = t(img)
    return img

So the transforms given to Compose are applied to the input sequentially: first the first one, then the second, and so on. If you need to process your own data, you can implement the specific operations as a class of this form and add it to the pipeline, as sketched below.
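
For instance, here is a minimal sketch of a custom transform that can be dropped into Compose; the AddNoise class and its std parameter are hypothetical, not part of torchvision.

import torch
import torchvision.transforms as transforms

class AddNoise(object):
    # Hypothetical custom transform: adds Gaussian noise to a tensor image.
    def __init__(self, std=0.1):
        self.std = std

    def __call__(self, img):
        # Assumes img is already a tensor, i.e. this is placed after ToTensor().
        return img + torch.randn(img.size()) * self.std

# Compose applies the transforms in order: ToTensor() first, then AddNoise().
transform = transforms.Compose([
    transforms.ToTensor(),
    AddNoise(std=0.05),
])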

torchvision.datasets implements the per-dataset processing and is mainly responsible for loading and preprocessing the data. For example, mnist.py and cifar.py handle the MNIST and CIFAR datasets. Such a class must inherit from the parent class torch.utils.data.Dataset, and its two main methods are:

__init__(self, root, train=True, transform=None, target_transform=None, download=False): initializes the class and loads the data (sometimes you need flags to avoid processing the same data twice). For the built-in datasets the data is loaded into memory here, reading both data and labels (split into training and test sets).

__getitem__(self, index): hands a processed sample to PyTorch (iterator style). Note that the transforms.Compose defined above is called inside this method: index determines which sample to access, the format conversion is applied, and the processed data is returned. In other words, defining the class only defines the data; an individual sample is only processed when it is actually needed. A sketch of a custom dataset follows.
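
Putting the two methods together, a minimal custom dataset might look like this; the MyData name and the list-of-(path, label) layout are assumptions for illustration.

import torch.utils.data as data
from PIL import Image

class MyData(data.Dataset):
    # Minimal custom dataset over a list of (image_path, label) pairs.
    def __init__(self, samples, transform=None):
        # Load or index the data once here; `samples` is assumed to be
        # something like [('img0.png', 0), ('img1.png', 1), ...].
        self.samples = samples
        self.transform = transform

    def __getitem__(self, index):
        # Called lazily: only the sample at `index` is read and processed.
        path, label = self.samples[index]
        img = Image.open(path).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)  # the transforms.Compose pipeline runs here
        return img, label

    def __len__(self):
        return len(self.samples)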

At this point, loading and processing the data for PyTorch is complete; if you need to handle your own data, you do it by modifying and extending this part. Next, the training data has to be organized, e.g., batching, shuffling, and so on.

torch.utils.data.DataLoader(): the data loader. Its docstring says it "combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset", i.e., it assembles the data into batches and provides iterable access. Its input parameters are:

dataset (Dataset): the loaded data, i.e., an instance of a class like the ones above (a torchvision.datasets class or your own MyData()), so it must inherit from torch.utils.data.Dataset to satisfy this interface.

batch_size, shuffle, sampler, num_workers, collate_fn, pin_memory, drop_last: these parameters are largely self-explanatory; the names say what they do.

batch_size: the number of samples per batch; default 1.
shuffle: whether to shuffle the data; default False.
sampler: defines the strategy for drawing samples; if a sampler is given, shuffle cannot be used.
num_workers: the number of subprocesses used to load the batches (useful for large datasets).
collate_fn: merges the samples of a batch into a tensor (you typically keep the default, default_collate(batch)).
pin_memory: if True, tensors are copied into pinned (page-locked) memory before being returned, which speeds up transfer to the GPU.
drop_last: controls the last batch. Since the dataset size may not be divisible by the batch size, setting it to True discards the incomplete last batch, while False keeps it (so the last batch may be smaller).

A usage sketch follows this list.
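
Here is a small sketch of how a few of these parameters interact, using TensorDataset with made-up data:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake 3-dimensional samples with integer labels.
dataset = TensorDataset(torch.randn(10, 3), torch.arange(10))

# batch_size=4 over 10 samples gives batches of 4, 4, 2;
# drop_last=True discards the incomplete batch of 2.
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    num_workers=0, drop_last=True)

for x, y in loader:
    print(x.size(), y.size())  # torch.Size([4, 3]) torch.Size([4]), printed twice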

The iterator (DataLoaderIter) then handles each of these parameter settings accordingly.

Supplement (2017/8/10):

The torch.utils.data.DataLoader class is implemented mainly on top of torch.utils.data.sampler. Sampler is the base class of all samplers and provides an iteration interface (__iter__) and a length interface (__len__); a sampler also works on indices, which is how operations such as shuffling the data are realized. Therefore, if DataLoader does not fit your data, you can redesign how batches are drawn yourself by making full use of the provided Sampler.
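
For illustration, here is a minimal custom sampler implementing this interface; the ReverseSampler name and the reverse-order strategy are arbitrary choices for the sketch.

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import Sampler

class ReverseSampler(Sampler):
    # Minimal custom sampler: yields dataset indices in reverse order.
    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        # The indices produced here decide which samples each batch draws.
        return iter(range(len(self.data_source) - 1, -1, -1))

    def __len__(self):
        return len(self.data_source)

dataset = TensorDataset(torch.randn(6, 2), torch.arange(6))
loader = DataLoader(dataset, batch_size=2, sampler=ReverseSampler(dataset))
for x, y in loader:
    print(y)  # labels come out as 5,4 then 3,2 then 1,0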
