Train neural networks using GPUs and Caffe

Abstract: This article shows how to use the GPU and Caffe to train a multilayer feedforward network model on the data of Kaggle's "Otto Group Product Classification Challenge", how to apply the trained model to new data, and how to visualize the network graph and the trained weights.

Caffe is an open-source deep learning framework initiated by Yangqing Jia that lets you train neural networks on your GPU. In contrast to other deep learning frameworks such as Theano or Torch, Caffe does not require you to program the algorithms yourself; instead, you specify the network through configuration files. This is obviously faster than writing all the code yourself, but it also ties you to the possibilities of the framework. In most cases, however, this is not a big problem, because the framework Caffe provides is quite powerful and continuously improving.

The subject of this article is a multilayer feedforward network, trained on the data of Kaggle's "Otto Group Product Classification Challenge". We will also look at how to apply the model to new data and how to visualize the network graph and the weights obtained from training. For reasons of space, this article will not explain every detail; besides, simple code often says more than a thousand words. Rather than the programmatic details, which are covered in the accompanying IPython notebook, this article focuses on the concepts and on some of the stumbling blocks I ran into.

Setup

If you don't have Caffe installed on your system yet, I recommend working on an EC2 instance with GPU support, such as a g2.2xlarge instance. For instructions on how to work with EC2, see the "Guide to EC2 from the Command Line"; to set up Caffe and its prerequisites, refer to "GPU Powered Deep Learning with NVIDIA DIGITS on EC2". To work with Caffe, I also recommend installing IPython Notebook on your instance; tutorials for that are easy to find.

Defining the model and meta-parameters

Training a model and applying it requires at least three configuration files. The format of those configuration files follows an interface definition language called protocol buffers. It looks superficially similar to JSON but is significantly different; it is actually meant to replace JSON where data documents need to be validatable (by means of a custom schema, as Caffe does with its caffe.proto) and serializable.

For training, you need one prototxt file holding the meta-parameters of the training (config.prototxt) and one defining the network graph (model_train_test.prototxt), which connects the layers in an acyclic, directed fashion. Note that the data flows from bottom to top with respect to the order in which the layers are specified. The example network here has five layers:

    1. Data layer (one for training, one for testing)
    2. Inner product layer (weights I)
    3. ReLUs (hidden layer)
    4. Inner product layer (weights II)
    5. Output layer (softmax for classification):
       A. a softmax layer, which produces the loss
       B. an accuracy layer, which lets us see how the network improves while training.

The following extract from model_train_test.prototxt shows layers (4) and (5A):

    [...]
    layer {
      name: "ip2"
      type: "InnerProduct"
      bottom: "ip1"
      top: "ip2"
      inner_product_param {
        num_output: 9
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
          value: 0
        }
      }
    }
    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip2"
      bottom: "label"
      top: "accuracy"
      include {
        phase: TEST
      }
    }
    [...]
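
The meta-parameters in config.prototxt use the same protocol buffer text format. The sketch below shows what a minimal solver configuration might look like, written out from Python as one might do in a notebook. All concrete values (learning rate, iteration counts, the "otto" snapshot prefix) are illustrative assumptions, not the settings used for this model:

    # Write a minimal solver configuration to config.prototxt.
    # All values below are illustrative assumptions.
    solver_config = """
    net: "model_train_test.prototxt"
    test_iter: 100          # test batches evaluated per test phase
    test_interval: 500      # run the test phase every 500 training iterations
    base_lr: 0.01           # initial learning rate
    momentum: 0.9
    weight_decay: 0.0005
    lr_policy: "step"       # multiply the learning rate by gamma every stepsize iterations
    gamma: 0.1
    stepsize: 5000
    max_iter: 20000
    snapshot: 5000          # write a model snapshot every 5000 iterations
    snapshot_prefix: "otto"
    solver_mode: GPU        # train on the GPU
    """
    with open("config.prototxt", "w") as f:
        f.write(solver_config)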

The third prototxt file (model_prod.prototxt) specifies the network to be used in production. It is largely consistent with the training specification, but it lacks the data layers (because in production we do not read the data from such a data source) and the softmax layer does not output a loss value but class probabilities. Also, the accuracy layer is gone now. Note as well that we now have to specify the input dimensions at the beginning (as expected: 1, 93, 1, 1). This is definitely confusing: all four dimensions are simply called input_dim, only their order defines which is which, and no explicit meaning is attached to each.
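
Once the weights are trained, applying the production network from Python might look like the following sketch. The file names follow this article, but the .caffemodel snapshot name and the "prob" output blob are assumptions that depend on your solver settings and on how the output layer is named in model_prod.prototxt:

    import numpy as np
    import caffe

    caffe.set_mode_gpu()

    # Load the production network together with the trained weights. The
    # .caffemodel file name depends on your snapshot_prefix and max_iter.
    net = caffe.Net("model_prod.prototxt", "otto_iter_20000.caffemodel", caffe.TEST)

    # One case: a feature vector of length 93, reshaped to the declared
    # input_dim order (num, channels, height, width) = (1, 93, 1, 1).
    x = np.random.rand(93).astype(np.float32)
    net.blobs["data"].data[...] = x.reshape(1, 93, 1, 1)

    out = net.forward()
    probabilities = out["prob"].flatten()  # softmax output: one probability per class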

Supported data sources

This was one of the first mental hurdles I had to overcome when trying to use Caffe. It is not as simple as handing the Caffe executable some CSV file. In fact, for data that is not images, you have three options:

    • LMDB (Lightning Memory-Mapped Database)
    • LevelDB
    • HDF5 format

HDF5 is probably the easiest to use, because you just store your dataset in a file in HDF5 format. LMDB and LevelDB are databases, so you have to follow their protocols. With HDF5, however, the size of the dataset is limited by memory, which is why I discarded it. The choice between LMDB and LevelDB is fairly arbitrary; from the resources I skimmed, LMDB seems cleaner, faster and more mature. Judging from GitHub, though, LevelDB seems to be maintained more actively and has a considerably larger footprint on Google and StackOverflow.
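
For comparison, this is roughly what the HDF5 route would look like. Caffe's HDF5Data layer reads the file paths from a plain-text source file and expects datasets named after the layer's top blobs (here assumed to be "data" and "label"); the shapes, file names and dummy data in this sketch are illustrative:

    import h5py
    import numpy as np

    # Illustrative dummy data: 1000 cases with 93 features and 9 classes.
    X = np.random.rand(1000, 93).astype(np.float32)
    y = np.random.randint(0, 9, size=1000).astype(np.float32)

    with h5py.File("train.h5", "w") as f:
        f.create_dataset("data", data=X)    # dataset names must match the top blobs
        f.create_dataset("label", data=y)

    # The HDF5Data layer's "source" parameter points to a text file that
    # lists one HDF5 file per line.
    with open("train_h5_list.txt", "w") as f:
        f.write("train.h5\n")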

BLOBs and Datums

Internally, Caffe uses a data structure called blobs to pass data forward and gradients backward. It is a four-dimensional array whose dimensions are called:

    1. num or batch_size
    2. channels
    3. height
    4. width

This is relevant for us because we have to shape our cases accordingly before storing them in LMDB, from where Caffe reads them. For images the shaping is intuitive: a batch of 64 RGB images of 100x200 pixels would end up as an array of shape (64, 3, 200, 100). For a batch of 64 feature vectors of length 93 each, the shape of the blob is (64, 93, 1, 1).
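
In NumPy terms, the two examples correspond to these shapes:

    import numpy as np

    # A batch of 64 RGB images, 100 pixels wide and 200 pixels high:
    # dimensions are (num, channels, height, width).
    image_batch = np.zeros((64, 3, 200, 100), dtype=np.float32)

    # A batch of 64 feature vectors of length 93: the features occupy the
    # channel dimension, height and width collapse to 1.
    feature_batch = np.zeros((64, 93, 1, 1), dtype=np.float32)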

When loading data into LMDB, the individual cases or feature vectors are stored in Datum objects. Integer data is stored in the data field (as a byte string), while floating-point data goes into the float_data field. Initially I made the mistake of assigning floating-point data to the data field, which caused the model to learn nothing. Before storing a Datum in LMDB, you have to serialize the object into a byte-string representation.
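
Putting this together, loading float feature vectors into LMDB might look like the following sketch; the database name, key format and dummy data are my own illustrative choices:

    import lmdb
    import numpy as np
    from caffe.proto import caffe_pb2

    X = np.random.rand(1000, 93)            # illustrative feature matrix
    y = np.random.randint(0, 9, size=1000)  # illustrative labels (9 classes)

    env = lmdb.open("otto_train_lmdb", map_size=10 ** 9)
    with env.begin(write=True) as txn:
        for i in range(X.shape[0]):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = 93, 1, 1
            # Floats belong in float_data; putting them into the byte-string
            # data field is exactly the mistake described above.
            datum.float_data.extend(X[i].tolist())
            datum.label = int(y[i])
            # Serialize the Datum to a byte string before storing it.
            txn.put("{:08d}".format(i).encode("ascii"), datum.SerializeToString())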

Summary

For me, mastering Caffe was a surprisingly non-linear experience. By this I mean that there was no single entry point and no continuous learning path leading to a thorough understanding of the system. The information that makes Caffe work for you is spread across many different tutorials, the source code on GitHub, IPython notebooks and forum threads. That is why I took the time to write this tutorial and the accompanying code: it sums up the knowledge I acquired, and it is what I would have liked to read from the beginning.

I think Caffe has a bright future ahead; it is not just growing horizontally through new features, but is also being refactored vertically, improving the experience for all users. It is definitely a great tool for high-performance deep learning. If you want to do image processing with convolutional neural networks, I suggest you take a look at NVIDIA DIGITS, which provides a comfortable GUI for that goal.

Original article: "Neural Nets with Caffe Utilizing the GPU" (translated by Wang Wei, edited by Zhou Jianding)
