Caffe︱Building LMDB datasets, with a detailed guide to setting each file path

Last Update: 2017-01-04 Source: Internet

Author: User



A brief description of the LMDB generation process

1. Organize the images and unify their size. Place the images in separate folders. All images must have the same size; otherwise computing the mean file will fail with an error.
2. Write the image list into TXT files: train.txt for training and val.txt for testing (validation). The training data is organized by class.
3. Build the LMDB dataset.
4. Compute the mean file of the training set.

On organizing and standardizing the data: data augmentation is usually done with OpenCV. I have not explored this part yet, so I will not cover it here.

I. Generating the image list

There are many ways to turn the images into a list: Python, R, or plain Linux commands all work.
If you need to rename the images according to their names first, see the blog post Caffe Learning Series (11): Convert image data to db (Leveldb/lmdb) file.
That post shows how to generate lists with Linux commands by matching keywords in the file names.

What questions come up when generating a list?

1. How should the path names in the TXT list be set?

When the list is generated, what path prefix should go in front of each file name? Lists found on the Web use many different prefixes, for example:

No prefix at all (ref: http://www.mamicode.com/info-detail-1338521.html)

A long, cluttered prefix (ref.: http://www.voidcn.com/blog/garfielder007/article/p-5005545.html)

As a complete beginner, I was instantly lost. After experimenting on my own, I found that the path names in the list should be kept as short as possible, because the rest of the path can be set later when editing create_imagenet.sh.

So, in general, the most convenient approach is:

Give the training set paths that include the class folder, and write only the image names for the validation set.

For example, for a two-class (0/1) problem, the path names look like this.
Training set:

0/pic1.jpg 0
1/pic2.jpg 1

Test set:

pic3.jpg 0
pic4.jpg 1

This is simple and convenient, and it works. When you edit create_imagenet.sh later, you will see how these entries connect to the paths set there.

Note: train.txt should preferably begin with an image whose label is 0.
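Any tool works for producing the list. As one possibility, here is a minimal shell sketch, assuming the training images sit in numbered class folders (0/, 1/, ...) directly under one root, end in .jpg, and have no spaces in their names. The function name make_train_list is hypothetical. It writes lines such as "0/pic1.jpg 0" and sorts numerically by label, so label 0 comes first.

```bash
# Hypothetical helper: write a Caffe list file with one "relative/path label"
# line per image. Assumes class folders named 0/, 1/, ... directly under the
# root, .jpg files, and no spaces in file names.
make_train_list() {
  local root="$1" out="$2"
  # find lists ./0/pic1.jpg etc.; sed drops the leading "./";
  # sort orders numerically by label folder so label 0 comes first;
  # awk appends the folder name as the label, giving "0/pic1.jpg 0".
  (
    cd "$root" || exit 1
    find . -mindepth 2 -maxdepth 2 -type f -name '*.jpg' |
      sed 's|^\./||' |
      sort -t/ -k1,1n -k2 |
      awk -F/ '{ print $0, $1 }'
  ) > "$out"
}

# Example: make_train_list /caffe/examples/lmdb_test/train/train train.txt
```

Because the entries stay relative to the class folder, the absolute prefix is supplied later by TRAIN_DATA_ROOT in create_imagenet.sh, which keeps the list short as recommended above.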

2. Do the image names need to follow a unified format?
There is no strict rule for image names, but following a consistent naming scheme makes later searches easier.

II. Using create_imagenet.sh to generate LMDB files

Caffe requires a fixed data format, so you need to generate LMDB files with create_imagenet.sh.
The create_imagenet.sh file is usually in /caffe/examples/imagenet, as part of the ImageNet example; reading it is also a good way to learn what to modify.

1. Modifying the file

Three places need to be changed after opening the file. Explaining the changes should be simple, yet one look online left me dumbfounded: there are so many different versions that they gave me a hard time.

File scenario: I am working in the /caffe/examples/lmdb_test/train folder. In it I put the train folder (training images, classes 0 and 1) and the val folder (validation images), then copied create_imagenet.sh into the same folder.
Here is a "pain point": I put the train image folder inside a folder that is also called train.
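To make the nested train/train layout concrete, here is a small sketch that recreates it under a base folder. The helper name make_layout is hypothetical, and the base argument stands in for /caffe/examples/lmdb_test/train.

```bash
# Hypothetical helper that recreates the folder layout described above.
# $1 stands in for /caffe/examples/lmdb_test/train.
make_layout() {
  local base="$1"
  mkdir -p "$base/train/0" "$base/train/1"   # training images, one folder per class
  mkdir -p "$base/val"                       # validation images, no class folders
  # create_imagenet.sh is copied into "$base" itself (not shown here).
  (cd "$base" && find . -type d | sort)      # print the resulting tree
}
```

Running it shows ./train/0 and ./train/1 sitting inside the outer train folder next to ./val, which is exactly the doubled "train" that the paths below have to account for.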

Modification 1: data and tool paths

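EXAMPLE=/caffe/examples/lmdb_test/train  # EXAMPLE is used in steps three and four below, take note
DATA=/caffe/examples/lmdb_test/train     # where is the data? the folder holding the image set (train)
TOOLS=/caffe/build/tools                 # where are the tools? usually in this directory, copy as-is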

Take care with where the data is stored: it must line up with the LMDB paths generated in part three, so check it carefully!

Modification 2: storage paths of the training and validation sets
(related to section I, generating the image list)

TRAIN_DATA_ROOT=/caffe/examples/lmdb_test/train/train/  # training set path
VAL_DATA_ROOT=/caffe/examples/lmdb_test/train/val/      # validation set path

These are the storage paths of the training and validation data, and each path must end with "/" (as in val/).
This part corresponds to the image list. For example, my training set entries resolve to:

/caffe/examples/lmdb_test/train/train/0/pic1.jpg 0
/caffe/examples/lmdb_test/train/train/1/pic2.jpg 1

Validation set:

/caffe/examples/lmdb_test/train/val/pic3.jpg 0
/caffe/examples/lmdb_test/train/val/pic4.jpg 1

As you can see, this part lines up exactly with the text list. So keep the paths in the TXT list simple. The many versions online come with no explanation, which is thoroughly confusing for beginners!
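convert_imageset builds each image path by plain string concatenation of the root folder and the list entry, which is why the trailing "/" matters. A minimal sketch of a pre-flight check along those lines follows; the function name check_list is hypothetical, and it assumes one "path label" pair per line with no spaces in paths.

```bash
# Hypothetical checker: report list entries whose file does not exist once the
# root folder is prepended, mimicking convert_imageset's "root + entry" join.
# Prints the number of missing files; details go to stderr.
check_list() {
  local root="$1" list="$2" missing=0 rel label
  case "$root" in
    */) ;;
    *) echo "warning: root '$root' does not end with /" >&2 ;;
  esac
  # label is read only to split it off the path; "|| [ -n ...]" keeps a
  # final line that has no trailing newline.
  while read -r rel label || [ -n "$rel" ]; do
    [ -z "$rel" ] && continue               # skip blank lines
    if [ ! -f "${root}${rel}" ]; then       # plain string concatenation
      echo "missing: ${root}${rel}" >&2
      missing=$((missing + 1))
    fi
  done < "$list"
  echo "$missing"
}

# Example:
# check_list /caffe/examples/lmdb_test/train/train/ /caffe/examples/lmdb_test/train/train.txt
```

With the slash dropped, "0/pic1.jpg" turns into ".../train0/pic1.jpg", so every entry is reported missing; that is the same symptom as a badly set TRAIN_DATA_ROOT.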

Modification 3: LMDB output paths

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/train_lmdb

echo "Creating val lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/val.txt \
    $EXAMPLE/val_lmdb

Note the $EXAMPLE and $DATA parts: these paths are the ones set in part one.

$DATA/train.txt resolves to /caffe/examples/lmdb_test/train/train.txt

This can generally stay at its default; as long as $DATA is set correctly there is no problem. Of course, if you renamed train.txt, check it carefully.

$EXAMPLE/train_lmdb resolves to /caffe/examples/lmdb_test/train/train_lmdb

This part only names the LMDB folder to be generated, so the folder name can be changed freely, but the path prefix in front of it cannot!

An additional change, Modification 4:

RESIZE=false
if $RESIZE; then
  RESIZE_HEIGHT=256
  RESIZE_WIDTH=256
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

Note: RESIZE is normally false, which leaves the image sizes unchanged. Set it to true if the images need resizing, which is usually the case when they are not all the same size and have to be unified.

Q: Does setting RESIZE=true require installing OpenCV?
A: No, not as a separate module. The resizing happens inside convert_imageset, which already relies on the OpenCV support Caffe was built with.

2. Running and checking the script

Run it with sh, or change into the folder and run ./create_imagenet.sh directly.
Because the LMDB packs all of the training image data, check that its file size is roughly what you expect. If the file is only a few KB, packaging failed, most likely because a path is set incorrectly.
Path errors are the most common problem. If you have ruled out the paths and need to inspect the data itself, see the blog post Deep Learning (13) Caffe Training data Format (http://www.voidcn.com/blog/garfielder007/article/p-5005545.html).
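The size check can be scripted. With the LMDB backend, convert_imageset writes a folder containing data.mdb and lock.mdb, so a quick sketch is to look at the disk usage of data.mdb. The helper name check_lmdb_size and the 100 KB default threshold are assumptions chosen for illustration; pick a threshold that fits your dataset.

```bash
# Hypothetical sanity check: warn if an LMDB folder's data.mdb is missing or
# smaller than a threshold (default 100 KB, an arbitrary choice).
check_lmdb_size() {
  local dir="$1" min_kb="${2:-100}" kb
  if [ ! -f "$dir/data.mdb" ]; then
    echo "no data.mdb in $dir"
    return 1
  fi
  kb=$(du -k "$dir/data.mdb" | cut -f1)   # disk usage in KB
  if [ "$kb" -lt "$min_kb" ]; then
    echo "suspiciously small: ${kb} KB (check the paths)"
    return 1
  fi
  echo "ok: ${kb} KB"
}

# Example: check_lmdb_size /caffe/examples/lmdb_test/train/train_lmdb
```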

[Figure: console output of a completed run]

3. Errors

Error 1: 06:07:33.180974  3151 io.cpp:80] Could not open or find file /caffe/examples/lmdb_test/train/train/0/1376_faceimage49773.jpg

Some entries in the generated image list do not match an actual file. This does not stop the overall run; convert_imageset simply skips images it cannot read.

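Error 2: Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir /caffe/examples/train/train_lmdb failed

A path parameter is wrong; go back and check it. (Compared with the setup above, the path in this message is missing lmdb_test/.)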

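Error 3: Check failed: mkdir(source.c_str(), 0744) == 0 (-1 vs. 0) mkdir examples/.../train_lmdb failed

This happened because my folder already contained a folder with that name, so the LMDB folder could not be created again. Delete the existing folder and rerun the script.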
Reference blog: (original) Caffe to generate data in LMDB format from images
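Both mkdir failures come from the same place: convert_imageset creates the output LMDB folder with a single, non-recursive mkdir, which fails when the folder already exists or when its parent folder does not. A small pre-run sketch that covers both cases follows; the helper name prepare_lmdb_target is hypothetical, and it deletes an existing output folder, so only point it at LMDB folders you intend to regenerate.

```bash
# Hypothetical pre-run step: convert_imageset creates the LMDB folder with a
# plain mkdir, which fails if the folder already exists or its parent is missing.
prepare_lmdb_target() {
  local target="$1" parent
  parent=$(dirname "$target")
  if [ ! -d "$parent" ]; then
    echo "parent folder does not exist: $parent" >&2   # mkdir would fail: no parent
    return 1
  fi
  if [ -e "$target" ]; then
    echo "removing existing $target" >&2               # mkdir would fail: already exists
    rm -rf -- "$target"
  fi
}

# Example, before running create_imagenet.sh:
# prepare_lmdb_target "$EXAMPLE/train_lmdb" && prepare_lmdb_target "$EXAMPLE/val_lmdb"
```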

4. HDF5 (h5py) format data

LMDB stores one label per image. For tasks such as face recognition where each image needs several values (for example, four), you need data in HDF5 format, usually written with h5py.
When Caffe reads HDF5 data, preprocessing such as data augmentation and normalization has to be done outside Caffe, which makes it more cumbersome.
I will not go into it here; see the blog post Deep Learning (13) Caffe Training data Format (http://www.voidcn.com/blog/garfielder007/article/p-5005545.html).

III. Using make_imagenet_mean.sh to generate the image mean

Why subtract the mean from the images?

In short: it makes the images more stable (less variation between them), which can improve classification accuracy.
Subtracting the mean generally lowers the brightness, but brightness matters little for image classification.
Full standardization (also dividing by the standard deviation) is a common choice for other data, but computing the variance adds little for images, because all pixels already share the same 0 to 255 range; so the simpler mean subtraction is used.
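Written out with the standard definitions (a sketch, not taken from the original post): the mean file holds the average of every element, indexed by channel c, row h, and column w, over the N training images. Mean subtraction removes that average, while standardization would additionally divide by a per-element standard deviation σ.

```latex
\mu_{c,h,w} = \frac{1}{N}\sum_{i=1}^{N} x^{(i)}_{c,h,w},
\qquad
\tilde{x}^{(i)}_{c,h,w} = x^{(i)}_{c,h,w} - \mu_{c,h,w},
\qquad
\hat{x}^{(i)}_{c,h,w} = \frac{x^{(i)}_{c,h,w} - \mu_{c,h,w}}{\sigma_{c,h,w}}
```

This elementwise average is only defined when every image has the same channels, height, and width, which is why the images must share one size before the mean file is computed.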

Source for these points: Deep Learning-Data preprocessing (http://blog.csdn.net/dcxhun3/article/details/47999281)

Note 1: make_imagenet_mean.sh is set up on the same principle as create_imagenet.sh, but keep in mind that the mean file is computed for the training set only, not for the validation set!
Note 2: make_imagenet_mean.sh computes the mean from the LMDB file, not from the original images.

1. Modifying make_imagenet_mean.sh

EXAMPLE=/caffe/examples/lmdb_test/train
DATA=/caffe/examples/lmdb_test/train
TOOLS=/caffe/build/tools

Of these three, pay particular attention to DATA, the middle one: in this file it is the folder where the mean file is stored, so it can be chosen fairly freely.

$TOOLS/compute_image_mean $EXAMPLE/train_lmdb \
  $DATA/imagenet_mean.binaryproto

Be careful here:

$EXAMPLE/train_lmdb resolves to /caffe/examples/lmdb_test/train/train_lmdb

$EXAMPLE here must give the path of your training set LMDB.
$DATA is the directory the mean file is written to; you can rename the file freely, and the storage path can be anything.

Then run as before.

2. Converting mean.binaryproto to mean.npy

When you use Caffe's C++ interface, the image mean file must be in protobuf (PB) format, commonly named mean.binaryproto. When you use the Python interface, it must be a NumPy file such as mean.npy. So when working across the two languages, you need to convert mean.binaryproto to mean.npy.
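A minimal sketch of the conversion, assuming the `python` on your PATH can import pycaffe. It parses the BlobProto written by compute_image_mean with caffe_pb2.BlobProto and turns it into an array with pycaffe's caffe.io.blobproto_to_array; the shell wrapper name binaryproto_to_npy is hypothetical.

```bash
# Hypothetical wrapper: convert a Caffe mean file (BlobProto, .binaryproto)
# into a NumPy .npy file. Assumes the "python" on PATH can import pycaffe.
binaryproto_to_npy() {
  python - "$1" "$2" <<'EOF'
import sys
import numpy as np
from caffe.proto import caffe_pb2
from caffe.io import blobproto_to_array

blob = caffe_pb2.BlobProto()
with open(sys.argv[1], "rb") as f:
    blob.ParseFromString(f.read())   # parse the serialized mean blob
mean = blobproto_to_array(blob)      # shape: (1, channels, height, width)
np.save(sys.argv[2], mean[0])        # keep (channels, height, width)
EOF
}

# Example (paths from make_imagenet_mean.sh):
# binaryproto_to_npy "$DATA/imagenet_mean.binaryproto" "$DATA/mean.npy"
```

Dropping the leading dimension keeps the channels × height × width layout that the mean file was computed in.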
(Reference Blog: Caffe Learning Series--tools: Calculating the image mean of a data set)

3. Error

Error 1: Check failed: size_in_datum == data_size (332667 vs. 17787) Incorrect data field size 332667

The data sizes do not match, which means your images are not all the same size; this error appears whenever the sizes have not been unified.

Details: blog post Caffe actual running problems (continuously updated)

Workaround: set RESIZE=true when generating the LMDB. I have tested this myself, and it works!
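Why resizing fixes it, as a worked calculation: compute_image_mean takes the byte count of the first image as data_size and compares every later image against it. For raw 8-bit images that count is channels × height × width. For instance, 332667 = 3 × 333 × 333 is consistent with a 333×333 RGB image (other factorizations are possible). After RESIZE=true with the 256×256 setting, every image holds 3 × 256 × 256 = 196608 bytes, so the check passes. The helper name datum_bytes is hypothetical.

```bash
# Bytes one uint8 image occupies in a Caffe Datum:
# channels x height x width (hypothetical helper for illustration).
datum_bytes() {
  local channels="$1" height="$2" width="$3"
  echo $(( channels * height * width ))
}

datum_bytes 3 333 333   # prints 332667
datum_bytes 3 256 256   # prints 196608
```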

Reference blogs:

1. Caffe Study Notes (iv) Convert your JPG data to Lmdb format (http://lib.csdn.net/article/deeplearning/55138)
2. (original) Caffe in the image generation Lmdb format data (http://www.cnblogs.com/darkknightzh/p/5909121.html)


This article is an English version of an article originally published in Chinese on aliyun.com.
