Caffe Study Notes (iv): Convert your JPG data to LMDB format



1 Introduction

1-1 Taking example_mnist as an example, how do I load my own test set?


Let me start with a question: the example_mnist example comes with its own test set. What if we want to write some digits ourselves and check how well they are recognized?



Looking at lenet_train_test.prototxt under caffe_root/examples/mnist/, we can see that it specifies the paths of both the training set and the test set. So the answer is clear: we can build our own test set in LevelDB (or LMDB) format, put its path into lenet_train_test.prototxt, and then follow the steps from the previous post.


1-2 How do I turn my data into LevelDB/LMDB format? Several references


How do you build LevelDB-format training and test sets from your own data (for example, images of digits we wrote ourselves)?



I need to learn this format because of a small experiment I want to do, "fitting a bivariate quadratic function with a simple neural network". Its input data has to be in LevelDB or LMDB format, so I need to understand the conversion in order to produce the training and test sets. I have not tried this yet, but many blog posts on the internet cover it. Here are a few that I think are worth studying:



[1] CSDN blog, "Auxiliary tools for the Caffe neural network framework (converting images to LevelDB format)". This post includes C++ source code, but it is very hard to follow.

[2] Q&A thread "How to turn your own data into LMDB or LevelDB with Caffe", answer by beanfrog. It has Python code, but it involves more unfamiliar ground: I would also have to learn how to use Caffe from Python, for example how the caffe package that gets imported is built.

[3] Shikayu, "Study Note 3: Training and testing CaffeNet with your own data". This is the post I followed; it is a detailed version of [4] and is very well written.

[4] Caffe official website, "Caffe | ImageNet Tutorial". It opens with the sentence: "This guide is meant to get you ready to train your own model on your own data."


2 Convert your image data into LMDB format

2-1 Get the names of all images within a folder


(1) First, create a new folder called "Batch Name" and put some pictures in it.

(2) Next, open Start menu → cmd and use cd to navigate to the "Batch Name" directory.

(3) Enter the command "dir /s /on /b > d:/train.txt". This generates a text file named train.txt on drive D that contains the full paths of all the images in "Batch Name".

(4) Use find-and-replace to turn each line of that file into the form "file name label":

① Find "C:\Users\LJJ\Desktop\test diagram\caffe experiment\Batch Name\" and replace it with nothing. This strips the directory part and keeps only the file names.

② Find "jpg" and replace it with "jpg 1". Here 1 is the label, that is, the category the image belongs to.





A small trick: do not mix all the training data together, but sort it into separate folders. For handwritten digits, for example, create 10 folders and put all the 0s in folder "0", all the 1s in folder "1", and so on.

Then apply the command from step (3) to each folder to create a TXT file of image names, which gives 10 files, train0.txt to train9.txt. The replacement text used in find-and-replace differs for each file according to its category: for train3.txt, replace "jpg" with "jpg 3"; for train5.txt, replace "jpg" with "jpg 5".

Once find-and-replace has been done on all of train0.txt to train9.txt, merge them into a single TXT file named train.txt.
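The manual dir / find-and-replace / merge procedure above can also be scripted. The following is only a minimal sketch, not the method used in this post: it assumes the per-digit folder layout described above (subfolders named "0" to "9"), that file names are unique across folders, and the function names build_label_lines and write_label_file are made up for illustration.

```python
import os


def build_label_lines(root, exts=(".jpg", ".jpeg")):
    """Collect 'file_name label' lines from per-class subfolders of root.

    Assumes each subfolder is named after its integer label ("0" ... "9")
    and that file names are unique across subfolders.
    """
    lines = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        # Skip stray files and folders that are not named after a label.
        if not (os.path.isdir(class_dir) and label.isdigit()):
            continue
        for name in sorted(os.listdir(class_dir)):
            # Case-insensitive extension check, so "b.JPG" is kept too.
            if name.lower().endswith(exts):
                lines.append(f"{name} {label}")
    return lines


def write_label_file(root, out_path):
    """Write the merged list (the equivalent of train.txt); return the line count."""
    lines = build_label_lines(root)
    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    return len(lines)
```

Calling write_label_file on the folder that holds the ten digit folders would produce one merged list in a single pass, replacing the ten separate find-and-replace rounds.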


2-2 Resize images of any size to 256*256


Both the official website and Shikayu's notes give a piece of code that is supposed to be saved as an sh file and run to resize the images to 256*256, but I tried for a long time without success.

To achieve this, I decided to use the cvResize function provided by OpenCV. I chose OpenCV because I used it for my graduation project and know it fairly well. Many people online use MATLAB for this, which I may try when I get the chance.

Writing this up in detail would take a whole post, so I wrote a separate one on the topic: see "OpenCV play (a) batch resize all the images in a folder".


2-3 Generating convert_imageset.exe


The caffe_root directory contains a folder called tools, which holds a source file named convert_imageset.cpp. Build this file to produce the executable convert_imageset.exe. For the steps, see "Caffe Study Notes (iii): generating the required EXE files under VS2013".


2-4 Generating LMDB files


Create a folder named myself in the data directory and put all the files related to this experiment in it: the training set train (a folder holding all the training pictures), the test set val (a folder holding all the test pictures), train.txt, val.txt, and test.txt. The last two list the same file names; the only difference is that val.txt includes labels and test.txt does not.
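Since test.txt is described here as val.txt with the labels removed, one can be derived from the other. This is a small sketch under that assumption only; the helper name strip_labels is hypothetical, and it expects every non-empty line to have the "file name label" form.

```python
def strip_labels(labelled_lines):
    """Drop the trailing label from each 'file_name label' line."""
    names = []
    for line in labelled_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last run of whitespace so names containing spaces survive.
        names.append(line.rsplit(None, 1)[0])
    return names
```

For example, reading val.txt line by line, passing the lines through strip_labels, and writing the result out would give a test.txt with the same file names and no labels.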



Copy the file create_imagenet.sh from caffe_root\examples\imagenet into the myself folder, and then set the paths inside it.









The idea is the same as in the earlier post "OpenCV play (a) batch resize all the images in a folder": string concatenation. DATA and TOOLS are path prefixes, and each complete path is formed by joining a prefix with the file name that follows it. In plain terms, for both the training set and the test set, the GLOG command lines need three absolute paths plus the name of the folder to be generated. The three absolute paths are:



(1) The absolute path of convert_imageset.exe;

(2) The absolute path of train.txt / val.txt;

(3) The absolute path of the folder that holds the pictures: the train folder for the training set, or the val folder for the test set.



The resize part of the script can actually be removed, because we have already resized the images beforehand. A simplified create_imagenet.sh without it has exactly the same effect.
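To make the "three absolute paths plus an output folder name" idea concrete, here is a hedged Python sketch that assembles the convert_imageset argument list by string concatenation. It is not the author's script. The executable location under bin/ is a hypothetical placeholder, the helper name convert_imageset_cmd is made up, and the data directory is the one that appears in the log in section 3-2. The --shuffle and --backend flags are defined in convert_imageset.cpp; the resize flags are deliberately left out because the images were resized beforehand.

```python
def convert_imageset_cmd(exe, image_root, list_file, db_dir,
                         backend="lmdb", shuffle=True):
    """Build the argument list: exe [flags] ROOTFOLDER/ LISTFILE DB_NAME."""
    # convert_imageset joins ROOTFOLDER and each listed file name directly,
    # so the root folder must end with a path separator.
    if not image_root.endswith(("/", "\\")):
        image_root += "/"
    cmd = [exe]
    if shuffle:
        cmd.append("--shuffle")
    cmd.append("--backend=" + backend)  # "lmdb" or "leveldb"
    cmd += [image_root, list_file, db_dir]
    return cmd


# Hypothetical locations, for illustration only.
TOOLS = "D:/MachineLearning/CAFFE_ROOT/bin"
DATA = "D:/MachineLearning/CAFFE_ROOT/data/myself"

train_cmd = convert_imageset_cmd(
    TOOLS + "/convert_imageset.exe",   # (1) the executable
    DATA + "/train",                   # (3) folder holding the pictures
    DATA + "/train.txt",               # (2) the label list
    DATA + "/imagenet_train_lmdb",     # output folder to be generated
)
```

The resulting list could be handed to subprocess.run(train_cmd, check=True); a second call with val, val.txt and imagenet_val_lmdb would cover the test set.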





After setting the paths, double-click the sh file. It generates the imagenet_train_lmdb and imagenet_val_lmdb folders under the myself folder.


3 Unresolved issues

3-1 Check failure reported when generating LMDB files


When generating the LMDB files, the log is as follows:


Creating train lmdb...

*** Check failure stack trace: ***

I0512 16:23:45.290897  1184 convert_imageset.cpp:83] Shuffling data

I0512 16:23:45.290897  1184 common.cpp:32] System entropy source not available, using fallback algorithm to generate seed instead.

I0512 16:23:45.290897  1184 convert_imageset.cpp:86] A total of 12 images.

F0512 16:23:45.306498  1184 db_lmdb.hpp:14] Check failed: mdb_status == 0 (112 vs. 0)

Creating val lmdb...

*** Check failure stack trace: ***

I0512 16:23:45.509299  3544 convert_imageset.cpp:83] Shuffling data

I0512 16:23:45.509299  3544 common.cpp:32] System entropy source not available, using fallback algorithm to generate seed instead.

I0512 16:23:45.509299  3544 convert_imageset.cpp:86] A total of 4 images.

F0512 16:23:45.509299  3544 db_lmdb.hpp:14] Check failed: mdb_status == 0 (112 vs. 0)

Done.
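The lines in the log above follow the usual glog layout: a severity letter (I, W, E or F), the date as mmdd, the time, a thread id, and source file:line before the "]". As a reading aid only, the sketch below pulls the fatal (F) entries out of such a log; the function name fatal_lines and the regular expression are illustrative assumptions, not part of Caffe or glog.

```python
import re

# Severity, mmdd date, time with microseconds, thread id, "file:line]", message.
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def fatal_lines(log_text):
    """Return (source, message) pairs for every fatal (F) glog line."""
    hits = []
    for line in log_text.splitlines():
        match = GLOG_LINE.match(line.strip())
        if match and match.group("level") == "F":
            hits.append((match.group("source"), match.group("message")))
    return hits
```

Run over the log above, this would single out the two db_lmdb.hpp:14 entries, one for the train database and one for the val database.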


The log reports "Check failed". Does this check failure affect the generated LMDB files? Is it related to the fact that, during the earlier configuration, I dropped the step "make the following changes in db.cpp ... CHECK_EQ"? And how can this error be eliminated?


3-2 Testing the LMDB file with the trained LeNet network fails


I used this method to generate an LMDB file from a batch of handwritten digits (with labels), and then tested it with lenet_iter_10000.caffemodel, the network trained in the second post of this series. The test failed, with the following log:


(Everything earlier in the log is normal; the excerpt starts where the LMDB is opened.)

I0512 17:13:10.089304 188 db_lmdb.cpp:38] Opened lmdb D:/MachineLearning/CAFFE_ROOT/data/myself/imagenet_train_lmdb

I0512 17:13:10.089304 188 data_reader.cpp:114] Restarting data prefetching from start.

F0512 17:13:10.089304 2492 data_transformer.cpp:465] Check failed: datum_channels > 0 (0 vs. 0)

I0512 17:13:10.089304 188 data_reader.cpp:114] Restarting data prefetching from start.

I0512 17:13:10.089304 188 data_reader.cpp:114] Restarting data prefetching from start.


The real problem is that I do not know how the test data should be prepared: with labels or without? Where are val.txt and test.txt each used? Why did I use val.txt rather than test.txt when generating the LMDB file, and where does test.txt come in? I will leave these questions open for now and work through Shikayu's third note again.


3-3 How to generate LevelDB files?


This experiment generated LMDB files, but the script create_imagenet.sh has no obvious option for generating LevelDB files instead. Can the answer be found in convert_imageset.cpp?



I hope to resolve these three problems in follow-up study.



2016.5.12



By Yau Wang Nanshan


