Converting your own image data into the DB (LevelDB/LMDB) files Caffe requires
After setting up the Caffe environment, we usually want to train and test on our own image data. That data typically comes as image files such as JPG, JPEG or PNG, but Caffe consumes data in LMDB or LevelDB format. For example, when testing the MNIST dataset in "Deep Learning Article 2: Using the MNIST dataset to verify Caffe installation success", we ran the script create_mnist.sh to generate the corresponding DB files. After it runs, the generated DB files can be seen in the ~/caffe/examples/mnist/mnist_train_lmdb directory.
So before we train or test on our own image data, we need to convert it into DB files that the Caffe framework can use directly. This post explains in detail how to do the conversion.
1. Create a picture manifest file
First we need to create a TXT file listing our image dataset. As an example, we take two images that ship with Caffe: the examples/images directory under the Caffe root contains cat.jpg and fish-bike.jpg, and we will treat them as category 1 and category 2 respectively. We then create a shell script to generate the image list:
cd ~/caffe/
sudo gedit examples/images/create_filelist.sh
Edit the following in the file:
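DATA=examples/images
echo "Create train.txt..."
# Start from a clean list file
rm -rf $DATA/train.txt
# Category 1: file names ending in cat.jpg
find $DATA -name '*cat.jpg' | cut -d '/' -f3 | sed "s/$/ 1/">>$DATA/train.txt
# Category 2: file names ending in bike.jpg, collected in a temporary file first
find $DATA -name '*bike.jpg' | cut -d '/' -f3 | sed "s/$/ 2/">>$DATA/tmp.txt
cat $DATA/tmp.txt>>$DATA/train.txt
rm -rf $DATA/tmp.txt
echo "Done.."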
Readers familiar with Linux commands should be able to follow this script; for those who are not, the commands are explained below:
- rm: deletes files
rm -rf $DATA/train.txt
This deletes the train.txt file in the folder.
- find: finds files
- cut: extracts part of the path
- sed: appends a label to the end of each line, for example label 1 for cat.jpg and label 2 for fish-bike.jpg.
find $DATA -name '*cat.jpg' | cut -d '/' -f3 | sed "s/$/ 1/">>$DATA/train.txt
This finds the cat.jpg file, extracts the file name, appends the label 1 after it, and writes the result into train.txt. The next line of the script works the same way.
- cat: merges the contents of files into a single file.
cat $DATA/tmp.txt>>$DATA/train.txt
This appends the category 2 entries in tmp.txt to train.txt.
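To see what the pipeline produces without touching a real Caffe tree, the commands can be tried in a throwaway directory. This is a minimal sketch: the temporary directory and the empty stand-in files are assumptions made for illustration, and only the find/cut/sed pipeline comes from the script above.

```shell
# Sandbox that mimics examples/images with empty stand-in files (illustrative only).
WORK=$(mktemp -d)
mkdir -p "$WORK/examples/images"
touch "$WORK/examples/images/cat.jpg" "$WORK/examples/images/fish-bike.jpg"
cd "$WORK" || exit 1

DATA=examples/images
# find prints examples/images/cat.jpg; splitting on '/' makes the file name field 3;
# sed then appends a space and the label to the end of the line.
find $DATA -name '*cat.jpg' | cut -d '/' -f3 | sed "s/$/ 1/" > $DATA/train.txt
find $DATA -name '*bike.jpg' | cut -d '/' -f3 | sed "s/$/ 2/" >> $DATA/train.txt

cat $DATA/train.txt
# prints:
# cat.jpg 1
# fish-bike.jpg 2
```

Quoting the patterns keeps the shell from expanding them against the current directory before find sees them.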
After editing, save the file, then run the script to generate train.txt with the following commands:
cd ~/caffe/
sudo sh examples/images/create_filelist.sh
After execution, the generated train.txt file can be found under examples/images/, the path configured in the script. Given the script above, it should contain one line per image: cat.jpg 1 and fish-bike.jpg 2.
Each line shows an image and its category. If there are only a few images, the list can be written by hand; if there are very many, it should be generated with a script. Here we generated train.txt; val.txt and test.txt can be generated in the same way.
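Whether a list is written by hand or generated, it can be worth checking before conversion. The sketch below counts entries per label and flags lines that do not look like "file label". The sample contents, including the deliberately broken line, are made up for the demonstration, and the check assumes file names without spaces.

```shell
# Sample list in the "<file> <label>" layout (contents are illustrative).
LIST=$(mktemp)
printf '%s\n' 'cat.jpg 1' 'fish-bike.jpg 2' 'broken-line' > "$LIST"

# Count entries per label and report lines that are not "<file> <integer label>".
SUMMARY=$(awk '
  NF != 2 || $2 !~ /^[0-9]+$/ { bad++; next }
  { count[$2]++ }
  END {
    for (label in count) printf "label %s: %d\n", label, count[label]
    printf "malformed: %d\n", bad + 0
  }
' "$LIST" | sort)
echo "$SUMMARY"
# prints:
# label 1: 1
# label 2: 1
# malformed: 1
```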
2. Generate the corresponding DB file using the manifest file
In the tools folder under the Caffe root directory there is a file convert_imageset.cpp. Once compiled, the resulting executable is placed in the build/tools/ directory. This program converts images into DB files that the Caffe framework can use directly. Its command line usage is as follows:
convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME
The four parameters have the following meanings:
- FLAGS: image options
  - gray: whether to open images as grayscale. The program opens images with the imread() function from the OpenCV library. Defaults to false.
  - shuffle: whether to randomly shuffle the image order. Defaults to false.
  - backend: the DB format to convert to, either leveldb or lmdb. Defaults to lmdb.
  - resize_width/resize_height: change the image size. At run time all images are required to be the same size, so the images usually need resizing. The program scales images with the resize() function from the OpenCV library. Defaults to 0, meaning no change.
  - check_size: check that all data have the same size. Defaults to false (no check).
  - encoded: whether to store the original encoded image in the final data. Defaults to false.
  - encode_type: used together with the previous flag; the format to encode images as: 'png', 'jpg' and so on.
- ROOTFOLDER/: the absolute path where the images are stored, starting from the root of the Linux file system (not the Caffe root directory)
- LISTFILE: the image list file, usually a txt file, with one image per line
- DB_NAME: the directory where the generated DB files are stored
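As an illustration of how the flags and the three positional arguments combine, the sketch below only assembles and prints a command line rather than running the tool. The paths and the chosen flag values are placeholders, not values from this post.

```shell
# Placeholder locations; substitute your own.
ROOT=/absolute/path/to/images/   # ROOTFOLDER/, note the trailing slash
LIST=train.txt                   # LISTFILE
DB=img_train_leveldb             # DB_NAME

# Grayscale, shuffled, resized to 64x64, written as LevelDB instead of the default LMDB.
CMD="build/tools/convert_imageset --gray --shuffle --backend=leveldb --resize_height=64 --resize_width=64 $ROOT $LIST $DB"

# Dry run: print the command; run it from the Caffe root once the paths are real.
echo "$CMD"
```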
Next we create the script to implement the transformation:
cd ~/caffe/
sudo gedit examples/images/create_lmdb.sh
Then edit the generated SH file with the following contents:
DATA=examples/images
rm -rf $DATA/img_train_lmdb
build/tools/convert_imageset --shuffle --resize_height=256 --resize_width=256 /home/moqi/caffe/examples/images/ $DATA/train.txt $DATA/img_train_lmdb
Here the --shuffle parameter shuffles the image order, and --resize_height and --resize_width resize all images to 256*256. /home/moqi/caffe/examples/images/ is the absolute path to the images and must be replaced with the path on your own computer: for example, if your Caffe directory is /home/xx/caffe, use /home/xx/caffe/examples/images/. The images can of course be stored anywhere on the computer, as long as you substitute the correct absolute path.
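Rather than typing the absolute path by hand, it can be derived from the directory itself. The helper below is a hypothetical convenience, not part of Caffe; it prints a directory's absolute path with a trailing slash, matching the ROOTFOLDER/ form shown in the usage line. The demo runs on a throwaway tree standing in for ~/caffe.

```shell
# abs_dir DIR: print the absolute path of DIR with a trailing slash.
# Hypothetical helper for illustration; not part of Caffe.
abs_dir() {
  (cd "$1" && printf '%s/\n' "$(pwd)")
}

# Demo on a throwaway tree standing in for ~/caffe.
SANDBOX=$(mktemp -d)
mkdir -p "$SANDBOX/examples/images"
cd "$SANDBOX" || exit 1
ROOT=$(abs_dir examples/images)
echo "$ROOT"
```

In a real script the result could be passed as the root folder argument in place of the hard-coded /home/moqi/... path.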
Finally, run the saved script file that you just edited:
cd ~/caffe/
sudo sh examples/images/create_lmdb.sh
The img_train_lmdb folder is generated under the examples/images/ directory; it is the dataset Caffe needs.
When opening the img_train_lmdb folder you may run into insufficient permissions; just change the permissions of the folder:
sudo chmod 777 img_train_lmdb/
Readers unfamiliar with this can look up the Linux commands for changing permissions.
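Mode 777 gives every user write access. If read access is all that is needed, a narrower change may be enough. The sketch below shows the idea on a stand-in directory; the directory and file are created only for the demonstration, and on the real folder the chmod would need sudo because the script was run as root.

```shell
# Stand-in for a database directory created with tight permissions.
DB="$(mktemp -d)/img_train_lmdb"
mkdir -p "$DB"
touch "$DB/data.mdb"
chmod 700 "$DB"
chmod 600 "$DB/data.mdb"

# a+rX: read for everyone, plus search (x) permission on directories only.
chmod -R a+rX "$DB"

ls -ld "$DB" | cut -c1-10          # drwxr-xr-x
ls -l "$DB/data.mdb" | cut -c1-10  # -rw-r--r--
```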
3. Example: converting your own training/test images into a Caffe dataset
3.1 The dataset
Here we use five categories of images, 100 per category, of which 80 are used for training and 20 for testing. The categories are labeled 3, 4, 5, 6 and 7, and the image names correspond to the labels, for example 301, 302 and so on. We put the images in the data folder under the Caffe root directory, in a folder named moqi that contains two subfolders, test and train: the test folder holds the 100 test images and the train folder holds the 400 training images.
Many more datasets are available on the Internet; you can download one yourself and store it in the same layout.
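The layout can be sanity-checked by counting files per category in each folder. The sketch below first builds an empty stand-in tree so that it runs anywhere; the naming scheme (category digit followed by a two-digit index) is an assumption loosely based on the names 301 and 302 mentioned above, not something the post specifies.

```shell
# Stand-in tree: for each category c, files c00..c79 go to train/ and c80..c99 to test/.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/data/moqi/train" "$ROOT/data/moqi/test"
for c in 3 4 5 6 7; do
  n=0
  while [ "$n" -lt 100 ]; do
    name=$(printf '%s%02d.jpg' "$c" "$n")
    if [ "$n" -lt 80 ]; then split=train; else split=test; fi
    touch "$ROOT/data/moqi/$split/$name"
    n=$((n + 1))
  done
done

# Count images per category in each split.
REPORT=$(for c in 3 4 5 6 7; do
  n_train=$(find "$ROOT/data/moqi/train" -name "$c*.jpg" | wc -l)
  n_test=$(find "$ROOT/data/moqi/test" -name "$c*.jpg" | wc -l)
  echo "category $c: train=$((n_train)) test=$((n_test))"
done)
echo "$REPORT"
```

On a real dataset only the counting loop is needed, with ROOT pointing at the Caffe root directory.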
3.2 Generating training and test manifest files
In the examples directory under the Caffe root, create a myfile folder to hold the configuration and script files. Then write a script create_filelist.sh that generates the train.txt and test.txt list files.
cd ~/caffe/
sudo mkdir examples/myfile
sudo gedit examples/myfile/create_filelist.sh
Edit the file with the following contents:
# The trailing slash is kept on purpose: find then prints data/moqi//train/<name>,
# so fields 4-5 of the path are train/<name>
DATA=data/moqi/
MY=examples/myfile

echo "Create train.txt..."
rm -rf $MY/train.txt
for i in 3 4 5 6 7
do
find $DATA/train -name "$i*.jpg" | cut -d '/' -f4-5 | sed "s/$/ $i/">>$MY/train.txt
done
echo "Create test.txt..."
rm -rf $MY/test.txt
for i in 3 4 5 6 7
do
find $DATA/test -name "$i*.jpg" | cut -d '/' -f4-5 | sed "s/$/ $i/">>$MY/test.txt
done
echo "All done"
After editing, save and exit, then go back to the Caffe root directory and run the script:
cd ~/caffe/
sudo sh examples/myfile/create_filelist.sh
After execution, two text files, train.txt and test.txt, are generated under examples/myfile/ in the Caffe root directory; these are the image lists. Each line holds an image path relative to data/moqi/ (the train or test folder plus the file name), followed by a space and the category label.
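The field numbers passed to cut depend on how the path printed by find splits on '/', so they have to be recounted whenever the folder depth changes. A variant that avoids field counting is sketched here with a hypothetical helper named make_list: it runs find from inside the dataset folder, so the paths already start with train/ or test/. The stand-in files exist only to make the example runnable.

```shell
# make_list DATA_DIR SPLIT OUT LABEL...
# Hypothetical helper: writes "SPLIT/<file> <label>" lines for the files
# whose names start with each label.
make_list() {
  data_dir=$1; split=$2; out=$3
  shift 3
  : > "$out"
  for label in "$@"; do
    (cd "$data_dir" && find "$split" -name "$label*.jpg" | sort) |
      sed "s/$/ $label/" >> "$out"
  done
}

# Demo on empty stand-in files.
SANDBOX=$(mktemp -d)
mkdir -p "$SANDBOX/data/moqi/train"
touch "$SANDBOX/data/moqi/train/301.jpg" "$SANDBOX/data/moqi/train/302.jpg" \
      "$SANDBOX/data/moqi/train/401.jpg"
make_list "$SANDBOX/data/moqi" train "$SANDBOX/train.txt" 3 4
cat "$SANDBOX/train.txt"
# prints:
# train/301.jpg 3
# train/302.jpg 3
# train/401.jpg 4
```

The output path should be absolute, or relative to the directory the helper is called from, since only the find step runs inside the dataset folder.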
3.3 Convert to Caffe required DB file
Create a new script file to implement the transformation:
cd ~/caffe/
sudo gedit examples/myfile/create_lmdb.sh
Edit the file with the following contents:
MY=examples/myfile

# Note: the resize values were lost in this copy of the script; 256 is assumed
# here to match the example in section 2. Use the size your network expects.
echo "Create train lmdb..."
rm -rf $MY/img_train_lmdb
build/tools/convert_imageset --shuffle --resize_height=256 --resize_width=256 /home/moqi/caffe/data/moqi/ $MY/train.txt $MY/img_train_lmdb

echo "Create test lmdb.."
rm -rf $MY/img_test_lmdb
build/tools/convert_imageset --shuffle --resize_width=256 --resize_height=256 /home/moqi/caffe/data/moqi/ $MY/test.txt $MY/img_test_lmdb

echo "All done.."
Pay attention to the paths here: the absolute path must be your own, so modify it accordingly.
After editing, save and exit, then run the script:
cd ~/caffe/
sudo sh examples/myfile/create_lmdb.sh
After execution, two folders, img_train_lmdb and img_test_lmdb, are generated under examples/myfile in the Caffe root directory; they hold the LMDB files converted from the training and test images respectively.
At this point the whole conversion process is complete. Readers can adapt the scripts to their own dataset with minor modifications to convert their own image collections.
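A quick way to confirm that a conversion produced something is to look for the two files an LMDB environment normally keeps in its directory, data.mdb and lock.mdb. The function name below is hypothetical, and the demo uses empty stand-in files rather than a real database, so it checks presence only, not validity.

```shell
# has_lmdb_files DIR: succeed if DIR contains data.mdb and lock.mdb.
# Hypothetical helper for illustration; not part of Caffe.
has_lmdb_files() {
  [ -f "$1/data.mdb" ] && [ -f "$1/lock.mdb" ]
}

# Demo with stand-in files: only the train folder is populated.
DEMO=$(mktemp -d)
mkdir -p "$DEMO/img_train_lmdb" "$DEMO/img_test_lmdb"
touch "$DEMO/img_train_lmdb/data.mdb" "$DEMO/img_train_lmdb/lock.mdb"

for db in img_train_lmdb img_test_lmdb; do
  if has_lmdb_files "$DEMO/$db"; then
    echo "$db: ok"
  else
    echo "$db: missing files"
  fi
done
# prints:
# img_train_lmdb: ok
# img_test_lmdb: missing files
```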
Deep Learning Article 3: Converting your own image data into the DB (LevelDB/LMDB) files Caffe requires