Training Kitti Datasets with YOLO



Other articles: http://blog.csdn.net/baolinq



Last time I wrote an article about training a VOC dataset with YOLO (portal: http://blog.csdn.net/baolinq/article/details/78724314). But you can't always stick to just one dataset; it's worth using a few and comparing the results. My main focus is vehicle and pedestrian detection, and the KITTI dataset happens to be an authoritative public dataset for autonomous driving, containing a large amount of road traffic data from highways, city streets, and so on (official website: http://www.cvlibs.net/datasets/kitti/index.php), which fits my requirements.



1. Download the Kitti data set



The KITTI data was captured with multiple cameras mounted on a car roof; I only use the left images, i.e. the first download link (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d). It is fairly large, about 12 GB. Clicking the link asks you to fill in an email address; the real download link, along with a label file, is then sent to your mailbox. Only the training images have labels; the test images have none, and I don't yet know how to make use of them (guidance from those who do is welcome). For now I simply split the labeled training images into three parts: train, val, and test.



2. Convert Kitti Data set to VOC dataset format



If you do not modify the source code of the YOLO network, it expects the training set in VOC data format. So what does the VOC dataset format look like? A quick look:








--VOC
  --Annotations
  --ImageSets
    --Main
    --Layout
    --Segmentation
  --JPEGImages
  --SegmentationClass






The last two are for segmentation and do not concern us. The folders used here are Annotations, ImageSets, and JPEGImages. The Annotations folder mainly holds XML files, one XML per image; each XML stores the location and class of every labeled object, and its name is usually the same as that of the corresponding image. In ImageSets we only need the Main subfolder, which holds a number of text files, usually train.txt, test.txt, etc.; each text file lists the names (no suffix, no path) of the images to be used for training or testing. The JPEGImages folder holds the original images, named according to a uniform rule.






And what about the Kitti data set?



The Kitti tag information is a txt with the contents as follows:



Car 0.00 0 -1.67 642.24 178.50 680.14 208.68 1.38 1.49 3.32 2.41 1.66 34.98 -1.60
Car 0.00 0 -1.75 685.77 178.12 767.02 235.21 1.50 1.62 3.89 3.27 1.67 21.18 -1.60






The meaning of the fields is as follows (from https://github.com/NVIDIA/DIGITS/blob/v4.0.0-rc.3/digits/extensions/data/objectDetection/README.md): each line describes one object with 15 space-separated values: the type (class name), truncated, occluded, alpha (observation angle), the 2D bounding box (left, top, right, bottom, in pixels), the 3D dimensions (height, width, length, in meters), the 3D location (x, y, z in camera coordinates), and rotation_y. For 2D detection, only the type and the four bounding-box values are needed.










This section references http://blog.csdn.net/jesse_mx/article/details/65634482.



Now that we roughly understand the data format, we can start converting it. First create a series of folders:



# Create a new Kittidevkit/kitti two-level subdirectory under the scripts/ folder, with the required files in kitti/:

Annotations/
└── 000000.xml
ImageSets/
└── Main/
    ├── trainval.txt
    └── test.txt  # etc.
JPEGImages/
└── 000000.png
kitti_labels/
└── 000000.txt  # self-created folder holding the original annotation info, to be converted to XML; not part of the VOC format
create_train_test_txt.py  # 3 Python tools, detailed in the following sections
modify_annotations_txt.py
txt_to_xml.py






A download link for these code tools is given at the end of the article.



Step one: Category conversion



The PASCAL VOC dataset has 20 categories in total, which is indeed a lot if you are working in a specific scenario. For this dataset I set up 3 categories: 'Car', 'Cyclist', and 'Pedestrian'. The labels also contain other kinds of vehicles and people, and simply skipping them would be a bit of a waste, so 'Van', 'Truck', and 'Tram' are merged into the 'Car' category, and 'Person_sitting' is merged into the 'Pedestrian' category ('Misc' and 'DontCare' are simply ignored). The modify_annotations_txt.py tool is used here.



Step two: Convert the txt annotation information to XML format

After processing the original txt files, the next step is to convert the annotation files from txt to XML, removing the unused parts of the annotation information and keeping only the 3 classes, and converting the coordinate values from float to int. Finally, all the generated XML files are stored in the Annotations folder. The txt_to_xml.py tool is used here, adapted from the KITTI_SSD code; thanks to that author for the contribution.



Step three: Generate the training/validation set and test set lists

A Pascal-VOC-format dataset (the same layout used for SSD training) consists of three parts: first the JPEGImages folder, where all the PNG images are placed; then the Annotations folder, whose corresponding XML files were generated in the steps above; and finally the ImageSets folder, which has a Main subfolder holding the list files for the training/validation set and the test set, as shown in the figure below:










The data conversion is basically done here.



But it can't be used for training directly yet. There is a voc_label.py file under the scripts folder, and our converted data cannot be used with it as-is; some minor changes are needed. I changed it into a kitti_label.py file (there is a download link at the end of the article). There are no key changes, just adaptations so that it correctly generates the txt files with the image paths for this dataset. Running this py file regenerates the label files in txt format, with the annotation information rescaled relative to the image size: it creates a labels folder containing, for each image, a txt file with the ground-truth box coordinates, and it also generates train.txt, val.txt, and test.txt files in the scripts directory that record the absolute paths of the images, making it easy for the program to read them in bulk.
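The rescaling mentioned above is done by the convert() helper at the heart of Darknet's voc_label.py (and of the adapted kitti_label.py): it turns a VOC-style box (xmin, xmax, ymin, ymax in pixels) into the Darknet label format of center x, center y, width, height, all expressed as fractions of the image size.

```python
def convert(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax).
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0   # box center, in pixels
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]           # box size, in pixels
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)
```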



From here on it is similar to the earlier blog post on training the VOC dataset: adjust the network parameters to your actual situation, including the cfg file, the names file, and the data file. http://blog.csdn.net/baolinq/article/details/78724314



Finally, there is a very important detail not to be neglected: the VOC dataset images are in JPG format, while the KITTI dataset images are in PNG format.






We need to look at how the source code reads the image and label information from the txt files. It is quite simple: the train.txt file we generated earlier only records the absolute paths of the training images, not the paths of the label files, so the code derives the label path from the image path. If the files are laid out in the standard way, the location of a label file closely mirrors the location of its PNG file, and it can be found by a simple rule, just replacing some folder names and the suffix.
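One possible Python rendering of that rule (Darknet does the equivalent with find_replace() calls in its C source when loading each image): swap the images folder for "labels" and the image suffix for ".txt".

```python
def image_to_label_path(img_path):
    # e.g. .../JPEGImages/000000.png -> .../labels/000000.txt
    return (img_path
            .replace('JPEGImages', 'labels')
            .replace('.png', '.txt')
            .replace('.jpg', '.txt'))
```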



Training should work normally at this point. This time I set max_batches to 40000, and training took more than 10 hours. Training command: darknet.exe detector train data/kitti.data cfg/kitti.cfg pre_model/darknet19_448.conv.23 -gpus 0,1. The test results can then be examined:


















In the evaluation results, the detection recall is 77.33% and the precision is 89.96% (these numbers seem a bit off). It looks good, but it does not work well when actually detecting images, especially images outside the KITTI dataset. The reasons remain to be studied; guidance and discussion in the comments area are welcome.



The parameters after 'detector' differ by function: there are train, test, recall, demo, and so on; see the source file detector.c.
If you are just testing a single image, use the test command, e.g. darknet.exe detector test data/kitti.data cfg/kitti.cfg weights/kitti_40000.weights -gpus 0,1
If you are testing the precision and recall of the entire test set, use the recall command, e.g. darknet.exe detector recall data/kitti.data cfg/kitti.cfg weights/kitti_40000.weights -gpus 0,1
To test a video, use the demo command, e.g. darknet.exe detector demo data/voc-3.data cfg/yolo-3.cfg weights/yolo-3_final.weights -gpus 1 test_image/123.mp4



Reference documents



[1] https://pjreddie.com/darknet/yolo/ The official website offers many well-trained models, very generous



[2] https://arxiv.org/abs/1612.08242 The original YOLOv2 paper



[3] https://github.com/pjreddie/darknet GitHub source



[4]http://www.cvlibs.net/datasets/kitti/index.php Kitti Data Set official website



[5] And many other good blogs; thanks to their authors for their contributions






Download link for the KITTI-to-VOC dataset tool code: http://download.csdn.net/download/baolinq/10181761










