Training Kitti Datasets with YOLO



Other articles: http://blog.csdn.net/baolinq



Last time I wrote an article about training a VOC dataset with YOLO (portal: http://blog.csdn.net/baolinq/article/details/78724314). But you can't always stick to just one dataset; it's worth using a few and comparing the results. My main focus is vehicle and pedestrian detection, and the KITTI dataset happens to be an authoritative public dataset for autonomous driving, containing a large amount of road traffic data from highways, city streets, and so on (official website: http://www.cvlibs.net/datasets/kitti/index.php), which fits my requirements.



1. Download the Kitti data set



The KITTI data was captured with multiple cameras mounted on a car roof; I only use the left images, i.e. the first download link (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d). It is fairly large, about 12 GB. Clicking the link asks you to fill in an email address; the real download link, along with a label file, is then sent to your mailbox. Only the training images have labels; the test images have none, and I don't yet know how to make use of them (guidance from those who do is welcome). For now I simply split the labeled training images into three parts: train, val, and test.



2. Convert Kitti Data set to VOC dataset format



If you do not modify the source code of the YOLO network, it expects the training set in VOC data format. So what does the VOC dataset format look like? A quick look:








--VOC
  --Annotations
  --ImageSets
    --Main
    --Layout
    --Segmentation
  --JPEGImages
  --SegmentationClass






The last two are for segmentation and do not concern us. The folders used here are Annotations, ImageSets, and JPEGImages. The Annotations folder mainly holds XML files, one XML per image; each XML stores the location and class of every labeled object, and its name is usually the same as that of the corresponding image. In ImageSets we only need the Main subfolder, which holds a number of text files, usually train.txt, test.txt, etc.; each text file lists the names (no suffix, no path) of the images to be used for training or testing. The JPEGImages folder holds the original images, named according to a uniform rule.






And what about the Kitti data set?



The Kitti tag information is a txt with the contents as follows:



Car 0.00 0 -1.67 642.24 178.50 680.14 208.68 1.38 1.49 3.32 2.41 1.66 34.98 -1.60
Car 0.00 0 -1.75 685.77 178.12 767.02 235.21 1.50 1.62 3.89 3.27 1.67 21.18 -1.60






The meaning of the fields is as follows (from https://github.com/NVIDIA/DIGITS/blob/v4.0.0-rc.3/digits/extensions/data/objectDetection/README.md): each line describes one object with 15 space-separated values: the type (class name), truncated, occluded, alpha (observation angle), the 2D bounding box (left, top, right, bottom, in pixels), the 3D dimensions (height, width, length, in meters), the 3D location (x, y, z in camera coordinates), and rotation_y. For 2D detection, only the type and the four bounding-box values are needed.










This section references http://blog.csdn.net/jesse_mx/article/details/65634482.



Now that we roughly understand the data format, we can start converting it. First create a series of folders:



# Create a new Kittidevkit/kitti two-level subdirectory under the scripts/ folder, with the required files in kitti/:

Annotations/
└── 000000.xml
ImageSets/
└── Main/
    ├── trainval.txt
    └── test.txt  # etc.
JPEGImages/
└── 000000.png
kitti_labels/
└── 000000.txt  # self-created folder holding the original annotation info, to be converted to XML; not part of the VOC format
create_train_test_txt.py  # 3 Python tools, detailed in the following sections
modify_annotations_txt.py
txt_to_xml.py






A download link for these code tools is given at the end of the article.



Step one: Category conversion



The PASCAL VOC dataset has 20 categories in total, which is indeed a lot if you are working in a specific scenario. For this dataset I set up 3 categories: 'Car', 'Cyclist', and 'Pedestrian'. The labels also contain other kinds of vehicles and people, and simply skipping them would be a bit of a waste, so 'Van', 'Truck', and 'Tram' are merged into the 'Car' category, and 'Person_sitting' is merged into the 'Pedestrian' category ('Misc' and 'DontCare' are simply ignored). The modify_annotations_txt.py tool is used here.



Step two: Convert the txt annotation information to XML format

After processing the original txt files, the next step is to convert the annotation files from txt to XML, removing the unused parts of the annotation information and keeping only the 3 classes, and converting the coordinate values from float to int. Finally, all the generated XML files are stored in the Annotations folder. The txt_to_xml.py tool is used here, adapted from the KITTI_SSD code; thanks to that author for the contribution.



Step three: Generate the training/validation set and test set lists

A Pascal-VOC-format dataset (the same layout used for SSD training) consists of three parts: first the JPEGImages folder, where all the PNG images are placed; then the Annotations folder, whose corresponding XML files were generated in the steps above; and finally the ImageSets folder, which has a Main subfolder holding the list files for the training/validation set and the test set, as shown in the figure below:










The data conversion is basically done here.



But it can't be used for training directly yet. There is a voc_label.py file under the scripts folder, and our converted data cannot be used with it as-is; some minor changes are needed. I changed it into a kitti_label.py file (there is a download link at the end of the article). There are no key changes, just adaptations so that it correctly generates the txt files with the image paths for this dataset. Running this py file regenerates the label files in txt format, with the annotation information rescaled relative to the image size: it creates a labels folder containing, for each image, a txt file with the ground-truth box coordinates, and it also generates train.txt, val.txt, and test.txt files in the scripts directory that record the absolute paths of the images, making it easy for the program to read them in bulk.
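The rescaling mentioned above is done by the convert() helper at the heart of Darknet's voc_label.py (and of the adapted kitti_label.py): it turns a VOC-style box (xmin, xmax, ymin, ymax in pixels) into the Darknet label format of center x, center y, width, height, all expressed as fractions of the image size.

```python
def convert(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax).
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0   # box center, in pixels
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]           # box size, in pixels
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)
```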



From here on it is similar to the earlier blog post on training the VOC dataset: adjust the network parameters to your actual situation, including the cfg file, the names file, and the data file. http://blog.csdn.net/baolinq/article/details/78724314



Finally, there is a very important detail not to be neglected: the VOC dataset images are in JPG format, while the KITTI dataset images are in PNG format.






We need to look at how the source code reads the image and label information from the txt files. It is quite simple: the train.txt file we generated earlier only records the absolute paths of the training images, not the paths of the label files, so the code derives the label path from the image path. If the files are laid out in the standard way, the location of a label file closely mirrors the location of its PNG file, and it can be found by a simple rule, just replacing some folder names and the suffix.
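One possible Python rendering of that rule (Darknet does the equivalent with find_replace() calls in its C source when loading each image): swap the images folder for "labels" and the image suffix for ".txt".

```python
def image_to_label_path(img_path):
    # e.g. .../JPEGImages/000000.png -> .../labels/000000.txt
    return (img_path
            .replace('JPEGImages', 'labels')
            .replace('.png', '.txt')
            .replace('.jpg', '.txt'))
```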



Training should work normally at this point. This time I set max_batches to 40000, and training took more than 10 hours. Training command: darknet.exe detector train data/kitti.data cfg/kitti.cfg pre_model/darknet19_448.conv.23 -gpus 0,1. The test results can then be examined:


















In the evaluation results, the detection recall is 77.33% and the precision is 89.96% (these numbers seem a bit off). It looks good, but it does not work well when actually detecting images, especially images outside the KITTI dataset. The reasons remain to be studied; guidance and discussion in the comments area are welcome.



The parameters after 'detector' differ by function: there are train, test, recall, demo, and so on; see the source file detector.c.
If you are just testing a single image, use the test command, e.g. darknet.exe detector test data/kitti.data cfg/kitti.cfg weights/kitti_40000.weights -gpus 0,1
If you are testing the precision and recall of the entire test set, use the recall command, e.g. darknet.exe detector recall data/kitti.data cfg/kitti.cfg weights/kitti_40000.weights -gpus 0,1
To test a video, use the demo command, e.g. darknet.exe detector demo data/voc-3.data cfg/yolo-3.cfg weights/yolo-3_final.weights -gpus 1 test_image/123.mp4



Reference documents



[1] https://pjreddie.com/darknet/yolo/ The official website offers many well-trained models, very generous



[2] https://arxiv.org/abs/1612.08242 The original YOLOv2 paper



[3] https://github.com/pjreddie/darknet GitHub source



[4]http://www.cvlibs.net/datasets/kitti/index.php Kitti Data Set official website



[5] And many other good blogs; thanks to their authors for their contributions






Download link for the KITTI-to-VOC dataset tool code: http://download.csdn.net/download/baolinq/10181761










