KITTI Dataset


Source: https://blog.csdn.net/solomon1558/article/details/70173223


Abstract: This article draws on the two papers "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite" and "Vision meets Robotics: The KITTI Dataset" to introduce the KITTI dataset: an overview, the data acquisition platform, a detailed description of the data, the evaluation criteria, and concrete usage practices. It aims to give a fairly detailed and comprehensive introduction to KITTI, with an emphasis on how the dataset is used in research and experiments.

1. KITTI Dataset Overview

The KITTI dataset, created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago, is currently the world's largest computer vision benchmark suite for autonomous driving scenarios. It is used to evaluate the in-vehicle performance of computer vision technologies such as stereo matching (stereo), optical flow, visual odometry, 3D object detection, and 3D tracking. KITTI contains real-world image data captured in urban, rural, and highway scenes, with up to 15 cars and 30 pedestrians per image, as well as various degrees of occlusion and truncation. The dataset comprises 389 pairs of stereo images and optical flow maps, 39.2 km of visual odometry sequences, and images with more than 200k 3D-annotated objects [1], captured and synchronized at 10 Hz. Overall, the raw data is categorized as 'Road', 'City', 'Residential', 'Campus', and 'Person'. For 3D object detection, the labels are subdivided into Car, Van, Truck, Pedestrian, Person (sitting), Cyclist, Tram, and Misc.

2. Data Acquisition Platform

As shown in Figure 1, the data acquisition platform of the KITTI dataset is equipped with 2 grayscale cameras, 2 color cameras, a Velodyne 64-beam 3D LiDAR, 4 optical lenses, and 1 GPS/IMU navigation system. The specific sensor parameters are as follows [2]:

- 2 × Point Grey Flea2 grayscale cameras (FL2-14S3M-C), 1.4 Megapixels, Sony ICX267 CCD, global shutter
- 2 × Point Grey Flea2 color cameras (FL2-14S3C-C), 1.4 Megapixels, Sony ICX267 CCD, global shutter
- 4 × Edmund Optics lenses, 4 mm, horizontal opening angle ~90°, vertical opening angle of region of interest (ROI) ~35°
- 1 × Velodyne HDL-64E rotating 3D laser scanner, 10 Hz, 64 beams, 0.09° angular resolution, 2 cm distance accuracy, collecting ~1.3 million points/second, field of view: 360° horizontal, 26.8° vertical, range: 120 m
- 1 × OXTS RT3003 inertial and GPS navigation system, 6-axis, 100 Hz, L1/L2 RTK, resolution: 0.02 m / 0.1°

Figure-1 Data acquisition platform
Figure 2 shows the sensor configuration. To generate binocular stereo image pairs, cameras of the same type are mounted 54 cm apart. Because the resolution and contrast of the color cameras are not good enough, the platform also carries two grayscale stereo cameras, each mounted 6 cm from its color counterpart. To simplify sensor data calibration, the coordinate system directions are specified as follows [2]:
camera: x = right, y = down, z = forward
velodyne: x = forward, y = left, z = up
gps/imu: x = forward, y = left, z = up
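
These conventions matter when fusing sensor data. As a minimal sketch (not the official devkit code), the following Python snippet projects Velodyne points into the left color image using the Tr_velo_to_cam, R0_rect, and P2 entries of an object detection calibration file; the file path is a placeholder, and points behind the camera should be filtered in real use.

import numpy as np

def read_calib(path):
    # Parse a KITTI object detection calib file into flat numpy arrays,
    # keyed by name (P0..P3, R0_rect, Tr_velo_to_cam, Tr_imu_to_velo).
    calib = {}
    with open(path) as fh:
        for line in fh:
            if ':' not in line:
                continue
            key, values = line.split(':', 1)
            calib[key] = np.array([float(v) for v in values.split()])
    return calib

def velo_to_image(points_velo, calib):
    # Project Nx3 Velodyne points into pixel coordinates of image_2.
    Tr = calib['Tr_velo_to_cam'].reshape(3, 4)   # Velodyne -> camera frame
    R0 = calib['R0_rect'].reshape(3, 3)          # rectifying rotation
    P2 = calib['P2'].reshape(3, 4)               # left color camera projection
    n = points_velo.shape[0]
    points_h = np.hstack([points_velo, np.ones((n, 1))])  # homogeneous Nx4
    cam = R0 @ (Tr @ points_h.T)                 # 3xN, rectified camera frame
    img = P2 @ np.vstack([cam, np.ones((1, n))]) # 3xN homogeneous pixels
    return (img[:2] / img[2]).T                  # Nx2 pixel coordinates

calib = read_calib('training/calib/000000.txt')  # assumed local path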

Figure-2 Sensor setup

3. Dataset Details

Figure 3 shows typical samples from the KITTI dataset, divided into five categories: 'Road', 'City', 'Residential', 'Campus', and 'Person'. The raw data was collected over 5 days in 2011 and totals roughly 180 GB.

Figure-3 Samples illustrating the diversity of the KITTI dataset

3.1 Data Organization

The data organization described in [2] refers to an earlier release and differs from what is now on the official KITTI website; it is summarized here for reference.
As shown in Figure 4, all sensor data of a recording is stored in a date_drive folder, where date and drive are placeholders for the recording date and the sequence number. Timestamps are recorded in timestamps.txt files (a small reading sketch follows Figure 4).

Figure-4 Data organization
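
As a small sketch of reading such a recording in Python (folder names follow the raw *_sync downloads, where each sensor folder carries its own timestamps.txt; treat the exact paths as assumptions to verify against [2]):

from pathlib import Path

# Assumed layout: date/date_drive_sync/<sensor>/timestamps.txt (+ data/)
drive = Path('2011_09_26/2011_09_26_drive_0001_sync')
with open(drive / 'image_02' / 'timestamps.txt') as fh:
    stamps = fh.read().splitlines()
print(len(stamps), 'frames, first captured at', stamps[0])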
The file organization of the per-task downloads from the official KITTI website is simpler. Taking object detection as an example, the following listing shows the directory structure of the left color image archive of the Object Detection Evaluation 2012 benchmark, with images stored separately for the testing and training splits.

data_object_image_2
├── testing
│   └── image_2
└── training
    └── image_2
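
Training images and their labels pair up by file stem (the label folder structure is shown next). A minimal sketch, with paths assumed to point at the unzipped archives:

from pathlib import Path

image_dir = Path('data_object_image_2/training/image_2')
label_dir = Path('training/label_2')

for image_path in sorted(image_dir.glob('*.png')):
    label_path = label_dir / (image_path.stem + '.txt')  # 000123.png -> 000123.txt
    if label_path.exists():
        print(image_path.name, '->', label_path.name)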

The following listing shows the label folder structure of the training dataset.

training/
└── label_2

3.2 Annotations

The KITTI dataset provides 3D bounding box annotations (in the Velodyne LiDAR coordinate system) for moving objects within the cameras' field of view. The annotations cover 8 classes: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person (sitting)', 'Cyclist', 'Tram', and 'Misc' (e.g., trailers, Segways). According to [2], the 3D annotations are stored in date_drive_tracklets.xml, and each object is labeled by its class and its 3D size (height, width, and length). The labels of the current per-task datasets are instead stored in the label folder of each task sub-dataset and differ slightly.
To illustrate KITTI's label format, this article takes the object detection dataset as an example. The data description is in the readme.txt of the object development kit. Download the annotation data from the "training labels of object data set (5 MB)" link and unzip it into the working directory; each image corresponds to one .txt file. A frame and its corresponding .txt label file are shown in Figure 5.
    
Figure-5 Object detection sample and its annotation

To understand the meaning of each field in an annotation file, you need to read the readme.txt stored in the object development kit (1 MB) archive. The readme details the sample sizes of the sub-dataset, the number of label categories, the file organization, the label format, and the evaluation method. The label format is described as follows:

#Values  Name        Description
1        type        Object type: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc' or 'DontCare'
1        truncated   Float from 0 (non-truncated) to 1 (truncated), where truncated refers to the object leaving image boundaries
1        occluded    Integer (0, 1, 2, 3): 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown
1        alpha       Observation angle of the object [-pi..pi]
4        bbox        2D bounding box in the image: left, top, right, bottom pixel coordinates
3        dimensions  3D object dimensions: height, width, length (in meters)
3        location    3D object location x, y, z in camera coordinates (in meters)
1        rotation_y  Rotation ry around the Y-axis in camera coordinates [-pi..pi]
1        score       Only for results: float indicating detection confidence, used to compute precision/recall curves
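
As a minimal sketch following the table above (the path is a placeholder), one label line can be parsed in Python like this:

def parse_kitti_label(line):
    # Split one label line into named fields (training labels have 15 values).
    f = line.split()
    return {
        'type': f[0],
        'truncated': float(f[1]),
        'occluded': int(f[2]),
        'alpha': float(f[3]),
        'bbox': [float(v) for v in f[4:8]],         # left, top, right, bottom
        'dimensions': [float(v) for v in f[8:11]],  # height, width, length (m)
        'location': [float(v) for v in f[11:14]],   # x, y, z camera coords (m)
        'rotation_y': float(f[14]),
    }

with open('training/label_2/000000.txt') as fh:
    objects = [parse_kitti_label(line) for line in fh]

In training code one usually also drops the 'DontCare' entries, whose meaning is explained below.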
     
Note that the 'DontCare' label marks regions that were not annotated, for example because the target object is too far away from the LiDAR. To prevent regions that actually contain objects, but could not be annotated for some reason (mainly measurement precision), from being counted as false positives, the evaluation script automatically ignores predictions falling in 'DontCare' regions.

3.3 Development Kit

Each KITTI sub-dataset provides a development kit, consisting mainly of a cpp folder, a matlab folder, a mapping folder, and a readme.txt. The listing below takes devkit_object, the development kit of the object detection task, as an example. The cpp folder mainly contains evaluate_object.cpp, the source code of the evaluation routine. The files in the mapping folder record the mapping between the training set and the raw recordings (a small usage sketch follows the listing below), so that developers can simultaneously use multi-modal data such as LiDAR point clouds, GPS data, right color camera images, and grayscale images. The tools in the matlab folder read and write labels, draw 2D/3D annotation boxes, and run demos. The readme.txt file is very important: it details the sub-dataset's data format, the benchmark, the result format, the evaluation method, and other details.

devkit_object
├── cpp
│   ├── evaluate_object.cpp
│   └── mail.h
├── mapping
│   ├── train_mapping.txt
│   └── train_rand.txt
├── matlab
│   ├── computeBox3D.m
│   ├── computeOrientation3D.m
│   ├── drawBox2D.m
│   ├── drawBox3D.m
│   ├── projectToImage.m
│   ├── readCalibration.m
│   ├── readLabels.m
│   ├── run_demo.m
│   ├── run_readWriteDemo.m
│   ├── run_statistics.m
│   ├── visualization.m
│   └── writeLabels.m
└── readme.txt
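
As a small usage sketch of the mapping folder in Python (the file formats assumed here, a comma-separated 1-based permutation in train_rand.txt and one "date drive frame" line per entry in train_mapping.txt, should be verified against the devkit readme):

with open('mapping/train_rand.txt') as fh:
    rand = [int(v) for v in fh.read().split(',') if v.strip()]
with open('mapping/train_mapping.txt') as fh:
    mapping = fh.read().splitlines()

train_idx = 0                        # training image 000000
print(mapping[rand[train_idx] - 1])  # raw recording and frame it came from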

4. Evaluation Criteria (Evaluation Metrics)

4.1 Stereo and Visual Odometry Tasks

The KITTI dataset uses different evaluation criteria for different tasks. For stereo and optical flow, evaluation is based on the average number of erroneous pixels, computed from disparity and end-point error.
For visual odometry/SLAM tasks, evaluation is based on the error of the trajectory end-point. The traditional approach considers translation and rotation errors together; KITTI evaluates them separately [1]:

$$E_{\mathrm{rot}}(\mathcal{F}) = \frac{1}{|\mathcal{F}|} \sum_{(i,j)\in\mathcal{F}} \angle\!\left[(\hat{p}_j \ominus \hat{p}_i) \ominus (p_j \ominus p_i)\right]$$

$$E_{\mathrm{trans}}(\mathcal{F}) = \frac{1}{|\mathcal{F}|} \sum_{(i,j)\in\mathcal{F}} \left\|(\hat{p}_j \ominus \hat{p}_i) \ominus (p_j \ominus p_i)\right\|_2$$

where $\mathcal{F}$ is a set of frame pairs, $\hat{p} \in SE(3)$ are the estimated poses, $p \in SE(3)$ the ground truth poses, and $\ominus$ the inverse compositional operator [1].
Figure-6 Stereo and optical flow predictions and evaluation

4.2 3D Object Detection and Orientation Prediction

Object detection must solve both localization and recognition. Localization correctness is decided by comparing the overlap between the predicted box and the ground truth box (intersection over union, IoU) against a threshold (e.g., 0.5); recognition correctness is decided by comparing the confidence score against a threshold. Together these two steps decide whether a detection is correct, turning multi-class object detection into a per-class binary classification problem ("objects of this class detected correctly vs. incorrectly"), so that a confusion matrix can be built and the usual classification metrics can be used to evaluate the model's accuracy.
The KITTI dataset evaluates single-class detection results using the average precision (AP) metric of [3]. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) [3] uses precision-recall curves for qualitative analysis and average precision (AP) to quantify model accuracy. The evaluation penalizes both missed and spurious detections, and a given object counts as correctly detected only once: redundant detections of the same object are counted as false positives.
For each image and a given object class, the detector outputs its predictions, indicating which objects are present, with the position and a confidence score for each.
To evaluate the accuracy of bounding box localization, the overlap between the detected box $B_p$ and the ground truth box $B_{gt}$ is measured:

$$a_o = \frac{\mathrm{area}(B_p \cap B_{gt})}{\mathrm{area}(B_p \cup B_{gt})}$$
A detected box is said to match a ground truth box of that class when $a_o$ exceeds 50%. To prevent duplicate detections, when several boxes overlap the same ground truth by more than 50%, only the one with the highest overlap is taken as the match; if the algorithm detects the same real object repeatedly, only one detection counts as a correct prediction and the remaining repeats count as false detections.
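
In code, the overlap check is a few lines. A minimal Python sketch for axis-aligned boxes in the (left, top, right, bottom) convention used by KITTI's 2D labels:

def iou(box_a, box_b):
    # Boxes as (left, top, right, bottom) in pixels.
    ix1 = max(box_a[0], box_b[0]); iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2]); iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)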
For a given class, let N be the number of real objects of that class over all images. The object detection task uses the P-R curve and the AP value to evaluate model accuracy: sweeping the confidence threshold t yields different recall and precision values, which trace out the P-R curve, and the AP of each detector is computed from it:

$$AP = \int_0^1 p(r)\,dr$$

The evaluation accuracy AP is the area under the P-R curve; when t is discrete, AP is the average of the precisions obtained at the recalls corresponding to the different values of t. To simplify the computation, PASCAL VOC2007 uses an interpolation method: the average of the interpolated precisions at the 11 equally spaced recall values $[0, 0.1, \ldots, 1]$ is taken as the classifier's AP:

$$AP = \frac{1}{11} \sum_{r \in \{0,\, 0.1,\, \ldots,\, 1\}} p_{\mathrm{interp}}(r)$$

The precision at each recall level r is interpolated as the maximum precision measured at any recall $\tilde{r} \ge r$:

$$p_{\mathrm{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$
For the KITTI object detection task, only predictions with a bounding box height greater than 25 pixels are evaluated; easily confused categories are treated as the same class to reduce the false positive rate; and the classifier's AP is approximated with the average of the interpolated precisions at 41 equally spaced recall values.
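
The interpolation described above is straightforward to implement. The following Python function is a minimal sketch (not KITTI's official evaluate_object.cpp): it computes the interpolated AP from a sampled P-R curve, with num_points=11 for PASCAL VOC2007 and num_points=41 for KITTI.

import numpy as np

def interpolated_ap(recall, precision, num_points=11):
    # recall and precision are 1D arrays sampled along the P-R curve.
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, num_points):
        mask = recall >= r
        # Interpolated precision: best precision at any recall >= r.
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / num_points
    return ap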
For object orientation prediction, a novel metric is proposed in [1]: average orientation similarity (AOS). The metric is defined as:

$$\mathrm{AOS} = \frac{1}{11} \sum_{r \in \{0,\, 0.1,\, \ldots,\, 1\}} \max_{\tilde{r} \ge r} s(\tilde{r})$$

where r is the object detection recall. At recall r, the orientation similarity $s \in [0, 1]$ is defined as the normalized cosine distance between all predictions and the ground truth:

$$s(r) = \frac{1}{|D(r)|} \sum_{i \in D(r)} \frac{1 + \cos \Delta_\theta^{(i)}}{2}\, \delta_i$$
where D(r) denotes the set of all detections that are true positives at recall r, and $\Delta_\theta^{(i)}$ is the angle difference between the predicted orientation of detection i and the ground truth. To penalize multiple detections matching the same ground truth object, $\delta_i = 1$ if detection i has been assigned to a ground truth box (IoU of at least 50%) and $\delta_i = 0$ otherwise.
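
As a minimal sketch of this definition (not the official devkit code), the orientation similarity at one recall level can be computed as:

import numpy as np

def orientation_similarity(delta_theta, assigned):
    # delta_theta: angle differences of the detections in D(r), in radians.
    # assigned: matching indicator delta_i (1 if matched to a ground truth).
    delta_theta = np.asarray(delta_theta, dtype=float)
    assigned = np.asarray(assigned, dtype=float)
    return float(np.mean((1.0 + np.cos(delta_theta)) / 2.0 * assigned))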

5. Data Usage Practices

The KITTI dataset carries rich annotation information, and in practice one may need only a subset of the fields, or may need to convert the labels to another dataset's format. For example, the KITTI dataset can be converted to Pascal VOC format, which makes it easier to train mainstream detection algorithms such as Faster R-CNN or SSD. When converting, pay attention to the formats of the source and target datasets and to the re-mapping of category labels. For implementation details, see the open source projects of Jesse_Mx [4] and manutdzou [5] on GitHub, which convert the KITTI dataset to Pascal VOC format for training models such as Faster R-CNN and SSD.
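
As a minimal sketch of such a conversion (the class mapping, paths, and handling of image size are assumptions to adapt; see [4][5] for complete implementations), the following Python function writes one KITTI label file as a Pascal VOC style XML annotation:

from xml.etree.ElementTree import Element, SubElement, ElementTree

def kitti_to_voc(label_path, xml_path, width, height,
                 keep=('Car', 'Pedestrian', 'Cyclist')):
    # Keep only the classes of interest; 'DontCare' and the rest are dropped.
    root = Element('annotation')
    size = SubElement(root, 'size')
    SubElement(size, 'width').text = str(width)
    SubElement(size, 'height').text = str(height)
    with open(label_path) as fh:
        for line in fh:
            fields = line.split()
            if fields[0] not in keep:
                continue
            obj = SubElement(root, 'object')
            SubElement(obj, 'name').text = fields[0].lower()
            box = SubElement(obj, 'bndbox')
            # KITTI bbox order: left, top, right, bottom (pixels).
            for tag, value in zip(('xmin', 'ymin', 'xmax', 'ymax'), fields[4:8]):
                SubElement(box, tag).text = str(int(float(value)))
    ElementTree(root).write(xml_path)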

References

[1] Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. CVPR, 2012.
[2] Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset. IJRR, 2013.
[3] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge (VOC2011) Results.
[4] Jesse_Mx. SSD: Single Shot MultiBox Detector Training on the KITTI Dataset (1). http://blog.csdn.net/jesse_mx/article/details/65634482
[5] manutdzou. manutdzou/kitti_SSD. https://github.com/manutdzou/kitti_SSD
Appendix

Figure-7 Frequency of the different object categories in the dataset (top); orientation histograms for the two main categories, vehicles and pedestrians (bottom)

Figure-8 Statistics of the number of objects of each category per image

Figure-9 Histograms of ego speed and acceleration (excluding standstill), of video sequence lengths, and of the number of frames per scene type (e.g., campus, city)
