I. ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
The ImageNet dataset is one of the most widely used datasets in deep learning for images; research on image classification, localization, and detection is largely based on it. ImageNet contains more than 14 million images covering more than 20,000 categories, of which over a million images carry explicit category labels and annotated object locations. The dataset is thoroughly documented and maintained by a dedicated team, which makes it very convenient to use. It is cited extremely widely in computer vision research papers and has become the de facto "standard" benchmark for evaluating algorithm performance in deep learning for images.
Built on the ImageNet dataset is the world-renowned ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which currently includes the following tasks:
1. Object localization
Given an image, the algorithm must produce 5 category labels with confidence scores, along with the corresponding bounding boxes. Accuracy is evaluated against the prediction that best matches the ground truth: a prediction counts as correct if its class label matches one of the annotated labels and its bounding box sufficiently overlaps the corresponding annotated box. Why this design? An image may contain multiple objects, including several of the same class, so the algorithm may recognize any of them and is not penalized when an object that does exist in the image happens not to be annotated. For anything unclear here, consult the official evaluation rules.
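The "5 labels, any match counts" criterion above can be sketched in a few lines. This is a simplified illustration, not the official ILSVRC scorer (which also checks bounding-box overlap); the function name and sample labels are invented for the example:

```python
def top5_error(predictions, ground_truth_labels):
    """Fraction of images whose 5 predicted labels miss every
    annotated ground-truth label (simplified sketch of the ILSVRC
    classification criterion; the official scorer also checks boxes)."""
    misses = 0
    for preds, truths in zip(predictions, ground_truth_labels):
        # preds: the algorithm's 5 guesses; truths: all annotated labels.
        if not set(preds[:5]) & set(truths):
            misses += 1
    return misses / len(predictions)

preds = [["cat", "dog", "fox", "car", "tree"],
         ["boat", "bus", "train", "bike", "plane"]]
truths = [["dog"], ["horse"]]
print(top5_error(preds, truths))  # first image hits "dog", second misses -> 0.5
```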
2. Object detection
Given an image, the algorithm must produce multiple predictions of the form (c_i, s_i, b_i), where c_i is the category label, s_i is the confidence score, and b_i is the bounding box. Note that the algorithm must detect every instance of the trained object classes that appears in the image, and it is penalized for both missed and duplicate detections.
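Matching a predicted box b_i to a ground-truth box is typically done with intersection-over-union (IoU); a common convention (an assumption here, since thresholds vary by benchmark) is to accept a match when IoU ≥ 0.5. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes
    given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping on a 5x10 strip: IoU = 50 / 150 = 1/3.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A second prediction matched to an already-claimed ground-truth box is what gets counted as a duplicate detection.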
3. Object detection from video
This is similar to the object detection task above, but applied to video sequences.
4. Scene classification
This task uses the Places2 dataset. For a given image, the algorithm may output 5 scene categories, and the best match is used for evaluation (see the official evaluation rules for details). Why allow 5 guesses? Because the same image can contain multiple scene categories, a single image may in fact carry several category labels.
5. Scene parsing
The goal of this task is to segment an image into regions associated with semantic categories, such as sky, road, person, and bed. See the official website for the specific rules.
II. COCO (Common Objects in Context)
The COCO dataset is sponsored by Microsoft. Its annotations include not only category and location information but also semantic text descriptions of each image. The open-sourcing of COCO has driven great progress in image segmentation and semantic understanding over the past two or three years, and it has become almost the "standard" dataset for evaluating image semantic understanding algorithms. Google's open-source Show and Tell caption-generation model is tested on this dataset.
The competitions currently included are:
1. Object detection (COCO Detection Challenge), consisting of two tracks: outputting the bounding box of each object (bounding box output), which is what we usually call object detection, and separating each object from the image (object segmentation output), which is what we call semantic image segmentation.
2. Image captioning (COCO Captioning Challenge)
Specifically, the task is to produce a sentence that accurately describes the image (producing image captions that are informative and accurate). How is this evaluated? Currently by human scoring.
3. Human keypoint detection (COCO Keypoint Challenge)
The task is to find the people in an image and then localize their body keypoints (the keypoint challenge involves simultaneously detecting people and localizing their keypoints).
III. PASCAL VOC
The PASCAL VOC challenge is a benchmark for visual object recognition and detection, providing standard annotated image datasets and a standard evaluation procedure for detection algorithms and learning performance. The PASCAL VOC image collection covers 20 object classes in four groups: person; animals (bird, cat, cow, dog, horse, sheep); vehicles (airplane, bicycle, boat, bus, car, motorbike, train); and indoor objects (bottle, chair, dining table, potted plant, sofa, TV/monitor). The PASCAL VOC Challenge has not been held since 2012, but its images are of good quality and well labeled, making it ideal for testing algorithm performance.
IV. CIFAR
CIFAR-10 contains 10 classes of 32x32 color images: 50,000 training images and 10,000 test images. CIFAR-100 is similar, but contains 100 classes, each with 600 images (500 for training and 100 for testing); the 100 classes are grouped into 20 superclasses. Image categories are clearly labeled. CIFAR is a very good small-to-medium scale dataset for testing image classification algorithms.
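The binary version of CIFAR-10 has a simple layout: each record is one label byte followed by 3072 pixel bytes (1024 red, 1024 green, 1024 blue, each a row-major 32x32 plane). A minimal parser sketch, demonstrated on a synthetic buffer rather than the real files:

```python
def parse_cifar10_records(buf):
    """Parse the binary version of CIFAR-10: each record is 1 label
    byte followed by 3 row-major 32x32 color planes (3072 bytes)."""
    RECORD = 1 + 3 * 32 * 32  # 3073 bytes per record
    assert len(buf) % RECORD == 0, "truncated CIFAR-10 buffer"
    samples = []
    for off in range(0, len(buf), RECORD):
        label = buf[off]
        pixels = buf[off + 1: off + RECORD]  # R plane, G plane, B plane
        samples.append((label, pixels))
    return samples

# Synthetic two-record buffer just to demonstrate the layout.
fake = bytes([3]) + bytes(3072) + bytes([7]) + bytes(3072)
records = parse_cifar10_records(fake)
print([label for label, _ in records])  # -> [3, 7]
```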
V. The MNIST Database of Handwritten Digits
MNIST is the "Hello, world!" of deep learning. It is a handwritten-digit dataset with 60,000 training samples and 10,000 test samples; each sample image is 28x28 pixels. Note that the dataset is stored in a binary format and cannot be viewed directly as image files.
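That binary format is the IDX format: a big-endian header (magic number 2051 for image files, then image count, rows, and columns) followed by raw unsigned-byte pixels. A minimal parser sketch, shown on a synthetic buffer with the same header layout rather than the real files:

```python
import struct

def parse_mnist_images(buf):
    """Parse the IDX image file format used by MNIST: a big-endian
    header (magic 2051, image count, rows, cols) followed by raw
    unsigned-byte pixel data."""
    magic, count, rows, cols = struct.unpack_from(">4i", buf, 0)
    assert magic == 2051, "not an IDX image file"
    images, off = [], 16  # pixel data starts right after the 16-byte header
    for _ in range(count):
        images.append(buf[off: off + rows * cols])
        off += rows * cols
    return images

# Synthetic file holding two blank 28x28 images.
fake = struct.pack(">4i", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
imgs = parse_mnist_images(fake)
print(len(imgs), len(imgs[0]))  # -> 2 784
```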
The first deep convolutional network, LeNet, was designed for this dataset, and almost every mainstream deep learning framework uses MNIST processing as its first tutorial and getting-started example.
VI. KITTI
KITTI was created in 2012 by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago, and is currently the world's largest computer vision benchmark for autonomous driving scenes. It is used to evaluate the in-vehicle performance of computer vision techniques such as 3D object detection (vehicles, cyclists, pedestrians, etc.), 3D object tracking, and road segmentation. KITTI contains real image data captured in urban, rural, and highway scenes, with up to 15 cars and 30 pedestrians per image and varying degrees of occlusion.
VII. Cityscapes
Cityscapes is another dataset for autonomous driving, focusing on pixel-level scene segmentation and instance annotation.
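Pixel-level segmentation is commonly scored with per-class IoU over the label maps. A simplified sketch of that metric (not the official Cityscapes evaluation script; the class ids here are invented for illustration):

```python
def per_class_iou(pred, truth, cls):
    """Pixel-level IoU for one semantic class over flattened label
    maps, the building block of the mean-IoU metric used by
    scene-segmentation benchmarks (simplified sketch)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else 0.0

# Flattened 6-pixel label maps; 0 = road, 1 = car (ids assumed).
pred  = [0, 0, 1, 1, 1, 0]
truth = [0, 0, 1, 1, 0, 0]
print(per_class_iou(pred, truth, 1))  # 2 shared pixels / 3 in union -> 2/3
```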
VIII. Face Recognition Dataset: LFW (Labeled Faces in the Wild)
On this dataset, the deep-learning-based system DeepID2 achieves a recognition accuracy of 99.47%.