Microsoft COCO Data Set

Source: Internet
Author: User

This post introduces the MS COCO dataset in three parts: COCO Introduction, Data Set Classification, and COCO Showcase.

I mainly downloaded the 2014 version of the data, which contains about 20 GB of images and about 500 MB of annotation files. The annotation files record the precise coordinates of each segmentation polygon together with the bounding box, with a precision of two decimal places. A single object is annotated as follows:

{"segmentation": [[392.87, 275.77, 402.24, 284.2, 382.54, 342.36, 375.99, 356.43, 372.23, 357.37, 372.23, 397.7, 383.48, 419.27, 407.87, 439.91, 427.57, 389.25, 447.26, 346.11, 447.26, 328.29, 468.84, 290.77, 472.59, 266.38], [429.44, 465.23, 453.83, 473.67, 636.73, 474.61, 636.73, 392.07, 571.07, 364.88, 546.69, 363.0]], "area": 28458.996150000003, "iscrowd": 0, "image_id": 503837, "bbox": [372.23, 266.38, 264.5, 208.23], "category_id": 4, "id": 151109}
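To make the relationship between the fields concrete, here is a minimal Python sketch that parses the annotation above and recomputes the bounding box from the segmentation polygons. It assumes that each polygon is a flat list of alternating x and y values and that "bbox" is stored as [x, y, width, height]; the helper name bbox_from_polygons is made up for this illustration and is not part of any COCO tool.

```python
import json

# The annotation shown above, written with lowercase key names.
ANNOTATION_JSON = """
{"segmentation": [[392.87, 275.77, 402.24, 284.2, 382.54, 342.36,
                   375.99, 356.43, 372.23, 357.37, 372.23, 397.7,
                   383.48, 419.27, 407.87, 439.91, 427.57, 389.25,
                   447.26, 346.11, 447.26, 328.29, 468.84, 290.77,
                   472.59, 266.38],
                  [429.44, 465.23, 453.83, 473.67, 636.73, 474.61,
                   636.73, 392.07, 571.07, 364.88, 546.69, 363.0]],
 "area": 28458.996150000003, "iscrowd": 0, "image_id": 503837,
 "bbox": [372.23, 266.38, 264.5, 208.23], "category_id": 4, "id": 151109}
"""


def bbox_from_polygons(polygons):
    """Return [x, y, width, height] of the box enclosing all polygons.

    Assumption: each polygon is a flat list [x1, y1, x2, y2, ...].
    """
    xs = [v for poly in polygons for v in poly[0::2]]  # even positions: x
    ys = [v for poly in polygons for v in poly[1::2]]  # odd positions: y
    x_min, y_min = min(xs), min(ys)
    # Round to two decimals, the precision used in the annotation file.
    return [round(x_min, 2), round(y_min, 2),
            round(max(xs) - x_min, 2), round(max(ys) - y_min, 2)]


annotation = json.loads(ANNOTATION_JSON)
computed_bbox = bbox_from_polygons(annotation["segmentation"])
print("computed:", computed_bbox)
print("stored:  ", annotation["bbox"])
```

For this annotation, the box computed from the two polygons agrees with the stored "bbox" values: the smallest x and y are 372.23 and 266.38, and the extents are 636.73 - 372.23 = 264.5 and 474.61 - 266.38 = 208.23.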

Here's a look at this data set.

COCO Introduction:

COCO is a dataset from a Microsoft team that can be used for image recognition, segmentation, and captioning. Its official website is http://mscoco.org/.

The main features of this dataset are as follows: (1) Object segmentation (2) Recognition in context (3) Multiple objects per image (4) More than 300,000 images (5) More than 2 million instances (6) 80 object categories (7) 5 captions per image (8) Keypoints on 100,000 people

To introduce the dataset, Microsoft published the paper "Microsoft COCO: Common Objects in Context" at ECCV 2014. According to the paper, the dataset targets scene understanding: its images are mainly taken from complex everyday scenes, and the objects in them are localized with precise segmentations. The dataset includes 91 object categories, 328,000 images, and 2,500,000 labeled instances.

The dataset mainly addresses three problems: object detection, contextual relations between objects, and precise 2D localization of objects. Comparison diagram of datasets:


Data Set Classification:

Image classification:

Classification requires a binary label indicating whether an object is present in the image. Early datasets mostly show a single object on a blank background, such as the MNIST handwritten digit database and the COIL household objects. Well-known datasets in the machine learning field are CIFAR-10 and CIFAR-100, which provide 10 and 100 classes, respectively, on 32×32 images. More recently, the best-known classification dataset is ImageNet, with 22,000 classes and 500 to 1,000 images per class.

Object Detection:

In the classic case, an object's location is given by a bounding box. Early work focused on face detection and pedestrian detection; the Caltech Pedestrian dataset, for example, contains 350,000 bounding box labels. PASCAL VOC covers 20 object classes across 11,000 images, with more than 27,000 object bounding boxes. More recently, a detection dataset was derived from ImageNet, with 200 classes, 400,000 images, and 350,000 bounding boxes. Because many objects are strongly related to their surroundings rather than independent of them, it makes sense to detect objects within their natural scenes, and for such objects precise location information matters more than a bounding box.

Semantic Scene Labeling:

This task requires pixel-level labels, since individual instances are hard to define for classes such as streets and grass. The datasets mainly cover indoor and outdoor scenes, and some include depth information. Among them, the SUN dataset contains 908 scene classes and 3,819 object classes, spanning both common detection classes (person, chair, car) and semantic scene classes (wall, sky, floor), with large differences in the number of instances per class. COCO improves on this by ensuring that every class has enough data.


Other Vision Datasets:

Other datasets include the Middlebury datasets, which contain stereo pairs, multi-view stereo, and optical flow, and the Berkeley Segmentation Data Set (BSDS500), which is used to evaluate segmentation and edge detection algorithms.

COCO Showcase:

The dataset annotation process is as follows:


COCO has 91 classes, fewer than ImageNet and SUN, but it has far more images per class, which makes it easier to learn how each class appears in a particular scene. Compared with PASCAL VOC, COCO has both more classes and more images.
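The per-class and per-image counts discussed here can be tallied directly from the annotation records. The following is a minimal sketch using only the Python standard library; the four records are made-up toy data (not real COCO entries), and the two helper function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records with made-up ids. Real annotation files hold far
# more entries, each with "image_id" and "category_id" fields.
annotations = [
    {"id": 1, "image_id": 10, "category_id": 1},
    {"id": 2, "image_id": 10, "category_id": 1},
    {"id": 3, "image_id": 10, "category_id": 4},
    {"id": 4, "image_id": 11, "category_id": 1},
]


def instances_per_category(records):
    """Count how many annotated instances each category has."""
    return Counter(r["category_id"] for r in records)


def instances_per_image(records):
    """Count how many annotated instances each image contains."""
    return Counter(r["image_id"] for r in records)


per_category = instances_per_category(annotations)
per_image = instances_per_image(annotations)

for category_id, count in sorted(per_category.items()):
    print(f"category {category_id}: {count} instances")
for image_id, count in sorted(per_image.items()):
    print(f"image {image_id}: {count} instances")
```

Run over a full annotation list, the same counters would show the number of instances per class and per image that this comparison refers to.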

The COCO dataset was released in two parts, the first in 2014 and the second in 2015. The 2014 release contains 82,783 training, 40,504 validation, and 40,775 testing images, with 270k segmented people and 886k segmented object instances. The 2015 release contains 165,482 train, 81,208 val, and 81,434 test images.

Performance comparisons and some examples:
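As a quick arithmetic check on the split sizes, the sketch below adds up the image counts quoted for each release and prints the share of each split. The dictionary layout and variable names are choices made for this illustration only.

```python
# Image counts per split, as quoted in the text.
splits = {
    "2014": {"train": 82_783, "val": 40_504, "test": 40_775},
    "2015": {"train": 165_482, "val": 81_208, "test": 81_434},
}

# Total number of images in each release.
totals = {year: sum(counts.values()) for year, counts in splits.items()}

for year, counts in splits.items():
    print(f"{year}: {totals[year]:,} images in total")
    for name, count in counts.items():
        share = count / totals[year]
        print(f"  {name}: {count:,} ({share:.1%})")
```

The totals come to 164,062 images for 2014 and 328,124 for 2015; the latter is consistent with the figure of about 328,000 images given in the introduction.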



Source: http://blog.csdn.net/u012905422/article/details/52372755
