FCN notes (Fully Convolutional Networks for Semantic Segmentation)
(1) Main operations of FCN
(a) Replace the fully connected layers of the existing classification network with convolution layers,
The FCN replaces the fully connected layers with convolution layers, which lets it generate a heatmap instead of a single classification score. The sizes of the converted layers are (1,1,4096), (1,1,4096), and (1,1,1000). Both the forward and backward computations of the FCN are faster than the previous patch-by-patch approach: the FCN produces a 10*10 grid of outputs in 22ms, whereas the previous method needs 1.2ms to produce a single output, so 100 outputs would take 120ms; hence the FCN is faster. After switching to fully convolutional layers, there is also no longer any requirement on the size of the input image.
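For concreteness, here is a minimal sketch of this conversion (PyTorch assumed; the layer shapes follow the VGG-style head described above, the variable names are my own):

```python
import torch
import torch.nn as nn

# fc6: a 512-channel, 7x7 spatial window -> 4096, re-expressed as a 7x7 convolution
fc6 = nn.Conv2d(512, 4096, kernel_size=7)
fc7 = nn.Conv2d(4096, 4096, kernel_size=1)      # the (1,1,4096) layer
score = nn.Conv2d(4096, 1000, kernel_size=1)    # the (1,1,1000) layer

head = nn.Sequential(fc6, nn.ReLU(inplace=True),
                     fc7, nn.ReLU(inplace=True),
                     score)

# With a larger input, the same weights now produce a grid of predictions
# (a heatmap) in a single forward pass instead of one score per crop.
features = torch.randn(1, 512, 16, 16)   # e.g. a pool5 feature map
print(head(features).shape)              # torch.Size([1, 1000, 10, 10])
```

Because the head is now purely convolutional, feeding it a larger feature map simply yields a larger grid of scores, which is where the 10*10 output mentioned above comes from.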
(b) use upsampling operations, (c) and after upsampling these feature maps, fuse them together,
After multiple convolution and pooling stages, the resulting feature maps become smaller and smaller and their resolution keeps dropping, so in order to recover a full-resolution prediction the FCN uses upsampling (implemented with deconvolution). It not only upsamples the feature map after pool5, but also the feature maps after pool4 and pool3; the results show that semantic information about the image can be recovered from these feature maps, and the more feature maps are fused, the better the result.
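A rough sketch of this multi-scale fusion in the FCN-8s style (PyTorch assumed; channel counts, padding, and names are illustrative, and the sketch assumes the input size is divisible by 32, whereas the original implementation aligns feature maps by padding and cropping):

```python
import torch.nn as nn

n_classes = 21  # e.g. PASCAL VOC: 20 classes + background

# 1x1 "score" layers mapping feature maps to per-class scores
score_fc7   = nn.Conv2d(4096, n_classes, kernel_size=1)  # from the converted head, 1/32 resolution
score_pool4 = nn.Conv2d(512, n_classes, kernel_size=1)   # from pool4, 1/16 resolution
score_pool3 = nn.Conv2d(256, n_classes, kernel_size=1)   # from pool3, 1/8 resolution

# learned upsampling (deconvolution / transposed convolution)
up2_a = nn.ConvTranspose2d(n_classes, n_classes, kernel_size=4, stride=2, padding=1)
up2_b = nn.ConvTranspose2d(n_classes, n_classes, kernel_size=4, stride=2, padding=1)
up8   = nn.ConvTranspose2d(n_classes, n_classes, kernel_size=16, stride=8, padding=4)

def fcn8s_head(fc7_feat, pool4_feat, pool3_feat):
    s = score_fc7(fc7_feat)                  # coarse scores at 1/32 resolution
    s = up2_a(s) + score_pool4(pool4_feat)   # upsample 2x, fuse with pool4 scores
    s = up2_b(s) + score_pool3(pool3_feat)   # upsample 2x, fuse with pool3 scores
    return up8(s)                            # upsample 8x back to input resolution
```

Fusing only the pool5-level scores corresponds to FCN-32s, adding pool4 gives FCN-16s, and adding pool3 gives FCN-8s, which matches the observation that more fused feature maps give finer results.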
(2) Evaluation metrics in semantic segmentation
Specific content: the measurement criteria (accuracies) used in deep-learning semantic segmentation: pixel accuracy, mean accuracy, mean IU, and frequency weighted IU.
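All four metrics can be computed from a per-class confusion matrix; a small numpy sketch (the function name is my own) following the definitions in the paper:

```python
import numpy as np

def segmentation_metrics(n):
    """n[i, j] counts pixels of true class i predicted as class j."""
    n = n.astype(np.float64)
    tp = np.diag(n)            # n_ii: correctly classified pixels per class
    t = n.sum(axis=1)          # t_i: total pixels of each true class
    pred = n.sum(axis=0)       # pixels predicted as each class
    iu = tp / (t + pred - tp)  # per-class intersection over union

    # Note: classes absent from the ground truth (t_i = 0) would need to be
    # masked out to avoid division by zero; omitted here for brevity.
    return {
        "pixel_accuracy": tp.sum() / t.sum(),
        "mean_accuracy": np.mean(tp / t),
        "mean_iu": np.mean(iu),
        "frequency_weighted_iu": (t * iu).sum() / t.sum(),
    }
```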
About patchwise training and fully convolutional training
An answer from StackOverflow:
The term "Fully convolutional Training" just means replacing fully-connected layer with convolutional layers so that the W Hole network contains just convolutional layers (and pooling layers).
The term "patchwise training" was intended to avoid the redundancies of full image training. In semantic segmentation, given, is classifying each pixel in the image, by using the whole image, is adding A lot of redundancy in the input. A standard approach to avoid this during training segmentation networks are to feeds the network with batches of random PATC Hes (small image regions surrounding the objects of interest) from the training set instead of full images. This "patchwise sampling" ensures that the input have enough variance and is a valid representation of the training dataset (The Mini-batch should has the same distribution as the training set). This technique also helps to converge faster and to balance the classes. In this paper, they claim that's it not necessary to use patch-wise training and if you want to balance the classes you c An weight or sample the loss. In a different perspective, the problem with full image training in Per-pixel segmentation are that thE input image has a lot of spatial correlation. To fix this, you can either sample patches from the training set (Patchwise training) or sample the loss from the whole IM Age. That's why the subsection are called "Patchwise training is loss sampling". So by ' restricting the loss to a randomly sampled subset of their spatial terms excludes patches from the gradient Computati On. " They tried this "loss sampling" by randomly ignoring cells from the last layer so the loss are not calculated over the Whol E image.
The final results
Disadvantages (original link)
What we should pay attention to here are the disadvantages of FCN:
- The results are still not fine-grained enough. Upsampling by 8x is much better than by 32x, but the upsampled result is still rather blurry and smooth, and not sensitive to details in the image.
- It classifies each pixel individually without fully considering the relationships between pixels. It ignores the spatial regularization step used in the usual pixel-based segmentation methods, so it lacks spatial consistency.