The two previous articles, CNN (1) and CNN (2), mainly explained the basic architecture of a CNN and weight sharing. This article focuses on the convolution itself.
First, before convolution, our data is a 4D tensor (width, height, channels, batch), as mentioned in CNN (1): Architecture. The channel here and the depth mentioned earlier are the same concept: a grayscale image has 1 channel, while an RGB image has 3.
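To make the layout concrete, here is a minimal sketch of the (width, height, channels, batch) tensor built from nested Python lists. The helper names `make_batch` and `shape`, the image sizes, and the batch size of 16 are illustrative assumptions, not something taken from the earlier articles.

```python
def make_batch(width, height, channels, batch):
    """Build a zero-filled 4D tensor in (width, height, channels, batch) order.

    Hypothetical helper, used only to illustrate the layout.
    """
    return [
        [
            [[0.0] * batch for _ in range(channels)]
            for _ in range(height)
        ]
        for _ in range(width)
    ]


def shape(tensor):
    """Return the size of each nesting level of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


# Assumed sizes: 28x28 grayscale images and 32x32 RGB images, 16 per batch.
gray_batch = make_batch(28, 28, 1, 16)  # grayscale: 1 channel
rgb_batch = make_batch(32, 32, 3, 16)   # RGB: 3 channels

print(shape(gray_batch))
print(shape(rgb_batch))
```

The only difference between the two batches is the third dimension, which is the channel count (the "depth").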
In fact, a kernel also has channels, and their number is the same as the number of channels in the input tensor. For example, take an RGB image, which has 3 channels, and suppose we use 5 kernels in the convolution.
So, how many feature maps are produced? For a long time, my answer was 15. In fact, the answer is 5, the same as the number of kernels. Each kernel has 3 channels, which are convolved with the R, G, and B channels respectively. This gives 3 single-channel results, and these three results are then added together element by element to form one feature map. In this convolutional layer, the kernels are the only parameters we need to "learn" (leaving aside any bias terms), that is, 3*3*3*5=135 parameters for 3x3 kernels. By training the kernels, the network learns which features of the image should be filtered out.
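The following is a minimal sketch of this layer in plain Python, written only to check the counts. It is not how any real library implements convolution. The function names `conv2d_single` and `conv_layer`, the 8x8 image size, and the random values are assumptions for illustration; stride 1 and no padding are also assumed. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
import random


def conv2d_single(channel, kernel):
    """Slide one 2D kernel slice over one 2D channel (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(channel) - k + 1
    out_w = len(channel[0]) - k + 1
    return [
        [
            sum(channel[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(k) for dx in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv_layer(image, kernels):
    """image: C channels, each H x W. kernels: N kernels, each C slices of k x k.

    Returns N feature maps, one per kernel.
    """
    feature_maps = []
    for kernel in kernels:
        # A kernel must have as many channels as the input.
        assert len(kernel) == len(image)
        # One single-channel result per input channel.
        per_channel = [conv2d_single(ch, ks) for ch, ks in zip(image, kernel)]
        # Add the per-channel results element by element into one feature map.
        summed = [
            [sum(values) for values in zip(*rows)]
            for rows in zip(*per_channel)
        ]
        feature_maps.append(summed)
    return feature_maps


random.seed(0)
# Assumed input: one 8x8 RGB image stored as 3 channels.
image = [[[random.random() for _ in range(8)] for _ in range(8)]
         for _ in range(3)]
# 5 kernels, each with 3 channels of size 3x3.
kernels = [[[[random.random() for _ in range(3)] for _ in range(3)]
            for _ in range(3)] for _ in range(5)]

feature_maps = conv_layer(image, kernels)
num_params = sum(len(row) for kernel in kernels for ks in kernel for row in ks)

print(len(feature_maps))                               # number of feature maps
print(len(feature_maps[0]), len(feature_maps[0][0]))   # size of each map
print(num_params)                                      # learnable weights, no bias
```

The layer returns 5 feature maps rather than 15, because the three per-channel results of each kernel are summed before being stored. The weight count is the product 3*3*3*5, which evaluates to 135.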
Convolutional Neural Networks (3): Convolution and Channels