Translator's note: my translation skills are limited, so readers are welcome to consult the original article.
I. Deep learning
Deep learning is currently a hot topic. Thanks to advances in deep learning algorithms and GPU technology, we can now tackle problems in areas such as computer vision, natural language processing, and robotics that were once thought impossible. Deep learning builds on the long-established idea of neural networks; what has changed in recent years is the use of large data sets and powerful GPUs. Neural networks are inherently parallel algorithms, so multicore GPUs can significantly reduce the time it takes to train deep neural networks. Below, I discuss how to develop an object recognition system using MATLAB, deep convolutional neural networks, and GPUs.
II. Why deep learning for computer vision?
Machine learning techniques use data (images, signals, text) to train a machine (a model) to perform tasks such as image classification, object detection, and language translation. Traditional machine learning techniques can still be used to solve challenging image classification problems, but applying them directly to raw images gives poor results, because they ignore the structure and properties of images. The best-performing traditional approaches use feature extraction algorithms to pick out the interesting parts of an image and compose them into feature vectors, which are then passed to a traditional machine learning algorithm for further processing.
Enter deep learning. Deep convolutional neural networks (CNNs) are a special class of deep learning algorithms. They make up for the shortcomings of traditional machine learning techniques and change the way we solve these problems. CNNs can be used not only for classification but also to extract image features directly, which removes the manual feature extraction step.
In practical computer vision applications, the problem is rarely image classification alone. You also need a good object detection technique, some domain expertise, and knowledge of how to use GPUs effectively. In the rest of this article, I use an object recognition example to show that deep learning with MATLAB is straightforward, even if you have no background in computer vision or GPU programming.
III. Object detection and recognition
The goal of this section is to detect a pet in a video and correctly tell whether it is a cat or a dog. To run this example, you need MATLAB with Parallel Computing Toolbox, Computer Vision System Toolbox, and Statistics and Machine Learning Toolbox. If you do not have these toolboxes, you can request a trial version at www.mathworks.com/trial. I used an NVIDIA Tesla K40 GPU, but you can run this example on any MATLAB-compatible, CUDA-enabled NVIDIA GPU. Our approach consists of the following two steps:
a. Object detection: "Where is the pet in the video?"
b. Object recognition: "Now I know where it is, but is it a cat or a dog?"
Figure 1 shows the final result.
Figure 1 Image detection and recognition system
1. Use a pre-trained CNN classifier
The first step is to obtain a classifier that distinguishes between cat and dog images. There are two ways to do this:
a. collect a large number of images of cats and dogs, then crop and resize them, all within a reasonable amount of time; or
b. use a model that has already been trained on a large number of common images and adapt it to my problem.
For this example, I use the second method, which is very common in practice. I start with a CNN classifier that was trained on the ImageNet dataset, and I use MatConvNet to run it. MatConvNet is a CNN toolkit for MATLAB that uses the NVIDIA cuDNN library to accelerate training and prediction (for more information on cuDNN, see the related Parallel Forall post). Download and installation instructions for MatConvNet can be found on its homepage. I had already installed MatConvNet on my computer, so I can download the pre-trained CNN classifier directly with the MATLAB code below and then make predictions. Note: I also use the cnnPredict() helper function, which is available on my GitHub.
%% Download and predict using a pretrained ImageNet model
% Set up MatConvNet
run(fullfile('matconvnet-1.0-beta15','matlab','vl_setupnn.m'));
% Download ImageNet model from the MatConvNet pretrained networks repository
urlwrite('http://www.vlfeat.org/matconvnet/models/imagenet-vgg-f.mat','imagenet-vgg-f.mat');
cnnModel.net = load('imagenet-vgg-f.mat');
% Load and display an example image
imshow('Dog_example.png');
img = imread('Dog_example.png');
% Predict label using the ImageNet-trained VGG-F CNN model
label = cnnPredict(cnnModel,img);
title(label,'FontSize',20) % 20 is an example font size
The pre-trained CNN classifier works better than expected for object classification. The CNN model tells us that the image used in this example (Figure 2) contains a beagle. Although this is a good start, my problem is somewhat different. I want to (1) draw a rectangle around the pet (object detection) and (2) label it accurately as a dog or a cat (classification). Next, let's start from the pre-trained CNN model and build a dog vs. cat classifier.
Figure 2 The pre-trained ImageNet model labels the dog image as "beagle"
2. Training a dog vs. cat classifier
The recognition task itself is relatively simple. The first thing to do is to solve a simple classification problem: given an image, train a classifier that can accurately tell whether it shows a dog or a cat. With a sufficient number of cat and dog images and the pre-trained CNN classifier, this is easy to achieve. To get the images needed for this example, I asked my colleagues to send me pictures of their own pets. I then sorted them into "cat" and "dog" folders under a parent folder named "Pet_images". The benefit of this layout is that the imageSet function can handle the images automatically. I then load the images into MATLAB with the following code.
%% Load images from folder
% Use imageSet to load images stored in the Pet_images folder
imset = imageSet('Pet_images','recursive');
% Preallocate arrays with fixed size for prediction
imageSize = cnnModel.net.normalization.imageSize;
trainingImages = zeros([imageSize sum([imset(:).Count])],'single');
% Load and resize images for prediction
counter = 0; % offset so the second folder does not overwrite the first
for ii = 1:numel(imset)
    for jj = 1:imset(ii).Count
        trainingImages(:,:,:,counter+jj) = imresize(single(read(imset(ii),jj)),imageSize(1:2));
    end
    counter = counter + imset(ii).Count;
end
% Get the image labels
trainingLabels = getImageLabels(imset);
summary(trainingLabels) % Display class label distribution
3. Feature extraction using a CNN
The next step is to use the data above and the pre-trained ImageNet model to extract image features. As I mentioned earlier, CNNs can extract general features from an image. These features can then be used to train new classifiers for different problems, such as the cat vs. dog classification problem in this example. CNNs are computationally intensive, so feature extraction can take a long time. Because CNNs are inherently parallel, we can use a GPU to speed up the computation. The code below extracts features with the pre-trained model and compares the time taken on a multithreaded CPU with the time taken on a GPU.
%% Extract features using pretrained CNN
% Depending on how much memory is on your GPU you could use a larger
% batch size. I choose a batch size to suit the number of images I have.
cnnModel.info.opts.batchSize = 200; % example value; pick one that fits your GPU memory
% Make prediction on a CPU
[~, cnnFeatures, timeCPU] = cnnPredict(cnnModel,trainingImages,'UseGPU',false);
% Make prediction on a GPU
[~, cnnFeatures, timeGPU] = cnnPredict(cnnModel,trainingImages,'UseGPU',true);
% Compare the performance increase
bar([sum(timeCPU),sum(timeGPU)],0.5)
title(sprintf('Approximate speedup: %2.00f x ',sum(timeCPU)/sum(timeGPU)))
set(gca,'XTickLabel',{'CPU','GPU'},'FontSize',18) % 18 is an example font size
ylabel('Time (sec)'), grid on, grid minor
Figure 3 Time comparison of feature extraction using CPU (left) and GPU (right)
Figure 4 Feature extraction from 1128 images using the CPU and the GPU
As Figure 4 shows, the performance gain from the GPU is obvious in this case: feature extraction is about 15 times faster. The cnnPredict function wraps the vl_simplenn prediction function in MatConvNet. If you want to use the GPU to make predictions, just change the code in the red rectangle in Figure 5. The gpuArray function in Parallel Computing Toolbox makes it very convenient to move code written for the CPU onto the GPU.
Figure 5 The gpuArray and gather functions let you transfer data between the MATLAB workspace and the GPU
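As a rough illustration of the gpuArray and gather pattern described above, here is a minimal sketch. It is not the code from Figure 5; the matrix size and variable names are arbitrary choices made for this example, and it falls back to the CPU when no supported GPU is found.

```matlab
% Minimal sketch: move data to the GPU, compute there, and copy the result back.
% The variable names and the 1000-by-1000 size are illustrative assumptions.
A = rand(1000,'single');        % data in the MATLAB workspace (CPU memory)

if gpuDeviceCount > 0
    G = gpuArray(A);            % copy the data to GPU memory
    S = G * G;                  % runs on the GPU because the inputs are gpuArrays
    result = gather(S);         % copy the result back to the MATLAB workspace
else
    result = A * A;             % same computation on the CPU
end

disp(class(result))             % class of the gathered result
```

Many built-in MATLAB functions accept gpuArray inputs in this way, which is why switching between CPU and GPU often only requires changes where data enters and leaves the computation.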
4. Training a classifier using CNN features
With the CNN features extracted in section 3, we can now train a simple classifier. To train a model and compare multiple models, we can use the Classification Learner app in Statistics and Machine Learning Toolbox. Note: to learn more about machine learning and classification in MATLAB, see the Machine Learning Made Easy webinar. Here, I use the fitcsvm function to train an SVM classifier, with cnnFeatures as the input (the predictors) and trainingLabels as the output (the response). To check the accuracy, I also test the classifier with cross-validation. The cross-validated accuracy is an unbiased estimate of how the classifier will perform on real data.
%% Train a classifier using extracted features
% Here I train a linear support vector machine (SVM) classifier
svmmdl = fitcsvm(cnnFeatures,trainingLabels);
% Perform cross-validation and check accuracy
cvmdl = crossval(svmmdl,'KFold',10);
fprintf('kFold CV accuracy: %2.2f\n',1-cvmdl.kfoldLoss)
svmmdl is my classifier, which can tell whether the pet in an image is a cat or a dog.
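To make the cross-validation step more concrete, the sketch below spells out roughly what 10-fold cross-validation does, using synthetic two-dimensional data in place of the CNN features. The data, the labels, and the variable names are assumptions made for illustration only; the per-fold average it prints may differ slightly from what crossval and kfoldLoss report.

```matlab
% Sketch of 10-fold cross-validation written out by hand.
% X and Y are synthetic stand-ins for cnnFeatures and trainingLabels.
rng(0);                                            % make the example repeatable
X = [randn(50,2)+2; randn(50,2)-2];                % two well-separated clusters
Y = [repmat({'cat'},50,1); repmat({'dog'},50,1)];  % one label per row of X

cvp = cvpartition(Y,'KFold',10);                   % stratified split into 10 folds
acc = zeros(cvp.NumTestSets,1);
for k = 1:cvp.NumTestSets
    trainIdx = training(cvp,k);                    % rows used for training in fold k
    testIdx  = test(cvp,k);                        % rows held out in fold k
    mdl  = fitcsvm(X(trainIdx,:),Y(trainIdx));     % train a linear SVM on nine folds
    pred = predict(mdl,X(testIdx,:));              % predict labels for the held-out fold
    acc(k) = mean(strcmp(pred,Y(testIdx)));        % fraction of correct predictions
end
fprintf('Mean 10-fold accuracy: %2.2f\n',mean(acc));
```

Each image is held out exactly once, so every prediction that counts toward the accuracy is made by a model that never saw that image during training.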
5. Object detection
Besides pets, most images and videos contain many other things, such as trees or raccoons. In that case, even a very good classifier (such as the one I trained above) often performs poorly. But if we can locate the object of interest (the dog or cat) in the image, extract that region, and pass only that region to the classifier, the classification results improve significantly. This process of locating the object is called object detection. To detect the object, I use a technique called optical flow. It relies on the fact that the pixels of a moving object change position across consecutive frames of the video. Figure 6 shows one frame of the video with the motion vectors overlaid.
Figure 6 Object detection in a video frame
The next step is to extract the moving pixels, and then use the Image Region Analyzer app to analyze the connected components of the binary image in order to eliminate the noise pixels caused by camera motion.
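The detection idea above can be sketched in a few lines. This is not the findPet helper used later in the article, whose implementation is not shown; the synthetic frames, the motion threshold, and the minimum region area are all assumptions chosen only to make the example self-contained.

```matlab
% Sketch: find moving regions with optical flow and connected-component analysis.
% Two synthetic grayscale frames in which a bright square shifts 6 pixels right.
frame1 = zeros(120,160,'single'); frame1(40:70,30:60) = 1;
frame2 = zeros(120,160,'single'); frame2(40:70,36:66) = 1;

opticFlow = opticalFlowFarneback;           % optical flow estimator (keeps state between calls)
estimateFlow(opticFlow,frame1);             % first call only initializes the previous frame
flow = estimateFlow(opticFlow,frame2);      % flow from frame1 to frame2

mask = flow.Magnitude > 1;                  % keep pixels that moved (threshold is an assumption)
mask = bwareaopen(mask,50);                 % drop small noisy regions (area is an assumption)

stats  = regionprops(mask,'BoundingBox');   % one bounding box per connected region
bboxes = vertcat(stats.BoundingBox);        % M-by-4 matrix of [x y width height], or empty
disp(bboxes)
```

With real video, the threshold and minimum area would need tuning, and additional filtering may be required to suppress the apparent motion caused by the camera itself.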
IV. Object detection and recognition steps
I now have all the building blocks I need for a pet detection and recognition system. To summarize, the basic steps are:
a. Detect the position of the pet in the image;
b. Extract the pet region and use the pre-trained CNN to extract its features;
c. Classify the features using the SVM classifier.
Pet detection and recognition
Putting the steps above together gives the complete pet detection and recognition system shown in the MATLAB code below.
%% Tying the workflow together
vr = VideoReader(fullfile('Petvideos','Videoexample.mov'));
vw = VideoWriter('Test.avi','Motion JPEG AVI');
opticFlow = opticalFlowFarneback;
open(vw);
frameNumber = 0; % initialize the frame counter before the loop
while hasFrame(vr)
    % Count frames
    frameNumber = frameNumber + 1;
    % Step 1. Read frame
    videoFrame = readFrame(vr);
    % Step 2. Detect ROI
    vFrame = imresize(videoFrame,0.25);      % Get video frame
    frameGray = rgb2gray(vFrame);            % Convert to gray for detection
    bboxes = findPet(frameGray,opticFlow);   % Find bounding boxes
    if ~isempty(bboxes)
        img = zeros([imageSize size(bboxes,1)]);
        for ii = 1:size(bboxes,1)
            img(:,:,:,ii) = imresize(imcrop(videoFrame,bboxes(ii,:)),imageSize(1:2));
        end
        % Step 3. Recognize object
        % (a) Extract features using a CNN
        [~, scores] = cnnPredict(cnnModel,img,'UseGPU',true,'Display',false);
        % (b) Predict using the trained SVM classifier
        label = predict(svmmdl,scores);
        % Step 4. Annotate object (40 is an example font size)
        videoFrame = insertObjectAnnotation(videoFrame,'Rectangle',bboxes,cellstr(label),'FontSize',40);
    end
    % Step 5. Write video to file
    writeVideo(vw,videoFrame);
    fprintf('Frames processed: %d of %d\n',frameNumber,ceil(vr.FrameRate*vr.Duration));
end
close(vw);
V. Conclusion
When solving real-world computer vision problems, you often have to weigh the needs of your application: performance, accuracy, and simplicity of the solution. In terms of visual recognition accuracy, advanced techniques such as deep learning are a significant improvement over traditional machine learning. For mainstream applications the computational cost is often substantial, but GPUs can offset it with an order-of-magnitude speedup. MATLAB makes deep learning for computer vision more convenient. The combination of easy-to-use apps, a programming environment, open source computer vision libraries, machine learning algorithms, and support for CUDA-enabled GPUs makes MATLAB an ideal platform for designing and rapidly implementing computer vision solutions. If you are interested in this topic, you can register for our upcoming webinar, Deep Learning for Computer Vision with MATLAB. We will answer questions live at the end of the webinar. You may also be interested in previous MATLAB posts on Parallel Forall.
Deep Learning for Computer Vision with MATLAB and cuDNN