Getting Started with Computer Vision

Last Update:2018-07-25 Source: Internet

Author: User


The following is a transcript of a guest talk given on the "Fun Live – Knowledge Live platform" on June 29, 2017.
Guest speaker: Rough

Artificial intelligence, machine learning, deep learning, and computer vision have become the bellwethers of a new era. This article covers the following points:
1. To get started with computer vision, you need some basic knowledge.

2. To study this area, you need to know the reference books and the open courses you can learn from.

3. Perhaps of most interest to everyone: computer vision, as a branch of artificial intelligence, is inevitably combined with deep learning. Deep learning has become part of computer vision, image processing, and natural language processing, so this article also briefly introduces the combination of computer vision and deep learning.

4. Work in the computing field inevitably involves open source, so this article introduces some open source software.

5. Learning or researching computer vision requires reading the literature. How to start reading the literature and gradually find your own direction in the field is also briefly introduced in this article.

1. Basic Knowledge

This section first explains what computer vision means, then covers some basic knowledge of images and video, camera hardware, and CPU and GPU computation.
In computer vision we inevitably have to consider whether to compute on the CPU or on the GPU. Next comes the intersection with other disciplines: computer vision can be combined with many disciplines, and it is at these intersections that it delivers the most significance and value. Finally, for those who have not worked in artificial intelligence before, for example software developers who want to move into computer vision: how do you make the transition, and what programming languages and mathematical foundations do you need to learn? All of this is introduced in the first section.

1.0 What is Computer Vision

Computer vision is the science of how to make machines "see".
More specifically, it refers to using cameras and computers in place of human eyes to identify, track, and measure targets, and then to process the images further so that the result is better suited to human observation or to transmission to an instrument for inspection.
As a scientific discipline, computer vision studies the related theories and technologies and tries to build artificial intelligence systems that can obtain "information" from images or multidimensional data.
Currently popular directions such as VR, AR, and 3D processing are all part of computer vision.
Applications of computer vision: driverless vehicles, unmanned security, face recognition, vehicle license plate recognition, search by image, VR/AR, 3D reconstruction, medical image analysis, UAVs, and others.

Now that we know what computer vision is, here is a list of some of its applications. They are almost everywhere, and the list covers nearly all of today's hottest startup directions. These include the frequently mentioned driverless vehicles, unmanned security, and face recognition. Face recognition is already one of the most mature application areas, followed by text recognition, vehicle license plate recognition, search by image, VR/AR, and 3D reconstruction, as well as a currently promising area: medical image analysis.
Medical image analysis was proposed early and has been studied for a long time, but it is now seeing new development. More and more researchers, whether they come from imaging or from medicine itself, are paying attention to the combination of computer vision, artificial intelligence, and medical image analysis. Medical image analysis has also given rise to many startups, and the future of this direction is worth looking forward to. In addition, unmanned aerial vehicles, unmanned vehicles, and the like all apply computer vision technology.

1.1 Image and video concepts you need to know

Image
A picture comprises: dimensions, height, width, depth, number of channels, color format, data start address, end address, amount of data, and so on.
Image depth: the number of bits used to store each pixel.
The more bits a pixel occupies, the more and richer colors it can represent. Example: for a 400*400 8-bit image, what is the raw amount of data? If the pixel values are integers, what is the value range?
1. Raw data volume: 400 * 400 * (8/8) = 160,000 bytes (approx. 160 KB)
2. Value range: 2 to the power of 8 values, that is, 0~255

Image format and compression: common picture formats such as JPEG and PNG are essentially compression encodings of the picture (BMP, by contrast, usually stores the pixels uncompressed).
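To make the numbers in the image-depth example concrete, here is a minimal sketch that computes the raw size and the value range for a given bit depth. The function names are illustrative only, and the sketch assumes unsigned integer pixels with every channel stored at the same depth. A JPEG or PNG file of the same picture would generally be smaller than this raw figure.

```python
def raw_image_bytes(width, height, bit_depth, channels=1):
    """Uncompressed size in bytes, assuming `bit_depth` bits per channel per pixel."""
    total_bits = width * height * channels * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


def value_range(bit_depth):
    """Inclusive range of unsigned integer pixel values for a given bit depth."""
    return 0, 2 ** bit_depth - 1


size = raw_image_bytes(400, 400, 8)
low, high = value_range(8)
print(size, low, high)  # 160000 0 255
```

With `channels=3` (for example an 8-bit RGB image) the same 400*400 picture would need three times as many bytes.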
Example: JPEG compression
1. The original image is divided into small blocks of 8*8 pixels; each block has 64 pixels.
2. Each 8*8 block in the image is DCT-transformed (the more complex the image, the less easily it compresses).
3. Different images are divided into blocks of differing complexity, so the final compression results differ.
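The JPEG steps above can be illustrated with a small sketch of the DCT on a single 8*8 block. This is only an illustration in plain Python: it uses the orthonormal 2-D DCT-II and leaves out everything else a real JPEG encoder does (level shifting, quantization, zigzag ordering, entropy coding). The helper names and the two test blocks are made up for the example.

```python
import math

N = 8  # JPEG block size


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of N rows of N numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def significant(coeffs, threshold=1e-6):
    """Count coefficients whose magnitude is above a small threshold."""
    return sum(1 for row in coeffs for c in row if abs(c) > threshold)


# A flat (simple) block and a checkerboard (complex) block.
flat = [[128] * N for _ in range(N)]
checker = [[255 if (x + y) % 2 == 0 else 0 for y in range(N)] for x in range(N)]

flat_count = significant(dct_2d(flat))
checker_count = significant(dct_2d(checker))
print(flat_count, checker_count)
```

The flat block collapses to a single non-zero coefficient, while the checkerboard spreads its energy over many coefficients. That is the sense in which a more complex block is harder to compress.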
Video
An original (uncompressed) video is a sequence of pictures.
Each picture in the ordered sequence is called a frame. Compressed video uses various algorithms to reduce the amount of data; the IPB scheme is the most common.
I-frame: a keyframe, which can be understood as a complete copy of the picture; decoding it requires only this frame's data (because it contains the full picture).
P-frame: the difference between this frame and a previous keyframe (or P-frame). To decode it, the differences defined by this frame are overlaid on the previously cached picture to produce the final picture. (That is, it is a difference frame: a P-frame carries no complete picture data, only the differences from the previous picture.)
B-frame: a bidirectional difference frame, which records the differences between this frame and both the preceding and following frames (this is more complex; there are 4 cases). In other words, decoding a B-frame requires not only the previously cached picture but also the decoded following picture; the final picture is obtained by overlaying the preceding and following pictures with this frame's data. B-frames compress well, but decoding them is more work.
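The way I-frames and P-frames depend on each other can be sketched with a toy decoder. The stream format below (tuples of a frame type and a flat list of pixel values or differences) is a hypothetical simplification invented for this example, not the layout of any real codec, and B-frames are left out because they also need a later frame.

```python
def decode(stream):
    """Decode a toy stream of ('I', picture) and ('P', diff) frames."""
    decoded = []
    previous = None  # the cached, most recently decoded picture
    for kind, data in stream:
        if kind == "I":
            # A keyframe carries the full picture; nothing else is needed.
            picture = list(data)
        elif kind == "P":
            if previous is None:
                raise ValueError("a P-frame needs a previously decoded picture")
            # Overlay this frame's differences on the cached picture.
            picture = [p + d for p, d in zip(previous, data)]
        else:
            raise ValueError(f"unsupported frame type: {kind}")
        decoded.append(picture)
        previous = picture
    return decoded


stream = [
    ("I", [10, 10, 10, 10]),  # full picture
    ("P", [0, 5, 0, 0]),      # only one pixel changed
    ("P", [0, 0, -3, 0]),     # only one pixel changed again
]
frames = decode(stream)
print(frames[-1])  # [10, 15, 7, 10]
```

Because most entries in a P-frame's difference data are zero when little changes between pictures, they can be stored far more compactly than a full picture.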
Bitrate (code rate)
The higher the bitrate, the larger the file; the lower the bitrate, the smaller the file.
The bitrate is the number of data bits transmitted per unit of time, typically measured in Kbps, that is, thousands of bits per second. It can be thought of as a sampling density (not to be confused with the sampling rate itself, which is measured in Hz, the number of samples per second): the more bits used per unit of time, the higher the precision and the closer the processed file is to the original, but the file size is proportional to the bitrate. Almost every coding format is therefore concerned with achieving the least distortion at the lowest bitrate. CBR (constant bitrate) and VBR (variable bitrate) both derive from this core idea. The higher the bitrate, the clearer the picture; the lower the bitrate, the coarser and more mosaic-like the picture.

Frame rate
The frame rate affects the smoothness of the picture and is proportional to it: the higher the frame rate, the smoother the picture; the lower the frame rate, the more the picture appears to stutter. If the bitrate is variable, the frame rate also affects the file size: the higher the frame rate, the more pictures pass per second, the higher the bitrate required, and the larger the file.
The frame rate is the number of frames transmitted per second; it can also be understood as the number of times the graphics processor refreshes per second.

Resolution
Resolution is proportional to image size: the higher the resolution, the larger the image; the lower the resolution, the smaller the image.

Clarity
At a fixed bitrate, resolution is inversely related to clarity: the higher the resolution, the less clear the image; the lower the resolution, the clearer the image.
At a fixed resolution, bitrate is directly related to clarity: the higher the bitrate, the clearer the image; the lower the bitrate, the less clear the image.
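The relationships described above between bitrate, file size, frame rate, resolution, and clarity can be sketched with some simple arithmetic. The functions below are an illustration only: they assume a constant bitrate, take 1 Kbps to mean 1000 bits per second, and use the average number of bits available per pixel as a rough stand-in for clarity. The bitrate, frame rate, and resolutions in the usage lines are arbitrary example values.

```python
def file_size_bytes(bitrate_kbps, duration_s):
    """Approximate stream size at a constant bitrate (1 Kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 * duration_s / 8


def bits_per_frame(bitrate_kbps, fps):
    """Average bit budget available to each frame."""
    return bitrate_kbps * 1000 / fps


def bits_per_pixel(bitrate_kbps, fps, width, height):
    """Average bit budget per pixel: a rough proxy for clarity."""
    return bits_per_frame(bitrate_kbps, fps) / (width * height)


one_minute = file_size_bytes(512, 60)          # bytes for 60 s at 512 Kbps
small = bits_per_pixel(512, 25, 352, 288)      # lower resolution
large = bits_per_pixel(512, 25, 704, 576)      # 4x the pixels, same bitrate
slow = bits_per_frame(512, 5)                  # fewer frames per second
fast = bits_per_frame(512, 25)
print(one_minute, small > large, slow > fast)
```

At the same bitrate, the higher resolution leaves fewer bits for each pixel, and a lower frame rate leaves more bits for each frame, which matches the statements in the text.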
Bandwidth and frame rate
For example, suppose images are transmitted over an ADSL line whose upstream bandwidth is only 512 Kbps, but 4 channels of CIF-resolution video must be sent. The usually recommended bitrate for CIF resolution is 512 Kbps, so by this calculation only one channel can be transmitted, and lowering the bitrate would inevitably degrade image quality. To preserve image quality, the frame rate must be reduced instead, so that even at a lower bitrate the quality of each image is not affected, although the continuity of the video is.

1.2 Camera

Camera classification:
Surveillance cameras (network cameras and analog cameras)
Cameras for specific industries (ultra-wide dynamic range cameras, infrared cameras, thermal imaging cameras, etc.)
Smart cameras
Industrial cameras

Current camera hardware can be divided into surveillance cameras, cameras for specialized industry applications, smart cameras, and industrial cameras. Among surveillance cameras, two types are most common today: network cameras and analog cameras. They differ mainly in their imaging principle.
Network cameras generally produce clearer images than traditional analog cameras. Analog cameras are gradually being phased out and can be regarded as the previous generation of surveillance cameras, while network cameras are now the mainstream. Around 2013, perhaps 70% to 80% of the market was analog cameras; now perhaps 60% to 70% are network cameras.
In addition, different industries have their own specialized cameras, such as ultra-wide dynamic range cameras, infrared cameras, and thermal imaging cameras. These are used in specific fields for special purposes, and the pictures they capture are completely different from ordinary images. If we want to do image processing and computer vision analysis, we should ask which kind of camera works to our advantage and learn to exploit the strengths of the hardware.
In research we can generally control which camera we use; in real application scenarios that control is less likely. Even so, you should know that some problems can be solved well simply by changing the hardware. This is one line of thinking.
There may be problems you have worked on with algorithms for a long time without solving, perhaps with poor efficiency and high cost, yet after a small change of hardware you find that the original problem no longer exists. This is the new situation that hardware can open up for you.
There are now also smart cameras and industrial cameras. Industrial cameras are generally more expensive, because they are dedicated to various industrial fields or to precision instruments: they are high-precision, high-definition cameras.

1.3 CPU and GPU

Next, the CPU and the GPU. If you want to do computer vision and image processing, you cannot skip GPU computing; you may need to study it or teach yourself some of this knowledge.
As you can see, most current papers on computer vision are implemented on the GPU. In applications, however, GPUs are relatively expensive, so CPU-based scenarios still account for the majority.
So where do the CPU and the GPU differ? They can be compared on two main aspects: the first is performance, the second is throughput.
Performance can be expressed by another word: latency. Low latency means better performance: the more efficiently you process and analyze, the lower your latency. The other aspect, throughput, is the amount of data you can handle at the same time.
The CPU is high-performance, with ultra-low latency: it can carry out complex operations quickly and meet demanding performance requirements. The GPU, by contrast, is built from many arithmetic units, so its advantage is not low latency. It is not good at complex operations, and each of its processors is small and relatively weak, but it can put all of those weak processors to work at the same time, which lets it process a large amount of data at once. In other words, it has very high throughput. So the CPU emphasizes performance, and the GPU emphasizes throughput.
For this reason the GPU is usually associated with another term, parallel computing, meaning that it can run a great many threads at the same time. Why are images especially suited to GPU computing? Because the GPU was originally designed as a graphics processing unit: each pixel can be assigned to its own thread, and each pixel needs only a few simple operations. This is the principle on which graphics processors first appeared.
In graphics rendering, for example, the transformation of each pixel is calculated. The amount of computation per pixel is very small, perhaps a single formula, so it can be placed on a simple compute unit. This is the difference between the CPU and the GPU.
Based on this difference, we decide when to use the CPU and when to use the GPU. If the algorithm you are designing is not very parallel, that is, it is one complex computation from top to bottom with little that can be done in parallel, then even a GPU will not help you improve computing performance.
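The contrast between work that splits per pixel and work that runs "from top to bottom" can be sketched in plain Python. This is only an illustration of data dependence: the thread pool stands in for the idea of many independent workers and is not a GPU, so it should not be expected to run faster. The function names and the two operations are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, gain=2):
    """Per-pixel operation: the result depends only on this pixel's own value."""
    return min(255, pixel * gain)


def brighten_parallel(pixels, workers=4):
    """Each pixel is an independent task, so the work can be split freely."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))


def running_filter(pixels, alpha=0.5):
    """Recursive filter: as written, each output needs the previous output first."""
    out = []
    state = 0.0
    for p in pixels:
        state = alpha * p + (1 - alpha) * state
        out.append(state)
    return out


row = [0, 100, 200]
parallel_result = brighten_parallel(row)
sequential_result = [brighten(p) for p in row]
print(parallel_result == sequential_result)  # True
print(running_filter([2, 2]))  # [1.0, 1.5]
```

The first operation gives the same answer no matter how the pixels are divided among workers, which is the pattern GPUs are built for. The second is written as a chain of dependent steps, so simply handing each pixel to its own thread would not work.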

So rather than using the GPU simply because everyone else does, we need to understand why and when it is best to use the GPU and what the GPU is suited for.

1.4 The relationship between computer vision and other disciplines

Computer vision is now closely tied to other disciplines, including robotics as well as the medical imaging mentioned earlier, physics, imaging, and satellite image processing, all of which often use computer vision. Here, the most frequently asked question concerns three concepts: computer vision, machine vision, and image processing. What is the difference between these three?
The distinction varies from person to person; every researcher understands it differently.
First, image processing is mostly about processing graphics and images at the pixel level; 3D processing, too, is mostly understood as image processing. Machine vision involves processing at the hardware level as well: it combines hardware and software to provide graphics computing capability together with intelligent understanding of images, and this is generally what we call machine vision.
What we call computer vision today leans more toward processing at the software level. It is not just recognizing images; it also includes understanding images and even transforming them. The image generation work we currently see can also be classed under computer vision.
So computer vision is itself a very fundamental discipline that can intersect with all other disciplines, while internally it is subdivided further into areas including machine vision and image processing.

1.5 Programming languages and mathematical basics

The content of this section can be found in "Non-computer science, how to learn Computer vision".

2. Reference books and open courses

Reference books
The first is "Computer Vision: Models, Learning, and Inference" by Simon J.D. Prince. It is well suited to beginners because it comes with a lot of supporting material, including Matlab and C code, reference materials, and literature, all very detailed.

The second is "Computer Vision: Algorithms and Applications" by Richard Szeliski. It is a classic and authoritative reference. This book is not for reading cover to cover but for looking things up, like a reference manual; it has the broadest coverage of any book, so it is generally used for consultation.

The third is "Introduction to OpenCV3 programming" by Mao Xingyun and Leng Xuefei. If you want to get started quickly and implement some projects, this book is a good choice: it walks you through worked examples and teaches you OpenCV, the most classic and most widely used open source computer vision library.

Open courses:
Stanford CS223B
Better suited to building a foundation, for students who are just starting out. It has less overlap with deep learning; it is not a deep learning course but mainly a computer vision course that covers all aspects of the field.

Stanford CS231n
This one hardly needs an introduction; many people know it as a course combining computer vision and deep learning, and it can be watched on YouTube. The course is taught by Professor Fei-Fei Li. If you are not familiar with her, look her up: in computer vision she is a leading figure in both industry and academia.

3. Deep learning knowledge that needs to be understood

There is not much to say about deep learning here, not because there is little content (there is a great deal), but because only one book is recommended. It came out at the end of last year and is the newest book on deep learning. It is very comprehensive, starting from the basic mathematics and covering probability, statistics, machine learning, calculus, and linear algebra.

4. Open source software that needs to be understood and learned

OpenCV
A classic computer vision library that implements many common computer vision algorithms. It can help you get started quickly.
Caffe
If you work on computer vision, Caffe is the recommended choice. Caffe is particularly good at convolutional neural networks, which are the most widely used networks in computer vision.
So whatever other open source software you learn later, Caffe is hard to avoid: once you understand Caffe, can use it, and can even modify its source code, you will find that your understanding of deep learning takes a qualitative leap.
TensorFlow
TensorFlow has been very popular recently, but its barrier to entry is not low: learning to use it takes more time than any of the other software. It is also not yet particularly mature or stable, so versions iterate frequently, compatibility between them is poor, and there is still much room to improve running efficiency.

5. How to read the relevant literature

First get familiar with how the direction has developed, then read the milestone papers along that path closely.
For example, for object detection with deep learning, you must know models such as R-CNN, Fast R-CNN, Faster R-CNN, SPPNet, SSD, and YOLO. Likewise, for object tracking with deep learning: DLT, SO-DLT, and so on.

Top conferences in computer vision:
ICCV: International Conference on Computer Vision
CVPR: IEEE Conference on Computer Vision and Pattern Recognition
ECCV: European Conference on Computer Vision
Besides the top conferences, there are also top journals, such as PAMI and IJCV, which represent the most cutting-edge and current research directions in the field.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
