Gesture recognition and desktop control system based on Zedboard
Shandong University Information Institute IC
Sorin
Contents

Overview
  Introduction to Gesture Recognition
  Application Prospects of Gesture Recognition: Intelligence and the Home
  Technical Background: Hardware/Software Co-design Is the General Direction
Project Introduction
System Setup
  Zedboard Hardware Preparation
  Linux Software Platform Preparation
    Booting Linux from the SD Card
    Building the Cross-Compilation Environment
    Compiling the OpenCV and Qt Libraries
Data Acquisition and Display
  USB Camera Data Acquisition
  Displaying Video
Image Preprocessing
  Filtering
  Skin Tone Extraction
    Cluster Analysis of Skin Color in Different Color Spaces
    Combining Multiple Extraction Methods
  Morphological Noise Removal
Feature Learning and Classification
  Contours and Features
  The Bayesian Classifier Recognizes Gestures
  Reading and Displaying Text Files
Hardware Acceleration and Co-design
Results
  Skin Tone Extraction
  Contour Processing
  Gesture Recognition
  System Control
Project Evaluation
  Pros and Cons
  Summary and Outlook
References
Overview

Introduction to Gesture Recognition
Vision-based gesture recognition (VBGR) is an important part of machine vision and a branch of natural human-computer interaction; its development is closely tied to human-computer interaction technology. In gesture recognition, a computer recognizes various gestures according to certain rules and translates them into corresponding control commands or semantics, enabling computer control or information exchange.

Application Prospects of Gesture Recognition: Intelligence and the Home
The arrival of the smart-device era has greatly increased demand for gesture recognition systems. Gesture recognition offers a more natural and direct form of human-computer interaction and has very broad application prospects in smart TVs, smart appliances, smart homes, and similar settings. Smart TVs in particular, with functions such as on-screen reading and channel control, place high demands on the number of recognizable gestures and on recognition accuracy.

Technical Background: Hardware/Software Co-design Is the General Direction
Most current gesture recognition systems are implemented purely in software, and recognition speed and hardware cost are often hard to balance, especially in the embedded field. Hardware/software co-design, in which hardware performs the preprocessing and software completes the intelligent algorithms on top of it, achieves both high speed and low cost. Xilinx's Zynq series of chips, together with the PlanAhead and HLS tools, makes hardware/software co-design of large-scale systems practical. This project therefore uses Digilent's Zedboard, a Zynq development platform, to implement a gesture recognition and application system.

Project Introduction
The project is developed on the Zedboard: the Xilinx HLS tools are used for hardware development of the image preprocessing stage, a Linux system is built on the PS side, the intelligent algorithms are developed with OpenCV, and Qt handles the display. Given the current state of recognition algorithms, the system combines several skin color extraction methods to improve the extraction success rate, and uses a Bayesian classifier to learn and recognize static gestures.
The complete system can simulate the interactive functions of a smart TV through gestures: zooming text on the display, turning pages, and issuing simple system control commands.

System Setup

Zedboard Hardware Preparation
To run Linux on the Zedboard, the Linux operating environment must first be configured. Digilent's OOB (Out-Of-Box) design was chosen as the basis for the port.
Figure 1: OOB design hardware composition

Linux Software Platform Preparation

Booting Linux from the SD Card
Refer to the Zedboard CTT (Concepts, Tools, and Techniques) guide to create the boot file, device tree, and ramdisk needed to boot Linux.
Open "Create Zynq Boot Image" in the SDK environment and add, in order, the official FSBL file, the bitstream file generated by PlanAhead, and the U-Boot file. The resulting u-boot.bin is renamed BOOT.BIN and copied to the SD card together with the device tree file devicetree.dtb, the ramdisk image ramdisk8M.image.gz, and the Linux kernel image. Insert the card into the board and set the boot mode jumpers to SD card.
Connect the serial cable and open minicom under Linux to view the boot messages.

Configuring the OpenCV and Qt Runtime Environment
1. Install the arm-xilinx-linux-gnueabi cross-compilation toolchain under Fedora.
2. Cross-compile OpenCV, taking care to compile the necessary dependencies at the same time.
3. Cross-compile the Qt source package (qt-everywhere) and configure qmake.
4. Compile the Qt project: build and run the project in Qt Creator, then in a terminal enter the project folder, run "qmake -o Makefile project.pro" to generate the Makefile, and run "make" to produce a Qt program that runs on the embedded system.

Data Acquisition and Display
Because OpenCV on xilinx-linux does not support capturing images with cvCreateCameraCapture, a dedicated program had to be written to read webcam images through V4L and process them.

USB Camera Data Acquisition
V4L (Video4Linux) is a set of API specifications for developing video capture device drivers in the Linux environment. It provides a unified interface for driver programming and brings all video capture device drivers under its management, which is a great convenience both for driver writers and for writing and porting applications. V4L2 is the upgraded version of V4L; since the OOB design uses a 3.3 kernel, which no longer supports the original V4L, programming here considers only the V4L2 API and parameter definitions.
V4L2 supports both memory mapping (mmap) and direct read() for capturing data. The former is generally used for continuous video capture, the latter for capturing static images; this paper focuses on video capture via memory mapping.
The application collects video data through the V4L2 interface in five steps (a minimal sketch of the corresponding ioctl sequence follows the list):

1. Open the video device file, initialize the video capture parameters, and set the capture window, frame size, and pixel format through the V4L2 interface.
2. Request several video frame buffers and map them from kernel space into user space, so the application can easily read and process the video data.
3. Queue the frame buffers in the video capture input queue and start capturing.
4. The driver captures video data; the application dequeues a frame buffer from the video capture output queue, processes it, puts it back into the input queue, and repeats this loop to collect continuous video.
5. Stop the video capture.
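Here is that ioctl sequence as a minimal sketch; the device path /dev/video0, the 640x480 YUYV format, the 4-buffer queue, and the bufs table are illustrative assumptions, and error handling is omitted.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

static struct { void *start; size_t length; } bufs[4];  /* assumed buffer table */

static void capture_demo(void)
{
    int fd = open("/dev/video0", O_RDWR);               /* step 1: open the device */

    struct v4l2_format fmt = {0};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width       = 640;                      /* assumed frame size      */
    fmt.fmt.pix.height      = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;        /* packed YUV 4:2:2        */
    ioctl(fd, VIDIOC_S_FMT, &fmt);                      /* set size and format     */

    struct v4l2_requestbuffers req = {0};
    req.count = 4; req.type = fmt.type; req.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_REQBUFS, &req);                    /* step 2: request buffers */

    for (unsigned i = 0; i < req.count; ++i) {
        struct v4l2_buffer b = {0};
        b.type = fmt.type; b.memory = V4L2_MEMORY_MMAP; b.index = i;
        ioctl(fd, VIDIOC_QUERYBUF, &b);
        bufs[i].length = b.length;                      /* map into user space     */
        bufs[i].start  = mmap(NULL, b.length, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, b.m.offset);
        ioctl(fd, VIDIOC_QBUF, &b);                     /* step 3: queue buffer    */
    }
    enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    ioctl(fd, VIDIOC_STREAMON, &type);                  /* start streaming         */

    struct v4l2_buffer b = {0};
    b.type = fmt.type; b.memory = V4L2_MEMORY_MMAP;
    ioctl(fd, VIDIOC_DQBUF, &b);                        /* step 4: take a filled frame */
    /* ... process bufs[b.index].start ... */
    ioctl(fd, VIDIOC_QBUF, &b);                         /* requeue and loop        */

    ioctl(fd, VIDIOC_STREAMOFF, &type);                 /* step 5: stop capture    */
}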
To make these steps convenient to call from the application, the V4lcam class is declared as follows:
typedef struct {
    void *start;      /* mmap'ed start address of one frame buffer */
    unsigned length;  /* buffer length in bytes */
} buffer;

class V4lcam
{
public:
    V4lcam();
    struct v4l2_buffer camera_buf;
    void init(int width, int height);   /* set capture size and format */
    buffer *buffers;                    /* mapped frame buffers */
    void startcamstream();              /* queue buffers and start streaming */
    int init_v4l2(void);                /* open device and request buffers */
    struct v4l2_buffer imgbuf;
    buffer *getframedata();             /* dequeue and return one frame */
    void releasecap();                  /* stop streaming, release buffers */
    V4lsize imgsize();                  /* current capture size */
private:
    int picstate;
    unsigned short imagewidth;
    unsigned short imageheight;
    int fd;                             /* device file descriptor */
};

Displaying Video
To display a camera image read through V4L, the YUV data must be converted to an RGB image, which is then displayed by setting the QImage data manually.
Part of the code that converts the YUV (4:2:2) image read through V4L into an RGB image:

V4lsize camsiz = cam.imgsize();
buf = cam.getframedata();
CvSize siz;
siz.width  = camsiz.width;
siz.height = camsiz.height;
IplImage *img, *img2;
img  = cvCreateImage(siz, IPL_DEPTH_8U, 3);
img2 = cvCreateImage(siz, IPL_DEPTH_8U, 3);
input2yuvimage((char *)(buf->start), img->imageData, siz.width, siz.height);
cvCvtColor(img, img2, CV_YCrCb2RGB);
For dynamic display in Qt, QPainter draws each frame and the display is refreshed at a fixed frequency.
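A minimal sketch of such a display widget, assuming img2 holds the converted RGB frame and a widget class named CamWidget (both names are assumptions):

#include <QWidget>
#include <QPainter>
#include <QPaintEvent>
#include <QImage>

// paintEvent wraps the IplImage pixels in a QImage and draws it
void CamWidget::paintEvent(QPaintEvent *)
{
    QPainter painter(this);
    QImage frame((uchar *)img2->imageData, img2->width, img2->height,
                 img2->widthStep, QImage::Format_RGB888);
    painter.drawImage(0, 0, frame);
}

// In the constructor, a QTimer drives repainting at ~25 fps:
//   connect(&timer, SIGNAL(timeout()), this, SLOT(update()));
//   timer.start(40);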
Image Preprocessing

Filtering

Images read from a webcam are often not clear enough and contain a good deal of noise, so to make later processing easier the image is filtered first. Common filters include the median filter and the Gaussian filter; testing showed that a median filter with a 5x5 kernel gives a good balance of quality and running speed and is easy to implement in hardware.
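With the OpenCV C API this is a single call (the destination image name is an assumption):

cvSmooth(img2, img_filtered, CV_MEDIAN, 5);   /* median filter with a 5x5 kernel */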
Skin Tone Extraction

Skin tone extraction is the first step in the gesture recognition system, and in many ways the decisive one: the quality of the extraction can directly determine the success or failure of the whole recognition system. Fortunately, many predecessors have contributed work to build on.

Cluster Analysis of Skin Color in Different Color Spaces

RGB Space
In the RGB color model an image consists of three components, each a primary color image. Most image acquisition devices are built around CCD sensors, which directly sense the R, G, and B components of color; this makes the three-primary model central to imaging, display, and printing equipment, and it is widely used in video monitors and color cameras.

HSV Space
The HSV (hue, saturation, value) color space is one of the color systems people use to pick colors from a palette or color wheel, and it matches human experience and perception of color more closely than the RGB system. Because luminance is separated out in HSV space, the hue component H is more robust to illumination changes. Skin color is known to be highly stable in the H component, so using the H and S components for skin color segmentation works well.
Figure: HSV color space

YCrCb Color Space
The YCbCr space is widely used in image and video compression standards; MPEG and JPEG, for example, both use this format, and it belongs to the YUV family of color spaces. In the YCbCr model, Y represents luminance, and the Cb and Cr components represent the blue and red chrominance respectively; Cb and Cr are mutually independent. Analysis shows that the joint probability density of the Cb and Cr components of skin color is roughly elliptical, so the ellipse model [I]

(Cb - 148)^2 / 30.4^2 * cos(2.44) + (Cr - 106)^2 / 10.6^2 * sin(2.44) < 1

is still worth adopting. In actual testing, however, the resulting image was not very satisfactory; the advantage is that, among the three models discussed, the CrCb ellipse model is the least sensitive to lighting.

Combining Multiple Extraction Methods
During the project it was found that thresholding on the H and S components of HSV space combined with conditions in RGB space achieves better results. Reference [II] and our own tests show that the H component actually depends to some extent on the V component, but as long as strong illumination is avoided the segmentation still works well. After many experiments, the following combined threshold was found to give a satisfactory segmentation:
h < 130 && h > 80 && s < 170 && s > 15 && r > g && r > b &&
((r > 80 && g > 25 && b > 10 && r - b > 15) ||
 (r > 210 && g > 200 && b > 160 && r - g <= 15))
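Because the condition mixes HSV and RGB channels, it cannot be expressed as a single cvInRangeS call; a per-pixel sketch, assuming rgb and hsv are the converted 3-channel images and mask is a 1-channel binary output (all three names are assumptions):

for (int y = 0; y < rgb->height; ++y) {
    uchar *prgb = (uchar *)(rgb->imageData  + y * rgb->widthStep);
    uchar *phsv = (uchar *)(hsv->imageData  + y * hsv->widthStep);
    uchar *pm   = (uchar *)(mask->imageData + y * mask->widthStep);
    for (int x = 0; x < rgb->width; ++x) {
        int r = prgb[3*x], g = prgb[3*x + 1], b = prgb[3*x + 2];  /* RGB order as converted above */
        int h = phsv[3*x], s = phsv[3*x + 1];
        pm[x] = (h < 130 && h > 80 && s < 170 && s > 15 && r > g && r > b &&
                 ((r > 80 && g > 25 && b > 10 && r - b > 15) ||
                  (r > 210 && g > 200 && b > 160 && r - g <= 15))) ? 255 : 0;
    }
}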
Morphological Noise Removal

The binary image produced by fixed-threshold segmentation contains a fair amount of noise, which is simplest to handle with morphological operations. Dilation connects small regions and removes negative noise, while erosion removes noise around the object. After many experiments, the best result was obtained by applying one dilation, then two erosions, then two dilations.
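With the OpenCV C API this sequence is three calls (mask is the assumed binary image; NULL selects the default 3x3 structuring element):

cvDilate(mask, mask, NULL, 1);   /* connect small regions         */
cvErode(mask, mask, NULL, 2);    /* strip noise around the object */
cvDilate(mask, mask, NULL, 2);   /* restore the hand region       */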
Feature Learning and Classification

Contours and Features

Contours are very useful in machine vision; they can be used to remove holes caused by imperfect preprocessing and to eliminate noise.
In OpenCV, contours are represented as sequences, and the contours of a binary image are easily obtained with cvFindContours.
To remove unwanted regions such as environmental interference and noise and find the most suitable contour, the contour scan first removes large background noise by a perimeter-to-area ratio test, then selects the largest remaining region as the gesture region. The code for this part is as follows:
CvContourScanner scanner = cvStartFindContours(skin, memsto, sizeof(CvContour),
                                               CV_RETR_CCOMP, CV_CHAIN_APPROX_NONE);
while ((seq = cvFindNextContour(scanner)) != NULL) {
    area      = fabs(cvContourArea(seq));
    indexarea = area / (siz.width * siz.height);
    lenth     = cvContourPerimeter(seq);
    if (indexarea < 0.009 || lenth * lenth / area > 150.0) {
        cvSubstituteContour(scanner, NULL);   /* delete the current contour */
    } else {
        CvBox2D box = cvMinAreaRect2(seq, memsto);
        box_area = box.size.width * box.size.height;
        seq2 = cvConvexHull2(seq, memstohull, CV_CLOCKWISE, 1);
        if (area > areamax) {                 /* keep the largest region */
            areamax = area;
            classifydata.arearatio = areamax / fabs(cvContourArea(seq2));
            classifydata.recratio  = areamax / (box.size.height * box.size.width);
            classifydata.angle     = box.angle;
            rec_wid   = box.size.width;
            indaramax = indexarea;
            indlenmax = (lenth * lenth) / area;
            seqmax  = seq;
            hullmax = seq2;
        }
    }
}
seq = cvEndFindContours(&scanner);
After the contour is extracted, many useful features can be computed; features such as the normalized Hu moments, the perimeter-to-area ratio, and convexity defects are scale- and rotation-invariant and provide a good basis for classification.
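For example, the Hu moments of the retained contour can be computed directly with the OpenCV C API (seqmax is the gesture contour kept by the scan above):

CvMoments m;
CvHuMoments hu;
cvMoments(seqmax, &m, 0);   /* spatial and central moments of the contour  */
cvGetHuMoments(&m, &hu);    /* hu.hu1 ... hu.hu7, scale/rotation invariant */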
The Bayesian Classifier Recognizes Gestures

Bayesian classification rests on probabilistic inference: how to complete reasoning and decision-making when conditions are uncertain and only the probabilities of their occurrence are known. Probabilistic inference is the counterpart of deterministic reasoning.
The API functions associated with the Bayesian classifier in OpenCV are as follows:

(1) CvNormalBayesClassifier::CvNormalBayesClassifier();
The default constructor.
(2) CvNormalBayesClassifier::CvNormalBayesClassifier(const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat());
This constructor internally calls train() to train the classifier.
(3) bool CvNormalBayesClassifier::train(const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat());
Trains the Bayesian classifier. The input samples must be row vectors, and the responses must be integers, although the matrix type may be set to CV_32FC1. All feature vectors must be complete: no data may be missing from any vector in the training set.
(4) float CvNormalBayesClassifier::predict(const Mat& samples, Mat* results=0);
Returns the class of the test sample whose feature vector the user supplies. Note that if the input is a matrix of feature vectors for many test samples, the return values are placed in the results matrix.
In this project, the feature data gathered for the Bayesian classifier are first used for training on the host computer, and the classifier data are then exported to the embedded device.
Part of the code:

if (is_training && (!train_completed)) {
    classifier.load("Classifier.txt");
    classifier.train(train_mat, res_mat, Mat(), Mat(), 0);
    classifier.save("Classifier.txt");   // export the trained classifier
    train_completed = 1;
}
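On the embedded side, recognition then reduces to one predict() call. A sketch, assuming the feature vector is built from the three classifydata fields shown earlier (the exact layout is an assumption):

// Pack the extracted features into a 1x3 row vector and classify it
Mat sample = (Mat_<float>(1, 3) << classifydata.arearatio,
                                   classifydata.recratio,
                                   classifydata.angle);
float result = classifier.predict(sample);   // class index assigned at training time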
The recognition results are shown in the appendix.

Hardware Acceleration and Co-design
The great advantage of developing on the Zynq platform is that HLS can be used for high-level synthesis, replacing with hardware acceleration what would otherwise be done in software. In image processing, the pixel-stream-based preprocessing stage is the best fit for HLS acceleration, and high-speed data transfer directly to the PS via AXI-Stream plus DMA can greatly speed up system operation.
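As a flavor of such a kernel, here is a minimal AXI-Stream pass-through in HLS C++; the function name and the per-pixel operation are placeholders, not the project's actual kernel:

#include <hls_stream.h>
#include <ap_axi_sdata.h>

typedef ap_axis<32, 1, 1, 1> pixel_t;   // one 32-bit AXI-Stream beat

void preprocess(hls::stream<pixel_t> &in, hls::stream<pixel_t> &out)
{
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    pixel_t p = in.read();
    // ... per-pixel preprocessing (e.g. skin thresholding) would go here ...
    out.write(p);
}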
Figure: the basic process of hardware/software co-design.

Results
The upper left of the screen shows the source image overlaid with the detected contour and convex hull; the lower right shows the detected skin color region; the upper right shows the binary image reconstructed from the contour, from which it can be seen that noise is removed and holes in the interior and at the edges are filled effectively.
Result shows the value returned by the classifier, which matches the index number preset during training; 45, the tilt angle of the gesture, is also correct. The remaining items are the feature values the classifier uses for learning and recognition and cannot be visualized here.

Project Evaluation

Pros and Cons
As a reference model, the project demonstrates the basic idea of intelligent vision processing with hardware/software co-design. By making use of existing technology, the development cycle was greatly shortened and the development quality improved.
However, the project still has many shortcomings; for example, the skin color extraction and the number of recognizable gestures both leave room for improvement.

Summary and Outlook

References

"Hardware and Software Co-design Based on Zynq"
"Qt Programming Proficiency" (4th edition)
"Learning OpenCV"
Xilinx official documentation
Lazy Bunny's blog
Superb Sunny's blog
[I]
[II]
[III]