According to foreign media reports, Intel officially launched the Perceptual Computing Software Development Kit 2013 (Perceptual Computing SDK 2013) Beta at the recent Intel Developer Forum (IDF). The SDK helps developers build applications that let users interact with computing devices by combining voice and machine vision with keyboard, mouse, and direct touch-screen input.
This technology could fundamentally change the way users interact with mobile phones, tablets and PCs, and it is key to Intel's future because it demands a great deal of computing resources. Today, users interact with computing devices such as mobile phones, tablets and PCs in many ways, including direct touch screens, keyboards, mice, and touchpads. The early investments in voice control and machine vision by Apple, through Siri, and Microsoft, through the Kinect, show how much room there is to improve the user experience.
Speech recognition and machine vision are widely used in the military and, helped by government funding since 9/11, have advanced significantly over the past 10 years, but they have not become common mainstream applications. Speech recognition has been around for a long time and has come preinstalled on many older and newer Windows machines, but it never really succeeded because it was never truly natural or accurate, especially with headphones.
Only on Apple's iOS and Google's Android platforms has speech recognition made real progress, but it is still not accurate enough to replace the keyboard, and it needs an Internet connection to work properly. The magic of the Xbox Kinect's voice interaction lies in its limited lexicon, its "What do you see" approach, its dual microphones, and "beamforming". Microsoft's approach works well in gaming environments, but it does not feel natural across different devices.
Machine vision was recently popularized by Microsoft's Kinect. Kinect uses two cameras but cannot accurately detect individual fingers and joints. Although processing is split between the camera and the Xbox, the user must stay within a certain area of the room, and titles are limited to uncomplicated games that need minimal computing resources.
Some computer manufacturers, and even Google's Nexus 7, offer facial recognition, but these features are slow and easily fooled by pictures, videos or masks. Except on a television, this is unacceptable for most computing environments.
How do you make the interface more natural? First, a natural user interface needs a lot of local computing performance at very low power consumption. Take machine vision used to secure user login. The best approach is to have two high-resolution cameras build a three-dimensional view of the human face. This can be viewed as 3D gaming in reverse: instead of outputting a game's polygons and textures to the display, 3D machine vision inputs polygons and textures into the computing device. The challenge is that this requires a lot of processing performance and a lot of power, not only for the compute engines but also for the high-resolution stereo cameras.
Next, the 3D "map" must be matched against patterns in a local database, which requires even more computing performance and power. This step is called "object recognition": the device has to decide who it is looking at. Secure face-recognition login is just one example; there are many potential uses for this natural user interface:
-In business meetings, the presenter can use gestures to advance slides without needing to "click". They just wave a hand.
-A chef with flour on his hands can turn the page while reading a cookbook.
-Costume designers can use their hands, arms and torso to try on a computer-designed pair of shoulder pads.
-By recognizing your tone of voice, the home computer knows not to disturb you when you are upset. Soft music and dim lighting greet you when you get home.
-When it hears panic in your voice, your car's computer knows you are in trouble and asks whether you want to call 911.
-Your home computer can send you a photo of the person using it when it does not recognize that person.
-In a nursing home, a resident's computer can tell that a convalescing person has not gotten up all day and notify a nurse or family member.
-Dictation achieves nearly 100% accuracy by combining voice, text, and lip reading.
-If you told your child that only 2 guests were allowed and 5 people appear in the room, the TV can recognize this and alert you.
-A "hand mouse" replaces the physical mouse or touchpad: the hand can click and wave to reach anywhere on the screen. The camera tracks your hands, joints and fingertips in real time.
-Meeting minutes record everything said in the meeting and separate the record by speaker. Actions and "pauses" are "perceived" automatically.
This example is ... With such highly personal examples, privacy controls are required, and Intel has added a "privacy notice" to the SDK. This can be as simple as an indicator showing that you are being recorded by a microphone or camera.
It would be a mistake to think that speech and machine vision will soon make direct touch on touchpads and displays, the keyboard, and the mouse disappear. Instead, we are moving toward a "multi-mode" interface, in which the device chooses the best control method based on the environment and the user's history. This is what Intel calls "Use mode alignment": choosing the best interaction mode. In addition, two different modes can be used at the same time when they need to work together. For example, lip reading can be combined with speech and text to fundamentally improve voice interaction.
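The idea of choosing a mode from the environment and user history can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Environment` fields, the scoring weights, and the function names are hypothetical and are not part of Intel's SDK.

```python
from dataclasses import dataclass

# Hypothetical set of interaction modes; not taken from Intel's SDK.
MODES = ("voice", "gesture", "touch", "keyboard")

@dataclass
class Environment:
    noise_level: float  # 0.0 (silent) to 1.0 (very loud), assumed scale
    light_level: float  # 0.0 (dark) to 1.0 (bright), assumed scale
    can_touch: bool     # False if hands are dirty or otherwise unavailable

def score_modes(env, history):
    """Score each mode; a higher score means a better fit right now."""
    total = sum(history.get(m, 0) for m in MODES)
    scores = {}
    for mode in MODES:
        # User history: share of past interactions that used this mode.
        prior = history.get(mode, 0) / total if total else 1.0 / len(MODES)
        if mode == "voice":
            fitness = 1.0 - env.noise_level   # microphones struggle in noise
        elif mode == "gesture":
            fitness = env.light_level         # cameras need enough light
        else:
            fitness = 1.0 if env.can_touch else 0.0
        scores[mode] = fitness * (0.5 + prior)
    return scores

def choose_modes(env, history, count=1):
    """Return the `count` best modes; count=2 pairs two modes together."""
    scores = score_modes(env, history)
    ranked = sorted(MODES, key=lambda m: scores[m], reverse=True)
    return ranked[:count]
```

With a chef in a bright, fairly quiet kitchen whose hands are covered in flour, `choose_modes(Environment(0.2, 0.9, False), {"touch": 8, "voice": 2}, count=2)` ranks voice and gesture ahead of touch and keyboard, even though the user's history favors touch.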
When will we get there? Intel's Perceptual Computing program is a long-term plan, with progress in performance and deliverables expected year by year. Today, cameras are still too large and use too much power. Even the system bus and USB draw too much power and are likely to be replaced by a mobile bus such as MIPI. All of this can be resolved over time. Besides Intel, many other companies will be vying for the lead, as this is a key area in which to lead.
Intel is in a good position because it has size, clout and strength, and it is the only company other than Nvidia whose chips span from smartphones to supercomputers. Leading the industry successfully requires a large amount of high-performance silicon from the start, which plays to Intel's advantage. Timing is critical for Intel: as the industry has seen time and again, for example with video codecs in mobile chips, companies invest heavily in fixed-function or programmable silicon to close the gap.
(Responsible editor: Lu Guang)