Abstract: What is HoloLens? HoloLens is a wearable augmented reality computing device released by Microsoft. It is an augmented reality (AR) product: AR technology combines computer-generated images with the real world.
What is HoloLens?
HoloLens, a wearable augmented reality computing device released by Microsoft, has several key elements:
It is an augmented reality product, that is, Augmented Reality (AR): technology that combines computer-generated images with the real world. Similar products include Google Glass, which projects images onto the retina, and the AR apps on phones that superimpose graphics on the camera view. It has a standalone computing unit with CPU + GPU + HPU, so no external computer is required. Its CPU and GPU are based on Intel's 14 nm Cherry Trail chip; HPU is an acronym Microsoft invented, the Holographic Processing Unit. According to answers from anonymous users, the HPU is an ASIC (Application-Specific Integrated Circuit), an integrated circuit customized specifically for HoloLens, to which I can only say "rich and capricious".
What is HoloLens not?
After watching Microsoft's lifelike promotional video, if your response is
The Matrix is coming.
Then you should take a good look at this section, because The Matrix is Virtual Reality (VR), which is characterized by immersing participants in a computer-generated three-dimensional world while downplaying the real world. A recent representative VR product is the Oculus Rift: once you put on the Rift, you cannot see the real world at all. In my opinion the biggest problem with VR is that the virtual world may be realistic and exciting, but what is it for? In other words, VR can only produce a more realistic three-dimensional world; it cannot help people better understand the real world.
HoloLens is also not Google Glass (hereinafter referred to as GG); it can do more than GG:
Three-dimensional perception: it can model the three-dimensional scene around it, whereas GG can only see RGB pixel values.
Three-dimensional rendering.
Human-computer interaction: it can be controlled with gestures.
HoloLens is also not the ordinary AR commonly seen on the market. Common AR applications are based on a webcam:
AR based on ugly black-and-white marker images,
and AR based on arbitrary images.
They are cool, but they can only detect the plane on which the image lies. HoloLens is far more capable: it can detect the three-dimensional scene from every angle!
How does HoloLens's AR obtain depth information about the three-dimensional scene?
Let us return to the definition of AR: to realize augmented reality, one must first understand reality. So what is "reality" for HoloLens? It is the data from its sensors.
What's the sensor? It's a camera.
They are both cameras, so why can HoloLens perceive depth? Microsoft's Kinect is very successful in this respect, so did HoloLens simply embed a Kinect?
The answer is in the following prototype picture:
HoloLens has four cameras, two on each side. By analyzing the real-time images from these four cameras, HoloLens achieves a viewing angle of 120 degrees both horizontally and vertically.
In other words, it uses stereo vision (Stereo Vision) technology to obtain a depth map like the one in the following figure.
Stereo vision is a branch of computer vision that focuses on computing, from the image data of two cameras, the distance from the camera to the objects in the real scene. The schematic diagram is as follows:
The basic steps are as follows; refer to the OpenCV documentation for the specific functions:
Camera calibration (undistortion). Camera lenses leave the factory with some distortion, so to obtain accurate data they need to be calibrated before use. A common method is to photograph a chessboard in several different poses and then compute the camera matrix to correct the distortion. The figure below shows a typical calibration interface.
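For reference, here is a minimal sketch of this calibration step using OpenCV's C++ API. The chessboard size, image filenames, and number of photos are assumptions for illustration, not details of the HoloLens pipeline:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>
#include <string>

int main() {
    const cv::Size boardSize(9, 6);  // inner corners of the chessboard (assumed)
    std::vector<std::vector<cv::Point2f>> imagePoints;
    std::vector<std::vector<cv::Point3f>> objectPoints;

    // Ideal chessboard corner positions on the z = 0 plane (square size taken
    // as 1 unit for simplicity), reused for every view.
    std::vector<cv::Point3f> corners3d;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            corners3d.emplace_back((float)x, (float)y, 0.0f);

    cv::Size imageSize;
    for (int i = 0; i < 10; ++i) {   // e.g. 10 chessboard photos in different poses
        cv::Mat img = cv::imread("board_" + std::to_string(i) + ".png",
                                 cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS +
                                              cv::TermCriteria::COUNT, 30, 0.01));
            imagePoints.push_back(corners);
            objectPoints.push_back(corners3d);
        }
    }

    // Solve for the camera matrix and distortion coefficients; these can then
    // be used with cv::undistort to remove the lens distortion.
    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);
    return 0;
}
```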
Image alignment (rectification). Because the two cameras are in different positions, the scenes they see are offset: the left camera sees more of the left side of the scene, the right camera more of the right side. The purpose of rectification is to extract the part of the scene they have in common. Left-right image matching (correspondence). OpenCV can be used to compute a disparity map, and a depth map is then obtained with a remapping function such as cv::reprojectImageTo3D in OpenCV.
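The sketch below shows the rectification, matching, and reprojection steps in OpenCV, assuming calibration results for the two cameras (camera matrices, distortion coefficients, and the rotation and translation between them) are already available. The variable names and block-matching parameters are illustrative only:

```cpp
#include <opencv2/opencv.hpp>

// Assumed inputs: left/right camera matrices M1, M2, distortion D1, D2,
// rotation R and translation T between the cameras, and a grayscale image pair.
cv::Mat depthFromStereo(const cv::Mat& leftGray, const cv::Mat& rightGray,
                        const cv::Mat& M1, const cv::Mat& D1,
                        const cv::Mat& M2, const cv::Mat& D2,
                        const cv::Mat& R,  const cv::Mat& T) {
    cv::Size size = leftGray.size();

    // 1. Rectification: compute transforms that make the two image planes
    //    coplanar, so corresponding points lie on the same scanline. Q maps
    //    disparity to depth (essentially Z = f * baseline / disparity).
    cv::Mat R1, R2, P1, P2, Q;
    cv::stereoRectify(M1, D1, M2, D2, size, R, T, R1, R2, P1, P2, Q);

    cv::Mat map1x, map1y, map2x, map2y, rectL, rectR;
    cv::initUndistortRectifyMap(M1, D1, R1, P1, size, CV_32FC1, map1x, map1y);
    cv::initUndistortRectifyMap(M2, D2, R2, P2, size, CV_32FC1, map2x, map2y);
    cv::remap(leftGray,  rectL, map1x, map1y, cv::INTER_LINEAR);
    cv::remap(rightGray, rectR, map2x, map2y, cv::INTER_LINEAR);

    // 2. Correspondence: block matching produces a disparity map
    //    (numDisparities and blockSize are tuning parameters).
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 15);
    cv::Mat disparity16, disparity;
    bm->compute(rectL, rectR, disparity16);
    disparity16.convertTo(disparity, CV_32F, 1.0 / 16.0);  // StereoBM output is fixed-point

    // 3. Reprojection: turn the disparity map into per-pixel 3D coordinates;
    //    the Z channel of the result is the depth map.
    cv::Mat points3d;
    cv::reprojectImageTo3D(disparity, points3d, Q);
    return points3d;
}
```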
A depth map alone is not enough; it is only a snapshot of the real scene as seen by the camera at one instant. To obtain a complete three-dimensional scene, a whole series of depth maps must be analyzed.
How does HoloLens reconstruct a three-dimensional scene from multiple depth maps?
The answer is SLAM: Simultaneous Localization and Mapping. This technology is used in the localization and navigation systems of robots, driverless cars, and drones. It solves a very philosophical problem:
Where am I now? Where can I go?
There are many ways to implement SLAM, and there are open-source implementations containing a large number of depth-map processing and matching algorithms, which can be thought of as a three-dimensional version of OpenCV.
Microsoft built the Kinect Fusion algorithm around the Kinect's depth-map data and published two papers:
KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera; KinectFusion: Real-time Dense Surface Mapping and Tracking.
Why do I think HoloLens is related to Kinect Fusion? The answer is on this page: Shahram Izadi is a principal researcher and research manager at Microsoft. His Interactive 3D Technologies group provides the research behind many Microsoft products, including Kinect for Windows, Kinect Fusion, and HoloLens. By the way, their group is hiring:
Kinect Fusion obtains depth maps from different angles by moving the Kinect device around a room, and through real-time iteration accumulates these depth maps into an accurate three-dimensional model of the room and the objects in it.
It is divided into four stages:
1. Depth map format conversion. After conversion the depth unit is meters, stored as floating point, and the vertex coordinates and surface normal vectors are computed from it.
2. Camera pose estimation. The camera pose (position and orientation) in the world coordinate system is computed, and these values are tracked through an iterative alignment algorithm (ICP), so the system always knows how much the current camera pose has changed relative to the initial pose.
3. Depth fusion. The depth data, whose pose is now known, is merged into a single three-dimensional "Lego" space; you could also call it a Minecraft space, because the basic element of that space is not a triangle but a cubic voxel (see the sketch after this list). The frequent appearance of Minecraft scenes in the demo video is also related to this stage.
4. Raycasting-based three-dimensional rendering. Raycasting shoots rays from the current camera position and intersects them with the three-dimensional space. The Lego/voxel space is especially suitable for raycasting, since an octree can be used to accelerate the ray intersections. Raycasting, raytracing, and rasterization are three common rendering methods; they are beyond the scope of this article.
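As an illustration of the third stage, here is a highly simplified sketch of fusing one depth map into a voxel grid using a truncated signed distance function (TSDF), the surface representation described in the KinectFusion papers. The grid size, truncation distance, pinhole parameters, and the fixed camera pose are made-up assumptions for illustration; this is not Microsoft's implementation:

```cpp
#include <vector>
#include <algorithm>

// A minimal TSDF voxel grid: each voxel stores a running average of the
// signed distance to the nearest observed surface, truncated to [-1, 1].
struct TsdfVolume {
    int dim;                  // voxels per side (e.g. 256)
    float voxelSize;          // meters per voxel (e.g. 0.01f)
    float truncation;         // truncation distance in meters (e.g. 0.03f)
    std::vector<float> tsdf;
    std::vector<float> weight;

    TsdfVolume(int d, float vs, float tr)
        : dim(d), voxelSize(vs), truncation(tr),
          tsdf(static_cast<size_t>(d) * d * d, 1.0f),
          weight(static_cast<size_t>(d) * d * d, 0.0f) {}

    // Integrate one depth map taken by a pinhole camera at the origin looking
    // down +Z. A real system would first transform each voxel center by the
    // camera pose estimated in stage 2 (ICP); that is omitted here for brevity.
    void integrate(const std::vector<float>& depth, int width, int height,
                   float fx, float fy, float cx, float cy) {
        for (int z = 0; z < dim; ++z)
        for (int y = 0; y < dim; ++y)
        for (int x = 0; x < dim; ++x) {
            // Voxel center in camera coordinates (volume centered on the optical axis).
            float px = (x - dim / 2) * voxelSize;
            float py = (y - dim / 2) * voxelSize;
            float pz = z * voxelSize;
            if (pz <= 0.1f) continue;                   // skip voxels too close to the camera

            // Project the voxel center into the depth image.
            int u = static_cast<int>(fx * px / pz + cx);
            int v = static_cast<int>(fy * py / pz + cy);
            if (u < 0 || u >= width || v < 0 || v >= height) continue;

            float d = depth[static_cast<size_t>(v) * width + u];  // measured depth along this ray
            if (d <= 0.0f) continue;                               // no measurement

            // Signed distance between measured surface and voxel, truncated.
            float sdf = d - pz;
            if (sdf < -truncation) continue;            // voxel hidden far behind the surface
            float value = std::min(1.0f, sdf / truncation);

            // Weighted running average over all depth maps fused so far.
            size_t idx = (static_cast<size_t>(z) * dim + y) * dim + x;
            tsdf[idx] = (tsdf[idx] * weight[idx] + value) / (weight[idx] + 1.0f);
            weight[idx] += 1.0f;
        }
    }
};
```

Calling integrate once per incoming depth map (with the estimated camera pose applied) gradually carves out the zero-level surface of the TSDF, which is the "Lego" model the text refers to.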
In HoloLens applications we only run up to the third stage, i.e., obtaining the three-dimensional Lego model; the fourth stage is not necessary, because the HoloLens display is transparent, so the model of the room does not need to be rendered again; our own eyes have already "rendered" it:
How are HoloLens's cool demos made?
Three remaining difficulties are left for subsequent articles:
How does gesture recognition work? How does eye tracking work? How is three-dimensional rendering that fits the real scene so closely achieved?