Dr Simon Taylor, Founder & Research Director, Zappar
The “Next Generation” Augmented/Mixed Reality headset race is heating up. There is a significant amount of hype building around Microsoft HoloLens, and Magic Leap have been generating attention ever since last year when they appeared from nowhere and announced an investment of more than $500 million from a group including Google and Qualcomm.
This initial Magic Leap attention led me to write a blog almost exactly one year ago discussing the display properties that I thought would be key for widely adopted Augmented Reality eyewear. The latest Magic Leap demo video “shot directly through Magic Leap technology” finally gives us an opportunity to see how their hardware might stack up against the properties I discussed in that article.
1. Field of View and Resolution
The first item on my wishlist was a combination of large field of view and high resolution. Here's a frame of interest from the Magic Leap demo video:
We can't say definitively what the field of view of a Magic Leap device would be from the video, but it is clear the virtual content extends right to the edge of the camera image, so we know it will be at least as wide as the field of view of the camera used to shoot the demo. From the way the background scene changes as the camera moves about, it appears to be a relatively standard handheld camera lens at the wide-angle side of the zoom setting; something akin to the roughly 60 degrees horizontal field-of-view of an iPhone 6. That's not too bad, and seems to me larger than the field-of-view reported by people who have tried out the HoloLens prototypes.
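As a rough sanity check on that 60 degree figure, here is a minimal sketch of the usual pinhole-camera relationship between sensor width, focal length and horizontal field of view. The sensor width and focal length below are approximate, assumed values for a phone camera of that class, not measurements taken from the demo video.

```python
import math

def horizontal_fov_degrees(sensor_width_mm, focal_length_mm):
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))

# Assumed, approximate figures for a small phone camera sensor and lens.
ASSUMED_SENSOR_WIDTH_MM = 4.8
ASSUMED_FOCAL_LENGTH_MM = 4.15

fov = horizontal_fov_degrees(ASSUMED_SENSOR_WIDTH_MM, ASSUMED_FOCAL_LENGTH_MM)
print(f"Estimated horizontal field of view: {fov:.1f} degrees")
```

Under those assumptions the estimate lands close to 60 degrees, which is consistent with the comparison above.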
In terms of resolution this is again something we are not able to measure directly from the video. The high level of compression on the YouTube video means I haven’t been able to find any frames to do an accurate resolution estimate, so really all we can say from the image above is that it appears reasonably detailed.
From a resolution and field-of-view perspective, I’d say the demo video presents a pretty compelling experience.
2. Focus Behaviour
The second point I mentioned in my original blog was offering correct focus behavior within the virtual scene. This is definitely a key feature of the Magic Leap proposition; it’s something I’ve heard them mention in public, but until the video release we weren’t really able to see what they could deliver.
A slight aside on focus...
Before diving into the video though, let’s cover a bit of background about what happens in our eyes when we view a real 3D scene, and why we can only focus on objects at a single depth at any one time.
The diagrams above show the process of forming an image of a 3D object on the retina at the back of our eyes. The image on the left shows the light rays from two distinct points on the 3D object that pass straight through the centre of the lens. These rays will not be refracted (bent) at all. The figure with just these “central rays” allows us to explain why the image is formed upside down with respect to the real world, but does not provide any insight into how focus works.
To understand focus we need to consider not just a single ray through the centre of the lens, but rather all of the rays that enter the eye through the pupil. Light rays will leave a 3D point on an object's surface in all directions, as the majority of objects in typical real-world scenes are visible due to diffuse reflection of light, where incoming light is reflected in all directions from the surface. A “cone” of those rays will enter the eye through the pupil, and all of those rays will contribute to the final image we observe.
The image on the right shows these cones of rays from the same two points used in the previous diagram. All of the light in each cone came from the same 3D point, so for us to observe this as a single sharp point on the retina, the lens in the eye must bend the outer rays of the cone back towards the central ray. The amount of bending required depends on the angle of the cone, which is directly related to how far away the original point is in the world. Thus we need to alter the shape of the lens to focus at different depths and can only see one narrow depth range of the world as being “in focus” at the same time. In the scenario shown in the image, the brown point from the trunk is in focus as the cone meets back at a point on the surface of the retina, whereas the green point will be slightly out-of-focus and be imaged on the retina as a circle. The size of the circle (how “out of focus” the point appears) depends on how far away its depth is from the current depth we are focusing on in the scene.
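The size of that out-of-focus circle can be estimated with a thin-lens model of the eye. This is a minimal sketch, not a physiological model: the 4 mm pupil and 17 mm lens-to-retina distance are assumed round numbers, and the function name is my own. With those assumptions the blur diameter works out as pupil diameter times lens-to-retina distance times the difference between the two depths expressed in dioptres (1 / metres).

```python
def blur_circle_mm(focus_depth_m, point_depth_m,
                   pupil_mm=4.0, lens_to_retina_mm=17.0):
    """Diameter on the retina of the blurred image of a point.

    Thin-lens approximation: the eye is focused at focus_depth_m and the
    point sits at point_depth_m. Pupil size and lens-to-retina distance
    are assumed illustrative values.
    """
    defocus_dioptres = abs(1.0 / point_depth_m - 1.0 / focus_depth_m)
    # Convert the lens-to-retina distance to metres so that it cancels
    # the 1/metres of the dioptre term, leaving a result in millimetres.
    return pupil_mm * (lens_to_retina_mm / 1000.0) * defocus_dioptres

for depth in (0.25, 0.5, 1.0, 2.0, 10.0):
    print(f"focused at 1 m, point at {depth} m: "
          f"{blur_circle_mm(1.0, depth):.3f} mm blur circle")
```

The blur depends on the difference in inverse distance, so a given gap in metres matters much more close to the eye than far away, which is why close-up shots show the most pronounced out-of-focus effects.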
So let’s say we want to now simulate this real 3D scene with correct focus cues using a near-eye display, positioned somewhere like the blue line on the right-hand diagram. It’s clear that it’s not simply a case of putting a normal “screen” there and using one pixel for the brown point, and one pixel for the green point, each emitting light from those points in all directions. Instead the brown and the green “cones” are overlapping and both contain rays across a good portion of the surface, but importantly the ray through each point on the surface has a particular direction in each cone. This set of light rays described by both a point on a surface and a direction through that surface is called a “Light Field”, and devices that can generate such a set of light rays are known as “Light Field Displays”.
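To make the "position plus direction" idea concrete, here is a small 2D sketch that traces rays from a scene point to several positions across the pupil and records where each ray crosses the display plane and at what angle. The geometry (display 20 mm from the pupil, 2 mm pupil radius, points at 0.5 m and 5 m) is entirely assumed for illustration and does not describe any real device.

```python
import math

def rays_through_display(point_x_mm, point_z_mm, display_z_mm=20.0,
                         pupil_radius_mm=2.0, samples=5):
    """List (display_x_mm, angle_deg) for rays from a point to the pupil.

    2D cross-section: the pupil lies on z = 0, the display plane on
    z = display_z_mm, and the scene point at (point_x_mm, point_z_mm).
    samples must be at least 2.
    """
    rays = []
    # Fraction of the pupil-to-point path at which the display is crossed.
    crossing = display_z_mm / point_z_mm
    for i in range(samples):
        pupil_x = -pupil_radius_mm + 2.0 * pupil_radius_mm * i / (samples - 1)
        display_x = pupil_x + (point_x_mm - pupil_x) * crossing
        # Angle of the ray to the display normal as it heads towards the eye.
        angle = math.degrees(math.atan2(pupil_x - point_x_mm, point_z_mm))
        rays.append((display_x, angle))
    return rays

near = rays_through_display(0.0, 500.0)   # hypothetical point 0.5 m away
far = rays_through_display(0.0, 5000.0)   # hypothetical point 5 m away
for (nx, na), (fx, fa) in zip(near, far):
    print(f"near: x={nx:+.3f} mm, {na:+.3f} deg | "
          f"far: x={fx:+.3f} mm, {fa:+.3f} deg")
```

Both points light up almost the same patch of the display surface, but at each position the rays for the near point fan out more steeply than those for the far point. A conventional screen pixel cannot reproduce that difference because it emits the same light in every direction.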
Back to Magic Leap...
Magic Leap have a patent that describes one way that this could be done. The image above is taken from their patent application and shows a block of material containing a whole set of surfaces that partially reflect input light rays. The shape of the surfaces is such that input light in a particular direction (straight up as shown in the diagram) will produce a cone of light that is identical to one that would be produced by a point source much further away from the display surface in the real world. Changing the angle of the input light rays (566) alters the paths of the rays emitted so the virtual point source appears to be from a different position at the same virtual depth. Thus by altering the angle of the input light it’s possible to scan through a set of “pixels” on a virtual screen at some depth in the world.
The depth of these virtual pixels is dependent on the shape of the reflectors in the material. Magic Leap’s patent describes a couple of ways of taking the core idea above and building a display that can have multiple virtual focus planes for the content. One way is to stack up a set of these layers and have them each configured with differently shaped internal reflectors to provide virtual points at different depths. Then when scanning the beam to generate the light field, it can also be directed into one of the multiple layers of reflectors, to select the depth at which it appears. Another suggestion is to construct the reflectors in a single layer out of a material that allows the shape of the reflectors to be altered dynamically, perhaps using some sort of liquid crystal technology.
I’ve known all that stuff since reading through the patent around a year ago, but what details can we work out from the videos? The first point is Magic Leap are clearly very proud of their Light Field Display technology as many shots involve the camera changing focus and altering the sharpness of both the virtual content and the background. It’s also clear that the video does show the content being displayed at different focal distances. The initial solar system shot has the content around 1m from the camera. A later close up shot shows the content out-of-focus until the camera’s focus is adjusted to be much closer to the camera than that.
Another interesting issue is whether content can simultaneously be displayed at different virtual focal planes. I haven’t seen any frames in the demo video where this is obviously the case. This should be most obvious in the close-up sequence (as in photography, “macro” focus gives the most pronounced out-of-focus effects in the background). However in the image below as the content comes into focus it appears to all have the same degree of sharpness regardless of the distance of the virtual content. There is however a significant difference in the distance to the camera of the earth and the asteroids in the background, so I think I would expect to see different levels of focus between these objects if the true Light Field for the virtual scene was generated.
Vertical crops of 3 different frames from the demo video as the content comes into focus
Although there are some cuts in the demo video, it looks to me as if there is enough motion within some of the shots to conclude that the virtual focal plane of the content is being adjusted dynamically. If I had to guess I would suggest the setup we are seeing here has a single layer of the type shown in the patent image composed of reflectors whose shape can be adjusted dynamically. However the rate of change of the shape components is probably not sufficient to use a dynamic depth for each pixel, so the entire image appears to be in focus at a single depth, but that depth can be slowly and smoothly adjusted as the user moves relative to the content.
I should point out here that in a 3D headset there will be separate views generated for the left and right eyes. The relative disparity between the position of an object in those two views is the dominant cue for our perception of depth, and is the only cue used by current VR headsets and 3D films. There is no doubt a Magic Leap headset would also present stereo views, which would allow the apparent depth to vary continuously within the virtual scene. However correctly generating the focus cues too (even just approximately as a single movable focal plane) should add an additional level of realism, and also allow Mixed Reality experiences where it is important both the real world and the virtual content at a particular depth appear in focus at the same time.
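The two cues can be put side by side with a little geometry. The sketch below computes the angle between the two eyes' lines of sight for an object at a given depth (the stereo cue) and the focus error, in dioptres, when that object is drawn on a single focal plane at some other depth. The 63 mm eye separation is an assumed typical value and both function names are my own.

```python
import math

def vergence_angle_deg(depth_m, eye_separation_m=0.063):
    """Angle between the two eyes' lines of sight to a point straight ahead."""
    return math.degrees(2.0 * math.atan(eye_separation_m / (2.0 * depth_m)))

def focus_mismatch_dioptres(object_depth_m, focal_plane_m):
    """Focus error when an object is shown on a focal plane at another depth."""
    return abs(1.0 / object_depth_m - 1.0 / focal_plane_m)

focal_plane_m = 1.0  # assumed single focal plane for all content
for depth in (0.3, 0.5, 1.0, 2.0, 5.0):
    print(f"object at {depth} m: vergence {vergence_angle_deg(depth):.2f} deg, "
          f"focus mismatch {focus_mismatch_dioptres(depth, focal_plane_m):.2f} D")
```

Stereo disparity alone can place content at any of these depths, but the focus error only drops to zero when the focal plane sits at the same depth as the content, which is what a movable focal plane allows.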
Overall I’m pretty intrigued to try this out myself, although that would involve signing an NDA and deprive me of the fun job of trying to work things out from their public disclosures! It seems like the controllable focus aspect of the display is something Magic Leap have nailed, and even if it’s only a single plane for all the content it could still power some interesting applications.
3. Controllable Transparency
The neat Light Field display described above demonstrates how light rays from virtual objects can be simulated correctly so they appear at a controllable virtual depth in the real scene. Unfortunately that’s only half of the story. To correctly simulate the light field for a scene with an added solid virtual object you don’t just need to add the light field that would result from the virtual object’s surface; you also need to block the light rays coming from the real world behind where our virtual object is rendered – in real life the object would block those light rays from reaching our eyes.
The problem of blocking light from a surface that does exist is similar to the problem of simulating light from a surface that doesn’t – it is not enough to simply block all rays passing through a particular point on the near-eye display surface; rather incoming rays must be blocked based on the combination of their position and their direction. I’m yet to see any compelling solutions to this problem demonstrated.
I had hoped that the talk of “Cinematic Reality” around the original Magic Leap announcements along with the mocked up views of solid objects inhabiting outdoor scenes that they showed on their website indicated that Magic Leap had got this one cracked too. Unfortunately from the demo video it is apparent that if they have got a solution to this issue, it is not yet one they are demonstrating. The content and environment have clearly been carefully selected for this demo so it is not easy to notice the lack of opacity of the virtual objects. It is however clear that this is purely additive blending – I certainly didn’t notice any frames where a virtual object appears darker than the color of the background, so there appears to be no selective darkening of the image from the real world.
How much of a problem this purely additive blending is really depends on the brightness and dynamic range (amount of contrast) in the background scene, and the level of detail of the image. The backgrounds are very dark in the demo video (there’s a reason that robot is hiding under the table!) and in the solar system scene I find myself begging someone to turn a light on. Thanks to that carefully selected environment and content the virtual objects in these scenes do look pretty solid. The “darkest” part on the virtual objects that I could see was the black spot on Mars. By looking at this spot as it moves from being in front of the chair to in front of the wall it’s possible to see it getting noticeably lighter. In fact with a certain amount of mental effort I can perceive that spot as a transparent window through to the world beyond.
The dark spot on Mars is darker when in front of the chair
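The difference between additive blending and the blending a truly opaque object would need can be written down in a few lines. This is a minimal sketch with made-up colour values (RGB in the range 0 to 1) standing in for the dark spot, the chair and the wall; none of them are sampled from the video.

```python
def additive_blend(background, content):
    """See-through display: content light is added to the real-world light."""
    return tuple(min(1.0, b + c) for b, c in zip(background, content))

def opaque_blend(background, content, alpha=1.0):
    """Standard alpha blending: content can hide (and so darken) the background."""
    return tuple(alpha * c + (1.0 - alpha) * b for b, c in zip(background, content))

# Hypothetical colours chosen only for illustration.
dark_spot = (0.05, 0.03, 0.02)
dark_chair = (0.05, 0.05, 0.05)
bright_wall = (0.60, 0.60, 0.60)

print("additive over chair:", additive_blend(dark_chair, dark_spot))
print("additive over wall: ", additive_blend(bright_wall, dark_spot))
print("opaque over wall:   ", opaque_blend(bright_wall, dark_spot))
```

With additive blending every channel of the result is at least as bright as the background, so a dark feature inherits the brightness of whatever is behind it. Only the opaque blend keeps the spot dark in front of the wall.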
How much does this matter?
To demonstrate how Mars might look on a brighter background scene using purely additive blending, I’ve mocked up the image below.
Edited 28 Oct: A commenter on the HackerNews post of this article suggests additive blending is not a good way to approximate human perception and questions the validity of the mock-up images. I don't know very much about the human visual system, so will happily concede these mock-ups might not accurately represent the view through the display. However from trying other see-through additive displays these mock-ups appear reasonable to me, but it's possible that something fundamentally changes in human perception when a Light Field Display is in use.
The left-top image shows a perfectly sharp and normally bright background. In this case the only “additive Mars” that looks at all believable is the view in front of the black background of the desk. The white wall background is already quite bright, so additively blending the image over this area is doomed to failure. The other positions also lack contrast in the overlaid imagery – it is hard to distinguish features on the image over the keyboard due to the high-contrast texture in the background.
Things look a little better if the background is out-of-focus, as in the middle image. This would be the case if the virtual planet was displayed relatively close to the viewer, and the background real-world objects are further away. In this case it is possible to distinguish more of the features of the planet texture shown above the keyboard; the background scene is less distracting. However displaying anything reasonable on the brighter background parts of the scene is still difficult.
For the image on the right I’ve additionally reduced the brightness and contrast of the background. I approximately matched the brightness of my wall to the “white wall” in the background of the Magic Leap demo. Now our additive planets look a lot more solid, much closer to how they appear in the demo video. With a bit of animation and camera motion you might well be convinced they were in fact solid in that environment. The boundary between the desk and the wall is not that visible behind the rightmost Mars image as the texture on the planet and the blurred and lower contrast background helps to hide it well. However I still perceive the image that overlaps the top of my laptop screen as noticeably transparent – the blur and reduced contrast can’t hide the fact the screen is still blue. Look again at the Magic Leap demo video and note how muted all of the real-world colors are in background scenes behind the content.
The lower row of images demonstrates how having a detailed texture on the object helps to hide the transparency – in all cases the plain circles are perceived as transparent unless the world behind them is a single color.
What this all means in practice is that any purely additive see-through displays will not be able to produce convincingly solid objects with arbitrary textures on arbitrary backgrounds. There are perceptual tricks that can be played if the content is controlled, but it seems the environment also needs to be significantly darkened to achieve reasonable results in general.
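One way to put a number on why darkening helps is to look at the contrast between a bright and a dark feature of the virtual content once the background light has been added to both. The sketch below uses Michelson contrast and linear brightness values in arbitrary units; the content and background levels are assumptions picked for illustration, and clipping and the details of human perception are ignored.

```python
def michelson_contrast(bright, dark):
    """(bright - dark) / (bright + dark); assumes bright + dark > 0."""
    return (bright - dark) / (bright + dark)

def overlay_contrast(content_bright, content_dark, background):
    """Contrast of two content features after adding the same background light."""
    return michelson_contrast(background + content_bright,
                              background + content_dark)

# Hypothetical content levels and a range of background brightness levels.
content_bright, content_dark = 0.8, 0.1
for background in (0.0, 0.1, 0.3, 0.6, 0.9):
    contrast = overlay_contrast(content_bright, content_dark, background)
    print(f"background {background:.1f}: content contrast {contrast:.2f}")
```

The content contrast falls steadily as the background gets brighter, which matches the mock-ups: the same planet texture that looks solid over a dark desk washes out over a bright wall.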
Conclusions and Comparisons to HoloLens
Out of my three “required features”, Magic Leap is able to tick off two – relatively large field of view, and a controllable focus plane for the virtual content. That was the same score I gave HoloLens based on much more guesswork and speculation. Having read more first-person reports of HoloLens in practice, I would now knock off my tentative point for Field of View (it seems universally reported as small on HoloLens prototypes), and I’ve still been unable to verify that the virtual focus plane is controllable in the HoloLens hardware.
Therefore I’d rate the Magic Leap technology as a much more solid 2 out of 3 than the HoloLens tech. My conclusions from the HoloLens blog are still appropriate though – is 2 out of 3 enough?
I’d previously hoped that Magic Leap had somehow solved the problem of controllable opacity, so it is a shame they haven’t done so. For me the utility of such a headset is severely restricted without this capability. There are use cases where it is possible to make purely additive blending work – one example would be for 3D artists to set up a dark black space to one side of their monitor, and use that space to display a real-time preview of the model they are working on when they look in that direction. However most of the use cases being shown in mock-ups and Microsoft’s “live demos” are in general uncontrolled environments. I think purely additive blending will be a severe restriction in these cases. The background scene can be globally darkened to give sufficient contrast to the content, but that feels like a step away from the promise of truly “Mixed Reality” experiences. As soon as you’re forced to de-emphasize the real world to make the virtual content visible, the user has to choose between focusing attention on the real or the virtual, which diminishes the feeling that the virtual objects are truly part of the real world scene.
Magic Leap’s display technology does look like an impressive feat, with decent field of view, and controllable focus. I’m excited to try it out, and I applaud them for sharing a video shot directly through their display without employing compositing tricks.
I have significant doubts about Microsoft’s “Live Demos” on the other hand. I believe the tracking of the camera position is real, and looks to be very high quality. However I suspect the videos that they show are not an accurate reflection of how the scene would appear to a HoloLens user, but are instead a composited view of a standard camera shot of the real world and the rendered content. Many reports have mentioned the field of view with the prototype is much lower than it is made to appear in their videos. The bigger problem I have with those videos though is the use of standard blending to combine the real and virtual content (all of the HoloLens demos show the virtual content able to darken areas of the world). I have yet to see any real-world reports that indicate HoloLens is capable of that; instead I’m pretty confident it contains an additive display with a single panel that can darken the view of the entire world.
An image from a HoloLens "Live Demo". This is not additive blending - note the hole in the wall appears much darker than the real world in the background
I read Magic Leap’s caption of “No special effects or compositing were used in the creation of these videos” as a direct challenge to Microsoft – stick a camera inside your HoloLens display and show us what it would actually look like. I’d certainly be keen to see that.
I’ll reserve judgment on how important the controllable transparency aspect is for general AR/MR eyewear until I’ve had a chance to try them out for myself. I’m still really excited about the possibilities for dedicated AR/MR hardware and look forward to trying out some prototypes as the various competitors gradually lift the veil on their devices.
We'll continue to keep a close eye on the developing device ecosystem and ensure that Zappar’s platform and tools are perfectly suited to the AR content creators of tomorrow, regardless of the device that is used to deliver the experience.
ps: I've posted this to HackerNews, so I'll follow this thread and try to answer any comments or questions there.