Notes on stereo programming from:

Advanced Graphics Programming Techniques Using OpenGL

Tom McReynolds

Copyright ©1998 by Tom McReynolds and David Blythe.
All rights reserved

SIGGRAPH '98 Course
(Also in the SIGGRAPH '99 Course)

4.1 Stereo Viewing

Stereo viewing is a common technique to increase visual realism or enhance user interaction with 3D scenes. Two views of a scene are created, one for the left eye, one for the right. Some sort of viewing hardware is used with the display, so each eye only sees the view created for it. The apparent depth of objects is a function of the difference in their positions from the left and right eye views. When done properly, objects appear to have actual depth, especially with respect to each other. When animating, the left and right back buffers are used, and must be updated each frame.

OpenGL supports stereo viewing, with left and right versions of the front and back buffers. In normal, non-stereo viewing, the default buffer is the left one, for both the front and back buffers. Since OpenGL is window system independent, there are no interfaces in OpenGL for stereo glasses or other stereo viewing devices. This functionality is part of the OpenGL/window system interface library; the style of support varies widely.

In order to render a frame in stereo:

  - The display must be configured for stereo viewing.
  - The left eye view must be drawn into the left back buffer.
  - The right eye view must be drawn into the right back buffer.
  - Both back buffers must be displayed according to the needs of the stereo viewing hardware.

Computing the left and right eye views is fairly straightforward. The distance separating the two eyes, called the interocular distance, must be selected. Choose this value to give the proper size of the viewer's head relative to the scene being viewed. Whether the scene is microscopic or galaxy-wide is irrelevant. What matters is the size of the imaginary viewer relative to the objects in the scene. This distance should be correlated with the degree of perspective distortion present in the scene to produce a realistic effect.

4.1.1 Fusion Distance

The other parameter is the distance from the eyes where the lines of sight for each eye converge. This distance is called the fusion distance. At this distance objects in the scene will appear to be on the front surface of the display ("in the glass"). Objects farther than the fusion distance from the viewer will appear to be "behind the glass", while objects in front will appear to float in front of the display. The latter illusion is harder to maintain, since real objects visible to the viewer beyond the edge of the display tend to destroy the illusion.

Instead of assigning units to it, think of the fusion distance as a dimensionless quantity, relative to the locations of the front and back clipping planes. For example, you may want to set the fusion distance to be halfway between the front and back clipping planes. This way it is independent of the application's coordinate system, which makes it easier to position objects appropriately in the scene.

To model viewer attention realistically, the fusion distance should be adjusted to match the object in the scene that the viewer is looking at. This requires knowing where the viewer is looking. If head and eye tracking equipment is being used in the application, finding the center of interest is straightforward. A more indirect approach is to have the user consciously designate the object being viewed. Clever psychology can sometimes substitute for eye tracking hardware. If the animated scene is designed in such a way as to draw the viewer's attention in a predictable way, or if the scene is very sparse, intelligent guesses can be made as to the viewer's center of interest.

The view direction vector and the vector separating the left and right eye positions are perpendicular to each other. The two viewpoints are located along a line perpendicular to the direction of view and the "up" direction. The fusion distance is measured along the view direction. The position of the viewer can be defined to be at one of the eye points, or halfway between them. In either case, the left and right eye locations are positioned relative to it.

If the viewer is taken to be halfway between the stereo eye positions, and assuming gluLookAt() has been called to put the viewer position at the origin in eye space, then the fusion distance is measured along the negative Z axis (like the near and far clipping planes), and the two viewpoints are on either side of the origin along the X axis, at (-IOD/2, 0, 0) and (IOD/2, 0, 0).

[Figure 8. Stereo Viewing Geometry: the left (L) and right (R) eyes, separated by IOD, converge at the fusion distance; the convergence angle is measured at each eye.]

4.1.2 Computing the Transforms

The transformations needed for stereo viewing are simple rotations and translations. Computationally, the stereo viewing transforms happen last, after the viewing transform has been applied to put the viewer at the origin. Since the matrix order is the reverse of the order of operations, the stereo viewing transforms should be applied to the modelview matrix stack first.

The order of matrix operations should be:

  1. Transform from viewer position to left eye view.
  2. Apply viewing operation to get to viewer position (gluLookAt() or equivalent).
  3. Apply modeling operations.
  4. Change buffers, repeat for right eye.
Assuming that the identity matrix is on the modelview stack:

glLoadIdentity(); /* the default matrix */

/* left eye view */
glTranslatef(-IOD/2.f, 0.f, 0.f);
glRotatef(-angle, 0.f, 1.f, 0.f);
<viewing transforms>
<modeling transforms>

/* draw, switch to the right back buffer, then start over */
glLoadIdentity();

/* right eye view */
glTranslatef(IOD/2.f, 0.f, 0.f);
glRotatef(angle, 0.f, 1.f, 0.f);
<viewing transforms>
<modeling transforms>
Where angle is the inverse tangent of the ratio between half the interocular distance and the fusion distance:

angle = arctan((IOD/2)/(fusion distance))

Each viewpoint is rotated towards the centerline halfway between the two viewpoints.

Another approach to implementing stereo transforms is to change the viewing transform directly. Instead of adding an extra rotation and translation, use a separate call to gluLookAt() for each eye view. Move the fusion distance along the viewing direction from the viewer position, and use that point as the center of interest for both eyes. Translate the eye position to the appropriate eye, then render the stereo view for the corresponding buffer.

The difficulty with this technique is finding the left/right eye axis along which to translate from the viewer position to the left and right eye viewpoints. Since you're now computing the left/right eye axis in object space, it is no longer constrained to be the X axis. Find the left/right eye axis in object space by taking the cross product of the direction of view and the up vector.

4.1.3 Rotate vs. Shear*

Rotating the left and right eye views is not the only way to generate the stereo images. The left and right eye views can be sheared instead. The left and right eyes remain oriented along the direction of view, but each eye's view is sheared along z so that the two frustums converge at the fusion distance.

Although shearing each eye's view instead of rotating is less physically accurate, sheared stereo views can be easier for viewers to achieve stereo fusion. This is because the two eye views have the same orientation and lighting.

For objects that are far from the eye, the differences between the two approaches are small.

*Also see: Notes on Rotation vs. Shear by Joe Krahn