You will need an MPEG movie player to view the clips on this page. You should be able to find what you need here.

A Video-Based System for Tracking Eye and Head Movement

Video demonstrations from a presentation given by Jeff Mulligan at the 1998 Annual Meeting of the Optical Society of America.


Overview of head-mounted video camera system (4.4M)

This segment begins with a shot of me pointing out the various components of the headmount. The scene camera is the small horizontal cylinder mounted slightly below my eye. There is a small mirror attached to a stalk next to the camera which redirects the line of sight forward. In the original ISCAN design, this mirror directed the line of sight upward, where another reflection from the hot mirror redirected the beam forward. Their design had the advantage that the camera's line of sight could be made more nearly coincident with the subject's line of sight, and because there were two mirror reflections, the final image was not inverted. After we reduced the field of view with a longer focal length lens, however, we started seeing a double image caused by reflections from both the front and back surfaces of the hot mirror. We therefore adopted the present configuration.

The eye camera and illuminator (an IR LED) are mounted on the upper part of the unit. Both point down and are directed towards the subject's eye by the "hot" (infrared) mirror, which is oriented at about 45 degrees.

The camera then pans to our video console where two monitors display the images from the two head-mounted cameras. We then return to a view of the subject working at a console. A final view of the scene camera image on the monitor indicates that lines of text can be resolved in the scene camera image.


Visualization of pupil tracking (620K)

Four seconds of eye movement are analysed. Three features are located: the pupil margin, the corneal reflex (CR), and the fourth Purkinje image (P4). This segment shows the fit to the pupil margin as a pair of rings: a light ring drawn inside the pupil and a dark ring drawn outside. When the pupil is accurately located, the iris margin should appear centered between these two rings. Tracking is pretty good, although some instability can be seen after the blink and when the pupil margin goes off the side of the image. Note that the pupil is fit with an ellipse, and that the eccentricity changes as the pupil moves around.
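
To make the eccentricity remark concrete, here is a minimal sketch relating the fitted ellipse axes to eccentricity. The function names are hypothetical and are not taken from the tracking software. The tilt estimate assumes the pupil is a flat circular disk seen obliquely and ignores refraction at the cornea, so it is only a rough geometric illustration.

```python
import math


def eccentricity(semi_major, semi_minor):
    """Eccentricity of an ellipse from its semi-axes (0 for a circle)."""
    if semi_minor > semi_major:
        # Accept the axes in either order.
        semi_major, semi_minor = semi_minor, semi_major
    return math.sqrt(1.0 - (semi_minor / semi_major) ** 2)


def apparent_tilt_degrees(semi_major, semi_minor):
    """Viewing angle, in degrees, at which a circular disk would project
    to this ellipse. ASSUMPTION: flat circular pupil, no corneal
    refraction; the minor/major ratio is then the cosine of the angle."""
    if semi_minor > semi_major:
        semi_major, semi_minor = semi_minor, semi_major
    return math.degrees(math.acos(semi_minor / semi_major))
```

Under these assumptions the eccentricity is simply the sine of the viewing angle, which is why it grows as the eye rotates away from the camera axis.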


Visualization of corneal reflex (CR) tracking (620K)

This segment shows the CR tracking, which is pretty good except for large deviations of gaze for which the CR falls outside of the pupil. (The current software restricts the search for the CR to the pupil.) Although the slow-motion playback may not be slow enough to observe this, if you have a VCR which can do pause/still frame, you may notice that on frames captured during a saccade the CR is no longer a circular disk, but a streak. Currently the software does not exploit this additional source of information about the direction of the eye movement. One might imagine strobing the illuminator...
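
The restriction of the CR search to the pupil can be sketched as follows. This is an illustration only, not the actual software: it assumes the image is a list of rows of gray levels, that the fitted pupil is an axis-aligned ellipse, and that the CR is simply the brightest pixel inside it. The names find_cr and inside_ellipse are hypothetical.

```python
def inside_ellipse(x, y, cx, cy, a, b):
    """True if pixel (x, y) lies inside an axis-aligned ellipse with
    center (cx, cy) and semi-axes a (horizontal) and b (vertical)."""
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0


def find_cr(image, cx, cy, a, b):
    """Return (x, y) of the brightest pixel inside the pupil ellipse,
    or None if no pixel of the image falls inside the ellipse.
    ASSUMPTION: brightest-pixel search stands in for the real detector."""
    best = None
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if inside_ellipse(x, y, cx, cy, a, b):
                if best is None or value > best[0]:
                    best = (value, x, y)
    return None if best is None else (best[1], best[2])
```

A search of this kind ignores anything bright outside the pupil, which is also why it fails for large gaze deviations where the CR itself lands outside the pupil.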


Visualization of fourth Purkinje image tracking (620K)

The next segment shows the P4 tracking. At least, it attempts to show it. Unfortunately, the image of P4 is so weak that, when viewed on my screen, the MPEG encoding seems to have obliterated it... I am working on a contrast-enhanced version. P4 can be found fairly reliably when the eye is still, but during saccades the smeared P4 is below the threshold of the camera and it disappears. Towards the end of the clip there are also some false locks on what I think is P3 (a diffuse bright blob that moves around with the eye movement). The most useful approach may be to predict the expected position of P4 from the pupil and CR data, and see how close the measured position comes to the expected position (or simply restrict the search to a neighborhood of the expected position). If P4 is found near where we expect it, then its position may be used to refine the estimate of gaze; otherwise we should probably not include it.
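
The accept-or-reject idea for P4 might look like the sketch below. How the expected position would be predicted from the pupil and CR data is not worked out here, so the expected position is simply passed in. The function name gate_p4 and the max_distance tolerance are hypothetical.

```python
import math


def gate_p4(measured, expected, max_distance):
    """Return the measured P4 position if it lies within max_distance
    pixels of the expected position; otherwise return None so that the
    caller leaves P4 out of the gaze estimate.
    ASSUMPTION: `expected` comes from some predictor based on the pupil
    and CR data; that predictor is not specified here."""
    if measured is None:
        return None
    dx = measured[0] - expected[0]
    dy = measured[1] - expected[1]
    if math.hypot(dx, dy) <= max_distance:
        return measured
    return None
```

A false lock on a distant blob such as P3 would fail this test and be dropped, while a P4 found close to its expected position would be kept.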


Simple head tracking using scene camera imagery (550K)

The first segment shows a series of scene camera images overlaid on a mosaic derived from the entire sequence. The mosaic was constructed by averaging shifted copies of the input images, without any geometric corrections or rotations. The results are fairly decent (well, judge for yourself), although errors of several pixels can be seen easily.
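
Averaging shifted copies can be sketched in a few lines. This is a simplified illustration rather than the code used for the clip: it assumes each frame is a list of rows of gray levels and that an integer (dx, dy) shift into mosaic coordinates is already known for every frame. The name build_mosaic is hypothetical.

```python
def build_mosaic(frames, shifts, width, height):
    """Average shifted copies of the frames into a width x height mosaic.
    shifts[i] = (dx, dy) places pixel (x, y) of frame i at (x + dx, y + dy).
    Mosaic pixels covered by no frame are returned as None.
    ASSUMPTION: pure integer translation, no rotation or lens correction."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for frame, (dx, dy) in zip(frames, shifts):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                mx, my = x + dx, y + dy
                if 0 <= mx < width and 0 <= my < height:
                    total[my][mx] += value
                    count[my][mx] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else None
         for x in range(width)]
        for y in range(height)
    ]
```

Because every frame is placed by translation alone, any rotation or perspective change between frames shows up as blur or misalignment in the average.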


Visualization of features inset w/ average registration (520K)

As a first step towards getting better accuracy, I tried tracking individual features. The 15 strongest features were first located (using Gaussian curvature of the image "surface"), and then tracked individually in the scene camera movie. The first sequence showing the features, however, was not generated using the individually tracked features, but using the same shifts that were used in the first sequence, i.e. the best shift for the whole image. These are shown just to illustrate the errors at individual features when using the first method. The white rectangle indicates the position of the current input frame, and the circles indicate features that are located within the frame. Within each circle, image data from the input image is blended with the mosaic image. The best way to see what is going on is to pick one of the feature locations, and just stare at it as the window roves around. You will probably see the feature jiggle around in time with the motion of the input frame. This jiggling is the residual error caused by failure of the various assumptions underlying the simple method.
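
For readers unfamiliar with the feature measure, here is a rough sketch of Gaussian curvature for an image treated as a height surface z = I(x, y), using the standard formula K = (Ixx*Iyy - Ixy^2) / (1 + Ix^2 + Iy^2)^2. The finite-difference derivatives, the absence of any smoothing, and the ranking by absolute curvature are all assumptions made for illustration; the function names are hypothetical.

```python
def gaussian_curvature(img, x, y):
    """Gaussian curvature of the surface z = img[y][x] at an interior
    pixel, using central finite differences for the derivatives."""
    ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
    iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return (ixx * iyy - ixy ** 2) / (1.0 + ix * ix + iy * iy) ** 2


def strongest_features(img, n):
    """Return the n interior pixels (x, y) with the largest absolute
    Gaussian curvature. ASSUMPTION: no smoothing and no non-maximum
    suppression, so neighboring pixels may both be selected."""
    scored = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            scored.append((abs(gaussian_curvature(img, x, y)), x, y))
    scored.sort(reverse=True)
    return [(x, y) for _, x, y in scored[:n]]
```

Calling strongest_features(img, 15) would give 15 candidate locations; isolated peaks and corners score highly, while flat regions and straight edges score near zero.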


Visualization of independently registered features (520K)

The final segment uses the same visualization technique, but here the features are registered individually. That is, instead of matching up the whole input image, and using that shift for all the features, the features are individually matched against the mosaic, and each feature is displayed using the best match. Here, a little bit of residual motion can still be seen, but it is significantly better than the preceding version. I have not worked out the details of how to exploit this, but the broad scheme is to use the residual motion of the features to refine the models of the scene and/or the camera.
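
Individual registration of a feature can be sketched as a small local search around the whole-image shift. The matching criterion used here, sum of squared differences over a square patch, is an assumption, as are the patch size, the search range, and the name best_patch_shift.

```python
def best_patch_shift(frame, mosaic, fx, fy, guess, half=1, search=2):
    """Find the shift (dx, dy) that best aligns the patch around feature
    (fx, fy) in `frame` with `mosaic`, searching within +/- `search`
    pixels of `guess` (for example, the whole-image shift).
    ASSUMPTION: best match = minimum sum of squared differences.
    Returns None if no candidate shift keeps the patch inside both images."""
    gx, gy = guess
    best = None
    for dy in range(gy - search, gy + search + 1):
        for dx in range(gx - search, gx + search + 1):
            ssd = 0.0
            valid = True
            for py in range(-half, half + 1):
                for px in range(-half, half + 1):
                    x, y = fx + px, fy + py
                    mx, my = x + dx, y + dy
                    if not (0 <= y < len(frame) and 0 <= x < len(frame[0])
                            and 0 <= my < len(mosaic)
                            and 0 <= mx < len(mosaic[0])):
                        valid = False
                        break
                    ssd += (frame[y][x] - mosaic[my][mx]) ** 2
                if not valid:
                    break
            if valid and (best is None or ssd < best[0]):
                best = (ssd, dx, dy)
    return None if best is None else (best[1], best[2])
```

The difference between the returned shift and the whole-image shift is the per-feature residual, which is the quantity that might eventually be used to refine the scene or camera model.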


Web Curator: Valerie Huemer