Presented at the 1998 IS&T/SPIE Electronic Imaging Symposium, January 24-30, San Jose, CA.

Published in B. E. Rogowitz and T. N. Pappas, eds., Human Vision and Electronic Imaging III, SPIE Proc. Vol. 3299, pp. 79-85, 1998.

# indicates updates from the published version

A Technique to Extract Relevant Image Features for Visual Tasks

Bettina L. Beard* and Albert J. Ahumada, Jr.
NASA Ames Research Center, Mail Stop 262-2, Moffett Field, CA 94035

ABSTRACT

Here we demonstrate a method for constructing stimulus classification images. These images provide information regarding the stimulus aspects the observer uses to segregate images into discrete response categories. Data are first collected on a discrimination task containing low contrast noise. The noises are then averaged separately for the stimulus-response categories. These averages are then summed with appropriate signs to obtain an overall classification image. We determine stimulus classification images for a vernier acuity task to visualize the stimulus features used to make these precise position discriminations. The resulting images reject the idea that the discrimination is performed by the single best discriminating cortical unit. The classification images show one Gabor-like filter for each line, rejecting the nearly ideal assumption of image discrimination models predicting no contribution from the fixed vernier line.

Keywords: spatial vision stimulus classification templates visual noise discrimination vernier acuity pattern recognition detection

* Further author information:
B.L.B. (correspondence): Email: tina@vision.arc.nasa.gov; http://vision.arc.nasa.gov/~tina/beard.html; Telephone: 650-604-1327; Fax: 650-604-0255.

1. INTRODUCTION

Researchers attempt to understand human sensory processing through psychophysical discrimination experiments. In a discrimination experiment, observers compare two stimuli that differ. Perceptually there is a set of image percepts. To make the discrimination, the observer must divide this set into two discrete categories. The rule that the observer uses to form the categories depends on the task and the variable that is being manipulated in the experiment. Here we describe a technique that provides a window to visualize the underlying stimulus classification rules.

To determine the stimulus aspects used to classify the image into discrete categories, white noise is added to the stimulus. The noise is of very low contrast to minimize its effect on the decision rule but strong enough to influence the decision. The noise sample presented on each trial, the observer's response (R0 or R1) and the stimulus type (S0 or S1) are recorded. The noises are then averaged for each of the four stimulus-response categories. These averaged noises are then combined with appropriate signs to form a classification image illustrating the contribution of each noise image pixel to the decision. A perceptual classification image for a stimulus is the correlation over trials between the local noise contrast and the observer's responses to that stimulus. Algebra shows that the correlation for Si is proportional to the difference between the average noise image for Si R0 and Si R1. If the classification rule is linear, the correlation images for S0 and S1 will be proportional to each other and can be combined to form a single image.
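The algebra mentioned above can be sketched as follows. The notation is ours rather than the paper's: for trials with stimulus S_i, let r be the response coded as 0 or 1, n(x) the noise contrast at pixel x, p the proportion of R1 responses, and \bar n_{ij}(x) the average noise on S_i R_j trials.

```latex
\begin{align*}
\operatorname{Cov}[r, n(x)]
  &= E[r\,n(x)] - E[r]\,E[n(x)] \\
  &= p\,\bar n_{i1}(x) - p\,\bigl[(1-p)\,\bar n_{i0}(x) + p\,\bar n_{i1}(x)\bigr] \\
  &= p\,(1-p)\,\bigl[\bar n_{i1}(x) - \bar n_{i0}(x)\bigr]
\end{align*}
```

Dividing by the standard deviations of r and n(x), neither of which depends on the pixel when the noise is white, turns this covariance into the correlation. The correlation image is therefore the difference of the two averaged noises up to a scale factor that is constant across the image.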

Much research on human spatial vision has focused on the hyperacuities, [1] which are measured under conditions where very fine visual discrimination is achieved in terms of spatial parameters. Vernier acuity, one branch of the hyperacuity family, refers to the smallest misalignment of two lines an observer can detect (see Figure 1). For abutting stimulus features, and under optimal conditions, vernier thresholds as small as one arc sec have been reported. [2] It is of interest to know what cues are used that can account for such precise discriminations. It is postulated that these cues change if a wide spatial separation is introduced between the vernier lines (see Figure 1b) since there is a dramatic elevation in threshold under these conditions. [3,4] In the terminology used above, 'what are the stimulus classification rules for alignment discrimination and do they change with line separation?'

Figure 1. Vernier acuity stimulus. In this example, the vernier stimulus is composed of two line features. On each trial, the right line will either be aligned (S0) with the left, or displaced upward (S1) relative to the left line. The task is to categorize the trial as "aligned" (R0) or "offset" (R1).

Psychophysical experimental results have suggested to some that the stimulus classification rules of vernier acuity may be based on cortical orientation selective filters. [5] When abutting vernier acuity is measured in the presence of high contrast, one dimensional, spatially oriented masks, thresholds are most strongly elevated at mask orientations that are ±10 to 20 deg away from the vernier target lines. [6,7] This result has lent support to the hypothesis that, for abutting line features, vernier thresholds are determined by the contrast response properties of cortical units sensitive to this oblique angle. It has even been suggested that the discrimination is mediated by the single most discriminating cortical unit. In this case the classification image would resemble a single Gabor filter. The upper left panel of Figure 2 shows a single oriented channel with an excitatory center and inhibitory flanks, attempting to mimic the classification images generated by the observers. In this illustration, the filter for the abutting case is oriented 14 deg relative to the horizontal vernier features. The lower left panel shows the prediction of a single Gabor filter angled at 5 deg for widely separated vernier features.

Since the orientation tuning functions obtained using the spatially oriented mask peak at both + and - 10 to 20 deg away from the vernier target lines [6,7], it is also possible that two Gabor filters oriented on either side of the vernier target define the stimulus classification rule. Figure 2b (center panels) presents the expected classification image if the difference between two oriented even-symmetric Gabor channels is used for alignment detection for abutting (2b upper panel) and wide separation (2b lower panel) vernier conditions.

A third possible classification scheme uses filters that assign a position label or 'local sign' by estimating the centroid, or weighted average vertical position, of the contrast in a region surrounding each line. The alignment judgment is then based on the difference between these vertical position estimates. Figure 2c presents the expected classification image for two odd-symmetric Gabor filters being used to compute the 'local' vertical position, for abutting and wide separation conditions.

Figure 2. The three upper panels represent predicted classification images for abutting vernier features. Lower panels represent predictions for widely separated features. The marks positioned immediately above each panel show the stimulus location. The leftmost panel (a) shows the predicted image if the classification rule is based on a single even-symmetric Gabor filter. The center panels (b) show the predicted image for a Gabor filter pair oriented at ±14 deg for the abutting features and ±5 deg for the widely separated features. The images were formed by subtracting the output of one filter from the other. (c) Predicted images if the observer is using a local centroid rule where two independent filters calculate the center of the stimulus distribution to make the localization.
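The three predictions for the abutting case can be approximated numerically. The following is a minimal sketch, not the code used to produce Figure 2: the function name gabor, the image size, the carrier wavelength, the Gaussian width, and the filter centers are all illustrative assumptions, and only the 14 deg tilt comes from the text.

```python
import numpy as np

def gabor(size, wavelength, theta_deg, sigma, phase, x0=0.0):
    """Gabor patch whose bars are tilted theta_deg away from horizontal.

    phase = 0 gives an even-symmetric (cosine) filter and
    phase = pi/2 an odd-symmetric (sine) filter; x0 shifts the
    filter center horizontally. All units are pixels.
    """
    half = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size] - half   # coordinates centered on 0
    x = x - x0
    theta = np.deg2rad(theta_deg)
    u = -x * np.sin(theta) + y * np.cos(theta)  # axis across the bars
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * u / wavelength - phase)

# Assumed illustration parameters (not taken from the paper).
SIZE, WAVELENGTH, SIGMA = 64, 16.0, 8.0

# (a) single even-symmetric filter tilted 14 deg
single = gabor(SIZE, WAVELENGTH, 14.0, SIGMA, 0.0)

# (b) difference of two even-symmetric filters at +14 and -14 deg
pair = (gabor(SIZE, WAVELENGTH, 14.0, SIGMA, 0.0)
        - gabor(SIZE, WAVELENGTH, -14.0, SIGMA, 0.0))

# (c) local centroid rule: one odd-symmetric horizontal filter per line,
#     combined with opposite signs to form a position difference
centroid = (gabor(SIZE, WAVELENGTH, 0.0, SIGMA, np.pi / 2, x0=8.0)
            - gabor(SIZE, WAVELENGTH, 0.0, SIGMA, np.pi / 2, x0=-8.0))
```

With an isotropic envelope, the difference in (b) reduces to a product of a sine in the vertical direction and a sine in the horizontal direction, so it changes sign across both the vertical and the horizontal midline, as does the centroid image in (c). The single filter in (a) has no such sign reversal between its left and right halves.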

In the one-dimensional oriented mask experiments [6,7], varying the properties of the highly visible mask may change the strategy or relevant feature used by the observer. [8] The stimulus classification technique lets the noise variability on individual trials expose the classification rule while the experimental conditions are held constant.

2. METHODS

Line stimuli were presented on a 640 x 480 pixel display screen. Viewing was binocular with natural pupils from a 14 ft viewing distance. This distance was achieved using a mirror. From this distance, each pixel subtended 0.31 arc min. The display screen had a constant mean luminance of 26.25 cd/m². The room lights were extinguished to prevent overhead lighting from reflecting off the display monitor.

Vernier stimuli were two short, dark, horizontal lines (5.0 arc min by 0.93 arc min)#. Stimulus duration was 500 msec with abrupt onset and offset. Two spatial separations were tested: (a) gap = 0, (b) gap = 10.2 arc min.# A relatively long line length was chosen since threshold is independent of line length for lengths above 5 arc min.[9] Noise contrast was increased to maintain an error rate near 25%. The gap image had a 33 pixel space (10.2 arc min)# separating the two line features.

The vernier stimulus was added to and centrally located within a 39.7 arc min by 39.7 arc min (128 by 128 pixels) noise display area. A single white noise image was first generated where each pixel value was drawn from a uniform random distribution covering the range of -1 to 1. Because this distribution had a mean of zero, it did not affect the experimental image mean luminance. The initial noise peak contrast was ± 0.25.# In a pilot study it was determined that this barely perceptible contrast level had negligible effects on vernier thresholds. To allow fast generation of a different noise on each trial, a single 128 x 128 noise image was shifted in one dimension (with wrap-around) by a random amount.
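The noise generation described above might be sketched as follows. This is an illustration under stated assumptions rather than the original display code: the function name trial_noise, the random seed, and the choice of the horizontal axis for the shift are ours (the text says only that the shift was in one dimension).

```python
import numpy as np

N = 128                  # noise field is 128 x 128 pixels (from the text)
ARCMIN_PER_PIXEL = 0.31  # pixel subtense at the viewing distance (from the text)
PEAK_CONTRAST = 0.25     # initial noise peak contrast (from the text)

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# One base image, each pixel drawn from a uniform distribution on -1 to 1.
base_noise = rng.uniform(-1.0, 1.0, size=(N, N))

def trial_noise(base, rng, axis=1):
    """Return (noise, shift) for one trial.

    The base image is circularly shifted along one axis by a random
    number of pixels and scaled to the peak contrast. The axis is an
    assumption; the source does not say which dimension was used.
    """
    shift = int(rng.integers(0, base.shape[axis]))
    return PEAK_CONTRAST * np.roll(base, shift, axis=axis), shift

noise, shift = trial_noise(base_noise, rng)

# Size of the noise field in arc min: 128 * 0.31 = 39.68, quoted as 39.7.
field_arcmin = N * ARCMIN_PER_PIXEL
```

Because every trial's noise is a shifted copy of one image, only the shift needs to be stored with the stimulus type and response in order to rebuild the noise later for averaging.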

Alignment thresholds were obtained using a two interval forced choice (2IFC) method of constant stimuli with one vernier offset value. Data were obtained in blocks of 100 trials. On a given trial, the left vernier line was randomly either aligned with the right line or upwardly offset by 0.31 arc min. Three observers were tested for the abutting vernier feature condition and two observers for the wide separation condition. The observers were instructed to report if the two lines were aligned or vertically misaligned. After each trial, a tone gave the observer feedback. Observers' visual acuity was corrected to at least 20/20.

3. RESULTS

Initially we present the classification images for the combined observer results. The combined results were computed by concatenating the trial-by-trial data for each observer and treating it as data from a single observer. Figure 3 presents the averaged noises for each of the trial types for the abutting vernier condition. The number of trials contributing to each averaged noise is shown above each panel (e.g., 4772 S0 R0 trials). The overall error rate was (1014+1196)/(11398) = 19.4%. Although there are more correct trials (S = R) than there are error trials (S not equal to R) and the same correlation pattern must appear in both (each image goes into one or the other response group), the pattern of correlation is more visible from the error trial images. The pattern can be described as follows: dark contrast in the region above the possibly offset line leads to the response "offset", light contrast below the line position leads to the response "aligned", and the pattern is reversed for the line that does not change position.

We then combined the four averaged noise images from Figure 3 to obtain a raw classification image. Letting SiRj indicate the average image for the SiRj trial type, we computed

Raw Classification Image = (-S0 R0) + S0 R1 + (-S1 R0) + S1 R1 (1)

The image polarity for "aligned" responses was reversed to make it compatible with the "offset" responses. The resulting image (shown in Figure 4a) resembles two sine phase Gabors of opposite polarity. To foster visualization of the classification image, we then computed a weighted average of adjacent pixels using a 5 by 5 kernel (Figure 4b). The expected standard deviation of these pixel values for sorting, independent of the noise, was computed based on the number of images, the uniform distribution of single pixel noise, and the adjacent pixel averaging or smoothing. Figure 4c shows pixels within 2 standard deviations of the expected value in neutral gray and other pixel values quantized in 1 standard deviation steps. The resulting classification image is a schematic of the classification rule in detecting misalignment.
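The processing chain from Equation 1 to the thresholded image can be sketched as below. This is a minimal illustration under several assumptions, not the authors' code. The function names are ours. The smoothing weights are not given in the text, so the plus-shaped kernel is only a placeholder. The null standard deviation treats pixels as independent across trials and within an image. The simulated linear observer at the end exists only to exercise the functions.

```python
import numpy as np

def classification_image(noises, stim, resp):
    """Equation 1: -mean(S0R0) + mean(S0R1) - mean(S1R0) + mean(S1R1).

    noises: (trials, H, W) noise fields; stim, resp: 0/1 per trial.
    Assumes every stimulus-response category has at least one trial.
    Returns the raw image and the trial counts keyed by (s, r).
    """
    noises = np.asarray(noises, dtype=float)
    stim, resp = np.asarray(stim), np.asarray(resp)
    raw = np.zeros(noises.shape[1:])
    counts = {}
    for s in (0, 1):
        for r in (0, 1):
            sel = (stim == s) & (resp == r)
            counts[(s, r)] = int(sel.sum())
            sign = 1.0 if r == 1 else -1.0   # R0 averages enter negatively
            raw += sign * noises[sel].mean(axis=0)
    return raw, counts

def smooth(img, kernel):
    """Weighted average of neighboring pixels, with wrap-around edges."""
    kernel = np.asarray(kernel, dtype=float)
    kernel = kernel / kernel.sum()
    kh, kw = kernel.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            shift = (i - kh // 2, j - kw // 2)
            out += kernel[i, j] * np.roll(img, shift, axis=(0, 1))
    return out, kernel

def null_sd(counts, kernel, peak_contrast):
    """SD of a smoothed pixel if responses were unrelated to the noise."""
    pixel_var = peak_contrast**2 / 3.0          # variance of uniform(-c, c)
    var = pixel_var * sum(1.0 / n for n in counts.values())
    return np.sqrt(var * np.sum(kernel**2))     # kernel is normalized to sum 1

def quantize(img, sd, threshold=2):
    """Set |z| < threshold to 0 (neutral gray); truncate the rest to whole SDs."""
    z = img / sd
    return np.where(np.abs(z) < threshold, 0.0, np.trunc(z))

# Toy check with a simulated linear observer (hypothetical template).
rng = np.random.default_rng(1)
c = 0.25
template = np.zeros((16, 16))
template[6, 4:12] = 1.0
template[9, 4:12] = -1.0
noises = rng.uniform(-c, c, size=(4000, 16, 16))
stim = rng.integers(0, 2, size=4000)
drive = (noises * template).sum(axis=(1, 2)) + 0.3 * (stim - 0.5)
resp = (drive > 0).astype(int)

raw, counts = classification_image(noises, stim, resp)
smoothed, k = smooth(raw, [[0, 1, 0], [1, 2, 1], [0, 1, 0]])
sig = quantize(smoothed, null_sd(counts, k, c))
```

In the toy run the raw image correlates strongly with the template that generated the responses, and the thresholded image keeps the two template rows while setting most of the background to the neutral value.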

Figure 3. Each panel presents the averaged noise for a particular stimulus-response category. The numbers in parentheses indicate the total trial number comprising that image. On trials where the vernier features are vertically offset, the observer will either guess that there was an offset or that there was not (first row of the figure). Observers will also respond either that the stimuli were different or the same on trials containing identical stimuli (second row of the figure). The contrast of each image was scaled by its own peak contrast and the contrast polarity of the R0 images was reversed to keep the expected pattern the same in all images.

Figure 4. (a) A stimulus classification image for abutting vernier discrimination. (b) The image after a weighted average was computed. This weighted average was based on the center and four flanking pixels in the cardinal directions. In this way, the center pixel had the greatest weight for any given smoothing operation. (c) The rightmost panel shows pixels that were at least two standard deviations lighter or darker than the neutral gray value.

Figure 5. Stimulus classification images for an abutting vernier acuity task. The results of three observers are shown (i.e., PW, DF and BLB) as well as the classification image formed by a combined data set. Dark areas imply that darker contrast of the noise in these locations contributed to the "offset" response. Light areas in the image suggest that darker contrast in the local noise image led to the "aligned" response. The group sum was based on 11398 trials.

Figure 6. Stimulus classification images for vernier features separated by 10 arc min. The results of two observers are shown (i.e., PW and DF). The group sum was based on 7700 trials.

Figure 5 presents the statistical significance images for abutting vernier features for each of the three observers and for the group. Although there are individual differences, all observers show discrete bipolar distributions. Figure 6 presents the data for the wide feature separation condition. Here for each of two observers, bipolar distributions are seen.

4. DISCUSSION

The first aim of this paper was to describe the stimulus classification technique, and show how it can be used to estimate the classification rule for detection and discrimination tasks. The second aim was to use this technique to test the psychophysically-based idea that abutting vernier discrimination thresholds are primarily determined by a mechanism tuned to orientations ±10 to 20 deg away from the stimulus orientation while vernier thresholds for widely separated features are determined by the linear spacing of cortical units (local signs or centroids).

We added external quasi-random white noise to a vernier target image to estimate perceptual classification images used by observers. There are several classification rules that an observer might use to detect the offset of two lines. One possibility is that the contrast sensitivity of the most discriminating cortical unit determines the response. Looking back at the left column of Figure 2 we see the expected image for abutting lines above and that for widely separated features below. The observed classification images are not consistent with the discrimination being based on the output of a single Gabor-like filter for either the abutting or wide separation conditions.

Another possibility is that the classification decision is determined by the difference between two Gabor filters, each oriented 10 to 20 deg on each side of the vernier target angle in the abutting case. Somewhat counterintuitively, the expected image looks like two abutting sine phase Gabors of opposite polarity (even when the angle between them is reduced, they do not cancel in the central region). This model is tenable for the abutting case, but not the separated case.

A 'local sign' mechanism based on the contrast centroid is a good predictor of the classification images. Although it sounds like a very different mechanism from the oriented filter pair model, it turns out to be indistinguishable from it when the lines are abutting. When the lines are apart, it is the only tenable model of the three considered here.

Ideal observer and current image discrimination models [10] would predict that the classification image would be the difference between the aligned and offset vernier stimuli. The appearance would be that of a blurred horizontal dipole in the location of the left line. This is similar to the appearance of the left part of the classification image, but the extent of the blur is much more than the model predicts. Also, the stimulus classification image shows that both left and right lines were used in the vernier judgments. The results illustrate the problems that can arise from the lack of position noise in image discrimination models.
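The ideal-observer prediction described above, the difference between the offset and aligned stimuli, can be illustrated with a small sketch. The function name vernier and the pixel dimensions are assumptions derived from the stated sizes at 0.31 arc min per pixel (about 16 by 3 pixels per line, 1 pixel offset). The image is unblurred, so it omits the optical and neural blur that the discrimination models would add.

```python
import numpy as np

def vernier(offset_px, size=128, line_len=16, line_h=3, gap=0, contrast=-1.0):
    """Two dark horizontal lines on a zero-contrast field.

    The left line is raised by offset_px pixels and the right line is
    fixed. Use gap=33 for the wide separation condition.
    """
    img = np.zeros((size, size))
    cx = size // 2
    top = (size - line_h) // 2
    g_left = gap // 2
    g_right = gap - g_left
    # Left line: row index decreases upward, so subtract the offset.
    img[top - offset_px: top - offset_px + line_h,
        cx - g_left - line_len: cx - g_left] = contrast
    # Right line stays at the reference height.
    img[top: top + line_h, cx + g_right: cx + g_right + line_len] = contrast
    return img

# Ideal template: offset stimulus (S1) minus aligned stimulus (S0).
ideal_template = vernier(1) - vernier(0)
```

The difference is non-zero only in two rows at the position of the displaced line: one row of added dark contrast above and one row of removed dark contrast below, which forms a horizontal dipole. Nothing appears at the fixed line, which is the prediction the measured classification images contradict.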

The stimulus classification technique was introduced in auditory research to extract the relevant features for tone detection [11,12] and has been used in vision research to obtain classification images for vernier acuity that is not within the hyperacuity range [13] and in letter discrimination. [14] These results show that the technique is useful in clarifying theoretical questions about receptive field properties of the underlying mechanisms for visual discriminations.

5. ACKNOWLEDGMENTS

A portion of this research was previously presented by Beard and Ahumada [15]. Supported by NASA/AOS and NASA/UL.

6. REFERENCES

[1.] G. Westheimer, "Visual acuity and hyperacuity," Investigative Ophthalmology 14, pp. 570-572, 1975.

[2.] S.A. Klein, D.M. Levi, "Hyperacuity thresholds of 1 second: Quantitative predictions and empirical validation," Journal of the Optical Society of America A 2, pp. 1170-1190, 1985.

[3.] S.J. Waugh, D.M. Levi, "Visibility and vernier acuity for separated targets," Vision Research 33, pp. 539-552, 1993a.

[4.] B.L. Beard, D.M. Levi, S.A. Klein, "Vernier acuity with non-simultaneous targets: the cortical magnification factor estimated by psychophysics," Vision Research 37, pp. 325-346, 1997.

[5.] H.R. Wilson, "Responses of spatial mechanisms can explain hyperacuity," Vision Research 26, pp. 453-470, 1986.

[6.] S.J. Waugh, D.M. Levi, T. Carney, "Orientation, masking, and vernier acuity for line targets," Vision Research 33, pp. 1619-1638, 1993.

[7.] D.M. Levi, S.J. Waugh, B.L. Beard, "Spatial scale shifts in amblyopia," Vision Research 34, pp. 3315-3333, 1994.

[8.] B.L. Beard, A.J. Ahumada, Jr., "Fixed pattern noise advantage absent in the periphery," Optics and Photonics News 6 p. 138, 1997.

[9.] G. Westheimer, S.P. McKee, "Integration regions for visual hyperacuity," Vision Research 17, pp. 89-93, 1977b.

[10.] A.J. Ahumada, Jr., B.L. Beard, "Image discrimination models predict detection in fixed but not random noise," Journal of the Optical Society of America A 14, pp. 2471-2476, 1997.

[11.] A.J. Ahumada, Jr., R. Marken, A. Sandusky, "Time and frequency analyses of auditory signal detection," Journal of the Acoustical Society of America 57, pp. 385-390, 1975.

[12.] A.J. Ahumada, Jr., J. Lovell, "Stimulus features in signal detection," Journal of the Acoustical Society of America 49, pp. 1751-1756, 1971.

[13.] A.J. Ahumada, Jr., "Perceptual classification images from vernier acuity masked by noise," Perception 26, p. 18, 1996.

[14.] A.B. Watson, R. Rosenholtz, "A Rorschach test for visual classification strategies," Investigative Ophthalmology and Visual Science 38, p. S1, 1997.

[15.] B.L. Beard, A.J. Ahumada, Jr., "Relevant image features for vernier acuity," Perception 26, p. 38, 1997.