Classification image and noise estimation
Albert J. Ahumada, Jr.
NASA Ames Research Center,
Moffett Field, CA, USA
Abstract
For the linear discrimination of two stimuli in white Gaussian noise in the presence of internal noise, a method is described for estimating linear classification weights from the sum of noise images segregated by stimulus and response. The recommended method for combining the two response images for the same stimulus is to take the difference of the average images. Weights are derived for combining images over stimuli and observers. Methods for estimating the level of internal noise are described with emphasis on the case of repeated presentations of the same noise sample. Simple tests for particular hypotheses about the weights are shown based on observer agreement with a noiseless version of the hypothesis. This expanded version of Ahumada (2002) includes methods for estimating the internal noise from the length of the estimated weight vector, based on the method of Nykamp and Ringach (2002), and for estimating the weight vector when repeated noises are used, based on the work of Murray, Bennett, and Sekuler (2002).
Keywords: discrimination, detection, vision, noise
Symbols in Order of Appearance
m the number of image components (2.1.1)
s s s = 0, 1; 1 by m signal vectors (2.1.1)
p s s = 0, 1; probability of signal s s (2.1.1)
n 1 by m noise vector with components n(i), i = 1, m (2.1.1)
g 1 by m trial stimulus vector with components g(i), i = 1, m (2.1.1)
E[·] averaging or expectation operator (2.1.2)
Var[·] variance computing operator (2.1.3)
σ² variance of n(i) (2.1.4)
w 1 by m classification vector with components w(i), i = 1, m (2.2.1)
b bias of linear classifier (2.2.1)
R the observer's response, 0 or 1 (2.2.1)
T vector transpose operator (2.2.1)
||.|| vector length, ||w|| = (w w T) 1/2 (2.2.2)
Pr{} probability of enclosed event (2.2.3)
p s, R probability of response R given signal s s, Pr{R | s s} (2.2.3)
F(·) cumulative standard normal distribution function (2.2.3)
d 0' sensitivity of linear classifier (2.2.4)
b 0 shifted bias of linear classifier, b – w s 0 T (2.2.5)
Z(·) functional inverse of the cumulative standard normal distribution function, F -1(·) (2.2.8)
w I classification vector w of the ideal observer (2.3.1)
d I' sensitivity of the ideal observer (2.3.2)
r 2 the sampling efficiency of w, r = w w I T (2.3.3)
b 0, H random shifted bias of the human observer model (2.4.1)
g 2 variance of b 0, H (2.4.2)
a2 proportion of external noise in the classification variable, 1/(1+ g 2) (2.4.3)
d H' sensitivity of the human observer model (2.4.6)
b H performance bias of the human observer model (2.4.7)
n s, R a random noise n conditional on signal s s and detection response R (2.5.1.1)
a s, R the average of N s,R noises n s, R (2.5.1.1)
v s, R the expected value of n s, R when m = 1 (2.5.1.1)
f(·) standard normal distribution density function (2.5.2.1)
x, y, z standard normal variables (2.5.3.1,3,4)
U an orthonormal m by m transformation (2.5.4.1)
I the m by m identity transformation (2.5.4.2)
z i standard normal variables (2.5.4.3)
N s, R the number of presentations of stimulus s that led to response R (2.5.4.9)
N s the number of presentations of stimulus s, N s = N s, 0 + N s, 1 (2.5.5.4)
e the decision contribution from the external noise, replacing w n T (3.1.1)
p s, R, R' for two presentations of the same signal s s and the same noise, the probability of response R then response R' (3.1.1)
M R R = 0, 1; the event that an internal-noise-free model made response R (3.2.3)
p M, s, 0 the probability of event M 0 given that the signal was s s (3.2.3)
b M, s the signal-dependent, internal-noise-free model criterion, b 0 if s 0, or b 0 - d 0' if s 1 (3.2.3)
n s, R, R' a random noise n conditional on signal s s and responses R, R' (3.5.1)
v s, R, R' the expected value of n s, R, R' when m = 1 (3.5.1)
N s, R, R' the number of presentations of stimulus s that led to responses R, R' (3.5.6)
a s, R, R' the average of N s, R, R' noises n s, R, R' (3.5.4)
1 Historical Introduction
In 1965, a frustrated graduate student in physiological psychology was looking for a thesis topic in the auditory research laboratory of E. C. Carterette and M. P. Friedman, the editors to be of the Handbook of Perception. They recommended that he tape record the stimulus of the traditional tone-in-noise yes-no detection experiment and analyze the sounds in the four different types of trials to determine whether correlates could be found in the stimuli relating to the observer responses. The noise masker was continuous wide-band noise, and marker tones were recorded on a second track to keep track of the signals presented. The tapes were digitized and analyzed, but the signal-to-noise level at threshold was so low that no trace of the signals could be found in the digitized records. To ensure earning a degree in the foreseeable future, the student made several changes in the experiment. To improve the signal-to-noise ratio on the tape, the noise bandwidth was narrowed, and the noise was turned on only during the short interval when the signal might be present. To reduce the effects of observer noise, the tape was repeatedly presented to the observer to get average ratings of signal presence. To minimize degrees of freedom in the stimulus measurement, the stimulus was reduced to the energy passed by a filter tuned to the signal tone frequency. This combination of changes allowed the student to find that on signal trials, very narrow filter outputs correlated best with observer ratings, whereas on noise trials, wider filter outputs correlated best, contradicting the prediction of single linear filter models for auditory tone detection (Ahumada, 1967).
To gain better control of the masking noise and avoid the limitations of tape recording, Ahumada and Lovell (1971) used computer-generated tones and noises defined by their Fourier component amplitudes and reported linear regressions on the component energies with average observer ratings. These results were essentially auditory classification images that again demonstrated results contrary to simple linear filter theory: frequency components were weighted differently on signal trials from noise-only trials and negative weights were frequently observed. The results of both experiments seemed to be consistent with models with multiple linear channels that were being nonlinearly combined. Ahumada, Marken, and Sandusky (1975) extended the experiment to the combined time and frequency domains with similar results.
Our first visual classification images (Ahumada, 1996) were done to see whether the method we had used in audition could be used to elucidate the features used by observers to accomplish a vernier acuity task. Figure 1 shows a raw classification image and the same image smoothed and quantized so only weights that are significantly different from zero are colored differently from the gray background. The ideal observer would have only weights on the right side, the side of the line that was either even with or one pixel higher than the left line. Spatial position uncertainty was presumably responsible for the observer needing to compare the two lines and for blurring the image more than optical blurring would predict. Theories that postulate that the response would be determined by the output of a single best-discriminating Gabor-like filter (Findlay, 1973; Foley, 1994) are not supported by the appearance, but were not tested statistically. Beard and Ahumada (1998) wanted to see whether observer performance was best characterized as orientation discrimination based on an oriented filter output or a local position measurement (Waugh, Levi, & Carney, 1993). The question was left unanswered; the linear classification functions obtained from the abutting stimuli were consistent with possible implementations of either theory.


Figure 1. A raw classification image (top) and the same image smoothed and quantized (bottom), so only weights significantly different from zero are colored differently from the gray background. The black squares on the sides show the heights and positions of the fixed line (left) and the variable line offset (right). The dark lines on the top and bottom show the lengths and positions of the lines. The observer was A.J.A., who ran 1,600 trials (Ahumada, 1996).
The first visual classification images were linear combinations of four averaged noise images, one for each of the four stimulus-response categories. For a given stimulus, the added noises have zero mean, so the expected sum of the noises from one response class is the negative of the expected sum of the noises from the other response class. We therefore knew to combine the two response noise images with opposite signs. It appeared in the initial images that the error images were clearer than the correct response images, so we took the difference of the averages rather than of the sums, realizing that this was an arbitrary decision. We also arbitrarily combined the images from the two stimuli with equal weight to get a single overall image. By symmetry, this must be the right weighting to use if the observer is making the same number of errors to equal numbers of each kind of stimulus, which was approximately the case. The analysis in the next section shows that, for a simplified theoretical situation, the averaging is nearly optimal, and it gives expressions for good weighting functions for the cases of unequal stimulus presentation rates and asymmetric response biases. The beginning of the section introduces notation for a standard signal detection experiment as analyzed by Green and Swets (1966).
2 Template Estimation for Linear Classification of Two Signals in Additive White Gaussian Noise
2.1 The Signals and Noise
s 0 and s 1 are 1 by m signal vectors, presented with probabilities p 0 and p 1 = (1 - p 0) for N trials. On each trial, a random noise sample vector n is added to the signal, so the trial stimulus is
g = s 0 + n
or
g = s 1 + n. (2.1.1)
n is a 1 by m vector of independent samples of identically distributed Gaussian variables n(i) with
E[n(i)] = 0 (2.1.2)
and
Var[n(i)] = E[(n(i) - E[n(i)])²] = σ², (2.1.3)
where E[·] is the averaging or expectation operator and Var[·] computes the variance. Without loss of generality, we can assume that the noise has been normalized by its standard deviation so that
σ² = σ = 1. (2.1.4)
2.2 The Linear Observer Model
The linear observer classifying the noisy signals by responding R = 0 or R = 1 would use a vector w and respond R = 1 if and only if
w g T > b, (2.2.1)
where b is a response criterion and T indicates the matrix transpose operator, so that w g T is the inner product of w and g, the sum over i of w(i) g(i).
Also, without loss of generality, we will assume that w has unit length (w and b have already been divided by the length of w) so that
||w|| = (w w T) 0.5 = 1. (2.2.2)
The performance of an observer is characterized by the error rates
p 0, 1 = Pr{R = 1 | s 0},
the probability of signal s 0 being followed by response R = 1, and
p 1, 0 = Pr{R = 0 | s 1},
the probability of signal s 1 being followed by response R = 0.
For the linear classifier with vector w and criterion b, w n T is Gaussian with mean zero and unit variance. Hence
p 0, 1 = Pr{w (s 0 + n) T > b } = 1 - F( b – w s 0 T )
and
p 1, 0 = Pr{w (s 1 + n) T < b } = F( b – w s 1 T ), (2.2.3)
where F(·) is the cumulative standard Gaussian distribution function.
If we define sensitivity and bias parameters
d 0' = w (s 1 - s 0) T (2.2.4)
and
b 0 = b – w s 0 T, (2.2.5)
then the error rates are
p 0, 1 = 1 - F (b 0) (2.2.6)
and
p 1, 0 = F (b 0 – d 0'). (2.2.7)
These parameters can be found from the error rates as
b 0 = Z(1 - p 0, 1) (2.2.8)
and
d 0' = b 0 - Z(p 1, 0), (2.2.9)
where
Z(·) =F -1(·)
is the functional inverse of the cumulative standard normal distribution function F(·).
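As an illustration, Equations 2.2.6 through 2.2.9 can be sketched in Python using only the standard library. The function names are hypothetical and are not part of the original method; `NormalDist().cdf` plays the role of F(·) and `NormalDist().inv_cdf` the role of Z(·). The parameter values in the example are arbitrary assumptions.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is F(.), inv_cdf is Z(.)


def error_rates(d0, b0):
    """Return (p01, p10) from Equations 2.2.6 and 2.2.7."""
    p01 = 1.0 - _STD.cdf(b0)   # response 1 given signal s0
    p10 = _STD.cdf(b0 - d0)    # response 0 given signal s1
    return p01, p10


def params_from_error_rates(p01, p10):
    """Return (d0, b0) from Equations 2.2.8 and 2.2.9."""
    b0 = _STD.inv_cdf(1.0 - p01)
    d0 = b0 - _STD.inv_cdf(p10)
    return d0, b0


# Round trip with assumed values: an unbiased criterion sits at d0 / 2,
# which makes the two error rates equal.
p01, p10 = error_rates(d0=1.5, b0=0.75)
d0_back, b0_back = params_from_error_rates(p01, p10)
```

Recovering the parameters from the error rates reproduces the values that generated them, which is a quick consistency check on the two pairs of equations.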
2.3 The Ideal Observer
The ideal observer classifying the noisy signals as R = 0 or R = 1 would use the linear classifier
w I = (s 1 - s 0) / ||s 1 - s 0||. (2.3.1)
For the ideal observer,
d I' = w I (s 1 - s 0) T = ||s 1 - s 0|| . (2.3.2)
The efficiency of a non-ideal linear classifier is given by
(d 0'/d I') 2 = (w w I T ) 2 = r 2, (2.3.3)
the square of the correlation between the actual and the ideal classifier coefficients, sometimes called the sampling efficiency.
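A minimal sketch of Equations 2.3.1 through 2.3.3, assuming signals stored as plain Python lists. The helper names and the four-pixel example are hypothetical; the example observer uses only one of the two pixels that carry the signal difference.

```python
import math


def dot(u, v):
    """Inner product u v^T of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale v to unit length, as assumed for w in Equation 2.2.2."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def ideal_template(s0, s1):
    """Equation 2.3.1: the normalized signal difference."""
    return unit([b - a for a, b in zip(s0, s1)])


def sampling_efficiency(w, s0, s1):
    """Equation 2.3.3: r^2 = (w w_I^T)^2 for a unit-length w."""
    return dot(unit(w), ideal_template(s0, s1)) ** 2


s0 = [0.0, 0.0, 0.0, 0.0]
s1 = [1.0, 1.0, 0.0, 0.0]
w = [1.0, 0.0, 0.0, 0.0]  # assumed observer template: first pixel only

diff = [b - a for a, b in zip(s0, s1)]
d_ideal = math.sqrt(dot(diff, diff))      # Equation 2.3.2
d_obs = dot(unit(w), diff)                # Equation 2.2.4
efficiency = sampling_efficiency(w, s0, s1)
```

In this example the squared sensitivity ratio `(d_obs / d_ideal) ** 2` and the squared template correlation `efficiency` agree, as Equation 2.3.3 requires.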
2.4 A Noisy Human Observer Model
Human observers classify the same images in different ways on different presentations. This is modeled here by assuming that the observer's criterion b 0, H (corresponding to b 0) is a normally distributed random variable with
E[b 0, H] = b 0 (2.4.1)
and
Var[b 0, H] = g 2, (2.4.2)
independent of the noise n. It does not matter whether the variability is added to the criterion or to the classification function value. Because the noiseless criterion b 0 was defined as the criterion for a variable with unit variance, the parameter 1 + g 2 can be interpreted as the total variance of the classification variable and
a2 = 1/ (1+ g 2 ) (2.4.3)
as the proportion of variance in the classification variable that is from the external noise n. The error probabilities are now
p 0, 1 = Pr{w n T > b 0, H }
= Pr{(w n T - (b 0, H - b 0)) / (1 + g 2) 1/2 > a b 0 }
= 1 - F(a b 0) (2.4.4)
and
p 1, 0 = F(a (b 0 - d 0')). (2.4.5)
If we define the observer's sensitivity and bias as
d H' = a d 0' (2.4.6)
and
b H = a b 0, (2.4.7)
then we can compute these parameters from the human model observer error rates as
b H = Z(1 - p 0, 1) (2.4.8)
and
d H' = b H - Z(p 1, 0). (2.4.9)
The efficiency of the human observer model is
(d H'/d I') 2 = a2 r 2. (2.4.10)
Because r 2 ≤ 1, a lower bound for a is given by
a ≥ d H'/d I', (2.4.11)
and an upper bound for g 2 is given by
g 2 ≤ (d I'/d H') 2 - 1. (2.4.12)
These bounds are reached when w is w I, and the inefficiency is only the result of the internal or criterion noise.
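The steps from observed error rates to the internal noise bounds can be sketched as follows. This is an illustration under assumed values, not the author's code: the function names are hypothetical, and the example generates error rates from Equations 2.4.4 and 2.4.5 for an observer whose template is the ideal one, so the bounds should be reached exactly.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # cdf is F(.), inv_cdf is Z(.)


def human_params(p01, p10):
    """Return (d_H', b_H) from Equations 2.4.8 and 2.4.9."""
    b_h = _STD.inv_cdf(1.0 - p01)
    d_h = b_h - _STD.inv_cdf(p10)
    return d_h, b_h


def internal_noise_bounds(d_h, d_ideal):
    """Return (lower bound on a, upper bound on g^2), Equations 2.4.11-12."""
    a_min = d_h / d_ideal
    g2_max = (d_ideal / d_h) ** 2 - 1.0
    return a_min, g2_max


# Assumed model: w equals the ideal template (r = 1), d_I' = d_0' = 2,
# b_0 = 1, and criterion variance g^2 = 3, so a = 1 / sqrt(1 + 3) = 0.5.
d_ideal, b0, g2 = 2.0, 1.0, 3.0
a = 1.0 / math.sqrt(1.0 + g2)
p01 = 1.0 - _STD.cdf(a * b0)               # Equation 2.4.4
p10 = _STD.cdf(a * (b0 - d_ideal))         # Equation 2.4.5

d_h, b_h = human_params(p01, p10)
a_min, g2_max = internal_noise_bounds(d_h, d_ideal)
```

With a non-ideal template (r 2 below 1), the same calculation would return a value of `a_min` below the true a and a value of `g2_max` above the true g 2.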
2.5 The Classification Images
The classification image components are the four average noises a s, R, the averages of the noises n for the trials segregated by signal s s and detection response R. We would like to find the mean and the variance of the pixels of a s, R as a function of the parameters (s 1, s 0, w, b 0, and g or a).
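To make the four averages concrete, the following sketch simulates the noisy linear observer of Section 2.4 and accumulates the noise averages a s, R. It is an illustration only: the function names, the six-pixel template, the signal, and all parameter values are assumptions. The final combination (difference of the two response averages for each stimulus, with the two stimuli weighted equally) is the simple rule described in the historical introduction, not an optimized weighting.

```python
import math
import random


def simulate_classification_image(w, s0, s1, b, g, n_trials, seed=0):
    """Simulate trials and return (combined image, category counts).

    w: unit-length template; s0, s1: signal vectors; b: criterion on w g^T;
    g: standard deviation of the Gaussian criterion noise.
    """
    rng = random.Random(seed)
    m = len(w)
    keys = [(s, r) for s in (0, 1) for r in (0, 1)]
    sums = {k: [0.0] * m for k in keys}
    counts = {k: 0 for k in keys}
    for _ in range(n_trials):
        s = 1 if rng.random() < 0.5 else 0        # equal presentation rates
        signal = s1 if s else s0
        noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
        decision = sum(wi * (si + ni) for wi, si, ni in zip(w, signal, noise))
        criterion = b + g * rng.gauss(0.0, 1.0)   # noisy criterion
        r = 1 if decision > criterion else 0
        counts[s, r] += 1
        for i in range(m):
            sums[s, r][i] += noise[i]
    # Average noise per stimulus-response category (guarding empty cells).
    avg = {k: [x / max(counts[k], 1) for x in sums[k]] for k in keys}
    image = [(avg[0, 1][i] - avg[0, 0][i]) + (avg[1, 1][i] - avg[1, 0][i])
             for i in range(m)]
    return image, counts


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


raw = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]              # assumed template shape
norm = math.sqrt(sum(x * x for x in raw))
w = [x / norm for x in raw]
s0 = [0.0] * 6
s1 = [0.0, 0.0, 1.5, 1.5, 0.0, 0.0]
d0 = sum(wi * si for wi, si in zip(w, s1))        # Equation 2.2.4 with s0 = 0
image, counts = simulate_classification_image(w, s0, s1, b=d0 / 2, g=1.0,
                                              n_trials=20000)
similarity = cosine(image, w)
```

Under these assumptions the combined image should point in nearly the same direction as the template w, so `similarity` is expected to be close to 1 for a large number of trials.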
2.5.1 The single pixel case
In the single pixel (m = 1) case, we are trying to find the mean of a single Gaussian variable n that has been truncated by a random criterion b 0, H. Let n s, R be the truncated variable when s was the stimulus and R was the response and
v s, R = E[n s, R]. (2.5.1.1)
Then because ||w|| = 1, w = ±1. We can assume without loss of generality that s 1 is greater than s 0, and the sign of w is set to maximize correctness, so that w = 1. Hence,
w n T < b 0, H
if and only if
n < b 0, H. (2.5.1.2)
So in the case that s = R = 0,
v 0, 0 = E[n 0, 0] = E[n | n < b 0, H]
and for the other cases
v 0, 1 = E[n 0, 1] = E[n | n > b 0, H]
v 1, 0 = E[n 1, 0] = E[n | n < b 0, H - d 0']
v 1, 1 = E[n 1, 1] = E[n | n > b 0, H - d 0'] . (2.5.1.3)
2.5.2 Single pixel, no noise
Consider now the single pixel case when there is no noise in the criterion (b 0, H = b 0).
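v 0, 0 = E[n 0, 0] = E[n | n < b 0]
= (1 / F (b 0)) ∫ z f (z) dz, with z integrated from -∞ to b 0,
= - f (b 0) / F (b 0), (2.5.2.1)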
where f(z) is the standard normal density function and the integration of z exp(-z 2 /2) is enabled by the variable substitution
t = -z 2/2.
Similarly,
E[n 0, 1] = E[n | n > b 0]
= f (b 0)/ (1-F (b 0)). (2.5.2.2)
2.5.3 Single pixel, noisy criterion
The Gaussian criterion case can be reduced to the fixed criterion case by a change of variables. Let z be the standard Gaussian used to form the criterion b 0, H, so that
b 0, H = g z + b 0 . (2.5.3.1)
Then
v 0, 0 = E[n 0, 0] = E[n | n < b 0, H]
= E[n | n < g z + b 0]. (2.5.3.2)
If we let
x = a (n – g z) (2.5.3.3)
and
y = a (g n + z), (2.5.3.4)
the new variables x and y are independent (E[x y] = 0), standard (E[x] = E[y] = 0, Var[x] = Var[y] = 1) Gaussian variables. These variables have the properties that
n = a (x + g y) (2.5.3.5)
and that
n < g z + b 0
if and only if
x < a b 0. (2.5.3.6)
So
v 0, 0 = E[n 0, 0] = E[n | n < g z + b 0]
= E[a (x + g y) | x < a b 0]
= a E[x | x < a b 0]
= - a f (a b 0) / F (a b 0)
= - a f (b H) / F (b H)
= - a f (Z(p 0, 0))/p 0, 0 . (2.5.3.7)
The effect of the criterion noise on v 0, 0 is to reduce it by the factor a.
Similarly,
v 0, 1 = E[n 0, 1] = E[n | n > b 0, H]
= E[a (x + g y) | x > a b 0]
= a E[x | x > a b 0]
= a f (a b 0) / (1 - F (a b 0))
= a f (b H) / (1 - F (b H))
= a f (Z(p 0, 0)) / (1 - p 0, 0)
= a f (Z(p 0, 1)) / p 0, 1,