In Exploratory Vision: The Active Eye, M. Landy, L. T. Maloney, M. Pavel, eds., New York: Springer-Verlag (pp. 157-168), 1995.
ABSTRACT
Maloney and Ahumada (1989) have proposed a network learning algorithm that allows the visual system to compensate for irregularities in the positions of its photoreceptors. Weights in the network are adjusted by a process tending to make the internal image representation translation-invariant. We report on the behavior of this translation-invariance algorithm calibrating a visual system that has lost receptors. To attain robust performance in the presence of aliasing noise, the learning adjustment was limited to the receptive field of output units whose receptors were lost. With this modification the translation-invariance learning algorithm provides a physiologically plausible model for solving the recalibration problem posed by retinal degeneration.
During the course of the degenerative disease retinitis pigmentosa (RP), patients experience progressive visual field loss, raised luminance and contrast thresholds, and nightblindness. Visual field loss typically begins in the midperiphery as a ring scotoma and spreads both centrally and peripherally, resulting in severely contracted visual fields (Massof & Finkelstein, 1987). Ultrastructural studies of the RP eye indicate that even in the early stages of the disease there is a diffuse loss of photoreceptors in all regions of the RP-affected eye and the remaining photoreceptors are enlarged (Flannery, Farber, Bird & Bok, 1989; Szamier, Berson, Klein & Meyers, 1979).
1.1 Retinal Degeneration and Bisection Judgments
Turano (1991) studied the perceptual effects of retinal cone loss caused by RP. She investigated spatial position judgments in RP patients using a bisection task. Some patients with RP exhibited spatial position distortions (i.e., constant errors or biases) ranging from 2 to 5 standard deviations beyond the normal range. Other RP subjects were able to judge the relative position with the same accuracy as normal subjects. She was somewhat surprised by the lack of correlation between the magnitude of the position distortions and increased pathology (disease progression) as indexed by visual field loss. As illustrated in Fig. 1, the correlation was 0.124, indicating that increased pathology is a surprisingly small factor.
Figure 1. Mean absolute distortion (bias) of RP patients, in units of normal subjects' standard deviation, as a function of the estimated number of years of disease progression based on visual field size (Massof & Finkelstein, 1987). The data are from Turano (1991).
One possible explanation for the lack of correlation is that RP patients learn to compensate for the loss-induced distortions, especially when the rate of disease progression is slow. Our work on learning theories for geometric calibration (Maloney and Ahumada, 1989) led us to ask whether these theories, originally designed to calibrate the cone positions of a developing visual system, could also explain the recalibration of a degenerating one.
1.2 Cone Position Calibration Models
Maloney and Ahumada (1989) proposed a network learning algorithm as a solution to the problem of how the visual system knows the positions of the photoreceptors (Ahumada, 1992; Ahumada and Mulligan, 1990). The network transforms the sampled image values to new values from which an internal image is interpolated. The weights of the network are adjusted by a learning algorithm whose goal is to make the internal image translation invariant, that is, look the same except for a translation when the eye position is changed. The weight adjustment rule can be regarded as a modification to the Widrow and Hoff (1960; Widrow & Stearns, 1985) adaptive linear weight adjustment procedure or delta rule. The delta rule is an error-correcting feedback rule and the error is computed as the difference between the output computed by the network and the desired output of the network. The Maloney and Ahumada rule does not assume that the correct desired output is available. The correct desired output of the delta rule is replaced by a translated and resampled version of the current internal image in a different retinal position. Here, we report on the behavior of this translation-invariance (TI) algorithm calibrating a visual system in which the receptor array experiences random drop-outs. We hoped that the model behavior might provide some insight into the recalibration problem faced by people suffering retinal degeneration and that we would also learn about the suitability of the algorithm for recalibrating remote sensor systems suffering such loss.
2.1 The Visual System Model
To keep calculations manageable, we will simulate a simplified model of the visual system. We assume the input receptors are in a regular rectangular array. We also assume an internal array of corresponding units. The input receptors are connected to the internal units by a linear weighting network. Initially, the weights to corresponding units will be unity and the weights to others zero; the network will just copy the receptor inputs to the internal units. An internal image is interpolated from the internal units using a discrete equivalent of sinc interpolation functions, pulses filtered by a rectangular filter in the spatial frequency domain. Initially, if an input image has no spatial frequency components above the Nyquist frequency of the sampling array, the image will be correctly reconstructed. The initial state is illustrated on the left side of Fig. 2.
Figure 2. (Left) An input image is sampled by receptors, copied to the internal units and reconstructed from a linear combination of interpolation functions. (Right) Collateral weights added to the network to compensate for a missing receptor.
The input function illustrated is a pulse centered over the central unit. The pulse has been low-pass filtered at the Nyquist frequency of the sampling array. It has zero contrast at the other receptors, so the interpolated image is the interpolation function from the central unit alone. The correctness of the interpolation illustrates that the interpolation functions are filtered pulses.
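The interpolation step can be illustrated in one dimension. The following is a minimal sketch, not the simulation code: it assumes a single row of 7 unit-spaced receptors with wrap-around, and it uses the periodic (Dirichlet) form of the sinc function as the "discrete equivalent" of sinc interpolation. The names `periodic_sinc`, `interpolate`, and `image` are introduced here for illustration only.

```python
import math

def periodic_sinc(x, n):
    """Discrete (periodic) sinc for an odd number n of unit-spaced samples."""
    denom = math.sin(math.pi * x / n)
    if abs(denom) < 1e-12:      # x is a multiple of the period n
        return 1.0
    return math.sin(math.pi * x) / (n * denom)

def interpolate(units, x):
    """Internal image at position x (in receptor spacings) from unit values."""
    n = len(units)
    return sum(r * periodic_sinc(x - j, n) for j, r in enumerate(units))

def image(x, n=7):
    """Band-limited test input: 2 cycles across the row of n receptors."""
    return math.cos(2 * math.pi * 2 * x / n)

# Initial network: the internal units simply copy the receptor samples.
units = [image(j) for j in range(7)]

# A sub-Nyquist input is recovered between the sample points.
print(interpolate(units, 1.3), image(1.3))

# A unit pulse on the central unit is interpolated by one kernel alone.
pulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(interpolate(pulse, 3.0), interpolate(pulse, 5.0))
```

With 7 samples the kernel reproduces any component of up to 3 cycles per row, which is why the 2-cycle test input is recovered at the off-grid position.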
We simulate the loss of a receptor by removing it and its connections from the network. If the network were otherwise left unchanged, the internal unit for that receptor would always read zero, and the output image would inherit this hole. Connections from the remaining receptors to the unit corresponding to the missing receptor allow them to fill in an estimate of what the missing receptor would have seen, as illustrated on the right side of Fig. 2. This network cannot reproduce the input shown on the left, which no longer activates any receptor, but it can correctly reproduce stimuli that are band-limited to the Nyquist frequency of the less dense sampling array. The image on the right is the same pulse with enough of the highest spatial frequencies removed so that the pulse is sub-Nyquist for the less dense array. It is now reconstructed from a weighted sum of interpolation functions having the shape of the one on the left. Learning rules will allow the development of appropriate weights.
Let s(i) be the value of the input image at the ith sample point (receptor position), and let r(j) be the value of the internal representation at the jth position. We compute the internal values as a linear combination of the remaining receptors. Let s'(i) represent the values of the remaining receptors: s'(i) = s(i), if receptor i is present; s'(i) = 0, if receptor i is missing. Then
r(j) = \sum_i s'(i) w(i,j), (1)
where w(i,j) is the network weight from receptor i to the internal unit at position j.
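Eq. (1) can be written out directly. The sketch below is illustrative only: the 5-receptor row, the sample values, and the averaging weights used to fill in the lost receptor are made-up examples, not learned weights from the simulations.

```python
def internal_values(s, present, w):
    """Eq. (1): r(j) = sum_i s'(i) w(i,j), with s'(i) = 0 for lost receptors."""
    n = len(s)
    s_prime = [s[i] if present[i] else 0.0 for i in range(n)]
    return [sum(s_prime[i] * w[i][j] for i in range(n)) for j in range(n)]

n = 5
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
s = [0.2, 0.4, 0.6, 0.8, 1.0]
present = [True, True, False, True, True]       # receptor 2 is lost

# Unchanged network: unit 2 reads zero, leaving a hole in the internal image.
r_hole = internal_values(s, present, identity)

# Hypothetical collateral weights: unit 2 averages its two neighbours.
w = [row[:] for row in identity]
w[1][2] = 0.5
w[3][2] = 0.5
r_filled = internal_values(s, present, w)
print(r_hole, r_filled)
```

For this linear ramp the neighbour average happens to recover the lost value; in general the appropriate weights depend on the stimulus ensemble and have to be learned.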
2.2 The Delta Rule
The delta rule was developed to compute weights from a sequence of inputs and corresponding desired outputs. From a rich enough set of images for which the missing points are known, compensating weights can be found by the delta rule. This rule is also called the Widrow-Hoff rule, least mean square learning, outer product learning and back propagation (Stone, 1986). The delta rule is error correcting. The error e(j) for output unit j is the difference between the output of the unit r(j) and the image at the jth position s(j),
e(j) = r(j) - s(j). (2)
The weight from receptor i to unit j, w(i,j), is adjusted by subtracting a fraction lambda of the error multiplied by the output of receptor i:
w(i,j) = w(i,j) - lambda s'(i) e(j). (3)
The solid curve in Fig. 3 shows the error decreasing over trials in an example run of the delta rule learning weights to fill in for one receptor missing from a 7 by 7 array.
Figure 3. Root-mean-square (RMS) difference between the weighted network outputs and the actual image samples as a function of trials for the learning rules described in the text. Weights are being adjusted to fill in for one receptor missing from a 7 x 7 array. Images are white noise low-pass filtered so that the sampling is still adequate for reconstruction.
The images used in the simulation were noises synthesized from Fourier components with equal expected amplitudes. The amplitudes of the sine and cosine components were independent, identically-distributed Gaussian random variables with zero mean. The images included the 25 sine and cosine components having horizontal or vertical spatial frequencies as high as 2 cycles per image. The image resolution was 16 (4 by 4) pixels per receptor. Images were selected for a trial by translating the position of the noise image to a new position selected at random except that it had to move at least one receptor spacing in each dimension (x and y). To simplify calculations, wrap-around motion was simulated. After each block of 100 trials a new noise image was computed. The learning rate lambda was set to the value 0.5 divided by the sum of the squared s'(i) values. The weight matrix was initialized to be the identity matrix. The left side of Fig. 4 shows the weights that were learned for the output whose input is now missing. The weights to the other units are not affected by the delta rule because their error is always zero.
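The stimulus construction can be sketched as below. Several details are assumptions made for illustration: the 25 components are read as one DC term plus 12 cosine/sine pairs with horizontal and vertical frequencies from -2 to 2 cycles per image, the amplitudes have unit variance, and a translation is drawn in whole pixels (4 per receptor) between 1 and 6 receptor spacings so that it is at least one spacing even after wrap-around. The function names are hypothetical.

```python
import math
import random

def make_noise_image(rng, max_freq=2, size=7.0):
    """Return f(x, y), a periodic noise image built from Fourier components."""
    comps = []
    for fx in range(-max_freq, max_freq + 1):
        for fy in range(-max_freq, max_freq + 1):
            if (fx, fy) > (0, 0):           # one half-plane: 12 cos/sin pairs
                comps.append((fx, fy, rng.gauss(0, 1), rng.gauss(0, 1)))
    dc = rng.gauss(0, 1)

    def f(x, y):
        total = dc
        for fx, fy, a, b in comps:
            phase = 2 * math.pi * (fx * x + fy * y) / size
            total += a * math.cos(phase) + b * math.sin(phase)
        return total
    return f

def sample(f, dx, dy, n=7):
    """Receptor samples on an n-by-n array with the image shifted by (dx, dy)."""
    return [[f(col + dx, row + dy) for col in range(n)] for row in range(n)]

def random_shift(rng, n=7, pixels_per_receptor=4):
    """Shift in whole pixels, between 1 and n - 1 receptor spacings."""
    steps = rng.randrange(pixels_per_receptor, pixels_per_receptor * (n - 1) + 1)
    return steps / pixels_per_receptor

rng = random.Random(0)
f = make_noise_image(rng)
s = sample(f, random_shift(rng), random_shift(rng))
```

Because the image is defined by periodic functions, wrap-around motion needs no special handling: shifting by a full array width returns the same samples.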
Figure 4. Weights computed by the simulations giving the learning curves of Fig. 3 for the delta rule and the TI rule.
In general, the delta rule can be shown to provide weights that minimize the expected squared error if the learning rate is slowly brought toward zero. In cases, as above, in which a perfect (zero expected error) solution exists, the rule causes the weights to converge towards a solution for a range of fixed learning rates (Widrow & Hoff, 1960; Widrow & Stearns, 1985). Unfortunately, although this rule uses biologically plausible computations, is well behaved, and well understood, the information that it needs, the stimulus input to the missing receptor, is not known.
2.3 The TI Rule
The TI rule was developed for the case that the input image is not known. It bases its adjustments on the output of the weighting network. The output is computed for an input image. The eye is moved a known amount and the output is computed again. This second output plays the role of r(j) in the delta rule. The output from the previous image is then translated to compensate for the eye movement and then interpolated and sampled. These samples are used in place of s(j) in the above formula for e(j). (Complete formulas for the TI rule are given elsewhere: Ahumada, 1992; Ahumada & Mulligan, 1990; Maloney & Ahumada, 1989.) This feedback does not directly drive the weights toward a correct solution, but it drives them toward a translation-invariant solution, in which the interpolated output translates along with translations in the input. Correctness is obtained by forcing one of the units to be correct. We do this by connecting one output to only one input, making that weight unity, and not changing it. The translation-invariance learning algorithm then tries to copy this fixed receptive field to all the others.
Fig. 3 also contains the learning curve for the TI rule. Although the TI rule learns much more slowly, at the end of 6400 trials the weights were indistinguishable from those of the delta rule after 1200 trials, as shown in Fig. 4. It is easy to understand why the learning is so much slower for the TI rule. For the delta rule, errors only occur at the output whose input is missing, so only the weights in the receptive field of that unit are altered. The TI rule, however, will see a translation error at any position translated to or from the image region near the missing input. Initially it changes all the correct weights and then puts them back as it also finds the weights which fill in for the missing input.
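The following one-dimensional sketch shows how the translation error is formed and where it appears. It is a simplification under stated assumptions, not the published formulation: the receptors lie on a ring, the eye movement is a whole number k of receptor spacings (so the interpolation and resampling step is not needed), and the names `forward`, `ti_errors`, and `ti_step` are introduced here.

```python
def forward(s, present, w):
    """Eq. (1), with lost receptors contributing zero."""
    n = len(s)
    sp = [s[i] if present[i] else 0.0 for i in range(n)]
    return [sum(sp[i] * w[i][j] for i in range(n)) for j in range(n)]

def ti_errors(s1, k, present, w):
    """Translation error after the image shifts k receptors along the ring."""
    n = len(s1)
    s2 = [s1[(i - k) % n] for i in range(n)]        # samples after the movement
    r1 = forward(s1, present, w)
    r2 = forward(s2, present, w)
    target = [r1[(j - k) % n] for j in range(n)]    # previous output, translated
    return s2, [r2[j] - target[j] for j in range(n)]

def ti_step(w, s1, k, present, lam, fixed_unit=0):
    """One TI update: Eq. (3) with the translated output as the target."""
    s2, e = ti_errors(s1, k, present, w)
    n = len(s1)
    for i in range(n):
        if not present[i]:
            continue                                # no connection to adjust
        for j in range(n):
            if j != fixed_unit:                     # the anchoring unit is frozen
                w[i][j] -= lam * s2[i] * e[j]
    return e

n = 7
w = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
s1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
intact = [True] * n
lost3 = [i != 3 for i in range(n)]

_, e_intact = ti_errors(s1, 2, intact, w)   # identity network, nothing lost
_, e_lost = ti_errors(s1, 2, lost3, w)      # receptor 3 lost, shift of 2
e_step = ti_step(w, s1, 2, lost3, lam=0.01)
print(e_intact, e_lost)
```

With receptor 3 lost and a shift of 2, errors appear at unit 3 and at unit 5, the position the hole is translated to. The update therefore also disturbs the initially correct weights into unit 5, which is the behavior that slows the TI rule relative to the delta rule.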
2.4 Inadequate Sampling
In the previous example, the sampling density of the receptors was high enough so that a perfect reconstruction of the band-limited stimulus was possible from the remaining receptors. In this case both rules find weights which provide perfect reconstruction. When the sampling array is no longer capable of reconstructing the stimuli, we encounter a situation where there is no perfect solution. In this case, the delta rule can be shown to find an average least squared error solution, if the learning rate is slowly decreased to zero. The solid and dashed lines in Fig. 5 show the learning performance of the two rules for such an example.
Figure 5. RMS errors from simulations with images that are `pink' noise, low-pass filtered so that the sampling was adequate for reconstruction before the receptor was lost but not after.
The parameters were the same as the previous example except that the noise bandwidth extends to the Nyquist frequency of the original 7 by 7 sampling array. Also, since the coefficients now depend on the content of the stimuli, the amplitude of spatial frequency components of the noise was made inversely proportional to their radial frequency. Finally, the learning rate coefficient was decreased from 0.5 to 0.01, so that the learning itself would not be a large source of error at asymptote. If white noise had been used, rather than this `pink' noise, the missing sample would have been independent of the other samples and no weightings would improve performance. The Gaussian nature of the noise ensures expected squared error optimality of a linear predictor of the missing sample, and thus the delta rule weightings will be nearly optimal. The figure shows that the delta rule error rate becomes stable, but while the TI rule performs well initially, its error rate later becomes as poor as at the start of learning. A check of the weights during learning revealed that the originally correct weights keep dropping in value. They do not return to their correct value as they do when the number of sampling receptors is adequate for correct reconstruction. The rule is probably hunting for a compromise between the least squared error solution and the only solution with no translation error, the zero weight solution.
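The least squared error solution referred to above can be written explicitly. As a sketch, let m denote the lost receptor, and let C and c (symbols introduced here, not used elsewhere in the text) be the second moments of the samples over the stimulus ensemble. Minimizing the expected square of the error of Eq. (2) at unit m with respect to the weights w(k,m) gives the normal equations:

```latex
\begin{aligned}
E\big[e(m)^2\big] &= E\Big[\Big(\sum_{i \ne m} s(i)\,w(i,m) - s(m)\Big)^{2}\Big],\\
\frac{\partial E[e(m)^2]}{\partial w(k,m)} = 0
  \;&\Longrightarrow\; \sum_{i \ne m} C(k,i)\,w(i,m) = c(k), \qquad k \ne m,\\
C(k,i) &= E\big[s(k)\,s(i)\big], \qquad c(k) = E\big[s(k)\,s(m)\big].
\end{aligned}
```

For white noise filling the full band of the original array the samples are uncorrelated, so c(k) = 0 for every remaining receptor and the best weights are all zero, which is the sense in which no weighting would improve performance. For the pink noise, c(k) is non-zero for nearby receptors and the solution fills in a useful estimate.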
2.5 A New Rule
One possible solution to this problem is to increase the number of receptive fields constrained to be correct. If the loss of a receptor triggers weight learning for weights to its output unit and not to others, the output units for the remaining receptors remain correct. It seems plausible that the physiological consequences of losing a receptor could trigger the learning process. The lack of activity of the internal unit could also initiate the learning process. The dotted line in Fig. 3 shows that this rule can learn almost as fast as the delta rule when the sampling is adequate. The dotted line in Fig. 5 shows that this rule can learn the same stable solution as the delta rule when the sampling is inadequate. Fig. 6 shows the weights learned by the new rule and the delta rule when the sampling is inadequate.
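A reduced version of this restricted rule can be simulated in a few lines. The sketch below makes strong simplifying assumptions: a ring of 7 receptors instead of a 7 by 7 array, signals limited to 2 cycles per ring (so sampling remains adequate after the loss), eye movements of whole receptor spacings (so no interpolation is needed), and the normalized learning rate of 0.5. All names are introduced here. The target is always taken from the network's own translated output, never from the true value at the lost receptor.

```python
import math
import random

N, LOST = 7, 3                      # ring of 7 receptors; receptor 3 is lost

def noise_image(rng):
    """Random band-limited signal on the ring (0, 1 and 2 cycles)."""
    c = [rng.gauss(0, 1) for _ in range(5)]
    def f(x):
        a = 2 * math.pi * x / N
        return (c[0] + c[1] * math.cos(a) + c[2] * math.sin(a)
                + c[3] * math.cos(2 * a) + c[4] * math.sin(2 * a))
    return f

def outputs(s, w_fill):
    """Intact units copy their receptors; the orphaned unit uses w_fill."""
    r = list(s)
    r[LOST] = sum(w_fill[i] * s[i] for i in range(N) if i != LOST)
    return r

def train(trials=2000, block=100, seed=0):
    rng = random.Random(seed)
    w_fill = [0.0] * N              # weights into the orphaned unit only
    f, pos = None, 0
    for t in range(trials):
        if t % block == 0:          # new noise image every block
            f, pos = noise_image(rng), 0
        k = rng.randrange(1, N)     # known eye movement, whole spacings
        r1 = outputs([f(pos + i) for i in range(N)], w_fill)
        pos += k
        s2 = [f(pos + i) for i in range(N)]
        r2 = outputs(s2, w_fill)
        e = r2[LOST] - r1[(LOST + k) % N]       # translated previous output
        power = sum(s2[i] ** 2 for i in range(N) if i != LOST)
        for i in range(N):
            if i != LOST:
                w_fill[i] -= 0.5 / power * s2[i] * e
    return w_fill

def rms_fill_error(w_fill, images=200, seed=1):
    """RMS difference between the filled-in value and the true sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(images):
        f = noise_image(rng)
        s = [f(i) for i in range(N)]
        total += (outputs(s, w_fill)[LOST] - s[LOST]) ** 2
    return math.sqrt(total / images)

w_learned = train()
print(rms_fill_error([0.0] * N), rms_fill_error(w_learned))
```

In this reduced setting the translated target comes from an intact unit and so coincides with what the lost receptor would have seen, which makes the restricted rule behave like the delta rule. With fractional eye movements or several neighbouring losses the target is only an interpolated estimate, and that case is not covered by this sketch.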
Figure 6. Weights from the inadequate sampling simulations for the delta rule and the TI rule adjusting only the weights to the output unit which lost its receptor.
2.6 A Final Example
To illustrate that the above conclusions hold when more than one receptor is missing, a final example simulation is shown. The conditions were the same as the previous case of inadequate sampling except that the size of the sampling array is increased to 11 by 11 and 30 per cent of the units were deleted at random. Fig. 7 shows that in this situation the TI rule performs worse than it does when only one receptor is missing.
Figure 7. RMS errors for the three algorithms for the 11 x 11 sampling array with 36 (30 per cent) of the receptors missing.
The larger number of missing receptors causes a faster erosion of the correct coefficients. The performance of the TI rule applied only to the receptive fields of the missing receptors' output units is also worsened, but by a relatively small amount. Fig. 8 shows the weights learned by the new rule and the delta rule in a run where the sampling is inadequate and 30 per cent are missing.
Figure 8. Weights for the 11 x 11 sampling array computed by the delta rule (left) and the TI rule adjusting only the weights to the output units which lost their receptors (right). The X marks the position of the missing receptor for which these weights are compensating. The other 35 missing receptors have no symbol.
These weights are after 100,000 learning trials. The weights for the revised TI rule are similar to those of the delta rule after 50,000 trials and, presumably, are converging to a similar asymptote. The learning rates for these simulations were chosen for convenience and not intended to optimize performance. Performance closer to the range of the adequately sampled case might be achieved by using schedules with decreasing learning rates.
3.1 Known Translations
One apparent weakness of the TI model is that the eye movement is assumed to be known exactly. Ahumada and Mulligan (1990) have shown that this assumption can be relaxed somewhat. They showed that when the TI rule is learning weights to compensate for small amounts of jitter in the input unit positions, the exact eye movement knowledge can be replaced by an estimate based on maximizing the correlation between the two output images. We speculate that the same situation will hold for moderate fractions of missing receptors.
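The idea of estimating the movement from the outputs themselves can be sketched for the simplest case. The example assumes a ring of units and whole-spacing shifts, and it is not the procedure of Ahumada and Mulligan (1990); `estimate_shift` and the sample values are invented for illustration.

```python
def estimate_shift(r1, r2):
    """Circular shift k that maximizes the correlation of r2 with shifted r1."""
    n = len(r1)

    def corr(k):
        return sum(r1[(j + k) % n] * r2[j] for j in range(n))

    return max(range(n), key=corr)

r1 = [0.1, 0.9, -0.4, 0.3, -0.7, 0.5, 0.2]
r2 = [r1[(j + 3) % 7] for j in range(7)]    # second view: r1 shifted by 3
k_hat = estimate_shift(r1, r2)
print(k_hat)
```

When the second output is an exact shifted copy of the first, the correlation peaks at the true shift. Whether the peak stays at the right place when many receptors are missing is the open question raised in the text.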
3.2 The Interpolated Image
Barlow has proposed an anatomical layer of fine cells in the cortex as the site for the interpolation of a cortical image (Barlow, 1979). The need for such an image has been criticized on the grounds that a homunculus would then be needed to look at it. The TI model needs the interpolated image to obtain its feedback samples unless the eye movements are made in integral multiples of the internal spacing. Other visual processing channels could sample the interpolated image according to their own independent needs.
On the other hand, separate processing channels might carry out their own calibration independently. If a channel can be characterized by a linear impulse response that it computes over space in a translation invariant manner, the TI rule can be used to calibrate it separately. Ahumada and Tabernero (1994) have shown that separate spatial frequency channels can be calibrated independently with the benefit that associative learning processes can be used instead of a fixed correct output unit to provide a target receptive field. Our TI schemes above are useless if the fixed unit is the one that turns out to have its input missing. In addition, a calibration scheme based on separate bandpass spatial frequency channels should be more resistant to the aliasing problems caused by missing receptors, since bandpass images are just as easy to reconstruct as lowpass images using linear weighting functions.
3.3 Two Views from Two Eyes
It is possible that the two views used by the TI method might come from separate eyes rather than two views from the same eye over time. We have not tried to implement this interesting idea. We do know that in order to obtain adequate calibration, the views must differ in more than a single direction (the eyes cannot be assumed to diverge in only the horizontal egocentric direction). Craik (1966) `sunburned' one retina and describes distortions in his undamaged eye apparently induced from adaptations to damage in the other eye, consistent with a model in which inputs from both eyes are mutually calibrated. If there were three eyes, majority rule could prevent miscorrection of an undamaged eye. Similarly, three views over time could prevent the problem of the TI rule messing up the good connections and make unnecessary the restriction of the learning to the region of the missing receptors.
3.4 Partial Damage
It seems unlikely that any destructive process causing diffuse degeneration of receptors would be completely all or none. Cells would be expected to suffer partial damage, which could be modeled as some combination of position distortion, gain change, and noise level increase. We have seen the TI model work for position and gain changes, but as yet have no information about its performance in the case of noisy receptors.
We have investigated the ability of the Maloney and Ahumada TI rule to recalibrate a receptor array after receptor loss. We confirmed the expected result that when the remaining array is adequate to reconstruct the stimuli, the TI rule can find a compensating transformation. However, when there is no perfect solution, the TI rule initially improves the situation by computing weights that fill in, but eventually erodes the correct values of weights from receptors to their corresponding outputs. This problem is easily removed by allowing the TI rule to adjust only the weights in the receptive fields of units that have lost their receptors. This modification of the TI rule provides a model for how recalibration for missing receptors might take place.
This work was presented to the 13th European Conference on Visual Perception, Paris, France, in September, 1990, (Ahumada & Turano, 1990) and parts of it were presented to the SPIE Meeting in San Jose, California, in February, 1991 (Ahumada & Mulligan, 1991). This work was supported by NASA RTOP 506-71-51 and NIH/NEI Grant EY07839. Helpful comments were provided by R. L. Gregory, M. Shiffrar, and M. S. Landy.
Ahumada, Jr., A. J. (1992). Learning receptor positions. In Landy, M. S. & Movshon, J. A. (Eds.), Computational Models of Visual Processing (pp. 23-34). Cambridge, Massachusetts: MIT Press.
Ahumada, Jr., A. J. & Mulligan, J. B. (1990). Learning receptor positions from imperfectly known motions. In Rogowitz, B. & Allebach, J. (Eds.), Human Vision, Visual Processing, and Digital Display, Proceedings of the SPIE, Volume 1249 (pp. 124-134).
Ahumada, Jr., A. J. & Mulligan, J. B. (1991). Network compensation for missing sensors. In Rogowitz, B., Brill, M. H. & Allebach, J. (Eds.), Human Vision, Visual Processing, and Digital Display, Proceedings of the SPIE, Volume 1453 (pp. 134-146).
Ahumada, Jr., A. J. & Tabernero, A. (1994). Anti-Hebbian learning and cortical receptive field calibration. Investigative Ophthalmology and Visual Science, 35 (4, ARVO Suppl.), 1257 (Abstract).
Ahumada, Jr., A. J. & Turano, K. (1990). Calibration of a visual system with progressive receptor drop-out. Perception, 19, 337 (Abstract).
Barlow, H. B. (1979). Reconstructing the visual image in space and time. Nature, 279, 189-190.
Craik, K. J. W. (1966). The nature of psychology. Cambridge, UK: Cambridge University Press.
Flannery, J. K., Farber, D. B., Bird, A. C. & Bok, D. (1989). Degenerative changes in a retina affected with autosomal dominant retinitis pigmentosa. Investigative Ophthalmology & Visual Science, 30, 191-211.
Maloney, L. T. & Ahumada, Jr., A. J. (1989). Learning by assertion: Two methods for calibrating a linear visual system. Neural Computation, 1, 392-401.
Massof, R. W. & Finkelstein, D. (1987). A two-stage hypothesis for the natural course of retinitis pigmentosa. In Zrenner, E., Krastel, H. & Goebel, H. (Eds.), Advances in the Biosciences: Research in Retinitis Pigmentosa (pp. 29-58). New York: Pergamon Press.
Stone, G. O. (1986). An analysis of the delta rule and the learning of statistical associations. In Rumelhart, D. E. & McClelland, J. L. (Eds.), Parallel Distributed Processing, Vol. I (pp. 444-459). Cambridge, Massachusetts: MIT Press.
Szamier, R. B., Berson, E. L., Klein, R. & Meyers, S. (1979). Sex-linked retinitis pigmentosa: Ultrastructure of photoreceptors and pigment epithelium. Investigative Ophthalmology and Visual Science, 18, 145-160.
Turano, K. (1991). Bisection judgments in patients with retinitis pigmentosa. Clinical Vision Science, 6, 119-130.
Widrow, A. B. & Hoff, M. E. (1960). Adaptive switching circuits. In WESCON Convention Record, Part 4, (pp. 96-104).
Widrow, A. B. & Stearns, S. D. (1985). Adaptive signal processing. Englewood Cliffs, NJ: Prentice-Hall.