Antonio Tabernero
Instituto de Optica, Serrano 121
Madrid 28006, Spain
Albert J. Ahumada Jr.
NASA Ames Research Center, MS 262-2
Moffett Field, CA 94035
The simple cells of primary visual cortex have been proposed to result from Hebbian associative learning [15]. Sanger's model [2] of this theory develops a `column' of units supported by a small patch of visual field. The receptive fields of these units converge to the eigen vectors of the covariance matrix of the retinal inputs. Since many eigen values are effectively equal, columns associated with different patches do not in general have units with similar receptive fields in corresponding positions in the columns. Maloney and Ahumada's translation-invariance learning algorithm [1] can ensure that a layer of linear receptive fields is uniform. It can force the corresponding column elements to have the same receptive field. Simulations of these two processes on regularly sampled arrays show that they can cooperatively find eigen vectors and force corresponding column elements to have the same receptive fields. When the image is sampled at slightly disordered positions, the eigen vectors of different columns no longer are identical but the two processes still cooperate to give an appropriate output network.
In this work we combine two previously proposed schemes, Hebbian associative learning and a translation-invariance learning algorithm, to develop columns of units modeling the simple cells of primary visual cortex.
Following the pioneering work of von der Malsburg [14] many have proposed that Hebbian association principles might account for the development of the spatial frequency and orientation selective cells with approximately linear receptive fields (RFs), the simple cells of primary visual cortex. Two recent works contain references to the early development of these ideas [11, 12]. Sanger's recent model [2] of this theory is especially convenient because of the mathematical elegance of the result, originally found by Oja [13]. Sanger's model develops a `column' of units supported by a small patch of visual field. Some of these learned eigen vectors resemble edge and bar detectors, orientation specific and somewhat localized in the spatial frequency domain. Since some RFs of cells in the visual cortex [4, 3] also show this selectivity, this model can generate cortex-like RFs. The RFs of these units converge to the eigen vectors of the covariance matrix of the retinal inputs, ordered by the size of the eigen values. Since many eigen values are effectively equal, columns supported by different patches do not in general have units with similar RFs in corresponding positions in the columns.
Maloney and Ahumada [1] proposed a translation-invariance (TI) algorithm to calibrate a simple linear visual system. Their method can compensate for irregularities in the sampling lattice. Given a disordered sampling lattice, the trained weights of the neural network transform the inputs so that the outputs are the values of the original image sampled at regular positions. The network therefore forces the RFs of all output units to be equal. In their case all the RFs are delta functions, provided the inputs are suitably low-passed for the possible distortion of the sampling array. The TI learning algorithm by itself only tries to make all the RFs equal; the RFs all end up as delta functions because one RF is fixed to be a delta function by connecting it to a single input. The same algorithm can clearly be used to train the network to compute the output of any given RF; for example, it could generate a layer of cells having a specific orientation and spatial frequency. However, to replicate an RF, we need one fixed cell with the desired profile. Having one cell with this special status does not seem biologically realistic.
The main idea of this paper is to apply Sanger's Hebbian algorithm and the TI algorithm simultaneously. Our hope was that they would be compatible and the Hebbian algorithm would develop cortex-like RFs and the TI algorithm would organize them into corresponding layers in each column across the visual field, correcting problems associated with the irregularities of the sampling array.
We will be modeling a simple visual system with linear connections between the inputs and the outputs. The input will consist of an array of photoreceptors that will sample the training image. The size of this array will be $N$ (although we will show results for the 2D case, all the discussion will be restricted to the 1D case for the sake of simplicity). The output will be $N \times N_{RF}$ 'cells', $N_{RF}$ being the number of different kinds of RFs at each of the $N$ output positions.
Therefore, the weights $W_{r,j,i}$ connecting the inputs $x_i$ and the outputs $y_{r,j}$ will be triply subscripted, so that
$$y_{r,j} = \sum_i W_{r,j,i}\, x_i , \qquad \text{Eq. (1)}$$
where $r$ ranges over the $N_{RF}$ levels at a position, and $i$ and $j$ range over the $N$ positions. In particular, if we fix the type of RF ($r$) and an output position ($j$), the resulting set of weights is what we call the RF of that cell.
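As a minimal sketch of Eq. (1) in the 1D case, assuming NumPy, the triply subscripted weights can be held in an array indexed as `W[r, j, i]`. The function name `network_output` and the sizes `n_rf` and `n_pos` are illustrative choices, not part of the original work.

```python
import numpy as np

def network_output(W, x):
    """Eq. (1): y[r, j] = sum_i W[r, j, i] * x[i]."""
    return np.einsum("rji,i->rj", W, x)

rng = np.random.default_rng(0)
n_rf, n_pos = 3, 7                         # illustrative sizes (assumed)
W = rng.normal(size=(n_rf, n_pos, n_pos))  # random weights, for shape checking only
x = rng.normal(size=n_pos)                 # one sampled 1D input

y = network_output(W, x)                   # shape (n_rf, n_pos)
rf = W[1, 4]                               # RF of the unit of type r=1 at position j=4
```

Fixing `r` and `j` selects one row of weights, so the response of that cell is the inner product `rf @ x`.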
In the combined training procedure, the TI algorithm is applied
independently for each RF type (r is fixed),
and the Hebbian algorithm is applied independently at each
fixed output position.
3.1 The Hebbian component
Sanger's Generalized Hebbian Algorithm increments the weights of the network
after each iteration by
$$\Delta W_{r,j,i} = \gamma\, y_{r,j} \Bigl( x_i - \sum_{k \le r} W_{k,j,i}\, y_{k,j} \Bigr), \qquad \text{Eq. (2)}$$
where $\gamma$ is the Hebbian learning rate.
For r = 1, the weights are being incremented according to the correlation
between the inputs and the outputs.
For outputs at higher levels in the columns, after the weights are close to
the eigen vectors of the input covariances, the outputs are then
correlated with the residuals of the inputs after parts correlated with
the lower units have been extracted.
Note that in this fully connected system,
if the weights for two column positions
j and j' are equal at some point in time,
they will be equal thereafter.
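The Hebbian update of Eq. (2) can be sketched for a single column as follows, assuming NumPy and assuming that the sum runs over $k \le r$ as in Sanger's formulation [2]. The input statistics (a diagonal covariance with distinct variances), the learning rate, and the number of iterations are illustrative choices, not values from the original simulations.

```python
import numpy as np

def gha_step(W, x, gamma):
    """One update of Eq. (2) for a single column j.

    W has shape (n_rf, n_in); row r is the RF of unit r in the column.
    """
    y = W @ x
    # Row r of the second term is y[r] * sum_{k <= r} y[k] * W[k], obtained from
    # the lower triangle (diagonal included) of the outer product y y^T.
    return W + gamma * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

rng = np.random.default_rng(0)
variances = np.array([5.0, 2.0, 1.0, 0.5])   # assumed diagonal input covariance
W = 0.1 * rng.normal(size=(2, 4))            # a column of two units, small random start
for _ in range(10000):
    x = np.sqrt(variances) * rng.normal(size=4)
    W = gha_step(W, x, gamma=0.002)
```

With this covariance the eigen vectors are the coordinate axes, so the two rows of `W` should approach the first and second axes (up to sign), ordered by the size of the eigen values.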
3.2 The TI component
The TI algorithm tries to match what it sees after an 'eye movement'
with what it would expect from its knowledge of the sampled image before
the translation. It will succeed if it learns the positions of the
sampling array. At that point the system will be calibrated. In order to
do so, for each simulated eye movement the system computes the expected
new image, and then compares it with the real one. This gives an error
term that is used to correct the weights with a modified Widrow-Hoff
algorithm.
The Widrow-Hoff algorithm [9, 10]
adjusts weights to make the future outputs more like a desired output
by the increment
$$\Delta W_{r,j,i} = \lambda\, (y'_{r,j} - y_{r,j})\, x_i , \qquad \text{Eq. (3)}$$
where $y'_{r,j} - y_{r,j}$ is the difference between the desired
output (usually provided by the trainer or supervisor) and the
actual output, and $\lambda$ is the learning rate.
In the TI procedure, $y'$ is not a desired output supplied by a trainer; it is
the output image preceding the eye movement, translated (by interpolation
and re-sampling) to the corresponding position.
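A minimal 1D sketch of this TI step for one RF type, assuming NumPy, is given below. The periodic boundary, the linear interpolation, and the sign convention of `shift` are assumptions made for illustration; the helper names `translate` and `ti_step` are hypothetical.

```python
import numpy as np

def translate(y_prev, shift):
    """Resample a periodic output image at positions j + shift (linear interpolation)."""
    pos = np.arange(len(y_prev))
    return np.interp(pos + shift, pos, y_prev, period=len(y_prev))

def ti_step(W, x_new, y_prev, shift, lam):
    """One update of Eq. (3) for one RF type r, with y' predicted from y_prev."""
    y_target = translate(y_prev, shift)   # y': output expected after the movement
    y = W @ x_new                         # actual output after the movement
    return W + lam * np.outer(y_target - y, x_new)

# A calibrated network on a regular array (W = identity) should be left unchanged.
n = 8
W = np.eye(n)
x_prev = np.sin(2 * np.pi * np.arange(n) / n)
x_new = np.roll(x_prev, -1)               # the image moved by one sample spacing
W_next = ti_step(W, x_new, W @ x_prev, shift=1, lam=0.5)
```

In this example the predicted and actual outputs coincide, so the error term vanishes and `W_next` equals `W`.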
3.3 The combined procedure
The training procedure is as follows.
An image is generated, presented, and sampled by the array of photoreceptors.
These samples will be the input data for both the TI and the Hebbian algorithms.
The TI algorithm is applied first.
Since we want to propagate different types of RFs,
the TI algorithm is applied independently to each level.
Then, the Hebbian algorithm is applied to each of the positions in the
network.
After this, a random 'eye movement' is produced, the image is sampled
again, and we repeat the whole procedure.
After a number of repeated viewings of the same image (usually from 10 to 50),
a new image is generated and the process begins again.
The learning rate of the TI algorithm is constant (approximately 0.5),
whereas that of
the Hebbian algorithm is reduced according to the following rule:
The Hebbian algorithm error is averaged over blocks of about 10 images,
and if the average error of a block is lower than that of the
previous one, the learning rate is assumed to be appropriate and kept.
Otherwise, the rate is reduced by a constant factor
(approximately 0.75).
The process continues until both the TI and Hebbian errors have decreased
below some previously defined limit. For the results shown below, the
resulting number of training images was usually about 500 - 1000.
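The rate-reduction rule for the Hebbian algorithm can be sketched as below. The function name `update_hebbian_rate`, the initial rate of 0.1, and the block errors are made-up values for illustration; only the reduction factor of approximately 0.75 comes from the text.

```python
def update_hebbian_rate(rate, prev_block_error, block_error, factor=0.75):
    """Keep the rate if the block error fell; otherwise shrink it by `factor`."""
    if prev_block_error is None or block_error < prev_block_error:
        return rate
    return rate * factor

# Average errors of successive blocks of images (made-up numbers for illustration).
block_errors = [1.00, 0.80, 0.85, 0.60, 0.60]
rate, prev = 0.1, None                    # initial rate is an assumption
history = []
for err in block_errors:
    rate = update_hebbian_rate(rate, prev, err)
    history.append(rate)
    prev = err
```

Here the rate is kept after the second and fourth blocks and reduced after the third and fifth, where the average error failed to decrease.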
The training images used here are finite Fourier series,
composed of a limited number of frequencies with random (Gaussian
distributed) amplitudes. After generating the images, we apply a
low-pass Gaussian filter with a standard deviation of approximately
2 sample spaces.
The finite number of spatial frequencies
and the Gaussian low-pass filter can be thought of as representing
the blur produced by the optics of the eye and the low-pass nature
of natural imagery.
The TI algorithm has been seen to perform poorly with inadequately
sampled imagery [7].
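A 1D sketch of such a training image, assuming NumPy, is shown below. It assumes that the image is periodic with a period equal to the array size, that there is no constant term, and that the Gaussian low-pass filter is applied in the frequency domain; the name `make_image` and the jitter range are hypothetical.

```python
import numpy as np

def make_image(n, n_freqs, sigma, rng):
    """Return a function evaluating a random periodic 1D image at any positions.

    n       : period of the image in sample spacings (assumed equal to the array size)
    n_freqs : number of harmonics in the finite Fourier series
    sigma   : standard deviation of the Gaussian low-pass filter, in sample spacings
    """
    k = np.arange(1, n_freqs + 1)
    a = rng.normal(size=n_freqs)            # Gaussian cosine amplitudes
    b = rng.normal(size=n_freqs)            # Gaussian sine amplitudes
    # Frequency response of a Gaussian of spatial std sigma at k/n cycles per sample.
    gain = np.exp(-2.0 * (np.pi * sigma * k / n) ** 2)

    def image(positions):
        phase = 2.0 * np.pi * np.outer(np.atleast_1d(positions), k) / n
        return np.cos(phase) @ (gain * a) + np.sin(phase) @ (gain * b)

    return image

rng = np.random.default_rng(0)
n = 32
image = make_image(n, n_freqs=5, sigma=2.0, rng=rng)
regular = image(np.arange(n))                       # regular sampling
jitter = rng.uniform(-0.25, 0.25, size=n)           # assumed disorder, in sample spacings
irregular = image(np.arange(n) + jitter)            # disordered sampling
```

Returning a function rather than an array lets the same image be sampled at regular or disordered positions, and re-sampled after each simulated eye movement.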
5.1 Regular case
Training the network in the case of a regular sampling array provides
results easy to interpret.
In this case, we obtain a set of different
RFs, each correctly replicated across the different
output positions.
Fig. 2 shows some cases in which the size of the
input array was 7 x 7 (49 photoreceptors). Fig. 2a, 2b, and 2c
correspond to 3, 5, and 7 layers of RFs.
Only the set of RFs at one position is shown since they are all similar.
The obtained RFs correspond to a low-pass filter
(that always appears in the first position)
and several band-pass filters with different orientations.
Almost all the RFs (except for the fourth one in Fig. 2b)
have vertical, horizontal, or 45 deg orientations.
Some of them share the same frequency
and orientation but differ in phase by 90 deg, that is,
they are in phase quadrature.
All of them are orthogonal to each other.
Some of these characteristics resemble those of the RFs of
cells in the visual cortex [3].
In this case, the response of an output 'cell' to a given input, the
inner product between the input and the RF of that 'cell', will
capture a particular spatial-frequency feature of that image.
The translation invariance is clear in this case, since we can see that
the RFs of a particular type at all the output positions are identical to
each other (the degree of similarity among them depends on the limits that
we imposed on the error of the TI algorithm).
5.2 Irregular case
In the case of irregular sampling, RFs can look very different from the
ones of the regular case in terms of their weights $W_{r,j,i}$.
Moreover, these weights are not translation invariant,
since the weights compensate for the
irregularities of the sampling array as well as filtering out some aspect
of the images.
These effects are illustrated in Fig. 3.
Fig. 3a shows weights from a 5 x 5 regular input array
with 5 different layers of RFs.
The resultant RFs are similar to those of Fig. 2.
Now, we repeat the procedure with a disordered sampling array.
The distortion imposed on the (x,y) coordinates of the lattice was :
Therefore, what we should show is whether we can still obtain what we want
(that is, what we were obtaining in the regular case) by applying these new
eigen vectors to the non-regular samples of the image.
Let us begin with the definitions:
$x = (x_i)$ is the column vector of the regular samples of the image,
and
$y = (y_i)$ is the column vector of the non-regular samples.
Given certain restrictions on the input images (which make the TI
algorithm converge, and which have been preserved in this work), we can
recover the regular samples
from the non-regular ones through a conversion matrix $T$. This is exactly
what was being done by Maloney and Ahumada [1]. Therefore,
$$x = T\,y . \qquad \text{Eq. (4)}$$
Then, if $Q(x)$ is the correlation matrix of the regularly sampled input,
$$Q(x) = \bigl(E(x_i x_j)\bigr) = E(x\,x^T) = E\bigl(T y\,(T y)^T\bigr) = T\,E(y\,y^T)\,T^T = T\,Q(y)\,T^T . \qquad \text{Eq. (5)}$$
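The relation in Eq. (5) can be checked numerically with sample estimates of the correlation matrices. In this sketch, assuming NumPy, the matrix `T` and the samples `Y` are random stand-ins chosen for illustration and are not derived from an actual sampling array.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 5, 1000
T = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # hypothetical conversion matrix
Y = rng.normal(size=(n_samples, n))             # each row: one vector y of irregular samples
X = Y @ T.T                                     # each row: x = T y, as in Eq. (4)

Q_y = Y.T @ Y / n_samples                       # sample estimate of E(y y^T)
Q_x = X.T @ X / n_samples                       # sample estimate of E(x x^T)
residual = np.max(np.abs(Q_x - T @ Q_y @ T.T))  # zero up to rounding error
```

The identity holds exactly for the sample estimates, since it depends only on the linearity of the conversion.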
We know that with the Hebbian algorithm we are computing the first
eigen vectors of this matrix (our RFs). The matrix whose rows are
the eigen vectors of $Q(x)$ will be called $C(x)$, and likewise $C(y)$ for $Q(y)$.
Then, we know that for
symmetric matrices like $Q(x)$ or $Q(y)$:
$$A(x) = C(x)\,Q(x)\,C(x)^T$$
$$A(y) = C(y)\,Q(y)\,C(y)^T , \qquad \text{Eq. (6)}$$
where $A(x)$ and $A(y)$ are diagonal matrices.
Since $A(x)$ and $A(y)$ are both diagonal, they can be related through another
diagonal matrix $F$:
$$A(y) = F\,A(x)\,F^T .$$
Substituting Equations 5 and 6 into this relation:
$$A(y) = F\,\bigl(C(x)\,Q(x)\,C(x)^T\bigr)\,F^T = F\,\bigl(C(x)\,(T\,Q(y)\,T^T)\,C(x)^T\bigr)\,F^T = (F\,C(x)\,T)\,Q(y)\,(F\,C(x)\,T)^T . \qquad \text{Eq. (7)}$$
Comparing Equation 7 with the second line of Equation 6, we obtain:
$$C(y) = F\,C(x)\,T . \qquad \text{Eq. (8)}$$
We can see that in the regular case, since the rows of $C(x)$ are the
eigen vectors of $Q(x)$ (the RFs obtained with the Hebbian algorithm),
we were computing
$C(x)\,x$, the inner product between each RF and the input $x$.
In the irregular case we now have:
$$C(y)\,y = F\,C(x)\,T\,y = F\,C(x)\,x = F\,(C(x)\,x) .$$
Since $F$ is diagonal, the only difference is a scale factor for each RF.
Furthermore, the factor is going to be the same for the case of
degenerate eigen values, so the correction will affect equally those pairs
of RFs which differ only in a phase shift.
This shows that what the network is computing in this irregular case is
what we wanted: the 'spontaneous' generation of some weights that
make the output be what it would have been had we applied some cortex-like
filters to a regularly sampled array.
In order to test this we have corrected the 'disordered' RFs in $C(y)$ with
the matrix $T$. Computing $C(y)\,T^{-1}$ should give us something
similar to the original 'ordered' RFs of $C(x)$. Indeed, this is what happens,
as shown in Fig. 3c. In this figure the factors of matrix $F$ are not visible,
since the gray level range has been expanded to span 0 to 255. These factors
can be appreciated if we compute the inner products between the 5 different RFs
at a particular output position (0,0):
Work supported by NASA RTOP 506-71-51 and the Educational Ministry
of Spain, CICYT TIC 91 - 0438.
1. L. T. Maloney and A. J. Ahumada, Jr. (1989) "Learning by Assertion: Two Methods for Calibrating a Linear Visual System", Neural Computation, Vol.1, pp.392-401.
2. T. D. Sanger (1989) "Optimal Unsupervised Learning in a Single-Layer Linear
Feedforward Neural Network", Neural Networks, Vol. 2, pp. 459-473.
3. D. A. Pollen and S. F. Ronner (1983) "Visual Cortical Neurons as Localized
Spatial Filters", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-13,
No. 5, pp. 907-916.
4. S. Marcelja (1980) "Mathematical Description of the Response of Simple
Cortical Cells", J. Opt. Soc. Am., Vol. 70, No. 11, pp. 1297-1300.
5. A. J. Ahumada, Jr. (1992) "Learning Receptor Positions",
in M. Landy and
J. A. Movshon, eds., Computational Models of Visual Processing, MIT Press,
Cambridge, MA, pp. 23-34.
6. A. J. Ahumada, Jr. and J. B. Mulligan (1990)
"Learning Receptor Position from Imperfectly Known Motions",
in B. Rogowitz and J. Allebach, eds.,
Human Vision, Visual Processing, and Digital Display, Proc. SPIE, Vol. 1249,
pp. 124-134.
7. A. J. Ahumada, Jr. and J. B. Mulligan (1991)
"Network Compensation for Missing Sensors",
in B. Rogowitz and J. Allebach, eds.,
Human Vision, Visual Processing, and Digital Display II, Proc. SPIE, Vol. 1453,
pp. 134-146.
8. G. O. Stone (1986) "An Analysis of the Delta Rule and the Learning of
Statistical Associations",
in D. E. Rumelhart and J. L. McClelland, eds.,
Parallel Distributed Processing,
Vol. 1, MIT Press, Cambridge, MA, pp. 444-459.
9. A. B. Widrow and M. E. Hoff (1960) "Adaptive Switching Circuits", Inst. of
Radio Engineers, WESCON Record, Part 4, pp. 96-104.
10. A. B. Widrow and S. D. Stearns (1985) Adaptive Signal Processing, Englewood
Cliffs, NJ, Prentice-Hall.
11. S. Amari (1988) "Dynamical Stability of Formation of Cortical Maps",
in M. A. Arbib and S. Amari, eds., Dynamic Interactions in Neural Networks:
Models and Data, Springer, New York.
12. T. Kohonen (1989) Self-Organization and Associative Memory, Springer,
New York.
13. E. Oja (1982) "A Simplified Neuron Model as a Principal Component Analyzer",
J. Math. Biology, Vol. 15, pp. 267-273.
14. C. von der Malsburg (1973) "Self-Organization of Orientation Sensitive
Cells in the Striate Cortex", Kybernetik, Vol. 14, pp. 85-100.
15. D. O. Hebb (1949)
Organization of Behavior, John Wiley, Inc., New York.
Distortion (x,y) imposed on each point of the 5 x 5 sampling lattice in the irregular case:
| ( 0, 0)  ( 0, 1)  ( 0,-1)  ( 1, 0)  ( 1, 1) |
| ( 0, 1)  ( 1, 0)  ( 0, 0)  (-1, 1)  ( 0, 0) |
| (-1, 1)  ( 0, 1)  (-1,-1)  ( 1, 0)  ( 0, 1) |
| ( 1, 0)  ( 1, 1)  ( 0, 0)  ( 1,-1)  ( 0,-1) |
| ( 0, 1)  ( 0, 0)  (-1,-1)  ( 1, 1)  ( 1, 0) |
where each unit is one fourth of the inter-receptor distance in the
regular array.
With this sampling configuration, the weights obtained at output
position (0,0) are shown in Fig. 3b.
To see if these weights are appropriate, we need to translate them into
weights for the regularly sampled positions.
One way of doing this is to present to this network each of the images that
are appropriately band limited and have the value unity at one of the
regular points and are zero at all the other regular points.
For the above case the image for point (0,0) is the sum of all the
cosine phase components of the DFT, appropriately normalized.
The output to each such image gives the weight of the regularly sampled
equivalent for that position.
Another approach is to use the original TI algorithm to find weights
$T_{j,i}$ that transform the irregular samples into regular samples.
The inverse of this matrix then provides a transformation for converting
weights for the irregular sample positions into weights for the regular
sample positions.
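The two conversions described above can be sketched and compared as follows, assuming NumPy. The matrix `T` here is a random, well-conditioned stand-in for the learned transformation from irregular to regular samples, and `W_irr` is a random stand-in for the learned weights; both are hypothetical and used only to show that the two approaches agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rf = 5, 3
T = np.eye(n) + 0.1 * rng.normal(size=(n, n))   # hypothetical: regular = T @ irregular
W_irr = rng.normal(size=(n_rf, n))              # weights acting on irregular samples

# Approach 1: probe with images that are 1 at one regular point and 0 at the others.
W_probe = np.empty_like(W_irr)
for i in range(n):
    delta = np.zeros(n)
    delta[i] = 1.0
    irregular_samples = np.linalg.solve(T, delta)   # irregular samples of that image
    W_probe[:, i] = W_irr @ irregular_samples       # network output = equivalent weight

# Approach 2: multiply the irregular-sample weights by the inverse of T.
W_inv = W_irr @ np.linalg.inv(T)
```

Both give the same regularly sampled equivalent, because applying `W_inv` to the regular samples `T @ y` reproduces the output of `W_irr` on the irregular samples `y`.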
Figure 3c shows that the weights of Fig. 3b actually correspond to weights
that are similar to those obtained in the regular case and the weights for
the other positions are correspondingly similar.
Inner products between the 5 RFs at output position (0,0):
| 0.694   0.000   0.000   0.000   0.000 |
| 0.000   0.588  -0.001   0.000   0.000 |
| 0.000  -0.001   0.588   0.000   0.000 |
| 0.000   0.000   0.000   0.674   0.000 |
| 0.000   0.000   0.000   0.000   0.675 |