Contrast measures for predicting text readability
L. F. V. Scharff (1) and A. J. Ahumada Jr. (2)
(1) Stephen F. Austin State University, Nacogdoches, TX
(2) NASA Ames Research Center, Moffett Field, CA
Abstract
If text pixels have a constant luminance T on a constant background luminance B, the contrast of the text can be represented by the equation C = T/(pT + (1-p)B) - 1. If p = 0.5, the equation gives the Michelson contrast. If p is the proportion of text pixels, the equation gives the contrast relative to the average luminance. And if p = 0, the equation gives the contrast relative to the background, the most common measure for text contrast. In experiments using positive and negative contrast, when the contrasts were equated by the equation C = T/B - 1, we found that the negative contrast text was more readable than the positive contrast text, suggesting that it would be better to measure text contrast using the Michelson formula or the average luminance formula. Our previous experiments with additive and multiplicative transparent text on textured backgrounds show that readability can be more accurately predicted by adjusting the contrast with a contrast-gain-like divisive factor that includes the background RMS contrast. However, the factor performed poorly at predicting readability differences on the two different patterned backgrounds. Using the same images as the previous study, we presented the target words alone and single letters cut out of the target words. We found that word identification and word discriminability were affected by the backgrounds in the same way that the paragraph search performance was affected, but that letter identifiability on these two backgrounds was predicted by the metric. We also found a significant improvement from including different contrast gains for positive and negative contrasts in the metric. Unfortunately, word readability is not simply related to letter identifiability and simple contrast measures.
Keywords: text
readability, text contrast, visual masking
Scharff and Ahumada1 measured text readability for two types of transparent text: additive (as occurs in head-up displays) and multiplicative (as occurs in see-through LCD virtual reality displays). These two types of transparency resulted in polarity differences: additive text was lighter than the background (positive polarity), and multiplicative text was darker than the background (negative polarity). Text contrast and background texture were also manipulated. Their Global Masking index, which combines text contrast and background RMS contrast, predicted search times well (r = 0.89).
To predict their results, Scharff and Ahumada1 used a contrast metric similar to their earlier Global masking metric2, 3. Their adjusted masking index,
$$C = \frac{C_T}{\left(1 + (C_{RMS}/C_2)^2\right)^{0.5}},$$
combined text contrast CT (with respect to the image luminance) and the image Root-Mean-Square (RMS) contrast CRMS. The earlier global masking metric computed text contrast with respect to the average background luminance and used only the background contrast in the divisive masking term.
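The divisive form of the index can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `masking_index` is hypothetical, the default threshold C2 = 0.05 is the value the paper states later, and the background RMS contrasts reported below for the wave (0.27) and culture (0.15) patterns are used here as stand-ins for the image RMS contrast.

```python
import math

def masking_index(c_text, c_rms, c2=0.05):
    """Effective contrast C = C_T / (1 + (C_RMS / C_2)**2) ** 0.5."""
    return c_text / math.sqrt(1.0 + (c_rms / c2) ** 2)

# Same nominal text contrast (45%) on two textures with different RMS contrast.
# The RMS values stand in for the image RMS contrast (an assumption).
wave = masking_index(0.45, 0.27)
culture = masking_index(0.45, 0.15)

# A larger RMS contrast gives a smaller effective contrast, so a metric of
# this form predicts more masking on the wave pattern than on the culture one.
print(f"wave: {wave:.3f}  culture: {culture:.3f}")
```

Because the RMS term only appears in the denominator, any texture with a larger RMS contrast is predicted to mask more, regardless of where that contrast falls relative to the letters.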
Because the contrast of the experimental conditions
was defined relative to the background, the global metric predicted no effect
of text polarity. The adjusted metric correctly predicted
the direction but under-predicted the magnitude of the polarity effect, an improved performance for the
negative contrast text as compared with the positive contrast text. When
the masking index was adjusted to
include the text pixels as well as the background pixels in computations of
mean luminance and contrast variability, predictability improved further (r =
0.91) and the index now correctly
predicted the improved performance for multiplicative (negative polarity) over
additive (positive polarity) transparent text.
However, neither metric correctly predicted which background texture would have the worse effect. The metrics predicted that the “wave” pattern (Figure 1, center, shown with multiplicative transparency text at 45% contrast) would be more detrimental to reading than the “culture” background (Figure 1, left, shown with additive transparency text at 45% contrast) because the wave background had a larger RMS contrast (0.27 versus 0.15). In fact, the additive culture pattern was the most difficult condition to read. (Graphs of these previous results are included in Figure 3 [below] for comparison with the current results.)


Figure 1. Sample background patterns with text used in the transparent text experiment1: (left) culture pattern (shown with additive transparency text at 45% contrast), (center) wave pattern (shown with multiplicative transparency text at 45% contrast), and (right) plain background (shown with additive transparency text at 30% contrast). Each of these examples was cut from the top left corner of an actual stimulus.
One possible explanation for this pattern difference is that the spatial homogeneity of the culture pattern made it difficult to read any of the letters, while the large plain areas in the wave pattern allowed some letters to be seen clearly, although other letters were difficult to identify. If some clear letters lead to better readability than all somewhat-masked letters, masking for the wave pattern should be less than predicted by the metric.
The task in the original text readability experiment was to
find one of three target words (triangle, circle, or square) located within
paragraphs of text placed on textured backgrounds. Once the target word was located, the
participant clicked on the corresponding shape located at the bottom of the
screen. See Scharff and Ahumada1
for full stimulus examples.
In the current experiment, the target words were cut from the original text stimuli, so that the backgrounds exactly matched those in the original conditions. Participants performed two word tasks: an identification task, and a discrimination task using decoy words also cut from the original text stimuli. These two word experiments were used in an effort to differentially weight different aspects of the original task that might have led to the difficulty with the additive transparency, low contrast, culture-pattern condition. The
word identification task more directly measured readability of the specific
target words, regardless of the other words around them. The discrimination
task required higher-level cognitive processing that occurs when comparing
words in order to determine which is the target. The target words were also chopped up
so that the legibility of the individual letters could be measured. The letter task not
only allowed us to determine if there was a critical number of letters needed
to yield good word readability, but it also allowed for
a more direct comparison with previous research on legibility
using letters rather than words. If the metric were applied to the letters alone, it might do a better job because of the reduced integration range: if a letter is in a constant region of the background, the metric will consider only that region and will not reduce the letter's equivalent contrast because of contrast variation elsewhere in the background.
Twenty-eight participants completed all of the word tasks. Half of these participants completed the letter task for the additive transparency condition, while the remaining 14 did so for the multiplicative condition. An additional 2 participants completed the letter experiment, so that there were 15 for each transparency type. All participants were undergraduate psychology students who received course research participation credit.
For all experiments, the words and letters were cut from the
original transparent text stimuli, so that the way they were placed on a
specific part of the background was identical to the original conditions1. See Figure 2 for examples
of cutout words and letters. As in the original experiment we used all
combinations of additive and multiplicative transparency, the plain, culture
and wave backgrounds, and 30% and 45% text contrast. Experiment order was counterbalanced across
participants so that half did the letter experiment first, and half did the
word experiments first. The order of
the word experiments was also counterbalanced across participants. Viewing distance was controlled and matched
to the original experiment by using a headrest. Total testing time was approximately one hour.


Figure 2. (Top) One each of the three target words cut from the three background patterns. (Bottom) These same words chopped into letters. All examples are of additive transparency with 30% contrast placed on a plain background of the average luminance of the patterned backgrounds.
In addition to the above three variables (transparency type, pattern behind the word, and contrast), the word identification experiment included a fourth variable, the large background upon which the word was pasted. This background was either a plain background of the same average luminance as that used in the original experiment (~47 cd/m2), or a patterned background that matched the pattern behind the word. Each of these background conditions included all the target words (six trials for each condition, with each of the three target words used twice). Note that, for the plain background cases, the words were cut from the three backgrounds in the original experiment, so one third had the wave pattern in a small rectangular area behind the word, another third had the culture pattern, and the final third had the plain background, which blended smoothly with the larger background. For the patterned background condition, each word was placed so that the background pattern behind the word matched the larger background. Thus, the placement of the words was not perfectly centered, but varied somewhat across the trials. To make the word identification tasks as similar as possible apart from the larger background patterns, the target word placement was also varied around center when using the plain backgrounds. As in the original experiment, there were three shapes at the bottom of the screen, and the participant’s task was to click on the corresponding shape as quickly as possible. Transparency type and the large background conditions were blocked, but word background pattern and contrast were randomized within each block.
The word discrimination experiment required participants to make a forced-choice decision as to whether the presented word was one of the three target words (triangle, circle, or square) or one of six other words also cut from the original stimuli. These decoy words were chosen to be similar to the target words: financial, telephone, crisis, cities, shared, and soured. All words were individually placed in the center of a plain background of the same average luminance as that used in the original experiment (~47 cd/m2).
For the letter task, each of the target words (six each for
all conditions) was chopped so that the letters could be tested for
identifiability on the original textured backgrounds. Due to testing time constraints, transparency type was blocked
and run as a between-participants variable.
Otherwise, letter presentation was randomized. For each trial, a small fixation square was presented, followed
by the letter presented on a plain background of the same average luminance as
that used in the original experiment (~47 cd/m2). The letter remained onscreen until the
participant indicated its identity by typing it using a standard keyboard. Feedback was given on each trial by a text
message that indicated the correct letter.
For each participant, median reaction times for each
condition were calculated for each of the experiments. Accuracy was also recorded. In all cases for the word experiments,
accuracy was close to perfect; thus, only reaction times were used in the
analyses. For the letter experiment,
accuracy is the median number of letters correct across the six words in each
condition.
A single 4-way ANOVA was performed on the word identification
data (type of large background, transparency type, background pattern behind
the word, and contrast). There were two
significant main effects and four significant two-way interactions. See Table A1 for the summary table. The pattern behind the words significantly
affected response times, so that the words placed on plain backgrounds were
identified more quickly than those on either of the patterned backgrounds, and
there was no difference between the two patterned backgrounds behind the words. Low text contrast conditions were significantly
slower than high contrast conditions.
The large background did significantly interact with the pattern behind
the words, as did transparency type and contrast. Finally, contrast interacted with transparency type. The wave pattern words placed on the wave
pattern were significantly harder to identify than those placed on the plain
background. The large background made
no difference for the plain or culture pattern words. Culture pattern words presented using additive transparency were
identified significantly slower than those presented using multiplicative
transparency. Transparency type had no
effect for the wave or plain pattern words.
Similarly, the culture pattern words were identified significantly more
slowly at the low contrast, while there was no effect for the wave or plain
pattern words. Finally, the additive
transparency conditions were significantly slower than multiplicative
conditions only for the low contrast conditions. Contrast did not affect the multiplicative conditions. Figure 3 shows the
3-way interaction between transparency type, word pattern, and contrast. Large background is not shown as it was not
manipulated in the word discrimination and letter experiments.
A 4-way ANOVA was also performed on the word
discrimination data (target v. decoy, transparency type, background pattern
behind the word, and contrast). See
Table A2 for the statistical summary table.
As with the identification task, there were only significant main
effects for word pattern and contrast, with plain pattern words and higher
contrasts being discriminated more quickly.
Contrast significantly interacted with transparency type, target/decoy,
and word pattern. Further, transparency
type and word pattern interacted. Similar to the word identification task, the
additive transparency condition was significantly slower for the low contrast
conditions, but the multiplicative conditions were not affected by
contrast. Unlike the word
identification task, both the wave and culture word patterns were significantly
slower when using low contrast; again there was no effect of contrast for the
plain conditions. Also similar to the word identification task, the culture word pattern led to slower discrimination for the additive transparency conditions; transparency type did not affect the wave or plain word patterns. Both the target words and decoy words were significantly slower for the low contrast conditions; further, at low contrast the decoy words were discriminated more slowly than the target words, but there was no difference at high contrast. Figure 3 shows the 3-way interaction between transparency type, word pattern, and contrast. The target/decoy variable is not shown, as it was not manipulated in the word identification and letter experiments.









Figure 3: Three-way interactions between transparency type, pattern, and contrast for word identification, word discrimination, letter identification, and for comparison, the paragraph word search task from Scharff and Ahumada1. Error bars indicate 95% confidence intervals for each condition. Only for the letter identification task is the low contrast, additive transparency (open symbols), wave pattern (circle symbols) condition slower than the corresponding culture pattern condition.
Letter identification data were analyzed for both identification time and accuracy. For both, a 3-way ANOVA was performed using transparency type (a between-participants variable for the letter task), the pattern behind the letter, and contrast. For the letter identification summary table, see Table A3, and for the accuracy summary table, see Table A4. For the identification times, there were significant main effects for contrast (low contrast slower) and pattern. The plain pattern led to significantly faster identification times than both textured patterns, but unlike in the word tasks, responses to the culture pattern letters were significantly faster than responses to the wave pattern letters. The only significant interaction occurred between letter pattern and contrast, with the plain pattern letters not affected by contrast, and the wave and culture pattern letters both slower at the lower contrast. Figure 3 shows the 3-way interaction between transparency type, letter pattern, and contrast.
For letter identification accuracy, all main
effects and 2-way interactions were significant. Multiplicative transparency led to higher accuracy than additive
transparency, and high contrast led to higher accuracy. The plain letter pattern had higher accuracy
than culture pattern letters, which had higher accuracy than wave pattern
letters. However, the difference
between the patterns was only significant for the multiplicative transparency
conditions. When using additive
transparency, culture and wave pattern letters had the same accuracy. Contrast affected the additive transparency
conditions more strongly than the multiplicative conditions. Finally, the plain pattern letter accuracy
was not affected by contrast, while the wave and culture pattern letters were
affected; further, at the higher contrast, the culture pattern letters showed
higher accuracy than the wave pattern letters, while there was no difference
for the low contrast. Figure 4 shows the 3-way interaction between transparency
type, letter pattern, and contrast for the accuracy data.

Figure 4: Letter identification accuracy as a function of transparency type, pattern, and contrast. Error bars indicate standard error.
Our hypothesis above proposed that the wave pattern led to better search times than the culture pattern in the original paragraph search task because more letters were easily identifiable in the wave pattern words. The letter analyses, however, suggest that the culture pattern letters are actually more identifiable. To investigate this issue further, the number of letters correctly identified was tallied for each word in each condition….
We computed predictions
for the above results based on the contrast measures described in Scharff and
Ahumada1. Their
contrast-based metrics were computed from the mean luminance and the average
contrast variance in the image. As they
pointed out, the display is not spatially homogeneous because of the space
between the lines of text and the text margins. The observer’s gaze path was not monitored, and the observer’s spatial averaging
functions for computing the effective mean luminance and the contrast variation
are also unknown. Thus, the average
percentage of text pixels in the regions controlling the mean luminance and the
contrast gain is also unknown. Scharff
and Ahumada1 effectively assumed that these regions are the same and
derived a formula for the contrast metric as a function of the effective
proportion of text pixels pT.
They reported the performance of the metric for the value pT
= 0, which gives the Global masking metric, and for the value pT =
0.2, a value close to the proportion of text pixels in a word, ignoring the
spaces between words, the lines between words, and the margins.
Including the text
pixels in the average value of the luminance level defining zero contrast
reduces the contrast of the text in a nonlinear way: the contrast of positive
contrast text is reduced more than the contrast of negative contrast text is
increased. Including the text pixels in
the average thus allowed the metric to correctly predict that the additive text
was more difficult to read than the multiplicative text when the two conditions
had the same contrast with respect to the background alone. However, with a value of pT =
0.2, a seemingly high estimate of the proportion of text pixels, the adjusted
metric still predicted a much smaller polarity effect than was observed. Nonlinear averaging rules can lead to the
less frequent text pixels being given more weight than their relative frequency
would predict, so we decided to also look at the predictions of the metric for
the case of pT = 0.5, which is the highest weight text pixels could
obtain if the averaging rule is symmetric in contrast polarity and treats text
and background in the same way.
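The nonlinear effect of including text pixels in the reference luminance can be checked numerically. The sketch below is illustrative only: the function name `text_contrast` is hypothetical, the background luminance is normalized to 1, and the text luminances 1.45 and 0.55 are simply chosen to give +45% and -45% contrast relative to the background.

```python
def text_contrast(t, b, p):
    """Contrast of text luminance t relative to the weighted mean p*t + (1-p)*b."""
    return t / (p * t + (1.0 - p) * b) - 1.0

B = 1.0  # normalized background luminance (assumption for illustration)
for p in (0.0, 0.2, 0.5):
    pos = text_contrast(1.45 * B, B, p)  # lighter text, +45% re background
    neg = text_contrast(0.55 * B, B, p)  # darker text, -45% re background
    print(f"p={p:.1f}  positive={pos:+.3f}  negative={neg:+.3f}")
```

At p = 0 the two polarities have equal magnitude; as p grows, the positive contrast shrinks faster than the negative one, and at p = 0.5 the values coincide with the Michelson contrast (T - B)/(T + B).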
Similarly to Scharff and
Ahumada1, we define T to be a vector (list of numbers) of the text
pixel luminances with mean mT and variance vT, and we let
B be the vector of background pixel luminances with mean mB and
variance vB. We let pT
represent the proportion of text pixels (the number of pixels in T over the
number in T and B) and pB = 1-pT. As detailed in Appendix
B in Scharff and Ahumada1, the average text contrast with respect to
the average luminance is given by
$$C_T = \frac{m_T}{p_T m_T + p_B m_B} - 1.$$
And, the pooled variance of the contrasts of all the pixels is given by
$$C_{RMS}^2 = \frac{p_T v_T + p_B v_B + p_T p_B (m_T - m_B)^2}{(p_T m_T + p_B m_B)^2}.$$
These are combined in a contrast-gain-control fashion to give a predicted effective contrast metric,
$$C = \frac{C_T}{\left(1 + C_{RMS}^2 / C_2^2\right)^{0.5}}.$$
As before, we set the contrast masking threshold C2 to the contrast value of 0.05.
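The three formulas above can be combined into a single function operating on lists of pixel luminances. This is a minimal sketch under stated assumptions, not the authors' code: the name `effective_contrast` is hypothetical, variances are taken as population variances, and the effective text proportion defaults to the actual pixel proportion unless `p_text` is supplied.

```python
import math
from statistics import fmean, pvariance

def effective_contrast(text, background, p_text=None, c2=0.05):
    """Effective contrast C from text and background pixel luminances.

    text, background: sequences of luminances (the vectors T and B).
    p_text: effective proportion of text pixels; defaults to the actual
            proportion len(T) / (len(T) + len(B)).
    c2: contrast masking threshold (0.05 in the paper).
    """
    m_t, v_t = fmean(text), pvariance(text)              # population variance assumed
    m_b, v_b = fmean(background), pvariance(background)
    if p_text is None:
        p_text = len(text) / (len(text) + len(background))
    p_b = 1.0 - p_text
    mean_lum = p_text * m_t + p_b * m_b
    c_t = m_t / mean_lum - 1.0                            # text contrast C_T
    c_rms_sq = (p_text * v_t + p_b * v_b
                + p_text * p_b * (m_t - m_b) ** 2) / mean_lum ** 2
    return c_t / math.sqrt(1.0 + c_rms_sq / c2 ** 2)

# Uniform toy stimuli (not data from the study): 20% text pixels on a flat background.
dark_text = effective_contrast([0.55] * 20, [1.0] * 80)
light_text = effective_contrast([1.45] * 20, [1.0] * 80)
print(f"dark: {dark_text:+.4f}  light: {light_text:+.4f}")
```

With a flat background and `p_text=0.0`, the masking term vanishes and the function returns the plain contrast relative to the background.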
When we fit the above
metric to the paragraph data, the best fitting value for pT was in
the neighborhood of 0.8, which is difficult to explain when there are fewer
than 50% text pixels. One possibility
we considered was that the luminance response of the monitors used in that
study may not have fit the simple gamma function used in the calibration. In the current studies with words and
letters, only one monitor was used in each study and an empirical gamma
function was used to accurately account for the entire luminance range. Another possibility is that the observer’s
gains for positive and negative contrast are just not the same. We decided to allow for such an effect in
the metric in a simple way: We included a contrast gain asymmetry factor A,
used only when the contrast or the effective contrast is positive,
$$C \leftarrow A\,C, \quad \text{if } C > 0.$$
We used three values of pT (pT = 0, pT = 0.2, and pT = 0.5), looked at the symmetric predictions (A = 1), and also searched for the best value of A for each of the three values of pT in each of the four experiments. For the word and letter
experiments, each metric value was computed for each stimulus separately. The condition prediction was then the mean
of the metrics for all the stimuli in that condition.
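The search for the best asymmetry factor can be sketched as a simple grid search. Several details here are assumptions made for illustration and are not stated in the text: the asymmetry is applied to the final effective contrast, the magnitude of that contrast is used as the predictor, and the candidate values of A are supplied by the caller. The function names are hypothetical.

```python
import math

def apply_asymmetry(c, a):
    """C <- A*C when C > 0; negative contrasts are left unchanged."""
    return a * c if c > 0 else c

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def condition_predictions(stimulus_contrasts, a):
    """Mean over the stimuli in each condition of the adjusted contrast magnitude."""
    return [sum(abs(apply_asymmetry(c, a)) for c in cond) / len(cond)
            for cond in stimulus_contrasts]

def best_asymmetry(stimulus_contrasts, times, candidates):
    """Return (A, r) for the candidate A giving the strongest correlation
    between condition predictions and observed times."""
    best = None
    for a in candidates:
        r = pearson_r(condition_predictions(stimulus_contrasts, a), times)
        if best is None or abs(r) > abs(best[1]):
            best = (a, r)
    return best
```

Here `stimulus_contrasts` holds one list of per-stimulus effective contrasts for each condition, matching the description that the metric was computed per stimulus and then averaged within a condition.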
Table 1 shows, for each of the four experiments and for each value of pT, the value of A that gave the best correlation and rA, the value of that correlation. The correlation when A = 1 is labeled r1, and the F values test the significance of the difference between rA and r1, treating the two fits as nested hypotheses:
$$F = \frac{(r_A^2 - r_1^2)/df_1}{(1 - r_A^2)/df_2},$$
where the numerator
degrees of freedom df1 is 1, from the one additional parameter
estimated (A), and the denominator degrees of freedom df2=9 is the
number of data points (12) less two for the regression parameters and less one
more for the asymmetry parameter. The
tabled results show a trade-off between the parameters A and pT in
that when a larger value of pT is used, a smaller value of A is
required. For the paragraph and word
experiments, the improvement in fit from the addition of the asymmetry
parameter is not significant, but for the letter experiment, the asymmetry
factor makes a highly significant contribution even for the case of pT=0.5,
where the best fitting value of the asymmetry parameter is very close to unity
(A=0.85). For the letter experiment,
the pT = 0, A = 0.49 fit is as good as those in any of the other
experiments, but it is significantly worse than the non-zero pT
fits, which are similar to each other and extremely good.
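The F statistic described above is straightforward to recompute from the tabled correlations. The sketch below is illustrative; the function name `nested_f` is hypothetical, and the defaults follow the text (12 data points, one extra parameter, so df2 = 9).

```python
def nested_f(r_a, r_1, n_points=12, df1=1):
    """F = [(rA^2 - r1^2) / df1] / [(1 - rA^2) / df2].

    df2 is the number of data points less two regression parameters
    and less the df1 additional parameters (the asymmetry factor A).
    """
    df2 = n_points - 2 - df1
    return ((r_a ** 2 - r_1 ** 2) / df1) / ((1.0 - r_a ** 2) / df2)

# Letter identification, pT = 0.5, using the correlations given in Table 1.
f_letters = nested_f(-0.966, -0.887)
print(f"F(1,9) = {f_letters:.1f}")
```

Because the tabled correlations are rounded to three decimals, values recomputed this way can differ slightly from the tabled F values.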
Table 1: Metric goodness-of-fit (see text for explanation).

Experiment        pT     A        rA        r1        F
Paragraphs        0.0    0.677    -0.770    -0.734    1.200
Paragraphs        0.2    0.801    -0.815    -0.758    2.431
Paragraphs        0.5    0.867    -0.806    -0.768    1.527
Word search       0.0    0.889    -0.825    -0.822    0.144
Word search       0.2    1.056    -0.857    -0.853    0.220
Word search       0.5    1.129    -0.818    -0.781    1.642
Discrimination    0.2    1.065    -0.858    -0.853    0.299
Discrimination    0.5    1.134    -0.840    -0.799    2.075
Identification    0.2    0.774    -0.963    -0.835    28.9***
Identification    0.5    0.851    -0.966    -0.887    19.7**

*F(1,9,0.95) = 5.12; **F(1,9,0