Contrast Measures for Predicting Text Readability

 

L. F. V. Scharff 1, and A. J. Ahumada Jr.2;

 

Stephen F. Austin State University, Nacogdoches, TX1;

NASA Ames Research Center, Moffett Field, CA2.

 

Abstract

 

If text pixels have a constant luminance T on a constant background luminance B, the contrast of the text can be represented by the equation C=T/(pT+(1-p)B)-1.  If p=0.5, the equation gives the Michelson contrast.  If p is the proportion of text pixels, the equation gives the contrast relative to the average luminance.  And, if p=0, the equation gives the contrast relative to the background, the most common measure for text contrast.  In experiments using positive and negative contrast, when the contrasts were equated by the equation C=T/B-1, we found that the negative contrast text was more readable than the positive contrast text, suggesting that it would be better to measure text contrast using the Michelson formula or the average luminance formula.  Our previous experiments with additive and multiplicative transparent text on textured backgrounds show that readability can be more accurately predicted by adjusting the contrast with a contrast-gain-like divisive factor that includes the background RMS contrast.  However, the factor performed poorly at predicting readability differences between the two patterned backgrounds.  Using the same images as the previous study, we presented the target words alone and single letters cut out of the target words.  We found that word identification and word discriminability were affected by the backgrounds in the same way that the paragraph search performance was affected, but that letter identifiability on these two backgrounds was predicted by the metric.  We also found a significant improvement from including different contrast gains for positive and negative contrasts in the metric.  Unfortunately, word readability is not necessarily simply related to letter identifiability and simple contrast measures.

 

Keywords:   text readability, text contrast, visual masking

 

1. Introduction

 

Scharff and Ahumada1 (2002) measured text readability for two types of transparent text: additive (as occurs in head-up displays) and multiplicative (as occurs in see-through LCD virtual reality displays).  These two types of transparency resulted in polarity differences: additive text was lighter than the background (positive polarity), and multiplicative text was darker than the background (negative polarity).  Text contrast and background texture were also manipulated.  Their Global Masking index, which combines text contrast and background RMS contrast, predicted search times well (r = 0.89).

 

To predict their results, Scharff and Ahumada1 used a contrast metric similar to their earlier Global masking metric2, 3.  Their adjusted masking index,

 

C = CT / (1 + (CRMS / C2)^2)^0.5 ,

 

combined text contrast CT (with respect to the image luminance) and the image Root-Mean-Square (RMS) contrast CRMS.  The earlier global masking metric computed text contrast with respect to the average background luminance and only used the background contrast in the divisive masking term. 

 

Because the contrast of the experimental conditions was defined relative to the background, the global metric predicted no effect of text polarity.  When the masking index was adjusted to include the text pixels as well as the background pixels in computations of mean luminance and contrast variability, predictability improved further (r = 0.91).  The adjusted metric correctly predicted the direction but under-predicted the magnitude of the polarity effect, an improved performance for the negative contrast (multiplicative) text as compared with the positive contrast (additive) text.

 

However, neither metric correctly predicted which background texture would have the worse effect on readability.  Both predicted that the “wave” pattern (Figure 1, center) would be more detrimental to reading than the “culture” pattern (Figure 1, left) because the wave background had a larger RMS contrast (0.27 and 0.15, respectively).  In fact, the additive culture pattern was the most difficult condition to read.  (A graph of these previous results is included in Figure 3 for comparison with the current results.)

 

 

Figure 1. Sample background patterns with text used in the transparent text experiment1: (left) culture pattern (shown with additive transparency text at 45% contrast), (center) wave pattern (shown with multiplicative transparency text at 45% contrast), and (right) plain background (shown with additive transparency text at 30% contrast). Each of these examples was cut from the top left corner of an actual stimulus.

 

One possible explanation for this pattern difference is that the spatial homogeneity of the culture pattern made it difficult to read any of the letters, while the large plain areas in the wave pattern allowed some letters to be seen clearly, although other letters were difficult to identify.  If some clear letters lead to better readability than all somewhat-masked letters, masking for the wave pattern could be less than predicted by the metric.

 

The task in the original text readability experiment was to find one of three target words (triangle, circle, or square) located within paragraphs of text placed on textured backgrounds.  Once the target word was located, the participant clicked on the corresponding shape located at the bottom of the screen.  See Scharff and Ahumada1 (2002) for full stimulus examples.

 

In the current experiments, the target words were cut from the original text stimuli, so that the backgrounds exactly matched those in the original conditions.  Participants performed two word tasks: an identification task, and a discrimination task using decoy words also cut from the original text stimuli.  These two word experiments were used in an effort to differentially weight different aspects of the original task that might have led to the difficulty with the additive transparency, low contrast, culture-pattern condition.  The word identification task more directly measured readability of the specific target words, regardless of the other words around them.  The discrimination task required the higher-level cognitive processing that occurs when comparing words in order to determine which is the target.  The target words were also chopped up so that the legibility of the individual letters could be measured.  The letter task not only allowed us to determine whether a critical number of letters was needed to yield good word readability, but also allowed for a more direct comparison with previous research on legibility using letters rather than words.  If the index were applied to the letters alone, it might do a better job because of the reduced integration range: if a letter is in a constant region of the background, the metric will now only consider that region and not reduce its equivalent contrast because of contrast variation elsewhere in the background.


 

2. Methods

2.1.           Participants

 

Twenty-eight participants completed all of the word tasks.  Half of these participants completed the letter task for the additive transparency condition, while the remaining 14 did so for the multiplicative condition.  An additional 2 participants completed the letter experiment, so that there were 15 for each transparency type.  All participants were undergraduate psychology students who received course research participation credit.

 

2.2.           Stimuli and Procedure

 

For all experiments, the words and letters were cut from the original transparent text stimuli, so that the way they were placed on a specific part of the background was identical to the original conditions1.  See Figure 2 for examples of cutout words and letters.  As in the original experiment, we used all combinations of additive and multiplicative transparency, the plain, culture, and wave backgrounds, and 30% and 45% text contrast.  Experiment order was counterbalanced across participants so that half did the letter experiment first, and half did the word experiments first.  The order of the word experiments was also counterbalanced across participants.  Viewing distance was controlled and matched to the original experiment by using a headrest.  Total testing time was approximately one hour.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Figure 2. (Top) One each of the three target words cut from the three background patterns.  (Bottom) These same words chopped into letters.  All examples are of additive transparency with 30% contrast placed on a plain background of the average luminance of the patterned backgrounds.

 

In addition to the above three variables (transparency type, pattern behind the word, and contrast), the word identification experiment included a fourth variable: the large background upon which the word was pasted.  This background was either plain, with the same average luminance as that used in the original experiment (~47 cd/m2), or matched the pattern behind the word.  Each of these background conditions included all the target words (six trials for each condition, with each of the three target words used twice).  Note that, for the plain background cases, the words were cut from the three backgrounds in the original experiment, so one third had the wave pattern in a small rectangular area behind the word, another third had the culture pattern, and the final third had the plain background, which blended smoothly with the larger background.  For the patterned background condition, each word was placed so that the background pattern behind the word matched the larger background.  Thus, the placement of the words was not perfectly centered, but varied somewhat across the trials.  To make the word identification tasks as similar as possible apart from the larger background patterns, the target word placement was also varied around center when using the plain backgrounds.  As in the original experiment, there were three shapes at the bottom of the screen, and the participant’s task was to click on the corresponding shape as quickly as possible.  Transparency type and the large background conditions were blocked, but word background pattern and contrast were randomized within each block.

 

The word discrimination experiment required participants to make a forced-choice decision as to whether the presented word was one of the three target words (triangle, circle, or square) or one of six other words also cut from the original stimuli.  These decoy words were chosen to be similar to the target words: financial, telephone, crisis, cities, shared, and soured.  All words were individually placed in the center of a plain background of the same average luminance as that used in the original experiment (~47 cd/m2).

 

For the letter task, each of the target words (six each for all conditions) was chopped so that the letters could be tested for identifiability on the original textured backgrounds.  Due to testing time constraints, transparency type was blocked and run as a between-participants variable.  Otherwise, letter presentation was randomized.  For each trial, a small fixation square was presented, followed by the letter presented on a plain background of the same average luminance as that used in the original experiment (~47 cd/m2).  The letter remained onscreen until the participant indicated its identity by typing it on a standard keyboard.  Feedback was given on each trial by a text message that indicated the correct letter.

 

3. Results

 

For each participant, median reaction times for each condition were calculated for each of the experiments.  Accuracy was also recorded.  In all cases for the word experiments, accuracy was close to perfect; thus, only reaction times were used in the analyses.  For the letter experiment, accuracy is the median number of letters correct across the six words in each condition.

 

A single 4-way ANOVA was performed on the word identification data (type of large background, transparency type, background pattern behind the word, and contrast).  There were two significant main effects and four significant two-way interactions.  See Table A1 for the summary table.  The pattern behind the words significantly affected response times: words on plain backgrounds were identified more quickly than those on either of the patterned backgrounds, and there was no difference between the two patterned backgrounds.  Low text contrast conditions were significantly slower than high contrast conditions.  The large background interacted significantly with the pattern behind the words, as did transparency type and contrast.  Finally, contrast interacted with transparency type.  The wave pattern words placed on the wave pattern were significantly harder to identify than those placed on the plain background.  The large background made no difference for the plain or culture pattern words.  Culture pattern words presented using additive transparency were identified significantly more slowly than those presented using multiplicative transparency.  Transparency type had no effect for the wave or plain pattern words.  Similarly, the culture pattern words were identified significantly more slowly at the low contrast, while there was no effect for the wave or plain pattern words.  Finally, the additive transparency conditions were significantly slower than the multiplicative conditions only at low contrast.  Contrast did not affect the multiplicative conditions.  Figure 3 shows the 3-way interaction between transparency type, word pattern, and contrast.  Large background is not shown, as it was not manipulated in the word discrimination and letter experiments.

 

A 4-way ANOVA was also performed on the word discrimination data (target v. decoy, transparency type, background pattern behind the word, and contrast).  See Table A2 for the statistical summary table.  As with the identification task, the only significant main effects were for word pattern and contrast, with plain pattern words and higher contrasts being discriminated more quickly.  Contrast significantly interacted with transparency type, target/decoy, and word pattern.  Further, transparency type and word pattern interacted.  Similar to the word identification task, the additive transparency condition was significantly slower for the low contrast conditions, but the multiplicative conditions were not affected by contrast.  Unlike the word identification task, both the wave and culture word patterns were significantly slower when using low contrast; again there was no effect of contrast for the plain conditions.  Also similar to the word identification task, the culture word pattern led to slower discrimination for the additive transparency conditions; transparency type did not affect the wave or plain word patterns.  Both the target words and decoy words were significantly slower for the low contrast conditions; further, at low contrast the decoy words were discriminated more slowly than the target words, but there was no difference at high contrast.  Figure 3 shows the 3-way interaction between transparency type, word pattern, and contrast.  The target/decoy variable is not shown, as it was not manipulated in the word identification and letter experiments.

 

For the letter identification data, a 3-way ANOVA was performed using transparency type (a between-participants variable for the letter task), the pattern behind the letter, and contrast.  For the letter identification summary table, see Table A3.  There were significant main effects for contrast (low contrast slower) and pattern.  The plain background led to significantly faster identification times than both patterns, but in contrast with the word tasks, the culture pattern letters led to significantly faster responses than the wave pattern letters.  The only significant interaction occurred between letter pattern and contrast, with the plain pattern letters not affected by contrast, and the wave and culture pattern letters both slower at the lower contrast.  Figure 3 shows the 3-way interaction between transparency type, pattern, and contrast.

 

 

 

 

 

 

 

 

 

 

 

 

 

 



Figure 3:  Three-way interactions between transparency type, pattern, and contrast for word identification, word discrimination, letter identification, and, for comparison, the paragraph word search task from Scharff and Ahumada1 (2002).  Error bars indicate 95% confidence intervals for each condition.  Only for the letter identification task is the low contrast, additive transparency (open symbols), wave pattern (circle symbols) condition slower than the corresponding culture pattern condition.

 

Letter identification data were analyzed for accuracy as well as identification time, using the same 3-way ANOVA (transparency type, pattern behind the letter, and contrast).  For the accuracy summary table, see Table A4.

 

For letter identification accuracy, all main effects and 2-way interactions were significant.  Multiplicative transparency led to higher accuracy than additive transparency, and high contrast led to higher accuracy.  The plain letter pattern had higher accuracy than culture pattern letters, which had higher accuracy than wave pattern letters.  However, the difference between the patterns was only significant for the multiplicative transparency conditions.  When using additive transparency, culture and wave pattern letters had the same accuracy.  Contrast affected the additive transparency conditions more strongly than the multiplicative conditions.  Finally, the plain pattern letter accuracy was not affected by contrast, while the wave and culture pattern letters were affected; further, at the higher contrast, the culture pattern letters showed higher accuracy than the wave pattern letters, while there was no difference for the low contrast. Figure 5 shows the 3-way interaction between transparency type, word pattern, and contrast for the accuracy data.

 

 

 

Figure 5: Letter Identification Accuracy as a function of transparency type, pattern, and contrast.  Error bars indicate standard error.

 

The hypothesis above proposed that the wave pattern led to better search times than the culture pattern in the original paragraph search task because more letters were easily identifiable in the wave pattern words.  The letter analyses instead suggest that the culture pattern letters are actually more identifiable.  In order to further investigate this issue, for each word for each condition, the number of letters correctly identified was tallied….

4. Predictions

 

We computed predictions for the above results based on the contrast measures described in Scharff and Ahumada1.  Their contrast-based metrics were computed from the mean luminance and the average contrast variance in the image.  As they pointed out, the display is not spatially homogeneous because of the space between the lines of text and the text margins.  The observer’s gaze path was not monitored, and the observer’s spatial averaging functions for computing the effective mean luminance and the contrast variation are also unknown.  Thus, the average percentage of text pixels in the regions controlling the mean luminance and the contrast gain is also unknown.  Scharff and Ahumada1 effectively assumed that these regions are the same and derived a formula for the contrast metric as a function of the effective proportion of text pixels pT.  They reported the performance of the metric for the value pT = 0, which gives the Global masking metric, and for the value pT = 0.2, a value close to the proportion of text pixels in a word, ignoring the spaces between words, the spaces between lines, and the margins.

 

Including the text pixels in the average value of the luminance level defining zero contrast reduces the contrast of the text in a nonlinear way: the contrast of positive contrast text is reduced more than the contrast of negative contrast text is increased.  Including the text pixels in the average thus allowed the metric to correctly predict that the additive text was more difficult to read than the multiplicative text when the two conditions had the same contrast with respect to the background alone.  However, with a value of pT = 0.2, a seemingly high estimate of the proportion of text pixels, the adjusted metric still predicted a much smaller polarity effect than was observed.  Nonlinear averaging rules can lead to the less frequent text pixels being given more weight than their relative frequency would predict, so we decided to also look at the predictions of the metric for the case of pT = 0.5, which is the highest weight text pixels could obtain if the averaging rule is symmetric in contrast polarity and treats text and background in the same way.
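As a worked illustration of this asymmetry, the following minimal sketch evaluates the contrast C = T/(pT T + (1 - pT) B) - 1 for text whose contrast with respect to the background alone is +0.45 or -0.45.  The luminance values are hypothetical and in arbitrary units (they are not the experimental luminances), and the function name is ours.

```python
def text_contrast(T, B, p):
    """Contrast of text luminance T on background luminance B, measured
    relative to the reference luminance p*T + (1 - p)*B."""
    return T / (p * T + (1.0 - p) * B) - 1.0


# Hypothetical luminances in arbitrary units (assumption, for illustration).
B = 1.0
for label, T in (("positive", 1.45), ("negative", 0.55)):
    for p in (0.0, 0.2, 0.5):
        print(f"{label:8s} p={p:.1f}  C={text_contrast(T, B, p):+.3f}")
```

With p = 0.2 the positive contrast drops from 0.45 to about 0.33, while the negative contrast moves only from -0.45 to about -0.40, so the positive contrast text ends up with the smaller contrast magnitude.  With p = 0.5 both values equal the Michelson contrast.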

 

Similarly to Scharff and Ahumada1, we define T to be a vector (list of numbers) of the text pixel luminances with mean mT and variance vT, and we let B be the vector of background pixel luminances with mean mB and variance vB.  We let pT represent the proportion of text pixels (the number of pixels in T over the number in T and B) and pB = 1-pT. As detailed in Appendix B in Scharff and Ahumada1, the average text contrast with respect to the average luminance is given by

 

CT = mT / (pT mT + pB mB) - 1.

 

And, the pooled variance of the contrasts of all the pixels is given by

 

CRMS^2 = (pT vT + pB vB + pT pB (mT - mB)^2) / (pT mT + pB mB)^2 .

 

These are combined in a contrast-gain-control fashion to give a predicted effective contrast metric,

 

C = CT / (1 + CRMS^2 / C2^2)^0.5.

 

As before, we set the contrast masking threshold C2 to a contrast value of 0.05.
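The three formulas above can be combined in a few lines.  The sketch below is one possible reading, not the code used for the reported fits: the function and argument names are ours, population variances are assumed for vT and vB, and the pixel lists in the example are made up for illustration.

```python
from statistics import fmean, pvariance


def effective_contrast(text, background, p_t=None, c2=0.05):
    """Effective contrast C = CT / (1 + CRMS^2 / C2^2)^0.5.

    text, background: sequences of pixel luminances (T and B in the text).
    p_t: assumed effective proportion of text pixels; if None, the actual
         proportion in the two lists is used.
    c2:  contrast masking threshold (0.05 in the text).
    """
    if p_t is None:
        p_t = len(text) / (len(text) + len(background))
    p_b = 1.0 - p_t
    m_t, m_b = fmean(text), fmean(background)
    # Population variances are an assumption of this sketch.
    v_t, v_b = pvariance(text), pvariance(background)
    mean_lum = p_t * m_t + p_b * m_b
    c_t = m_t / mean_lum - 1.0
    c_rms_sq = (p_t * v_t + p_b * v_b
                + p_t * p_b * (m_t - m_b) ** 2) / mean_lum ** 2
    return c_t / (1.0 + c_rms_sq / c2 ** 2) ** 0.5


# Hypothetical pixel luminances: uniform text on a two-level background.
text_pixels = [55.0] * 20
background_pixels = [40.0, 60.0] * 40
for p in (0.0, 0.2, 0.5):
    print(p, effective_contrast(text_pixels, background_pixels, p_t=p))
```

Passing p_t explicitly mirrors the treatment of pT as a free, effective proportion rather than the literal pixel count.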

 

When we fit the above metric to the paragraph data, the best fitting value for pT was in the neighborhood of 0.8, which is difficult to explain when there are fewer than 50% text pixels.  One possibility we considered was that the luminance response of the monitors used in that study may not have fit the simple gamma function used in the calibration.  In the current studies with words and letters, only one monitor was used in each study and an empirical gamma function was used to accurately account for the entire luminance range.  Another possibility is that the observer’s gains for positive and negative contrast are just not the same.  We decided to allow for such an effect in the metric in a simple way: We included a contrast gain asymmetry factor A, used only when the contrast or the effective contrast is positive,

 

C ← AC, if C > 0.

 

We used three values of pT (0, 0.2, and 0.5), looked at the symmetric predictions (A = 1), and also searched for the best value of A for each of the three values of pT in each of the four experiments.  For the word and letter experiments, each metric value was computed for each stimulus separately.  The condition prediction was then the mean of the metrics for all the stimuli in that condition.
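The search for the best asymmetry factor can be sketched as a simple grid search.  Several details here are assumptions of the sketch rather than statements about the original analysis: the factor is applied to the final effective contrast, the predictor correlated with the data is the magnitude of the adjusted contrast, "best" means the largest absolute correlation, and the candidate grid and the condition values are made up for illustration.

```python
def apply_asymmetry(c, a):
    """C <- A*C when C > 0; negative contrasts are left unchanged."""
    return a * c if c > 0 else c


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


def best_asymmetry(contrasts, times, candidates):
    """Return (A, r) for the candidate A giving the largest |r|.

    Assumption: the predictor is the magnitude of the adjusted contrast.
    """
    best = None
    for a in candidates:
        predictor = [abs(apply_asymmetry(c, a)) for c in contrasts]
        r = pearson_r(predictor, times)
        if best is None or abs(r) > abs(best[1]):
            best = (a, r)
    return best


# Hypothetical condition values, constructed so that A = 0.5 fits exactly.
contrasts = [0.30, 0.45, 0.20, -0.30, -0.45, -0.20]
times = [1000.0 - 500.0 * abs(apply_asymmetry(c, 0.5)) for c in contrasts]
a_best, r_best = best_asymmetry(contrasts, times, [0.25, 0.5, 0.75, 1.0])
print(a_best, round(r_best, 3))
```

Because the made-up times were generated with A = 0.5, the search recovers that value; with real data the grid would need to be finer and the correlation would not be perfect.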

 

Table 1 shows for each of the four experiments and for each value of pT the value of A that gave the best correlation and rA, the value of that correlation.  The correlation when A=1 is labeled r1 and the F values test the significance of the difference between rA and  r1 :

 

F = [ (rA^2 - r1^2) / df1 ] / [ (1 - rA^2) / df2 ],

 

where the numerator degrees of freedom df1 is 1, from the one additional parameter estimated (A), and the denominator degrees of freedom df2 = 9 is the number of data points (12) less two for the regression parameters and less one more for the asymmetry parameter.  The squared correlations rA^2 and r1^2 can be regarded as representing the predictable variance from nested hypotheses.  The tabled results show a trade-off between the parameters A and pT: the best-fitting value of A increases as pT increases.  For the paragraph and word experiments, the improvement in fit from the addition of the asymmetry parameter is not significant, but for the letter experiment, the asymmetry factor makes a highly significant contribution even for the case of pT = 0.5, where the best-fitting value of the asymmetry parameter is very close to unity (A = 0.85).  For the letter experiment, the pT = 0, A = 0.49 fit is as good as those in any of the other experiments, but it is significantly worse than the non-zero pT fits, which are similar to each other and extremely good.
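The F statistic can be recomputed directly from the two correlations.  The sketch below is a minimal illustration with a function name and argument names of our own choosing; the example uses the rounded letter identification correlations for pT = 0.5 from Table 1.

```python
def nested_f(r_a, r_1, n_points=12, n_regression=2, n_extra=1):
    """F test for the improvement from r_1 (A fixed at 1) to r_a (A fitted).

    df1 is the number of extra parameters; df2 is the number of data
    points less the regression parameters and the extra parameters.
    """
    df1 = n_extra
    df2 = n_points - n_regression - n_extra
    return ((r_a ** 2 - r_1 ** 2) / df1) / ((1.0 - r_a ** 2) / df2)


# Letter identification, pT = 0.5 (rounded correlations from Table 1).
print(round(nested_f(-0.966, -0.887), 1))  # 19.7
```

Values recomputed from the rounded correlations can differ slightly from the tabled F values (for example, about 28.5 rather than 28.9 for pT = 0.2), presumably because the table was computed from unrounded correlations.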

 

Table 1: Metric goodness-of-fit (see text for explanation).

 

Experiment             pT     A        rA       r1       F

Paragraphs             0.0    0.677   -0.770   -0.734    1.200
                       0.2    0.801   -0.815   -0.758    2.431
                       0.5    0.867   -0.806   -0.768    1.527

Word search            0.0    0.889   -0.825   -0.822    0.144
                       0.2    1.056   -0.857   -0.853    0.220
                       0.5    1.129   -0.818   -0.781    1.642

Word discrimination    0.0    0.952   -0.776   -0.776    0.018
                       0.2    1.065   -0.858   -0.853    0.299
                       0.5    1.134   -0.840   -0.799    2.075

Letter identification  0.0    0.490   -0.839   -0.724    5.50*
                       0.2    0.774   -0.963   -0.835   28.9***
                       0.5    0.851   -0.966   -0.887   19.7**

            

*F(1,9,0.95) = 5.12; **F(1,9,0