Achieving across-laboratory replicability in psychophysical scaling.

Ward LM, Baumann M, Moffat G, Roberts LE, Mori S, Rutledge-Taylor M, West RL - Front Psychol (2015)

Bottom Line: It is well known that, although psychophysical scaling produces good qualitative agreement between experiments, precise quantitative agreement between experimental results, such as that routinely achieved in physics or biology, is rarely or never attained. Constrained scaling (CS), in which observers first learn a standardized meaning for a set of numerical responses relative to a standard sensory continuum and then make magnitude judgments of other sensations using the learned response scale, has produced excellent quantitative agreement between individual observers' psychophysical functions. In general, we found across experiment and across-laboratory agreement using CS to be significantly superior to that typically obtained with conventional magnitude estimation techniques, although some of its potential remains to be realized.


Affiliation: Department of Psychology and Brain Research Centre, University of British Columbia, Vancouver BC, Canada.

ABSTRACT
It is well known that, although psychophysical scaling produces good qualitative agreement between experiments, precise quantitative agreement between experimental results, such as that routinely achieved in physics or biology, is rarely or never attained. A particularly galling example of this is the fact that power function exponents for the same psychological continuum, measured in different laboratories but ostensibly using the same scaling method, magnitude estimation, can vary by a factor of three. Constrained scaling (CS), in which observers first learn a standardized meaning for a set of numerical responses relative to a standard sensory continuum and then make magnitude judgments of other sensations using the learned response scale, has produced excellent quantitative agreement between individual observers' psychophysical functions. Theoretically it could do the same for across-laboratory comparisons, although this needs to be tested directly. We compared nine different experiments from four different laboratories as an example of the level of across experiment and across-laboratory agreement achievable using CS. In general, we found across experiment and across-laboratory agreement using CS to be significantly superior to that typically obtained with conventional magnitude estimation techniques, although some of its potential remains to be realized.




Figure 4: Representative psychophysical functions of “best” and “worst” observers for judgments of loudness of 65, 500, or 5000 Hz pure tones from 52-stimulus and 17-stimulus protocols run in the four different laboratories. These functions are from runs in which judgments of tones at the indicated frequency without feedback were interleaved with judgments of 1000 Hz tones with feedback.

Mentions: Figure 3 displays representative psychophysical functions from the “best” and “worst” observers (in terms of rRP2) across all four laboratories from the present experiments. These are from the first recalibration run at 1000 Hz with feedback (interleaved with judgments of silence for the normal subjects, and with judgments of tinnitus magnitude for the tinnitus sufferers) for the 52-stimulus and the 17-stimulus protocols separately. Figure 4 does the same for the 65, 500, and 5000 Hz data. It should be stressed that individual responses to individual stimuli are plotted in Figure 3 and in Figure 4, in contrast to usual psychophysical functions that, even when plotted for individual observers, consist of points each based on several (around 10 is a typical minimum) to many (sometimes over 50) judgments per stimulus. This renders the present functions even more impressive because the well-known variability of responses to repeated presentations of the same stimulus has not been averaged out (there were no repeats of the same stimulus, of course, in these functions). The functions in Figure 3 and in Figure 4 are very comparable to those reported by West et al. (2000). As mentioned earlier, however, the worst observer for the 17-stimulus protocol, and several others across the various experiments with this protocol, had rRP2 values lower than our rule-of-thumb criterion of 0.67 for at least one run. Overall, about half of the observers in the 17-stimulus experiments had at least one 1000-Hz run (from among the four or five that they completed), usually a later recalibration run, that fell below our criterion. Several observers in each experiment also had rRP2 < 0.67 for at least one run at the other frequencies; these are represented in the “worst” cases for those frequencies. Clearly, for these observers on those runs, 17 trials were not enough for a reliable estimate of the power function exponent, particularly for novel stimuli.
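As a rough illustration of the kind of check described above, the following Python sketch fits a power function R = a·S^b to a single run by least squares on log-transformed data and compares the squared correlation with the 0.67 rule-of-thumb criterion. It is a minimal sketch under stated assumptions, not the authors' analysis code: it assumes that rRP2 can be treated as the squared correlation of the log-log fit, that stimulus values are in linear physical units (not dB), and the names fit_power_function and run_is_reliable are hypothetical. The example data are synthetic and noise-free.

```python
import math

# Rule-of-thumb criterion quoted in the text.
CRITERION = 0.67


def fit_power_function(intensities, responses):
    """Fit R = a * S**b by least squares on log10-transformed data.

    Returns (a, b, r2), where r2 is the squared correlation between
    log10(S) and log10(R). Treating r2 as the paper's rRP2 is an
    assumption made for illustration only.
    """
    if len(intensities) != len(responses) or len(intensities) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(s) for s in intensities]
    ys = [math.log10(r) for r in responses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("stimuli and responses must both vary")
    b = sxy / sxx                      # exponent (slope in log-log space)
    a = 10 ** (mean_y - b * mean_x)    # scale factor (from the intercept)
    r2 = (sxy * sxy) / (sxx * syy)     # squared correlation of the fit
    return a, b, r2


def run_is_reliable(r2, criterion=CRITERION):
    """True when a run's fit meets the rule-of-thumb criterion."""
    return r2 >= criterion


# Synthetic, noise-free run (hypothetical values, one judgment per stimulus).
stimuli = [1.0, 10.0, 100.0, 1000.0]
judgments = [2.0 * s ** 0.6 for s in stimuli]
a, b, r2 = fit_power_function(stimuli, judgments)
print(f"a={a:.3f} b={b:.3f} r2={r2:.3f} reliable={run_is_reliable(r2)}")
```

With one judgment per stimulus, as in the functions plotted in Figures 3 and 4, response variability feeds directly into r2, so a run with only 17 trials can fall below the criterion even when the underlying exponent is stable.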

