Archives of Acoustics, 40, 4, pp. 585–594, 2015

Is a Multi-Slider Interface Layout Responsible for a Stimulus Spacing Bias in the MUSHRA Test?

Białystok University of Technology

The multi-stimulus test with hidden reference and anchors (MUSHRA) is commonly used for subjective quality assessment of audio systems. Despite its wide acceptance in science and industry, the method is not free from bias. One possible source of bias in the MUSHRA method may be the graphical design of its user interface. This paper examines the hypothesis that replacing the standard multi-slider layout with a single-slider version could reduce the stimulus spacing bias observed in the MUSHRA test. Contrary to expectation, this modification did not reduce the bias. This outcome formally supports the validity of using multiple sliders in the MUSHRA graphical interface.
Keywords: audio quality assessment; subjective quality evaluation; listening tests; psychoacoustics; multi-stimulus test with hidden reference and anchors; MUSHRA.
Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN).



DOI: 10.1515/aoa-2015-0058