Psychoacoustics

Psychoacoustic units

A quick incomplete reference to pascals, Bels, erbs, Barks, sones, Hertz, semitones, Mels and whatever else I happen to need.

The actual auditory system is atrociously complex, and I'm not going to compete with full perceptual models here, even if I did know a stirrup from a hammer or a cochlea from a cauliflower ear. Measuring what we can perceive with our sensory apparatus is a whole field of hacks to account for masking effects and variable resolution in time, space and frequency, not to mention variation between individuals.

Nonetheless, when studying audio there are some units which are more natural to human perception than the natural-to-a-physicist physical units such as Hz and pascals. SI units are inconvenient when studying musical metrics or machine listening because they do not closely match human perceptual difference — 50 Hz is a significant difference at a base frequency of 100 Hz, but insignificant at 2000 Hz. But how big this difference is and what it means is a rather complex and contingent question.

Since my needs are machine listening features and thus computational speed and simplicity over perfection, I will wilfully and with malice ignore any fine distinctions I cannot be bothered with, regardless of how many articles have been published discussing said details. For example, I will not cover “salience”, “sonorousness” or cultural difference issues.

Start point: physical units

SPL, Hertz, pascals.

First elaboration: Logarithmic units

This innovation is nearly universal in music studies, because of its extreme simplicity. However, it's constantly surprising to machine listening researchers, who keep rediscovering it when they get frustrated with the FFT spectrogram. Bels/decibels, semitones/octaves, dB(A), dB(V)…
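To make the logarithmic units concrete, here is a minimal sketch of the two standard conversions: dB SPL (re 20 µPa) and semitone distance from a reference pitch. The function names are my own; the formulas are the textbook ones.

```python
import math

REF_PRESSURE_PA = 20e-6  # standard dB SPL reference pressure: 20 micropascals


def db_spl(pressure_pa):
    """Sound pressure level in dB relative to 20 uPa."""
    return 20.0 * math.log10(pressure_pa / REF_PRESSURE_PA)


def semitone_distance(f, f_ref=440.0):
    """Signed pitch distance in semitones from a reference frequency (A4 by default)."""
    return 12.0 * math.log2(f / f_ref)
```

For example, a pressure of 0.02 Pa comes out at 60 dB SPL, and 880 Hz sits exactly 12 semitones (one octave) above A4.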

Next elaboration: “Cambridge” and “Munich” frequency units

Bark and ERB measures; these seem to be more common in the acoustics and psychoacoustics community. An introduction to selected musically useful bits is given by Parncutt and Strasburger (1994).

According to Moore (2014), the key reference for Barks is Zwicker's "critical band" research, extended by Brian Moore et al., e.g. in Moore and Glasberg (1983).

Traunmüller (1990) gives a simple rational formula to approximate the in-any-case-approximate lookup tables, and relates it to ERBs.

Barks

Descriptions of Barks seem to start with the statement that above about 500 Hz this scale is near logarithmic in the frequency axis. Below 500 Hz the Bark scale approaches linearity. It is defined by an empirically derived table, but there are analytic approximations which seem just as good.

Traunmüller approximation for critical band rate in bark

$z(f) = \frac{26.81}{1+1960/f} - 0.53$

Lach Lau amends the formula:

$z'(f) = z(f) + \mathbb{I}\{z(f)>20.1\}\,(z(f)-20.1)\times 0.22$

Hartmut Traunmüller’s online unit conversion page can convert these for you, and Dik Hermes summarises some history of how we got this way.
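The two formulas above combine into a few lines of code. This is a sketch (function name mine), implementing Traunmüller's rational approximation plus the Lach Lau high-end correction:

```python
def hz_to_bark(f):
    """Critical-band rate in Bark: Traunmüller's rational approximation,
    with Lach Lau's correction applied above 20.1 Bark."""
    z = 26.81 / (1.0 + 1960.0 / f) - 0.53
    if z > 20.1:
        z += (z - 20.1) * 0.22
    return z
```

At 1000 Hz this gives about 8.5 Bark, in the right neighbourhood of the classic tables.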

Erbs

Newer; works better at low frequencies (but possibly not at high frequencies?). They seem to be popular for analysing psychoacoustic masking effects.

Erbs are given different formulae and capitalisation depending where you look. Here’s one for the “ERB-rate”:

$H_p(f) = H_1\ln\left(\frac{f+f_1}{f+f_2}\right)+H_0,$

where

$\begin{aligned} H_1 &= 11.17 \text{ erb}\\ H_0 &= 43.0 \text{ erb}\\ f_1 &= 312 \text{ Hz}\\ f_2 &= 14675 \text{ Hz} \end{aligned}$

The ERB bandwidths themselves (a different quantity from the ERB-rate at a given frequency?) are given by

$B_e = 6.23 \times 10^{-6} f^2 + 0.09339 f + 28.52.$
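A sketch of both quantities, directly transcribing the ERB-rate formula and the quadratic bandwidth fit above (function names are mine):

```python
import math


def erb_rate(f):
    """ERB-rate H_p(f), using the constants quoted above."""
    H1, H0 = 11.17, 43.0
    f1, f2 = 312.0, 14675.0
    return H1 * math.log((f + f1) / (f + f2)) + H0


def erb_bandwidth(f):
    """ERB bandwidth B_e in Hz (the quadratic fit quoted above)."""
    return 6.23e-6 * f * f + 0.09339 * f + 28.52
```

At 1 kHz the bandwidth comes out around 128 Hz, which matches the commonly quoted ~130 Hz figure.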

Elaboration into space: Mel frequencies

Mels are credited by Traunmüller (1990) to Beranek (1949) and by Parncutt (2005) to Stevens and Volkmann (1940).

The mel scale is not used as a metric for computing pitch distance in the present model, because it applies only to pure tones, whereas most of the tone sensations evoked by complex sonorities are of the complex variety (virtual rather than spectral pitches).

Certainly some of the ERB experiments also used pure tones, but maybe… Ach, I don’t even care.

Mels are common in the machine listening community, mostly via the MFCC, the Mel-frequency cepstral coefficients, a feature set that has been historically popular for measuring psychoacoustic similarity of sounds.

Here’s one formula, the “HTK” formula.

$m(f) = 1127 \ln(1+f/700)$

There are others, such as the “Slaney” formula, which is more complicated and piecewise defined. I can’t be bothered searching out the details for now.
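The HTK formula above, plus its inverse, as a sketch (names mine):

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale: m(f) = 1127 ln(1 + f/700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of the HTK mel mapping."""
    return 700.0 * (math.exp(m / 1127.0) - 1.0)
```

A sanity check on the constants: 1000 Hz maps to almost exactly 1000 mel, which is how the scale is anchored.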

Perceptual Loudness

ISO 226:2003 equal-loudness contours

Sones are a power-law intensity scale. Phons are a logarithmic intensity scale, something like the dB level of the signal filtered to match the human ear, which is close to… dB(A)? Something like that. But you can get more sophisticated. Keyword: Fletcher-Munson curves.
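The textbook relation between the two scales is Stevens' power law: loudness in sones doubles for every 10 phon increase, anchored at 1 sone = 40 phon. (This simple form only holds above roughly 40 phon; below that the curve bends.) A sketch, with function names my own:

```python
import math


def phon_to_sone(phon):
    """Stevens' power law: loudness doubles every 10 phon; anchored at 40 phon = 1 sone.
    The simple form is only valid above ~40 phon."""
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone):
    """Inverse mapping."""
    return 40.0 + 10.0 * math.log2(sone)
```

So 50 phon is twice as loud (2 sones), and 60 phon four times as loud (4 sones), as 40 phon.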

For this level of precision, the coupling of frequency and amplitude into perceptual “loudness” becomes important, and the two are no longer related the same way at different source frequencies. Instead they are related via equal-loudness contours, which you can get from an actively updated ISO standard at great expense, or try to reconstruct from journals. Suzuki et al. (2003) seems to be the accepted modern version, but their report only shows graphs and is missing values in the few equations. Table-based loudness contours are available under the MIT license from the Surrey git repo, in iso226.m. Closed-form approximations for an equal-loudness contour at fixed SPL are given in Suzuki and Takeshima (2004), equation 6.

When the loudness of an $$f$$-Hz comparison tone equals the loudness of a reference tone at 1 kHz with sound pressure $$p_r$$, the sound pressure $$p_f$$ at frequency $$f$$ Hz is given by the following function:

$p^2_f =\frac{1}{U^2(f)}\left[(p_r^{2\alpha(f)} - p_{rt}^{2\alpha(f)}) + (U(f)p_{ft})^{2\alpha(f)}\right]^{1/\alpha(f)}$

AFAICT they don’t define $$p_{ft}$$ or $$p_{rt}$$ anywhere, and I don’t have enough free attention to find a simple expression for the frequency-dependent parameters, which I think are still spline-fit. (?)

There is an excellent explanation of the point of all this — with diagrams — by Joe Wolfe.

Onwards and upwards like a Shepard tone

At this point, where we are already combining frequency and loudness, things are getting weird; we are usually measuring people’s reported subjective loudness levels for various signals, some of which are unnatural signals (pure tones), and with real signals we rapidly start running into temporal masking effects and phasing and so on.

Thankfully, I am not in the business of exhaustive cochlear modeling, so we can all go home now. The unhealthily curious might read further and tell me the good bits, then move on to sensory neurology.

References

Ball, Philip. 1999. Nature News, August.
———. 2014. Nature, June.
Bartlett, M. S., and J. Medhi. 1955. Biometrika 42 (1/2): 143.
Bauer, Benjamin B. 1970. Journal of the Audio Engineering Society 18 (2): 165–72.
Bauer, B., and E. Torick. 1966. IEEE Transactions on Audio and Electroacoustics 14 (3): 141–51.
Benjamin, Eric. 1994. In Audio Engineering Society Convention 97. Audio Engineering Society.
Beranek, Leo Leroy. 1949.
Bidelman, Gavin M., and Ananthanarayan Krishnan. 2009. Journal of Neuroscience 29 (42): 13165–71.
Bingham, Christopher, M. Godfrey, and John W. Tukey. 1967. Audio and Electroacoustics, IEEE Transactions on 15 (2): 56–66.
Bridle, J. S., and M. D. Brown. 1974. “An Experimental Automatic Word Recognition System.” JSRU Report 1003 (5).
Brown, Judith C. 1991. The Journal of the Acoustical Society of America 89 (1): 425–34.
Cancho, Ramon Ferrer i, and Ricard V. Solé. 2003. Proceedings of the National Academy of Sciences 100 (3): 788–91.
Cariani, P. A., and B. Delgutte. 1996a. Journal of neurophysiology 76 (3): 1698–1716.
———. 1996b. Journal of Neurophysiology 76 (3): 1717–34.
Carter, G.Clifford. 1987. Proceedings of the IEEE 75 (2): 236–55.
Cartwright, Julyan H. E., Diego L. González, and Oreste Piro. 1999. Physical Review Letters 82 (26): 5389–92.
Cedolin, Leonardo, and Bertrand Delgutte. 2005. Journal of Neurophysiology 94 (1): 347–62.
Cheveigné, Alain de, and Hideki Kawahara. 2002. The Journal of the Acoustical Society of America 111 (4): 1917–30.
Cochran, W.T., James W. Cooley, D.L. Favin, H.D. Helms, R.A. Kaenel, W.W. Lang, Jr. Maling G.C., D.E. Nelson, C.M. Rader, and Peter D. Welch. 1967. Proceedings of the IEEE 55 (10): 1664–74.
Cooley, J. W., P. A. W. Lewis, and P. D. Welch. 1970. Journal of Sound and Vibration 12 (3): 339–52.
Cooper, Joel, and Russell H. Fazio. 1984. Advances in Experimental Social Psychology 17: 229–68.
Cousineau, Marion, Josh H. McDermott, and Isabelle Peretz. 2012. Proceedings of the National Academy of Sciences 109 (48): 19858–63.
Dattorro, Jon. n.d. “Madaline Model of Musical Pitch Perception,” 27.
Davis, S., and P. Mermelstein. 1980. IEEE Transactions on Acoustics, Speech, and Signal Processing 28 (4): 357–66.
Du, Pan, Warren A. Kibbe, and Simon M. Lin. 2006. Bioinformatics 22 (17): 2059–65.
Duffin, R. J. 1948. Duke Mathematical Journal 15 (3): 781–85.
Elowsson, Anders, and Anders Friberg. 2017. “Long-Term Average Spectrum in Popular Music and Its Relation to the Level of the Percussion.” In Audio Engineering Society Convention 142, 13. Audio Engineering Society.
Fastl, H., and Eberhard Zwicker. 2007. Psychoacoustics: Facts and Models. 3rd. ed. Springer Series in Information Sciences 22. Berlin ; New York: Springer.
Ferguson, Sean, and Richard Parncutt. 2004. In Proceedings of Sound and Music Computing.
Fineberg, Joshua. 2000. Contemporary Music Review 19 (2): 81–113.
Gerzon, M. A. 1976. Electronics Letters 12 (11): 278–79.
Godsill, S., and Manuel Davy. 2005. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, 283–86. IEEE.
Gómez, Emilia, and Perfecto Herrera. 2004. In ISMIR.
Gräf, Albert. 2010. Signal 3: 6.
Guinan Jr., John J. 2012. Hearing Research 292 (1–2): 35–50.
Harris, Fredric J. 1978. Proceedings of the IEEE 66 (1): 51–83.
Hartmann, William M. 1997. Signals, Sound, and Sensation. Modern Acoustics and Signal Processing. Woodbury, N.Y: American Institute of Physics.
Heikkila, Janne. 2004. IEEE Signal Processing Letters 11 (6): 545–48.
Helmholtz, Heinrich. 1863. Die Lehre von Den Tonempfindungen Als Physiologische Grundlage Für Die Theorie Der Musik. Braunschweig: J. Vieweg.
Hennig, Holger, Ragnar Fleischmann, Anneke Fredebohm, York Hagmayer, Jan Nagler, Annette Witt, Fabian J Theis, and Theo Geisel. 2011. PLoS ONE 6 (10): e26457.
Herman, Irving P. 2007. Physics of the Human Body. Biological and Medical Physics, Biomedical Engineering. Berlin ; New York: Springer.
Hermes, Dik J. 1988. The Journal of the Acoustical Society of America 83 (1): 257–64.
Hove, Michael J., Céline Marie, Ian C. Bruce, and Laurel J. Trainor. 2014. Proceedings of the National Academy of Sciences 111 (28): 10383–88.
Huron, David, and Richard Parncutt. 1993. Psychomusicology: A Journal of Research in Music Cognition 12 (2): 154–71.
Irizarry, Rafael A. 2001. Journal of the American Statistical Association 96 (454): 357–67.
Jacob, Bruce L. 1996. Organised Sound 1 (03): 157–65.
Kameoka, Akio, and Mamoru Kuriyagawa. 1969a. The Journal of the Acoustical Society of America 45 (6): 1451–59.
———. 1969b. The Journal of the Acoustical Society of America 45 (6): 1460–69.
Krishnan, Ananthanarayan, Yisheng Xu, Jackson T. Gandour, and Peter A. Cariani. 2004. Hearing Research 189 (1-2): 1–12.
Lahat, M., Russell J. Niederjohn, and D. Krubsack. 1987. IEEE Transactions on Acoustics, Speech and Signal Processing 35 (6): 741–50.
Langner, Gerald. 1992. Hearing Research 60 (2): 115–42.
Lerdahl, Fred. 1996. Music Perception: An Interdisciplinary Journal 13 (3): 319–63.
Li, W. 1992. IEEE Transactions on Information Theory 38 (6): 1842–45.
Licklider, J. C. R. 1951. Experientia 7 (4): 128–34.
Lorrain, Denis. 1980. Computer Music Journal 4 (1): 53–81.
Ma, Ning, Phil Green, Jon Barker, and André Coy. 2007. Speech Communication 49 (12): 874–91.
Manaris, Bill, Juan Romero, Penousal Machado, Dwight Krehbiel, Timothy Hirzel, Walter Pharr, and Robert B. Davis. 2005. Computer Music Journal 29 (1): 55–69.
Masaoka, Ken’ichiro, Kazuho Ono, and Setsu Komiyama. 2001. Acoustical Science and Technology 22 (1): 35–39.
McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. 2013. Nature Neuroscience 16 (4): 493–98.
Medan, Yoav, Eyal Yair, and Dan Chazan. 1991. IEEE Transactions on Signal Processing 39 (1): 40–48.
Mermelstein, Paul, and C H Chen. 1976. In Pattern Recognition and Artificial Intelligence, 101:374–88. Academic Press.
Michon, Romain, and Julius O. Smith. 2011. In Proceedings of the 11th International Conference on Digital Audio Effects (DAFx-11), 199.
Millane, R.P. 1994. Proceedings of the IEEE 82 (3): 413–28.
Moore, Brian C. J. 2007. Cochlear hearing loss: physiological, psychological and technical issues. 2. ed. Wiley series in human communication science. Chichester: Wiley.
———. 2014. Trends in Hearing 18 (September).
Moore, Brian C. J., and Brian R. Glasberg. 1983. The Journal of the Acoustical Society of America 74 (3): 750–53.
Moorer, J.A. 1974. IEEE Transactions on Acoustics, Speech and Signal Processing 22 (5): 330–38.
Morales-Cordovilla, J. A., A. M. Peinado, V. Sanchez, and J. A. Gonzalez. 2011. IEEE Transactions on Audio, Speech, and Language Processing 19 (3): 640–51.
Müller, M., D.P.W. Ellis, A. Klapuri, and G. Richard. 2011. IEEE Journal of Selected Topics in Signal Processing 5 (6): 1088–1110.
Narayan, S. Shyamla, Andrei N. Temchin, Alberto Recio, and Mario A. Ruggero. 1998. Science 282 (5395): 1882–84.
Neely, Stephen T. 1993. Journal of the Acoustical Society of America 94 (1): 137–46.
Noll, A. Michael. 1967. The Journal of the Acoustical Society of America 41 (2): 293–309.
Nordmark, Jan, and Lennart E. Fahlen. 1988. Speech Transmission Laboratory, Quarterly Progress and Status Report.
Olson, Elizabeth S. 2001. The Journal of the Acoustical Society of America 110 (1): 349–67.
Orlarey, Yann, Albert Gräf, and Stefan Kersten. 2006. In Proceedings of the 4th International Linux Audio Conference (Lac06), 39–47.
Pakarinen, Jyri, Vesa Välimäki, Federico Fontana, Victor Lazzarini, and Jonathan S. Abel. 2011. EURASIP Journal on Advances in Signal Processing 2011 (1): 940784.
Parncutt, Richard. 2005. Musikpsychologie–Das Neue Handbuch.
Parncutt, Richard, and Hans Strasburger. 1994. Perspectives of New Music 32 (2): 88–129.
Pestana, Pedro Duarte, Zheng Ma, and Joshua D Reiss. 2013. In New York, 8. Audio Engineering Society.
Plomp, Reinier, and Willem JM Levelt. 1965. The Journal of the Acoustical Society of America 38 (4): 548–60.
Rabiner, L. 1977. IEEE Transactions on Acoustics, Speech, and Signal Processing 25 (1): 24–33.
Rasch, Rudolf, and Reinier Plomp. 1999. The Psychology of Music 2: 89–112.
Reitboeck, H., and T. P. Brody. 1969. Information and Control 15 (2): 130–54.
Robinson, D. W., and R. S. Dadson. 1956. British Journal of Applied Physics 7 (5): 166.
Rouat, Jean, Yong Chun Liu, and Daniel Morissette. 1997. Speech Communication 21 (3): 191–207.
Salamon, Justin, Emilia Gomez, Daniel PW Ellis, and Gael Richard. 2014. IEEE Signal Processing Magazine 31 (2): 118–34.
Salamon, Justin, Joan Serrà, and Emilia Gómez. 2013. International Journal of Multimedia Information Retrieval 2 (1): 45–58.
Schöner, Gregor. 2002. Brain and Cognition 48 (1): 31–51.
Schroeder, Manfred R. 1961. The Journal of the Acoustical Society of America 33 (8): 1061–64.
———. 1962. Journal of the Audio Engineering Society 10 (3): 219–23.
Schroeder, Manfred R., and B. Logan. 1961. Audio, IRE Transactions on AU-9 (6): 209–14.
Serrà, Joan, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll Arcos. 2012. Scientific Reports 2 (July).
Sethares, William A. 1997. The Journal of the Acoustical Society of America 102 (4): 2422–31.
———. 1998. Computer Music Journal 22 (1): 56.
Sethares, William A., Andrew J. Milne, Stefan Tiedje, Anthony Prechtl, and James Plamondon. 2009. Computer Music Journal 33 (2): 71–84.
Skoe, Erika, and Nina Kraus. 2010. Ear and Hearing 31 (3): 302–24.
Slaney, Malcolm. 1998. “Auditory Toolbox.” Interval Research Corporation, Tech. Rep 10: 1998.
Slaney, M., and R. F. Lyon. 1990. In Proceedings of ICASSP, 357–360 vol.1.
Slepecky, Norma B. 1996. In The Cochlea, edited by Peter Dallos, Arthur N. Popper, and Richard R. Fay, 44–129. Springer Handbook of Auditory Research 8. Springer New York.
Smith, Evan C., and Michael S. Lewicki. 2006. Nature 439 (7079): 978–82.
Smith, Julius O. 2010. Online tutorial: https://ccrma.stanford.edu/jos/aspf.
Smith, Julius O., and Romain Michon. 2011. In Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), 361–64.
Smith, Sonya T., and Richard S. Chadwick. 2011. PLoS ONE 6 (3): e18161.
Sondhi, M. 1968. IEEE Transactions on Audio and Electroacoustics 16 (2): 262–66.
Steele, Charles, Jacques Boutet de Monvel, and Sunil Puria. 2009. Journal of Mechanics of Materials and Structures 4 (4): 755–78.
Stevens, S. S., and J. Volkmann. 1940. The American Journal of Psychology 53 (3): 329–53.
Stevens, S. S., J. Volkmann, and E. B. Newman. 1937. The Journal of the Acoustical Society of America 8 (3): 185–90.
Stolzenburg, Frieder. 2015. Journal of Mathematics and Music 9 (3): 215–38.
Suzuki, Yôiti, Volker Mellert, Utz Richter, Henrik Møller, Leif Nielsen, Rhona Hellman, Kaoru Ashihara, Kenji Ozawa, and Hisashi Takeshima. 2003.
Suzuki, Yôiti, and Hisashi Takeshima. 2004. The Journal of the Acoustical Society of America 116 (2): 918.
Tan, L. N., and A. Alwan. 2011. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4464–67.
Tarnopolsky, Alex, Neville Fletcher, Lloyd Hollenberg, Benjamin Lange, John Smith, and Joe Wolfe. 2005. Nature 436 (7047): 39–39.
Terhardt, Ernst. 1974. The Journal of the Acoustical Society of America 55 (5): 1061–69.
Thompson, William Forde, and Richard Parncutt. 1997. Music Perception, 263–80.
Titchmarsh, E. C. 1926. Mathematische Zeitschrift 25 (1): 321–47.
Traunmüller, Hartmut. 1990. The Journal of the Acoustical Society of America 88 (1): 97–100.
Tymoczko, Dmitri. 2006. Science 313 (5783): 72–74.
Umesh, S., L. Cohen, and D. Nelson. 1999. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. Icassp99 (Cat. No.99CH36258), 1:217–220 vol.1.
Valimaki, V., J.D. Parker, L. Savioja, J.O. Smith, and J.S. Abel. 2012. IEEE Transactions on Audio, Speech, and Language Processing 20 (5): 1421–48.
Wagh, M.D. 1976. India, IEE-IERE Proceedings 14 (5): 185–91.
Welch, Peter D. 1967. IEEE Transactions on Audio and Electroacoustics 15 (2): 70–73.
Williamson, John, and Roderick Murray-Smith. 2002.
Xin, Jack, and Yingyong Qi. 2006. arXiv:math/0603174, March.
Young, Steve, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying Liu, Gareth Moore, Julian Odell, Dave Ollason, and Dan Povey. 2002. “The HTK Book.”
Zanette, Damián. 2008. Nature 453 (7198): 988–89.
Zwicker, E. 1961. The Journal of the Acoustical Society of America 33 (2): 248–48.
Zwislocki, J. J. 1980. The Journal of the Acoustical Society of America 67 (5): 1679–79.
