Psychoacoustics


My proposed psychoacoustic system based on cutlery has had limited uptake so far.

Psychoacoustic units

A quick incomplete reference to pascals, Bels, erbs, Barks, sones, Hertz, semitones, Mels and whatever else I happen to need.

The actual auditory system is atrociously complex, and I am not going into complete perceptual models here, even if I did know a stirrup from a hammer or a cochlea from a cauliflower ear. Measuring what we can perceive with our sensory apparatus is itself complex, involving masking effects and variable resolution in time, space and frequency, not to mention variation between individuals.

Nonetheless, when studying audio it is worthwhile using units other than the natural-to-a-physicist Hz and pascals, even without pretending that we have found the native units of the human ear. SI units are inconvenient when studying musical metrics or machine listening because they do not closely match human perceptual difference: 50 Hz is a significant difference at a base frequency of 100 Hz, but insignificant at 2000 Hz. How big this difference is, and what it means, is a rather complex and contingent question. This means that we should not be too attached to getting this one “right”, and should feel free to take adequate simple approximations as the project demands.
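To put a rough number on that example, assume we measure the interval in equal-tempered semitones, 12 to the octave. That yardstick is my choice for illustration, not something the ear dictates.

\[ 12\log_2\frac{150}{100} \approx 7.02 \text{ semitones}, \qquad 12\log_2\frac{2050}{2000} \approx 0.43 \text{ semitones}. \]

So the same 50 Hz step is about a perfect fifth in the first case and less than half a semitone in the second.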

Since my needs are machine listening features and thus computational speed and simplicity over perfection, I will wilfully and with malice ignore any fine distinctions I cannot be bothered with, regardless of how many articles have been published discussing said details. For example, I will not cover “salience”, “sonorousness” or cultural difference issues. I will also ignore issues of uncertainty principles in inferring such qualities.

Start point: physical units

SPL, Hertz, pascals.

First elaboration: Logarithmic units

This innovation is nearly universal in music studies because of its extreme simplicity. However, it seems to be a constant surprise to machine listening researchers, who keep rediscovering it when they get frustrated with the FFT spectrogram. Bels/decibels, semitones/octaves… dBV.
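A minimal sketch of these conversions in Python, under some assumptions of mine: the function names are made up for illustration, the pitch reference defaults to 440 Hz, and the SPL reference is the conventional 20 µPa. For dBV the same amplitude formula applies with a 1 V RMS reference.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Decibels for an amplitude (pressure or voltage) ratio: 20 log10."""
    return 20.0 * math.log10(ratio)


def power_ratio_to_db(ratio):
    """Decibels for a power or intensity ratio: 10 log10 (1 Bel = 10 dB)."""
    return 10.0 * math.log10(ratio)


def pascals_to_db_spl(p_rms, p_ref=20e-6):
    """Sound pressure level in dB re 20 micropascals (assumed reference)."""
    return amplitude_ratio_to_db(p_rms / p_ref)


def hz_to_semitones(f, f_ref=440.0):
    """Signed distance from f_ref in equal-tempered semitones."""
    return 12.0 * math.log2(f / f_ref)


def hz_to_octaves(f, f_ref=440.0):
    """Signed distance from f_ref in octaves."""
    return math.log2(f / f_ref)
```

For example, `pascals_to_db_spl(1.0)` is roughly 94 dB SPL, and `hz_to_semitones(880.0)` is one octave, that is, 12 semitones.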

Next elaboration: “Cambridge” and “Munich” frequency units

Bark and ERB measures; these seem to be more common in the acoustics and psychoacoustics community. An introduction to selected musically useful bits is given by Parncutt and Strasburger (1994).

According to Moore (2014), the key reference for Barks is Zwicker’s “critical band” research (Zwicker 1961), extended by Brian Moore et al. (e.g. in Moore and Glasberg 1983).

Traunmüller (1990) gives a simple rational formula to approximate the in-any-case-approximate lookup tables, as do Moore and Glasberg (1983), and both relate these to Erbs.

Barks

Descriptions of Barks seem to start with the statement that above about 500 Hz this scale is near logarithmic in the frequency axis. Below 500 Hz the Bark scale approaches linearity. It is defined by an empirically derived table, but there are analytic approximations which seem just as good.

Traunmüller approximation for critical band rate in bark

\[ z(f) = \frac{26.81}{1+1960/f} - 0.53 \]

Lach Lau amends the formula:

\[ z'(f) = z(f) + 0.22\,\bigl(z(f)-20.1\bigr)\,\mathbb{I}\{z(f)>20.1\} \]

Hartmut Traunmüller’s online unit conversion page can convert these for you, and Dik Hermes summarises some history of how we got this way.
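As a sketch, the Traunmüller approximation together with the high-end amendment above translates directly into Python. The function name is mine, and I assume a strictly positive frequency in Hz.

```python
def hz_to_bark(f):
    """Critical band rate in Bark for a frequency f > 0 in Hz.

    Uses the rational approximation z(f) = 26.81 / (1 + 1960 / f) - 0.53,
    then applies the amendment for z > 20.1 given in the text.
    """
    z = 26.81 / (1.0 + 1960.0 / f) - 0.53
    if z > 20.1:
        # Stretch the top of the scale, as in the amended formula.
        z += 0.22 * (z - 20.1)
    return z
```

For instance, `hz_to_bark(1000.0)` comes out at about 8.53 Bark. The correction is continuous at z = 20.1, so the mapping stays monotonic.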

Erbs

Newer than Barks, and works better at lower frequencies (but possibly not at very high frequencies?). They seem to be popular for analysing psychoacoustic masking effects.

Erbs are given different formulae and capitalisation depending on where you look. Here’s one from Parncutt and Strasburger (1994) for the “ERB-rate”:

\[ H_p(f) = H_1\ln\left(\frac{f+f_1}{f+f_2}\right)+H_0, \]

where

\[ \begin{aligned} H_1 &= 11.17 \text{ erb}\\ H_0 &= 43.0 \text{ erb}\\ f_1 &= 312 \text{ Hz}\\ f_2 &= 14675 \text{ Hz} \end{aligned} \]

The Erb itself, an equivalent rectangular bandwidth in Hz (which is a different thing from the ERB-rate at a given frequency), is

\[ B_e = 6.23 \times 10^{-6} f^2 + 0.09339 f + 28.52. \]
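A minimal Python sketch of both ERB quantities, using the constants quoted above as default arguments. The function names are mine, and frequencies are assumed to be in Hz.

```python
import math


def hz_to_erb_rate(f, H1=11.17, H0=43.0, f1=312.0, f2=14675.0):
    """ERB-rate (in erb) at frequency f in Hz, per the formula above."""
    return H1 * math.log((f + f1) / (f + f2)) + H0


def erb_bandwidth_hz(f):
    """Equivalent rectangular bandwidth B_e in Hz at centre frequency f in Hz."""
    return 6.23e-6 * f**2 + 0.09339 * f + 28.52
```

As a sanity check, `hz_to_erb_rate(0.0)` is close to zero, and `erb_bandwidth_hz(1000.0)` is 6.23 + 93.39 + 28.52 = 128.14 Hz.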

Elaboration into space: Mel frequencies

Mels are credited by Traunmüller (1990) to Beranek (1949) and by Parncutt (2005) to Stevens and Volkmann (1940).

The mel scale is not used as a metric for computing pitch distance in the present model, because it applies only to pure tones, whereas most of the tone sensations evoked by complex sonorities are of the complex variety (virtual rather than spectral pitches).

Certainly some of the ERB experiments are also done using pure tones, but maybe… Ach, I don’t even care.

Mels are common in the machine listening community, mostly through MFCCs, the Mel-frequency cepstral coefficients, a feature that has historically been popular for measuring the psychoacoustic similarity of sounds (Davis and Mermelstein 1980; Mermelstein and Chen 1976).

Here’s one formula, the “HTK” formula.

\[ m(f) = 1127 \ln(1+f/700) \]
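The HTK formula and its algebraic inverse are short enough to sketch in Python. The function names are my own; the inverse is just the formula above solved for \(f\).

```python
import math


def hz_to_mel_htk(f):
    """Mel value for frequency f in Hz, HTK-style: 1127 ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk: f = 700 (exp(m / 1127) - 1)."""
    return 700.0 * (math.exp(m / 1127.0) - 1.0)
```

With these constants 1000 Hz lands very close to 1000 mel, which is the usual anchoring of the scale.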

There are others, such as the “Slaney” formula (Slaney 1998), which is much more complicated and piecewise defined. I can’t be bothered searching for details for now.

Perceptual Loudness

Figure: ISO 226:2003 equal-loudness contours (image by Lindosland).

Sones (Stevens and Volkmann 1940) are a power-law intensity scale. Phons, ibid., are a logarithmic intensity scale, something like the dB level of the signal filtered to match the human ear, which is close to… dB(A)? Something like that. But you can get more sophisticated. Keyword: Fletcher-Munson curves.
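The two scales are linked by a common rule of thumb, which I state here as an assumption rather than something derived above: from about 40 phon upwards, each extra 10 phon doubles the loudness in sones, with 1 sone pinned at 40 phon. A sketch in Python, with function names of my own invention:

```python
import math


def phon_to_sone(phon):
    """Loudness in sones from loudness level in phons.

    Assumes the doubling-per-10-phon rule, which is only meant
    to hold from roughly 40 phon upwards.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone):
    """Inverse of phon_to_sone under the same assumption (sone > 0)."""
    return 40.0 + 10.0 * math.log2(sone)
```

So 50 phon is 2 sones and 60 phon is 4 sones under this rule. Below 40 phon the relationship is reported to deviate, and this sketch does not attempt to model that.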

At this level of precision, the coupling of frequency and amplitude into perceptual “loudness” becomes important: tones of equal sound pressure are no longer equally loud at different frequencies. The relationship is described by equal-loudness contours, which you can get from an actively updated ISO standard at great expense, or try to reconstruct from journals. Suzuki et al. (2003) seems to be the accepted modern version, but their report only lists graphs and is missing values in the few equations. Table-based loudness contours are available under the MIT license from the Surrey git repo, under iso226.m. Closed-form approximations for an equal-loudness contour at fixed SPL are given in Suzuki and Takeshima (2004), equation 6.

When the loudness of an \(f\)-Hz comparison tone is equal to the loudness of a reference tone at 1 kHz with a sound pressure of \(p_r\), then the sound pressure of \(p_f\) at the frequency of \(f\) Hz is given by the following function:

\[ p^2_f =\frac{1}{U^2(f)}\left[(p_r^{2\alpha(f)} - p_{rt}^{2\alpha(f)}) + (U(f)p_{ft})^{2\alpha(f)}\right]^{1/\alpha(f)} \]

AFAICT they don’t define \(p_{ft}\) or \(p_{rt}\) anywhere, and I don’t have enough free attention to find a simple expression for the frequency-dependent parameters, which I think are still spline-fit. (?)
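Even without the parameter values, the quoted equation can be sketched as a Python function that takes the frequency-dependent quantities as plain arguments. Everything here is an assumption on my part: the function name, the argument names, and the idea that the caller can supply \(U(f)\), \(\alpha(f)\), \(p_{rt}\) and \(p_{ft}\) from some table or fit that I have not located.

```python
def equal_loudness_pressure_squared(p_r, U, alpha, p_rt, p_ft):
    """Squared sound pressure p_f^2 of the equally loud comparison tone.

    p_r   : sound pressure of the 1 kHz reference tone
    U     : value of U(f) at the comparison frequency (caller-supplied)
    alpha : value of alpha(f) at the comparison frequency (caller-supplied)
    p_rt, p_ft : the otherwise undefined pressures from the equation
    """
    two_alpha = 2.0 * alpha
    bracket = (p_r**two_alpha - p_rt**two_alpha) + (U * p_ft) ** two_alpha
    return bracket ** (1.0 / alpha) / U**2
```

One consistency check that needs no real data: if \(U = 1\) and \(p_{ft} = p_{rt}\), the bracket collapses to \(p_r^{2\alpha}\) and the function returns \(p_r^2\), as it should when the comparison tone coincides with the reference.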

There is an excellent explanation of the point of all this, with diagrams, by Joe Wolfe.

Onwards and upwards like a Shepard tone

At this point, where we are already combining frequency and loudness, things are getting weird; we are usually measuring people’s reported subjective loudness levels for unnatural signals (pure tones), and with real signals we rapidly start running into temporal masking effects and phasing and so on.

Thankfully, we aren’t in the business of exhaustive cochlear modeling, so we can all go home now. The unhealthily curious might read Hartmann (1997) and Moore (2007) and tell me the good bits, then move on to sensory neurology.

Psychoacoustic models in lossy audio compression

Pure link dump, sorry.

References

Ball, Philip. 1999. “Pump up the Bass.” Nature News, August. https://doi.org/10.1038/news990708-7.

———. 2014. “Rhythm Is Heard Best in the Bass.” Nature, June. https://doi.org/10.1038/nature.2014.15481.

Bartlett, M. S., and J. Medhi. 1955. “On the Efficiency of Procedures for Smoothing Periodograms from Time Series with Continuous Spectra.” Biometrika 42 (1/2): 143. https://doi.org/10.2307/2333431.

Bauer, Benjamin B. 1970. “Octave-Band Spectral Distribution of Recorded Music.” Journal of the Audio Engineering Society 18 (2): 165–72. http://www.aes.org/e-lib/browse.cfm?elib=1513.

Bauer, B., and E. Torick. 1966. “Researches in Loudness Measurement.” IEEE Transactions on Audio and Electroacoustics 14 (3): 141–51. https://doi.org/10.1109/TAU.1966.1161864.

Benjamin, Eric. 1994. “Characteristics of Musical Signals.” In Audio Engineering Society Convention 97. Audio Engineering Society. http://www.aes.org/e-lib/browse.cfm?elib=6318.

Beranek, Leo Leroy. 1949. “Acoustic Measurements.” https://trid.trb.org/view.aspx?id=521524.

Bidelman, Gavin M., and Ananthanarayan Krishnan. 2009. “Neural Correlates of Consonance, Dissonance, and the Hierarchy of Musical Pitch in the Human Brainstem.” Journal of Neuroscience 29 (42): 13165–71. https://doi.org/10.1523/JNEUROSCI.3900-09.2009.

Bingham, Christopher, M. Godfrey, and John W. Tukey. 1967. “Modern Techniques of Power Spectrum Estimation.” Audio and Electroacoustics, IEEE Transactions on 15 (2): 56–66. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1161895.

Bridle, J. S., and M. D. Brown. 1974. “An Experimental Automatic Word Recognition System.” JSRU Report 1003 (5).

Brown, Judith C. 1991. “Calculation of a Constant Q Spectral Transform.” The Journal of the Acoustical Society of America 89 (1): 425–34. https://doi.org/10.1121/1.400476.

Cancho, Ramon Ferrer i, and Ricard V. Solé. 2003. “Least Effort and the Origins of Scaling in Human Language.” Proceedings of the National Academy of Sciences 100 (3): 788–91. https://doi.org/10.1073/pnas.0335980100.

Cariani, P. A., and B. Delgutte. 1996a. “Neural Correlates of the Pitch of Complex Tones. II. Pitch Shift, Pitch Ambiguity, Phase Invariance, Pitch Circularity, Rate Pitch, and the Dominance Region for Pitch.” Journal of Neurophysiology 76 (3): 1717–34. https://doi.org/10.1152/jn.1996.76.3.1717.

———. 1996b. “Neural Correlates of the Pitch of Complex Tones. I. Pitch and Pitch Salience.” Journal of Neurophysiology 76 (3): 1698–1716. https://doi.org/10.1152/jn.1996.76.3.1698.

Carter, G. Clifford. 1987. “Coherence and Time Delay Estimation.” Proceedings of the IEEE 75 (2): 236–55. https://doi.org/10.1109/PROC.1987.13723.

Cartwright, Julyan H. E., Diego L. González, and Oreste Piro. 1999. “Nonlinear Dynamics of the Perceived Pitch of Complex Sounds.” Physical Review Letters 82 (26): 5389–92. https://doi.org/10.1103/PhysRevLett.82.5389.

Cedolin, Leonardo, and Bertrand Delgutte. 2005. “Pitch of Complex Tones: Rate-Place and Interspike Interval Representations in the Auditory Nerve.” Journal of Neurophysiology 94 (1): 347–62. https://doi.org/10.1152/jn.01114.2004.

Cheveigné, Alain de, and Hideki Kawahara. 2002. “YIN, a Fundamental Frequency Estimator for Speech and Music.” The Journal of the Acoustical Society of America 111 (4): 1917–30. https://doi.org/10.1121/1.1458024.

Cochran, W. T., James W. Cooley, D. L. Favin, H. D. Helms, R. A. Kaenel, W. W. Lang, G. C. Maling Jr., D. E. Nelson, C. M. Rader, and Peter D. Welch. 1967. “What Is the Fast Fourier Transform?” Proceedings of the IEEE 55 (10): 1664–74. https://doi.org/10.1109/PROC.1967.5957.

Cooley, J. W., P. A. W. Lewis, and P. D. Welch. 1970. “The Application of the Fast Fourier Transform Algorithm to the Estimation of Spectra and Cross-Spectra.” Journal of Sound and Vibration 12 (3): 339–52. https://doi.org/10.1016/0022-460X(70)90076-3.

Cooper, Joel, and Russell H. Fazio. 1984. “A New Look at Dissonance.” Advances in Experimental Social Psychology 17: 229–68. http://www.researchgate.net/publication/200773026_A_new_look_at_dissonance_theory/file/9c96052dd64f60e658.pdf.

Cousineau, Marion, Josh H. McDermott, and Isabelle Peretz. 2012. “The Basis of Musical Consonance as Revealed by Congenital Amusia.” Proceedings of the National Academy of Sciences 109 (48): 19858–63. https://doi.org/10.1073/pnas.1207989109.

Dattorro, Jon. n.d. “Madaline Model of Musical Pitch Perception,” 27.

Davis, S., and P. Mermelstein. 1980. “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences.” IEEE Transactions on Acoustics, Speech, and Signal Processing 28 (4): 357–66. https://doi.org/10.1109/TASSP.1980.1163420.

Du, Pan, Warren A. Kibbe, and Simon M. Lin. 2006. “Improved Peak Detection in Mass Spectrum by Incorporating Continuous Wavelet Transform-Based Pattern Matching.” Bioinformatics 22 (17): 2059–65. https://doi.org/10.1093/bioinformatics/btl355.

Duffin, R. J. 1948. “Function Classes Invariant Under the Fourier Transform.” Duke Mathematical Journal 15 (3): 781–85. https://doi.org/10.1215/S0012-7094-48-01569-5.

Elowsson, Anders, and Anders Friberg. 2017. “Long-Term Average Spectrum in Popular Music and Its Relation to the Level of the Percussion.” In Audio Engineering Society Convention 142, 13. Audio Engineering Society.

Fastl, H., and Eberhard Zwicker. 2007. Psychoacoustics: Facts and Models. 3rd. ed. Springer Series in Information Sciences 22. Berlin ; New York: Springer.

Ferguson, Sean, and Richard Parncutt. 2004. “Composing in the Flesh: Perceptually-Informed Harmonic Syntax.” In Proceedings of Sound and Music Computing. http://www.smc-conference.net/smc04/scm04actes/P43.pdf.

Fineberg, Joshua. 2000. “Guide to the Basic Concepts and Techniques of Spectral Music.” Contemporary Music Review 19 (2): 81–113. https://doi.org/10.1080/07494460000640271.

Gerzon, M. A. 1976. “Unitary (Energy-Preserving) Multichannel Networks with Feedback.” Electronics Letters 12 (11): 278–79. https://doi.org/10.1049/el:19760215.

Godsill, S., and Manuel Davy. 2005. “Bayesian Computational Models for Inharmonicity in Musical Instruments.” In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, 283–86. IEEE. https://doi.org/10.1109/ASPAA.2005.1540225.

Gómez, Emilia, and Perfecto Herrera. 2004. “Estimating the Tonality of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies.” In ISMIR. http://www.dtic.upf.edu/~egomez/TonalDescription/GomezHerrera-ISMIR2004.pdf.

Gräf, Albert. 2010. “Term Rewriting Extension for the Faust Programming Language.” Signal 3: 6. http://lac.linuxaudio.org/2010/papers/30.pdf.

Guinan Jr., John J. 2012. “How Are Inner Hair Cells Stimulated? Evidence for Multiple Mechanical Drives.” Hearing Research 292 (1–2): 35–50. https://doi.org/10.1016/j.heares.2012.08.005.

Harris, Fredric J. 1978. “On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform.” Proceedings of the IEEE 66 (1): 51–83. https://doi.org/10.1109/PROC.1978.10837.

Hartmann, William M. 1997. Signals, Sound, and Sensation. Modern Acoustics and Signal Processing. Woodbury, N.Y: American Institute of Physics.

Heikkila, Janne. 2004. “A New Class of Shift-Invariant Operators.” IEEE Signal Processing Letters 11 (6): 545–48. https://doi.org/10.1109/LSP.2004.827915.

Helmholtz, Heinrich. 1863. Die Lehre von Den Tonempfindungen Als Physiologische Grundlage Für Die Theorie Der Musik. Braunschweig: J. Vieweg.

Hennig, Holger, Ragnar Fleischmann, Anneke Fredebohm, York Hagmayer, Jan Nagler, Annette Witt, Fabian J. Theis, and Theo Geisel. 2011. “The Nature and Perception of Fluctuations in Human Musical Rhythms.” PLoS ONE 6 (10): e26457. https://doi.org/10.1371/journal.pone.0026457.

Herman, Irving P. 2007. Physics of the Human Body. Biological and Medical Physics, Biomedical Engineering. Berlin ; New York: Springer.

Hermes, Dik J. 1988. “Measurement of Pitch by Subharmonic Summation.” The Journal of the Acoustical Society of America 83 (1): 257–64. https://doi.org/10.1121/1.396427.

Hove, Michael J., Céline Marie, Ian C. Bruce, and Laurel J. Trainor. 2014. “Superior Time Perception for Lower Musical Pitch Explains Why Bass-Ranged Instruments Lay down Musical Rhythms.” Proceedings of the National Academy of Sciences 111 (28): 10383–8. https://doi.org/10.1073/pnas.1402039111.

Huron, David, and Richard Parncutt. 1993. “An Improved Model of Tonality Perception Incorporating Pitch Salience and Echoic Memory.” Psychomusicology: A Journal of Research in Music Cognition 12 (2): 154–71. https://doi.org/10.1037/h0094110.

Irizarry, Rafael A. 2001. “Local Harmonic Estimation in Musical Sound Signals.” Journal of the American Statistical Association 96 (454): 357–67. https://doi.org/10.1198/016214501753168082.

Jacob, Bruce L. 1996. “Algorithmic Composition as a Model of Creativity.” Organised Sound 1 (03): 157–65. http://journals.cambridge.org/abstract_S1355771896000222.

Kameoka, Akio, and Mamoru Kuriyagawa. 1969a. “Consonance Theory Part I: Consonance of Dyads.” The Journal of the Acoustical Society of America 45 (6): 1451–9. https://doi.org/10.1121/1.1911623.

———. 1969b. “Consonance Theory Part II: Consonance of Complex Tones and Its Calculation Method.” The Journal of the Acoustical Society of America 45 (6): 1460–9. https://doi.org/10.1121/1.1911624.

Krishnan, Ananthanarayan, Yisheng Xu, Jackson T. Gandour, and Peter A. Cariani. 2004. “Human Frequency-Following Response: Representation of Pitch Contours in Chinese Tones.” Hearing Research 189 (1-2): 1–12. https://doi.org/10.1016/S0378-5955(03)00402-7.

Lahat, M., Russell J. Niederjohn, and D. Krubsack. 1987. “A Spectral Autocorrelation Method for Measurement of the Fundamental Frequency of Noise-Corrupted Speech.” IEEE Transactions on Acoustics, Speech and Signal Processing 35 (6): 741–50. https://doi.org/10.1109/TASSP.1987.1165224.

Langner, Gerald. 1992. “Periodicity Coding in the Auditory System.” Hearing Research 60 (2): 115–42. https://doi.org/10.1016/0378-5955(92)90015-F.

Lerdahl, Fred. 1996. “Calculating Tonal Tension.” Music Perception: An Interdisciplinary Journal 13 (3): 319–63. https://doi.org/10.2307/40286174.

Li, W. 1992. “Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution.” IEEE Transactions on Information Theory 38 (6): 1842–5. https://doi.org/10.1109/18.165464.

Licklider, J. C. R. 1951. “A Duplex Theory of Pitch Perception.” Experientia 7 (4): 128–34. https://doi.org/10.1007/BF02156143.

Lorrain, Denis. 1980. “A Panoply of Stochastic ‘Cannons’.” Computer Music Journal 4 (1): 53–81. https://doi.org/10.2307/3679442.

Ma, Ning, Phil Green, Jon Barker, and André Coy. 2007. “Exploiting Correlogram Structure for Robust Speech Recognition with Multiple Speech Sources.” Speech Communication 49 (12): 874–91. https://doi.org/10.1016/j.specom.2007.05.003.

Manaris, Bill, Juan Romero, Penousal Machado, Dwight Krehbiel, Timothy Hirzel, Walter Pharr, and Robert B. Davis. 2005. “Zipf’s Law, Music Classification, and Aesthetics.” Computer Music Journal 29 (1): 55–69. https://doi.org/10.1162/comj.2005.29.1.55.

Masaoka, Ken’ichiro, Kazuho Ono, and Setsu Komiyama. 2001. “A Measurement of Equal-Loudness Level Contours for Tone Burst.” Acoustical Science and Technology 22 (1): 35–39. https://doi.org/10.1250/ast.22.35.

McDermott, Josh H., Michael Schemitsch, and Eero P. Simoncelli. 2013. “Summary Statistics in Auditory Perception.” Nature Neuroscience 16 (4): 493–98. https://doi.org/10.1038/nn.3347.

Medan, Yoav, Eyal Yair, and Dan Chazan. 1991. “Super Resolution Pitch Determination of Speech Signals.” IEEE Transactions on Signal Processing 39 (1): 40–48. https://doi.org/10.1109/78.80763.

Mermelstein, Paul, and CH Chen. 1976. “Distance Measures for Speech Recognition: Psychological and Instrumental.” In Pattern Recognition and Artificial Intelligence, 101:374–88. Academic Press. http://web.haskins.yale.edu/sr/SR047/SR047_07.pdf.

Michon, Romain, and Julius O. Smith. 2011. “Faust-STK: A Set of Linear and Nonlinear Physical Models for the Faust Programming Language.” In Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), 199. http://recherche.ircam.fr/pub/dafx11/Papers/20_e.pdf.

Millane, R. P. 1994. “Analytic Properties of the Hartley Transform and Their Implications.” Proceedings of the IEEE 82 (3): 413–28. https://doi.org/10.1109/5.272146.

Moore, Brian C. J. 2007. Cochlear Hearing Loss: Physiological, Psychological and Technical Issues. 2. ed. Wiley Series in Human Communication Science. Chichester: Wiley.

———. 2014. “Development and Current Status of the ‘Cambridge’ Loudness Models.” Trends in Hearing 18 (September). https://doi.org/10.1177/2331216514550620.

Moore, Brian C. J., and Brian R. Glasberg. 1983. “Suggested Formulae for Calculating Auditory‐filter Bandwidths and Excitation Patterns.” The Journal of the Acoustical Society of America 74 (3): 750–53. https://doi.org/10.1121/1.389861.

Moorer, J. A. 1974. “The Optimum Comb Method of Pitch Period Analysis of Continuous Digitized Speech.” IEEE Transactions on Acoustics, Speech and Signal Processing 22 (5): 330–38. https://doi.org/10.1109/TASSP.1974.1162596.

Morales-Cordovilla, J. A., A. M. Peinado, V. Sanchez, and J. A. Gonzalez. 2011. “Feature Extraction Based on Pitch-Synchronous Averaging for Robust Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing 19 (3): 640–51. https://doi.org/10.1109/TASL.2010.2053846.

Müller, M., D. P. W. Ellis, A. Klapuri, and G. Richard. 2011. “Signal Processing for Music Analysis.” IEEE Journal of Selected Topics in Signal Processing 5 (6): 1088–1110. https://doi.org/10.1109/JSTSP.2011.2112333.

Narayan, S. Shyamla, Andrei N. Temchin, Alberto Recio, and Mario A. Ruggero. 1998. “Frequency Tuning of Basilar Membrane and Auditory Nerve Fibers in the Same Cochleae.” Science 282 (5395): 1882–4. https://doi.org/10.1126/science.282.5395.1882.

Neely, Stephen T. 1993. “A Model of Cochlear Mechanics with Outer Hair Cell Motility.” Journal of the Acoustical Society of America 94 (1): 137–46. https://doi.org/10.1121/1.407091.

Noll, A. Michael. 1967. “Cepstrum Pitch Determination.” The Journal of the Acoustical Society of America 41 (2): 293–309. https://doi.org/10.1121/1.1910339.

Nordmark, Jan, and Lennart E. Fahlen. 1988. “Beat Theories of Musical Consonance.” Speech Transmission Laboratory, Quarterly Progress and Status Report. http://www.speech.kth.se/prod/publications/files/qpsr/1988/1988_29_1_111-122.pdf.

Olson, Elizabeth S. 2001. “Intracochlear Pressure Measurements Related to Cochlear Tuning.” The Journal of the Acoustical Society of America 110 (1): 349–67. https://doi.org/10.1121/1.1369098.

Orlarey, Yann, Albert Gräf, and Stefan Kersten. 2006. “DSP Programming with Faust, Q and SuperCollider.” In Proceedings of the 4th International Linux Audio Conference (LAC06), 39–47. http://lac.zkm.de/2006/papers/lac2006_proceedings.pdf#page=39.

Pakarinen, Jyri, Vesa Välimäki, Federico Fontana, Victor Lazzarini, and Jonathan S. Abel. 2011. “Recent Advances in Real-Time Musical Effects, Synthesis, and Virtual Analog Models.” EURASIP Journal on Advances in Signal Processing 2011 (1): 940784. https://doi.org/10.1155/2011/940784.

Parncutt, Richard. 2005. “Psychoacoustics and Music Perception.” Musikpsychologie–Das Neue Handbuch. http://uni-graz.at/~parncutt/PSYCHOACOUSTICS.pdf.

Parncutt, Richard, and Hans Strasburger. 1994. “Applying Psychoacoustics in Composition: "Harmonic" Progressions of "Nonharmonic" Sonorities.” Perspectives of New Music 32 (2): 88–129. https://doi.org/10.2307/833600.

Pestana, Pedro Duarte, Zheng Ma, and Joshua D Reiss. 2013. “Spectral Characteristics of Popular Commercial Recordings 1950-2010.” In New York, 8. Audio Engineering Society. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.704.5281&rep=rep1&type=pdf.

Plomp, Reinier, and Willem JM Levelt. 1965. “Tonal Consonance and Critical Bandwidth.” The Journal of the Acoustical Society of America 38 (4): 548–60. https://doi.org/10.1121/1.1909741.

Rabiner, L. 1977. “On the Use of Autocorrelation Analysis for Pitch Detection.” IEEE Transactions on Acoustics, Speech, and Signal Processing 25 (1): 24–33. https://doi.org/10.1109/TASSP.1977.1162905.

Rasch, Rudolf, and Reinier Plomp. 1999. “The Perception of Musical Tones.” The Psychology of Music 2: 89–112. http://books.google.ch/books?hl=en&lr=&id=A3jkobk4yMMC&oi=fnd&pg=PA89&dq=Plomp+Levelt+Tonal+Consonance+and+Critical+Bandwidth&ots=muKBPxdd0H&sig=XBFkdnskM1suBD2aF8ygKucqRVQ.

Reitboeck, H., and T. P. Brody. 1969. “A Transformation with Invariance Under Cyclic Permutation for Applications in Pattern Recognition.” Information and Control 15 (2): 130–54. https://doi.org/10.1016/S0019-9958(69)90387-8.

Robinson, D. W., and R. S. Dadson. 1956. “A Re-Determination of the Equal-Loudness Relations for Pure Tones.” British Journal of Applied Physics 7 (5): 166. https://doi.org/10.1088/0508-3443/7/5/302.

Rouat, Jean, Yong Chun Liu, and Daniel Morissette. 1997. “A Pitch Determination and Voiced/Unvoiced Decision Algorithm for Noisy Speech.” Speech Communication 21 (3): 191–207. http://www.sciencedirect.com/science/article/pii/S0167639397000022.

Salamon, Justin, Emilia Gomez, Daniel PW Ellis, and Gael Richard. 2014. “Melody Extraction from Polyphonic Music Signals: Approaches, Applications, and Challenges.” IEEE Signal Processing Magazine 31 (2): 118–34. https://doi.org/10.1109/MSP.2013.2271648.

Salamon, Justin, Joan Serrà, and Emilia Gómez. 2013. “Tonal Representations for Music Retrieval: From Version Identification to Query-by-Humming.” International Journal of Multimedia Information Retrieval 2 (1): 45–58. https://doi.org/10.1007/s13735-012-0026-0.

Schöner, Gregor. 2002. “Timing, Clocks, and Dynamical Systems.” Brain and Cognition 48 (1): 31–51. https://doi.org/10.1006/brcg.2001.1302.

Schroeder, Manfred R. 1961. “Improved Quasi-Stereophony and ‘Colorless’ Artificial Reverberation.” The Journal of the Acoustical Society of America 33 (8): 1061–4. https://doi.org/10.1121/1.1908892.

———. 1962. “Natural Sounding Artificial Reverberation.” Journal of the Audio Engineering Society 10 (3): 219–23. http://www.aes.org/e-lib/browse.cfm?elib=849.

Schroeder, Manfred R., and B. Logan. 1961. “"Colorless" Artificial Reverberation.” Audio, IRE Transactions on AU-9 (6): 209–14. https://doi.org/10.1109/TAU.1961.1166351.

Serrà, Joan, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll Arcos. 2012. “Measuring the Evolution of Contemporary Western Popular Music.” Scientific Reports 2 (July). https://doi.org/10.1038/srep00521.

Sethares, William A. 1997. “Specifying Spectra for Musical Scales.” The Journal of the Acoustical Society of America 102 (4): 2422–31. https://doi.org/10.1121/1.419604.

———. 1998. “Consonance-Based Spectral Mappings.” Computer Music Journal 22 (1): 56. https://doi.org/10.2307/3681045.

Sethares, William A., Andrew J. Milne, Stefan Tiedje, Anthony Prechtl, and James Plamondon. 2009. “Spectral Tools for Dynamic Tonality and Audio Morphing.” Computer Music Journal 33 (2): 71–84. https://doi.org/10.1162/comj.2009.33.2.71.

Skoe, Erika, and Nina Kraus. 2010. “Auditory Brainstem Response to Complex Sounds: A Tutorial.” Ear and Hearing 31 (3): 302–24. https://doi.org/10.1097/AUD.0b013e3181cdb272.

Slaney, Malcolm. 1998. “Auditory Toolbox.” Interval Research Corporation, Tech. Rep 10: 1998.

Slaney, M., and R. F. Lyon. 1990. “A Perceptual Pitch Detector.” In Proceedings of ICASSP, 357–60 vol.1. https://doi.org/10.1109/ICASSP.1990.115684.

Slepecky, Norma B. 1996. “Structure of the Mammalian Cochlea.” In The Cochlea, edited by Peter Dallos, Arthur N. Popper, and Richard R. Fay, 44–129. Springer Handbook of Auditory Research 8. Springer New York. http://link.springer.com/chapter/10.1007/978-1-4612-0757-3_2.

Smith, Evan C., and Michael S. Lewicki. 2006. “Efficient Auditory Coding.” Nature 439 (7079): 978–82. https://doi.org/10.1038/nature04485.

Smith, Julius O. 2010. “Audio Signal Processing in Faust.” Online tutorial. https://ccrma.stanford.edu/~jos/aspf/aspf.pdf.

Smith, Julius O., and Romain Michon. 2011. “Nonlinear Allpass Ladder Filters in Faust.” In Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11), 361–64. http://recherche.ircam.fr/pub/dafx11/Papers/38_e.pdf.

Smith, Sonya T., and Richard S. Chadwick. 2011. “Simulation of the Response of the Inner Hair Cell Stereocilia Bundle to an Acoustical Stimulus.” PLoS ONE 6 (3): e18161. https://doi.org/10.1371/journal.pone.0018161.

Sondhi, M. 1968. “New Methods of Pitch Extraction.” IEEE Transactions on Audio and Electroacoustics 16 (2): 262–66. https://doi.org/10.1109/TAU.1968.1161986.

Steele, Charles, Jacques Boutet de Monvel, and Sunil Puria. 2009. “A Multiscale Model of the Organ of Corti.” Journal of Mechanics of Materials and Structures 4 (4): 755–78. https://doi.org/10.2140/jomms.2009.4.755.

Stevens, S. S., and J. Volkmann. 1940. “The Relation of Pitch to Frequency: A Revised Scale.” The American Journal of Psychology 53 (3): 329–53. https://doi.org/10.2307/1417526.

Stevens, S. S., J. Volkmann, and E. B. Newman. 1937. “A Scale for the Measurement of the Psychological Magnitude Pitch.” The Journal of the Acoustical Society of America 8 (3): 185–90. https://doi.org/10.1121/1.1915893.

Stolzenburg, Frieder. 2015. “Harmony Perception by Periodicity Detection.” Journal of Mathematics and Music 9 (3): 215–38. https://doi.org/10.1080/17459737.2015.1033024.

Suzuki, Yôiti, Volker Mellert, Utz Richter, Henrik Møller, Leif Nielsen, Rhona Hellman, Kaoru Ashihara, Kenji Ozawa, and Hisashi Takeshima. 2003. “Precise and Full-Range Determination of Two-Dimensional Equal Loudness Contours.” http://owl-ge.ch/IMG/pdf/is-01e.pdf.

Suzuki, Yôiti, and Hisashi Takeshima. 2004. “Equal-Loudness-Level Contours for Pure Tones.” The Journal of the Acoustical Society of America 116 (2): 918. https://doi.org/10.1121/1.1763601.

Tan, L. N., and A. Alwan. 2011. “Noise-Robust F0 Estimation Using SNR-Weighted Summary Correlograms from Multi-Band Comb Filters.” In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4464–7. https://doi.org/10.1109/ICASSP.2011.5947345.

Tarnopolsky, Alex, Neville Fletcher, Lloyd Hollenberg, Benjamin Lange, John Smith, and Joe Wolfe. 2005. “Acoustics: The Vocal Tract and the Sound of a Didgeridoo.” Nature 436 (7047): 39–39. https://doi.org/10.1038/43639a.

Terhardt, Ernst. 1974. “Pitch, Consonance, and Harmony.” The Journal of the Acoustical Society of America 55 (5): 1061–9. https://doi.org/10.1121/1.1914648.

Thompson, William Forde, and Richard Parncutt. 1997. “Perceptual Judgments of Triads and Dyads: Assessment of a Psychoacoustic Model.” Music Perception, 263–80. http://www.jstor.org/stable/40285721.

Titchmarsh, E. C. 1926. “Reciprocal Formulae Involving Series and Integrals.” Mathematische Zeitschrift 25 (1): 321–47. https://doi.org/10.1007/BF01283842.

Traunmüller, Hartmut. 1990. “Analytical Expressions for the Tonotopic Sensory Scale.” The Journal of the Acoustical Society of America 88 (1): 97–100. https://doi.org/10.1121/1.399849.

Tymoczko, Dmitri. 2006. “The Geometry of Musical Chords.” Science 313 (5783): 72–74. https://doi.org/10.1126/science.1126287.

Umesh, S., L. Cohen, and D. Nelson. 1999. “Fitting the Mel Scale.” In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), 1:217–20 vol.1. https://doi.org/10.1109/ICASSP.1999.758101.

Valimaki, V., J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel. 2012. “Fifty Years of Artificial Reverberation.” IEEE Transactions on Audio, Speech, and Language Processing 20 (5): 1421–48. https://doi.org/10.1109/TASL.2012.2189567.

Wagh, M. D. 1976. “Cyclic Autocorrelation as a Translation Invariant Transform.” India, IEE-IERE Proceedings 14 (5): 185–91. https://doi.org/10.1049/iipi.1976.0058.

Welch, Peter D. 1967. “The Use of Fast Fourier Transform for the Estimation of Power Spectra: A Method Based on Time Averaging over Short, Modified Periodograms.” IEEE Transactions on Audio and Electroacoustics 15 (2): 70–73. https://doi.org/10.1109/TAU.1967.1161901.

Williamson, John, and Roderick Murray-Smith. 2002. “Audio Feedback for Gesture Recognition.” https://dspace.gla.ac.uk/handle/1905/69.

Xin, Jack, and Yingyong Qi. 2006. “A Many to One Discrete Auditory Transform,” March. http://arxiv.org/abs/math/0603174.

Young, Steve, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying Liu, Gareth Moore, Julian Odell, Dave Ollason, and Dan Povey. 2002. “The HTK Book.”

Zanette, Damián. 2008. “Playing by Numbers.” Nature 453 (7198): 988–89. https://doi.org/10.1038/453988a.

Zwicker, E. 1961. “Subdivision of the Audible Frequency Range into Critical Bands (Frequenzgruppen).” The Journal of the Acoustical Society of America 33 (2): 248. https://doi.org/10.1121/1.1908630.

Zwislocki, J. J. 1980. “Symposium on Cochlear Mechanics: Where Do We Stand After 50 Years of Research?” The Journal of the Acoustical Society of America 67 (5): 1679. https://doi.org/10.1121/1.384293.