# Musical metrics and manifolds

Metrics, kernels and affinities, the spaces and topologies they induce, and what they reveal about composition. This has considerable overlap with machine listening, but there I start from audio signals, whereas here I usually think about more “symbolic” data such as musical scores. There is overlap with rhythm too, but there I care only about the time axis. There is overlap also with psychoacoustic units and auditory features, which are the pure elements from which these amalgams are made.

This is a specialized area with many fine careers built on it, and many schools with long and proud histories of calling adherents of other schools wrong. A short literature search will turn up many worthwhile distinctions: the cultural versus the biological, the general versus the individual, the different models applicable at different time scales, masking effects and so on.

I will largely pass over these fine contrasts in my quest for some pragmatically useful, minimally complex features for use in machine-learning algorithms, which require not truth but functionality.

## General musically-motivated metric spaces

Chords etc.

Consider grouping chords played on a piano with some known (idealised) 16-harmonic spectrum. How can we understand the harmonic relations of chords played upon this keyboard?

We know a priori that this projection can be naturally embedded in a Euclidean space of something less than $$16\times 12 = 192$$ dimensions, since there are at most that many spectral bands. In fact, since there are only 12 pitch classes, we can get this down to 12 dimensions. The question is, can we get to an even smaller number of dimensions? How small? By “small” here, I mean: can I group chords in a lower number of dimensions than this in some musically interesting way? For bonus points, can I group chords into several different such maps and switch between them?

Without explanation, here is one similarity-distance embedding of all the chords using an ad hoc metric based on thinking about this question. (Brightness: The more notes in the chord, the darker. Hue: I forget.)

Question: can we use bag-of-words models for note adjacency representation? Since I asked that question I discovered the Chord2vec model from Walder’s lab, whose answer is “yes”. See Madjiheurem, Qu, and Walder (2016).

*Figures: two 3-dimensional MDS embeddings of the chord set.*
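
A minimal sketch of the kind of experiment I mean, assuming scikit-learn and a placeholder chord dissimilarity (here a crude pitch-class Jaccard distance, not the ad hoc metric behind the figures above): enumerate some chords as pitch-class sets, build the pairwise distance matrix, and let MDS lay it out in three dimensions.

```python
import itertools

import numpy as np
from sklearn.manifold import MDS

# All 3-note chords as pitch-class sets in 12-tet.
chords = [frozenset(c) for c in itertools.combinations(range(12), 3)]

def chord_distance(a, b):
    """Placeholder dissimilarity: Jaccard distance between pitch-class sets.
    Any musically-motivated metric (e.g. summed pairwise roughness of the
    implied partials) could be swapped in here."""
    return 1.0 - len(a & b) / len(a | b)

# Dense pairwise distance matrix over all chords.
D = np.array([[chord_distance(a, b) for b in chords] for a in chords])

# 3-dimensional MDS embedding, as in the figures above.
embedding = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
X = embedding.fit_transform(D)
print(X.shape)  # (220, 3)
```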

## To understand

• Basic group theory: what are the symmetries of the group of all pitch-class co-occurrences in 12-tet? The most common pitch-class co-occurrence representation is a 12-vector over the Booleans, where a “1” in the first entry means “the note C was played”, in position 2, “C-sharp was played”, and so on. The symmetries we care about are, for example, that (under equal temperament) chord harmonic relationships should be invariant under transposition, i.e. rotation of the entries of the vector. (This last does not hold in Sethares/Terhardt-style dissonance theory, which considers only unwrapped harmonics.) See the sketch after this list.

• Dmitri Tymoczko’s Geometrical Methods in Recent Music Theory

• Henkjan Honing’s Musical Cognition: any good?
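
A quick sketch of that pitch-class-vector representation and the transposition-as-rotation symmetry, assuming nothing beyond numpy (the names are mine):

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def chord_vector(pitch_classes):
    """Binary 12-vector: entry i is 1 iff pitch class i occurs in the chord."""
    v = np.zeros(12, dtype=int)
    v[list(pitch_classes)] = 1
    return v

def transpose(v, semitones):
    """Transposition in 12-tet acts on this representation as rotation."""
    return np.roll(v, semitones)

c_major = chord_vector([0, 4, 7])   # C E G
d_major = chord_vector([2, 6, 9])   # D F# A
assert np.array_equal(transpose(c_major, 2), d_major)
```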

## Dissonance

a.k.a. “spectral roughness”. A particular empirically-motivated metric, with good predictive performance despite its simplicity and its wilful lack of concern for the actual mechanisms of the ear and the brain, or for modern nuances such as masking effects, the influence of duration on sound perception, etc. Introduced by Plomp and Levelt, and developed variously by Sethares, Terhardt, Parncutt and others.

Some sources seem to distinguish roughness in the sense of Sethares from the Plomp and Levelt sense, although they use qualitatively similar equations. I suspect therefore that the distinction is philosophical, or possibly a pointed failure to cite one another because someone said something rude at the after-conference drinkies.

An overview by Vassilakis might help, or the app based on his work by Dr Kelly Fitz.

*Figure: Plomp and Levelt’s dissonance curves.*

Juan Sebastian Lach Lau produced some actual open-source software (DissonanceLib) that attempts to action this stuff in a musical setting. There is a slightly different version below attributed to Parncutt.

A convenient summary of both is in the DissonanceLib code.

It’s most useful for things where you are given the harmonics a priori; I’m not especially convinced about the tenability of directly inferring this metric from an audio signal (“how dissonant is this signal?”). We should be cautious about the identifiability of this statistic from signals estimated nonparametrically, e.g. from windowed DTFT power-spectrogram peaks, just because beat-frequency stuff is complicated and runs into the uncertainty principle. Give it a go, though. Inferring dissonance between two signals known not to be dissonant might work, or perhaps one might need parametric approaches, as in linear system identification.

Dissonance is an interesting measure despite these problems, because it is very much like a Mercer kernel, in that it constructs a distance defined on an (explicit) high-dimensional space. The “nearly circular” geometry it induces is also interesting. For harmonic spectra, you recover the equal-tempered 12-tone scale and the 2:1 octave by minimising dissonance between twelve notes with harmonic spectra (i.e. plucked-string spectra), which is suggestive that it might do other useful things.

Also, it’s almost-everywhere differentiable with respect to your signal parameters, which makes fitting it or optimising its value easy.

David E. Verotta has a thoughtfully illustrated essay on this.

Anyway, details.

### Plomp and Levelt’s dissonance curves

Attributed to Plomp and Levelt (1965); here is Sethares’ (1997) version, also summarised on Sethares’ web page.

Dissonance between two pure sinusoidal frequencies, $$f_1 \leq f_2$$, with amplitudes respectively $$v_1, v_2$$, is given by:

$d_\text{PL}(f_1,f_2, v_1,v_2) := v_1v_2\left[ \exp\left( -as(f_2-f_1) \right) - \exp\left( -bs(f_2-f_1) \right) \right]$

Where

$s=\frac{d^*}{s_1 f_1+s_2}$

and $$a=3.5, b=5.75, d^*=0.24, s_1=0.021, s_2= 19$$, the constants having been fit by least squares to experimental data.

If your note has more than one frequency, you sum the pairwise dissonances of all contributing partials to find the total dissonance, which is not biologically plausible but seems to work OK. Other ways of working out differences between two composite sounds would be possible (Hausdorff metric etc.).
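
Here is a minimal Python transcription of that recipe, using the constants above (the function names and the example harmonic timbre are my own, not Sethares’ code):

```python
import numpy as np

# Constants as fitted above.
A, B = 3.5, 5.75
D_STAR, S1, S2 = 0.24, 0.021, 19.0

def d_pl(f1, f2, v1, v2):
    """Plomp-Levelt/Sethares dissonance of two pure partials, f1 <= f2."""
    s = D_STAR / (S1 * f1 + S2)
    df = f2 - f1
    return v1 * v2 * (np.exp(-A * s * df) - np.exp(-B * s * df))

def dissonance(freqs, amps):
    """Total dissonance of a composite sound: sum over all pairs of partials."""
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            f_lo, f_hi = sorted((freqs[i], freqs[j]))
            total += d_pl(f_lo, f_hi, amps[i], amps[j])
    return total

# Two 6-partial harmonic tones a tritone apart.
def harmonic_tone(f0, n=6, decay=0.88):
    return [f0 * k for k in range(1, n + 1)], [decay ** k for k in range(n)]

fa, va = harmonic_tone(261.63)
fb, vb = harmonic_tone(261.63 * 2 ** (6 / 12))
print(dissonance(fa + fb, va + vb))
```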

This looks to me like the kind of situation where the actual details of the curve are not so important, so long as the points of maximal and minimal dissonance come out right. Clearly we have a minimum at $$f_1=f_2$$. We solve for the maximally dissonant frequency $$f_2$$ with respect to a fixed $$f_1, v_1, v_2$$:

\begin{aligned} -as\exp( -as(f_2-f_1) ) &= -bs\exp( -bs(f_2-f_1) )\\ a\exp( -as(f_2-f_1) ) &= b\exp( -bs(f_2-f_1) )\\ \ln a - as(f_2-f_1) &= \ln b -bs(f_2-f_1)\\ \ln a - \ln b &= as(f_2-f_1) -bs(f_2-f_1)\\ \ln a - \ln b &= s(a-b)(f_2-f_1) \\ f_2 &= f_1+\frac{\ln b - \ln a}{s(b-a)}\\ f_2 &= f_1(1+s_1 C)+s_2 C \end{aligned}

where

$C:=\frac{\ln b - \ln a}{d^*(b-a)}$

That affine difference is reminiscent of resolvability criteria in functional bases.
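
As a sanity check on that closed form (inheriting the fitted constants above, so the exact numbers are only as good as the fit): for a reference partial at 440 Hz the maximally rough partner lands roughly a semitone away.

```python
import numpy as np

A, B = 3.5, 5.75
D_STAR, S1, S2 = 0.24, 0.021, 19.0

C = np.log(B / A) / (D_STAR * (B - A))  # ~0.92
f1 = 440.0
f2 = f1 * (1 + S1 * C) + S2 * C         # closed-form point of maximal dissonance
print(f2)                               # ~466 Hz, about a semitone above A4
```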

### Parncutt and Barlow dissonance

Differences of exponentials are computationally tedious because of numerical concerns at large frequency values; this suggests approximating them by something more convenient, maybe of this form:

$d_\text{simple}(f_1,f_2,v_1, v_2):=C_1(f_2-f_1)\exp\left( -C_2(f_2-f_1) \right)$

The Parncutt approximation takes this approach and additionally transforms the units into heuristically preferable psychoacoustic ones.

Cribbed from Lach Lau’s source code and thesis, where he attributes it to Parncutt and Barlow, although I can’t find any actual articles by Parncutt and/or Barlow which use this; one source implies it might be unpublished, and another gives a squared version of the same formula.

For this we take frequencies $$b_1\leq b_2$$ and volumes $$s_1, s_2$$ in, respectively, barks and sones. Then

\begin{aligned} d_\text{PB}(b_1, b_2, s_1, s_2) &:=\sqrt{(s_1 s_2)}(4 ( b_2- b_1) \exp(1 - 4 ( b_2- b_1)))\\ &= \sqrt{(s_1 s_2)}(4 ( b_2- b_1) e \exp( - 4 ( b_2- b_1))) \end{aligned}

Since this scale is relative, I’m not quite sure why we have constants everywhere. Why not simply

$d_\text{PB}'(b_1, b_2, s_1, s_2) := \sqrt{(s_1 s_2)}\frac{ b_2- b_1}{ \exp(b_2-b_1)}?$

Possibly in order to more closely approximate Sethares?
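
A sketch of this variant, with a Hz-to-bark conversion thrown in so it can be compared against the Sethares version; the Traunmüller approximation to the bark scale is my assumption here, not necessarily what DissonanceLib uses.

```python
import numpy as np

def hz_to_bark(f):
    """Traunmueller's approximation to the bark scale (an assumption here)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def d_pb(b1, b2, s1, s2):
    """Parncutt/Barlow-style roughness of two partials given in barks and sones."""
    db = abs(b2 - b1)
    return np.sqrt(s1 * s2) * 4.0 * db * np.exp(1.0 - 4.0 * db)

# The factor 4 * db * exp(1 - 4 * db) puts peak roughness at a quarter-bark
# separation, normalised so the peak value is sqrt(s1 * s2).
print(d_pb(hz_to_bark(440.0), hz_to_bark(466.0), 1.0, 1.0))  # close to the peak
```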

## Induced topologies

🏗 Nestke (2004) and Mazzola (2012). Tymoczko.

## References

Barlow, Clarence, and Henning Lohner. 1987. Computer Music Journal 11 (1): 44.
Bartlett, M. S., and J. Medhi. 1955. Biometrika 42 (1/2): 143.
Bigand, Emmanuel, and Richard Parncutt. 1999. Psychological Research 62 (4): 237–54.
Bigand, Emmanuel, Richard Parncutt, and Fred Lerdahl. 1996. Perception & Psychophysics 58 (1): 125–41.
Bigo, Louis, Jean-Louis Giavitto, and Antoine Spicher. 2011. In Proceedings of the Third International Conference on Mathematics and Computation in Music, 13–28. MCM’11. Berlin, Heidelberg: Springer-Verlag.
Bingham, Christopher, M. Godfrey, and John W. Tukey. 1967. Audio and Electroacoustics, IEEE Transactions on 15 (2): 56–66.
Bod, Rens. 2002. Journal of New Music Research 31 (1): 27–36.
Boggs, Paul T., and Janet E. Rogers. 1990. Contemporary Mathematics 112: 183–94.
Budney, Ryan, and William Sethares. 2014. Journal of Mathematics and Music 8 (1): 73–92.
Callender, Clifton, Ian Quinn, and Dmitri Tymoczko. 2008. Science (New York, N.Y.) 320 (5874): 346–48.
Cancho, Ramon Ferrer i, and Ricard V. Solé. 2003. Proceedings of the National Academy of Sciences 100 (3): 788–91.
Carlos, Wendy. 1987. Computer Music Journal 11 (1): 29–43.
Casey, M.A., R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. 2008. Proceedings of the IEEE 96 (4): 668–96.
Casey, M., C. Rhodes, and M. Slaney. 2008. IEEE Transactions on Audio, Speech, and Language Processing 16 (5): 1015–28.
Cheveigné, Alain de. 2005. In Pitch, edited by Christopher J. Plack, Richard R. Fay, Andrew J. Oxenham, and Arthur N. Popper, 169–233. Springer Handbook of Auditory Research 24. Springer New York.
Cook, Norman D., Takashi X. Fujisawa, and Hiroo Konaka. 2007. Empirical Musicology Review 2 (1).
Cooper, Joel, and Russell H. Fazio. 1984. Advances in Experimental Social Psychology 17: 229–68.
Corral, Alfonso del, Teresa León, and Vicente Liern. 2009. In Mathematics and Computation in Music, edited by Elaine Chew, Adrian Childs, and Ching-Hua Chuan, 93–103. Communications in Computer and Information Science 38. Springer Berlin Heidelberg.
Cousineau, Marion, Josh H. McDermott, and Isabelle Peretz. 2012. Proceedings of the National Academy of Sciences 109 (48): 19858–63.
Demaine, Erik D., Francisco Gomez-Martin, Henk Meijer, David Rappaport, Perouz Taslakian, Godfried T. Toussaint, Terry Winograd, and David R. Wood. 2005. In CCCG, 163–66.
———. 2009. In Computational Geometry, 42:429–54.
Demopoulos, Ryan J., and Michael J. Katchabaw. 2007. Technical Report.
Du, Pan, Warren A. Kibbe, and Simon M. Lin. 2006. Bioinformatics 22 (17): 2059–65.
Duffin, R. J. 1948. Duke Mathematical Journal 15 (3): 781–85.
Ferguson, Sean, and Richard Parncutt. 2004. In Proceedings of Sound and Music Computing.
Févotte, Cédric, Nancy Bertin, and Jean-Louis Durrieu. 2008. Neural Computation 21 (3): 793–830.
Flamary, Rémi, Cédric Févotte, Nicolas Courty, and Valentin Emiya. 2016. In arXiv:1609.09799 [Cs, Stat], 703–11. Curran Associates, Inc.
Fokker, Adriaan Daniel. 1969. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen Series b-Physical Sciences 72 (3): 153.
Gashler, Mike, and Tony Martinez. 2011. In, 2617–24. IEEE.
———. 2012. Connection Science 24 (1): 57–69.
Hall, Rachel Wells. 2008. Science 320 (5874): 328–29.
Hansen, Brian. 2014. In. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.
Haussler, David. 1999. Technical report, UC Santa Cruz.
Hawe, S., M. Kleinsteuber, and K. Diepold. 2013. IEEE Transactions on Image Processing 22 (6): 2138–50.
Hemmen, J. Leo van, and Andreas N. Vollmayr. 2013. Biological Cybernetics 107 (4): 491–94.
Hermes, Dik J. 1988. The Journal of the Acoustical Society of America 83 (1): 257–64.
Honingh, Aline, and Rens Bod. 2011. Journal of New Music Research 40 (1): 81–89.
Huron, David. 1994. Music Perception: An Interdisciplinary Journal 11 (3): 289–305.
Huron, David, and Richard Parncutt. 1993. Psychomusicology: A Journal of Research in Music Cognition 12 (2): 154–71.
Jewell, Michael O., Christophe Rhodes, and Mark d’Inverno. 2010. In ISMIR, 483–88. International Society for Music Information Retrieval.
Kameoka, Akio, and Mamoru Kuriyagawa. 1969a. The Journal of the Acoustical Society of America 45 (6): 1451–59.
———. 1969b. The Journal of the Acoustical Society of America 45 (6): 1460–69.
Krumhansl, Carol L. 2000. Psychological Bulletin 126 (1): 159.
Kuswanto, H. 2012. Jurnal Pendidikan Fisika Indonesia 8 (1).
Kuswanto, Heru. 2011. International Journal of Basic & Applied Sciences, August.
Leman, Marc. 1997. Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology. Vol. 1317. Springer.
Lerdahl, Fred. 1988. Music Perception: An Interdisciplinary Journal 5 (3): 315–49.
———. 1996. Music Perception: An Interdisciplinary Journal 13 (3): 319–63.
Levitin, Daniel J., Parag Chordia, and Vinod Menon. 2012. Proceedings of the National Academy of Sciences of the United States of America 109 (10): 3716–20.
Li, Guangming. 2006. In Proceedings of the 7th WSEAS International Conference on Acoustics & Music: Theory & Applications, 65–71. World Scientific and Engineering Academy and Society (WSEAS).
Macke, Jakob H., Philipp Berens, Alexander S. Ecker, Andreas S. Tolias, and Matthias Bethge. 2009. Neural Computation 21 (2): 397–423.
Madjiheurem, Sephora, Lizhen Qu, and Christian Walder. 2016.
Mazzola, Guerino. 2012. Journal of Mathematics and Music 6 (1): 49–60.
McFee, Brian, and Daniel PW Ellis. 2011. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Moustafa, Karim Abou-, Dale Schuurmans, and Frank Ferrie. 2013. In Journal of Machine Learning Research, 341–56.
Nicolls, F., and G. de Jager. 2001. In, 5:3165–68. IEEE.
Nordmark, Jan, and Lennart E. Fahlen. 1988. Speech Transmission Laboratory, Quarterly Progress and Status Report.
Park, C. G. Chul Gyu. 2004. Computational Statistics & Data Analysis 46 (4): 621–30.
Parncutt, Richard. 1989. Harmony: a psychoacoustical approach. Springer series in information sciences 19. Berlin ; New York: Springer-Verlag.
———. 1994. Music Perception: An Interdisciplinary Journal 11 (4): 409–64.
———. 1997. In Music, Gestalt, and Computing, edited by Marc Leman, 181–99. Lecture Notes in Computer Science 1317. Springer Berlin Heidelberg.
———. 2005. Musikpsychologie–Das Neue Handbuch.
———. 2013. Sound Musicianship: Understanding the Crafts of Music, 2.
Parncutt, Richard, and Graham Hair. 2011. J. Interdiscipl. Music Stud 5: 119–66.
Parncutt, Richard, and Hans Strasburger. 1994. Perspectives of New Music 32 (2): 88–129.
Perchy, Salim, and G. Sarria. 2009. In Proc. Of Smc2009, Porto, Portugal.
Plomp, Reinier, and Willem JM Levelt. 1965. The Journal of the Acoustical Society of America 38 (4): 548–60.
Rasch, Rudolf, and Reinier Plomp. 1999. The Psychology of Music 2: 89–112.
Reese, K., R. Yampolskiy, and A. Elmaghraby. 2012. In 2012 17th International Conference on Computer Games (CGAMES), 131–37. CGAMES ’12. Washington, DC, USA: IEEE Computer Society.
Reitboeck, H., and T. P. Brody. 1969. Information and Control 15 (2): 130–54.
Reuter, Christoph. 1997. In Music, Gestalt, and Computing, edited by Marc Leman, 362–74. Lecture Notes in Computer Science 1317. Springer Berlin Heidelberg.
Rohrmeier, Martin, Willem Zuidema, Geraint A. Wiggins, and Constance Scharff. 2015. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 370 (1664): 20140097.
Serrà, Joan, Álvaro Corral, Marián Boguñá, Martín Haro, and Josep Ll Arcos. 2012. Scientific Reports 2 (July).
Sethares, William A. 1997. The Journal of the Acoustical Society of America 102 (4): 2422–31.
———. 1998a. Tuning, Timbre, Spectrum, Scale. Springer London.
———. 1998b. Computer Music Journal 22 (1): 56.
Sethares, William A., Andrew J. Milne, Stefan Tiedje, Anthony Prechtl, and James Plamondon. 2009. Computer Music Journal 33 (2): 71–84.
Shiv, Vighnesh Leonardo. 2011. In Sound and Music Computing Conference, Proceedings.
Stevens, S. S., and J. Volkmann. 1940. The American Journal of Psychology 53 (3): 329–53.
Stolzenburg, Frieder. 2013. arXiv:1306.6458 [Cs], June.
Terhardt, Ernst. 1974. The Journal of the Acoustical Society of America 55 (5): 1061–69.
Terhardt, Ernst, Gerhard Stoll, and Manfred Seewann. 1982. The Journal of the Acoustical Society of America 71 (3): 679–88.
Thompson, William Forde, and Richard Parncutt. 1997. Music Perception, 263–80.
Tillmann, Barbara, Jamshed J. Bharucha, and Emmanuel Bigand. 2000. Psychological Review 107 (4): 885.
Toiviainen, Petri. 1997. In Music, Gestalt, and Computing, edited by Marc Leman, 335–50. Lecture Notes in Computer Science 1317. Springer Berlin Heidelberg.
Toussaint, Godfried. 2005. In Pattern Recognition and Data Mining, edited by Sameer Singh, Maneesha Singh, Chid Apte, and Petra Perner, 3686:18–27. Berlin, Heidelberg: Springer Berlin Heidelberg.
Toussaint, Godfried T. 2004. In ISMIR.
Tymoczko, Dmitri. 2006. Science 313 (5783): 72–74.
———. 2009a. In Mathematics and Computation in Music, edited by Elaine Chew, Adrian Childs, and Ching-Hua Chuan, 258–72. Communications in Computer and Information Science 38. Springer Berlin Heidelberg.
———. 2009b. Journal of Music Theory 53 (2): 227–54.
———. 2011a. Mathematics and Computation in Music, Springer, 297–310.
———. 2011b. A Geometry of Music: Harmony and Counterpoint in the Extended Common Practice. 1 edition. New York: Oxford University Press.
———. 2012. Journal of Music Theory 56 (1): 1–52.
———. 2013. Journal of Mathematics and Music 7 (2): 127–44.
Vassilakis, Pantelis N., and Roger A. Kendall. 2010. In IS&T/SPIE Electronic Imaging, 75270O–. International Society for Optics and Photonics.
Wagh, M.D. 1976. India, IEE-IERE Proceedings 14 (5): 185–91.
Xin, Jack, and Yingyong Qi. 2006. arXiv:math/0603174, March.
Zanette, Damián H. 2006. Musicae Scientiae 10 (1): 3–18.
Zhao, Zhizhen, and Amit Singer. 2013. Journal of the Optical Society of America A 30 (5): 871.
