Rummaging in string bags

Models for language generation

2015-04-25 — 2016-07-13

music
search
semantics
stringology

This needs a better title.

Bags of words, edit distance (as seen in bioinformatics), Hamming distances, cunning kernels, and vector spaces over documents. Vector spaces induced by document structure. Metrics based on generation by finite state machines, *-omics, transformers. Maybe co-occurrence metrics would also be useful as musical metrics? Inference complexity.
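Two of the items above are concrete enough to sketch. A minimal stdlib-only illustration (function names are my own, not from any particular library): a bag-of-words representation with cosine similarity over the induced vector space, and Levenshtein edit distance of the kind used in bioinformatics sequence comparison.

```python
import math
from collections import Counter

def bag_of_words(text):
    # A document as a multiset of tokens, i.e. a sparse vector of counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two count vectors (Counters).
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def levenshtein(s, t):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

d1 = bag_of_words("the cat sat on the mat")
d2 = bag_of_words("the cat sat on the hat")
print(cosine(d1, d2))
print(levenshtein("kitten", "sitting"))  # 3
```

The same cosine machinery applies unchanged to any feature counts, e.g. n-gram or co-occurrence counts, which is one way the "musical metrics" question could be made concrete.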

TBC.

Figure 1

1 Tensor decomposition

At a seminar about () I learned about an interesting formalism that connects language learning to tensor decomposition, via Anandkumar et al. (2014), Huang and Anandkumar (2016), and A Mathematical Framework for Transformer Circuits.
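The core trick in the Anandkumar et al. line of work is that the latent components of a model show up as the rank-1 factors of a (symmetric) moment tensor, which tensor power iteration can recover. A toy stdlib-only sketch of that step, on a rank-1 tensor built by hand (all names here are mine): for $T = v \otimes v \otimes v$, the contraction $T(I, u, u) = (v \cdot u)^2\, v$, so one normalized step already lands on $v$.

```python
import math

def outer3(v):
    # Symmetric rank-1 3-tensor v (x) v (x) v as nested lists.
    n = len(v)
    return [[[v[i] * v[j] * v[k] for k in range(n)]
             for j in range(n)] for i in range(n)]

def contract(T, u):
    # T(I, u, u)_i = sum_{j,k} T[i][j][k] u_j u_k
    n = len(u)
    return [sum(T[i][j][k] * u[j] * u[k]
                for j in range(n) for k in range(n)) for i in range(n)]

def normalize(x):
    nrm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / nrm for xi in x]

v = normalize([3.0, 1.0, 2.0])   # hidden component to recover
T = outer3(v)

u = normalize([1.0, 0.5, 0.2])   # any start not orthogonal to v
for _ in range(10):
    u = normalize(contract(T, u))  # tensor power iteration

print(u)  # recovers v (up to sign)
```

With several components, $T = \sum_k \lambda_k v_k^{\otimes 3}$, the same iteration converges to one component at a time, and deflation peels the rest off; that is the decomposition the latent-variable learning results lean on.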

2 References

Anandkumar, Ge, Hsu, et al. 2014. “Tensor Decompositions for Learning Latent Variable Models.” The Journal of Machine Learning Research.
Blei, Ng, and Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research.
Chen, and Murfet. 2025. “Modes of Sequence Models and Learning Coefficients.”
Huang, and Anandkumar. 2016. “Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition.”
Olsson, Elhage, Nanda, et al. 2022. “In-Context Learning and Induction Heads.”
Selinger. 2009. “A Survey of Graphical Languages for Monoidal Categories.” In New Structures for Physics. Lecture Notes in Physics.