Rummaging in string bags
Models for language generation
2015-04-25 — 2016-07-13
Wherein bags of words and string metrics are surveyed, and language learning is connected to tensor decomposition and finite‑state generation, with transformer and Hamming‑distance angles being considered.
This needs a better title.
Bags of words, edit distance (as seen in bioinformatics), Hamming distances, cunning kernels, and vector spaces over documents. Vector spaces induced by document structures. Metrics based on generation by finite state machines, *-omics, transformers. Maybe co-occurrence metrics would also be useful as musical metrics? Inference complexity.
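As a concrete reference point, two of the string metrics above are easy to sketch in plain Python (a minimal illustration, not a production implementation):

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance: count substitutions between equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, substitutions, via dynamic
    programming over a rolling row of the alignment table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

Hamming distance only applies to strings of the same length, which is one reason edit distance is the workhorse in bioinformatics, where sequences insert and delete freely.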
TBC.
1 Tensor decomposition
At a seminar about (Chen and Murfet 2025) I learned about an interesting formalism that connects language learning to tensor decomposition, via Anandkumar et al. (2014), Huang and Anandkumar (2016), and A Mathematical Framework for Transformer Circuits.
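A toy version of that connection can be sketched numerically. Anandkumar et al.'s method-of-moments approach reduces learning certain latent-variable models to recovering the components of an orthogonally decomposable third-order moment tensor, which tensor power iteration with deflation can do. A minimal numpy sketch, using a synthetic tensor rather than an actual language moment tensor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an orthogonally decomposable symmetric 3-tensor
# T = sum_i lambda_i * v_i (x) v_i (x) v_i  -- the moment-tensor form
# appearing in method-of-moments estimators for latent-variable models.
d, k = 5, 3
lam = np.array([3.0, 2.0, 1.0])
V, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns v_i
T = np.einsum('i,ai,bi,ci->abc', lam, V, V, V)

def tensor_power_iteration(T, n_iter=100):
    """Recover one robust eigenpair (lambda_i, v_i) of a symmetric tensor T
    by iterating u <- T(I, u, u) / ||T(I, u, u)||."""
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum('abc,b,c->a', T, u, u)  # the contraction T(I, u, u)
        u /= np.linalg.norm(u)
    return np.einsum('abc,a,b,c->', T, u, u, u), u  # (eigenvalue, eigenvector)

# Deflation: subtract each recovered rank-one component and repeat.
est = []
Tres = T.copy()
for _ in range(k):
    l, u = tensor_power_iteration(Tres)
    est.append((l, u))
    Tres = Tres - l * np.einsum('a,b,c->abc', u, u, u)
```

The recovered eigenvalues match the planted weights lambda_i; unlike the matrix case, the rank-one components of an orthogonally decomposable tensor are identifiable, which is what makes the tensor route to learning latent structure attractive.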