stringology on Dan MacKinlay
https://danmackinlay.name/tags/stringology.html
Recent content in stringology on Dan MacKinlay
Tue, 11 May 2021 07:53:06 +1000

Computational symbolic mathematics
https://danmackinlay.name/notebook/computational_symbolic_maths.html
Tue, 11 May 2021 07:53:06 +1000
Contents: How do computational symbolic mathematics work Tools Sympy Maxima Singular Magma PARI/GP Javascript References
A.k.a. computer algebra systems, which is correct but not descriptive enough to be helpful. For now I am simply noting down systems that solve particular problems for me without pretence to general usefulness.
How do computational symbolic mathematics work
Long story of which I understand only tiny fragments.

Arpeggiate by numbers
https://danmackinlay.name/notebook/arpeggiate_by_numbers.html
Thu, 22 Apr 2021 09:26:33 +0800
Contents: Sonification Geometric approaches Neural approaches Composition assistants Nestup J74 Helio Orca Hookpad Odesi Intermorphic Rozeta Rapid compose Synfire Harmony Builder Roll your own Arpeggiators Constraint Composition Random ideas References
Where my audio software frameworks page does more DSP, this is mostly about MIDI: choosing notes, not timbres. A cousin of generative art with machine learning, with less AI and more UX.
Sometimes you don’t want to measure a chord, or hear a chord, you just want to write a chord.

Mathematica
https://danmackinlay.name/notebook/mathematica.html
Tue, 06 Apr 2021 17:00:07 +1000
Contents: Basics Pros Cons Gotchas Substitution Pipelines Comments Typing symbols Non-commutative algebra Scope Evaluation Function wrangling and differential equations Building libraries/packages Links
A computer symbolic algebra system.
Basics
I’m all about open-source tools, as a rule. Mathematica is not that. But the fact remains that the best table of integrals that exists is Mathematica, that emergent epiphenomenon of the cellular automaton that implements Stephen Wolfram’s mind.

Voice transcriptions and speech recognition
https://danmackinlay.name/notebook/speech_transcription.html
Sat, 30 Jan 2021 11:10:49 +1100
Contents: Dictation Transcribing recordings Automation
The converse to voice fakes: generating text from speech, a.k.a. speech-to-text.
This is an older practice than I thought. Check out Volume 89 of Popular Science Monthly: Lloyd Darling, The Marvelous Voice Typewriter, for the state-of-the-art dictation machine of 1916 (PDF version).
Dictation
Speaking as a realtime interactive textual input method. See the following roundups of dictation apps to start:
Zapier dictation roundup
the rather grimmer Linux-specific roundup.

Bandit problems
https://danmackinlay.name/notebook/bandit_problems.html
Fri, 16 Oct 2020 07:49:23 +1100
Contents: Pseudopolitical diversion Intros Theory Practice Bandits-meet-optimisation Bandits-meet-evolution Details Delayed/sparse reward Multi-world testing Extensions Deep reinforcement learning Markov decision problems POMDP Practicalities Sequential surrogate interactive model optimisation Bandits with theory of mind References
Bandit problems, Markov decision processes, a smattering of dynamic programming, game theory, optimal control, and online learning of the solutions to such problems, esp. reinforcement learning.
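As a concrete instance of a bandit problem, here is a minimal sketch of the epsilon-greedy strategy on Bernoulli arms. The arm means, step count and value of epsilon are illustrative assumptions, not taken from the notebook, and epsilon-greedy is only one of many possible strategies.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play Bernoulli arms with epsilon-greedy; return (pull counts, estimated means)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Hypothetical environment: reward 1 with the arm's true probability.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates])
```

The learner never sees `true_means` directly; it only sees rewards for the arms it chose, which is what separates this setting from prediction on complete data.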
Learning, where you must learn an optimal action in response to your stimulus, possibly an optimal “policy” of trying different actions over time, not just an MMSE-minimal prediction from complete data.

Grammatical inference
https://danmackinlay.name/notebook/grammatical_inference.html
Tue, 13 Oct 2020 10:08:51 +1100
Contents: References
Mathematically speaking, inferring the “formal language” which can describe a set of expressions. In the slightly looser sense used by linguists studying natural human language, discovering the syntactic rules of a given language, which is kinda the same thing but with every term sloppier, and the subject matter itself messier.
This is already a crazily complex area, and being naturally perverse, I am interested in an especially esoteric corner of it, to wit, grammars of things that aren’t speech; inferring design grammars, say, could allow you to produce more things off the same “basic plan” from some examples of the thing; look at enough trees and you know how to build the rest of the forest, that kind of thing.

Natural language processing
https://danmackinlay.name/notebook/nlp.html
Thu, 01 Oct 2020 07:51:38 +1000
Contents: What is NLP? Software HuggingFace SpaCy Stanza Blingfire pytorch.text pytext NLTK NLP4J Misc other References
Computational language translation, parsing, search, generation and understanding.
A mare’s nest of intersecting computational, philosophical and mathematical challenges (e.g. semantics, grammatical inference, learning theory) that humans seem to be able to handle subconsciously and which we therefore hope to train machines on. Moreover, it is a problem of great commercial benefit, so it is likely we can muster the resources to tackle it.

Text data processing
https://danmackinlay.name/notebook/text_data_processing.html
Tue, 22 Sep 2020 08:05:14 +1000
Contents: General Munging jq yq PowerShell Nushell pxi d2d fx awk tab Searching
Getting data in a text-like format gets you a whole world of weird tools to manage and process it.
General
Data Cleaner’s cookbook explicates dataframe processing by laundering through CSV/TSV and using command-line fu. Fz mentions various tools including CSV munger xsv.
Munging
Here are some popular tools.
jq
jq allows one to parse JSON instead of TSV.

Applied string mangling
https://danmackinlay.name/notebook/string_mangling.html
Mon, 14 Sep 2020 17:12:39 +1000
Contents: Regexp Parsers
A.k.a. Un-natural language processing.
Regexp
Image used under CC licence from Marijn Haverbeke’s Eloquent Javascript.
A.k.a. regexes. A.k.a. “regular expressions”, a name that presumably reflects a principled origin in the theory of syntax. However, regexes as commonly encountered encode a particular way of specifying a language, rather than some arbitrary class of regular languages.
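One way to see the gap between practical regexes and textbook regular languages is the backreference. The sketch below uses Python's standard `re` module; the patterns and test strings are illustrative choices, not examples from the notebook.

```python
import re

# A textbook regular language: one or more a's followed by one or more b's.
regular = re.compile(r"a+b+")

# A backreference (\1) requires the same text to appear twice.
# "Some word, a space, then the same word again" is not a regular language,
# so this pattern goes beyond what the formal theory calls regular.
repeated = re.compile(r"(\w+) \1")

print(bool(regular.fullmatch("aaabb")))         # True
print(bool(repeated.fullmatch("hello hello")))  # True
print(bool(repeated.fullmatch("hello world")))  # False
```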
The default flavour of string matching, available in a variety of flavours, all equally boring.

Minimum description length
https://danmackinlay.name/notebook/minimum_description_length.html
Thu, 06 Aug 2020 09:31:37 +1000
Contents: References
A formalisation of Occam’s razor of some kind. I see it invoked in Bayes model selection.
References
Arora, Sanjeev, and Yi Zhang. 2021. “Rip van Winkle’s Razor: A Simple Estimate of Overfit to Test Data.” February 25, 2021. http://arxiv.org/abs/2102.13189.
Barron, A. R., and T. M. Cover. 1991. “Minimum Complexity Density Estimation.” IEEE Transactions on Information Theory 37 (4): 1034–54. https://doi.org/10.1109/18.86996.
Barron, A.

Combinatorics of note
https://danmackinlay.name/notebook/combinatorics.html
Sat, 18 Jul 2020 12:25:14 +1000
Algorithmic complexity and quasi Monte Carlo both consider combinatorial matters too.
Jörg Arndt’s Matters Computational

Knowledge geometry
https://danmackinlay.name/notebook/knowledge_topology.html
Fri, 22 May 2020 10:46:01 +1000
Contents: What is the shape of collected human knowledge? To Investigate, possibly related Topic modelling in text databases Artificial chemistry Related links References
See also:
Innovation
Is a material basis for technology plus a knowledge topology equal to a model of technology? I suspect not; surely there are emergent effects. But there must be a relationship.
Spaces of strings
String dynamics
Related question: What is the shape of the vocabulary of communicating people?

MAPLE
https://danmackinlay.name/notebook/maple.html
Tue, 19 May 2020 08:00:51 +1000
The other major computer symbolic algebra system (apart from Mathematica), which seems to have not quite as much traction because of… not having a messianic CEO? Having awful branding? It seems to be OK now that I look at it. In particular, it does what I expect regarding transforms of random variables.
Since everyone seems to know Mathematica, I guess I should describe it in terms of that?

Statistical relational learning
https://danmackinlay.name/notebook/statistical_relational_learning.html
Mon, 27 Apr 2020 21:45:09 +1000
Contents: References
Placeholder.
I cannot help but notice that the discussions of changing probabilistic domain, and unusual assumptions about exchangeability, are reminiscent of inference on social graphs. Connections?
See the big book.
References
Braz, Rodrigo de Salvo, Eyal Amir, and Dan Roth. 2008. “A Survey of First-Order Probabilistic Models.” In Innovations in Bayesian Networks, edited by Dawn E. Holmes and Lakhmi C. Jain, 156:289–317. Studies in Computational Intelligence.

Diff/merge tools
https://danmackinlay.name/notebook/diffing.html
Mon, 09 Mar 2020 14:18:27 +1100
Contents: Diff/merge GUIs Recursive diffs
Tools to compare and harmonise folders/files.
Diff/merge GUIs
Handy as a complement to, e.g. git.
Meld is an open source GUI merge tool. Free, cross-platform for Linux/Windows. A Mac fork exists.
Diffmerge is a classic cross-platform nagware merge tool. USD19 for a licence. Can be set up as a git merge tool.
kdiff3 is a long-well-regarded GUI, but it is somewhat hard to find in the nature of esoteric coder tools.

*-omics
https://danmackinlay.name/notebook/star_omics.html
Wed, 22 Jan 2020 19:00:29 +1100
Contents: References
I do not truly understand the Roche biochemical pathways poster.
Proteomics, genomics, phenomics, connectomics. On the understanding and inference of networks of control in living systems using statistics. Generates lots of interesting problems at the nexus of various other statistical problems, like model selection, false discovery rates, causal graphs and so on.
Of course, there is a deep learning angle.
Is nilearn any good?

Esoteric language zoo
https://danmackinlay.name/notebook/esolang.html
Fri, 27 Dec 2019 22:48:10 +1100
Contents: Iverson and Whitney Pure Brainfuck INTERCAL
If you want to find more about the weird ends of this hobby, see retrocomputing or the esolang wiki.
The page exists mostly because I don’t think about these things often enough to remember their names but occasionally need to know them as punchlines.
Iverson and Whitney
Arthur Whitney is, I think, the creator of the odd, ill-explained k and b languages, which make impressive claims about performance and unimpressive claims about community, support and longevity.

Algorithmic statistics
https://danmackinlay.name/notebook/algorithmic_statistics.html
Tue, 15 Jan 2019 10:58:55 +1100
Contents: Information-based complexity theory References
The intersection between probability, ignorance, and algorithms, butting up against computational complexity, coding theory, dynamical systems, ergodic theory, minimum description length and probability. When is the relation between things sufficiently unstructured that we may treat them as random? Stochastic approximations to deterministic algorithms. Kolmogorov complexity. Compressibility, Shannon information. Sideswipe at deterministic chaos. Chaotic systems treated as if stochastic. (Are “real” systems not precisely that?

Models of computation
https://danmackinlay.name/notebook/models_of_computation.html
Sun, 18 Jun 2017 08:51:45 +0800
Contents: Everything is Turing-complete Weird stuff Rewriting Systems
Everything is Turing-complete
Surprisingly Turing Complete
Many configuration or special-purpose languages or tools or complicated games turn out to violate the Rule of least power & be “accidentally Turing-complete”, like MediaWiki templates, sed or repeated regexp/find-replace commands in an editor (any form of string substitution or templating or compile-time computation is highly likely to be TC on its own or when iterated since they often turn out to support a lambda calculus or a term-rewriting language or tag system eg esolangs “///” or Thue), XSLT, Infinite Minesweeper, Dwarf Fortress, Starcraft, Minecraft, Ant, Transport Tycoon, C++ templates & Java generics, DNA computing etc are TC but these are not surprising … On the other hand, the vein of computer security research called “weird machines” is a fertile ground of “that’s TC?

Granger causation/Transfer Entropy
https://danmackinlay.name/notebook/transfer_entropy.html
Thu, 04 May 2017 00:19:48 +1000
Contents: Why do we care about this model of causation? Estimating from data References
Transfer entropy is one way of learning the arrow of time.
tl;dr I’m not currently using Transfer Entropy, so I should not be taken as an expert. But I have dumped some notes here from an email I was writing to a physicist, explaining why I don’t think it is, in general, a meaningful thing to estimate from data “non-parametrically”.

State space reconstruction
https://danmackinlay.name/notebook/state_space_reconstruction.html
Tue, 02 Aug 2016 11:51:55 +1000
Contents: Some stuff I saw that’s maybe related Stuff that I might actually use References
Disclaimer: I know next to nothing about this.
But I think it’s something like: looking at the data from a possibly stochastic dynamical system, and hoping to infer cool things about the kinds of hidden states it has, in some general sense, such as some measure of statistical or computational complexity, or how complicated or “large” the underlying state space is in some convenient representation.

Rummaging in string bags
https://danmackinlay.name/notebook/string_bags.html
Wed, 13 Jul 2016 13:50:58 +1000
Contents: References
Bags of words, edit distance (as seen in bioinformatics), Hamming distances, cunning kernels and vector spaces over documents. Vector spaces induced by document structures. Metrics based on generation by finite state machines, *-omics. Maybe co-occurrence metrics would also be useful as musical metrics? Inference complexity.
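For orientation, here is a minimal sketch of two of the string distances named above, written in plain Python. The Levenshtein variant shown uses unit costs for insertion, deletion and substitution, which is an assumption; bioinformatics applications often use weighted costs instead.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Unit-cost edit distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


print(hamming("GATTACA", "GACTATA"))     # 2
print(levenshtein("kitten", "sitting"))  # 3
```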
TBC.
References
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022.

Text processing
https://danmackinlay.name/notebook/information_retrieval.html
Wed, 13 Jul 2016 13:50:58 +1000
Contents: Software References
Information retrieval via string metrics. Speech tagging. Vector spaces induced by document structures, such as cosine similarity and word2vec-style embeddings.
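A minimal sketch of the vector-space idea: treat each document as a bag-of-words count vector and compare documents by cosine similarity. Whitespace tokenisation and raw counts are simplifying assumptions here; real systems typically add weighting such as TF-IDF or use learned embeddings.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case, split on whitespace, and count the tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("the cat sat")
doc2 = bag_of_words("the cat ran")
print(cosine_similarity(doc1, doc2))  # two of three words shared
```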
Metrics based on generation by finite state machines. Maybe co-occurrence metrics would also be useful as musical metrics? Inference complexity.
If I were to actually write this entry, it would be a big research project.
Software
Luke
“Lucene is an Open Source, mature and high-performance Java search engine.

Syntax
https://danmackinlay.name/notebook/syntax.html
Wed, 22 Jun 2016 09:43:15 +1000
Contents: References
What’s so special about speech anyway?
Sam Kriss calls the spamularity the language of god. See also Feral, Thomas Urquhart, natural language processing.
“They’re using phrase-structure grammar, long-distance dependencies. FLN recursion, at least four levels deep and I see no reason why it won’t go deeper with continued contact. […] It doesn’t have a clue what I’m saying.”
“What?”
“It doesn’t even have a clue what it’s saying back,” she added.

Stream processing and reactive programming
https://danmackinlay.name/notebook/stream_processing.html
Wed, 01 Jul 2015 13:42:25 +0200
Contents: CSP/ FRP/ reactive programming Javascript Python Streaming data analysis To read References
Lazy bookmark for practical details of processing and transforming possibly-infinite streams of data, from signals to parse trees. Disambiguating “transducers”.
Used in parallel/offline processing of large data sets that do not fit in core, or processing things that happen in realtime such as UI.
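A minimal sketch of lazy processing of a possibly-infinite stream, using plain Python generators. This is one simple pull-based style and does not represent any particular reactive framework; the pipeline stages are illustrative.

```python
import itertools


def running_mean(stream):
    """Yield the mean of everything seen so far, one value per input item."""
    total = 0.0
    for count, x in enumerate(stream, start=1):
        total += x
        yield total / count


# An infinite source: 0, 1, 2, ... Nothing is computed until it is consumed.
naturals = itertools.count()
squares = (n * n for n in naturals)
even_squares = (x for x in squares if x % 2 == 0)

# Only the items actually requested are ever produced.
first_four = list(itertools.islice(running_mean(even_squares), 4))
print(first_four)
```

Each stage holds only constant state, so the same pipeline would work on data too large to fit in memory.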
I am imagining more general objects than singly-indexed real-valued signals; tokens, maybe.

Artificial chemistry
https://danmackinlay.name/notebook/artificial_chemistry.html
Sun, 31 May 2015 15:31:30 +0200
Contents: P-systems and membrane computing the Broadcast language
Systems of interacting particles with string representations. Distributed or agent-based models for stringology. This might remind us of evolution, or chemistry, or computational learning agents or whatever. Is there a name for this family of systems?
These are popular as models for… Understanding what kind of computing nature might be doing? Or as a source of biomimetic algorithms.

Computational mechanics
https://danmackinlay.name/notebook/computational_mechanics.html
Fri, 02 Jan 2015 20:33:58 +0100
Contents: To read To understand
To read
Decisional states
“This article introduces both a new algorithm for reconstructing epsilon-machines from data, as well as the decisional states. These are defined as the internal states of a system that lead to the same decision, based on a user-provided utility or pay-off function.”
CRS’s CSSR
To understand
Are there actual applications of this to actual physics, or is this keyword purely the mule offspring of physics and computer science?

Algebra I would like to learn
https://danmackinlay.name/notebook/group_theory.html
Sat, 22 Nov 2014 10:26:38 +0100
Contents: Stringology Probabilistic Cycles in a random permutation Large prime factors of a random number Connection to symmetries
Stringology
Long story. Group theory for languages and automata.
Properties of the free group. (because of the stringology thing)
Cayley Graphs. (because of the stringology thing)
Probabilistic
Probabilistic methods in algebra (as opposed to algebraic methods in probability).
Cycles in a random permutation
The number of cycles in a random permutation

Design grammars
https://danmackinlay.name/notebook/design_grammars.html
Mon, 30 May 2011 04:36:58 +0000
See also grammatical inference, syntax.
In computer graphics these are also called “procedural design” (that being slightly more general), or L-systems.
Prusinkiewicz and Lindenmayer (the L in “L-systems”) had success describing plants and seashells and other CGI-friendly lifeforms as grammars. Lerdahl and Jackendoff applied these ideas to music. Look around for applications to primatology, genetic programming, gene expression, dynamical systems, Barnsley et al and their fractal image compression…
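To make the idea of a grammar that grows things concrete, here is a minimal sketch of a deterministic, context-free L-system: every symbol is rewritten in parallel at each generation. The rules shown are Lindenmayer's classic algae example; the function name and interface are illustrative.

```python
def l_system(axiom, rules, generations):
    """Rewrite every symbol in parallel, `generations` times.

    Symbols without a rule are copied unchanged.
    """
    state = axiom
    for _ in range(generations):
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


# Lindenmayer's algae system: A -> AB, B -> A.
algae = {"A": "AB", "B": "A"}

for n in range(5):
    print(n, l_system("A", algae, n))
# 0 A
# 1 AB
# 2 ABA
# 3 ABAAB
# 4 ABAABABA
```

The string lengths follow the Fibonacci numbers, a small example of simple rewriting rules producing structured growth. Rendering plants would additionally need a drawing interpretation of the symbols, which is omitted here.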