Applied string mangling

Regexes, parsing, tokenising etc

A.k.a. Un-natural language processing.


(Image used under CC licence from Marijn Haverbeke’s Eloquent JavaScript.)

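A.k.a. regexes. A.k.a. “regular expressions”, a name that comes from their principled origin in formal language theory, where a regular expression denotes a regular language. However, regexes as commonly encountered are a particular pattern-matching syntax rather than a faithful rendering of that theory; features such as backreferences let them match some languages that are not regular at all.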

The default tool for string matching, available in a variety of flavours, all equally boring.
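To make the gap between regexes-in-practice and regular languages concrete, here is a minimal sketch using Python's standard `re` module. The doubled-word task and the function name `doubled_words` are my own illustrative choices, not something from any of the tools mentioned here. The backreference `\1` has to repeat whatever the first group captured, which in general is beyond what a finite automaton can check.

```python
import re

# \1 must repeat exactly what group 1 captured. Matching "some word,
# then the same word again" for arbitrary words is not a regular language.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b")


def doubled_words(text):
    """Return each word that appears twice in a row in text."""
    return [m.group(1) for m in DOUBLED.finditer(text)]


if __name__ == "__main__":
    print(doubled_words("Paris in the the spring is is nice"))
```

Engines that guarantee linear-time matching tend to leave backreferences out, which is one reason the flavours differ.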

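Catastrophic regex: a pattern with nested quantifiers can make a backtracking matcher take time exponential in the length of the input.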
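A small sketch of how this goes wrong, assuming a backtracking engine such as Python's standard `re` module. The pattern names `EVIL` and `SAFE` and the helper `seconds_to_fail` are hypothetical, chosen for illustration. Both patterns accept exactly the same strings, but the nested one can split a run of "a"s in very many ways, and a backtracking engine will typically try them all before giving up. The timings are machine-dependent, so none are quoted here.

```python
import re
import time

# Nested quantifiers: the inner a+ and the outer (...)+ can divide a run
# of "a"s between them in exponentially many ways.
EVIL = re.compile(r"^(a+)+$")
# Accepts the same strings, but there is only one way to match a run.
SAFE = re.compile(r"^a+$")


def seconds_to_fail(pattern, n):
    """Time one failed match against n copies of 'a' followed by 'b'."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "b" rules out any match
    return elapsed


if __name__ == "__main__":
    # Keep n modest: each extra "a" roughly doubles the work for EVIL.
    for n in range(10, 23, 2):
        print(n, f"{seconds_to_fail(EVIL, n):.4f}", f"{seconds_to_fail(SAFE, n):.6f}")
```

The usual remedies are to rewrite the pattern so that there is only one way to match any given run, or to use an engine that does not backtrack.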

Because these are so ubiquitous, and useful, and boring, there are a million bikeshedded tools for interactive regex design.

  • ihateregex visualises regexes and lets you design them interactively.
  • regexper visualises regexes beautifully.
  • extendaclass tests and visualises regexes in PHP, Python and JavaScript flavours.
  • Rubular is a Ruby-based regular expression editor.
  • regex101 is similar.
  • regexr is much the same.

Comby is a parsing and search-and-replace tool designed for code.


The ad hoc world of regexes not cutting it? Why not generate a parser? Since every computer language out there does this, there are a lot of options. Since regexes can already handle regular languages, you are probably looking for a parser for deterministic context-free languages. I do not have much to say here, except maybe check the Wikipedia list? Or why not use David Beazley’s SLY? That looks like a nice parser.
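For a feel of what a parser adds on top of regexes, here is a hand-written sketch that uses only the Python standard library. It is not SLY's API and not generated by any tool; the grammar, the token names and the `Parser` class are all my own assumptions for illustration. A regex does the tokenising, and a small recursive-descent parser handles the nested parentheses that a regular language cannot express.

```python
import re

# One alternation of named groups; the group that matched names the token kind.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+(?:\.\d+)?)|(?P<OP>[-+*/()])|(?P<WS>\s+)")


def tokenise(text):
    """Split text into (kind, value) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent over the grammar:
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '-' factor | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, text = self.take()
        if kind == "NUMBER":
            return float(text)
        if (kind, text) == ("OP", "-"):
            return -self.factor()
        if (kind, text) == ("OP", "("):
            value = self.expr()  # recursion is what handles nesting
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(text):
    """Tokenise and evaluate an arithmetic expression."""
    parser = Parser(tokenise(text))
    value = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {parser.peek()[1]!r}")
    return value


if __name__ == "__main__":
    print(evaluate("2 + 3 * (4 - 1)"))
```

A parser generator does essentially this bookkeeping for you from a grammar description, which starts to pay off once the grammar outgrows a handful of rules.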