A.k.a. Un-natural language processing.
A.k.a. regexes. A.k.a. “regular expressions”, a name inherited from their principled origin in formal language theory, where a regular expression describes a regular language. However, regexes as commonly encountered are one particular practical notation for specifying a language, rather than a general treatment of the class of regular languages.
The default approach to string matching, available in a variety of flavours, all equally boring.
Because these are so ubiquitous, and useful, and boring, there are a million bikeshedded tools for interactive regex design.
- ihateregex visualises regexes and lets you design them interactively.
- regexper visualizes regexes beautifully.
- Rubular is a Ruby-based regular expression editor.
- regex101 is similar.
- regexr is similar too.
Comby is a parsing/search-and-replace tool designed for code.
r'\b(\w+)\s+\1\b' # duplicate words (essential for this blog)
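For the record, here is a minimal sketch of how that duplicate-word pattern might be put to work with Python's standard `re` module. The function names `find_duplicate_words` and `collapse_duplicate_words` are made up for illustration, and the `re.IGNORECASE` flag is my own assumption so that "The the" is caught too.

```python
import re

# The duplicate-word pattern from above. IGNORECASE is an added assumption,
# so that a capitalised repeat such as "The the" also matches.
DUPLICATE_WORD = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)


def find_duplicate_words(text):
    """Return (word, start_offset) for each immediately repeated word."""
    return [(m.group(1), m.start()) for m in DUPLICATE_WORD.finditer(text)]


def collapse_duplicate_words(text):
    """Replace each repeated pair with the first copy of the word."""
    return DUPLICATE_WORD.sub(r'\1', text)


if __name__ == "__main__":
    sample = "This is is a test of the the pattern."
    print(find_duplicate_words(sample))
    print(collapse_duplicate_words(sample))
```

Note that the `\1` backreference is exactly the kind of extension that takes practical regexes beyond what a textbook regular expression can describe.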
The ad hoc world of regexes not cutting it? Why not generate a parser? Since every computer language out there does this, there are a lot of options. Since regexes can already parse regular languages, you are probably looking for deterministic context-free language parsers. I do not have much to say here, except maybe check the Wikipedia list? Why not use David Beazley’s SLY? That looks like a nice parser.
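To make the regex-versus-parser distinction concrete, here is a rough sketch of the sort of nested structure that wants a parser: arithmetic with arbitrarily deep parentheses. To be clear about what this is not: it is a hand-written recursive-descent evaluator, not a generated parser, and it does not use SLY or imitate its API. A generator would build something equivalent from a grammar specification. All names here (`tokenize`, `Parser`, `evaluate`) are hypothetical, and the toy grammar only handles non-negative integers.

```python
import re

# A regex is still handy for the lexer: one integer or one operator per token.
TOKEN = re.compile(r'\s*(\d+|[-+*/()])')


def tokenize(text):
    """Split the input into integer and operator tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Toy grammar (an assumption for this sketch):
         expr   := term (('+' | '-') term)*
         term   := factor (('*' | '/') factor)*
         factor := INTEGER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ('+', '-'):
            if self.take() == '+':
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ('*', '/'):
            if self.take() == '*':
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        tok = self.take()
        if tok == '(':
            # The recursion here is what handles nesting of any depth.
            value = self.expr()
            if self.take() != ')':
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

The recursive call in `factor` is the part a pure regular expression cannot express: matching an opening parenthesis to its partner at arbitrary depth requires a stack, which the Python call stack supplies here.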