Voice transcriptions and speech recognition

The converse of voice fakes: generating text from speech, a.k.a. speech-to-text.

This is an older practice than I thought. Check out Volume 89 of Popular Science Monthly: Lloyd Darling, The Marvelous Voice Typewriter, for the state-of-the-art dictation machine of 1916 (PDF version).

Phonetic transcription

It has been a long time since I took Phil Rose’s extravagantly weird undergraduate phonetics class, and I have forgotten much. Here is a cheating tool:


Speaking as a realtime interactive textual input method. See the following roundups of dictation apps to start:

Here are some options, culled from those lists and elsewhere, of vague relevance to me:

Transcribing recordings

Handy if you have a recording and you want to make it into a text thing.


Dragonfly is a speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software. It was written to make it very easy for Python macros, scripts, and applications to interface with speech recognition engines. Its design allows speech commands and grammar objects to be treated as first-class Python objects. Dragonfly can be used for general programming by voice. It is flexible enough to allow programming in any language, not just Python. It can also be used for speech-enabling applications, automating computer activities and dictating prose.

Dragonfly contains its own powerful framework for defining and executing actions. It includes actions for text input and key-stroke simulation. This framework is cross-platform, working on Windows, macOS and Linux (X11 only). See the actions sub-package documentation for more information, including code examples.

This project is a fork of the original t4ngo/dragonfly project.

Dragonfly currently supports the following speech recognition engines:

  • Dragon, a product of Nuance. All versions up to 15 (the latest) should be supported. Home, Professional Individual and previous similar editions of Dragon are supported. Other editions may work too
  • Windows Speech Recognition (WSR), included with Microsoft Windows Vista, Windows 7+, and freely available for Windows XP
  • Kaldi (under development)
  • CMU Pocket Sphinx (with caveats)
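
To make the "speech commands as first-class Python objects" idea concrete for myself, here is a minimal standard-library sketch. It is emphatically not Dragonfly's API: `Command`, `ToyGrammar`, `type_text` and `press_key` are hypothetical names I made up, the "recognizer" is just a list of strings, and "typing" means appending to a buffer rather than simulating real keystrokes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An action receives the output buffer and "types" into it.
Action = Callable[[List[str]], None]


@dataclass(frozen=True)
class Command:
    """A spoken phrase paired with the action it triggers (hypothetical)."""
    phrase: str
    action: Action


class ToyGrammar:
    """Hypothetical stand-in for a grammar: a lookup table of Command objects."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def add(self, command: Command) -> None:
        # Phrases are stored lowercased so matching is case-insensitive.
        self._commands[command.phrase.lower()] = command

    def dispatch(self, utterance: str, buffer: List[str]) -> bool:
        """Run the command matching an utterance; return True if one matched."""
        command = self._commands.get(utterance.strip().lower())
        if command is None:
            return False
        command.action(buffer)
        return True


def type_text(text: str) -> Action:
    """Build an action that 'types' literal text into the buffer."""
    def action(buffer: List[str]) -> None:
        buffer.append(text)
    return action


def press_key(name: str) -> Action:
    """Build an action that records a simulated key press, e.g. <enter>."""
    def action(buffer: List[str]) -> None:
        buffer.append(f"<{name}>")
    return action


grammar = ToyGrammar()
grammar.add(Command("sign off", type_text("Best regards")))
grammar.add(Command("new line", press_key("enter")))

# Pretend these strings came out of a speech recognition engine.
buffer: List[str] = []
results = [grammar.dispatch(heard, buffer)
           for heard in ["Sign off", "new line", "make coffee"]]
```

Because commands are ordinary objects, they can be built in loops, stored in collections and composed like any other Python value. The real framework layers actual engine back-ends and keystroke simulation on top of that idea; see its own documentation for the genuine class names.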
