The converse to voice fakes: generating text from speech, a.k.a. speech-to-text.
It has been a long time since I took Phil Rose’s extravagantly weird undergraduate phonetics class, and I have forgotten much. Here is a cheating tool:
Speaking as a realtime interactive textual input method. See the following roundups of dictation apps to start:
Here are some options culled from those lists and elsewhere, of vague relevance to me.
Handy if you have a recording and you want to make it into a text thing.
- producthunt transcription options
- rev transcription is a human-powered service (USD1.25/minute)
- Vatis tech is AI-backed? USD10/hr. Output to video subtitles and identifies different speakers.
- Audioburst offers transcription as part of their podcast service. The price is a mystery.
- Tony door aims to do meeting-specific transcription. AI. 4hr/month free, thereafter USD25/month for up to 40hr.
- descript aims to integrate editing with transcription, and in particular seems to allow editing audio by editing the transcript, via voice-fake technology. Weaponised social media deep fake here we come. USD10/month for 10hr/month.
- The all-manual option: Type it yourself.
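
The prices above are quoted in different units, so here is a rough sketch that normalises them to USD per hour of audio. It uses only the figures quoted in the list and assumes the monthly plans are used to their full quota, which is the best case for them; the function names are made up for this note.

```python
# Normalise the quoted prices to USD per hour of audio.
# Assumption: monthly plans are used up to their full hour quota.

def per_hour_from_per_minute(usd_per_minute: float) -> float:
    """Convert a per-minute rate to a per-hour rate."""
    return usd_per_minute * 60


def per_hour_from_plan(usd_per_month: float, hours_used: float) -> float:
    """Effective hourly rate of a flat monthly plan at a given usage."""
    if hours_used <= 0:
        raise ValueError("hours_used must be positive")
    return usd_per_month / hours_used


costs = {
    "rev": per_hour_from_per_minute(1.25),    # human-powered
    "Vatis tech": 10.0,                       # already quoted per hour
    "Tony door": per_hour_from_plan(25, 40),  # paid tier, full 40hr used
    "descript": per_hour_from_plan(10, 10),   # full 10hr used
}

# Cheapest first.
ranked = sorted(costs.items(), key=lambda item: item[1])
for name, usd in ranked:
    print(f"{name}: USD{usd:.2f}/hr")
```

At lighter usage the flat plans look worse: `per_hour_from_plan(25, 5)` gives USD5/hr for the same Tony door tier.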
- dictation-toolbox/dragonfly: Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
Dragonfly is a speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software. It was written to make it very easy for Python macros, scripts, and applications to interface with speech recognition engines. Its design allows speech commands and grammar objects to be treated as first-class Python objects. Dragonfly can be used for general programming by voice. It is flexible enough to allow programming in any language, not just Python. It can also be used for speech-enabling applications, automating computer activities and dictating prose.
Dragonfly contains its own powerful framework for defining and executing actions. It includes actions for text input and key-stroke simulation. This framework is cross-platform, working on Windows, macOS and Linux (X11 only). See the actions sub-package documentation for more information, including code examples.
This project is a fork of the original t4ngo/dragonfly project.
Dragonfly currently supports the following speech recognition engines:
- Dragon, a product of Nuance. All versions up to 15 (the latest) should be supported. Home, Professional Individual and previous similar editions of Dragon are supported. Other editions may work too.
- Windows Speech Recognition (WSR), included with Microsoft Windows Vista, Windows 7+, and freely available for Windows XP
- Kaldi (under development)
- CMU Pocket Sphinx (with caveats)
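
To make the "speech commands as first-class Python objects" idea concrete, here is a toy sketch in plain Python. It is not Dragonfly's API: `Grammar`, `compile_spec` and the `<text>` placeholder syntax are invented here for illustration, and the "actions" only log what they would do rather than simulating keystrokes. Dragonfly has its own rule and action classes (such as `MappingRule`, `Text` and `Key`); consult its documentation for the real thing.

```python
import re
from typing import Callable, List, Pattern, Tuple


def compile_spec(spec: str) -> Pattern[str]:
    """Turn a spec like 'type <text>' into a regex with named groups."""
    pattern = ""
    # re.split with a capturing group alternates literal text and names.
    for i, part in enumerate(re.split(r"<(\w+)>", spec)):
        if i % 2 == 0:
            pattern += re.escape(part)      # literal words
        else:
            pattern += f"(?P<{part}>.+)"    # free-form dictated words
    return re.compile(pattern, re.IGNORECASE)


class Grammar:
    """A hypothetical container mapping spoken phrases to callables."""

    def __init__(self) -> None:
        self.commands: List[Tuple[Pattern[str], Callable[..., object]]] = []

    def add(self, spec: str, action: Callable[..., object]) -> None:
        self.commands.append((compile_spec(spec), action))

    def dispatch(self, utterance: str) -> bool:
        """Run the first command whose spec matches the whole utterance."""
        for pattern, action in self.commands:
            match = pattern.fullmatch(utterance.strip())
            if match:
                action(**match.groupdict())
                return True
        return False


# Stand-in for keystroke simulation: just record what would be sent.
events: List[Tuple[str, str]] = []

grammar = Grammar()
grammar.add("save file", lambda: events.append(("keys", "ctrl+s")))
grammar.add("type <text>", lambda text: events.append(("text", text)))

# Pretend these strings came back from a speech recognition engine.
grammar.dispatch("type hello world")
grammar.dispatch("Save File")
unknown = grammar.dispatch("open the pod bay doors")
print(events, unknown)
```

Because commands are ordinary objects, they can be built in loops, composed, and tested without a microphone, which is presumably much of the appeal of the real framework.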