The converse of voice fakes: generating text from speech, a.k.a. speech-to-text.
This is an older practice than I thought. Check out Lloyd Darling, The Marvelous Voice Typewriter, in Volume 89 of Popular Science Monthly, for the state-of-the-art dictation machine of 1916 (PDF version).
Dictation
Speaking as a realtime interactive textual input method. See the following roundups of dictation apps to start:
- Zapier dictation roundup
- the rather grimmer Linux-specific roundup.
Here are some options, culled from those lists and elsewhere, of vague relevance to me:
- wreally transcribe has built their own in-browser speech recognizer as well as a manual transcription UI. More augmented-manual than automatic. $20/year.
- dictation.io provides a frontend to google speech recognition.
- A classic is dragon dictate.
- macOS includes dictation.
Transcribing recordings
Handy if you have a recording and you want to make it into a text thing.
- producthunt transcription options
- rev transcription is a human-powered service (USD1.25/minute)
- Vatis tech is AI-backed? USD10/hr. Outputs video subtitles and identifies different speakers.
- Audioburst offers transcription as part of their podcast service. The price is a mystery.
- Tony door aims to do meeting-specific transcription. AI. 4hr/month free, thereafter USD25/month for up to 40hr.
- descript aims to integrate editing with transcription; in particular, it seems to allow editing audio by editing the transcript, using voice fake technology. Weaponised social media deep fakes, here we come. USD10/month for 10hr/month.
- The all-manual option: Type it yourself.
Automation
Dragonfly is a speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software. It was written to make it very easy for Python macros, scripts, and applications to interface with speech recognition engines. Its design allows speech commands and grammar objects to be treated as first-class Python objects. Dragonfly can be used for general programming by voice. It is flexible enough to allow programming in any language, not just Python. It can also be used for speech-enabling applications, automating computer activities and dictating prose.
Dragonfly contains its own powerful framework for defining and executing actions. It includes actions for text input and key-stroke simulation. This framework is cross-platform, working on Windows, macOS and Linux (X11 only). See the actions sub-package documentation for more information, including code examples.
This project is a fork of the original t4ngo/dragonfly project.
Dragonfly currently supports the following speech recognition engines:
- Dragon, a product of Nuance. All versions up to 15 (the latest) should be supported. Home, Professional Individual and previous similar editions of Dragon are supported. Other editions may work too.
- Windows Speech Recognition (WSR), included with Microsoft Windows Vista, Windows 7+, and freely available for Windows XP
- Kaldi (under development)
- CMU Pocket Sphinx (with caveats)
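To make the "speech commands as first-class Python objects" idea concrete, here is a toy sketch using only the standard library. It is not Dragonfly's API: `ToyCommand`, `ToyGrammar`, and `dispatch` are hypothetical names invented for illustration, the "recognized speech" is just a string you pass in, and the actions return a description of what they would do rather than simulating keystrokes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToyCommand:
    """A spoken-command pattern paired with the action it triggers (hypothetical class)."""
    spec: str                    # e.g. "type <text>"; <name> marks a free-form slot
    action: Callable[..., str]   # called with the captured slots as keyword arguments

    def __post_init__(self) -> None:
        # Turn each "<name>" placeholder into a named regex group; escape literal words.
        parts = []
        for token in self.spec.split():
            slot = re.fullmatch(r"<(\w+)>", token)
            parts.append(f"(?P<{slot.group(1)}>.+?)" if slot else re.escape(token))
        self._regex = re.compile(r"\s+".join(parts), re.IGNORECASE)

    def match(self, utterance: str) -> Optional[Dict[str, str]]:
        """Return the captured slots if the whole utterance fits the spec, else None."""
        m = self._regex.fullmatch(utterance.strip())
        return m.groupdict() if m else None


class ToyGrammar:
    """An ordered collection of commands; the first matching command wins."""

    def __init__(self) -> None:
        self.commands: List[ToyCommand] = []

    def add(self, command: ToyCommand) -> None:
        self.commands.append(command)

    def dispatch(self, utterance: str) -> Optional[str]:
        for command in self.commands:
            captured = command.match(utterance)
            if captured is not None:  # an empty dict is still a successful match
                return command.action(**captured)
        return None  # nothing in the grammar recognized the utterance


grammar = ToyGrammar()
grammar.add(ToyCommand("save file", lambda: "key: ctrl+s"))
grammar.add(ToyCommand("type <text>", lambda text: f"text: {text}"))
grammar.add(ToyCommand("replace <old> with <new>",
                       lambda old, new: f"replace: {old} -> {new}"))

print(grammar.dispatch("Save file"))
print(grammar.dispatch("type hello world"))
print(grammar.dispatch("open the pod bay doors"))
```

Because commands are plain objects, they can be built in loops, stored in lists, or loaded from config. In a real setup the utterance would come from one of the engines listed above and the actions would send keystrokes or text; see the Dragonfly documentation for its actual rule and action classes.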