The converse to voice fakes: generating text from speech, a.k.a. speech-to-text. We might do this in real time, to control something, or “off-line”, to turn an audio recording into text, or something in between.
Speaking as a real-time textual input method. See the following roundups of dictation apps to start:
Here are some options culled from those lists and elsewhere of vague relevance to me:
Automation and coding
Coding by voice command requires two kinds of software: a speech-recognition engine and a platform for voice coding. Dragon from Nuance, a speech-recognition software developer in Burlington, Massachusetts, is an advanced engine and is widely used for programming by voice, with Windows and Mac versions available. Windows also has its own built-in speech recognition system. On the platform side, VoiceCode by Ben Meyer and Talon by Ryan Hileman … are popular.
Two other platforms for voice programming are Caster and Aenea, the latter of which runs on Linux. Both are free and open source, and enable voice-programming functionality in Dragonfly, which is an open-source Python framework that links actions with voice commands detected by a speech-recognition engine.
Full disclosure: I am researching this because I have temporarily disabled my hands. For the moment, for my purposes, the easiest option is to use Serenade for Python programming and OS speech recognition for prose typing, and to leave my other activities aside for now. If my arms were to be disabled for a longer period of time, I would probably accept the learning curve of using Talon, which seems to solve more problems, at the cost of greater commitment.
One point of friction which I did not anticipate is that most of these tools will, for various reasons, do their best to switch off any music that is playing whenever I use them. For someone like me who can’t focus for three minutes straight without banging electro in the background, this is tricky. My current workaround is to play music on a different device so I can sneak beats past my unnecessarily diligent speech recognition tools, which are trying to control background noise. This means that I am wearing two headsets, which looks funny, but to be honest it is not the worst fashion sacrifice I have been forced to make in the course of this particular injury.
Contrariwise, if I were to try to do the speech control stuff in an open-plan office, a coworking space or the family living room, it would be excruciatingly irritating for anyone else who could hear me. My current workaround, when I am annoying some innocent bystander, is to accuse them of being ableist.
Simple, low-lift, intuitive voice recognition for coding. Includes deep integration for various languages and also various code editors, including Visual Studio Code and the JetBrains ones. Free. Simple to use.
- C / C++
The experience is very good for plain code. Editor integration is not awesome when using Jupyter, in line with the general rule that Jupyter makes everything more flaky and complicated.
Powerful hands-free input
- Voice Control — talk to your computer
- Noise Control — click with a back-beat
- Eye Tracking — mouse where you look
- Python Scripts — customize everything
🤳Talon aims to bring programming, realtime video gaming, command line, and full desktop computer proficiency to people who have limited or no use of their hands, and vastly improve productivity and wow-factor of anyone who can use a computer.
macOS High Sierra (10.13) or newer. Talon is a universal2 build with native Apple Silicon support.
Linux / X11 (Ubuntu 18.04+, and most modern distros), Wayland support is currently limited to XWayland
Windows 8 or newer
Powerful voice control - Talon comes with a free speech recognition engine, and it is also compatible with Dragon with no additional setup.
Multiple algorithms for eye tracking mouse control (depends on a single Tobii 4C, Tobii 5 or equivalent eye tracker)
Noise recognition system (pop and hiss). Many more noises coming soon.
Scriptable with Python 3 (via embedded CPython, no need to install or configure Python on your host system).
Talon is very modular and adaptable - you can use eye tracking without speech recognition, or vice versa.
Worked example: Coding with voice dictation using Talon Voice.
Dragonfly is a speech recognition framework for Python that makes it convenient to create custom commands to use with speech recognition software. It was written to make it very easy for Python macros, scripts, and applications to interface with speech recognition engines. Its design allows speech commands and grammar objects to be treated as first-class Python objects. Dragonfly can be used for general programming by voice. It is flexible enough to allow programming in any language, not just Python. It can also be used for speech-enabling applications, automating computer activities and dictating prose.
Dragonfly contains its own powerful framework for defining and executing actions. It includes actions for text input and key-stroke simulation. This framework is cross-platform, working on Windows, macOS and Linux (X11 only). See the actions sub-package documentation for more information, including code examples.
This project is a fork of the original t4ngo/dragonfly project.
Dragonfly currently supports the following speech recognition engines:
- Dragon, a product of Nuance. All versions up to 15 (the latest) should be supported. Home, Professional Individual and previous similar editions of Dragon are supported. Other editions may work too
- Windows Speech Recognition (WSR), included with Microsoft Windows Vista, Windows 7+, and freely available for Windows XP
- Kaldi (under development)
- CMU Pocket Sphinx (with caveats)
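To make the core idea concrete, here is a minimal sketch of linking spoken phrases to actions. It deliberately does not use Dragonfly's own API; the names `COMMANDS`, `type_text` and `handle_utterance`, and the phrases themselves, are invented for illustration, and "typing" is simulated by appending to a list rather than sending key-strokes to the operating system.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for key-stroke/text output: actions are recorded
# in this list instead of being sent to the operating system.
typed: List[str] = []


def type_text(text: str) -> Callable[[], None]:
    """Return an action that 'types' the given text when executed."""
    def action() -> None:
        typed.append(text)
    return action


# Illustrative command table; the phrases are invented for this sketch.
COMMANDS: Dict[str, Callable[[], None]] = {
    "new function": type_text("def ():"),
    "new line": type_text("\n"),
    "save file": type_text("<ctrl-s>"),
}


def handle_utterance(utterance: str) -> bool:
    """Run the action bound to a recognised phrase; report whether one matched."""
    action = COMMANDS.get(utterance.strip().lower())
    if action is None:
        return False
    action()
    return True


if __name__ == "__main__":
    for heard in ["new function", "Save File", "make coffee"]:
        print(heard, "->", handle_utterance(heard))
```

A real framework adds the hard parts this sketch skips: a grammar with variable slots, a speech engine that produces the utterances, and genuine keyboard simulation.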
Your voice is the most efficient way to communicate. VoiceCode is a concise spoken language that controls your computer in real-time. When writing anything from emails to kernel code, to switching applications or navigating Photoshop – VoiceCode does the job faster and easier.
VoiceCode is different from other voice-command solutions in that commands can be chained and nested in any combination, allowing complex actions to be performed by a single spoken phrase.
By taking advantage of your brain’s natural aptitude for language you can control your computer more efficiently and naturally. It really feels like you’re in the future!
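As a rough illustration of what chaining commands in a single phrase might involve, the sketch below splits one utterance into a sequence of known commands using greedy longest-match. This is an assumption about one plausible approach, not VoiceCode's actual implementation; the vocabulary and the name `parse_chain` are hypothetical, and nesting is not handled.

```python
from typing import Dict, List

# Invented vocabulary for this sketch: spoken phrase -> emitted action name.
VOCAB: Dict[str, str] = {
    "select line": "SELECT_LINE",
    "copy": "COPY",
    "switch window": "SWITCH_WINDOW",
    "paste": "PASTE",
}
# Longest phrase length in words, so we know how far ahead to look.
MAX_WORDS = max(len(phrase.split()) for phrase in VOCAB)


def parse_chain(utterance: str) -> List[str]:
    """Split one utterance into a sequence of known commands.

    Greedy longest-match: at each position, try the longest phrase first.
    Raises ValueError when a word does not start any known command.
    """
    words = utterance.lower().split()
    actions: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in VOCAB:
                actions.append(VOCAB[phrase])
                i += n
                break
        else:
            # No phrase of any length matched at position i.
            raise ValueError(f"unrecognised word: {words[i]!r}")
    return actions


if __name__ == "__main__":
    print(parse_chain("select line copy switch window paste"))
```

Greedy matching is the simplest choice that works when no command is a prefix of a different, longer command sequence; an ambiguous vocabulary would need backtracking or a proper grammar.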
Handy if you have a recording and you want to make it into a text thing offline.
- producthunt transcription options
- descript aims to integrate editing with transcription; in particular, it seems to let you edit the audio by editing the transcript, using voice fake technology. Weaponised social media deep fakes, here we come. USD 10/month for 10hr/month
- rev transcription is a human-powered service (USD1.25/minute)
- Vatis tech is AI-backed? USD10/hr. Output to video subtitles and identifies different speakers.
- Audioburst offers transcription as part of their podcast service. The price is a mystery.
- Tony door aims to do meeting-specific transcription. AI. 4hr/month free, thereafter USD25/month for up to 40hr.
- The all-manual option: Type it yourself.
- wreally transcribe has built their own in-browser speech recognizer as well as a manual transcription UI. More augmented-manual than automatic. $20/year.
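The prices in the list above are quoted in mixed units, so here is a small sketch that normalises them to an effective hourly rate. It uses only the figures quoted in the list and assumes, optimistically, that each monthly allowance is fully used; services without a stated price or hour limit are left out.

```python
# Effective cost per hour of audio, in USD, from the prices quoted in the list.
# Assumption: monthly plan allowances are fully used.
rates_per_hour = {
    "descript": 10 / 10,               # USD 10/month for 10 hr/month
    "rev": 1.25 * 60,                  # USD 1.25/minute, human-powered
    "vatis": 10.0,                     # USD 10/hr
    "tony door (paid tier)": 25 / 40,  # USD 25/month for up to 40 hr
}

# Print cheapest first.
for name, rate in sorted(rates_per_hour.items(), key=lambda kv: kv[1]):
    print(f"{name}: USD {rate:.2f}/hr")
```

On these assumptions the human-powered service is roughly two orders of magnitude more expensive per hour than the cheapest automated plan, which is the gap to weigh against accuracy.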
It has been a long time since I took Phil Rose’s extravagantly weird undergraduate phonetics class, and I have forgotten much. A cheating tool:
I cannot easily see how to automate phonetic transcription, but surely a tool for that exists somewhere? Some voice transcription software may well use phonetics as an intermediate representation or even as the final output.