Jan 28, 2020

Another look at Windows 10 speech recognition


A few years ago, while on "holiday", I returned from dinner to find that my laptop had bluescreened. Panic time! It was Saturday night, and I still had quite a lot of text to translate and deliver on Monday morning. Up on the highest mountain in Portugal, I wasn't sure where I could find a replacement computer to finish the project. At least the work wasn't utterly lost, because I had put it on a memoQ Cloud server for testing. The next day I got lucky: about 50 km away there was a Worten, where I picked up a gamer laptop with lots of RAM and an SSD. Well, not so lucky, as it was a Hewlett-Packard Omen with a fan prone to failure, but that's another story...

This new laptop was my first encounter with Windows 10. I had heard that this operating system offered improved speech recognition. I prefer to dictate my translations, and downloading the 3 GB installation file for Dragon NaturallySpeaking (DNS) from my server at the office was going to take forever, so I thought I would give Windows 10 speech recognition a try. I hadn't installed my CAT tool of choice yet, so I fired up Microsoft Word and began dictating. "Not bad," I thought. Then I tried it in my translation environment, and the results were a complete disaster. So I put that mess out of my mind.

Since then there have been some notable advances in speech-to-text capabilities on a number of platforms. But DNS, the best solution for my languages (German and English), has become increasingly cranky thanks to Nuance's neglect of the product. Every week I read new reports of trouble with DNS in environments where it used to perform very well. Apple's iOS 13 was a great leap forward of sorts for speech recognition and voice-controlled editing, but the new features are only available in English, and activating Voice Control totally screws up my otherwise rather good dictation in German and Portuguese (or any other language). And don't get me started on the crappy vocabulary addition feature, which relies on text entry alone with no link to actual pronunciation. Good luck with that garbage. With the additional command features in Hey memoQ, iOS dictation is not a bad solution, but it is not yet fully up to professional standards.

I probably would have given no further thought to Windows 10's speech-to-text features if it weren't for Anthony Rudd. We've corresponded a bit since I bought his excellent book on regular expressions for translators (and there's another practical guide for us coming soon from him!), and in a recent discussion he mentioned using Unicode with regex as a simple way of dealing with some problems another colleague was struggling with. I was intrigued, and for about half a day I went down a rabbit hole, testing Unicode subscripts and superscripts for a variety of purposes: fixing bad OCR of footnote markers and empirical formulae, autocorrecting common expressions for subscripted variables and chemical terms, including subscripts and superscripts in term bases, and much more. Fascinating and useful stuff on the whole, even if some fonts don't support these characters well.
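For the curious, here is a minimal sketch in Python of one such fix: turning the plain digits that bad OCR leaves in an empirical formula into Unicode subscripts, plus a helper for superscript digits such as footnote markers. This is my own illustration, not Anthony's method or a feature of any CAT tool, and the function names are hypothetical. It assumes that any run of digits directly following a letter or a closing bracket is an atom count. That holds for simple formulae but would also mangle ordinary text like "MP3", so apply it only to text you know is a formula.

```python
import re

# Map ASCII digits 0-9 to Unicode subscript digits (U+2080..U+2089).
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
# Map ASCII digits 0-9 to Unicode superscript digits
# (¹, ², ³ live in Latin-1; the others are in U+2070..U+2079).
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

# Assumption: a digit run right after a letter or a closing bracket
# is an atom count, e.g. the "3" and "2" in "Ca(NO3)2".
FORMULA_COUNT = re.compile(r"(?<=[A-Za-z)\]])\d+")


def subscript_formula(formula: str) -> str:
    """Rewrite atom counts in a plain-text formula as Unicode subscripts."""
    return FORMULA_COUNT.sub(
        lambda m: m.group(0).translate(SUBSCRIPT_DIGITS), formula
    )


def to_superscript(digits: str) -> str:
    """Rewrite every ASCII digit as a Unicode superscript (e.g. footnote markers)."""
    return digits.translate(SUPERSCRIPT_DIGITS)


if __name__ == "__main__":
    for f in ["CO2", "Ca(NO3)2", "C6H12O6"]:
        print(f, "->", subscript_formula(f))
```

The regex only decides which digits are counts; `str.translate` then does the digit-to-subscript mapping in one pass, which keeps multi-digit counts like the "12" in glucose intact.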

And of course I looked at using these special Unicode characters in speech-to-text applications. DNS had some funky quirks (it doesn't allow numbers in the "spoken" version of terms, for example), but it worked rather well, so I can now say "calcium nitrate formula" and get Ca(NO₃)₂ without much ado. For some reason it also occurred to me to give Windows 10 speech recognition another try, just because I was curious whether its vocabulary could in fact be trained. Indeed it can, and that feature is far better than its counterparts in iOS 13 or DNS.

But first I had to remember how to activate speech recognition for Windows on my laptop again. When in doubt, type what you're looking for in the search box....

Notice I've pinned Windows Speech Recognition to my taskbar on the right, which is good for quick tasks.

Sought and found.

Unlike other speech recognition solutions, the one in Windows 10 works only for the language set for the operating system. And the options there are limited to English (United States, United Kingdom, Canada, India and Australia), French, German, Japanese, Mandarin (Chinese Simplified and Chinese Traditional) and Spanish.

I put on my trusty Plantronics earset (the best microphone I've used for dictation tasks or audio in my occasional webinars in the past year) and began to dictate, first in Microsoft Word, which had shown acceptable results in my tests long ago. I found that adding vocabulary in the Speech Dictionary (accessed via the context menu in the dictation control element shown as a graphic at the top of this post) was dead simple.

The option to record pronunciation enabled me to record non-English names and words in several languages. And sure enough, the Unicode subscripts and superscripts worked, so I can now say CO₂ (I just dictated that) to my heart's content.

I was expecting a mess when I tried Windows 10 speech-to-text in a CAT tool, but it was not to be. It was brilliant, actually. I tried it in my copy of SDL Trados Studio, and with the scratchpad disabled so I could dictate directly into the target, it worked well. There is no voice-controlled editing like I'm used to with DNS in memoQ, but that DNS feature does not work in SDL Trados Studio anyway, so this is no worse. With the scratchpad box enabled (see the screenshot below), however, I could use voice commands to select and correct text or perform other operations. Brilliant!

After clicking or speaking "Insert", the text will be written to the target field with the proper formatting
So users of SDL Trados Studio who translate into a target language supported by Windows 10 speech recognition are probably better off not giving their money to Nuance, which I'm told can't even be bothered to make a 64-bit version of DNS now (which probably accounts for a lot of the trouble people have with that program).

I tested Wordfast Pro 5, which seems to confuse the speech recognition tool horribly, displaying source text in the floating bar for some odd reason. But my earlier tests of Wordfast with DNS were equally unhappy, so somehow I'm not surprised. And I didn't test the Memsource desktop editor, which took the prize a few years ago for the worst-ever DNS dictation results with a CAT tool. I'll leave that to someone with a much wider masochistic streak.

But what about memoQ, my personal environment of choice for most translation work? Equally brilliant: it works just as it does in SDL Trados Studio. There is no voice control for editing without the dictation scratchpad enabled (here DNS has an advantage in memoQ), but with the scratchpad you can use voice commands to edit before inserting the text in the target field.


Wanna see this in action? Have a look at this short demo video:


I hope that the future will bring us more language support for Windows 10 dictation (Portuguese, Russian and Arabic, please!) and that other providers (like Google, if you're listening, and Apple, which never listens to anyone anymore except to spy on them with Siri) will expand the speech-to-text features they offer, particularly to include sound-linked vocabulary training and better adaptation to individual users' speech. Five years ago, when I began to investigate alternatives for non-DNS languages, I expected we would have more by now. We do, but professional needs require all providers to raise their game.

Addendum: Someone asked me if Windows Speech Recognition is a cloud resource or a locally installed one which will work without an Internet connection. It's definitely the latter. So if you have lousy bandwidth or find yourself disconnected from the Internet, you can still use speech-to-text features.

And more: I use a lot of spoken commands for keyboard shortcuts when I work, so I did a little research and testing. It seems that Windows 10 speech recognition gives full access to an application's keyboard shortcuts via voice. So in memoQ, for example, I can dictate the insertion of tags, items from the Translation Results pane and a lot more. Watch out, Nuance. Windows 10 is going to kick your Dragon's scaly butt!

1 comment:

  1. Hello. I’m also having trouble with DNS now. And it’s hard because I physically must use it. Is windows voice recognition able to move between applications or is it only for dictation into one environment? In other words, can you tell it to toggle between programs like you can with the DNS? Thanks.
