Authors
Keith Vertanen and Per Ola Kristensson
Cavendish Laboratory, University of Cambridge
JJ Thomson Avenue, Cambridge, UK
{kv277, pok21}@cam.ac.uk
Summary
This paper explores speech recognition as a way to input text on a touchscreen device. Speech recognition has generally been kept out of the mobile phone arena because of the processor resources required to translate speech to text effectively. Not only does it normally require far more resources than the processor can spare, but the result is usually full of errors (e.g., Google Voice voicemail transcriptions). Despite these setbacks, speech is very attractive as a text input method: users are already familiar with speaking, so no training is involved, and they can typically speak at up to 200 wpm, which far surpasses even the best typists (especially mobile typists).
The researchers built Parakeet to address this issue. When designing it, they followed four design principles. First, the design should avoid cascading errors involving phase offset. They did this by using a multi-modal speech interface. Second, it should exploit the speech recognition hypothesis space. They did this by offering the user several choices drawn from several recognition hypotheses. Third, it should allow efficient and practical interaction by touch. They did this by implementing an interface that requires only touch, so that it can be used with or without a stylus. Lastly, the design should support fragmented interaction. In other words, the device should anticipate the ADD tendencies of its user. They did this by offering text predictions and alerting the user audibly when their speech has finished being translated to text.
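The second principle, offering choices drawn from several recognition hypotheses, can be illustrated with a small sketch. This is not Parakeet's actual implementation, which the summary does not describe. It assumes, purely for illustration, that the recognizer returns a few hypotheses of equal length as lists of words, and the function names `alternatives_per_slot` and `apply_choices` are hypothetical.

```python
from collections import Counter


def alternatives_per_slot(hypotheses, max_choices=3):
    """Collect the candidate words seen at each position across hypotheses.

    Assumption for this sketch: every hypothesis has the same number of
    words, so words can be lined up by position. A real recognizer would
    need a proper alignment step.
    """
    if not hypotheses:
        return []
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("this sketch assumes equal-length hypotheses")

    slots = []
    for position in range(length):
        # Count how often each word appears at this position.
        counts = Counter(h[position] for h in hypotheses)
        # Most frequent word first; it acts as the default choice.
        slots.append([word for word, _ in counts.most_common(max_choices)])
    return slots


def apply_choices(slots, choices):
    """Build the final sentence.

    `choices` maps a slot index to the index of the alternative the user
    tapped; untouched slots keep the default (first) word.
    """
    return " ".join(slot[choices.get(i, 0)] for i, slot in enumerate(slots))


hypotheses = [
    ["the", "cat", "sat"],
    ["the", "cat", "sat"],
    ["the", "hat", "sat"],
    ["a", "cat", "sat"],
]
slots = alternatives_per_slot(hypotheses)
corrected = apply_choices(slots, {1: 1})  # user taps the second word's alternative
```

The point of the sketch is that a misrecognized word can be fixed with a single tap on an alternative instead of re-speaking the sentence, which is how the touch-only principle and the hypothesis-space principle fit together.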
To test the effectiveness and mobility of the system, participants used the device both while seated indoors and while walking outdoors. They were given a fixed set of sentences between 8 and 16 words in length to say in 30-minute trials. After saying a sentence, they were to try to correct it using Parakeet's correction interface before moving on to the next sentence. On average, participants completed 41 sentences indoors and 27 sentences outdoors during the 30-minute trials.
Results indicated that participants managed an average of 18 wpm indoors and 13 wpm outdoors while using correction.
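This summary does not say how the wpm figures were computed. A common convention in text-entry research counts every five characters (spaces included) as one word, and the sketch below assumes that convention. The function name `words_per_minute` and its parameters are hypothetical.

```python
def words_per_minute(text, seconds, chars_per_word=5):
    """Entry rate in words per minute.

    Assumption: a "word" is `chars_per_word` characters, a common
    text-entry convention; the summarized paper may measure differently.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    words = len(text) / chars_per_word
    minutes = seconds / 60.0
    return words / minutes


# A 90-character sentence entered (spoken and corrected) in 60 seconds.
rate = words_per_minute("x" * 90, 60)
```

Under this convention, a 90-character sentence that takes a full minute to speak and correct works out to 18 wpm, which gives a feel for how much of the time goes into correction rather than speaking.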
Discussion
I don't really know what the average typing speed of phone texters is, but I would imagine that 18 wpm is a little on the slow side, especially for those who text frequently. Obviously, this rate would be vastly improved if the translation algorithm were more accurate, but even the almighty Google is far from achieving that feat.
Text to speech, speech to text... why can't we just make up our minds and pick one?
How does Parakeet work, and which technology does it use?
How is it built?
Once we get a new interface, people get more familiar with it and become crazy experts at using it - why are we trying to make it so much slower? In the time it took to read this paper, I could have sent 30 text messages and only spoken one to this translation mechanism.