Writing, speaking, and translating the future

Text to speech in ereaders: Can Roger Ebert read a bedtime story, and can he roll his r’s?

Text-to-speech (TTS) is one of the experimental features added to the Kindle 2, and it caused a huge controversy. In this post I will discuss the controversy and examine some of the more innovative players in text-to-speech technology.

Upon the release of the Kindle 2, the Authors Guild protested to Amazon that the feature would infringe its members’ copyrights and perhaps cut into the sales of audio books. The accusation seems a bit ludicrous if you compare the text-to-speech feature to the dramatic readings on audio books, but I think the Authors Guild was trying to stake a claim against future technologies rather than against the Kindle 2’s TTS function. Rather than have a messy fight over the issue, Amazon acquiesced and allowed the copyright holder to shut the feature off. So Amazon avoided that controversy, but another one developed over TTS very soon thereafter.

Nine disability organizations wrote to the six largest publishers, arguing that the TTS feature should be retained. The letters can be seen at the link below.

http://www.icdri.org/legal/Kindle_Issues.htm

The controversy deepened when the DOJ got involved and asked three of the six universities in Amazon’s pilot to cease testing the Kindle DX for adoption. These universities were asked not to promote, purchase, or recommend the Kindle DX or any other ereader until the device was fully accessible to the blind.

http://www.justice.gov/opa/pr/2010/January/10-crt-030.html

Of course, this presents a real problem for Amazon and any other company wanting to add TTS features to ereader devices. The TTS feature has to have a dedicated button or a voice command that will start it. This will mean redesigning the product and perhaps increasing manufacturing costs. The product will be redesigned, but how will the larger controversy with the Authors Guild be resolved so that accessibility is ensured on these newly designed text-to-speech readers? I think that this is the issue to watch over the next few months, and one I will take up in more depth as it develops. But for now I’d like to focus on text to speech and where I think it could develop.

The Technology

Text-to-speech technology is still relatively young, but the field is growing. There are many more companies focusing on text to speech than there were just a few years ago, and VoiceXML and related speech markup languages are becoming standardized. Though the computer-generated voices sound more lifelike than they used to, they still need work. The overwhelming problem is that these systems cannot inflect speech well. This leaves them unable to represent human emotion in speech, and thus unable to perform literature well. So at present these systems are not a threat to audio books, but I think they could be.
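To make the inflection problem a little more concrete, here is a minimal sketch of how a passage might be marked up with SSML, the W3C speech markup that sits alongside VoiceXML. The function name `to_ssml` and the sample sentences are my own inventions for illustration, and I am assuming a TTS engine that accepts SSML input. I am not claiming the Kindle or any product mentioned in this post works this way.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, lang="en-US"):
    """Wrap (text, pitch, rate) tuples in SSML <prosody> elements.

    Hypothetical helper: the pitch and rate labels ("low", "high",
    "slow", "fast", ...) come from the SSML specification, but how
    well they are rendered depends entirely on the engine.
    """
    parts = [
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="%s">' % lang
    ]
    for text, pitch, rate in sentences:
        # escape() protects characters such as & and < in the book text
        parts.append(
            '<prosody pitch="%s" rate="%s">%s</prosody>'
            % (pitch, rate, escape(text))
        )
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    print(to_ssml([
        ("It was a dark & stormy night.", "low", "slow"),
        ("Who goes there?", "high", "fast"),
    ]))
```

The catch is that somebody, or some software, still has to decide which sentence gets which pitch and rate. The markup only carries the performance; it does not create one.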

Polyglot TTS

My interest is in the more obscure text-to-speech tools that can speak multiple languages. Unfortunately, these are all fairly bad. You can make your computer sound unnatural and robotic in Spanish, Hindi, German, French, and many other languages. But none of these text-to-speech tools have made it into ereader devices just yet, and we are all probably better off for it.

Just for fun, I tested the Kindle’s text-to-speech feature with a Spanish text and it did horribly, as expected. It made me wonder whether Amazon had thought about licensing a text-to-speech product that would allow the TTS feature to read in a variety of languages and dialects for different locales, or whether they just decided that the technology was not worth pursuing for the Global edition.

The most innovative company in Text to Speech

There are a few companies that are really innovating in the text-to-speech field, and among them is a standout called CereProc. CereProc is innovative, to say the least. They are working on the largest problem in text to speech, which is the inflection of the automated voice. This makes their voices sound much more “alive” than other TTS technologies. They have also really invested in non-English speech patterns, which would be great for ereaders or other worldwide TTS applications. But the most interesting innovation they are working on is capturing the speech patterns of celebrities. Their recent efforts to give Roger Ebert his voice back, described in the NPR link below, may really shift the business model for text to speech and bring to fruition the Authors Guild’s worst nightmare. The CereProc site also has computer-generated examples of George W. Bush and President Obama that sound pretty good.

http://www.npr.org/templates/story/story.php?storyId=124087291

http://www.cereproc.com/products/voices

http://www.idyacy.com/cgi-bin/bushomatic.cgi

Future uses

I can see many future uses for text-to-speech technology. Some are enabling, like TTS for the disabled, but others are quite silly, like having Dr. Seuss read your children The Cat in the Hat at bedtime. And I think there may be room for much wider adoption if celebrity voices are offered like iTunes tracks for the Kindle and other ereaders. That may very well be a new feature added in future readers: buy the device, get four celebrity voices, and download more at a ridiculously low price.

I wonder if this might also change the advertising, animation, and video game voiceover market. It would start out innocently enough: dead actors could once again do VO for animated movies. Paul Lynde could come back and play Templeton for a Charlotte’s Web 2. Then live actors would use it to get more VO jobs and never actually have to record the audio. Of course, there would have to be a significant discount if a company used the computer-generated John Madden or Tiger Woods rather than the real person. And soon studios and video game makers would begin to curtail the use of real voice actors altogether. Why would they bother hiring actors if they could get good-quality TTS for the price of the software? No more time would be wasted editing audio recordings or coaching voice talent.

The Authors Guild may be right. In a few years these computer-generated voices, especially the ones based on human voice patterns, may supplant the audio book market and many other VO actor roles. If given the choice between William S. Burroughs reading me Naked Lunch and the voice actor the publisher chose to read the book, I’d choose Burroughs, no doubt.
