Development/Tutorials/Text-To-Speech: Difference between revisions

From KDE TechBase
fix link to the speech API
Jucato (talk | contribs)
Mark for archiving
 
(3 intermediate revisions by 2 users not shown)

Latest revision as of 10:46, 16 May 2019


This page has been archived
The information on this page is outdated or no longer in use but is kept for historical purposes. Please see the Category:Archives for similar pages.

What is Text to Speech?

Jovie (called KTTS in KDE 4.4 and earlier) is a subsystem of the KDE desktop that converts text to audible speech. Jovie is currently under development and aims to become the standard subsystem through which all KDE applications provide speech output.

How does it work?

Applications send text they wish spoken to Jovie via D-Bus. For example, in a terminal window, you can type the following commands to start Jovie and speak "Hello World".

# Start Jovie (if not already running)
jovie
# Send "Hello World" to KTTSD for speaking in English.
qdbus org.kde.KSpeech /KSpeech say "Hello World" 0
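
For use in scripts, the command above can be wrapped in a small shell function. The following is a minimal sketch and not part of Jovie: the function name speak and the variable names are illustrative assumptions, while the service name, object path, and trailing 0 are copied unchanged from the example above.

```shell
#!/bin/sh
# Hypothetical helper: speak TEXT through Jovie using the qdbus call shown above.
# SERVICE and OBJECT_PATH are copied from that example; speak() is an
# illustrative name, not something Jovie provides.
SERVICE="org.kde.KSpeech"
OBJECT_PATH="/KSpeech"

speak() {
    # Refuse empty input so that no blank request is sent.
    if [ -z "$1" ]; then
        echo "usage: speak TEXT" >&2
        return 2
    fi
    # Bail out early if qdbus is not available.
    if ! command -v qdbus >/dev/null 2>&1; then
        echo "speak: qdbus not found" >&2
        return 127
    fi
    # The trailing 0 mirrors the final argument of the example above.
    qdbus "$SERVICE" "$OBJECT_PATH" say "$1" 0
}
```

After sourcing the file, speak "Hello World" should behave like the one-line command above, provided Jovie is already running.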

Using speech in your application

The example above shows how to try out speech from the command line, but inside your application you will likely want to use the D-Bus interface programmatically. The interface is defined in kdelibs/interfaces/kspeech/org.kde.KSpeech.xml; to use it, add the following to your CMakeLists.txt:

qt4_add_dbus_interfaces(my_SRCS ${KDE4_DBUS_INTERFACES_DIR}/org.kde.KSpeech.xml)

Then, in the source file where you want to use speech, #include <kspeech.h> to get the enumerations you need (e.g. KSpeech::soPlainText). Finally, create an object of type org::kde::KSpeech* and use it like so:

org::kde::KSpeech* kspeech = new org::kde::KSpeech("org.kde.kttsd", "/KSpeech", QDBusConnection::sessionBus());
kspeech->setApplicationName("myappname");
kspeech->say("text to speak", KSpeech::soPlainText);

The KSpeech API

For a complete description of these and other commands, see the KDE Text-to-Speech API.

Jovie takes care of sending the text to speech-dispatcher. Jovie is not a speech synthesis engine itself. You must install one of the compatible speech engines. Speech-dispatcher is designed with a plugin architecture that makes it easy to write new plugins for other speech engines.

Why? Who needs it?

Jovie provides a common interface for all KDE applications to use for speaking. Programmers need not concern themselves with the details of the particular speech synthesis engine(s) used.

User Features

  • Speak any text from the clipboard.
  • Speak any plain text file.
  • Speak all or any portion of a text file from Kate, including instances where Kate is embedded in another application.
  • Speak all or any portion of an HTML page from Konqueror.
  • Use as the speech backend for KMouth.
  • Speak KDE notifications (KNotify).
  • Speech is spoken via speech-dispatcher, so any speech-dispatcher backend can be used (espeak, festival, etc.).
  • User-configurable filters for substituting misspoken words, choosing speech synthesizers, and transforming XHTML/XML documents.

Programmer Features

  • Priority system for screen reader outputs, warnings and messages, while still playing regular texts.
  • Permit generation of speech from the command line (or via shell scripts) using the qdbus utility.
  • Provide a lightweight and easily usable interface for applications to generate speech output.
  • Applications need not be concerned about contention over the speech device.
  • FUTURE: Provide support for speech markup languages, such as VoiceXML, Sable, Java Speech Markup Language (JSML), and Speech Markup Meta-language (SMML).
  • FUTURE: Provide limited support for embedded speech markers.
  • Asynchronous to prevent system blocking.
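
As an illustration of driving speech from a shell script, the sketch below sends each non-empty line of a plain text file as a separate say call. It is only an example under stated assumptions: speak_file is a hypothetical name, and the qdbus arguments are copied from the command-line example in "How does it work?" rather than taken from the KSpeech API reference.

```shell
#!/bin/sh
# Sketch: send each non-empty line of a plain text file to Jovie.
# speak_file is a hypothetical name; the qdbus arguments are copied from
# the command-line example earlier on this page.
speak_file() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "speak_file: cannot read '$file'" >&2
        return 1
    fi
    # IFS= and -r keep leading whitespace and backslashes intact; the
    # "|| [ -n ... ]" part handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines.
        [ -n "$line" ] || continue
        # Redirect stdin so qdbus cannot consume the rest of the file.
        qdbus org.kde.KSpeech /KSpeech say "$line" 0 </dev/null
    done < "$file"
}
```

Sending one line per call keeps each request short; whether that matches how Jovie splits longer texts internally is not something this sketch assumes.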

It is hoped that more programmers will begin adding speech capabilities to their KDE programs using Jovie.

Different parts

Jovie actually consists of a few programs:

Jovie

The KDE Text-to-Speech system, a system tray application that provides TTS support to KDE applications. Applications initiate TTS by making D-Bus calls to Jovie.

kcmkttsd

A KControl module for configuring the text-to-speech system. kcmkttsd runs in the KDE Control Center; you can also start it with the command "kcmshell4 kcmkttsd" or by choosing "configure" from the menu of the Jovie system tray icon.

ktexteditor_kttsd

A plugin for the KDE Advanced Text Editor (Kate) that permits you to speak an entire text file or any portion of a file.

libkhtmlkttsdplugin

A plugin for Konqueror that permits you to speak all or any portion of an HTML web page.

Requirements

  • KDE 4.4 or later.
  • speech-dispatcher version 0.6.7 or later.
  • A speech synthesizer such as espeak, festival, or flite.
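
A quick way to see whether the non-KDE requirements appear to be installed is to look for their executables. The following sketch assumes that the executable names match the names used in the list above (speech-dispatcher, espeak, festival, flite); it does not check version numbers, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: reports whether the listed requirements appear to be installed.
# It looks for executables on PATH and does not verify version numbers.

# Print the first argument that names an available command; fail if none does.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: check_tts_requirements is not part of Jovie.
check_tts_requirements() {
    status=0
    if ! first_available speech-dispatcher >/dev/null; then
        echo "missing: speech-dispatcher" >&2
        status=1
    fi
    # Any one of the synthesizers named above is enough.
    if synth=$(first_available espeak festival flite); then
        echo "synthesizer found: $synth"
    else
        echo "missing: a speech synthesizer (espeak, festival, or flite)" >&2
        status=1
    fi
    return "$status"
}
```

Running check_tts_requirements prints the synthesizer it found and returns a non-zero status if anything in the list appears to be missing.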