VoiceRecognition

Metreos.MediaControl.VoiceRecognition

Asynchronous Callbacks

Summary

Applies one or more speech grammars, using a Nuance OSR server, to the specified connection in order to detect spoken phrases in real time.

Usage

The VoiceRecognition action allows one to detect pre-defined phrases spoken on a connection. One can also specify TTS strings or audio files to be played to the connection in conjunction with the VoiceRecognition command.

Once VoiceRecognition has successfully finished with VoiceRecognition_Complete, one can either extract all result data returned from Nuance OSR, or just extract the top-matched meaning and confidence score from the Meaning and Score event parameters. To help parse through the full result data (not just the top-matched meaning and score), a number of helper actions exist:

GetNumVoiceRecResults

Returns the number of matching results.

GetVoiceRecResult

Returns the score and meaning of a result at a specified index.

GetVoiceRecResultsByMeaning

Returns any results matching the specified string matching criteria for meaning.

GetVoiceRecResultsByScore

Returns any results matching the specified criteria for scores.

XmlQuery

Allows one to specify user-defined XPath expressions to allow custom parsing of the results.
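As an illustration only, the helper actions above can be pictured as simple queries over a list of results. The Python sketch below is not the Application Environment API. The XML layout (a results root with result elements carrying meaning and score attributes) and the 0 to 100 score scale are hypothetical stand-ins for the VR_XMLResult payload, whose actual schema is not described here. The function names merely mirror the helper actions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the VR_XMLResult payload; the real schema
# returned by Nuance OSR is not described in this document.
SAMPLE_XML = """
<results>
  <result meaning="sales" score="82"/>
  <result meaning="support" score="64"/>
  <result meaning="sales_europe" score="31"/>
</results>
"""


def load_results(xml_text):
    """Parse the assumed layout into a list of (meaning, score) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [(r.get("meaning"), int(r.get("score"))) for r in root.findall("result")]


def get_num_results(results):
    """Analogue of GetNumVoiceRecResults: number of results."""
    return len(results)


def get_result(results, index):
    """Analogue of GetVoiceRecResult: meaning and score at an index."""
    return results[index]


def get_results_by_meaning(results, substring):
    """Analogue of GetVoiceRecResultsByMeaning, using a 'contains' match."""
    return [r for r in results if substring in r[0]]


def get_results_by_score(results, minimum):
    """Analogue of GetVoiceRecResultsByScore, using an 'at least' match."""
    return [r for r in results if r[1] >= minimum]


def xml_query(xml_text, path):
    """Analogue of XmlQuery, limited to ElementTree's XPath subset."""
    return ET.fromstring(xml_text.strip()).findall(path)
```

With the sample payload, get_results_by_score(load_results(SAMPLE_XML), 60) keeps the two results whose assumed scores are 82 and 64.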

Two action parameters are unique to VoiceRecognition audio streaming: VoiceBargeIn and CancelOnDigit. If VoiceBargeIn is set to true, voice detected on the connection stops any specified prompts from playing, while voice recognition continues until completion. If CancelOnDigit is set to true, a digit press causes the voice recognition command to process the audio received up to that point and exit with VoiceRecognition_Complete. The Meaning and Score event parameters are valid to use in this case.
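The interaction of these two flags can be sketched as a small event loop. This is an assumed model written in Python for illustration, not the media engine's implementation: the event names "voice" and "digit" and the simulate function are hypothetical, and the sketch further assumes that ending the operation also ends any prompt still playing.

```python
def simulate(events, voice_barge_in, cancel_on_digit):
    """Walk a list of 'voice' / 'digit' events and report the final state.

    Returns (prompt_playing, recognition_active). Hypothetical model of
    the VoiceBargeIn and CancelOnDigit flags described above.
    """
    prompt_playing = True
    recognition_active = True
    for event in events:
        if event == "voice" and voice_barge_in:
            # Speech stops the prompt, but recognition keeps running.
            prompt_playing = False
        elif event == "digit" and cancel_on_digit:
            # A digit ends recognition using the audio heard so far; the
            # action would then report VoiceRecognition_Complete.
            # Assumption: the prompt ends together with the operation.
            prompt_playing = False
            recognition_active = False
            break
    return prompt_playing, recognition_active
```

For example, simulate(["voice"], True, False) leaves recognition active with the prompt stopped, while simulate(["digit"], False, True) ends both.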

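The termination condition parameters on the action (TermCondMaxTime, TermCondSilence, and TermCondNonSilence) together define a matrix of reasons for the action to stop successfully. Meeting any one of them results in the VoiceRecognition_Complete event.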
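One way to picture this matrix is as a set of optional thresholds, any one of which ends the operation. The Python sketch below is a minimal model under stated assumptions: the function name, the order in which conditions are checked, and the reason strings other than "silence" are hypothetical, and all values are in milliseconds as in the parameter descriptions.

```python
def termination_reason(elapsed_ms, silence_ms, non_silence_ms,
                       max_time=None, silence=None, non_silence=None):
    """Return the name of the first satisfied condition, or None.

    max_time, silence, and non_silence model TermCondMaxTime,
    TermCondSilence, and TermCondNonSilence; None means "not set".
    The check order is an assumption made for this sketch.
    """
    checks = [
        ("maxtime", elapsed_ms, max_time),
        ("silence", silence_ms, silence),
        ("nonsilence", non_silence_ms, non_silence),
    ]
    for name, observed, limit in checks:
        if limit is not None and observed >= limit:
            return name
    return None
```

A caller that sets only max_time=5000 and silence=2000 would see the operation stop on whichever threshold is reached first.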

Remarks

The following properties cover most allowable audio files that can be played by the media engine: sample rates of 6, 8, and 11 kHz; sample sizes of 4, 8, and 16 bits; and encoding types of ulaw, alaw, pcm, and adpcm. Only mono vox and wav files are allowed.
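A pre-flight check of a prompt file against these properties might look like the following Python sketch. It is illustrative only: the function name and its arguments are hypothetical, the caller is assumed to already know the file's properties, and since the list above covers only most allowable files, a reported problem does not prove the media engine would reject the file.

```python
import os

ALLOWED_RATES_KHZ = {6, 8, 11}
ALLOWED_SIZES_BITS = {4, 8, 16}
ALLOWED_ENCODINGS = {"ulaw", "alaw", "pcm", "adpcm"}
ALLOWED_EXTENSIONS = {".vox", ".wav"}


def audio_file_problems(filename, rate_khz, size_bits, encoding, channels=1):
    """Return the reasons a file falls outside the listed properties."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("file must be vox or wav")
    if channels != 1:
        problems.append("file must be mono")
    if rate_khz not in ALLOWED_RATES_KHZ:
        problems.append("sample rate must be 6, 8, or 11 kHz")
    if size_bits not in ALLOWED_SIZES_BITS:
        problems.append("sample size must be 4, 8, or 16 bits")
    if encoding.lower() not in ALLOWED_ENCODINGS:
        problems.append("encoding must be ulaw, alaw, pcm, or adpcm")
    return problems
```

An empty list means the file matches every listed property.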

A VoiceRecognition on a connection or a conference uses a speech resource until the action results in the VoiceRecognition_Complete event. Prompts use this same speech resource rather than an additional voice resource.

Summary of changes made in Cisco Unified Application Environment 2.4(3):

TTS Support

TTS string support in the Prompt1, Prompt2, and Prompt3 fields.

Increased Number of Simultaneous Grammars

One can specify a string[] in the Grammar1, Grammar2, and Grammar3 fields. In effect, any number of grammars (within limits of Nuance OSR) can be specified in any or all of these fields.

More Mechanisms for Provisioning Grammars

One can associate a grammar with a Cisco Unified Application Designer-built application, which will make it automatically HTTP-accessible and therefore accessible by Nuance OSR. Also, one can create a grammar within an application and save the grammar to a file in an HTTP-accessible location on the Cisco Unified Application Server.

Multiple Recognition Results

All scores and meanings returned by Nuance OSR are propagated back in the VR_XMLResult event parameter of the VoiceRecognition_Complete event (not just the highest score and corresponding meaning, as before).
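Because the Grammar1, Grammar2, and Grammar3 fields each accept either a string or a string[], the full set of grammars in play is the concatenation of whatever each field holds. The Python sketch below illustrates that idea only. The function name is hypothetical, and how the action combines the fields internally is not described here.

```python
def flatten_fields(*fields):
    """Combine fields that may each be a string, a list of strings, or unset."""
    flat = []
    for field in fields:
        if field is None or field == "":
            continue  # field left unset
        if isinstance(field, str):
            flat.append(field)
        else:
            flat.extend(field)
    return flat
```

The same shape applies to Prompt1, Prompt2, and Prompt3, which also accept a string or a string[].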

Action Parameters
Parameter Name | .NET Type | Default | Description
TermCondNonSilence | System.UInt32 | | The amount of non-silence (in milliseconds) to observe before terminating the voice recognition operation. If this condition is met, the VoiceRecognition command will result in the VoiceRecognition_Complete event.
Grammar3 | System.String | | A string or string[]. These files define the grammar rules to use when interpreting the voice input on the connection. For each specified grammar file, there are three potential formats, as specified in Grammar1.
Grammars | System.String[] | | Grammars
VoiceBargeIn | System.Boolean | | Indicates whether the occurrence of voice on the connection should abort any specified prompts.
CancelOnDigit | System.Boolean | | Indicates whether to stop the action successfully when a digit is entered, returning the VoiceRecognition_Complete event.
CommandTimeout | System.UInt32 | | Indicates a command timeout value (in milliseconds).
Volume | System.Int32 | | The amount by which to modify the volume (in decibels) of audio playback. Valid values range from -10 to 10.
Speed | System.Int32 | | The amount by which to modify the speed of audio playback. Valid values range from -10 to 10.
State | System.String | | Optional user state information, which is guaranteed to be present as the State event parameter in VoiceRecognition_Complete or VoiceRecognition_Failed.
AudioFileSampleRate | System.UInt32 | | The sample rate of the audio file (in kHz). Valid values are 6, 8, or 11. 11 should be avoided as it has a higher impact on the media engine. If not specified, the media engine configuration file defines the sample rate to use, which by default is 8.
AudioFileSampleSize | System.UInt32 | | The sample size used in the audio file (in bits). Valid values are 4, 8, or 16. 4 and 16 should be avoided as each has a higher impact on the media engine.
AudioFileEncoding | System.String | | The encoding of the audio file: ulaw, alaw, pcm, or adpcm. pcm and adpcm should be avoided as each has a higher impact on the media engine. If not specified, the media engine configuration file defines the file encoding to use, which by default is ulaw.
Prompt1 | System.String | | A prompt field can be either an audio file name or a free-form string, which will be converted to speech by TTS. It can be specified as a string or string[] of prompts.
Prompt2 | System.String | | A prompt field can be either an audio file name or a free-form string, which will be converted to speech by TTS. It can be specified as a string or string[] of prompts.
Prompt3 | System.String | | A prompt field can be either an audio file name or a free-form string, which will be converted to speech by TTS. It can be specified as a string or string[] of prompts.
Prompts | System.String[] | | Prompts
Grammar1 * | System.String | | A string or string[]. These files define the grammar rules to use when interpreting the voice input on the connection. For each specified grammar file, there are three potential formats:

Static

If you specify just the grammar file name in the Cisco Unified Application Designer, such as grammar1.grxml, and that grammar file name exists as a Voice Recognition resource in the project, then the VoiceRecognition action will internally convert that grammar1.grxml file name to a URI that references the location on the Application Server at which the grammar file can be accessed with HTTP. That URI is ultimately sent to Nuance OSR, which will in turn fetch that grammar file.

Dynamic

Dynamic grammar files are grammar files created by the script while the script is running. Typically one would create the file using SaveDynamicGrammar, which will return the HTTP URI for that file. The variable containing this HTTP URI would then be supplied as the grammar file name.

Builtin

Some grammars are provisioned with Nuance OSR, and it is also possible to provision grammars on Nuance OSR outside of the context of a Cisco Unified Application Environment application. To specify either type of 'builtin' grammar, just specify the grammar file name, such as grammar1.grxml. However, there should be no grammar file of the same name associated with your project.

Grammar2 | System.String | | A string or string[]. These files define the grammar rules to use when interpreting the voice input on the connection. For each specified grammar file, there are three potential formats, as specified in Grammar1.
Timeout | System.Int32 | | The Timeout property specifies to the Application Runtime Environment how long to wait for a response from the provider for the current action. If no response arrives within that time, the ReturnValue is Timeout. The value must be a literal value in milliseconds.
ConnectionId * | System.String | | The connection to perform the VoiceRecognition on.
TermCondMaxTime | System.UInt32 | | The amount of time (in milliseconds) that can elapse before terminating the voice recognition operation. If this condition is met, the VoiceRecognition command will result in the VoiceRecognition_Complete event.
TermCondSilence | System.UInt32 | | The amount of silence (in milliseconds) to observe before terminating the voice recognition operation. If this condition is met, the VoiceRecognition command will result in the VoiceRecognition_Complete event with a TerminationCondition of silence.
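The three grammar formats described under Grammar1 (static, dynamic, and builtin) amount to a small decision on the supplied name. The Python sketch below is a guess at that decision for illustration, not the action's actual logic. The function name, the project_resources set, and the base_url argument are hypothetical, and the real URI layout used by the Application Server is not described here.

```python
def resolve_grammar(name, project_resources, base_url):
    """Classify a grammar name and return (kind, value sent to the recognizer).

    project_resources: names of Voice Recognition resources in the project.
    base_url: hypothetical HTTP location where project grammars are served.
    """
    if name.lower().startswith(("http://", "https://")):
        # Dynamic: the script already holds the HTTP URI of the saved file.
        return "dynamic", name
    if name in project_resources:
        # Static: a project resource, rewritten to an HTTP-accessible URI.
        return "static", base_url.rstrip("/") + "/" + name
    # Builtin: no matching project resource, so the name is passed through.
    return "builtin", name
```

Note how a builtin name only resolves as builtin when no project resource shares it, which matches the caution given for the Builtin format.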
Result Data
Parameter Name | .NET Type | Description
ConnectionId | System.String | The value of the ConnectionId result data is the same as that specified as an action parameter. This ConnectionId is what one would later specify in StopMediaOperation if one were to abort the command programmatically.
OperationId | System.String | A unique identifier for this VoiceRecognition operation. This identifier can later be used by the StopMediaOperation action to stop just this particular operation on a connection, even if multiple media operations are concurrently executing on that connection.
ResultCode | System.String | A numeric code indicating the result status of the operation. A '0' indicates success; a positive number indicates an error. Please reference the Media Control Error Codes table for descriptions of specific error codes.
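Since ResultCode arrives as a string holding a numeric code, a caller would typically convert it before branching. A minimal Python sketch, assuming only the rule stated above ('0' is success, a positive number is an error), follows. The function name and the "unknown" label for any other value are hypothetical.

```python
def classify_result_code(result_code):
    """Map a ResultCode string to 'success', 'error', or 'unknown'."""
    try:
        code = int(result_code)
    except (TypeError, ValueError):
        return "unknown"  # not a numeric string
    if code == 0:
        return "success"
    if code > 0:
        return "error"  # look up the code in the Media Control Error Codes table
    return "unknown"  # negative values are not described in this document
```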

Branch Conditions 

Success

No description.

Failure

No description.

Timeout

No description.