Provider
Asynchronous
No Custom Parameters
Applies 1 or more speech grammars with a Nuance OSR server to the specified connection in order to detect phrases being spoken in real-time.
The VoiceRecognition action allows one to detect pre-defined phrases spoken on a connection. One can also specify TTS strings or audio files to be played to the connection in conjunction with the VoiceRecognition command.
Once VoiceRecognition has successfully finished with VoiceRecognition_Complete, one can either extract all result data returned from Nuance OSR, or just extract the top-matched meaning and confidence score from the Meaning and Score event parameters. To help parse through the full result data (not just the top-matched meaning and score), a number of helper actions exist:
GetNumVoiceRecResults: Returns the number of matching results.
GetVoiceRecResult: Returns the score and meaning of a result at a specified index.
GetVoiceRecResultsByMeaning: Returns any results matching the specified string matching criteria for meaning.
GetVoiceRecResultsByScore: Returns any results matching the specified criteria for scores.
XmlQuery: Allows one to specify user-defined XPath expressions to allow custom parsing of the results.
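To make the roles of these helper actions concrete, the following Python sketch performs the equivalent lookups over an in-memory list of results. Everything in it is hypothetical: the `VoiceRecResult` type, the function names, the substring match used for meaning, and the minimum-score criterion are illustrative stand-ins, not the helper actions themselves or their actual matching rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoiceRecResult:
    """Hypothetical container for one recognition result."""
    meaning: str
    score: int


def num_results(results: List[VoiceRecResult]) -> int:
    # Counterpart of GetNumVoiceRecResults: how many results came back.
    return len(results)


def result_at(results: List[VoiceRecResult], index: int) -> VoiceRecResult:
    # Counterpart of GetVoiceRecResult: meaning and score at one index.
    return results[index]


def results_by_meaning(results: List[VoiceRecResult], text: str) -> List[VoiceRecResult]:
    # Assumed criterion: case-insensitive substring match on the meaning.
    needle = text.lower()
    return [r for r in results if needle in r.meaning.lower()]


def results_by_score(results: List[VoiceRecResult], minimum: int) -> List[VoiceRecResult]:
    # Assumed criterion: keep results scoring at or above a minimum.
    return [r for r in results if r.score >= minimum]


def top_result(results: List[VoiceRecResult]) -> VoiceRecResult:
    # The top match, analogous to the Meaning and Score event parameters.
    return max(results, key=lambda r: r.score)
```

In an application, the results would come from the VoiceRecognition_Complete event rather than from a hand-built list.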
Two action parameters are unique to VoiceRecognition audio streaming: VoiceBargeIn and CancelOnDigit. VoiceBargeIn, if set to true, causes any specified prompts to stop playing when voice is detected on the connection, although voice recognition continues until completion. CancelOnDigit, if set to true, causes the voice recognition command to process the audio received up to the digit press and exit with VoiceRecognition_Complete. The Meaning and Score event parameters are valid to use in this case.
The termination condition parameters on the action (TermCondMaxTime, TermCondSilence, and TermCondNonSilence) together define a matrix of reasons for the action to stop successfully.
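One way to picture this matrix: each termination condition that is set acts as an independent stop reason, and the action completes as soon as any one of them is satisfied. The Python sketch below models that reading only. The `first_met_condition` function, its observation arguments, the returned labels, and the order in which conditions are checked are all hypothetical; how the media engine measures silence or resolves simultaneous conditions is not described here.

```python
from typing import Optional


def first_met_condition(
    elapsed_ms: int,
    silence_ms: int,
    non_silence_ms: int,
    digit_received: bool,
    term_cond_max_time: Optional[int] = None,
    term_cond_silence: Optional[int] = None,
    term_cond_non_silence: Optional[int] = None,
    cancel_on_digit: bool = False,
) -> Optional[str]:
    """Return a label for the first satisfied condition, or None to keep going.

    A condition left as None is treated as not set. The check order
    below is an arbitrary choice made for this sketch.
    """
    checks = [
        ("digit", cancel_on_digit and digit_received),
        ("maxtime", term_cond_max_time is not None
            and elapsed_ms >= term_cond_max_time),
        ("silence", term_cond_silence is not None
            and silence_ms >= term_cond_silence),
        ("nonsilence", term_cond_non_silence is not None
            and non_silence_ms >= term_cond_non_silence),
    ]
    for label, met in checks:
        if met:
            return label
    return None
```

For example, with TermCondMaxTime at 5000 and TermCondSilence at 3000, three seconds of observed silence would end the operation before the maximum time is reached.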
The following properties cover most allowable audio files that can be played by the media engine: sample rates of 6, 8, and 11 kHz; sample sizes of 4, 8, and 16 bits; and encoding types of ulaw, alaw, pcm, and adpcm. Only mono vox and wav files are allowed.
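These constraints can be checked before a prompt file is handed to the action. The sketch below is a minimal pre-flight check in Python, assuming the caller already knows the file's properties. The function name and its arguments are hypothetical, and because the text above says these values cover "most" allowable files, passing this check does not guarantee that the media engine accepts a particular combination.

```python
import os
from typing import List

ALLOWED_SAMPLE_RATES_KHZ = {6, 8, 11}
ALLOWED_SAMPLE_SIZES_BITS = {4, 8, 16}
ALLOWED_ENCODINGS = {"ulaw", "alaw", "pcm", "adpcm"}
ALLOWED_EXTENSIONS = {".vox", ".wav"}


def audio_file_problems(
    filename: str,
    sample_rate_khz: int,
    sample_size_bits: int,
    encoding: str,
    channels: int = 1,
) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if sample_rate_khz not in ALLOWED_SAMPLE_RATES_KHZ:
        problems.append(f"unsupported sample rate: {sample_rate_khz} kHz")
    if sample_size_bits not in ALLOWED_SAMPLE_SIZES_BITS:
        problems.append(f"unsupported sample size: {sample_size_bits} bits")
    if encoding.lower() not in ALLOWED_ENCODINGS:
        problems.append(f"unsupported encoding: {encoding}")
    if channels != 1:
        problems.append("only mono files are allowed")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file at once.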
A VoiceRecognition on a connection or a conference occupies a speech resource until the action results in the VoiceRecognition_Complete event. Any prompts use this same speech resource rather than an additional voice resource.
Summary of changes made in Cisco Unified Application Environment 2.4(3):
TTS string support in the Prompt1, Prompt2, and Prompt3 fields.
One can specify a string[] in the Grammar1, Grammar2, and Grammar3 fields. In effect, any number of grammars (within limits of Nuance OSR) can be specified in any or all of these fields.
One can associate a grammar with a Cisco Unified Application Designer-built application, which will make it automatically HTTP-accessible and therefore accessible by Nuance OSR. Also, one can create a grammar within an application and save the grammar to a file in an HTTP-accessible location on the Cisco Unified Application Server.
All scores and meanings returned by Nuance OSR are propagated back in the VoiceRecognition_Complete event in the VR_XMLResult event parameter (not just the highest score and corresponding meaning as before).
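Because each of Grammar1, Grammar2, and Grammar3 may hold either a single string or a string[], the effective grammar set is the concatenation of whatever each field contains. The Python sketch below illustrates that flattening. The `flatten_grammars` function and the file names in the usage example are hypothetical, and any limit Nuance OSR places on the total number of grammars is not modeled.

```python
from typing import Iterable, List, Optional, Union

GrammarField = Optional[Union[str, Iterable[str]]]


def flatten_grammars(*fields: GrammarField) -> List[str]:
    """Combine string-or-list grammar fields into one ordered list."""
    grammars: List[str] = []
    for field in fields:
        if field is None:
            # An unset field contributes nothing.
            continue
        if isinstance(field, str):
            # A plain string is a single grammar.
            grammars.append(field)
        else:
            # Anything else is treated as a sequence of grammars.
            grammars.extend(field)
    return grammars
```

For instance, `flatten_grammars("menu.grxml", ["yes_no.grxml", "digits.grxml"], None)` yields a three-element list in field order.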
Action Parameters
| Parameter Name | .NET Type | Default | Description |
|---|---|---|---|
| TermCondNonSilence | System.UInt32 | | The amount of non-silence (in milliseconds) to observe before terminating the voice recognition operation. If this condition is met, the VoiceRecognition command will result in the VoiceRecognition_Complete event. |
| Grammar3 | System.String | | A string or string[]. These files define the grammar rules to use when interpreting the voice input on the connection. For each specified grammar file, there are three potential formats, as specified in Grammar1. |
| Grammars | System.String[] | | Grammars |
| VoiceBargeIn | System.Boolean | | Indicates whether the occurrence of voice on the connection should abort any specified prompts. |
| CancelOnDigit | System.Boolean | | Indicates whether to stop the action successfully when a digit is entered, returning the VoiceRecognition_Complete event. |
| CommandTimeout | System.UInt32 | | Indicates a command timeout value (in milliseconds). |
| Volume | System.Int32 | | The amount by which to modify the volume (in decibels) of audio playback. Valid values range from -10 to 10. |
| Speed | System.Int32 | | The amount by which to modify the speed of audio playback. Valid values range from -10 to 10. |
| State | System.String | | Optional user state information which is guaranteed to be present as the State event parameter in VoiceRecognition_Complete or VoiceRecognition_Failed. |
| AudioFileSampleRate | System.UInt32 | | The sample rate of the audio file (in kHz). Valid values are 6, 8, or 11. 11 should be avoided as it has a higher impact on the media engine. If not specified, the media engine configuration file defines the sample rate to use, which by default is 8. |
| AudioFileSampleSize | System.UInt32 | | The sample size used in the audio file (in bits). Valid values are 4, 8, or 16. 4 and 16 should be avoided as each has a higher impact on the media engine. |
| AudioFileEncoding | System.String | | The encoding of the audio file: ulaw, alaw, pcm, or adpcm. pcm and adpcm should be avoided as each has a higher impact on the media engine. If not specified, the media engine configuration file defines the file encoding to use, which by default is ulaw. |
| Prompt1 | System.String | | A prompt field can be either an audio file name or a free-form string which will be converted to speech by text-to-speech. It can be specified as a string or string[] of prompts. |
| Prompt2 | System.String | | A prompt field can be either an audio file name or a free-form string which will be converted to speech by text-to-speech. It can be specified as a string or string[] of prompts. |
| Prompt3 | System.String | | A prompt field can be either an audio file name or a free-form string which will be converted to speech by text-to-speech. It can be specified as a string or string[] of prompts. |
| Prompts | System.String[] | | Prompts |
| Grammar1 * | System.String | | A string or string[]. These files define the grammar rules to use when interpreting the voice input on the connection. For each specified grammar file, there are three potential formats. |
| Grammar2 | System.String | | A string or string[]. These files define the grammar rules to use when interpreting the voice input on the connection. For each specified grammar file, there are three potential formats, as specified in Grammar1. |
| Timeout | System.Int32 | | The Timeout property specifies to the Application Runtime Environment how long to wait for a response from the provider for the current action. If no response arrives within that time, the ReturnValue returned is Timeout. The value must be a literal value in milliseconds. |
| ConnectionId * | System.String | | The connection to perform the VoiceRecognition on. |
| TermCondMaxTime | System.UInt32 | | The amount of time (in milliseconds) that can elapse before terminating the voice recognition operation. If this condition is met, the VoiceRecognition command will result in the VoiceRecognition_Complete event. |
| TermCondSilence | System.UInt32 | | The amount of silence (in milliseconds) to observe before terminating the voice recognition operation. If this condition is met, the VoiceRecognition command will result in the VoiceRecognition_Complete event with a TerminationCondition of silence. |
Result Data
| Parameter Name | .NET Type | Description |
|---|---|---|
| ConnectionId | System.String | The value of the ConnectionId result data is the same as that specified as an action parameter. This ConnectionId is what one would later specify in StopMediaOperation if one were to abort the command programmatically. |
| OperationId | System.String | A unique identifier for this VoiceRecognition operation. This identifier can later be used by the StopMediaOperation action to stop just this particular operation on a connection, even if multiple media operations are concurrently executing on that connection. |
| ResultCode | System.String | A numeric code indicating the result status of the operation. A '0' indicates success; a positive number indicates an error. Please reference the Media Control Error Codes table for descriptions on specific error codes. |
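
Since ResultCode arrives as a string holding a numeric code, a caller typically converts it before branching. The Python sketch below shows one way to do that under the rule stated in the table ('0' is success, a positive number is an error). The `classify_result_code` function is hypothetical, and because negative values are not described in this document the sketch rejects them rather than guessing their meaning.

```python
def classify_result_code(result_code: str) -> str:
    """Map a ResultCode string to 'success' or 'error'.

    Raises ValueError for non-numeric or negative input, since the
    documented values are 0 and positive numbers only.
    """
    code = int(result_code.strip())  # int() raises ValueError if non-numeric
    if code == 0:
        return "success"
    if code > 0:
        # Specific meanings are listed in the Media Control Error Codes table.
        return "error"
    raise ValueError(f"undocumented negative ResultCode: {code}")
```

Treating undocumented values as an exception keeps unexpected codes from being silently handled as either success or failure.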
Branch Conditions
No description.
No description.
No description.