Speech To Text

The Speech To Text API performs speech recognition, converting spoken words in audio files into written text. Internally, it uses the OpenAI Whisper model.

Sample Scene

The best way to get started with Speech To Text is to open the [Demo] Speech To Text scene in the Assets/AiToolbox/Samples/Runtime Usage folder.

The [Demo] Speech To Text scene

Select the Microphone input device and click the Record button to start recording. Once you’re done, click the Stop button to stop the recording.

To transcribe the recording, press the Transcribe button. The generated text will be displayed in the Transcription field.

The settings for this scene, such as the API key and the voice option, can be found on the Speech To Text Game Object in the Hierarchy panel. Select it and find the Speech To Text Demo Component in the Inspector panel.

The [Demo] Speech To Text scene’s settings (Speech To Text Demo Component)

Quick Start

To use Speech To Text in code instead of the component above, you can use the static methods of the SpeechToText class.

Request Method

This method is used to request speech-to-text conversion of the input audio clip.

public static Action Request(
    AudioClip audio,
    SpeechToTextParameters parameters,
    Action<SpeechToTextResponse> completeCallback,
    Action<long, string> failureCallback
)

Parameters:

audio: The input audio clip to convert to text.
parameters: The parameters for the speech-to-text request.
completeCallback: The callback to be called when the request is completed.
failureCallback: The callback to be called when the request fails.
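
The call above can be sketched as a small MonoBehaviour. This is a minimal sketch, not the package's reference usage: it assumes the classes live in an AiToolbox namespace and that SpeechToTextParameters can be assigned in the Inspector, neither of which is confirmed on this page. The two failureCallback arguments are assumed to be an error code and an error message. Because the members of SpeechToTextResponse are not listed here, the example only logs the response object as-is. SpeechToTextExample is a hypothetical name.

```csharp
using AiToolbox; // Assumption: the namespace is not stated on this page.
using UnityEngine;

// Hypothetical example component; not part of AI Toolbox.
public class SpeechToTextExample : MonoBehaviour {
    // The audio to transcribe, assigned in the Inspector.
    [SerializeField] AudioClip clip;

    // Assumption: SpeechToTextParameters is serializable and holds
    // the request settings (for example, the API key).
    [SerializeField] SpeechToTextParameters parameters;

    void Start() {
        // Request also returns an Action, which this sketch ignores
        // because its purpose is not described in this section.
        SpeechToText.Request(clip, parameters, OnComplete, OnFailure);
    }

    void OnComplete(SpeechToTextResponse response) {
        // Logs the response object itself; the member that holds the
        // transcribed text is not documented in this section.
        Debug.Log($"Speech To Text request completed: {response}");
    }

    void OnFailure(long errorCode, string errorMessage) {
        // Assumption: the arguments are an error code and a message.
        Debug.LogError($"Speech To Text request failed ({errorCode}): {errorMessage}");
    }
}
```

Passing named methods rather than lambdas keeps the success and failure paths easy to find and to reuse for several requests.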

CancelAllRequests Method

This method is used to cancel all pending speech-to-text requests.

public static void CancelAllRequests()
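
One place where this call may be useful is when the object that started the requests is destroyed, so that callbacks are not left waiting on an object that no longer exists. The sketch below assumes the same AiToolbox namespace as an unconfirmed guess, and SpeechToTextCleanup is a hypothetical name.

```csharp
using AiToolbox; // Assumption: the namespace is not stated on this page.
using UnityEngine;

// Hypothetical example component; not part of AI Toolbox.
public class SpeechToTextCleanup : MonoBehaviour {
    void OnDestroy() {
        // Cancels every pending speech-to-text request, including
        // requests started by other components.
        SpeechToText.CancelAllRequests();
    }
}
```

Because the method cancels all pending requests rather than a single one, it fits best on an object that owns all Speech To Text usage in the scene.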

Pricing

The up-to-date speech-to-text pricing can be found in the pricing chart of the OpenAI documentation.

Having Issues?

If you have any questions or need help with the Speech To Text functionality in AI Toolbox, please contact us.