Chirp 3 Transcription: Enhanced multilingual accuracy

Chirp 3 is the latest generation of Google's multilingual, generative Automatic Speech Recognition (ASR) models, designed to meet user needs based on feedback and experience. Chirp 3 provides improved accuracy and speed over previous Chirp models, and adds speaker diarization and automatic language detection.

Model details

Chirp 3: Transcription is available exclusively in the Speech-to-Text API V2.

Model identifiers

You can use Chirp 3: Transcription just like any other model by specifying the appropriate model identifier in your recognition request when you use the API, or by selecting the model name in the Google Cloud console. The sketch after the following table shows how to set the identifier in a V2 recognition config.

Model Model identifier
Chirp 3 chirp_3
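
For example, the following minimal sketch selects Chirp 3 by its model identifier in a V2 RecognitionConfig. The language code and decoding settings here are placeholder choices; the full, runnable samples appear later on this page.

Python

from google.cloud.speech_v2.types import cloud_speech

# Minimal sketch: select Chirp 3 by its model identifier.
# The language code and decoding settings are placeholder choices.
config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="chirp_3",
)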

API methods

Not all recognition methods support the same set of languages. Because Chirp 3 is available in the Speech-to-Text API V2, it supports the following recognition methods:

API version API method Support
V2 Speech.StreamingRecognize (good for streaming and real-time audio) Supported
V2 Speech.Recognize (good for audio shorter than one minute) Supported
V2 Speech.BatchRecognize (good for long audio 1 minute to 1 hour) Supported

Regional availability

Chirp 3 is available in the following Google Cloud regions, with more planned:

Google Cloud Region Launch Readiness
us (multi-region) Preview
eu (multi-region) Preview
asia-southeast1 Preview
asia-northeast1 Preview

Using the locations API as explained here, you can find the latest list of supported Google Cloud regions, languages and locales, and features for each transcription model. A hedged query sketch follows.
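
The following is a sketch of one way to query it from Python, assuming the standard Cloud Locations listing pattern on the Speech-to-Text V2 endpoint; confirm the exact endpoint path and response fields against the locations API documentation linked above.

Python

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Hedged sketch: list the locations reported by Speech-to-Text V2 for your project,
# assuming the standard .../v2/projects/{project}/locations listing pattern.
credentials, project_id = google.auth.default()
session = AuthorizedSession(credentials)

response = session.get(
    f"https://speech.googleapis.com/v2/projects/{project_id}/locations"
)
response.raise_for_status()

for location in response.json().get("locations", []):
    print(location.get("locationId"))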

Language availability for transcription

Chirp 3 supports transcription in StreamingRecognize, Recognize, and BatchRecognize in the following languages:

Language BCP-47 Code Launch Readiness
Catalan (Spain) ca-ES GA
Chinese (Simplified, China) cmn-Hans-CN GA
Croatian (Croatia) hr-HR GA
Danish (Denmark) da-DK GA
Dutch (Netherlands) nl-NL GA
English (Australia) en-AU GA
English (United Kingdom) en-GB GA
English (India) en-IN GA
English (United States) en-US GA
Finnish (Finland) fi-FI GA
French (Canada) fr-CA GA
French (France) fr-FR GA
German (Germany) de-DE GA
Greek (Greece) el-GR GA
Hindi (India) hi-IN GA
Italian (Italy) it-IT GA
Japanese (Japan) ja-JP GA
Korean (Korea) ko-KR GA
Polish (Poland) pl-PL GA
Portuguese (Brazil) pt-BR GA
Portuguese (Portugal) pt-PT GA
Romanian (Romania) ro-RO GA
Russian (Russia) ru-RU GA
Spanish (Spain) es-ES GA
Spanish (United States) es-US GA
Swedish (Sweden) sv-SE GA
Turkish (Turkey) tr-TR GA
Ukrainian (Ukraine) uk-UA GA
Vietnamese (Vietnam) vi-VN GA
Arabic ar-XA Preview
Arabic (Algeria) ar-DZ Preview
Arabic (Bahrain) ar-BH Preview
Arabic (Egypt) ar-EG Preview
Arabic (Israel) ar-IL Preview
Arabic (Jordan) ar-JO Preview
Arabic (Kuwait) ar-KW Preview
Arabic (Lebanon) ar-LB Preview
Arabic (Mauritania) ar-MR Preview
Arabic (Morocco) ar-MA Preview
Arabic (Oman) ar-OM Preview
Arabic (Qatar) ar-QA Preview
Arabic (Saudi Arabia) ar-SA Preview
Arabic (State of Palestine) ar-PS Preview
Arabic (Syria) ar-SY Preview
Arabic (Tunisia) ar-TN Preview
Arabic (United Arab Emirates) ar-AE Preview
Arabic (Yemen) ar-YE Preview
Armenian (Armenia) hy-AM Preview
Bengali (Bangladesh) bn-BD Preview
Bengali (India) bn-IN Preview
Bulgarian (Bulgaria) bg-BG Preview
Burmese (Myanmar) my-MM Preview
Central Kurdish (Iraq) ar-IQ Preview
Chinese, Cantonese (Traditional Hong Kong) yue-Hant-HK Preview
Chinese, Mandarin (Traditional, Taiwan) cmn-Hant-TW Preview
Czech (Czech Republic) cs-CZ Preview
English (Philippines) en-PH Preview
Estonian (Estonia) et-EE Preview
Filipino (Philippines) fil-PH Preview
Gujarati (India) gu-IN Preview
Hebrew (Israel) iw-IL Preview
Hungarian (Hungary) hu-HU Preview
Indonesian (Indonesia) id-ID Preview
Kannada (India) kn-IN Preview
Khmer (Cambodia) km-KH Preview
Lao (Laos) lo-LA Preview
Latvian (Latvia) lv-LV Preview
Lithuanian (Lithuania) lt-LT Preview
Malay (Malaysia) ms-MY Preview
Malayalam (India) ml-IN Preview
Marathi (India) mr-IN Preview
Nepali (Nepal) ne-NP Preview
Norwegian (Norway) no-NO Preview
Persian (Iran) fa-IR Preview
Serbian (Serbia) sr-RS Preview
Slovak (Slovakia) sk-SK Preview
Slovenian (Slovenia) sl-SI Preview
Spanish (Mexico) es-MX Preview
Swahili sw Preview
Tamil (India) ta-IN Preview
Telugu (India) te-IN Preview
Thai (Thailand) th-TH Preview
Uzbek (Uzbekistan) uz-UZ Preview

Language availability for diarization

Chirp 3 supports transcription and diarization only in BatchRecognize and Recognize in the following languages:

Language BCP-47 Code
Chinese (Simplified, China) cmn-Hans-CN
German (Germany) de-DE
English (United Kingdom) en-GB
English (India) en-IN
English (United States) en-US
Spanish (Spain) es-ES
Spanish (United States) es-US
French (Canada) fr-CA
French (France) fr-FR
Hindi (India) hi-IN
Italian (Italy) it-IT
Japanese (Japan) ja-JP
Korean (Korea) ko-KR
Portuguese (Brazil) pt-BR

Feature support and limitations

Chirp 3 supports the following features:

Feature Description Launch Stage
Automatic punctuation Automatically generated by the model and can be optionally disabled. Preview
Automatic capitalization Automatically generated by the model and can be optionally disabled. Preview
Utterance-level Timestamps Automatically generated by the model. Preview
Speaker Diarization Automatically identify the different speakers in a single-channel audio sample. Preview
Speech adaptation (Biasing) Provide hints to the model in the form of phrases or words to improve recognition accuracy for specific terms or proper nouns. Preview
Language-agnostic audio transcription The model automatically infers the spoken language in your audio file and transcribes it in the most prevalent language. Preview

Chirp 3 doesn't support the following features:

Feature Description
Word-level Timestamps Automatically generated by the model and can be optionally enabled, with some transcription degradation expected.
Word-level confidence scores The API returns a value, but it isn't truly a confidence score.

Transcribe using Chirp 3

Discover how to use Chirp 3 for transcription tasks.

Perform streaming speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_streaming_chirp3(
   audio_file: str
) -> list[cloud_speech.StreamingRecognizeResponse]:
   """Transcribes audio from an audio file stream using the Chirp 3 model of the Google Cloud Speech-to-Text V2 API.

   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"

   Returns:
       list[cloud_speech.StreamingRecognizeResponse]: The streaming responses from the
       Speech-to-Text API V2 containing the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       content = f.read()

   # In practice, stream should be a generator yielding chunks of audio data
   chunk_length = len(content) // 5
   stream = [
       content[start : start + chunk_length]
       for start in range(0, len(content), chunk_length)
   ]
   audio_requests = (
       cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
   )

   recognition_config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )
   streaming_config = cloud_speech.StreamingRecognitionConfig(
       config=recognition_config
   )
   config_request = cloud_speech.StreamingRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       streaming_config=streaming_config,
   )

   def requests(config: cloud_speech.StreamingRecognizeRequest, audio: list):
       yield config
       yield from audio

   # Transcribes the audio into text
   responses_iterator = client.streaming_recognize(
       requests=requests(config_request, audio_requests)
   )
   responses = []
   for response in responses_iterator:
       responses.append(response)
       for result in response.results:
           print(f"Transcript: {result.alternatives[0].transcript}")

   return responses

Perform synchronous speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Perform batch speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_chirp3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of the Google Cloud Speech-to-Text V2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input audio file.
           E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.BatchRecognizeResults: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Transcribes the audio into text
   operation = client.batch_recognize(request=request)

   print("Waiting for operation to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response.results[audio_uri].transcript

Use Chirp 3 Features

Explore how you can use the latest features, with code examples:

Perform a language-agnostic transcription

Chirp 3 can automatically identify and transcribe the dominant language spoken in the audio, which is essential for multilingual applications. To achieve this, set language_codes=["auto"] as shown in the code example:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_auto_detect_language(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file and auto-detects the spoken language using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["auto"],  # Set language code to auto to detect language.
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

Perform a language-restricted transcription

Chirp 3 can automatically identify and transcribe the dominant language in an audio file. You can also restrict it to the specific locales that you expect, for example ["en-US", "fr-FR"], which focuses the model on the most probable languages for more reliable results, as demonstrated in the code example:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_restrict_languages(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file, restricting detection to an expected set of spoken locales, using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US", "fr-FR"],  # Set language codes of the expected spoken locales
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

Perform transcription and speaker diarization

Use Chirp 3 for transcription and diarization tasks.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_chirp3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input
         audio file. E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.BatchRecognizeResults: The response from the
         Speech-to-Text API containing the transcription results.
   """

   # Instantiates a client.
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],  # Use "auto" to detect language.
       model="chirp_3",
       features=cloud_speech.RecognitionFeatures(
           # Enable diarization by setting empty diarization configuration.
           diarization_config=cloud_speech.SpeakerDiarizationConfig(),
       ),
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Creates audio transcription job.
   operation = client.batch_recognize(request=request)

   print("Waiting for transcription job to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")
       print(f"Speakers per word: {result.alternatives[0].words}")

   return response.results[audio_uri].transcript

Improve accuracy with model adaptation

Chirp 3 can improve transcription accuracy for your specific audio using model adaptation. This lets you provide a list of specific words and phrases, increasing the likelihood that the model recognizes them. It's especially useful for domain-specific terms, proper nouns, or unique vocabulary.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_model_adaptation(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model with adaptation, improving accuracy for specific audio characteristics or vocabulary.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       # Use model adaptation
       adaptation=cloud_speech.SpeechAdaptation(
         phrase_sets=[
             cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                 inline_phrase_set=cloud_speech.PhraseSet(phrases=[
                   {
                       "value": "alphabet",
                   },
                   {
                         "value": "cell phone service",
                   }
                 ])
             )
         ]
       )
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Enable denoiser and SNR-filtering

Chirp 3 can enhance audio quality by reducing background noise and filtering out unwanted sounds before transcription. You can improve results from noisy environments by enabling the built-in denoiser and Signal-to-Noise Ratio (SNR) filtering.

Setting denoise_audio=true can effectively help you reduce background music or noises like rain and street traffic.

You can set snr_threshold=X to control the minimum loudness of speech required for transcription. This helps to filter out non-speech audio or background noise, preventing unwanted text in your results. A higher snr_threshold means the user needs to speak louder for the model to transcribe the utterances.

SNR-filtering can be utilized in real-time streaming use cases to avoid sending unnecessary sounds to a model for transcription. A higher value for this setting means that your speech volume must be louder relative to the background noise to be sent to the transcription model.

The snr_threshold setting interacts with whether denoise_audio is true or false. When denoise_audio=true, background noise is removed and speech becomes relatively clearer, so the overall SNR of the audio goes up.

If your use case involves only the user's voice without others speaking, set denoise_audio=true to increase the sensitivity of SNR-filtering, which can filter out non-speech noise. If your use case involves people speaking in the background and you want to avoid transcribing background speech, consider setting denoise_audio=false and lowering the SNR threshold.

The following are recommended SNR threshold values. A reasonable snr_threshold value can be set between 0 and 1000; a value of 0 means don't filter anything, and 1000 means filter everything. Fine-tune the value if the recommended settings don't work for you.

Denoise audio SNR threshold Speech sensitivity
true 10.0 high
true 20.0 medium
true 40.0 low
true 100.0 very low
false 0.5 high
false 1.0 medium
false 2.0 low
false 5.0 very low

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_with_denoiser(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of the Google Cloud Speech-to-Text V2 API, with the built-in denoiser and SNR filtering enabled.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       denoiser_config={
           "denoise_audio": True,
           # Medium SNR threshold
           "snr_threshold": 20.0,
       },
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Use Chirp 3 in the Google Cloud console

  1. Sign up for a Google Cloud account, and create a project.
  2. Go to Speech in the Google Cloud console.
  3. If the API isn't enabled, enable the API.
  4. Make sure that you have an STT console Workspace. If you don't have a workspace, you must create a workspace.

    1. Go to the transcriptions page, and click New Transcription.

    2. Open the Workspace drop-down and click New Workspace to create a workspace for transcription.

    3. From the Create a new workspace navigation sidebar, click Browse.

    4. Click to create a new bucket.

    5. Enter a name for your bucket and click Continue.

    6. Click Create to create your Cloud Storage bucket.

    7. After the bucket is created, click Select to select your bucket for use.

    8. Click Create to finish creating your workspace for the Speech-to-Text API V2 console.

  5. Perform a transcription on your actual audio.

    Screenshot of the Speech-to-text transcription creation page, showing file selection or upload.

    From the New Transcription page, select your audio file through either upload (Local upload) or specifying an existing Cloud Storage file (Cloud storage).

  6. Click Continue to move to the Transcription options.

    1. Select the Spoken language that you plan to use for recognition with Chirp from your previously created recognizer.

    2. In the model drop-down, select chirp_3.

    3. In the Recognizer drop-down, select your newly created recognizer.

    4. Click Submit to run your first recognition request using chirp_3.

  7. View your Chirp 3 transcription result.

    1. From the Transcriptions page, click the name of the transcription to view its result.

     2. In the Transcription details page, view your transcription result, and optionally play back the audio in the browser.

What's next