Chirp 3 Transcription: Enhanced multilingual accuracy

Chirp 3 is the latest generation of Google's multilingual, generative Automatic Speech Recognition (ASR) models, designed to meet user needs based on feedback and experience. Chirp 3 provides improved accuracy and speed over previous Chirp models, and adds speaker diarization and automatic language detection.

Model details

Chirp 3: Transcription is available exclusively in the Speech-to-Text API V2.

Model identifiers

You can use Chirp 3: Transcription just like any other model by specifying the appropriate model identifier in your recognition request when you use the API, or by selecting the model name in the Google Cloud console. The sketch after the following table shows how to set the identifier in a V2 recognition config.

Model Model identifier
Chirp 3 chirp_3
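
For example, the following minimal sketch selects Chirp 3 by its model identifier in a V2 RecognitionConfig. The language code and decoding settings here are placeholder choices; the full, runnable samples appear later on this page.

Python

from google.cloud.speech_v2.types import cloud_speech

# Minimal sketch: select Chirp 3 by its model identifier.
# The language code and decoding settings are placeholder choices.
config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="chirp_3",
)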

API methods

Not all recognition methods support the same set of languages. Because Chirp 3 is available in the Speech-to-Text API V2, it supports the following recognition methods:

API version API method Support
V2 Speech.StreamingRecognize (good for streaming and real-time audio) Supported
V2 Speech.Recognize (good for audio shorter than one minute) Supported
V2 Speech.BatchRecognize (good for long audio 1 minute to 1 hour) Supported

Regional availability

Chirp 3 is available in the following Google Cloud regions, with more planned:

Google Cloud Region Launch Readiness
us (multi-region) Preview
eu (multi-region) Preview
asia-southeast1 Preview
asia-northeast1 Preview

Using the locations API as explained here, you can find the latest list of supported Google Cloud regions, languages and locales, and features for each transcription model. A hedged query sketch follows.
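
The following is a sketch of one way to query it from Python, assuming the standard Cloud Locations listing pattern on the Speech-to-Text V2 endpoint; confirm the exact endpoint path and response fields against the locations API documentation linked above.

Python

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Hedged sketch: list the locations reported by Speech-to-Text V2 for your project,
# assuming the standard .../v2/projects/{project}/locations listing pattern.
credentials, project_id = google.auth.default()
session = AuthorizedSession(credentials)

response = session.get(
    f"https://speech.googleapis.com/v2/projects/{project_id}/locations"
)
response.raise_for_status()

for location in response.json().get("locations", []):
    print(location.get("locationId"))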

Language availability for transcription

Chirp 3 supports transcription in StreamingRecognize, Recognize, and BatchRecognize in the following languages:

Language BCP-47 Code Launch Readiness
Catalan (Spain) ca-ES GA
Chinese (Simplified, China) cmn-Hans-CN GA
Croatian (Croatia) hr-HR GA
Danish (Denmark) da-DK GA
Dutch (Netherlands) nl-NL GA
English (Australia) en-AU GA
English (United Kingdom) en-GB GA
English (India) en-IN GA
English (United States) en-US GA
Finnish (Finland) fi-FI GA
French (Canada) fr-CA GA
French (France) fr-FR GA
German (Germany) de-DE GA
Greek (Greece) el-GR GA
Hindi (India) hi-IN GA
Italian (Italy) it-IT GA
Japanese (Japan) ja-JP GA
Korean (Korea) ko-KR GA
Polish (Poland) pl-PL GA
Portuguese (Brazil) pt-BR GA
Portuguese (Portugal) pt-PT GA
Romanian (Romania) ro-RO GA
Russian (Russia) ru-RU GA
Spanish (Spain) es-ES GA
Spanish (United States) es-US GA
Swedish (Sweden) sv-SE GA
Turkish (Turkey) tr-TR GA
Ukrainian (Ukraine) uk-UA GA
Vietnamese (Vietnam) vi-VN GA
Arabic ar-XA Preview
Arabic (Algeria) ar-DZ Preview
Arabic (Bahrain) ar-BH Preview
Arabic (Egypt) ar-EG Preview
Arabic (Israel) ar-IL Preview
Arabic (Jordan) ar-JO Preview
Arabic (Kuwait) ar-KW Preview
Arabic (Lebanon) ar-LB Preview
Arabic (Mauritania) ar-MR Preview
Arabic (Morocco) ar-MA Preview
Arabic (Oman) ar-OM Preview
Arabic (Qatar) ar-QA Preview
Arabic (Saudi Arabia) ar-SA Preview
Arabic (State of Palestine) ar-PS Preview
Arabic (Syria) ar-SY Preview
Arabic (Tunisia) ar-TN Preview
Arabic (United Arab Emirates) ar-AE Preview
Arabic (Yemen) ar-YE Preview
Armenian (Armenia) hy-AM Preview
Bengali (Bangladesh) bn-BD Preview
Bengali (India) bn-IN Preview
Bulgarian (Bulgaria) bg-BG Preview
Burmese (Myanmar) my-MM Preview
Central Kurdish (Iraq) ar-IQ Preview
Chinese, Cantonese (Traditional Hong Kong) yue-Hant-HK Preview
Chinese, Mandarin (Traditional, Taiwan) cmn-Hant-TW Preview
Czech (Czech Republic) cs-CZ Preview
English (Philippines) en-PH Preview
Estonian (Estonia) et-EE Preview
Filipino (Philippines) fil-PH Preview
Gujarati (India) gu-IN Preview
Hebrew (Israel) iw-IL Preview
Hungarian (Hungary) hu-HU Preview
Indonesian (Indonesia) id-ID Preview
Kannada (India) kn-IN Preview
Khmer (Cambodia) km-KH Preview
Lao (Laos) lo-LA Preview
Latvian (Latvia) lv-LV Preview
Lithuanian (Lithuania) lt-LT Preview
Malay (Malaysia) ms-MY Preview
Malayalam (India) ml-IN Preview
Marathi (India) mr-IN Preview
Nepali (Nepal) ne-NP Preview
Norwegian (Norway) no-NO Preview
Persian (Iran) fa-IR Preview
Serbian (Serbia) sr-RS Preview
Slovak (Slovakia) sk-SK Preview
Slovenian (Slovenia) sl-SI Preview
Spanish (Mexico) es-MX Preview
Swahili sw Preview
Tamil (India) ta-IN Preview
Telugu (India) te-IN Preview
Thai (Thailand) th-TH Preview
Uzbek (Uzbekistan) uz-UZ Preview

Language availability for diarization

Chirp 3 supports transcription and diarization only in BatchRecognize and Recognize in the following languages:

Language BCP-47 Code
Chinese (Simplified, China) cmn-Hans-CN
German (Germany) de-DE
English (United Kingdom) en-GB
English (India) en-IN
English (United States) en-US
Spanish (Spain) es-ES
Spanish (United States) es-US
French (Canada) fr-CA
French (France) fr-FR
Hindi (India) hi-IN
Italian (Italy) it-IT
Japanese (Japan) ja-JP
Korean (Korea) ko-KR
Portuguese (Brazil) pt-BR

Feature support and limitations

Chirp 3 supports the following features:

Feature Description Launch Stage
Automatic punctuation Automatically generated by the model and can be optionally disabled. Preview
Automatic capitalization Automatically generated by the model and can be optionally disabled. Preview
Utterance-level Timestamps Automatically generated by the model. Preview
Speaker Diarization Automatically identify the different speakers in a single-channel audio sample. Preview
Speech adaptation (Biasing) Provide hints to the model in the form of phrases or words to improve recognition accuracy for specific terms or proper nouns. Preview
Language-agnostic audio transcription The model automatically infers the spoken language in your audio file and transcribes it in the most prevalent language. Preview

Chirp 3 doesn't support the following features:

Feature Description
Word-level Timestamps Automatically generated by the model and can be optionally enabled, with some transcription degradation expected.
Word-level confidence scores The API returns a value, but it isn't truly a confidence score.

Transcribe using Chirp 3

Discover how to use Chirp 3 for transcription tasks.

Perform streaming speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_streaming_chirp3(
   audio_file: str
) -> list[cloud_speech.StreamingRecognizeResponse]:
   """Transcribes audio from an audio file stream using the Chirp 3 model of the Google Cloud Speech-to-Text V2 API.

   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"

   Returns:
       list[cloud_speech.StreamingRecognizeResponse]: The streaming responses from the
       Speech-to-Text API V2 containing the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       content = f.read()

   # In practice, stream should be a generator yielding chunks of audio data
   chunk_length = len(content) // 5
   stream = [
       content[start : start + chunk_length]
       for start in range(0, len(content), chunk_length)
   ]
   audio_requests = (
       cloud_speech.StreamingRecognizeRequest(audio=audio) for audio in stream
   )

   recognition_config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )
   streaming_config = cloud_speech.StreamingRecognitionConfig(
       config=recognition_config
   )
   config_request = cloud_speech.StreamingRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       streaming_config=streaming_config,
   )

   def requests(config: cloud_speech.StreamingRecognizeRequest, audio: list):
       yield config
       yield from audio

   # Transcribes the audio into text
   responses_iterator = client.streaming_recognize(
       requests=requests(config_request, audio_requests)
   )
   responses = []
   for response in responses_iterator:
       responses.append(response)
       for result in response.results:
           print(f"Transcript: {result.alternatives[0].transcript}")

   return responses

Perform synchronous speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Perform batch speech recognition

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_chirp3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of the Google Cloud Speech-to-Text V2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input audio file.
           E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.BatchRecognizeResults: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Transcribes the audio into text
   operation = client.batch_recognize(request=request)

   print("Waiting for operation to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response.results[audio_uri].transcript

Use Chirp 3 Features

Explore how you can use the latest features, with code examples:

Perform a language-agnostic transcription

Chirp 3 can automatically identify and transcribe the dominant language spoken in the audio, which is essential for multilingual applications. To achieve this, set language_codes=["auto"] as shown in the code example:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_auto_detect_language(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file and auto-detects the spoken language using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["auto"],  # Set language code to auto to detect language.
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

Perform a language-restricted transcription

Chirp 3 can automatically identify and transcribe the dominant language in an audio file. You can also restrict it to the specific locales that you expect, for example ["en-US", "fr-FR"], which focuses the model on the most probable languages for more reliable results, as demonstrated in the code example:

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_restrict_languages(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file, restricting detection to an expected set of spoken locales, using Chirp 3.
   Please see https://cloud.google.com/speech-to-text/v2/docs/encoding for more
   information on which audio encodings are supported.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """
   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US", "fr-FR"],  # Set language codes of the expected spoken locales
       model="chirp_3",
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")

   return response

Perform transcription and speaker diarization

Use Chirp 3 for transcription and diarization tasks.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_batch_chirp3(
   audio_uri: str,
) -> cloud_speech.BatchRecognizeResults:
   """Transcribes an audio file from a Google Cloud Storage URI using the Chirp 3 model of Google Cloud Speech-to-Text V2 API.
   Args:
       audio_uri (str): The Google Cloud Storage URI of the input
         audio file. E.g., gs://[BUCKET]/[FILE]
   Returns:
       cloud_speech.BatchRecognizeResults: The response from the
         Speech-to-Text API containing the transcription results.
   """

   # Instantiates a client.
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],  # Use "auto" to detect language.
       model="chirp_3",
       features=cloud_speech.RecognitionFeatures(
           # Enable diarization by setting empty diarization configuration.
           diarization_config=cloud_speech.SpeakerDiarizationConfig(),
       ),
   )

   file_metadata = cloud_speech.BatchRecognizeFileMetadata(uri=audio_uri)

   request = cloud_speech.BatchRecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       files=[file_metadata],
       recognition_output_config=cloud_speech.RecognitionOutputConfig(
           inline_response_config=cloud_speech.InlineOutputConfig(),
       ),
   )

   # Creates audio transcription job.
   operation = client.batch_recognize(request=request)

   print("Waiting for transcription job to complete...")
   response = operation.result(timeout=120)

   for result in response.results[audio_uri].transcript.results:
       print(f"Transcript: {result.alternatives[0].transcript}")
       print(f"Detected Language: {result.language_code}")
       print(f"Speakers per word: {result.alternatives[0].words}")

   return response.results[audio_uri].transcript

Improve accuracy with model adaptation

Chirp 3 can improve transcription accuracy for your specific audio using model adaptation. This lets you provide a list of specific words and phrases, increasing the likelihood that the model recognizes them. It's especially useful for domain-specific terms, proper nouns, or unique vocabulary.

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_model_adaptation(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model with adaptation, improving accuracy for specific audio characteristics or vocabulary.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       # Use model adaptation
       adaptation=cloud_speech.SpeechAdaptation(
         phrase_sets=[
             cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                 inline_phrase_set=cloud_speech.PhraseSet(phrases=[
                   {
                       "value": "alphabet",
                   },
                   {
                         "value": "cell phone service",
                   }
                 ])
             )
         ]
       )
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Enable denoiser and SNR-filtering

Chirp 3 can enhance audio quality by reducing background noise and filtering out unwanted sounds before transcription. You can improve results from noisy environments by enabling the built-in denoiser and Signal-to-Noise Ratio (SNR) filtering.

Setting denoise_audio=true can effectively help you reduce background music or noises like rain and street traffic.

You can set snr_threshold=X to control the minimum loudness of speech required for transcription. This helps to filter out non-speech audio or background noise, preventing unwanted text in your results. A higher snr_threshold means the user needs to speak louder for the model to transcribe the utterances.

SNR-filtering can be utilized in real-time streaming use cases to avoid sending unnecessary sounds to a model for transcription. A higher value for this setting means that your speech volume must be louder relative to the background noise to be sent to the transcription model.

The snr_threshold setting interacts with whether denoise_audio is true or false. When denoise_audio=true, background noise is removed and speech becomes relatively clearer, so the overall SNR of the audio goes up.

If your use case involves only the user's voice without others speaking, set denoise_audio=true to increase the sensitivity of SNR-filtering, which can filter out non-speech noise. If your use case involves people speaking in the background and you want to avoid transcribing background speech, consider setting denoise_audio=false and lowering the SNR threshold.

The following are recommended SNR threshold values. A reasonable snr_threshold value can be set between 0 and 1000; a value of 0 means don't filter anything, and 1000 means filter everything. Fine-tune the value if the recommended settings don't work for you.

Denoise audio SNR threshold Speech sensitivity
true 10.0 high
true 20.0 medium
true 40.0 low
true 100.0 very low
false 0.5 high
false 1.0 medium
false 2.0 low
false 5.0 very low

Python

import os

from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech
from google.api_core.client_options import ClientOptions

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")
REGION = "us"

def transcribe_sync_chirp3_with_denoiser(
   audio_file: str
) -> cloud_speech.RecognizeResponse:
   """Transcribes an audio file using the Chirp 3 model of the Google Cloud Speech-to-Text V2 API, with the built-in denoiser and SNR filtering enabled.
   Args:
       audio_file (str): Path to the local audio file to be transcribed.
           Example: "resources/audio.wav"
   Returns:
       cloud_speech.RecognizeResponse: The response from the Speech-to-Text API containing
       the transcription results.
   """

   # Instantiates a client
   client = SpeechClient(
       client_options=ClientOptions(
           api_endpoint=f"{REGION}-speech.googleapis.com",
       )
   )

   # Reads a file as bytes
   with open(audio_file, "rb") as f:
       audio_content = f.read()

   config = cloud_speech.RecognitionConfig(
       auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
       language_codes=["en-US"],
       model="chirp_3",
       denoiser_config={
           "denoise_audio": True,
           # Medium SNR threshold
           "snr_threshold": 20.0,
       },
   )

   request = cloud_speech.RecognizeRequest(
       recognizer=f"projects/{PROJECT_ID}/locations/{REGION}/recognizers/_",
       config=config,
       content=audio_content,
   )

   # Transcribes the audio into text
   response = client.recognize(request=request)

   for result in response.results:
       print(f"Transcript: {result.alternatives[0].transcript}")

   return response

Use Chirp 3 in the Google Cloud console

  1. Sign up for a Google Cloud account, and create a project.
  2. Go to Speech in the Google Cloud console.
  3. If the API isn't enabled, enable the API.
  4. Make sure that you have an STT console Workspace. If you don't have a workspace, you must create a workspace.

    1. Go to the transcriptions page, and click New Transcription.

    2. Open the Workspace drop-down and click New Workspace to create a workspace for transcription.

    3. From the Create a new workspace navigation sidebar, click Browse.

    4. Click to create a new bucket.

    5. Enter a name for your bucket and click Continue.

    6. Click Create to create your Cloud Storage bucket.

    7. After the bucket is created, click Select to select your bucket for use.

    8. Click Create to finish creating your workspace for the Speech-to-Text API V2 console.

  5. Perform a transcription on your actual audio.

    Screenshot of the Speech-to-text transcription creation page, showing file selection or upload.

    From the New Transcription page, select your audio file through either upload (Local upload) or specifying an existing Cloud Storage file (Cloud storage).

  6. Click Continue to move to the Transcription options.

    1. Select the Spoken language that you plan to use for recognition with Chirp from your previously created recognizer.

    2. In the model drop-down, select chirp_3.

    3. In the Recognizer drop-down, select your newly created recognizer.

    4. Click Submit to run your first recognition request using chirp_3.

  7. View your Chirp 3 transcription result.

    1. From the Transcriptions page, click the name of the transcription to view its result.

     2. In the Transcription details page, view your transcription result, and optionally play back the audio in the browser.

What's next