Hi there! This is an API-specific question which goes in the Discussions bucket - we try to keep this queue open for library-specific issues. Transferring over there!
Please make sure you have searched for information in the following guides.
A screenshot that you have tested with "Try this API".
.
Link to the code that reproduces this issue. A link to a public Github Repository or gist with a minimal reproduction.
https://www.npmjs.com/package/@google-cloud/speech
A step-by-step description of how to reproduce the issue, based on the linked reproduction.
.
A clear and concise description of what the bug is, and what you expected to happen.
@google-cloud/speech
We are experiencing an issue with the Google Speech-to-Text API where English voice inputs are sometimes incorrectly identified and transcribed as Arabic text. In addition, identical inputs sometimes return different transcriptions for different users.
We would appreciate your guidance on the following points:
1. How can we ensure that speech is consistently recognized in the correct spoken language, especially when English is spoken?
2. Is there a way to reduce or prevent misclassification into alternative languages like Arabic when English is the actual spoken language?
3. Can identical voice inputs result in different transcriptions depending on speaker tone, accent, or other voice characteristics?
Our desired outcome is to ensure the transcription reflects the user’s spoken language accurately and consistently.
Looking forward to your assistance on this matter.
Below is the code snippet we are using to transcribe voice to text.
```javascript
// Assumed to be in scope elsewhere in the module:
// const fs = require('fs');
// const findRemoveSync = require('find-remove');
// const config = require('../config'); // app-specific settings

converter: function (req, res) {
  return new Promise(function (resolve, reject) {
    const grpc = require('@grpc/grpc-js');
    const speech = require('@google-cloud/speech').v1p1beta1;

    const client = new speech.SpeechClient({
      projectId: config.api.PROECT_ID,
      credentials: req.body.oauthJSON,
      grpc: grpc,
    });

    const file = req.fileNoExtension;
    const filename = './audio/' + file + '.flac';

    const languageCode = req.body.lang_code
      ? req.body.lang_code
      : config.api.LANGUAGE_CODE;
    const alternativeLanguageCodes =
      req.body.alternative_language_codes &&
      Array.isArray(req.body.alternative_language_codes)
        ? req.body.alternative_language_codes
        : config.api.ALTERNATIVE_LANGUAGE_CODES;

    // Renamed from `config1` to avoid confusion with the config module.
    const recognitionConfig = {
      encoding: config.api.ENCODING,
      sampleRateHertz: 16000,
      languageCode: languageCode,
      alternativeLanguageCodes: alternativeLanguageCodes,
    };
    const audio = {
      content: fs.readFileSync(filename).toString('base64'),
    };
    const request = { config: recognitionConfig, audio: audio };

    client
      .recognize(request)
      .then(data => {
        // findRemoveSync is synchronous and takes (dir, options); the
        // original passed it a callback it never calls.
        findRemoveSync('./audio/', {
          filename: file,
          extensions: config.api.AUDIO_FORMATS,
        });
        if (req.options && req.options.filename) {
          fs.unlink('./audio/' + req.options.filename, function (err) {
            if (err) console.log(err);
          });
        }

        const response = data[0];
        const responseDataObj = {
          audio_link: req.body.attachment,
          convertedText: [],
        };

        if (response.results && response.results.length > 0) {
          responseDataObj.totalBilledTime = response.totalBilledTime;
          response.results.forEach(result => {
            const alternative = result.alternatives[0];
            responseDataObj.convertedText.push({
              text: alternative.transcript,
              confidence: alternative.confidence,
            });
          });
          // Resolve once, after all results are collected (the original
          // called resolve inside the loop on every iteration).
          resolve(responseDataObj);
        } else {
          req.speechStatus = 'false';
          console.info('Unable to process the file ', req.speechStatus);
          return res.json({ status: 400, info: 'Unable to process the file' });
        }
      })
      .catch(err => {
        req.speechStatus = 'false';
        findRemoveSync('./audio/', {
          filename: file,
          extensions: config.api.AUDIO_FORMATS,
        });
        if (req.options && req.options.filename) {
          fs.unlink('./audio/' + req.options.filename, function (err) {
            if (err) console.log(err);
          });
        }
        return res.json({ status: 400, info: 'Unable to process the audio file' });
      });
  });
}
```
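At the application level, a mismatch like this can at least be detected before the transcript is stored: in v1p1beta1, each `SpeechRecognitionResult` carries a `languageCode` field reporting the BCP-47 tag the recognizer actually chose for that result. Below is a minimal sketch; the `splitByLanguage` helper is our own, not part of the client library:

```javascript
// Partition recognition results by whether the language the recognizer
// detected matches the language we expected (sketch, not library API).
function splitByLanguage(results, expected) {
  const matched = [];
  const mismatched = [];
  for (const result of results) {
    const alt = result.alternatives && result.alternatives[0];
    if (!alt) continue;
    const entry = {
      text: alt.transcript,
      confidence: alt.confidence,
      detectedLanguage: result.languageCode,
    };
    // Compare case-insensitively: the API may return "en-gb" for "en-GB".
    if ((result.languageCode || '').toLowerCase() === expected.toLowerCase()) {
      matched.push(entry);
    } else {
      mismatched.push(entry);
    }
  }
  return { matched, mismatched };
}
```

Results that land in `mismatched` could then be retried with the alternative-language list pruned, or surfaced to the user instead of being stored silently.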
Below is the configuration we use:

```json
{
  "encoding": "FLAC",
  "sampleRateHertz": 16000,
  "languageCode": "en-GB",
  "alternativeLanguageCodes": ["ar-AE", "de-DE", "th-TH"]
}
```
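One mitigation worth noting: `alternativeLanguageCodes` is what enables automatic language detection, so every code listed is a candidate the recognizer may choose for the whole request. Pruning candidates that are not actually expected (such as `ar-AE` when the caller knows English is spoken) should remove the Arabic misclassification entirely. A sketch, where `pinLanguage` is a hypothetical helper of ours, not part of the client library:

```javascript
// The configuration as currently shipped (copied from above).
const baseConfig = {
  encoding: 'FLAC',
  sampleRateHertz: 16000,
  languageCode: 'en-GB',
  alternativeLanguageCodes: ['ar-AE', 'de-DE', 'th-TH'],
};

// Keep only the alternative languages the caller actually expects;
// with an empty `keep` list, recognition is pinned to languageCode.
function pinLanguage(config, keep = []) {
  return {
    ...config,
    alternativeLanguageCodes: config.alternativeLanguageCodes.filter(
      code => keep.includes(code)
    ),
  };
}
```

For requests where the UI already knows the user is speaking English, `pinLanguage(baseConfig)` yields a config with no alternatives, so the recognizer cannot switch to Arabic at all.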
Thanks
A clear and concise description of WHY you expect this behavior, i.e., was it a recent change, is there documentation that points to this behavior, etc.
The converted text should be in the same language.