Hi there! This is an API-specific question which goes in the Discussions bucket - we try to keep this queue open for library-specific issues. Transferring over there!
Please make sure you have searched for information in the following guides.
A screenshot that you have tested with "Try this API".
.
Link to the code that reproduces this issue. A link to a public Github Repository or gist with a minimal reproduction.
https://www.npmjs.com/package/@google-cloud/speech
A step-by-step description of how to reproduce the issue, based on the linked reproduction.
.
A clear and concise description of what the bug is, and what you expected to happen.
@google-cloud/speech
We are experiencing an issue with the Google Speech-to-Text API where English voice inputs are sometimes incorrectly identified and transcribed as Arabic text. In addition, identical inputs sometimes return different transcriptions for different users.
We would appreciate your guidance on the following points:
1. How can we ensure that speech is consistently recognized in the correct spoken language, especially when English is spoken?
2. Is there a way to reduce or prevent misclassification into alternative languages like Arabic when English is the actual spoken language?
3. Can identical voice inputs result in different transcriptions depending on speaker tone, accent, or other voice characteristics?
Our desired outcome is to ensure the transcription reflects the user’s spoken language accurately and consistently.
Looking forward to your assistance on this matter.
Below is the code snippet we are using to transcribe voice to text.
```javascript
// Assumed to be in scope elsewhere in the module:
// const fs = require('fs');
// const findRemoveSync = require('find-remove');
// const config = require('../config'); // app-specific settings

converter: function (req, res) {
  return new Promise(function (resolve, reject) {
    const grpc = require('@grpc/grpc-js');
    const speech = require('@google-cloud/speech').v1p1beta1;

    const client = new speech.SpeechClient({
      projectId: config.api.PROECT_ID,
      credentials: req.body.oauthJSON,
      grpc: grpc,
    });

    const file = req.fileNoExtension;
    const filename = './audio/' + file + '.flac';

    const languageCode = req.body.lang_code
      ? req.body.lang_code
      : config.api.LANGUAGE_CODE;
    const alternativeLanguageCodes =
      req.body.alternative_language_codes &&
      Array.isArray(req.body.alternative_language_codes)
        ? req.body.alternative_language_codes
        : config.api.ALTERNATIVE_LANGUAGE_CODES;

    // Renamed from `config1` to avoid confusion with the config module.
    const recognitionConfig = {
      encoding: config.api.ENCODING,
      sampleRateHertz: 16000,
      languageCode: languageCode,
      alternativeLanguageCodes: alternativeLanguageCodes,
    };
    const audio = {
      content: fs.readFileSync(filename).toString('base64'),
    };
    const request = { config: recognitionConfig, audio: audio };

    client
      .recognize(request)
      .then(data => {
        // findRemoveSync is synchronous and takes (dir, options); the
        // original passed it a callback it never calls.
        findRemoveSync('./audio/', {
          filename: file,
          extensions: config.api.AUDIO_FORMATS,
        });
        if (req.options && req.options.filename) {
          fs.unlink('./audio/' + req.options.filename, function (err) {
            if (err) console.log(err);
          });
        }

        const response = data[0];
        const responseDataObj = {
          audio_link: req.body.attachment,
          convertedText: [],
        };

        if (response.results && response.results.length > 0) {
          responseDataObj.totalBilledTime = response.totalBilledTime;
          response.results.forEach(result => {
            const alternative = result.alternatives[0];
            responseDataObj.convertedText.push({
              text: alternative.transcript,
              confidence: alternative.confidence,
            });
          });
          // Resolve once, after all results are collected (the original
          // called resolve inside the loop on every iteration).
          resolve(responseDataObj);
        } else {
          req.speechStatus = 'false';
          console.info('Unable to process the file ', req.speechStatus);
          return res.json({ status: 400, info: 'Unable to process the file' });
        }
      })
      .catch(err => {
        req.speechStatus = 'false';
        findRemoveSync('./audio/', {
          filename: file,
          extensions: config.api.AUDIO_FORMATS,
        });
        if (req.options && req.options.filename) {
          fs.unlink('./audio/' + req.options.filename, function (err) {
            if (err) console.log(err);
          });
        }
        return res.json({ status: 400, info: 'Unable to process the audio file' });
      });
  });
}
```
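At the application level, a mismatch like this can at least be detected before the transcript is stored: in v1p1beta1, each `SpeechRecognitionResult` carries a `languageCode` field reporting the BCP-47 tag the recognizer actually chose for that result. Below is a minimal sketch; the `splitByLanguage` helper is our own, not part of the client library:

```javascript
// Partition recognition results by whether the language the recognizer
// detected matches the language we expected (sketch, not library API).
function splitByLanguage(results, expected) {
  const matched = [];
  const mismatched = [];
  for (const result of results) {
    const alt = result.alternatives && result.alternatives[0];
    if (!alt) continue;
    const entry = {
      text: alt.transcript,
      confidence: alt.confidence,
      detectedLanguage: result.languageCode,
    };
    // Compare case-insensitively: the API may return "en-gb" for "en-GB".
    if ((result.languageCode || '').toLowerCase() === expected.toLowerCase()) {
      matched.push(entry);
    } else {
      mismatched.push(entry);
    }
  }
  return { matched, mismatched };
}
```

Results that land in `mismatched` could then be retried with the alternative-language list pruned, or surfaced to the user instead of being stored silently.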
Below is the configuration we use:

```json
{
  "encoding": "FLAC",
  "sampleRateHertz": 16000,
  "languageCode": "en-GB",
  "alternativeLanguageCodes": ["ar-AE", "de-DE", "th-TH"]
}
```
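One mitigation worth noting: `alternativeLanguageCodes` is what enables automatic language detection, so every code listed is a candidate the recognizer may choose for the whole request. Pruning candidates that are not actually expected (such as `ar-AE` when the caller knows English is spoken) should remove the Arabic misclassification entirely. A sketch, where `pinLanguage` is a hypothetical helper of ours, not part of the client library:

```javascript
// The configuration as currently shipped (copied from above).
const baseConfig = {
  encoding: 'FLAC',
  sampleRateHertz: 16000,
  languageCode: 'en-GB',
  alternativeLanguageCodes: ['ar-AE', 'de-DE', 'th-TH'],
};

// Keep only the alternative languages the caller actually expects;
// with an empty `keep` list, recognition is pinned to languageCode.
function pinLanguage(config, keep = []) {
  return {
    ...config,
    alternativeLanguageCodes: config.alternativeLanguageCodes.filter(
      code => keep.includes(code)
    ),
  };
}
```

For requests where the UI already knows the user is speaking English, `pinLanguage(baseConfig)` yields a config with no alternatives, so the recognizer cannot switch to Arabic at all.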
Thanks
A clear and concise description of WHY you expect this behavior, i.e., was it a recent change, is there documentation that points to this behavior, etc.
The converted text should be in the same language.