Avoid storing empty files and show an error in Phonos
Closed, Resolved · Public · 5 Estimated Story Points

Description

We've found that Phonos engines sometimes return empty audio files, presumably because the IPA passed to them was invalid or otherwise could not be interpreted. While some of that is expected, there's no reason to leave these empty files lingering in Swift forever. Phonos should avoid storing such a file and instead show a user-facing error that audio could not be generated.

Acceptance criteria

  • Don't store files that are very small (current threshold is 1200 bytes)
  • Show an error to the user, something like "The generated audio appears to be empty. The given IPA may be invalid, or the engine can't interpret it. Using the '$1' parameter may help."
    • NOTE: It's not really safe to say the given parameters are definitively invalid; instead we just want to hint that it could be fixed by editorial trial-and-error.
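
The two criteria above can be sketched as follows. This is a minimal Python illustration, not the extension's actual code (Phonos itself is a MediaWiki extension). The names `store_audio`, `EmptyAudioError`, and `hint_param` are hypothetical, the strict `<` comparison is an assumption about the boundary case, and passing `text` for `$1` is only a guess based on the QA notes.

```python
MIN_FILE_SIZE = 1200  # bytes; the threshold named in the acceptance criteria

# Message text from the task; "$1" is a placeholder for a parameter name.
EMPTY_AUDIO_MESSAGE = (
    "The generated audio appears to be empty. The given IPA may be invalid, "
    "or the engine can't interpret it. Using the '$1' parameter may help."
)


class EmptyAudioError(Exception):
    """Raised instead of storing audio that is too small to be playable."""


def store_audio(data: bytes, storage: dict, key: str, hint_param: str = "text") -> None:
    """Store `data` under `key` unless it is smaller than MIN_FILE_SIZE.

    `storage` is a plain dict standing in for the real file backend.
    """
    if len(data) < MIN_FILE_SIZE:
        # Nothing is written, so no empty file is left behind in storage.
        raise EmptyAudioError(EMPTY_AUDIO_MESSAGE.replace("$1", hint_param))
    storage[key] = data
```

The message only hints at a fix rather than declaring the input invalid, in line with the note above.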

Event Timeline

MusikAnimal renamed this task from Automatically delete 0 byte files and show an error in Phonos to Automatically delete empty files and show an error in Phonos. Dec 1 2022, 9:48 PM

Change 863051 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Add a minimum file size and show an error if a file less than it

https://gerrit.wikimedia.org/r/863051

See what others think of my solution. Once I got to coding, I realized that we can probably safely deduce that audio is empty (i.e. created but < 0:01 in length) solely from the byte length of the raw MP3 data, without having to create the file first. This avoids unnecessary operations in Swift. An alternative is to do what Extension:Score does and use a script to get the length of the audio and go by that, but this would require a trip through Shellbox, which we want to avoid if possible.

I tested many short words like "a", "hi", etc., and all are well over the minimum size I went with of 1200 bytes. For now, I've only done this for the Google engine, since I know others (eSpeak in particular) generate shorter audio files.
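
A rough sketch of the per-engine idea, in Python for illustration only (the patch itself is not reproduced here). The table `MIN_SIZE_BY_ENGINE`, the function `looks_empty`, and the engine keys are hypothetical names. Only the 1200-byte value for Google comes from this comment.

```python
# Hypothetical per-engine minimums in bytes. Only Google has one so far;
# engines missing from the table are never rejected on size.
MIN_SIZE_BY_ENGINE = {"google": 1200}


def looks_empty(engine: str, mp3_data: bytes) -> bool:
    """Judge emptiness from the byte length alone, before any file is created."""
    minimum = MIN_SIZE_BY_ENGINE.get(engine, 0)
    return len(mp3_data) < minimum
```

Because the check needs only the raw bytes, it can run before anything is written to Swift and without a trip through Shellbox.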

MusikAnimal renamed this task from Automatically delete empty files and show an error in Phonos to Avoid storing empty files and show an error in Phonos. Dec 1 2022, 11:24 PM
MusikAnimal updated the task description. (Show Details)

For comparison, the first one here is empty and 1,056 bytes long.

Change 863051 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Add a minimum file size and show an error if a file less than it

https://gerrit.wikimedia.org/r/863051

QA notes: Examples of IPA that generate empty audio:

  • <phonos lang="en" ipa="ˈkɑːtɑːr, kəˈtɑːr" />
  • <phonos lang="ar" ipa="Ģuḍā'ī" />
  • <phonos lang="ar" ipa="foobar" />

Hopefully it's easy to create more examples. Basically, just refrain from using the text parameter and give it bogus IPA, and it will often fail (note that the first two examples above are not actually bogus, though!).

Another thing to be aware of is that it's possible (but as yet unproven) that actual playable files are 1200 bytes or smaller. In my testing, I looked at a lot of single-syllable words such as "a", "hi", etc., and none ever seemed to be close to 1200 bytes. So QA'ing might also involve trying to find would-be legitimate Phonos files that never get stored because they're so small.

MusikAnimal set the point value for this task to 5. Dec 6 2022, 12:16 AM

The smallest I have found so far is 1440 bytes. Here is an example: <phonos ipa="ɑ̃" lang=fr-ca />.

The largest "empty" file I have found so far is 1152 bytes (testing on the commit before this patch).
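
For reference, the arithmetic behind these two observations can be checked with a few lines of Python. The variable names are illustrative, and the sizes are just the ones reported in this comment, not an exhaustive survey.

```python
THRESHOLD = 1200          # bytes, the minimum size from the patch
LARGEST_EMPTY = 1152      # largest "empty" file found so far
SMALLEST_PLAYABLE = 1440  # smallest playable file found so far

# The threshold separates the two observations if it lies between them.
separates = LARGEST_EMPTY < THRESHOLD <= SMALLEST_PLAYABLE
margin_below = THRESHOLD - LARGEST_EMPTY      # bytes of headroom above the empty file
margin_above = SMALLEST_PLAYABLE - THRESHOLD  # bytes of headroom below the playable file

print(separates, margin_below, margin_above)  # prints: True 48 240
```

So the threshold holds for these samples, but the margin on the empty side is only 48 bytes.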

I extracted all(?) the IPA phonemes from https://cloud.google.com/text-to-speech/docs/phonemes and created a phonos tag for each (P42430).
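
Generating one tag per phoneme could look roughly like the Python sketch below. This is not the script used for the paste. `phonos_tag` is a hypothetical helper, and the three phonemes are placeholders rather than the list extracted from Google's documentation.

```python
from html import escape


def phonos_tag(ipa: str, lang: str) -> str:
    """Build a <phonos> tag, escaping quotes and angle brackets in the values."""
    return f'<phonos lang="{escape(lang)}" ipa="{escape(ipa)}" />'


# Placeholder phonemes for illustration; substitute the real extracted list.
phonemes = ["ɑ", "ə", "iː"]

for phoneme in phonemes:
    print(phonos_tag(phoneme, "en"))
```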

@MusikAnimal Oh, just to clarify, do we always get an MP3 from Google, per T319379? I am assuming that if we did get a WAV from Google, it would be bigger (even if empty).

Correct, we always get MP3 from Google. For now, there's no minimum on the size of the files for the other engines, since I didn't bother to figure out what the appropriate value should be. eSpeak for instance makes very short audio files, so the threshold would need to be lower for it.