Define TTS enhancements in a working group note #1700
Is it OK to leave the PLS reference in the example §2.3.2.4.2? It may be legal, but now it is there out of the blue...
(As an editorial aside: that example does not look like an example... I guess it should be an <aside> rather than a <div>)
Hm, I hadn't noticed that respec drops the formal numbered heading when a |
Maybe let us leave this as is for this PR, and come back to that later...
Ya, I'm more wondering if we should dump the RS processing requirements. I'm sure if anyone tried to implement them they'd find them vastly underspecified. I don't even know what it means for a reading system to "apply the supplied pronunciation instructions" to the text nodes. The problem is more complicated than that, as it assumes the reading system is doing the voicing. It's more likely that a built-in OS voicing technology or an AT accessing the DOM is going to send the text to a TTS engine to be rendered, and it would have to inject any author-defined phonemes at that stage. So the few rules that we have don't even target the right application. The voicing application would only gain knowledge of a PLS lexicon from the
Good point. There is no reason to have anything about PLS in the Reading system if the content document does not even mention it...
I wonder whether it is worth adding this (informative) reference to the document: https://www.w3.org/TR/spoken-html/ It may be in a very early stage right now, but may be in much better shape by the time we get to rec...
The formulation of SSML created in IDPF is similarly fraught, but is probably a separate issue. That document skims past the same lack of standardization on how to include PLS in HTML and assumes it can be done. That's where our definition of using link elements might better belong. The use of data-* attributes at this stage is also problematic. Until they move to a more viable proposal, we may want to hold off on citing it. It could be misinterpreted as an actual extension.
Let me check. I would not be surprised if PLS is heavily used by at least one textbook publisher in Japan. I know that it heavily uses SSML. |
The change is not that different from CFIs in content. You can still use those, too, even if the spec doesn't actively promote them, and reading systems can support them in content if they want. Similarly, we don't invalidate anything anyone has done, or will continue to do, as we were basically just documenting how to use the The ssml attributes, on the other hand, are solely defined by our specification, so they're here to stay for the duration in some form. It's premature to look at the WAI group's work yet, but if it gains better traction than our own attributes we'll need to address migration eventually. It looks like it may end up with very similar, but likely unprefixed, attributes, which would help in that. I don't see it's something we'll be able to address in this revision, though. After getting burned by |
PLS is not broken and is widely used. SSML is used even more. Why do we have to touch SSML and PLS when they are not broken and there are no mature alternatives?
Do you have any evidence that PLS is widely used and supported in EPUB? At any rate, CSS 3 Speech is also not mentioned in the specification as there's nothing the specification needs to add to make it valid. We don't need to document everything you can do. EPUB is not where these technologies should have been defined for HTML, either. As I said at the last meeting, if we're serious about getting traction for PLS, we should look at getting the section incorporated into the spoken presentation specification or publishing it as a separate note (perhaps in the CG, like Alternate Style Sheets). It shouldn't be buried in the EPUB spec. I also think this is better documented in the accessibility techniques document, because, as you say, this is more best practice guidance for making content accessible (i.e., meeting WCAG 3.1.6).
Yes, Japanese textbook companies use SSML as a pronunciation processing method for digital textbooks, and it seems that PLS is used as a dictionary function. I'm currently checking with textbook companies. Here is an example of a site where PLS is listed. (Japanese only).
I've created the TTS note now. You can preview it at: https://cdn.statically.io/gh/w3c/epub-specs/editorial/issue-1690/epub33/tts/index.html I've added some introductory text around what we had, but I've otherwise kept the requirements as they were. Fixes for the other issues I've opened can be done after this gets merged. The pull request now also modifies the reading systems specification: preview, diff |
Thanks. Should I make comments on the new note here? Or, should I create a separate issue? |
If it's about how we fix/improve the authoring/rs requirements, I'd say open new issues for those so we can tackle them separately and it'll be clearer what is changing. That's what I've been trying to do. If it's just additions/clarifications you'd like to see in any of the intro text I've added to fill out the document as a more complete note, you can probably add them here. |
Matt-san, thank you for your comments and suggestions. I have checked on PLS with some Japanese textbook companies that appear to be using SSML. Strictly speaking, textbook companies do not use PLS itself, but they refer to the PLS notation and create a TTS dictionary for the reading system to control SSML. Digital textbooks must be created so that nothing is read incorrectly. Therefore, textbook companies use full-text SSML. Japanese sentences mix kana and kanji (and sometimes foreign languages), so the reading (pronunciation) changes depending on the structure of the sentence. In Japanese, the most important factor is pronouncing the characters correctly. Textbooks currently use full-text SSML, but if TTS technology improves in the future, partial SSML will be enough for correct pronunciation. Therefore, I think it is better to keep the PLS notation going forward.
We are heading for a separate note for PLS+SSML+CSS Speech. Of course, there are pros and cons to this decision. But it is now easier to make some improvements. If there is any low-hanging fruit, please submit a proposal.
Murata-san, I got it. |
Yes, I think we can improve our requirements to make it more obvious that this makes for a conforming implementation. We should only require the correct phonemes be applied independent of how it is done. We could then informatively suggest some known ways, like initializing the TTS engine with the lexicons (if it supports PLS), compiling the lexemes and applying the phonetic spellings to the text passed to the TTS engine, or transforming the PLS file into a format the TTS engine can recognize. But we should take these issues up separately from creating the initial note. It'll be more helpful for a change log to have separate issues and pull requests we can refer to. |
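As a rough illustration of the second approach mentioned above (compiling the lexemes and applying the phonetic spellings to the text passed to the TTS engine), here is a minimal Python sketch. It is not taken from the note or from any reading system: the function names `compile_lexicon` and `apply_lexicon` and the sample lexicon are hypothetical, it assumes the TTS engine accepts SSML `phoneme` elements, and it uses plain substring matching, which ignores word boundaries and language selection.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C PLS 1.0 specification.
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Hypothetical sample lexicon, for illustration only.
SAMPLE_PLS = """<lexicon version="1.0" alphabet="ipa" xml:lang="en"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmeɪtoʊ</phoneme>
  </lexeme>
</lexicon>"""


def compile_lexicon(pls_xml):
    """Return (alphabet, {grapheme: phoneme}) for a PLS document string."""
    root = ET.fromstring(pls_xml)
    # Assumption: fall back to "ipa" if the alphabet attribute is absent.
    alphabet = root.get("alphabet", "ipa")
    entries = {}
    for lexeme in root.iter(PLS_NS + "lexeme"):
        # Simplification: only the first phoneme of each lexeme is used.
        phoneme = lexeme.find(PLS_NS + "phoneme")
        if phoneme is None or not phoneme.text:
            continue
        for grapheme in lexeme.findall(PLS_NS + "grapheme"):
            if grapheme.text:
                entries[grapheme.text.strip()] = phoneme.text.strip()
    return alphabet, entries


def apply_lexicon(text, alphabet, entries):
    """Wrap each matched grapheme in an SSML phoneme element."""
    if not entries:
        return escape(text)
    # Try longer graphemes first so they win over their own prefixes.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(g) for g in ordered))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        parts.append("<phoneme alphabet=%s ph=%s>%s</phoneme>" % (
            quoteattr(alphabet),
            quoteattr(entries[match.group(0)]),
            escape(match.group(0)),
        ))
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


alphabet, entries = compile_lexicon(SAMPLE_PLS)
ssml_fragment = apply_lexicon("I like tomato soup & bread", alphabet, entries)
```

The result is an SSML fragment, so it would still need to be wrapped in whatever envelope the target engine expects. An engine that supports PLS natively could instead be initialized with the lexicon directly, which is the first approach listed above.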
…prove the background section
If there are no other editorial issues, I'll merge this by end of day tomorrow so we can move to fixing up the requirements. |
@mattgarrish just reading through the TTS draft: the text speaks about XHTML only, although we found out that everything is reproducible in SVG, too. It is a note, and reading systems do not seem to implement TTS in general, so it does no harm if we add SVG alongside HTML.
Right, I just want to make any substantive changes after we break out the tts note so that we're not changing too much at once. Also so we can directly tie a change log to specific pull requests. Given there's been no other feedback, though, I'm going to merge this and then make the svg changes for #1710. |
Define TTS enhancements in a working group note
This PR fixes #1690 but doesn't change anything from a validation perspective. It is still valid to link to PLS lexicons, and fallbacks are not required for linked resources. All it does is stop promoting the practice given the lack of real-world support.
The reading system support requirements are unchanged except for a couple of references to the core specification that now link to the PLS specification and the RS processing section, as appropriate. It might be worth taking PLS out of this document, too, as if we ever do get support it would seemingly have to come through browsers.
If authoring support ever develops, this is probably something we could better take up in the accessibility techniques (same for CSS 3 Speech).
Due to changes in direction resulting from further discussion of #1690, this pull request now adds a new note that covers TTS enhancements in EPUB (SSML+PLS+CSS Speech). These technologies are still valid to use in EPUB publications.
Please refer to #1700 (comment) and onwards in this thread for the current discussion.
Fixes #1690
Fixes #1712