CSS2 had some aural properties, but the whole concept got deprecated. Some of them have been revived by CSS Speech. Both approaches are tailored to a linear reading of a document for text-to-speech (TTS) engines. Many properties in the Speech module accordingly have a voice- prefix.
Its aural box model roughly equates visual margin, border and padding with aural pause, cue and rest, respectively. While the first and third are gaps, the middle one has substance: cue-before and cue-after take references to audio files, similar to (but way less complex than) border images. One could argue that cues could benefit from supporting some standardised keywords for brief sound effects (SFX) akin to border styles, but that’s not my main point here.
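To illustrate the aural box model with properties the Speech module already defines (the audio file URL and decibel offset are placeholders):

```css
/* Aural "margin", "border" and "padding" around a heading's spoken content */
h2 {
  pause-before: strong;               /* outer gap, like margin */
  cue-before: url("chime.wav") -3dB;  /* has substance, like a border image */
  rest-before: medium;                /* inner gap, like padding */
  rest-after: medium;
  cue-after: url("chime.wav") -3dB;
  pause-after: strong;
}
```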
CSS Speech is furthermore concerned with the textual (or replaced) content of the box, whose audible rendering it is intended to style. There is, however, also an aural equivalent to the graphical background (image) of a box, which is not covered yet: ambient sounds, including “background” music. The CSS2 aural appendix had play-during for this purpose. Authors currently would most likely use an HTML <audio> element instead.
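For reference, the deprecated CSS2 property looked roughly like this (file URLs are placeholders):

```css
/* CSS2 aural properties (informative appendix, deprecated) */
body           { play-during: url("ambience.mp3") repeat; } /* loop an ambient sound */
blockquote.sad { play-during: url("violins.aiff"); }        /* replaces the parent's sound */
blockquote q   { play-during: url("harp.wav") mix; }        /* mixed with the parent's sound */
```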
The background images of a box are always visible unless the box is outside of the viewport, covered by opaque boxes with a higher z-index, invisible, or not displayed at all. Likewise, an ambient sound should be audible while its box is in the user’s acoustic space; for the sake of initial simplicity, this could be defined as equivalent to the on-screen visibility of the respective spatial box. Fixed attachment could keep a sound audible at all times. It might also be helpful to restrict concurrent playback to a single, top-most sound, or to introduce a gradual audio opacity that tones down or even mutes overlaid sounds.
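Purely as a strawman for the ideas above; none of the following properties exist, and the names are made up for illustration:

```css
/* Hypothetical strawman; none of these properties exist today */
main {
  play-during: url("ambience.ogg") repeat;  /* ambient sound, CSS2-style */
  play-during-attachment: fixed;            /* hypothetical: stays audible while scrolled out of view */
}
dialog[open] {
  audio-opacity: 0.8;  /* hypothetical: tones down the sounds of boxes this one covers */
}
```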
However, constantly running background music or noise is rather unpopular in most user interfaces, except in exclusive-focus UIs such as immersive games. Also, unless earphones are used, sound in collaborative or public environments may intrude on bystanders’ perception more easily and widely than screen images do, which affects both privacy and courtesy.
A more important auditory feature for some websites is SFX, or auditory icons, played upon user interaction. In CSS, such triggered states are mostly handled by pseudo-classes like :hover, :focus and :active. I think the cue properties might already almost work for that, but the Speech module currently lacks a clear model of how it expects content to be listened to. For now, it seems to expect a linear, non-interactive rendition from start to end of the document, or it considers this out of scope for the specification, as the description of the temporal pseudo-classes :current, :past and :future indicates:
> These pseudo-classes classify elements with respect to the currently-displayed or active position in some timeline, such as during speech rendering of a document (…). CSS does not define this timeline; the host language must do so.
However, CSS would need some way to invoke auditory cues by itself, i.e. to (temporarily) influence that timeline, e.g. with a hypothetical :hover {time-index: now;}.
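For example, the following already parses under the current cue grammar, but the module does not define whether or when such a cue would actually be played during an interactive session (the file URLs are placeholders):

```css
/* Interaction SFX via cue properties; the trigger model is currently undefined */
button:hover,
button:focus {
  cue-before: url("tick.wav") -6dB;
}
button:active {
  cue-before: url("click.wav");
}
```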
Before I make any more concrete proposals, I’d like to know:
- Is acoustic design beyond TTS in scope for CSS?
- Is the WG interested in working on this in the foreseeable future?
- Would this be handled by (a future level of) the existing Speech module or by a new, closely coordinated one?