Add guidelines on whitespace characters #77
✅ Deploy Preview for bp-i18n-specdev ready!
This is a great start!
I think it needs a bit more work, though, as it's unclear what to do as a specification author.
Thanks for these changes. See comments.
index.html
Outdated
<p>
Some specifications that define formal languages (examples include <a href="https://html.spec.whatwg.org/multipage/syntax.html#syntax">HTML</a> or <a href="https://www.w3.org/TR/css-syntax/#whitespace">CSS</a>) will choose to specify ASCII whitespace as part of their grammar. Specifications that deal more generally with text will choose to follow Unicode's definition instead.
</p>
<p class="advisement">Specifications that mention whitespace characters SHOULD explicitly state which characters are whitespace characters, by referring to the Unicode <a href="https://www.unicode.org/reports/tr44/#White_Space">White_Space</a> property or <code>General_Category=Z</code> [[UAX44]], or specifying the specific code points.</p>
This still seems a bit "chewy". Perhaps:
Specifications that use the term "whitespace" SHOULD explicitly define what the term means in their context. For most specifications, this should refer to the Unicode White_Space property or to General_Category=Z. Otherwise the specific code points should be listed.
We still don't say when to use the property or when to use the character class.
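To make the difference between the property and the character class concrete, here is a rough Python sketch. It assumes the White_Space code points as listed in PropList.txt (hardcoded below, since the standard library does not expose that property) and derives General_Category=Z from `unicodedata`, so the result depends on the Unicode version bundled with the interpreter. The names `WHITE_SPACE`, `CATEGORY_Z`, `only_white_space` and `only_category_z` are just illustrative.

```python
import unicodedata

# White_Space code points, hardcoded from PropList.txt (assumption: a recent
# Unicode version; the stdlib has no direct lookup for this property).
WHITE_SPACE = frozenset(
    [*range(0x0009, 0x000E), 0x0020, 0x0085, 0x00A0, 0x1680,
     *range(0x2000, 0x200B), 0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

# General_Category=Z is the union of Zs, Zl and Zp, taken from the
# interpreter's own Unicode database.
CATEGORY_Z = frozenset(
    cp for cp in range(0x110000)
    if unicodedata.category(chr(cp)).startswith("Z")
)

only_white_space = sorted(WHITE_SPACE - CATEGORY_Z)
only_category_z = sorted(CATEGORY_Z - WHITE_SPACE)

# Print the code points that White_Space has but General_Category=Z lacks.
for cp in only_white_space:
    print(f"U+{cp:04X} General_Category={unicodedata.category(chr(cp))}")
```

On a current interpreter the characters that are White_Space but not in Z are control characters (tab, the line-breaking controls, and U+0085), which may be a useful input for deciding when to recommend the property and when the category.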
index.html
Outdated
<div class="req" id="char_define_whitespace">
<p>
| Some specifications that define formal languages (examples include <a href="https://html.spec.whatwg.org/multipage/syntax.html#syntax">HTML</a> or <a href="https://www.w3.org/TR/css-syntax/#whitespace">CSS</a>) will choose to specify ASCII whitespace as part of their grammar. Specifications that deal more generally with text will choose to follow Unicode's definition instead. |
Hmm... reading my suggestion again, I think we need to define ASCII whitespace. This also doesn't provide quite enough guidance. I don't have time right now to make suggestions but will try to come back to this in a bit. @r12a what do you think?
If the spec is specifying a computer language (HTML, CSS, IDL, JSON, JavaScript, WGSL, SQL, XML etc.) then as part of the syntax like delimiting the class names in the
If the spec deals with natural language (like CSS Text or String Search), then it should probably allow all Unicode whitespace characters.
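As an illustration of why the distinction matters for a computer-language syntax, the sketch below splits a class-attribute-like string in two ways. It assumes the ASCII whitespace set from the WHATWG Infra Standard (tab, LF, FF, CR, space); `split_ascii_whitespace` and `class_attr` are hypothetical names, not taken from any spec.

```python
import re

# ASCII whitespace as defined by the WHATWG Infra Standard:
# U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
ASCII_WHITESPACE = "\t\n\f\r "


def split_ascii_whitespace(value):
    """Split on runs of ASCII whitespace only, dropping empty tokens."""
    return [tok for tok in re.split(f"[{ASCII_WHITESPACE}]+", value) if tok]


# A value containing a NO-BREAK SPACE (U+00A0) inside the first token.
class_attr = "nav\u00a0item active\tlarge"

ascii_tokens = split_ascii_whitespace(class_attr)
unicode_tokens = class_attr.split()  # str.split() uses Unicode whitespace

print(ascii_tokens)
print(unicode_tokens)
```

With the ASCII rule the no-break space stays inside a token, while the Unicode-aware split treats it as a separator, so the two definitions tokenize the same input differently.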
One more thing we might need to consider is whitespace in regular expressions. I did some research and here is the current status of some languages:
HTML/JavaScript/Dart
Perl
A single
Ref: https://perldoc.perl.org/perlre
UTS 18
Which includes:
Ref: https://unicode.org/reports/tr18/#space
Python 3
Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). For Unicode (str) patterns, For 8-bit (bytes) patterns,
Ref: https://docs.python.org/3/library/re.html
Java
Ref: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Swift
Ref: https://developer.apple.com/documentation/foundation/nsregularexpression
RE2/Go
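For the Python 3 case, a small sketch of how `\s` behaves for str patterns, for str patterns with `re.ASCII`, and for bytes patterns. The sample characters and the helper `matches_s` are picked purely for illustration; this is not meant as a full survey of what `\s` matches.

```python
import re

# A few sample characters (illustrative choice, not an exhaustive list).
samples = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "IDEOGRAPHIC SPACE": "\u3000",
}


def matches_s(ch, flags=0):
    """Return True if the single character ch matches \\s."""
    return re.fullmatch(r"\s", ch, flags) is not None


# str pattern, default flags: \s follows Unicode whitespace.
unicode_mode = {name: matches_s(ch) for name, ch in samples.items()}

# str pattern with re.ASCII: \s is restricted to ASCII whitespace.
ascii_mode = {name: matches_s(ch, re.ASCII) for name, ch in samples.items()}

# bytes pattern: \s only matches ASCII whitespace, so 0xA0 is not matched.
bytes_nbsp = re.fullmatch(rb"\s", b"\xa0") is not None

print(unicode_mode)
print(ascii_mode)
print(bytes_nbsp)
```

So even within one language, the meaning of `\s` depends on the pattern type and flags, which seems like another reason for specs to state the intended character set explicitly.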
Added a list of \p{Whitespace} characters to the previous comment.
I suspect that it would be helpful to list the characters, so that readers can easily evaluate the difference. I created a table that shows the specific characters in the various lists, which i'll send by email.
Thank you. I've added the table.
@xfq i think you need to add the table to the main text, rather than hide it in the pulldown, where i suspect that few people will notice it. I think you should probably also do one of the following:
Also, it would be good to allow the table to extend into the right margin, so that it becomes less tall. I have a vague recollection that respec has a declarative way to do that.
Oh, and btw don't forget to add the link to the tracker db.
Done.
But this will make the viewport on smartphones too wide and unusable. I also searched "table" in https://respec.org/docs/ but didn't get anything that looked promising.
Which link?
The one like #79 but pointing to whitespace labels.
One suggestion, which will at least make it easier to read (not just because of overflow) the table on Gecko browsers: add to the local style definitions.
Otherwise, this is looking good.
Great progress! Thank you.
Another issue is that the first three columns are all from Unicode but are defined differently, so ideally we should determine which definition should be recommended.
My comments are really minor. Otherwise good to merge! Thanks.
Merging. Thank you very much for your review!
Fix #15.