Member

@xfq xfq commented Jul 4, 2022

Fix #15.


Preview | Diff

@xfq xfq requested review from aphillips and r12a July 4, 2022 11:27

netlify bot commented Jul 4, 2022

Deploy Preview for bp-i18n-specdev ready!

Name Link
🔨 Latest commit 0d91101
🔍 Latest deploy log https://app.netlify.com/sites/bp-i18n-specdev/deploys/6335762ad1406e000826ab3f
😎 Deploy Preview https://deploy-preview-77--bp-i18n-specdev.netlify.app/

Contributor

@aphillips aphillips left a comment

This is a great start!

I think it needs a bit more work, though, as it's unclear what to do as a specification author.

Contributor

@aphillips aphillips left a comment

Thanks for these changes. See comments.

index.html Outdated
<p>
Some specifications that define formal languages (examples include <a href="https://html.spec.whatwg.org/multipage/syntax.html#syntax">HTML</a> or <a href="https://www.w3.org/TR/css-syntax/#whitespace">CSS</a>) will choose to specify ASCII whitespace as part of their grammar. Specifications that deal more generally with text will choose to follow Unicode's definition instead.
</p>
<p class="advisement">Specifications that mention whitespace characters SHOULD explicitly state which characters are whitespace characters, by referring to the Unicode <a href="https://www.unicode.org/reports/tr44/#White_Space">White_Space</a> property or <code>General_Category=Z</code> [[UAX44]], or specifying the specific code points.</p>
Contributor

This still seems a bit "chewy". Perhaps:

Specifications that use the term "whitespace" SHOULD explicitly define what the term means in their context. For most specifications, this should refer to the Unicode White_Space property or to General_Category=Z. Otherwise the specific code points should be listed.

We still don't say when to use the property or when to use the character class.

index.html Outdated

<div class="req" id="char_define_whitespace">
<p>
Some specifications that define formal languages (examples include <a href="https://html.spec.whatwg.org/multipage/syntax.html#syntax">HTML</a> or <a href="https://www.w3.org/TR/css-syntax/#whitespace">CSS</a>) will choose to specify ASCII whitespace as part of their grammar. Specifications that deal more generally with text will choose to follow Unicode's definition instead.
Contributor

Hmm... reading my suggestion again, I think we need to define ASCII whitespace. This also doesn't provide quite enough guidance. I don't have time right now to make suggestions but will try to come back to this in a bit. @r12a what do you think?

Member Author

xfq commented Jul 11, 2022

If the spec defines a computer language (HTML, CSS, IDL, JSON, JavaScript, WGSL, SQL, XML, etc.) and whitespace is part of its syntax, such as the delimiter between class names in HTML's class attribute, then it should probably allow ASCII whitespace only, because I don't see any use case for allowing all Unicode whitespace characters.

If the spec deals with natural language (like CSS Text or String Search), then it should probably allow all Unicode whitespace characters.
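
To make the distinction concrete, here is a minimal Python sketch of the two behaviours. It assumes "ASCII whitespace" means the five characters TAB, LF, FF, CR, and SPACE, as in the WHATWG Infra Standard; the function names and the sample class value are hypothetical and only for illustration.

```python
import re

# Assumption: ASCII whitespace is TAB, LF, FF, CR, and SPACE
# (the WHATWG Infra Standard definition).
_ASCII_WS_RUN = re.compile(r"[\t\n\f\r ]+")


def split_ascii(value):
    """Split on runs of ASCII whitespace only, as a syntax-level tokenizer would."""
    return [token for token in _ASCII_WS_RUN.split(value) if token]


def split_unicode(value):
    """Split on every character Python treats as whitespace (str.isspace)."""
    # str.split() with no arguments is broader than ASCII whitespace.
    return value.split()


# U+3000 IDEOGRAPHIC SPACE sits between "nav" and "primary".
class_value = "nav\u3000primary hidden"

print(split_ascii(class_value))    # two tokens: the ideographic space is not a delimiter
print(split_unicode(class_value))  # three tokens: the ideographic space is a delimiter
```

The same input yields a different token list depending on the definition, which is why a specification needs to say which one it means.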

Member Author

xfq commented Jul 11, 2022

One more thing we might need to consider is whitespace in regular expressions.

I did some research and here is the current status of some languages:

HTML/JavaScript/Dart

\s is equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].

Ref: 1 2 3

Perl

\s matches the five characters [ \f\n\r\t] and, starting in Perl v5.18, the vertical tab as well.

A single /x tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. The set of characters that are deemed whitespace are:

U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000B LINE TABULATION
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0020 SPACE
U+0085 NEXT LINE
U+200E LEFT-TO-RIGHT MARK
U+200F RIGHT-TO-LEFT MARK
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

Ref: https://perldoc.perl.org/perlre

UTS 18

\p{Whitespace}

Which includes:

0009..000D    ; White_Space # Cc   [5] <control-0009>..<control-000D>
0020          ; White_Space # Zs       SPACE
0085          ; White_Space # Cc       <control-0085>
00A0          ; White_Space # Zs       NO-BREAK SPACE
1680          ; White_Space # Zs       OGHAM SPACE MARK
2000..200A    ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
2028          ; White_Space # Zl       LINE SEPARATOR
2029          ; White_Space # Zp       PARAGRAPH SEPARATOR
202F          ; White_Space # Zs       NARROW NO-BREAK SPACE
205F          ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
3000          ; White_Space # Zs       IDEOGRAPHIC SPACE

Ref: https://unicode.org/reports/tr18/#space
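
Since the review asks when to use the White_Space property and when to use General_Category=Z, it may help to see how the two sets differ. The following is a sketch only: the White_Space set is transcribed from the list above, while General_Category=Z is computed from whatever version of the Unicode Character Database ships with the running Python, so results could vary with very old Python builds.

```python
import sys
import unicodedata

# Code points with the White_Space property, transcribed from the list above.
WHITE_SPACE = (
    set(range(0x0009, 0x000E))
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200B))
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

# Code points whose General_Category is Zs, Zl, or Zp, according to the
# Unicode data bundled with this Python build (an assumption of this sketch).
GC_Z = {
    cp
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith("Z")
}

only_white_space = sorted(WHITE_SPACE - GC_Z)
only_gc_z = sorted(GC_Z - WHITE_SPACE)

print("White_Space only:", [f"U+{cp:04X}" for cp in only_white_space])
print("General_Category=Z only:", [f"U+{cp:04X}" for cp in only_gc_z])
```

With current Unicode data, the only characters that are White_Space but not General_Category=Z should be the controls U+0009..U+000D and U+0085, so the practical question is whether a specification wants those control characters treated as whitespace.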

Python 3

Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes).

For Unicode (str) patterns, \s matches Unicode whitespace characters.

For 8-bit (bytes) patterns, \s matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].

Ref: https://docs.python.org/3/library/re.html
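
A short sketch of the behaviour described above, using only the standard `re` module. The sample string is made up for illustration; `re.ASCII` is the documented flag that restricts `\s` to ASCII whitespace for `str` patterns.

```python
import re

# NO-BREAK SPACE (U+00A0), IDEOGRAPHIC SPACE (U+3000), and an ASCII SPACE.
text = "a\u00a0b\u3000c d"

# str pattern: \s matches Unicode whitespace.
unicode_hits = re.findall(r"\s", text)

# str pattern with re.ASCII: \s is limited to [ \t\n\r\f\v].
ascii_hits = re.findall(r"\s", text, flags=re.ASCII)

# bytes pattern: \s is ASCII-only, so the UTF-8 bytes of U+00A0 and
# U+3000 are not matched.
bytes_hits = re.findall(rb"\s", text.encode("utf-8"))

print(unicode_hits)  # three matches
print(ascii_hits)    # one match (the ASCII space)
print(bytes_hits)    # one match (the ASCII space, as bytes)
```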

Java

\s is equivalent to [ \t\n\x0B\f\r].

Ref: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Swift

\s is equivalent to [\t\n\f\r\p{Z}].

\p{Z} means General_Category=Z.

Ref: https://developer.apple.com/documentation/foundation/nsregularexpression

RE2/Go

\s is equivalent to [\t\n\f\r ].

[[:space:]] is equivalent to [\t\n\v\f\r ].

\pZ means General_Category=Z.

Ref: 1 2
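
To see where these definitions disagree, the classes can be written down as sets and queried per character. This is only a sketch: the sets are transcribed from this comment (Perl with the vertical tab, as in v5.18 and later), the expansion of \p{Z} for Swift is my own listing of the current General_Category=Z characters, and the `cps` and `engines_matching` helpers are hypothetical names. Python is left out because the comment does not enumerate its Unicode set.

```python
def cps(*items):
    """Expand ints and inclusive (first, last) ranges into a set of code points."""
    out = set()
    for item in items:
        if isinstance(item, tuple):
            out.update(range(item[0], item[1] + 1))
        else:
            out.add(item)
    return out


# Assumption: the characters currently assigned General_Category=Z.
GC_Z = cps(0x20, 0xA0, 0x1680, (0x2000, 0x200A),
           0x2028, 0x2029, 0x202F, 0x205F, 0x3000)

# What \s (or \p{Whitespace} for UTS 18) matches, as quoted in the comment.
CLASSES = {
    "JavaScript": cps(0x20, 0x0C, 0x0A, 0x0D, 0x09, 0x0B, 0xA0, 0x1680,
                      (0x2000, 0x200A), 0x2028, 0x2029, 0x202F, 0x205F,
                      0x3000, 0xFEFF),
    "Perl": cps(0x20, 0x0C, 0x0A, 0x0D, 0x09, 0x0B),
    "UTS 18": cps((0x09, 0x0D), 0x20, 0x85, 0xA0, 0x1680, (0x2000, 0x200A),
                  0x2028, 0x2029, 0x202F, 0x205F, 0x3000),
    "Java": cps(0x20, 0x09, 0x0A, 0x0B, 0x0C, 0x0D),
    "Swift": cps(0x09, 0x0A, 0x0C, 0x0D) | GC_Z,
    "RE2/Go": cps(0x09, 0x0A, 0x0C, 0x0D, 0x20),
}


def engines_matching(char):
    """Return the names of the classes above that contain the character."""
    cp = ord(char)
    return [name for name, members in CLASSES.items() if cp in members]


for sample in ("\u00a0", "\x0b", "\x85", "\ufeff"):
    print(f"U+{ord(sample):04X}:", engines_matching(sample))
```

Even this small sample shows that NO-BREAK SPACE, LINE TABULATION, NEXT LINE, and U+FEFF are each handled differently across engines, which supports listing the characters explicitly in the guidance.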

Contributor

r12a commented Jul 15, 2022

Added a list of \p{Whitespace} characters to the previous comment.

Contributor

r12a commented Jul 22, 2022

I suspect that it would be helpful to list the characters, so that readers can easily evaluate the difference.

I created a table that shows the specific characters in the various lists, which i'll send by email.

Member Author

xfq commented Aug 1, 2022

> I created a table that shows the specific characters in the various lists, which i'll send by email.

Thank you. I've added the table.

Contributor

r12a commented Aug 1, 2022

@xfq i think you need to add the table to the main text, rather than hide it in the pulldown – where i suspect that few people will notice it. I think you should probably also do one of the following:

  1. add a line of text after the table saying something like "Links to the latest definitions of the information in the table can be found by expanding the 'explanations & examples'."
  2. add a link to the relevant comment in the table.
  3. move the links out of the pulldown and make them normal text, too (with a sentence of introduction)

Contributor

r12a commented Aug 1, 2022

Also, it would be good to allow the table to extend into the right margin, so that it becomes less tall. I have a vague recollection that respec has a declarative way to do that.

Contributor

r12a commented Aug 1, 2022

Oh, and btw don't forget to add the link to the tracker db.

Member Author

xfq commented Aug 2, 2022

> add a line of text after the table saying something like "Links to the latest definitions of the information in the table can be found by expanding the 'explanations & examples'."

Done.

> Also, it would be good to allow the table to extend into the right margin, so that it becomes less tall. I have a vague recollection that respec has a declarative way to do that.

But this will make the viewport on smartphones too wide and unusable. I also searched "table" in https://respec.org/docs/ but didn't get anything that looked promising.

> Oh, and btw don't forget to add the link to the tracker db.

Which link?

Contributor

r12a commented Aug 2, 2022

> Which link?

The one like #79 but pointing to whitespace labels.

Contributor

r12a commented Aug 2, 2022

One suggestion, which will at least make the table easier to read on Gecko browsers (and not just because of overflow): add

table.whitespace thead th {
  writing-mode: sideways-lr;
}

to the local style definitions.

Contributor

r12a commented Aug 2, 2022

Otherwise, this is looking good.

Contributor

@aphillips aphillips left a comment

Great progress! Thank you.

@xfq xfq requested review from aphillips and r12a August 3, 2022 02:27
Member Author

xfq commented Aug 3, 2022

Another issue is that the first three columns are all from Unicode but are defined differently, so ideally we should determine which definition should be recommended.

Contributor

@aphillips aphillips left a comment

My comments are really minor. Otherwise good to merge! Thanks.

@xfq xfq merged commit 3e0bda5 into gh-pages Sep 29, 2022
@xfq xfq deleted the xfq/whitespace branch September 29, 2022 10:42
Member Author

xfq commented Sep 29, 2022

Merging. Thank you very much for your review!

Development

Successfully merging this pull request may close these issues.

Should include advice on what "White-space" is.

4 participants