Text about surrogate pairs has confusing example. #516

The text says:
> A UTF-16 surrogate code point, even if in a valid UTF-16 surrogate pair, e.g. `\uD83D\uDE03` or `\UD83DDE03`.

That's confusing because `\UD83DDE03` is not a valid surrogate pair. It represents the single (nonexistent) code point U+D83DDE03.
Using it as an example suggests that there is _some_ way to interpret it as two 16-bit code units, so that, for example, `"\U00410042"` could be a valid way to write the string `"AB"`.
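
To make the difference concrete, here is a minimal Python sketch of the two readings. It is only an illustration, not anything taken from the spec: the helper `combine_surrogates` and the constant names are made up for this example.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high (lead) surrogate range
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low (trail) surrogate range
MAX_CODE_POINT = 0x10FFFF              # largest Unicode code point


def combine_surrogates(high: int, low: int) -> int:
    """Decode a UTF-16 surrogate pair into the code point it encodes."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# Reading 1: two separate 16-bit values, as in \uD83D\uDE03.
pair_value = combine_surrogates(0xD83D, 0xDE03)
print(f"U+{pair_value:X}")  # U+1F603

# Reading 2: eight hex digits as ONE number, as in \UD83DDE03.
single_value = 0xD83DDE03
print(single_value > MAX_CODE_POINT)  # True: far outside the code point range
```

Only the first reading yields something that looks like a surrogate pair. The second is a single out-of-range number.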

(A string is a sequence of valid code points that are not in the surrogate range. That's the same as a sequence of valid _scalar values_: Unicode scalar values are the Unicode code points except the surrogates.)
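
As a rough sketch of that definition (the function names are hypothetical and not from the spec), a scalar-value check could look like this in Python. Python is handy here because its `str` type can hold lone surrogates, so the check is not vacuous.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_scalar_string(s: str) -> bool:
    """True if every code point in s is a scalar value."""
    return all(is_scalar_value(ord(ch)) for ch in s)


# Python keeps these as two separate surrogate code points; it does not pair them.
print(is_scalar_string("\ud83d\ude03"))  # False
# The same character written as a single code point is fine.
print(is_scalar_string("\U0001F603"))    # True
```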

Just above that, the same section also says that the following is not allowed in a string literal:

* An invalid Unicode code point, e.g. `\u2FE0`.

Is this only if the value occurs as an escape, or also if the source contains the literal U+2FE0 code point?
(There is no specification of what source input _is_, other than that it contains "characters", so it is presumably a sequence of scalar values.)

The Unicode specification does not define any code point as "invalid". 
Is it any code point which is currently _unassigned_? 
(If so, which strings are valid depends on which Unicode version is being validated against.)
Does that include code points that are _reserved_? Or code points that are designated as noncharacters?
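
To show why the answer matters, here is a small Python sketch that separates the candidate meanings of "invalid". The `classify` function and its labels are invented for this illustration. The "unassigned" result depends on the Unicode version bundled with the Python build, which is exactly the version dependence asked about above.

```python
import unicodedata


def classify(cp: int) -> str:
    """Roughly classify a code point by the categories discussed above."""
    if not 0 <= cp <= 0x10FFFF:
        return "out of range"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cn":
        # Depends on unicodedata.unidata_version, so it can change over time.
        return "unassigned in this Unicode version"
    if category == "Co":
        return "private use"
    return "assigned"


print("Unicode data version:", unicodedata.unidata_version)
for cp in (0x41, 0x2FE0, 0xFFFE, 0xD83D, 0xD83DDE03):
    print(f"U+{cp:04X}: {classify(cp)}")
```

Each of these categories would give a different set of rejected strings, so the spec needs to say which one it means.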

