The text says:
A UTF-16 surrogate code point, even if in a valid UTF-16 surrogate pair, e.g. \uD83D\uDE03 or \UD83DDE03.
That's confusing because \UD83DDE03 is not a valid surrogate pair; it represents the (nonexistent) code point U+D83DDE03.
Using it as an example suggests that there is some way to interpret it as two 16-bit code units, so that, for example, "\u00410042" could be a valid way to write the string "AB".
(A string is a sequence of valid code points that are not in the surrogate range. That's the same as valid scalar values - Unicode scalar values are Unicode code points except the surrogates).
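To make the distinction concrete, here is a minimal sketch in Python (not the language the spec describes; the helper names `is_scalar_value` and `combine_surrogates` are mine, purely for illustration). It shows that the eight hex digits of \UD83DDE03 read as one number are out of range, and that only an explicit UTF-16 decoding step turns the two surrogates into a scalar value.

```python
MAX_CODE_POINT = 0x10FFFF


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value: a code point outside the surrogate range."""
    return 0 <= cp <= MAX_CODE_POINT and not (0xD800 <= cp <= 0xDFFF)


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into the scalar value it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Read as a single number, the digits of \UD83DDE03 are far above U+10FFFF,
# so the value is not a code point at all.
assert not is_scalar_value(0xD83DDE03)

# Each surrogate on its own is a code point, but not a scalar value.
assert not is_scalar_value(0xD83D)
assert not is_scalar_value(0xDE03)

# Only UTF-16 decoding of the pair yields a scalar value, here U+1F603.
assert combine_surrogates(0xD83D, 0xDE03) == 0x1F603
assert is_scalar_value(0x1F603)
```

So \uD83D\uDE03 and \UD83DDE03 are not two spellings of the same thing: the first is a pair of surrogates, the second is a single out-of-range number.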
The same place, just above, also says that the following is not allowed in a string literal:
- An invalid Unicode code point, e.g. \u2FE0.
Is this only if the value occurs as an escape, or also if the source contains the literal U+2FE0 code point?
(There is no specification of what the input source is, other than that it contains "characters", so it is likely a sequence of scalar values.)
The Unicode specification does not define any code point as "invalid".
Is it any code point which is currently unassigned?
(If so, which strings are valid depends on which Unicode version is being validated against.)
Does that include code points that are reserved? Or assigned, but as non-characters?
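The candidate meanings of "invalid" can be told apart mechanically. The following is only a sketch of the distinctions I am asking about, written in Python with the standard `unicodedata` module; the `classify` function and its labels are hypothetical, not something the spec defines. Note that the "unassigned" answer depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), which is exactly the versioning problem mentioned above.

```python
import unicodedata


def classify(cp: int) -> str:
    """Classify a code point under the candidate readings of "invalid"."""
    if not 0 <= cp <= 0x10FFFF:
        return "not a code point"
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # General category Cn here means unassigned in the bundled Unicode version.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    # Everything else, including private-use code points.
    return "assigned"


for cp in (0x0041, 0x2FE0, 0xFFFE, 0xD83D, 0xD83DDE03):
    print(f"U+{cp:04X}: {classify(cp)}")
print("Unicode version:", unicodedata.unidata_version)
```

Whichever of these categories the spec means to reject, it should name them explicitly and, if "unassigned" is among them, say which Unicode version applies.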