Tags: arp242/uni
Version v2.7.0
- Improve `-format` flag:
  - Add `%name` as an alias for `%(name l:auto)`; this is a lot less typing and
    requires less shell quoting, and >90% of the time this is what you want.
  - Automatically prepend character, codepoint, and name if the format flag
    starts with `+`; for example:

        % uni identify -f +'%unicode %plane' a
                     Name                  Unicode  Plane
        'a'  U+0061  LATIN SMALL LETTER A  1.1      Basic Multilingual Plane

    This should make printing a single property a lot quicker.
- Align and colourize JSON output.
- Update CLDR information, adding significantly more aliases for emojis.
- Add `cells` column, which returns how many cells a codepoint will display at
(0, 1, or 2).
- Add `aliases` column, which lists the alias names. Also add this to the
default output:

      % uni s factorial
           CPoint  Dec    UTF8        HTML       Name              Aliases
      '!'  U+0021  33     21          !          EXCLAMATION MARK  [factorial, bang]

- Add `refs` column, which references other related/similar codepoints:

      % uni p -q U+46 -f '%(name): %(refs)'
      LATIN CAPITAL LETTER F: U+2109, U+2131, U+2132

      % uni p -q U+46 -f '%(refs)' | uni p
           CPoint  Dec    UTF8        HTML       Name               Aliases
      '℉'  U+2109  8457   e2 84 89    ℉          DEGREE FAHRENHEIT
      'ℱ'  U+2131  8497   e2 84 b1    ℱ          SCRIPT CAPITAL F   [Fourier transform]
      'Ⅎ'  U+2132  8498   e2 84 b2    Ⅎ          TURNED CAPITAL F   [Claudian digamma inversum]

- Allow arguments to `print` to start or end with a comma or slash. This comes up
when copy/pasting a list of codepoints from another source; there's no real
reason to error out on this.
- Allow listing Unicode versions with `uni list unicode` and planes with
`uni list planes`.
- `uni list` without arguments now errors instead of listing everything.
- Add `h` format flag to not print the header for this column.
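
To make the 0, 1, or 2 values of the `cells` column above concrete, here is a rough sketch using Python's standard `unicodedata` module. It is only an approximation and not how `uni` computes the column: `cells` is a hypothetical helper name, zero width is assumed only for combining marks (categories `Mn` and `Me`), and double width is assumed for East Asian Width `W` and `F`.

```python
import unicodedata


def cells(ch: str) -> int:
    """Approximate how many terminal cells a single codepoint occupies.

    Assumption: this is a simplified heuristic, not uni's implementation.
    """
    # Combining marks attach to the previous character and take no cell.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is treated as a single cell.
    return 1


if __name__ == "__main__":
    for ch in ("a", "\u0301", "\u4e16"):
        print(f"U+{ord(ch):04X} -> {cells(ch)}")
```

Real terminals disagree on some codepoints (emoji sequences and ambiguous-width characters in particular), so a heuristic like this should be treated as a starting point.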
Release v2.6.0
- Update to Unicode 15.1.
- Add "script" property – also supported in the list and print commands:

      % uni identify -f '%(script l:auto) %(cpoint) %(name)' 'a Ω'
      Script CPoint Name
      Latin  U+0061 LATIN SMALL LETTER A
      Common U+0020 SPACE
      Greek  U+03A9 GREEK CAPITAL LETTER OMEGA

      % uni list scripts
      Scripts:
          Name                   Assigned
          Adlam                  83
          Ahom                   54
          Anatolian Hieroglyphs  582
          …

      % uni print 'script:linear a'
      Showing script Linear A
           CPoint  Dec    UTF8        HTML       Name (Cat)
      '𐘀'  U+10600 67072  f0 90 98 80 𐘀          LINEAR A SIGN AB001 (Other_Letter)
      '𐘁'  U+10601 67073  f0 90 98 81 𐘁          LINEAR A SIGN AB002 (Other_Letter)
      '𐘂'  U+10602 67074  f0 90 98 82 𐘂          LINEAR A SIGN AB003 (Other_Letter)
      …

- Add "unicode" property, which tells you in which Unicode version a codepoint
was introduced:

      % uni identify -f '%(unicode l:auto) %(cpoint l:auto) %(name)' a𐘂🫁
      Unicode CPoint  Name
      1.1     U+0061  LATIN SMALL LETTER A
      7.0     U+10602 LINEAR A SIGN AB003
      13.0    U+1FAC1 LUNGS

- Show unprintable control characters as the open box (␣, U+2423) instead of the
replacement character (�, U+FFFD). It already did that for C1 control
characters, and U+FFFD looked more like a bug than intentional. The -raw/-r
flag still overrides this.
- Always print Private Use characters as-is for `%(char)` instead of using the
U+FFFD replacement character. It's usually safe to print these, and having to
use -raw is confusing.
- The `ls` command is now an alias for `list`.
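
The display rules for control and Private Use characters described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not `uni`'s own code: `display_char` and its `raw` parameter are made-up names, and "unprintable control character" is assumed to mean Unicode general category `Cc` (which covers both C0 and C1 controls).

```python
import unicodedata

OPEN_BOX = "\u2423"  # U+2423 OPEN BOX, shown in place of control characters


def display_char(ch: str, raw: bool = False) -> str:
    """Return the text to show for a single codepoint.

    Assumption: a simplified model of the behaviour in the release notes.
    """
    # A raw mode (like the -raw/-r flag) bypasses any substitution.
    if raw:
        return ch
    # Control characters (category Cc) are replaced by the open box.
    if unicodedata.category(ch) == "Cc":
        return OPEN_BOX
    # Everything else, including Private Use (category Co), is printed as-is.
    return ch


if __name__ == "__main__":
    for ch in ("\x07", "\x85", "\ue000", "a"):
        print(f"U+{ord(ch):04X} -> {display_char(ch)!r}")
```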
Release v2.3.0
Changes:
- Update to Unicode 14.0.
- UTF-16 and JSON are printed in lower case, just like UTF-8 was. Upper case is
used only for codepoints (e.g. U+00AC).
- `uni print` can now print from a UTF-8 byte sequence; for example, to print the
  € sign:

      uni p utf8:e282ac
      uni p 'utf8:e2 82 ac'
      uni p 'utf8:0xe2 0x82 0xac'

  Bytes can optionally be separated by any combination of `0x`, `-`, `_`, or spaces.
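
For readers who want to see what accepting those separators involves, here is a minimal Python sketch of parsing a `utf8:` argument. It is an illustration under stated assumptions rather than `uni`'s implementation: `parse_utf8_bytes` is a hypothetical name, and the error handling is invented for the example.

```python
import re


def parse_utf8_bytes(arg: str) -> str:
    """Decode an argument such as 'utf8:e2 82 ac' into the text it encodes.

    Assumption: a simplified stand-in for the parsing described above.
    """
    prefix = "utf8:"
    if not arg.lower().startswith(prefix):
        raise ValueError(f"expected a {prefix!r} prefix: {arg!r}")
    body = arg[len(prefix):].lower()
    # Strip the listed separators: "0x", "-", "_", and whitespace.
    hex_digits = re.sub(r"0x|[-_\s]", "", body)
    # bytes.fromhex raises ValueError on odd lengths or non-hex input;
    # decode raises UnicodeDecodeError on invalid UTF-8.
    return bytes.fromhex(hex_digits).decode("utf-8")


if __name__ == "__main__":
    for arg in ("utf8:e282ac", "utf8:e2 82 ac", "utf8:0xe2 0x82 0xac"):
        print(arg, "->", parse_utf8_bytes(arg))
```

All three spellings from the example above decode to the same € sign, since the separators are dropped before the hex digits are read.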