utf8info is a small utility that reads a UTF-8 stream and prints out the raw codepoint information. It's useful for spotting invisible control characters like U+202E RIGHT-TO-LEFT OVERRIDE, and interrogating complex Zero-Width-Joiner sequences like 👨👩👧👦, which is composed of 7 characters!
This tool supports codepoints from the latest published version of the Unicode Standard, sourcing data from the Unicode Character Database.
On macOS and Linux, it should be as simple as running the following inside the utf8info directory:
make && make install
When a new version of the standard is released, you can fetch the latest UCD with make update, and then build as before.
Windows is not officially supported, but it'll likely work under WSL.
Note: Building utf8info depends on curl, unzip, and a C++17-compatible C++ compiler being present.
-v, --verbose Enable verbose output. This prints the raw UTF-8 bytes next to the codepoint info.
-d, --definitions Display definitions for CJK Unified Ideographs
-a, --all List all known codepoints and exit.