1 stable release

Uses new Rust 2024

1.0.0 Mar 15, 2026

#1645 in Algorithms

Custom license

135KB
4K SLoC

ious

Integers of unknown sizes.

Introduction

An IOUS (Integer Of Unknown Size) is a variable-length integer that's encoded as part of a data stream. This is contrasted with an IOKS (Integer Of Known Size), which has a known size and is used to represent the actual integer stored within an IOUS. The plural forms of IOUS and IOKS are IOUSes and IOKSes, and the names are inspired from the Rodents Of Unusual Size (ROUSes) from The Princess Bride.

While the IOUS format is generic over all sorts of parameters, it's designed so that any one implementation of it will be very easy to implement and very fast to both encode and decode. Instead of using continuation markers like most variable-length formats, it stores all length information at the beginning of the stream, often entirely avoiding the need for branching at all.

Conceptually, an IOUS indicates its length with the number of disabled bits at the beginning of the stream. In the most common case, with an IOUS formed of 8-bit units capped at 9 units of length + data, any integer between 8 and 128 bits can be decoded using four instructions:

  1. Count the leading zeroes of the first byte. (the length)
  2. Shift the first byte left by the length.
  3. Shift the first byte right by the length.
  4. Reverse the order of the bytes. (on little-endian architectures)

Although this simple case is relatively straightforward, code that operates under a completely generic form of IOUSes can be much more complicated, although any individual instance of the format can be similarly optimised to a very fast and simple set of instructions.

The IOKS format exists solely as a means to describe the endianness of an IOUS after decoding or before encoding. With byte-sized units, it's equivalent to a big-endian representation, but the possibility for alternatively sized units makes the endianness trickier to explain, and simpler to just describe as a separate format.

Units

IOUSes and IOKSes are comprised of multiple units, which represent standard fixed-length integer types. The signedness of the units extends to the signedness of the integer data itself, meaning that signed units can be used to represent signed integers and unsigned units can be used to represent unsigned integers.

Larger units help reduce the number of bits required to describe the length of the IOUS, while also reducing the granularity of their size. Although not required by the spec, in practice, most units will be a power-of-two multiple of 8 bits, those being the standard 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit sizes. Substantially larger units can still be used, for example, using a 1024-bit size to represent cryptographic keys.

Endianness

IOKSes are always stored in "big-endian" format with the most significant unit first, although the endianness within units themselves is left unspecified. Unit conversion is outside the scope of this spec.

Since IOUSes are intended to be represented on their own without any additional data, they must contain at least one unit of data even if that data is an unsigned zero. Technically, unsigned IOKSes can unambiguously represent an integer zero with zero units, although the spec will assume that IOKSes have at least one unit for simplicity.

Length bits

IOUSes are designed to be very similar to variable-length-quantities (VLQs), with the difference that all length information is stored at the beginning of the IOUS. Conversion from an IOUS to an IOKS thus simply requires replacing the length information with zeroes or sign-extended bits depending on the units used.

IOUSes also have an upper bound on the number of bits that can be used to represent the length, called the ceiling, which both bounds the total size of an IOUS in units and allows representing a maximum-length IOUS with one fewer bit than otherwise required. This is specific to IOUSes, and does not affect IOKSes, since they contain no length information.

The capacity of the IOUS is the number of bits of data that can be stored at its maximum length. The ceiling plus the capacity is the maximum length of the IOUS in bits.

Layout

To describe the layout of an IOUS, we need two parameters: a unit type and a ceiling size. To make describing things easier, we will represent the unit type by its total length in bits (unit bits) and its signedness (signed or unsigned).

Although bits may internally be stored differently for units, we will describe the format as if the bits within individual units are stored in order of descending significance, even though this may not be the case. In practice, machine instructions will allow operating on units as if they are actually stored this way, e.g. via bit-shifting, which makes this assumption reasonable.

IOUSes are split into two parts: length bits and data bits, in that order. The total number of bits must add up to a multiple of the number of unit bits, since we will be storing the IOUS as a sequence of units.

The length bits will always start with some number of disabled bits (that number can be zero). If this number of bits is less than the ceiling, it must be followed by an enabled terminating bit to indicate the end of the length. An IOUS is considered invalid if it does not have valid length bits, which can only happen if it has zero units, or is exclusively units filled with disabled bits that add up to less than the ceiling; the latter can occur if the ceiling is larger than the unit bits times the number of units in the IOUS.

While most IOUSes will choose to keep the ceiling bounded by the unit bits, the standard allows for more flexibility here. In general, the benefits of prefixing all the length information in an IOUS are lessened if multiple units have to be checked to determine the length, but that's for you to decide. These cases do introduce the second invalid state mentioned above, although this is only an invalid state if your data stream ends prematurely, which you'd want to treat as an error anyway.

The portion of the length bits without the terminating zero is called the length data.

We round up the number of bits in the length bits to a multiple of the unit bits to determine a whole number of length units, which may contain some non-length bits. We call these bits extra data bits since they are not represented in the length data, and merely add to the total number of data bits available. If there are a nonzero number of extra data bits, then the unit containing those bits is called the extra data unit.

The extra data unit is the singular unit which contains both length bits and data bits. If the length bits line up to a multiple of the unit bits, then there is no extra data unit, but usually, there will be one.

Immediately following the length units is some number of data units, equal to the number of bits in the length data. The extra data bits, followed by the data units, constitutes the data bits. If the IOUS does not actually contain enough units to account for the extra units, it is considered invalid.

Conversion

Converting from an IOUS to an IOKS requires excluding the length units and then overwriting the length bits in the extra data unit with either zeroes or a sign extension depending on the unit type. To convert back from an IOKS to an IOUS, however, you will need the space for the length units, so that you have room to write the length bits.

When converting to an IOKS, the length units can be entirely discarded, since they contain no data. However, as mentioned earlier, it's desirable to configure your ceiling and unit size such that there are no entirely-length units, and any units containing length bits will also contain data bits. There is one exception to this, however: if the ceiling is equal to the unit bits, then a maximum-length IOUS will have a single length unit, instead of an extra data unit. Since this case is uncommon, it may make sense to always include this extra unit and simply zero or sign- extend over it.

Loading an IOKS into a system register simply requires padding the IOKS out to the necessary length, then converting the order of the units to the native endianness.

Creating an IOKS requires determining the number of units required, then converting the native endianness back to big-endian units. Once you have an IOKS, you can then mask in the length bits to create an IOUS.

As mentioned in the summary, one benefit of IOUSes is the ability to write fast and simple code for specific cases, and as such, you may not want to use the fully generic version in everyday cases.

License

The IOUS spec is released to the public domain to the greatest extent possible via the CC0 License. Honestly, the spec shouldn't be copyrightable at all, but we live in an awful system.

However, the ious crate and all its code is available under the copyleft Parity License, and must be used under those terms only.

Dependencies

~205KB