#utf-8 #iterator #unicode #unicode-encoding

utf8_iter

Iterator by char over potentially-invalid UTF-8 in &[u8]

5 stable releases

1.0.4 Dec 1, 2023
1.0.3 Sep 9, 2022
1.0.1 Jul 19, 2022
1.0.0 Apr 19, 2022

#1377 in Text processing


30,759,821 downloads per month
Used in 53,283 crates (8 directly)

Apache-2.0 OR MIT

27KB
444 lines

utf8_iter


utf8_iter provides iteration by char over potentially-invalid UTF-8 in &[u8], with UTF-8 errors handled according to the WHATWG Encoding Standard: each invalid sequence is yielded as U+FFFD (the REPLACEMENT CHARACTER).
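To illustrate what this error handling looks like in practice, here is a minimal sketch that uses only the standard library. It does not call utf8_iter itself: `String::from_utf8_lossy` is used as a stand-in on the assumption that it follows the same convention of one U+FFFD per maximal invalid subsequence, and `decode_lossy` is a hypothetical helper name introduced for this example.

```rust
/// Hypothetical helper: decode bytes to chars, replacing each invalid
/// sequence with U+FFFD. This allocates, unlike an iterator over the slice.
fn decode_lossy(bytes: &[u8]) -> Vec<char> {
    String::from_utf8_lossy(bytes).chars().collect()
}

fn main() {
    // 0xFF can never occur in UTF-8, so it becomes a single U+FFFD.
    assert_eq!(decode_lossy(b"a\xFFb"), vec!['a', '\u{FFFD}', 'b']);

    // The first two bytes of U+20AC with the third byte missing:
    // the whole truncated sequence is replaced by one U+FFFD.
    assert_eq!(decode_lossy(b"\xE2\x82"), vec!['\u{FFFD}']);

    // 0xC0 is never a valid lead byte and 0x80 is a lone continuation
    // byte, so each of them is replaced separately.
    assert_eq!(decode_lossy(b"\xC0\x80"), vec!['\u{FFFD}', '\u{FFFD}']);
}
```

The practical difference is that the stand-in builds a `String` up front, whereas an iterator over `&[u8]` can yield the same characters lazily without allocating.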

Iteration by Result<char,Utf8CharsError> is provided as an alternative that distinguishes UTF-8 errors from U+FFFD appearing in the input.
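The distinction matters because U+FFFD is itself a valid character that can legitimately appear in input. The following standard-library sketch shows the idea under stated assumptions: `InvalidSequence` and `decode_results` are hypothetical names for this example, not utf8_iter's API (the crate's own error type is `Utf8CharsError`), and the result is collected into a `Vec` rather than produced lazily.

```rust
/// Hypothetical error marker standing in for the crate's `Utf8CharsError`.
#[derive(Debug, PartialEq, Eq)]
struct InvalidSequence;

/// Hypothetical helper: decode `bytes`, reporting each invalid sequence as
/// `Err` instead of silently substituting U+FFFD.
fn decode_results(mut bytes: &[u8]) -> Vec<Result<char, InvalidSequence>> {
    let mut out = Vec::new();
    while !bytes.is_empty() {
        match std::str::from_utf8(bytes) {
            Ok(valid) => {
                for c in valid.chars() {
                    out.push(Ok(c));
                }
                break;
            }
            Err(e) => {
                let (valid, rest) = bytes.split_at(e.valid_up_to());
                // The prefix was just validated, so this cannot fail.
                for c in std::str::from_utf8(valid).unwrap().chars() {
                    out.push(Ok(c));
                }
                out.push(Err(InvalidSequence));
                // `error_len()` is `None` when the input ends mid-sequence;
                // in that case the rest of the input is the invalid part.
                let skip = e.error_len().unwrap_or(rest.len());
                bytes = &rest[skip..];
            }
        }
    }
    out
}

fn main() {
    // A literal U+FFFD (EF BF BD) followed by a stray 0xFF byte.
    let items = decode_results(b"\xEF\xBF\xBD\xFF");
    // With plain `char` iteration both would come out as U+FFFD;
    // here the second one is visibly an error.
    assert_eq!(items, vec![Ok('\u{FFFD}'), Err(InvalidSequence)]);
}
```

A caller that wants the lossy behavior back can map each `Err` to `'\u{FFFD}'`, while a strict caller can stop at the first error.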

An implementation of char_indices() analogous to the same-name method on str is also provided.
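As with `str::char_indices()`, the useful property is that every yielded char is paired with the byte offset where it starts in the original slice, including the offset of each invalid sequence. The sketch below reproduces that behavior with the standard library only. `lossy_char_indices` is a hypothetical helper for this example, and it is assumed (not verified here) that the crate reports the same offsets.

```rust
/// Hypothetical helper: pair each decoded char with the byte offset where
/// it starts in `bytes`; invalid sequences are reported as U+FFFD.
fn lossy_char_indices(bytes: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let rest = &bytes[pos..];
        match std::str::from_utf8(rest) {
            Ok(valid) => {
                for (i, c) in valid.char_indices() {
                    out.push((pos + i, c));
                }
                break;
            }
            Err(e) => {
                let good = e.valid_up_to();
                // The prefix was just validated, so this cannot fail.
                let valid = std::str::from_utf8(&rest[..good]).unwrap();
                for (i, c) in valid.char_indices() {
                    out.push((pos + i, c));
                }
                let err_start = pos + good;
                out.push((err_start, '\u{FFFD}'));
                // `None` means the input ended in the middle of a sequence.
                let skip = e.error_len().unwrap_or(rest.len() - good);
                pos = err_start + skip;
            }
        }
    }
    out
}

fn main() {
    // 'a', then a truncated three-byte sequence (E2 82), then 'b'.
    let indices = lossy_char_indices(b"a\xE2\x82b");
    // The replacement character is reported at offset 1 and covers two
    // bytes, so 'b' starts at offset 3.
    assert_eq!(indices, vec![(0, 'a'), (1, '\u{FFFD}'), (3, 'b')]);
}
```

Offsets are byte positions, so consecutive entries can differ by more than one even when a single replacement character is produced.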

Key parts of the code are copied from the UTF-8 to UTF-16 conversion code in encoding_rs, which was optimized for speed on valid input. The implementation here keeps the structure that proved fast in encoding_rs, but that structure has not been benchmarked in this context.

This is a no_std crate.

Licensing

TL;DR: Apache-2.0 OR MIT

Please see the file named COPYRIGHT.

Documentation

Generated API documentation is available online.

Release Notes

1.0.4

  • Add iteration by Result<char,Utf8CharsError>.

1.0.3

  • Fix an error in documentation.

1.0.2

  • Add a char_indices() implementation.

1.0.1

  • Add the as_slice() method.
  • Implement DoubleEndedIterator.

1.0.0

  • The initial release.

No runtime deps