Breno-S/UTF8_C

UTF8_C

The functions in this project use the properties of C unions to make a simple UTF-8 codec algorithm easier to read and reason about.

Unlike the members of a struct, the members of a union all share a single region of memory, sized to fit the largest member. Because a union can hold members of any type, an array of smaller integers can overlay the storage of a larger integer.
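As a minimal sketch of that overlay (the type and function names here are hypothetical and not taken from this repository), a `uint32_t` and a four-element `uint8_t` array can share the same four bytes. Which array index holds the least significant byte depends on the machine's byte order, so the sketch detects it rather than assuming it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration; not the repository's definition. */
union Word {
    uint32_t whole;     /* the full 32-bit value */
    uint8_t  bytes[4];  /* the same storage, viewed as four bytes */
};

/* Both members occupy the same region, so the union is no larger than
   its biggest member on common platforms. */
static_assert(sizeof(union Word) == sizeof(uint32_t),
              "members share one 4-byte region");

/* Returns 1 if bytes[0] holds the least significant byte (little-endian). */
int low_byte_first(void) {
    union Word w = { .whole = 1u };
    return w.bytes[0] == 1;
}

/* Writes one byte through the array view and reads the result back
   through the integer view. */
uint32_t set_byte(uint32_t value, unsigned index, uint8_t byte) {
    union Word w = { .whole = value };
    w.bytes[index & 3u] = byte;  /* mask keeps the index in range */
    return w.whole;
}
```

Reading a member other than the one last written is permitted in C, but it is undefined behavior in C++, so this pattern is specific to C.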

This is the main idea implemented here. A 32-bit integer (the code point) is logically split into four 8-bit integers (the octets). The algorithm can then read the whole value at once or only the octets it needs, which simplifies the bit manipulation and makes the code more readable and intuitive.
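One way this idea could look in practice is sketched below. It is an assumption about the general approach, not the repository's actual code: `utf8_cell`, `word`, `octets`, and `utf8_encode` are illustrative names. The integer view clears all four octets in one assignment, and the array view fills in each UTF-8 octet individually.

```c
#include <stdint.h>

/* Hypothetical layout; names are illustrative, not the repository's. */
typedef union {
    uint32_t word;       /* all four octets at once */
    uint8_t  octets[4];  /* individual UTF-8 octets, in output order */
} utf8_cell;

/* Encodes cp into cell->octets and returns the number of octets used (1-4),
   or 0 if cp is a surrogate or lies above U+10FFFF. Unused octets are zero. */
int utf8_encode(uint32_t cp, utf8_cell *cell) {
    cell->word = 0;  /* clear all four octets through the integer view */
    if (cp < 0x80) {                       /* 0xxxxxxx */
        cell->octets[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        cell->octets[0] = (uint8_t)(0xC0 | (cp >> 6));
        cell->octets[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are invalid */
        cell->octets[0] = (uint8_t)(0xE0 | (cp >> 12));
        cell->octets[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        cell->octets[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        cell->octets[0] = (uint8_t)(0xF0 | (cp >> 18));
        cell->octets[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        cell->octets[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        cell->octets[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, encoding U+20AC (the euro sign) with this sketch returns 3 and leaves the octets `E2 82 AC 00` in the cell. Because the code only writes through `octets` and only clears through `word`, the result does not depend on the machine's byte order.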

About

My experiments with UTF-8 in C, using unions, bit manipulation and narrow characters.
