Intro
UTF-8 (Unicode Transformation Format, 8-bit) is a variable-length character encoding that uses 1 to 4 bytes per code point. It is backward-compatible with ASCII, because the 128 ASCII characters encode to the same single bytes, and it is the preferred encoding for web pages.
UTF-8 Byte Examples
| Character | Unicode | UTF-8 Bytes (hex) |
|-----------|---------|-------------------|
| A         | U+0041  | 41                |
| é         | U+00E9  | C3 A9             |
| あ        | U+3042  | E3 81 82          |
| 😀        | U+1F600 | F0 9F 98 80       |
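The rows above can be reproduced with a few lines of Python. This is a minimal sketch that relies only on the built-in `ord`, `str.encode`, and `bytes.hex`; the helper name `describe` is made up for illustration and is not part of any library.

```python
# Characters taken from the table above.
samples = ["A", "é", "あ", "😀"]

def describe(ch):
    """Return (code point label, UTF-8 bytes as hex, byte count) for one character."""
    encoded = ch.encode("utf-8")          # the character as UTF-8 bytes
    code_point = f"U+{ord(ch):04X}"       # ord() gives the Unicode number
    return (code_point, encoded.hex(" ").upper(), len(encoded))

for ch in samples:
    code_point, hex_bytes, length = describe(ch)
    print(ch, code_point, hex_bytes, f"({length} byte(s))")
```

Running it on other characters is a quick way to check how many bytes a given character takes in UTF-8.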
- ASCII characters (U+0000 to U+007F) → 1 byte
- U+0080 to U+07FF → 2 bytes
- U+0800 to U+FFFF → 3 bytes
- U+10000 to U+10FFFF → 4 bytes
- Unicode is a character set. It is a list in which every character has a unique number:
  A = 65, B = 66, C = 67, D = 68
  The decimal numbers that represent the string "hello":
  h = 104, e = 101, l = 108, l = 108, o = 111
  Binary (UTF-8): 01101000 01100101 01101100 01101100 01101111
- UTF-8 is an encoding. It is how Unicode numbers are translated into bytes for storage and transmission.
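To make the step from Unicode numbers to bytes concrete, here is a rough hand-written encoder in Python. It is a sketch for illustration only: the function names `utf8_encode_code_point` and `utf8_encode` are hypothetical, and real code should simply call `str.encode("utf-8")`. The bit layouts in the comments follow the standard UTF-8 byte patterns.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 by hand (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out-of-range values and surrogates are not valid in UTF-8.
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf8_encode(text):
    """Encode a whole string by encoding each code point in turn."""
    return b"".join(utf8_encode_code_point(ord(ch)) for ch in text)

text = "hello"
print([ord(ch) for ch in text])                          # the Unicode numbers
print(" ".join(f"{b:08b}" for b in utf8_encode(text)))   # the UTF-8 bytes in binary

# The hand-written encoder should agree with Python's built-in one.
assert utf8_encode("Aéあ😀") == "Aéあ😀".encode("utf-8")
```

For "hello", every character is ASCII, so each Unicode number fits in one byte and the binary output is just those numbers written in 8 bits.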




