What looks like one character on screen may be several codepoints and many bytes. This tool reports four numbers for any input: visible graphemes, UTF-16 code units (the value JavaScript's .length returns), Unicode codepoints, and UTF-8 bytes. Each number answers a different question, and picking the right one matters when you enforce a limit: a storage layer typically counts bytes or codepoints, while a display layer cares about graphemes.
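As a rough illustration of how three of these counts differ, here is a minimal Python sketch. The function name count_units is made up for this example and is not the tool's own code. Grapheme counting is left out because Python's standard library has no Unicode segmentation support.

```python
def count_units(text: str) -> dict[str, int]:
    """Return codepoint, UTF-16 code unit, and UTF-8 byte counts for text."""
    return {
        # A Python str is a sequence of codepoints, so len() counts them.
        "codepoints": len(text),
        # UTF-16 uses 2 bytes per code unit; the "-le" codec adds no BOM.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # Length of the serialized UTF-8 form.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# "e" followed by a combining acute accent renders as one visible character.
print(count_units("e\u0301"))
# A single emoji outside the Basic Multilingual Plane.
print(count_units("\U0001F600"))
```

The first sample is one grapheme but two codepoints and three UTF-8 bytes. The second is one codepoint but two UTF-16 code units and four UTF-8 bytes.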
Some social platforms count graphemes, while others apply their own counting rules. Check your draft against the grapheme total first, then confirm against the platform's documented limit.
A VARCHAR column is sized in bytes or in characters (codepoints), depending on the database and its settings. Compare your input against the matching count to avoid truncation or a rejected insert.
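SMS uses the GSM-7 or UCS-2 encoding. A message containing any character outside GSM-7 is sent as UCS-2, where a single segment holds 70 UTF-16 code units instead of 160 characters, so the UTF-16 count shows when your message will split into multiple segments.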
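The segment arithmetic can be sketched as follows, assuming the commonly cited limits: 160 GSM-7 characters or 70 UCS-2 code units for a single message, and 153 or 67 per part once a message is concatenated. The function sms_segments and its gsm7 flag are hypothetical. The sketch does not detect the alphabet itself, and it ignores GSM-7 extension characters that occupy two septets.

```python
import math


def sms_segments(text: str, gsm7: bool) -> int:
    """Estimate how many SMS segments text needs.

    gsm7 is supplied by the caller (hypothetical flag): True means every
    character is assumed to be in the basic GSM-7 alphabet.
    """
    if gsm7:
        units = len(text)  # assumption: one septet per character
        single, multi = 160, 153
    else:
        # UCS-2 messages are measured in 16-bit code units.
        units = len(text.encode("utf-16-le")) // 2
        single, multi = 70, 67
    if units <= single:
        return 1
    # Concatenated messages lose room to the per-part header.
    return math.ceil(units / multi)


print(sms_segments("a" * 161, gsm7=True))
print(sms_segments("\U0001F600" * 36, gsm7=False))
```

Treat the result as an estimate: carriers and gateways may split differently, for example to avoid breaking a surrogate pair across parts.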
A mismatch between grapheme and codepoint counts hints at combining marks or zero-width joiners you may not have noticed.
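To track down where such a mismatch comes from, a small Python sketch like the one below can list the codepoints that usually cause it. The helper name hidden_codepoints is invented for this example. It flags only combining marks (Unicode categories starting with "M", which include variation selectors) and the zero-width joiner, so it is a starting point and not a full audit.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner


def hidden_codepoints(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for each combining mark or ZWJ in text."""
    found = []
    for i, ch in enumerate(text):
        if ch == ZWJ or unicodedata.category(ch).startswith("M"):
            # Fall back to the U+XXXX form if the codepoint has no name.
            found.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found


print(hidden_codepoints("e\u0301"))
print(hidden_codepoints("\U0001F468\u200d\U0001F469"))
```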
Graphemes are what a user perceives, codepoints are Unicode scalar values, and bytes are the serialized UTF-8 form. All three are shown, along with the UTF-16 code unit count.
A grapheme is anything the Unicode text segmentation algorithm (UAX #29) groups into one cluster, including emoji with modifiers and flags.
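The sketch below approximates that grouping with the Python standard library only, to show why modifiers, joiners, and flag pairs do not add to the grapheme total. It is a deliberate simplification and not the UAX #29 algorithm: it ignores Hangul syllable rules, CRLF, and tag sequences, and it joins anything that follows a zero-width joiner. The name approx_graphemes is made up for this example.

```python
import unicodedata


def approx_graphemes(text: str) -> int:
    """Approximate the grapheme cluster count (simplified, not full UAX #29)."""
    count = 0
    prev_zwj = False
    ri_pending = False  # True while a lone regional indicator awaits its pair
    for ch in text:
        cp = ord(ch)
        is_ri = 0x1F1E6 <= cp <= 0x1F1FF  # regional indicators (flag halves)
        is_zwj = ch == "\u200d"
        extends = (
            is_zwj
            or 0x1F3FB <= cp <= 0x1F3FF  # emoji skin-tone modifiers
            or unicodedata.category(ch).startswith("M")  # combining marks
        )
        if count == 0:
            count = 1  # the first codepoint always opens a cluster
            ri_pending = is_ri
        elif extends or prev_zwj:
            ri_pending = False  # stays inside the current cluster
        elif is_ri and ri_pending:
            ri_pending = False  # second half of a flag pair
        else:
            count += 1
            ri_pending = is_ri
        prev_zwj = is_zwj
    return count


print(approx_graphemes("\U0001F1FA\U0001F1F8"))  # two regional indicators
print(approx_graphemes("\U0001F44D\U0001F3FD"))  # emoji plus skin-tone modifier
```

For production counting, a library that implements the full segmentation rules is the safer choice.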
Yes, whitespace counts: spaces, tabs, and newlines are characters. To exclude it, strip the whitespace before counting.
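Casing does not affect how letters are counted: an uppercase and a lowercase letter are one grapheme each. Converting text between cases can occasionally change its length, though (ß uppercases to SS).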