unicode.
paste anything: see grapheme segmentation, codepoints, utf-8 byte lengths, and all four normalization forms. reveals the gap between "what i typed" and "what the computer sees".
graphemes: 36 (what you read as 'characters')
chars: 39 (js string .length, utf-16 units)
codepoints: 38
utf-8 bytes: 67
ascii safe: no
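the page does not show how these counters are computed. a minimal sketch of one way to get the same five figures in javascript, assuming a runtime that ships `Intl.Segmenter` and `TextEncoder` (recent browsers, node 16+). the names `measure` and `sample` are hypothetical, not taken from the tool.

```javascript
// Hypothetical helper: one possible way to compute the counters, not the tool's own code.
function measure(text) {
  // Intl.Segmenter with granularity "grapheme" splits on extended grapheme cluster boundaries.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: [...segmenter.segment(text)].length,
    chars: text.length,                                // UTF-16 code units
    codepoints: [...text].length,                      // the string iterator walks code points
    utf8Bytes: new TextEncoder().encode(text).length,  // length of the UTF-8 encoding
    asciiSafe: /^[\x00-\x7F]*$/.test(text),            // true only if every unit is below U+0080
  };
}

// The sample input, written with escapes so the invisible ZWJ (U+200D) is explicit.
const sample =
  "Pok\u00E9mon \u{1F981}\u200D\u2B1B \u2014 caf\u00E9 \u00B7 " +
  "\uD55C\uAE00 \u00B7 \u4F60\u597D \u00B7 " +
  "\u0627\u0644\u0639\u0631\u0628\u064A\u0629";

console.log(measure(sample));
```

the grapheme count depends on the unicode data bundled with the runtime, so older engines may report a different number for the emoji zwj sequence; the other four figures depend only on the encoding.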
P
U+0050
o
U+006F
k
U+006B
é
U+00E9
m
U+006D
o
U+006F
n
U+006E
·
U+0020
🦁⬛
U+1F981 U+200D U+2B1B ⚭ 3
·
U+0020
—
U+2014
·
U+0020
c
U+0063
a
U+0061
f
U+0066
é
U+00E9
·
U+0020
·
U+00B7
·
U+0020
한
U+D55C
글
U+AE00
·
U+0020
·
U+00B7
·
U+0020
你
U+4F60
好
U+597D
·
U+0020
·
U+00B7
·
U+0020
ا
U+0627
ل
U+0644
ع
U+0639
ر
U+0631
ب
U+0628
ي
U+064A
ة
U+0629
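the listing above pairs each grapheme with the codepoints inside it. a rough sketch of how such a breakdown could be produced, again assuming `Intl.Segmenter` is available; `describeGraphemes` is a hypothetical name, and the `U+XXXX` formatting is a guess at the tool's convention based on the output shown.

```javascript
// Hypothetical helper: split text into graphemes and list each grapheme's codepoints.
function describeGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].map(({ segment }) => ({
    grapheme: segment,
    // Spreading a string yields one entry per code point, so surrogate pairs stay whole.
    codepoints: [...segment].map(
      (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    ),
  }));
}

// Example: the emoji ZWJ sequence, a space, and a precomposed e-acute.
for (const { grapheme, codepoints } of describeGraphemes("\u{1F981}\u200D\u2B1B \u00E9")) {
  console.log(JSON.stringify(grapheme), codepoints.join(" "));
}
```

a grapheme whose codepoint list has more than one entry, like the lion sequence, is where "what i typed" and "what the computer sees" diverge.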
── normalization forms
NFC: canonical composition
Pokémon 🦁‍⬛ — café · 한글 · 你好 · العربية
67 bytes, = input
NFD: canonical decomposition
Pokémon 🦁‍⬛ — café · 한글 · 你好 · العربية
81 bytes, ≠ input
NFKC: compatibility composition
Pokémon 🦁‍⬛ — café · 한글 · 你好 · العربية
67 bytes, = input
NFKD: compatibility decomposition
Pokémon 🦁‍⬛ — café · 한글 · 你好 · العربية
81 bytes, ≠ input
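the four forms look identical on screen, so the byte counts are the only visible difference. a small sketch of how the comparison could be reproduced with the standard `String.prototype.normalize`; `normalizationReport` is a hypothetical name, not part of the tool.

```javascript
// Hypothetical helper: normalize to each form and compare against the input.
function normalizationReport(text) {
  const encoder = new TextEncoder();
  return ["NFC", "NFD", "NFKC", "NFKD"].map((form) => {
    const normalized = text.normalize(form);
    return {
      form,
      utf8Bytes: encoder.encode(normalized).length,
      equalsInput: normalized === text, // exact code unit comparison
    };
  });
}

// Same sample as on the page, with escapes so the precomposed characters are explicit.
const input =
  "Pok\u00E9mon \u{1F981}\u200D\u2B1B \u2014 caf\u00E9 \u00B7 " +
  "\uD55C\uAE00 \u00B7 \u4F60\u597D \u00B7 " +
  "\u0627\u0644\u0639\u0631\u0628\u064A\u0629";

for (const row of normalizationReport(input)) {
  console.log(row.form, row.utf8Bytes + " bytes", row.equalsInput ? "= input" : "≠ input");
}
```

the 14 extra bytes in the decomposed forms come from two sources: each é becomes e plus U+0301 (1 extra byte each), and each hangul syllable becomes three conjoining jamo (6 extra bytes each).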