CharacterNormalize
CharacterNormalize[text, form]
converts the characters in text to the normalization form specified by form.
Details

- CharacterNormalize supports the following Unicode normalization forms:
    "NFD"    canonical decomposition (Form D)
    "NFC"    canonical decomposition, followed by canonical composition (Form C)
    "NFKD"   compatibility decomposition (Form KD)
    "NFKC"   compatibility decomposition, followed by canonical composition (Form KC)
- In CharacterNormalize[text,…], text can be a string or a list of strings.
- In "NFD" and "NFC", canonical decomposition refers to these four types of operations:
    Å → A + U+030A, …                          decompose marks
    Ȱ → O + U+0307 + U+0304, …                 decompose and order marks
    한 → U+1112 U+1161 U+11AB, …                decompose Hangul into conjoining Jamo
    Ω (Ohm, U+2126) → Ω (Omega, U+03A9), …     map a character to its canonical Unicode equivalent
- In "NFKD" and "NFKC", compatibility decomposition refers to operations such as:
    ℌ → H, ℍ → H, …              normalize font variants
    (NBSP) → (Space), …           normalize line-breaking differences
    ﻉ → ع, ﻊ → ع, …              normalize positional variants
    ① → 1, …                     normalize circled variants
    ｶ → カ, …                     normalize width variants
    ︷ → {, ︸ → }, …              normalize rotated variants
    i⁹ → i9, i₉ → i9, …           normalize subscripts/superscripts
    ㌀ → アパート, …               decompose squared characters
    ¼ → 1⁄4 (U+2044), …           normalize fractions
    ǆ → dž, …                     other normalizations
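The compatibility mappings listed above come from the Unicode Character Database, so any conforming normalizer should agree on them. As an outside cross-check (not the Wolfram Language implementation), the following minimal sketch uses Python's standard unicodedata module, assumed here only as a convenient reference normalizer, to apply "NFC" and "NFKC" to characters from the table. Only the compatibility form changes them.

```python
import unicodedata

# Characters from the compatibility table, written as escapes so the
# exact code points are unambiguous.
samples = {
    "\u210C": "black-letter H",
    "\u00A0": "no-break space",
    "\uFEC9": "Arabic ain, isolated form",
    "\u2460": "circled digit one",
    "\uFF76": "halfwidth katakana ka",
    "\uFE37": "vertical left curly bracket",
    "i\u2079": "i with superscript nine",
    "\u3300": "squared apaato",
    "\u00BC": "vulgar fraction one quarter",
    "\u01C6": "dz with caron digraph",
}

nfc = {s: unicodedata.normalize("NFC", s) for s in samples}
nfkc = {s: unicodedata.normalize("NFKC", s) for s in samples}

for s, label in samples.items():
    # NFC leaves these characters alone; NFKC replaces them.
    print(f"{label:30} NFC unchanged: {nfc[s] == s}  NFKC: {nfkc[s]!r}")
```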
Examples
Basic Examples (5)
Normalize the characters in a string using canonical decomposition ("NFD"):

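For comparison outside the Wolfram Language, here is a minimal sketch of the same "NFD" operation with Python's standard unicodedata module (an illustrative stand-in, not CharacterNormalize). The sample word is an assumption chosen for the example.

```python
import unicodedata

word = "caf\u00E9"  # "café" with precomposed é (U+00E9)
nfd = unicodedata.normalize("NFD", word)

# é is split into "e" plus U+0301 COMBINING ACUTE ACCENT.
print(len(word), len(nfd))                     # 4 5
print([f"U+{ord(c):04X}" for c in nfd[-2:]])   # ['U+0065', 'U+0301']
```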

Normalize the characters in a string using compatibility decomposition ("NFKD"):

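A minimal sketch of "NFKD" using Python's unicodedata module as a stand-in (not the Wolfram implementation; the input string is a hypothetical example). The fi ligature and the superscript two have no canonical decomposition, so "NFD" leaves them alone, but compatibility decomposition replaces both.

```python
import unicodedata

text = "\uFB01le\u00B2"  # "ﬁle²": fi ligature, "le", superscript two
nfd = unicodedata.normalize("NFD", text)
nfkd = unicodedata.normalize("NFKD", text)

print(nfd == text)  # True: no canonical decomposition applies
print(nfkd)         # file2
```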

Normalize the characters in a string using compatibility decomposition followed by canonical composition ("NFKC"):

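The difference between "NFKD" and "NFKC" is the final canonical composition step. A minimal sketch with Python's unicodedata module (assumed here as a reference normalizer, not CharacterNormalize itself) shows that both forms expand the ligature, but only "NFKC" recombines e and the combining acute accent into a single é.

```python
import unicodedata

text = "\uFB01anc\u00E9"  # "ﬁancé" with a fi ligature and precomposed é
nfkd = unicodedata.normalize("NFKD", text)
nfkc = unicodedata.normalize("NFKC", text)

# NFKD: f, i, a, n, c, e, U+0301 -> 7 code points
# NFKC: f, i, a, n, c, é          -> 6 code points
print(len(nfkd), len(nfkc))  # 7 6
```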

Normalize the characters in a string using canonical decomposition followed by canonical composition ("NFC"):

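"NFC" is typically used to make canonically equivalent strings compare equal. A minimal sketch with Python's unicodedata module (a stand-in, not the Wolfram function; the strings are illustrative assumptions):

```python
import unicodedata

precomposed = "Caf\u00E9"   # é as a single code point
decomposed = "Cafe\u0301"   # e followed by a combining acute accent

# The raw strings differ, although they are canonically equivalent.
print(precomposed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True

# NFC does not touch compatibility characters such as the fi ligature.
print(unicodedata.normalize("NFC", "\uFB01") == "\uFB01")        # True
```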

Normalize the characters in the string using compatibility decomposition:


Characters with diacritics have been decomposed:

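After compatibility decomposition, each diacritic is a separate combining mark, which makes the marks easy to inspect or filter. A minimal sketch with Python's unicodedata module (assumed as a stand-in; the mark-stripping step is a common follow-up written for illustration, not part of the normalization itself):

```python
import unicodedata

word = "\u00C5ngstr\u00F6m"  # "Ångström"
nfkd = unicodedata.normalize("NFKD", word)

# Combining marks have a nonzero canonical combining class.
marks = [unicodedata.name(c) for c in nfkd if unicodedata.combining(c)]
print(marks)  # ['COMBINING RING ABOVE', 'COMBINING DIAERESIS']

# Illustrative follow-up: drop the marks to keep only the base letters.
base = "".join(c for c in nfkd if not unicodedata.combining(c))
print(base)  # Angstrom
```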

Scope (2)
Decompose a composite character into its constituents:

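Canonical decomposition is applied recursively until no further mapping exists. A minimal sketch with Python's unicodedata module (a stand-in, not CharacterNormalize; the character is chosen for illustration) lists the constituents of Ȱ by name:

```python
import unicodedata

char = "\u0230"  # Ȱ LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON
parts = unicodedata.normalize("NFD", char)

print([unicodedata.name(c) for c in parts])
# ['LATIN CAPITAL LETTER O', 'COMBINING DOT ABOVE', 'COMBINING MACRON']
```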

Ordering of the mark and the character has changed after normalization:

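The reordering comes from the Unicode canonical ordering rule: consecutive combining marks are sorted by their canonical combining class. A minimal sketch with Python's unicodedata module (assumed as a stand-in; the input string is a hypothetical example):

```python
import unicodedata

# "a" followed by an acute accent (class 230) and then a dot below (class 220).
s = "a\u0301\u0323"
print(unicodedata.combining("\u0301"), unicodedata.combining("\u0323"))  # 230 220

nfd = unicodedata.normalize("NFD", s)
# Canonical ordering moves the lower-class mark (dot below) first.
print(nfd == "a\u0323\u0301")  # True
```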

Obtain the "Ohm" character from its code:


NFD maps characters to their canonical Unicode equivalents. Normalize the character using NFD:


Convert the output (omega) to its character code:

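The Ohm sign has a singleton canonical decomposition to the Greek capital omega, so the same round trip can be checked with Python's unicodedata module (a stand-in for illustration, not the Wolfram function):

```python
import unicodedata

ohm = chr(0x2126)  # OHM SIGN
result = unicodedata.normalize("NFD", ohm)

print(unicodedata.name(result), ord(result))  # GREEK CAPITAL LETTER OMEGA 937
```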

Generalizations & Extensions (1)
CharacterNormalize threads elementwise over lists:

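Python's unicodedata.normalize accepts only a single string, so the elementwise behavior has to be written as an explicit map. A minimal sketch under that assumption (the list contents are illustrative):

```python
import unicodedata

# Decomposed é, ANGSTROM SIGN, and the fi ligature.
words = ["Cafe\u0301", "\u212B", "\uFB01"]

# Apply NFC to each element, mirroring list threading.
nfc = [unicodedata.normalize("NFC", w) for w in words]
print(nfc == ["Caf\u00E9", "\u00C5", "\uFB01"])  # True
```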

CharacterNormalize works on strings in different scripts:

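Normalization is defined for all of Unicode, not only Latin letters. A minimal sketch with Python's unicodedata module (a stand-in; the sample strings are assumptions) decomposes Hangul syllables and accented Greek and Cyrillic letters, then checks that "NFC" restores each original:

```python
import unicodedata

samples = {
    "Korean": "\uD55C\uAE00",   # 한글: two Hangul syllables
    "Greek": "\u03AC",          # ά: alpha with tonos
    "Cyrillic": "\u0439",       # й: short i
}

for script, text in samples.items():
    nfd = unicodedata.normalize("NFD", text)
    # NFD splits each syllable or accented letter; NFC recombines it.
    print(script, len(text), len(nfd), unicodedata.normalize("NFC", nfd) == text)
```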

Possible Issues (1)
Compatibility normalization may map different forms of a character to a single common form:

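Because several distinct characters fold to the same result, compatibility normalization cannot be inverted. A minimal sketch with Python's unicodedata module (a stand-in; the variants are chosen for illustration) folds five forms of the digit two into one:

```python
import unicodedata

# Superscript, subscript, circled, fullwidth, and plain forms of "2".
variants = ["\u00B2", "\u2082", "\u2461", "\uFF12", "2"]

folded = {unicodedata.normalize("NFKC", v) for v in variants}
print(folded)  # {'2'}
```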

Compatibility normalization may remove formatting distinctions that canonical normalization preserves:

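A no-break space is one such formatting distinction. A minimal sketch with Python's unicodedata module (a stand-in; the sample text is hypothetical) shows "NFC" keeping it while "NFKC" replaces it with an ordinary space:

```python
import unicodedata

text = "10\u00A0km"  # the no-break space keeps "10" and "km" on one line

nfc = unicodedata.normalize("NFC", text)
nfkc = unicodedata.normalize("NFKC", text)

print(nfc == text)       # True: canonical normalization keeps U+00A0
print(nfkc == "10 km")   # True: compatibility normalization uses U+0020
```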

Wolfram Research (2020), CharacterNormalize, Wolfram Language function, https://reference.wolfram.com/language/ref/CharacterNormalize.html.