CharacterNormalize["text",form]

converts the characters in text to the specified normalization form.

Details

  • CharacterNormalize supports the following Unicode normalization forms:
    "NFD", canonical decomposition (Form D)
    "NFC", canonical decomposition, followed by canonical composition (Form C)
    "NFKD", compatibility decomposition (Form KD)
    "NFKC", compatibility decomposition, followed by canonical composition (Form KC)
  • In CharacterNormalize[text,form], text can be a string or a list of strings.
  • In "NFD" and "NFC", canonical decomposition refers to these four types of operations:
    Å → A + ◌̊, decompose marks
    Ȱ → O + ◌̇ + ◌̄, decompose and order marks
    한 → ᄒ + ᅡ + ᆫ, decompose Hangul syllables into conjoining Jamo
    Ω (Ohm) → Ω (Omega), map a character to its canonical Unicode equivalent
  • In "NFKD" and "NFKC", compatibility decomposition refers to operations such as:
    ℌ → H, normalize font variants
    NBSP (no-break space) → space, normalize line-breaking differences
    ﻋ → ع, normalize positional variants
    ① → 1, normalize circled variants
    Ａ → A, normalize width variants
    ︷ → {, ︸ → }, normalize rotated variants
    i⁹ → i9, i₉ → i9, normalize subscripts and superscripts
    ㌀ → アパート, decompose squared characters
    ¼ → 1⁄4 (with U+2044 FRACTION SLASH, not an ASCII slash), normalize fractions
    ǆ → dž, other normalizations
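The operations listed above can be spot-checked outside the Wolfram Language with Python's standard-library unicodedata.normalize, which implements the same four Unicode normalization forms. This is a sketch and does not call CharacterNormalize itself. The sample characters (for example ℌ and Ａ) are illustrative choices. The sketch shows that canonical mappings are applied by NFD, while compatibility mappings are ignored by NFC and applied only by NFKC.

```python
import unicodedata

# Canonical decompositions: applied by NFD (and NFC).
CANONICAL = {
    "\u00C5": "A\u030A",             # Å -> A + combining ring above (decompose marks)
    "\u0230": "O\u0307\u0304",       # Ȱ -> O + dot above + macron
    "\uD55C": "\u1112\u1161\u11AB",  # 한 -> conjoining Jamo ᄒ ᅡ ᆫ
    "\u2126": "\u03A9",              # Ohm sign -> Greek capital omega
}

# Compatibility mappings: applied only by NFKD/NFKC (results shown after NFKC).
COMPATIBILITY = {
    "\u210C": "H",                   # ℌ, font variant
    "\u00A0": " ",                   # no-break space -> ordinary space
    "\uFECB": "\u0639",              # Arabic ain, initial form -> ain
    "\u2460": "1",                   # ①, circled variant
    "\uFF21": "A",                   # Ａ, fullwidth variant
    "\uFE37": "{",                   # ︷, vertical presentation form
    "i\u2079": "i9",                 # i⁹, superscript
    "i\u2089": "i9",                 # i₉, subscript
    "\u3300": "\u30A2\u30D1\u30FC\u30C8",  # ㌀ -> アパート, squared characters
    "\u00BC": "1\N{FRACTION SLASH}4",      # ¼ -> 1⁄4
    "\u01C6": "d\u017E",             # ǆ -> dž
}

for src, expected in CANONICAL.items():
    assert unicodedata.normalize("NFD", src) == expected

for src, expected in COMPATIBILITY.items():
    # Canonical composition leaves compatibility characters unchanged ...
    assert unicodedata.normalize("NFC", src) == src
    # ... while NFKC replaces them with their compatibility equivalents.
    assert unicodedata.normalize("NFKC", src) == expected
```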

Examples


Basic Examples

Normalize string characters using canonical decomposition:

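A comparable canonical decomposition can be sketched in Python with the standard-library unicodedata.normalize, which implements the same NFD form. This is not CharacterNormalize, and the sample string "café" is chosen for illustration.

```python
import unicodedata

text = "caf\u00E9"  # "café" with a precomposed é (U+00E9)
nfd = unicodedata.normalize("NFD", text)

# NFD splits é into e + U+0301 COMBINING ACUTE ACCENT.
print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0063', 'U+0061', 'U+0066', 'U+0065', 'U+0301']
```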

Normalize string characters using compatibility decomposition:

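Compatibility decomposition can be illustrated the same way with Python's unicodedata.normalize, used here as a stand-in for CharacterNormalize. The sample string, containing the ﬁ ligature and a superscript two, is an assumption. Neither character has a canonical decomposition, so only NFKD changes it.

```python
import unicodedata

text = "\uFB01nal x\u00B2"  # "ﬁnal x²": fi ligature and superscript two

print(unicodedata.normalize("NFD", text) == text)  # True: nothing canonical to decompose
print(unicodedata.normalize("NFKD", text))         # final x2
```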

Normalize string characters using compatibility decomposition followed by canonical composition:

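The difference between NFKD and NFKC is the final canonical composition step. A minimal Python sketch with unicodedata.normalize (not CharacterNormalize) shows this on an illustrative string "ﬁancé". Both forms expand the ligature, but only NFKC recombines e + ◌́ into é.

```python
import unicodedata

text = "\uFB01anc\u00E9"  # "ﬁancé": ligature plus precomposed é
nfkd = unicodedata.normalize("NFKD", text)
nfkc = unicodedata.normalize("NFKC", text)

print(len(nfkd), len(nfkc))  # 7 6
```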

Normalize string characters using canonical decomposition followed by canonical composition:

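As a sketch using Python's unicodedata.normalize in place of CharacterNormalize, NFC recombines a base letter and a combining mark. It leaves compatibility characters such as the ﬁ ligature untouched. The sample string is illustrative.

```python
import unicodedata

text = "Cafe\u0301 \uFB01"  # e + combining acute, then the fi ligature
nfc = unicodedata.normalize("NFC", text)

print(nfc == "Caf\u00E9 \uFB01")  # True: é recomposed, ligature kept
```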

Normalize the characters in the string using compatibility decomposition:

Characters with diacritics have been decomposed:

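Once diacritics have been decomposed, the marks are separate code points that can be inspected individually. The following Python sketch uses unicodedata rather than CharacterNormalize, with "Ångström" as an assumed sample. It lists the combining marks left after NFKD.

```python
import unicodedata

text = "\u00C5ngstr\u00F6m"  # "Ångström" with precomposed Å and ö
decomposed = unicodedata.normalize("NFKD", text)

# Combining marks now follow their base letters as separate code points.
marks = [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]
print(marks)  # ['COMBINING RING ABOVE', 'COMBINING DIAERESIS']
```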

Scope

Decompose a composite character into its constituents:

Ordering of the mark and the character has changed after normalization:

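Decomposition and mark reordering can be sketched in Python with unicodedata (a stand-in, not CharacterNormalize). The input is ḋ (d with dot above) followed by a combining dot below, a standard example chosen here for illustration. NFD decomposes ḋ and then sorts the marks by canonical combining class, so the dot below (class 220) moves before the dot above (class 230).

```python
import unicodedata

text = "\u1E0B\u0323"  # ḋ followed by COMBINING DOT BELOW
nfd = unicodedata.normalize("NFD", text)
nfc = unicodedata.normalize("NFC", text)

print([unicodedata.combining(c) for c in nfd])  # [0, 220, 230]
# NFC then composes d + dot below into ḍ, leaving the dot above as a mark.
print(nfc == "\u1E0D\u0307")  # True
```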

Obtain the "Ohm" character from its code:

NFD maps characters to their canonical Unicode equivalents. Normalize the character using NFD:

Convert the output (omega) to its character code:

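The same three steps can be sketched in Python with unicodedata, as an analogue of the Wolfram Language example rather than a reproduction of it. The steps are: build the Ohm sign from its code point, normalize it with NFD, and read back the resulting code point.

```python
import unicodedata

ohm = chr(0x2126)                          # OHM SIGN, built from its code point
omega = unicodedata.normalize("NFD", ohm)  # singleton canonical decomposition

print(unicodedata.name(omega), hex(ord(omega)))  # GREEK CAPITAL LETTER OMEGA 0x3a9
```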

Generalizations & Extensions

CharacterNormalize threads elementwise over lists:

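Python's unicodedata.normalize accepts only a single string, so a list comprehension stands in for the automatic threading described above. The helper name normalize_all is hypothetical, and the sample list is illustrative.

```python
import unicodedata

def normalize_all(strings, form="NFC"):
    """Hypothetical helper: apply one normalization form to every string in a list."""
    return [unicodedata.normalize(form, s) for s in strings]

print(normalize_all(["\u2460", "\uFF21", "x\u00B2"], "NFKC"))  # ['1', 'A', 'x2']
```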

CharacterNormalize works on strings in different scripts:

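Normalization applies uniformly across scripts. This Python sketch uses unicodedata rather than CharacterNormalize, with one illustrative precomposed character each from Greek, Cyrillic and Hangul. It decomposes each character with NFD and confirms that NFC restores the original.

```python
import unicodedata

samples = {"Greek": "\u03AC", "Cyrillic": "\u0439", "Hangul": "\uD55C"}  # ά, й, 한

for script, s in samples.items():
    nfd = unicodedata.normalize("NFD", s)
    # NFC composes the decomposed sequence back into the precomposed character.
    assert unicodedata.normalize("NFC", nfd) == s
    print(script, len(s), "->", len(nfd))
```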

Possible Issues

Compatibility normalization may convert different forms of a character to a single common form:

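The following Python sketch uses unicodedata as a stand-in for CharacterNormalize, with an illustrative set of H variants. NFKC collapses the script, black-letter, double-struck and fullwidth forms into a plain H. NFC keeps all five distinct.

```python
import unicodedata

variants = ["H", "\u210B", "\u210C", "\u210D", "\uFF28"]  # H, ℋ, ℌ, ℍ, Ｈ

print({unicodedata.normalize("NFKC", v) for v in variants})      # {'H'}
print(len({unicodedata.normalize("NFC", v) for v in variants}))  # 5
```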

Compatibility normalization may remove formatting distinctions that canonical normalization preserves:

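A small Python check with unicodedata (not CharacterNormalize) shows the loss. The helper same_after is hypothetical and the strings are illustrative. Under NFKC, "x²" and "x2" become indistinguishable, whereas NFC keeps them apart and only merges true canonical equivalents.

```python
import unicodedata

def same_after(form, a, b):
    """Hypothetical helper: do a and b compare equal after normalizing with form?"""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_after("NFC", "x\u00B2", "x2"))      # False: superscript kept
print(same_after("NFKC", "x\u00B2", "x2"))     # True: the exponent is lost
print(same_after("NFC", "\u00E9", "e\u0301"))  # True: canonical equivalents
```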

Wolfram Research (2020), CharacterNormalize, Wolfram Language function, https://reference.wolfram.com/language/ref/CharacterNormalize.html.