CharacterNormalize
CharacterNormalize["text",form]
converts the characters in text to the specified normalization form.
Details
- CharacterNormalize supports the following Unicode normalization forms:
    "NFD"     canonical decomposition (Form D)
    "NFC"     canonical decomposition, followed by canonical composition (Form C)
    "NFKD"    compatibility decomposition (Form KD)
    "NFKC"    compatibility decomposition, followed by canonical composition (Form KC)
- In CharacterNormalize[text,…], text can be a string or a list of strings.
- In "NFD" and "NFC", canonical decomposition refers to these four types of operations:
    Å → Å, …                   decompose marks
    Ȱ → Ȱ, …                   decompose and order marks
    한 → 한, …                   decompose Hangul and conjoining Jamo
    Ω (Ohm) → Ω (Omega), …     map character to its canonical Unicode equivalent
- In "NFKD" and "NFKC", compatibility decomposition refers to operations such as:
    ℌ → H, ℍ → H, …            normalize font variants
    (NBSP) → (Space), …        normalize linebreaking differences
    ﻉ → ع, ﻊ → ع, …            normalize positional variants
    ① → 1, …                   normalize circled variants
    ｶ → カ, …                   normalize width variants
    ︷ → {, ︸ → }, …            normalize rotated variants
    i⁹ → i9, i₉ → i9, …        normalize subscripts/superscripts
    ㌀ → アパート, …              decompose squared characters
    ¼ → 1/4, …                 normalize fractions
    dž → dž, …                  other normalizations
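The difference between the canonical and compatibility forms can be checked by inspecting character codes. The following is a minimal sketch, assuming CharacterNormalize applies the standard Unicode mappings; codesFor and doubleStruckH are hypothetical names introduced only for illustration.

```wolfram
(* hypothetical helper: character codes of a string after normalization *)
codesFor[s_String, form_String] := ToCharacterCode[CharacterNormalize[s, form]]

doubleStruckH = FromCharacterCode[16^^210D];  (* ℍ, a font variant of H *)

codesFor[doubleStruckH, "NFD"]   (* {8461}: no canonical decomposition, so unchanged *)
codesFor[doubleStruckH, "NFKD"]  (* {72}: compatibility mapping to plain "H" *)
```

Only the compatibility forms fold the font variant into the plain letter; the canonical forms leave it alone.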
Examples
Basic Examples (5)
Normalize string characters using canonical decomposition:
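The original input for this example is not preserved here. As an illustrative sketch (the chosen string and the variable name nfdCodes are assumptions), a precomposed Å can be decomposed with "NFD" and checked through its character codes:

```wolfram
(* "\[CapitalARing]" is the precomposed character U+00C5 *)
nfdCodes = ToCharacterCode[CharacterNormalize["\[CapitalARing]", "NFD"]]
(* {65, 778}: "A" followed by the combining ring above, U+030A *)
```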
Normalize string characters using compatibility decomposition:
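The original input for this example is not preserved here. A small sketch under that caveat, using the circled digit from the Details section (the variable names are illustrative):

```wolfram
circledOne = FromCharacterCode[16^^2460];  (* ①, U+2460 *)

nfkdResult = CharacterNormalize[circledOne, "NFKD"]
(* "1": the circled variant is replaced by the plain digit *)
```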
Normalize string characters using compatibility decomposition followed by canonical composition:
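The original input for this example is not preserved here. The sketch below, with an assumed input string and illustrative variable names, shows both halves of "NFKC": the superscript is folded to a plain digit, and the loose combining mark is recomposed with its base letter.

```wolfram
(* "A" + combining ring above (U+030A) + superscript nine (U+2079) *)
mixed = FromCharacterCode[{65, 778, 16^^2079}];

nfkcCodes = ToCharacterCode[CharacterNormalize[mixed, "NFKC"]]
(* {197, 57}: precomposed "Å" (U+00C5) followed by the digit "9" *)
```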
Normalize string characters using canonical decomposition followed by canonical composition:
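The original input for this example is not preserved here. As a hedged sketch with assumed inputs, "NFC" recomposes a base letter and combining mark into one character, and maps the Ohm sign to its canonical equivalent, the Greek capital omega:

```wolfram
decomposed = FromCharacterCode[{65, 778}];  (* "A" + combining ring above *)
ohmSign = FromCharacterCode[16^^2126];      (* Ω, the Ohm sign *)

nfcCodes = ToCharacterCode[CharacterNormalize[decomposed, "NFC"]]
(* {197}: the single precomposed character U+00C5 *)

ohmCodes = ToCharacterCode[CharacterNormalize[ohmSign, "NFC"]]
(* {937}: Greek capital omega, U+03A9 *)
```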
Normalize the characters in the string using compatibility decomposition:
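The original input for this example is not preserved here. One possible illustration, using the vulgar fraction from the Details section (input and variable names are assumptions):

```wolfram
oneQuarter = FromCharacterCode[16^^00BC];  (* ¼, U+00BC *)

quarterCodes = ToCharacterCode[CharacterNormalize[oneQuarter, "NFKD"]]
(* {49, 8260, 52}: "1", the fraction slash U+2044, and "4" *)
```

Note that the separator in the result is the fraction slash U+2044, not the ASCII solidus.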
Scope (2)
Generalizations & Extensions (1)
CharacterNormalize threads itself elementwise over lists:
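The original input for this example is not preserved here. A minimal sketch of the list case, with assumed inputs and an illustrative variable name, might look like this:

```wolfram
(* a list of two strings: precomposed Å (U+00C5) and circled one (U+2460) *)
inputs = {FromCharacterCode[16^^00C5], FromCharacterCode[16^^2460]};

listCodes = ToCharacterCode[CharacterNormalize[inputs, "NFKD"]]
(* {{65, 778}, {49}}: each string is normalized independently *)
```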
CharacterNormalize works on strings of different scripts and letters:
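The original input for this example is not preserved here. As a sketch with assumed inputs, the Hangul and Arabic cases from the Details section can be checked through character codes:

```wolfram
hangul = FromCharacterCode[16^^D55C];  (* 한, a precomposed Hangul syllable *)
arabic = FromCharacterCode[16^^FEC9];  (* ﻉ, isolated presentation form of ain *)

hangulCodes = ToCharacterCode[CharacterNormalize[hangul, "NFD"]]
(* {4370, 4449, 4523}: conjoining Jamo U+1112, U+1161, U+11AB *)

arabicCodes = ToCharacterCode[CharacterNormalize[arabic, "NFKD"]]
(* {1593}: the base letter ain, U+0639 *)
```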
Text
Wolfram Research (2020), CharacterNormalize, Wolfram Language function, https://reference.wolfram.com/language/ref/CharacterNormalize.html.