Text Normalization
The Wolfram Language provides powerful knowledge-based tools for normalizing text in preparation for text analysis, visualization, etc.
Character-Level Normalization
ToLowerCase, ToUpperCase — convert all characters to lowercase or uppercase
IgnoreCase — option to ignore case of letters
RemoveDiacritics — remove diacritics such as accents, umlauts, etc.
CharacterNormalize — reduce or decompose characters to normal forms (e.g. ¼ → 1⁄4, ï → i + combining diaeresis)
Transliterate — transliterate to ASCII or other writing scripts
PrintableASCIIQ — test if a string contains only printable ASCII characters
CharacterEncoding — specify the character encoding to assume
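As a minimal sketch of how some of the character-level functions above might be chained, the following lowercases a string, strips its diacritics, and then checks that the result is printable ASCII. The sample string and the variable names (raw, lower, plain, asciiQ, caseBlind) are hypothetical and chosen only for illustration.

```wolfram
(* Hypothetical sample input; any string would do *)
raw = "Café NAÏVE";

(* Fold case first, then remove accents and umlauts *)
lower = ToLowerCase[raw];          (* "café naïve" *)
plain = RemoveDiacritics[lower];   (* "cafe naive" *)

(* After both steps only printable ASCII characters remain *)
asciiQ = PrintableASCIIQ[plain];   (* True *)

(* IgnoreCase is an option, here passed to StringMatchQ *)
caseBlind = StringMatchQ["ABC", "abc", IgnoreCase -> True];   (* True *)
```

The order of the two normalization steps does not matter for this input; it is shown this way only to keep each intermediate value easy to inspect.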
Structural String Normalization
StringSplit — split a string at newlines or other delimiters
StringDelete — delete substrings or patterns
StringReplace — replace substrings or patterns
StringDrop ▪ StringTake ▪ StringCases
StringTrim — trim whitespace or other patterns from strings
StringPadLeft, StringPadRight — pad to fixed width
StringExtract — extract specified parts of strings
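The structural functions above can be combined to clean up a delimited record. The sketch below assumes a hypothetical comma-separated line with stray whitespace; the variable names are illustrative only.

```wolfram
(* Hypothetical comma-separated record with stray whitespace *)
line = "  alpha, beta ,gamma  ";

(* Split at commas, then trim whitespace from each field *)
fields = StringTrim /@ StringSplit[line, ","];   (* {"alpha", "beta", "gamma"} *)

(* Pad a short string on the left to a fixed width *)
id = StringPadLeft["7", 3, "0"];                 (* "007" *)

(* Delete or replace substrings and patterns *)
letters = StringDelete["a1b22c", DigitCharacter ..];   (* "abc" *)
date = StringReplace["2024-01-15", "-" -> "/"];        (* "2024/01/15" *)

(* Take, drop, and extract pieces of strings *)
head = StringTake["normalize", 4];               (* "norm" *)
tail = StringDrop["normalize", 4];               (* "alize" *)
digits = StringCases["a1b22c", DigitCharacter ..];   (* {"1", "22"} *)
second = StringExtract["a b c d", 2];            (* "b" *)
```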
Text-Level Normalization
TextSentences — extract a list of sentences
TextWords — extract a list of words
DeleteStopwords — delete standard stopwords ("the", "and", etc.)
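A short sketch of the text-level functions above, using a hypothetical two-sentence sample. Expected results are omitted for the stopword step because the exact stopword list is an assumption here; the variable names are illustrative only.

```wolfram
(* Hypothetical sample text *)
text = "This is one sentence. This is another.";

(* Break the text into sentences and into words *)
sentences = TextSentences[text];   (* a list of two sentence strings *)
words = TextWords[text];           (* a list of word strings, punctuation removed *)

(* Remove standard stopwords from a string, leaving the content words *)
content = DeleteStopwords["The quick brown fox jumps over the lazy dog"];
```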
Content Extraction
TextCases — extract symbolically specified elements
Containing ▪ Alternatives ▪ Entity
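The following sketch suggests how TextCases might be used with the constructs listed above. The sample text is hypothetical, the content types "Country", "City", and "Sentence" are assumed to be available in the installed version, and no expected output is shown because the results depend on the underlying recognition models, which may need to be downloaded on first use.

```wolfram
(* Hypothetical sample text *)
text = "Paris is the capital of France. Berlin is the capital of Germany.";

(* Extract mentions of a single content type *)
countries = TextCases[text, "Country"];

(* Alternatives (written with |) matches any of several content types *)
places = TextCases[text, "City" | "Country"];

(* Containing picks out larger units that contain a given type;
   here, sentences that mention a country (an assumed combination) *)
countrySentences = TextCases[text, Containing["Sentence", "Country"]];
```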
Morphological & Linguistic Normalization
WordStem — reduce a word to its stem
DictionaryLookup — look up a word in dictionaries
Interpreter — convert to many forms from natural language
SpellingCorrectionList — list of spelling suggestions for misspelled words
DictionaryWordQ — test if a word is a correctly spelled dictionary word
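As a rough illustration of the linguistic functions above, the sketch below stems a word, checks dictionary membership, looks up words by pattern, and interprets a numeric string. The inputs are hypothetical, and outputs are noted only where they do not depend on the installed dictionary data.

```wolfram
(* Reduce an inflected word to its stem *)
stem = WordStem["running"];            (* "run" *)

(* Test whether a string is a dictionary word *)
known = DictionaryWordQ["apple"];      (* True *)

(* Look up dictionary words matching a string pattern *)
matches = DictionaryLookup["normal" ~~ ___];

(* Suggest corrections for a misspelled word; results depend on the dictionary *)
suggestions = SpellingCorrectionList["appel"];

(* Interpret a string as a number *)
n = Interpreter["Number"]["42"];       (* 42 *)
```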
Language Translation
LanguageIdentify — identify what language a text is in
WordTranslation — give translations for a word
TextTranslation — translate text using an integrated external service
Word List Normalization
AlphabeticSort — sort strings into alphabetic order
WordCounts ▪ LetterCounts ▪ CharacterCounts
WordFrequency — frequency of words or n-grams in text
WordFrequencyData — data on overall word frequencies in typical text
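The counting and sorting functions above return results that can be queried directly. This is a minimal sketch on hypothetical inputs; individual counts are looked up by key rather than relying on the ordering of the returned association.

```wolfram
(* Hypothetical sample text *)
text = "the cat and the hat and the bat";

(* Count words; the result is an association from word to count *)
counts = WordCounts[text];
theCount = counts["the"];   (* 3 *)
andCount = counts["and"];   (* 2 *)

(* Count letters, and count all characters including punctuation *)
letterA = LetterCounts["banana"]["a"];   (* 3 *)
dashes = CharacterCounts["a-b-a"]["-"];  (* 2 *)

(* Relative frequency of a word within the text *)
freq = WordFrequency[text, "the"];

(* Sort strings alphabetically rather than by character code *)
sorted = AlphabeticSort[{"banana", "Apple", "cherry"}];   (* {"Apple", "banana", "cherry"} *)
```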
LLM-Based Normalization
LLMResourceFunction — apply operations from the Wolfram Prompt Repository
LLMExampleFunction ▪ LLMFunction ▪ LLMTool ▪ ...
Normalization of External Data
Import — import data from files or the web
"Text", "PDF", "TeX", "HTML" — import formats from which plain text, table data, etc. can be picked out
ImportString — convert a string with a particular external format
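ImportString makes it possible to try out format conversion without touching a file or the web. The sketch below uses hypothetical inline data; Import accepts the same format specifications for file paths and URLs.

```wolfram
(* Parse a CSV string into a nested list; numeric fields become numbers *)
table = ImportString["a,b\n1,2", "CSV"];   (* {{"a", "b"}, {1, 2}} *)

(* Pull the plain text out of an HTML fragment *)
plain = ImportString["<p>Hello <b>world</b></p>", {"HTML", "Plaintext"}];
```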