"LanguageExtended" (Built-in Classifier)

Identify which natural language the text is in.

Classes

Details

  • This classifier assumes the input text is written in a single language. The probabilities reflect the belief about which language the text is written in, not the proportion of each language in the text.
  • In the current version, each text must be written in one of the official alphabets of its language.

Examples


Basic Examples  (2)

Determine the languages of a list of examples:
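
A minimal sketch of such a call; the sample sentences are illustrative and not taken from the original page:

```wolfram
(* Classify several short texts in one call; the sentences are made-up samples *)
Classify["LanguageExtended",
  {"the house is blue", "la maison est bleue", "la casa es azul"}]
```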

Obtain the probabilities for the most likely languages:
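
One way this might look, using the "TopProbabilities" property with an illustrative sentence and an arbitrary count of 3:

```wolfram
(* "TopProbabilities" -> 3 requests the three most likely classes with their probabilities *)
Classify["LanguageExtended", "the house is blue", "TopProbabilities" -> 3]
```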

Obtain a ClassifierFunction for this classifier:

Apply the classifier to a list of texts:
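
A sketch of both steps together; the variable name c and the sample texts are assumptions made for illustration:

```wolfram
(* Load the built-in classifier as a ClassifierFunction *)
c = Classify["LanguageExtended"];

(* Apply it to a list of illustrative texts *)
c[{"good morning", "guten Morgen", "buongiorno"}]
```

Keeping the ClassifierFunction in a variable avoids reloading the classifier for each call.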

Scope  (2)

Load the ClassifierFunction corresponding to the built-in classifier:

Obtain the possible classes:
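
For instance, assuming the classifier is stored in a variable named c (a name chosen here for illustration), the class list can be queried with Information:

```wolfram
(* Load the built-in classifier *)
c = Classify["LanguageExtended"];

(* List the classes (languages) the classifier can return *)
Information[c, "Classes"]
```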

Load the ClassifierFunction corresponding to the built-in classifier:

Apply the classifier to a list of text samples:

Transliterate sample texts and apply the classifier to the transliterated texts:
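
A rough sketch of this workflow; the sample texts are illustrative, and the classification of the transliterated strings is not guaranteed to match that of the originals:

```wolfram
(* Load the built-in classifier *)
c = Classify["LanguageExtended"];

(* Illustrative samples written in non-Latin scripts *)
texts = {"Привет, как дела?", "Καλημέρα, τι κάνεις;"};

(* Classify the original texts *)
c[texts]

(* Transliterate each text with the default target, then classify again *)
latin = Transliterate /@ texts;
c[latin]
```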

Options  (3)

ClassPriors  (1)

Use a custom ClassPriors to restrict the possible outputs:
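
A minimal sketch, assuming that Entity["Language", "Spanish"] and Entity["Language", "Italian"] are among the classes of the classifier; the input text and the prior values are arbitrary:

```wolfram
c = Classify["LanguageExtended"];

(* Give prior weight only to two candidate languages; the entity names are
   assumed to match entries of Information[c, "Classes"] *)
c["la casa",
  ClassPriors -> <|
    Entity["Language", "Spanish"] -> 0.5,
    Entity["Language", "Italian"] -> 0.5|>]
```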

IndeterminateThreshold  (1)

Use a custom IndeterminateThreshold:
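
As an illustration, with an arbitrary threshold of 0.9 and a short, deliberately ambiguous input chosen for this sketch:

```wolfram
c = Classify["LanguageExtended"];

(* If the highest class probability is below 0.9, the classifier returns Indeterminate *)
c["no", IndeterminateThreshold -> 0.9]
```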

UtilityFunction  (1)

Obtain the utility function of the classifier:

Modify this utility function to penalize misclassification as a given language:

Classify the text using this new utility:

Compare to the result with the default utility:

Possible Issues  (1)

In some cases, text that is not written in any natural language is still classified as a language:
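
A quick way to probe this behavior, using an arbitrary string of keyboard characters as a stand-in for non-language input:

```wolfram
(* The input is not text in any natural language; the classifier may still return a language *)
Classify["LanguageExtended", "qwrt zxcv bnmp"]
```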

Neat Examples  (1)

Obtain different possible languages with their corresponding probabilities:

Visualize the result using WordCloud:
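
A sketch of both steps; the input text and the count of 10 are illustrative choices:

```wolfram
(* Retrieve the ten most likely languages with their probabilities *)
probs = Classify["LanguageExtended", "la casa", "TopProbabilities" -> 10];

(* WordCloud accepts an association of item -> weight, so normalize the result to an Association *)
WordCloud[Association[probs]]
```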

Introduced in 2020 (12.1)