Wolfram Language & System Documentation Center

FeatureExtraction

FeatureExtraction[examples]

generates a FeatureExtractorFunction[…] trained from the examples given.

FeatureExtraction[examples,spec]

uses the specified feature extractor method spec.

FeatureExtraction[examples,spec,props]

gives the feature extraction properties specified by props.

Details and Options

FeatureExtraction is typically used to define a function that processes raw data into usable features (e.g. for training a machine learning algorithm).
FeatureExtraction can be used on many types of data, including numerical, textual, sounds, images, graphs and time series, as well as combinations of these.
Possible values of examples are:

	{example₁,…}	a list of training examples
	Dataset[…]	a Dataset object
	Tabular[…]	a Tabular object
	None	no training examples

Each exampleᵢ can be a single data element, a list of data elements or an association of data elements.
Possible values for spec include:

	extractor	use the specified extractor method
	part->extractor	apply the extractor to the specified part of each example
	{part₁->extractor₁,…}	specify extractors for specific parts

Possible feature extractor methods extractor include:

	Automatic	automatic extraction
	Identity	give data unchanged
	"ConformedData"	conformed images, colors, dates, etc.
	"NumericVector"	numeric vector from any data
	"name"	a named extractor method
	f	apply function f to each example
	{extractor₁,extractor₂,…}	use a sequence of extractors in turn

Possible forms of part are:

	All	all parts of each example
	i	ith part of each example
	{i₁,i₂,…}	parts i₁, i₂, … of each example
	"key"	part with the specified key in each example
	{"key₁","key₂",…}	parts with names "keyᵢ" in each example

When explicitly specifying parts, any unmentioned parts are dropped when extracting features.

Extractors

FeatureExtraction[examples] is equivalent to FeatureExtraction[examples,Automatic], which is typically equivalent to FeatureExtraction[examples,"NumericVector"].
The "NumericVector" method will typically convert examples to numeric vectors, impute missing data and reduce the dimension using DimensionReduction.
Feature extractor methods specific to a single data type are applied only to data elements of a compatible type. Other data elements are returned unchanged.
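The type-based dispatch described above can be pictured with a small helper. applyToStrings is a hypothetical name used only for this sketch: it applies a text-only transformation to the string elements of an example and passes every other element through unchanged. It illustrates the stated behavior; it is not how FeatureExtraction is implemented.

```wolfram
(* Hypothetical helper, not a built-in function: apply the text-only
   transformation f to string elements and return other elements as they are *)
applyToStrings[f_][example_List] := If[StringQ[#], f[#], #] & /@ example

(* The number and the Missing[] element are passed through untouched *)
applyToStrings[ToLowerCase][{"The Nice Cat", 3.7, Missing[]}]
```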
Not all specific feature extractors are available when examples is None.
The specific extractors are:
Numeric data:

	"DiscretizedVector"	discretized numerical data
	"DimensionReducedVector"	reduced-dimension numeric vectors
	"MissingImputed"	data with missing values imputed
	"StandardizedVector"	numeric data processed with Standardize

Nominal data:

	"IndicatorVector"	nominal data "one-hot encoded" with indicator vectors
	"IntegerVector"	nominal data encoded with integers

Text:

	"LowerCasedText"	text with each character lowercase
	"SegmentedCharacters"	text segmented into characters
	"SegmentedWords"	text segmented into words
	"SentenceVector"	semantic vector from a text
	"TFIDF"	term frequency-inverse document frequency vector
	"WordVectors"	sequence of semantic vectors from a text (English only)

Images:

	"FaceFeatures"	semantic vector from an image of a human face
	"ImageFeatures"	semantic vector from an image
	"PixelVector"	vector of pixel values from an image

Audio objects:

	"AudioFeatures"	sequence of semantic vectors from an audio object
	"AudioFeatureVector"	semantic vector from an audio object
	"LPC"	audio linear prediction coefficients
	"MelSpectrogram"	audio spectrogram with logarithmic frequency bins
	"MFCC"	sequence of audio mel-frequency cepstral coefficient vectors
	"SpeakerFeatures"	sequence of semantic speaker vectors
	"SpeakerFeatureVector"	semantic vector for a speaker
	"Spectrogram"	audio spectrogram

Video objects:

	"VideoFeatures"	sequence of semantic vectors from a video object
	"VideoFeatureVector"	semantic vector from a video object

Graphs:

	"GraphFeatures"	numeric vector summarizing graph properties

Molecules:

	"AtomPairs"	Boolean vector from pairs of atoms and the path lengths between them
	"MoleculeExtendedConnectivity"	Boolean vector from enumerated molecule subgraphs
	"MoleculeFeatures"	numeric vector summarizing molecule properties
	"MoleculeTopologicalFeatures"	Boolean vector from circular atom neighborhoods

Properties

In FeatureExtraction[examples,spec,props], props can be a single property or a list of properties. Possible properties include:

	"ExtractorFunction"	FeatureExtractorFunction[…] (default)
	"ExtractedFeatures"	examples after feature extraction
	"ReconstructedData"	examples after extraction and inverse extraction
	"FeatureDistance"	FeatureDistance[…] generated from the extractor

The "ExtractedFeatures" and "ReconstructedData" properties are not available when examples is None.
The "ReconstructedData" property can be computed only when every specified extractor is invertible.

Options

The following options can be given:

	FeatureNames	Automatic	names to assign to elements of the exampleᵢ
	FeatureTypes	Automatic	feature types to assume for elements of the exampleᵢ
	RandomSeeding	1234	what seeding of pseudorandom generators should be done internally

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed
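The effect of a fixed seed such as the default 1234 can be seen with SeedRandom on its own: reseeding the generator with the same value makes the following pseudorandom draws repeat. This only illustrates the general mechanism; it is an assumption that the internal seeding controlled by RandomSeeding behaves the same way, so that repeated calls with the default setting give the same result.

```wolfram
(* Seed the generator, draw three reals, then reseed and draw again *)
SeedRandom[1234];
a = RandomReal[1, 3];
SeedRandom[1234];
b = RandomReal[1, 3];

(* Same seed, same sequence of draws *)
a == b
```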

Examples


Basic Examples (3)

Train a FeatureExtractorFunction on a simple dataset:

Wolfram Language code: fe = FeatureExtraction[{{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}}]

Extract features from a new example:

Wolfram Language code: fe[{2.4, "A"}]

Extract features from a list of examples:

Wolfram Language code: fe[{{2.4, "A"}, {3.7, "B"}}]

Train a feature extractor on a dataset of images:

Wolfram Language code: fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image]}]

Use the feature extractor on the training set:

Wolfram Language code: fe[{[image], [image], [image], [image], [image], [image]}]

Specify a particular extractor:

Wolfram Language code: fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image]}, "ImageFeatures"]

Scope (32)

Input Shape (9)

Train a feature extractor on a list of examples with a single feature:

Wolfram Language code:

fe = FeatureExtraction[{"It was the best of times.", "A journey of a thousand miles begins with a single step.", "To be or not to be, that is the question."}]

Extract features from a new example:

Wolfram Language code: fe["A rose by any other name"]

Extract features from multiple new examples:

Wolfram Language code: fe[{"A rose by any other name", "It was the worst of times"}]

Train a feature extractor on a list of examples with multiple features:

Wolfram Language code:

fe = FeatureExtraction[{{"It was the best of times.", "Charles Dickens"}, {"A journey of a thousand miles begins with a single step.", "Laozi (attrib.)"}, {"To be or not to be, that is the question.", "William Shakespeare"}}]

Extract features from multiple new examples:

Wolfram Language code:

fe[{{"All the world’s a stage, and all the men and women merely players", "William Shakespeare"}, {"Knowing others is intelligence; knowing yourself is true wisdom", "Laozi (attrib.)"}}]

Train a feature extractor on a mixed-type dataset:

Wolfram Language code:

fe = FeatureExtraction[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}]

Extract features from a new example:

Wolfram Language code: fe[{"the nice cat", [image]}]

Train a feature extractor from a list of associations:

Wolfram Language code:

fe = FeatureExtraction[{<|"age" -> 32, "height" -> 160, "gender" -> "female"|>, 
	<|"height" -> 183, "age" -> 41, "gender" -> "female"|>, 
	<|"height" -> 123, "age" -> 30, "gender" -> "female"|>, 
	<|"height" -> 175, "age" -> 21, "gender" -> "male"|>, 
	<|"height" -> 150, "age" -> 11, "gender" -> "male"|>, 
	<|"age" -> 52, "height" -> 164, "gender" -> "female"|>}]

Extract features from a new example:

Wolfram Language code: fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]

Extract features from multiple new examples:

Wolfram Language code: fe[{<|"age" -> 19, "height" -> 176, "gender" -> "male"|>, <|"age" -> 45, "height" -> 164, "gender" -> "female"|>}]

Train a feature extractor from data given as feature lists:

Wolfram Language code:

FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]
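The feature-list form is a column-oriented layout of the same six examples used in the list-of-associations example earlier in this section. As a sketch of that correspondence (not of how FeatureExtraction reads its input), the columns can be turned back into one association per example:

```wolfram
(* Column-oriented data: one list of values per feature *)
cols = <|"age" -> {32, 41, 30, 21, 11, 52},
   "height" -> {160, 183, 123, 175, 150, 164},
   "gender" -> {"female", "female", "female", "male", "male", "female"}|>;

(* Transpose the value lists into rows, then pair each row with the keys *)
rows = AssociationThread[Keys[cols], #] & /@ Transpose[Values[cols]];

First[rows]
```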

Train a feature extractor from a Tabular:

Wolfram Language code:

FeatureExtraction[Tabular[{<|"age" -> 32, "height" -> 160, "gender" -> "female"|>, 
	<|"age" -> 41, "height" -> 183, "gender" -> "female"|>, 
	<|"age" -> 30, "height" -> 123, "gender" -> "female"|>, 
	<|"age" -> 21, "height" -> 175, "gender" -> "male"|>, 
	<|"age" -> 11, "height" -> 150, "gender" -> "male"|>, 
	<|"age" -> 52, "height" -> 164, "gender" -> "female"|>}]]

Train a feature extractor from a Dataset:

Wolfram Language code:

FeatureExtraction[Dataset[{Association["age" -> 32, "height" -> 160, "gender" -> "female"], 
  Association["age" -> 41, "height" -> 183, "gender" -> "female"], 
  Association["age" -> 30, "height" -> 123, "gender" -> "female"], 
  Association["age" -> 21, "height" -> 175, "gender" -> "male"], 
  Association["age" -> 11, "height" -> 150, "gender" -> "male"], 
  Association["age" -> 52, "height" -> 164, "gender" -> "female"]}]]

Train a feature extractor from a dataset that contains missing values:

Wolfram Language code: FeatureExtraction[{{1.4, Missing[], "A"}, {1.5, 50.2, "A"}, {Missing[], 42.3, "B"}, {5.4, 61.7, "B"}}]

Define a feature extractor that requires no training:

Wolfram Language code: fe = FeatureExtraction[None, "WordVectors"]

Apply it on some text:

Wolfram Language code: fe["hi there"]//Shallow

Extractor Specifications (10)

Specify the feature extractor "SentenceVector" on a single textual feature:

Wolfram Language code: fe = FeatureExtraction[{"the lizard is green"}, "SentenceVector"]

Apply it on some text:

Wolfram Language code: fe[{"there is a large cat"}]//Short

Train a feature extractor using the "StandardizedVector" method:

Wolfram Language code: fe = FeatureExtraction[{{1.4, 42.1}, {1.5, 50.2}, {4.2, 42.3}, {5.4, 61.7}}, "StandardizedVector"]

Extract features from a new example:

Wolfram Language code: features = fe[{6.4, 32.1}]

Since this feature extractor is invertible, the FeatureExtractorFunction property "OriginalData" can be used to perform the inverse extraction:

Wolfram Language code: fe[features, "OriginalData"]
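Standardization is an affine map on each component, which is why it can be inverted. The sketch below assumes that "StandardizedVector" centers by the mean and scales by the standard deviation, which is what Standardize does by default; standardize and unstandardize are hypothetical names, and the values returned by fe may differ in detail.

```wolfram
trainPoints = {{1.4, 42.1}, {1.5, 50.2}, {4.2, 42.3}, {5.4, 61.7}};

(* Column means and standard deviations learned from the training data *)
mu = Mean[trainPoints];
sigma = StandardDeviation[trainPoints];

(* Forward map: center, then scale each component *)
standardize[x_] := (x - mu)/sigma
(* Inverse map: undo the scaling, then the centering *)
unstandardize[z_] := z sigma + mu

unstandardize[standardize[{6.4, 32.1}]]
```

Applied to the training points, standardize yields columns with mean 0 and standard deviation 1, and unstandardize recovers the original vector up to floating-point rounding.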

Train a feature extractor on text using the "TFIDF" method followed by the "DimensionReducedVector" method:

Wolfram Language code:

fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, {"TFIDF", "DimensionReducedVector"}]

Extract features on the training set:

Wolfram Language code: fe[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Train a feature extractor on texts and images using the text-only "TFIDF" method:

Wolfram Language code:

fe = FeatureExtraction[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}, "TFIDF"]

Features will only be extracted from the text part:

Wolfram Language code: fe[{"the nice cat", [image]}]

Specify the feature extraction on multiple features by position:

Wolfram Language code:

fe = FeatureExtraction[{{"Glucose", Molecule["Glucose"]}, {"Water", Molecule["Water"]}, {"Acetic Acid", Molecule["Acetic Acid"]}}, {1  -> { "SentenceVector", "DimensionReducedVector"}, 2  -> "MoleculeFeatures"}]

Use the feature extractor on new features:

Wolfram Language code: fe[{{"Sucrose", Molecule["Sucrose"]}}]

A list of two items will be assumed to be a single input of two features:

Wolfram Language code: fe[{"Hydrochloric Acid", Molecule["Hydrochloric Acid"]}]

Train a feature extractor with the "IndicatorVector" method on only the second nominal variable:

Wolfram Language code: fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, 2 -> "IndicatorVector"]

The first nominal variable is dropped:

Wolfram Language code: Normal@fe[{"Yes", "A"}]

Use the Identity extractor method to copy the first variable:

Wolfram Language code:

fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> Identity}]

The first variable is copied:

Wolfram Language code: fe[{"Yes", "A"}]

A variable can be copied multiple times:

Wolfram Language code:

fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> "IndicatorVector", 1 -> Identity}]

Wolfram Language code: fe[{"Yes", "A"}]

Specify the feature extraction on multiple features by key:

Wolfram Language code:

fe = FeatureExtraction[Tabular[{<|"Name" -> "Glucose", "Molecule" -> Molecule["Glucose"]|>, 
	<|"Name" -> "Water", "Molecule" -> Molecule["Water"]|>, 
	<|"Name" -> "Acetic Acid", "Molecule" -> Molecule["Acetic Acid"]|>}], 
	{"Name" -> {"SentenceVector", "DimensionReducedVector"}, "Molecule" -> "MoleculeFeatures"}]

Use the feature extractor on new features:

Wolfram Language code: fe[<|"Name" -> "Hydrochloric Acid", "Molecule" -> Molecule["Hydrochloric Acid"]|>]

Using the feature extractor on a list will assume the same ordering of features as originally specified:

Wolfram Language code: fe[{"Hydrochloric Acid", Molecule["Hydrochloric Acid"]}]

Generate a feature extractor using a custom function:

Wolfram Language code:

data = {DateObject[{2014, 5, 5}, TimeObject[{9, 53, 6.30158}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2000, 1, 1}, TimeObject[{0, 0, 0.}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2006, 12}], DateObject[{2007, 8, 23}], DateObject[{2016, 4, 4}, TimeObject[{15, 59, 18.2738}, TimeZone -> -4.], TimeZone -> -4.]};

Wolfram Language code: fe = FeatureExtraction[data, {AbsoluteTime[#], #["Year"]}&]

Apply the extractor on the training set:

Wolfram Language code: fe[data]
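A custom extractor is an ordinary function of one example. Evaluating the same pure function on a single DateObject shows the two-element vector it contributes per example: AbsoluteTime gives the number of seconds since the beginning of 1900 (the exact value depends on the date's time zone), and the "Year" property gives the calendar year. The name dateFeatures is introduced only for this illustration.

```wolfram
(* Same pure function as in the extractor above, given a name for reuse *)
dateFeatures = {AbsoluteTime[#], #["Year"]} &;

(* Returns {seconds since the start of 1900, calendar year} *)
dateFeatures[DateObject[{2007, 8, 23}]]
```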

Chain the custom extractor with the "StandardizedVector" method:

Wolfram Language code: fe2 = FeatureExtraction[data, {{AbsoluteTime[#], #["Year"]}&, "StandardizedVector"}]

Wolfram Language code: fe2[data]

Conform data prior to processing:

Wolfram Language code: FeatureExtraction[{[image], [image], [image], [image]}, {"ConformedData", "ImageFeatures"}]

Reduce the dimensionality of the output:

Wolfram Language code: FeatureExtraction[{[image], [image], [image], [image]}, {"ImageFeatures", "DimensionReducedVector"}]

Feature Types (10)

Create a feature extractor for textual data using the "SentenceVector" extractor with no training:

Wolfram Language code: fe = FeatureExtraction[None, "SentenceVector"]

Input type is inferred from the specified extractor. Use the feature extractor on some examples:

Wolfram Language code: fe[{"it is not a cat", "what a nice dog", "here is a dog again"}]//Short

Create a feature extractor for examples with implicit textual and image features:

Wolfram Language code: fe = FeatureExtraction[None, {1 -> "SentenceVector", 2 -> "ImageFeatures"}]

Features will be extracted from both parts:

Wolfram Language code: fe[{"the nice cat", [image]}]//Short

Train a feature extractor on textual data:

Wolfram Language code: FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Train a feature extractor with the "IndicatorVector" method on nominal variables:

Wolfram Language code: FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, "IndicatorVector"]
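The two nominal encodings listed under Details can be written out by hand. In this sketch the categories are ordered alphabetically with Union; that ordering, and the helper names indicator and integerCode, are assumptions for illustration, and "IndicatorVector" and "IntegerVector" may order or encode categories differently.

```wolfram
values = {"Yes", "No", "No", "Maybe", "No"};

(* Sorted list of distinct categories: {"Maybe", "No", "Yes"} *)
classes = Union[values];

(* One-hot (indicator) encoding: 1 in the slot of the category, 0 elsewhere *)
indicator[v_] := Boole[# === v] & /@ classes

(* Integer encoding: position of the category in the sorted list *)
integerCode[v_] := First[FirstPosition[classes, v]]

{indicator /@ values, integerCode /@ values}
```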

Train a feature extractor to compute term frequency-inverse document frequency vectors from texts:

Wolfram Language code: fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"]

The term frequency-inverse document frequency matrix of the training set is returned as a SparseArray:

Wolfram Language code: matrix = fe[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Visualize the matrix:

Wolfram Language code: matrix//MatrixPlot

The "TFIDF" method can also be used on tokenized data (nominal bags):

Wolfram Language code:

FeatureExtraction[{{"the", "cat", "is", "grey"}, {"my" , "cat", "is", "fast"}, {"this", "dog", "is", "scary"}, {"the", "big", "dog"}}, "TFIDF", "ExtractedFeatures"]
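For intuition, the textbook TF-IDF weighting (raw term count times the logarithm of the number of documents divided by the number of documents containing the term) can be computed by hand on the same token lists. The "TFIDF" method may use a different variant, for example with smoothing or normalization, so these numbers are not expected to match its output; tf, df and idf are hypothetical helper names.

```wolfram
docs = {{"the", "cat", "is", "grey"}, {"my", "cat", "is", "fast"},
   {"this", "dog", "is", "scary"}, {"the", "big", "dog"}};

vocabulary = Union[Flatten[docs]];
nDocs = Length[docs];

(* Term frequency: raw count of the term in one document *)
tf[doc_, term_] := Count[doc, term]
(* Document frequency: number of documents that contain the term *)
df[term_] := Count[docs, _?(MemberQ[#, term] &)]
(* Inverse document frequency: rarer terms get larger weights *)
idf[term_] := Log[nDocs/df[term]]

(* One row per document, one column per vocabulary term *)
tfidf = Table[tf[doc, term] idf[term], {doc, docs}, {term, vocabulary}];
```

A term that occurred in every document would get weight Log[1] = 0; here "is" appears in three of the four documents and so receives the small weight Log[4/3].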

Train a feature extractor on a list of DateObject instances:

Wolfram Language code:

fe = FeatureExtraction[{DateObject[{2014, 5, 5, 9, 53, 6.30158}, "Instant", "Gregorian", -6.], DateObject[{2000, 1, 1, 0, 0, 0.}, "Instant", "Gregorian", -6.], DateObject[{2006, 12}, "Month", "Gregorian", -6.], DateObject[{2007, 8, 23}, "Day", "Gregorian", -6.], DateObject[{2016, 4, 4, 15, 59, 18.2738}, "Instant", "Gregorian", -4.]}]

Extract features from a new DateObject:

Wolfram Language code: fe[DateObject[{2003, 1, 2}, "Day", "Gregorian", -6.]]

A string date can also be given:

Wolfram Language code: fe["2nd of January 2003"]

Train a feature extractor on a list of Graph instances:

Wolfram Language code:

fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]}]

Extract features from a new graph:

Wolfram Language code: fe[[image]]

Train a feature extractor on a list of TimeSeries instances:

Wolfram Language code:

FeatureExtraction[{TemporalData[TimeSeries, {{{0, 1, 0, 3, 0, 0, 0, 0, 2, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 
    0, 0, 0, 0, 0, 2, 0, 0, 36, 0, 6, 1, 2, 8, 6, 24, 20, 31, 68, 45, 140, 116, 65, 376, 322, 382, 
    516, 544, 767, 1133, 1788, 1360, 5886, 5412 ... tion[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, {DateFunction -> Automatic, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 23, 2, 1, 3, 5, 4, 13, 6, 11, 9, 20, 11, 6, 
    23, 14, 38, 50, 86, 66, 103, 37, 121, 70, 1 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 9, 0, 7, 5, 6, 7, 14, 99, 0, 11, 38, 
    121, 51, 249, 172, 228, 572, 331, 323, 307,  ... TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, 
    {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 9, 0, 4, 0, 3, 0, 8, 17, 14, 4, 27, 
    24, 33, 52, 54, 53, 61, 71, 57, 163, 182, 196 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 6, 0, 4, 9, 12, 20, 11, 
    28, 9, 26, 68, 35, 46, 101, 92, 21, 48, 69 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2]}]

Train a feature extractor on Molecule data:

Wolfram Language code: FeatureExtraction[{Molecule["Sucrose"], Molecule["Pentane"]}]

Train a feature extractor on a list of Audio instances:

Wolfram Language code:

FeatureExtraction[{Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Violin"], {i, 0, 12}]]], Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Trumpet"], {i, 0, 12}]]]}]

Information (3)

Get Information from a trained FeatureExtractorFunction:

Wolfram Language code:

Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]]

Find the available properties:

Wolfram Language code:

Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]], "Properties"]

Get information about the input and output types:

Wolfram Language code:

Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]], {"InputTypes", "OutputTypes"}]

Options (4)

FeatureNames (2)

Train a feature extractor and give a name to each feature:

Wolfram Language code:

fe = FeatureExtraction[{{2.3, "male"}, {4.8, Missing[]}, {Missing[], "female"}, {5.2, "female"}}, FeatureNames -> {"age", "gender"}]

Use the association format to extract features from a new example:

Wolfram Language code: fe[<|"age" -> 3.3, "gender" -> "male"|>]

The list format can still be used:

Wolfram Language code: fe[{3.3, "male"}]
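With FeatureNames -> {"age", "gender"}, a list-format example lines up with the names by position. AssociationThread shows that correspondence; this is only a way to picture the mapping, and it is an assumption that fe pairs list elements with names in exactly this way.

```wolfram
(* Pair the feature names with the values of a list-format example *)
AssociationThread[{"age", "gender"}, {3.3, "male"}]
```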

Use FeatureNames to set up names and refer to them in FeatureExtraction[examples,{spec₁->ext₁,…}]:

Wolfram Language code:

fe = FeatureExtraction[{{"A", "female"}, {"B", "female"}, {"C", "male"}, {"B", "male"}}, {"class" -> Identity, "gender" -> "IndicatorVector"}, FeatureNames -> {"class", "gender"}]

Extract features on a new example using the names to specify the features:

Wolfram Language code: fe[<|"gender" -> "female", "class" -> "B"|>]

FeatureTypes (2)

Train a feature extractor with the "IndicatorVector" method on a simple dataset:

Wolfram Language code: fe = FeatureExtraction[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector"]

As the "IndicatorVector" method acts only on nominal features, the first feature has been assumed to be nominal.

Use FeatureTypes to enforce the interpretation of the first feature as numerical:

Wolfram Language code:

fe = FeatureExtraction[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector", FeatureTypes -> <|1 -> "Numerical"|>]

The first feature has been interpreted as numerical. Since the "IndicatorVector" method only acts on nominal features, the first feature will be unchanged:

Wolfram Language code: fe[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}]

Creating a feature extractor with no training infers the expected data type from the specified extractor:

Wolfram Language code: fe = FeatureExtraction[None, "SentenceVector"]

Specifying the feature type will override the assumption:

Wolfram Language code: fe2 = FeatureExtraction[None, "SentenceVector", FeatureTypes -> {"first" -> "Numerical", "second" -> "Text"}]

Apply to named features:

Wolfram Language code: fe2[<|"first" -> 1, "second" -> "Good morning!"|>]//Short

Applications (3)

Image Search (1)

Construct a dataset of dog images:

Wolfram Language code:

dataset = {[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]};

Train an extractor function from this dataset:

Wolfram Language code: fe = FeatureExtraction[dataset]

Generate a NearestFunction on the extracted features of the dataset:

Wolfram Language code: nf = Nearest[fe[dataset] -> Automatic]

Using the NearestFunction, construct a function that displays the nearest image of the dataset:

Wolfram Language code: nearestdog = dataset[[First@nf[fe[#]]]]&

Use this function on images that are not in the dataset:

Wolfram Language code: nearestdog[[image]]
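The indexing in nearestdog relies on Nearest[data -> Automatic] returning positions rather than elements. The same mechanics can be checked on three hand-picked 2D points, which stand in here for extracted feature vectors:

```wolfram
points = {{0., 0.}, {1., 1.}, {5., 5.}};

(* With "-> Automatic", the NearestFunction returns positions, not elements *)
nearestIndex = Nearest[points -> Automatic];

(* Position of the point closest to the query, then the point itself *)
idx = First[nearestIndex[{0.9, 1.2}]];
points[[idx]]
```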

This feature extractor function can also be used to delete image pairs that are too similar:

Wolfram Language code: features = # -> fe[#]& /@ dataset;

Wolfram Language code: First /@ DeleteDuplicates[features, CosineDistance[#1[[2]], #2[[2]]] < 1 &]
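The duplicate test above thresholds CosineDistance, which is 1 minus the cosine of the angle between two vectors. For nonzero vectors, a distance below 1 only requires a positive cosine, so the threshold of 1 is permissive. On small hand-made vectors (standing in for image features), a stricter threshold of 0.1 keeps only one of two nearly parallel vectors:

```wolfram
vectors = {{1., 0.}, {0.9, 0.1}, {0., 1.}};

(* 0 for parallel vectors, 1 for orthogonal ones *)
CosineDistance[{1., 0.}, {0., 1.}]

(* Treat two vectors as duplicates when their cosine distance is below 0.1 *)
DeleteDuplicates[vectors, CosineDistance[#1, #2] < 0.1 &]
```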

Text Search (1)

Load the text of Alice in Wonderland:

Wolfram Language code: alice = ExampleData[{"Text", "AliceInWonderland"}];

Split the text into sentences:

Wolfram Language code: sentences = TextSentences[alice];

Train a feature extractor on these sentences:

Wolfram Language code: fe = FeatureExtraction[sentences]

Generate a NearestFunction with the sentences' features:

Wolfram Language code: nf = Nearest[fe[sentences] -> Automatic]

Using the NearestFunction, construct a function that displays the nearest sentence in Alice in Wonderland:

Wolfram Language code: nearestalice = sentences[[First@nf[fe[#]]]]&

Use this function with a few queries:

Wolfram Language code: nearestalice["Alice and the Rabbit"]

Wolfram Language code: nearestalice["The cat and the Queen"]

Wolfram Language code: nearestalice["Off her head"]

Imputation (1)

Load the "MNIST" dataset from ExampleData and keep the images:

Wolfram Language code: digits = First /@ ExampleData[{"MachineLearning", "MNIST"}, "TestData"];

Wolfram Language code: RandomSample[digits, 10]

Convert images to numerical data and separate the dataset into a training set and a test set:

Wolfram Language code: digits = Flatten[ImageData[#]]& /@ RandomSample[digits];

Wolfram Language code:

trainingset = digits[[ ;; 9000]];
testset = digits[[9001 ;; ]];

Each vector in the training set has dimension 784 (28×28 pixels):

Wolfram Language code: Dimensions[trainingset]

Create a feature extractor using the "MissingImputed" method:

Wolfram Language code: fe = FeatureExtraction[trainingset, "MissingImputed"]

Replace some values of a test-set vector by Missing[] and visualize it:

Wolfram Language code: vector = RandomChoice[testset];

Wolfram Language code: toimage = Image[Partition[#, 28], ImageSize -> Tiny] &;

Wolfram Language code:

vectormissing = vector;
vectormissing[[309 ;; 364]] = Missing[];
imagemissing = toimage[Replace[vectormissing, _Missing -> .5, {1}]]

Impute missing values using the FeatureExtractorFunction[…]:

Wolfram Language code: imputedimage = toimage[fe[vectormissing]]

Visualize the original image, the image with missing values, and the imputed image:

Wolfram Language code: {toimage[vector], imagemissing, imputedimage}
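
To put a rough number on the imputation quality, compare the imputed pixels with the original ones in the masked range. This is a sketch added for illustration, assuming (as the Partition in toimage implies) that fe returns a vector of the same length, 784:

Wolfram Language code: Mean[Abs[fe[vectormissing][[309 ;; 364]] - vector[[309 ;; 364]]]]

Since the pixel values from ImageData lie between 0 and 1, this is a mean absolute error on that scale; smaller values indicate a closer reconstruction.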

Properties & Relations (4)

Train a feature extractor from data with named features:

Wolfram Language code:

fe = FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]

Keys that were not present in the training data are ignored, so adding a "weight" key does not change the result:

Wolfram Language code: fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]

Wolfram Language code: fe[<|"age" -> 19, "height" -> 176, "gender" -> "male", "weight" -> 32|>]
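
To see which keys the extractor expects, its feature names can be queried with Information. This is a sketch added for illustration; the exact form of the returned names is not shown here:

Wolfram Language code: Information[fe, "FeatureNames"]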

FeatureExtraction[…,"ExtractedFeatures"] is equivalent to FeatureExtract[…]:

Wolfram Language code: data = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

Wolfram Language code: FeatureExtraction[data, "TFIDF", "ExtractedFeatures"] == FeatureExtract[data, "TFIDF"]

The "FeatureDistance" property is equivalent to using FeatureDistance on the extractor:

Wolfram Language code:

fd1 = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF", "FeatureDistance"]

Compute the FeatureExtractorFunction first:

Wolfram Language code: fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"]

Construct a feature distance for this feature extractor:

Wolfram Language code: fd2 = FeatureDistance[fe]

The two distance functions give identical results:

Wolfram Language code: fd1["the cat is grey", "the big dog"]

Wolfram Language code: fd2["the cat is grey", "the big dog"]
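
As a sketch of the same check over every pair of sentences, compare the full distance matrices. This reuses data from the "ExtractedFeatures" example above, which holds the same four sentences:

Wolfram Language code: Outer[fd1, data, data] == Outer[fd2, data, data]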

Training a FeatureExtractorFunction on some data creates a feature space adapted to that data:

Wolfram Language code: fe = FeatureExtraction[{Molecule["Glucose"], Molecule["Sucrose"]}]

Using different training data can result in a differently sized feature space:

Wolfram Language code:

molecules = Molecule /@ EntityList[EntityClass["Chemical", "AminoAcids"]];
fe2 = FeatureExtraction[molecules]

Wolfram Language code: fe[Molecule["Water"]] === fe2[Molecule["Water"]]

Creating the extractor with no data (None) gives an untrained function that always maps inputs into the same feature space, independent of any training examples:

Wolfram Language code: fe3 = FeatureExtraction[None, "MoleculeFeatures"]
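
To illustrate this consistency (a sketch, not part of the original example), a second untrained extractor built the same way is expected to give the same features for the same molecule:

Wolfram Language code: fe3[Molecule["Water"]] === FeatureExtraction[None, "MoleculeFeatures"][Molecule["Water"]]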

Possible Issues (7)

Training an extractor on anonymous data will use automatic feature names:

Wolfram Language code: feAutomatic = FeatureExtraction[{"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"}, "IndicatorVector"]

Wolfram Language code: Information[feAutomatic, "FeatureNames"]

Using a custom name when applying the function gives a missing-feature error:

Wolfram Language code: feAutomatic[<|"Letter" -> "S"|>]

Feature names can be specified at training time:

Wolfram Language code:

feNamed = FeatureExtraction[{"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"}, "IndicatorVector", FeatureNames -> "Letter"]

Check the feature names of a FeatureExtractorFunction:

Wolfram Language code: Information[feNamed, "FeatureNames"]

The custom name can now be used:

Wolfram Language code: feNamed[<|"Letter" -> "S"|>]

The FeatureExtraction property "ReconstructedData" can be used to obtain the data after extraction and reconstruction:

Wolfram Language code:

FeatureExtraction[{{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}, "DimensionReducedVector", "ReconstructedData"]

Some feature extractors can only perform an approximation of the inverse extraction:

Wolfram Language code:

fe = FeatureExtraction[{{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}, "DiscretizedVector", "ReconstructedData"]

Some feature extractors cannot be inverted:

Wolfram Language code: FeatureExtraction[{[image], [image], [image], [image]}, "ImageFeatures", "ReconstructedData"]

The property "ReconstructedData" cannot be used without training data:

Wolfram Language code: FeatureExtraction[None, "DimensionReducedVector", "ReconstructedData"]

Some extractors can be created without needing data:

Wolfram Language code: FeatureExtraction[None, "LowerCasedText"]

Others require examples to initialize them:

Wolfram Language code: FeatureExtraction[None, "StandardizedVector"]

Similarly, not all properties are supported without data:

Wolfram Language code: FeatureExtraction[None, "LowerCasedText", "FeatureDistance"]

Extractors that do not match the data type are ignored:

Wolfram Language code:

fe = FeatureExtraction[{"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"}, {"LowerCasedText", "IndicatorVector"}]

The input type is "Text", so the "IndicatorVector" extractor, which requires nominal input, is ignored and only lowercasing is applied:

Wolfram Language code: fe["Yes"] == fe["yes"]

Similarly, forcing the input type to "Nominal" causes the "LowerCasedText" extractor to be ignored:

Wolfram Language code:

fe = FeatureExtraction[{"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"}, {"LowerCasedText", "IndicatorVector"}, FeatureTypes -> "Nominal"]

Wolfram Language code: fe["Yes"]
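
For contrast with the text-typed extractor above, a sketch: with lowercasing skipped, "Yes" and "yes" are treated as different nominal values, so their indicator vectors are expected to differ:

Wolfram Language code: fe["Yes"] == fe["yes"]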

The "ConformedData" extractor requires additional information to operate in a data-free context:

Wolfram Language code: FeatureExtraction[None, "ConformedData"]

Specify the feature type explicitly with the FeatureTypes option:

Wolfram Language code: FeatureExtraction[None, "ConformedData", FeatureTypes -> "Image"]

The feature type can also be implicitly inferred from subsequent extractors:

Wolfram Language code: FeatureExtraction[None, {"ConformedData", "ImageFeatures"}]

The automatic feature extraction often applies a dimension reduction step:

Wolfram Language code: fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}]

Explicit feature extractors do not include a dimension reduction step and typically result in longer vectors:

Wolfram Language code: fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}, "SentenceVector"]

Use the "DimensionReducedVector" extractor to add a dimension reduction step:

Wolfram Language code:

fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}, {"SentenceVector", "DimensionReducedVector"}]

Dimension reduction must be trained on the available features and therefore cannot be applied when no data is provided:

Wolfram Language code: fe = FeatureExtraction[None, {"SentenceVector", "DimensionReducedVector"}]

Create a FeatureExtractorFunction using named features:

Wolfram Language code: fe = FeatureExtraction[None, {"Name" -> "LowerCasedText", "Molecule" -> "MoleculeFeatures"}]

A list of rules is not interpreted as named features; each rule is treated as an atomic expression:

Wolfram Language code: fe[{"Name" -> "Amoxicillin", "Molecule" -> Molecule["Amoxicillin"]}]

Use an association to specify named features:

Wolfram Language code: fe[<|"Name" -> "Amoxicillin", "Molecule" -> Molecule["Amoxicillin"]|>]//Short

The output type is set by the processor:

Wolfram Language code:

response = {"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"};
fe = FeatureExtraction[response, "LowerCasedText"]

Subsequent processors requiring a different input type will be ignored:

Wolfram Language code: fe = FeatureExtraction[response, {"LowerCasedText", "IndicatorVector"}]

Use a two-stage extraction process to reinterpret the type:

Wolfram Language code:

fe1 = FeatureExtraction[response, "LowerCasedText"]
fe2 = FeatureExtraction[fe1[response], "IndicatorVector"]

Apply in sequence to get a result:

Wolfram Language code: fe2[fe1["No"]]
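
The two steps can be combined into a single function with Composition (written @*). As a sketch (the name pipeline is hypothetical), responses that differ only in case now map to the same indicator vector, confirming that both extractors are applied:

Wolfram Language code: pipeline = fe2 @* fe1;

Wolfram Language code: pipeline["Yes"] == pipeline["yes"]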

