---
title: "FeatureExtraction"
language: "en"
type: "Symbol"
summary: "FeatureExtraction[examples] generates a FeatureExtractorFunction[...] trained from the examples given. FeatureExtraction[examples, spec] uses the specified feature extractor method spec. FeatureExtraction[examples, spec, props] gives the feature extraction properties specified by props."
keywords: 
- feature vector
- dimension reduction
- dimensionality reduction
- feature extract
- image feature
- audio feature
- vector feature
- machine learning
- classification
- training
canonical_url: "https://reference.wolfram.com/language/ref/FeatureExtraction.html"
source: "Wolfram Language Documentation"
related_guides: 
  - 
    title: "Machine Learning"
    link: "https://reference.wolfram.com/language/guide/MachineLearning.en.md"
  - 
    title: "Unsupervised Machine Learning"
    link: "https://reference.wolfram.com/language/guide/UnsupervisedMachineLearning.en.md"
  - 
    title: "Natural Language Processing"
    link: "https://reference.wolfram.com/language/guide/NaturalLanguageProcessing.en.md"
related_functions: 
  - 
    title: "FeatureExtract"
    link: "https://reference.wolfram.com/language/ref/FeatureExtract.en.md"
  - 
    title: "FeatureExtractor"
    link: "https://reference.wolfram.com/language/ref/FeatureExtractor.en.md"
  - 
    title: "FeatureExtractorFunction"
    link: "https://reference.wolfram.com/language/ref/FeatureExtractorFunction.en.md"
  - 
    title: "DimensionReduction"
    link: "https://reference.wolfram.com/language/ref/DimensionReduction.en.md"
  - 
    title: "FeatureNearest"
    link: "https://reference.wolfram.com/language/ref/FeatureNearest.en.md"
  - 
    title: "FeatureDistance"
    link: "https://reference.wolfram.com/language/ref/FeatureDistance.en.md"
  - 
    title: "Classify"
    link: "https://reference.wolfram.com/language/ref/Classify.en.md"
  - 
    title: "FeatureSpacePlot"
    link: "https://reference.wolfram.com/language/ref/FeatureSpacePlot.en.md"
  - 
    title: "CreateVectorDatabase"
    link: "https://reference.wolfram.com/language/ref/CreateVectorDatabase.en.md"
---
[EXPERIMENTAL]

# FeatureExtraction

FeatureExtraction[examples] generates a FeatureExtractorFunction[…] trained from the examples given.

FeatureExtraction[examples, spec] uses the specified feature extractor method spec.

FeatureExtraction[examples, spec, props] gives the feature extraction properties specified by props.

## Details and Options

* ``FeatureExtraction`` is typically used to define a function that processes raw data into usable features (e.g. for training a machine learning algorithm).

* ``FeatureExtraction`` can be used on many types of data, including numerical data, text, sounds, images, graphs and time series, as well as combinations of these.

* Possible values of ``examples`` are:

|               |                             |
| ------------- | --------------------------- |
| {example1, …} | a list of training examples |
| Dataset[…]    | a Dataset object            |
| Tabular[…]    | a Tabular object            |
| None          | no training examples        |

* Each ``examplei`` can be a single data element, a list of data elements or an association of data elements.

* Possible values for ``spec`` include:

|                         |                                                  |
| ----------------------- | ------------------------------------------------ |
| extractor               | use the specified extractor method               |
| part -> extractor        | apply the extractor to the specific example part |
| {part1 -> extractor1, …} | specify extractors for specific parts            |

* Possible feature extractor methods ``extractor`` include:

|                             |                                       |
| --------------------------- | ------------------------------------- |
| Automatic                   | automatic extraction                  |
| Identity                    | give data unchanged                   |
| "ConformedData"             | conformed images, colors, dates, etc. |
| "NumericVector"             | numeric vector from any data          |
| "name"                      | a named extractor method              |
| f                           | applies function f to each example    |
| {extractor1, extractor2, …} | use a sequence of extractors in turn  |

* Possible forms of ``part`` are:

|                     |                                             |
| ------------------- | ------------------------------------------- |
| All                 | all parts of each example                   |
| i                   | ith part of each example                    |
| {i1, i2, …}         | parts i1, i2, … of each example             |
| "key"               | part with the specified key in each example |
| {"key1", "key2", …} | parts with names "keyi" in each example     |

* When explicitly specifying parts, any unmentioned parts are dropped when extracting features.
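As a minimal sketch of key-based part specifications, the following uses a small illustrative set of associations modeled on the association examples further down this page (the names `data` and `fe` are just illustrative). It encodes ``"gender"`` as an indicator vector and keeps ``"age"`` unchanged. Because ``"height"`` is not mentioned, it is expected to be dropped from the extracted features:

```wl
(* small illustrative training set of associations *)
data = {<|"age" -> 32, "height" -> 160, "gender" -> "female"|>,
   <|"age" -> 41, "height" -> 183, "gender" -> "female"|>,
   <|"age" -> 21, "height" -> 175, "gender" -> "male"|>,
   <|"age" -> 11, "height" -> 150, "gender" -> "male"|>};

(* parts are selected by key; "height" is not mentioned, so it is dropped *)
fe = FeatureExtraction[data, {"gender" -> "IndicatorVector", "age" -> Identity}];

(* extract features from a new example; the "height" value is ignored *)
fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]
```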

### Extractors

* ``FeatureExtraction[examples]`` is equivalent to ``FeatureExtraction[examples, Automatic]``, which is typically equivalent to ``FeatureExtraction[examples, "NumericVector"]``.

* The ``"NumericVector"`` method will typically convert examples to numeric vectors, impute missing data and reduce the dimension using ``DimensionReduction``.

* Feature extractor methods specific to a single data type are applied only to data elements of a compatible type. Other data elements are returned unchanged.

* Not all specific feature extractors are available when ``examples`` is ``None``.

* The specific extractors are:

* Numeric data:

|                          |                                         |
| ------------------------ | --------------------------------------- |
| "DiscretizedVector"      | discretized numerical data              |
| "DimensionReducedVector" | reduced-dimension numeric vectors       |
| "MissingImputed"         | data with missing values imputed        |
| "StandardizedVector"     | numeric data processed with Standardize |

* Nominal data:

|                   |                                                        |
| ----------------- | ------------------------------------------------------ |
| "IndicatorVector" | nominal data "one-hot encoded" with indicator vectors  |
| "IntegerVector"   | nominal data encoded with integers                     |

* Text:

|                       |                                                      |
| --------------------- | ---------------------------------------------------- |
| "LowerCasedText"      | text with each character lowercase                   |
| "SegmentedCharacters" | text segmented into characters                       |
| "SegmentedWords"      | text segmented into words                            |
| "SentenceVector"      | semantic vector from a text                          |
| "TFIDF"               | term frequency-inverse document frequency vector     |
| "WordVectors"         | sequence of semantic vectors from a text (English only) |

* Images:

|                 |                                               |
| --------------- | --------------------------------------------- |
| "FaceFeatures"  | semantic vector from an image of a human face |
| "ImageFeatures" | semantic vector from an image                 |
| "PixelVector"   | vector of pixel values from an image          |

* Audio objects:

|                        |                                                            |
| ---------------------- | ---------------------------------------------------------- |
| "AudioFeatures"        | sequence of semantic vectors from an audio object          |
| "AudioFeatureVector"   | semantic vector from an audio object                       |
| "LPC"                  | audio linear prediction coefficients                       |
| "MelSpectrogram"       | audio spectrogram with logarithmically spaced frequency bins |
| "MFCC"                 | sequence of mel-frequency cepstral coefficient vectors from an audio object |
| "SpeakerFeatures"      | sequence of semantic speaker vectors                       |
| "SpeakerFeatureVector" | semantic vector for a speaker                              |
| "Spectrogram"          | audio spectrogram                                          |

* Video objects:

|                      |                                                  |
| -------------------- | ------------------------------------------------ |
| "VideoFeatures"      | sequence of semantic vectors from a video object |
| "VideoFeatureVector" | semantic vector from a video object              |

* Graphs:

|                 |                                             |
| --------------- | ------------------------------------------- |
| "GraphFeatures" | numeric vector summarizing graph properties |

* Molecules:

|                                |                                                                      |
| ------------------------------ | -------------------------------------------------------------------- |
| "AtomPairs"                    | Boolean vector from pairs of atoms and the path lengths between them |
| "MoleculeExtendedConnectivity" | Boolean vector from enumerated molecule subgraphs                    |
| "MoleculeFeatures"             | numeric vector summarizing molecule properties                       |
| "MoleculeTopologicalFeatures"  | Boolean vector from circular atom neighborhoods                      |
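The ``"NumericVector"`` behavior described above (conversion to numbers, imputation of missing data, then dimension reduction) can be observed on data with missing values. This sketch reuses the training data from the missing-values example under Scope. As an illustrative assumption, it passes a new example whose first value is missing. The result is expected to still be a purely numeric vector, since the missing value is imputed:

```wl
(* training data with missing values, as in the Scope example *)
data = {{1.4, Missing[], "A"}, {1.5, 50.2, "A"}, {Missing[], 42.3, "B"}, {5.4, 61.7, "B"}};

(* explicitly request the method that Automatic typically selects *)
fe = FeatureExtraction[data, "NumericVector"];

(* the missing first value is imputed; Normal turns a possible NumericArray into a list *)
v = Normal[fe[{Missing[], 45.0, "A"}]]
```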

### Properties

* In ``FeatureExtraction[examples, spec, props]``, ``props`` can be a single property or a list of properties. Possible properties include:

|                     |                                                  |
| ------------------- | ------------------------------------------------ |
| "ExtractorFunction" | FeatureExtractorFunction[…] (default)            |
| "ExtractedFeatures" | examples after feature extraction                |
| "ReconstructedData" | examples after extraction and inverse extraction |
| "FeatureDistance"   | FeatureDistance[…] generated from the extractor  |

* The ``"ExtractedFeatures"`` and ``"ReconstructedData"`` properties are not available when ``examples`` is ``None``.

* The ``"ReconstructedData"`` property can be computed only when every specified ``extractor`` is invertible.
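As a sketch of requesting properties, the following reuses the numeric data from the ``"StandardizedVector"`` example under Scope. Following the usual Wolfram Language convention, a list of properties is assumed to return a list of results in the same order. Because ``"StandardizedVector"`` is invertible, ``"ReconstructedData"`` can also be computed:

```wl
data = {{1.4, 42.1}, {1.5, 50.2}, {4.2, 42.3}, {5.4, 61.7}};

(* request two properties at once; results are assumed to come back in the order requested *)
{fe, features} = FeatureExtraction[data, "StandardizedVector",
   {"ExtractorFunction", "ExtractedFeatures"}];

(* "StandardizedVector" is invertible, so extraction followed by inverse extraction is available *)
reconstructed = FeatureExtraction[data, "StandardizedVector", "ReconstructedData"]
```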

### Options

* The following options can be given:

|               |           |                                                                   |
| ------------- | --------- | ----------------------------------------------------------------- |
| FeatureNames  | Automatic | names to assign to elements of the examplei                       |
| FeatureTypes  | Automatic | feature types to assume for elements of the examplei              |
| RandomSeeding | 1234      | what seeding of pseudorandom generators should be done internally |

* Possible settings for ``RandomSeeding`` include:

|           |                                                        |
| --------- | ------------------------------------------------------ |
| Automatic | automatically reseed every time the function is called |
| Inherited | use externally seeded random numbers                   |
| seed      | use an explicit integer or string as a seed            |
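A minimal sketch of the options, using the data from the first basic example. The feature names `"x"` and `"label"` are illustrative choices, and `FeatureTypes` is given here as a list with one type per part. With the default `RandomSeeding -> 1234`, retraining on the same data should be reproducible. `RandomSeeding -> Automatic` reseeds on every call, so results may differ between runs:

```wl
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* name the two parts and declare their types explicitly: numerical and nominal *)
fe = FeatureExtraction[data,
   FeatureNames -> {"x", "label"},
   FeatureTypes -> {"Numerical", "Nominal"}];

(* reseed automatically on every call instead of using the default fixed seed *)
feRandom = FeatureExtraction[data, RandomSeeding -> Automatic];
```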

---

## Examples (53)

### Basic Examples (3)

Train a ``FeatureExtractorFunction`` on a simple dataset:

```wl
In[1]:= fe = FeatureExtraction[{{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}}]

Out[1]= FeatureExtractorFunction[…]
```

Extract features from a new example:

```wl
In[2]:= fe[{2.4, "A"}]

Out[2]= {-0.153792, 1.00001, 1.}
```

Extract features from a list of examples:

```wl
In[3]:= fe[{{2.4, "A"}, {3.7, "B"}}]

Out[3]= {{-0.153792, 1.00001, 1.}, {0.645925, -1., -1.}}
```

---

Train a feature extractor on a dataset of images:

```wl
In[1]:= fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image]}]

Out[1]= FeatureExtractorFunction[…]
```

Use the feature extractor on the training set:

```wl
In[2]:= fe[{[image], [image], [image], [image], [image], [image]}]

Out[2]= {{17.654, -7.96819, -14.7183, -16.2649, 9.57529}, {4.8526, -13.1574, 7.03847, -3.41644, -22.3498}, {24.0119, 16.7143, 9.30536, 12.5164, 2.60713}, {-13.8595, 2.49587, -21.3937, 15.7634, -3.69741}, {-19.9103, 19.2434, 5.89901, -15.6432, -0.324524}, {-12.7487, -17.3279, 13.8691, 7.0447, 14.1893}}
```

---

Specify the extractor explicitly:

```wl
In[1]:= fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image]}, "ImageFeatures"]

Out[1]= FeatureExtractorFunction[…]
```

### Scope (32)

#### Input Shape (9)

Train a feature extractor on a list of examples with a single feature:

```wl
In[1]:= fe = FeatureExtraction[{"It was the best of times.", "A journey of a thousand miles begins with a single step.", "To be or not to be, that is the question."}]

Out[1]= FeatureExtractorFunction[…]
```

Extract features from a new example:

```wl
In[2]:= fe["A rose by any other name"]

Out[2]= {5.07088, 6.58043}
```

Extract features from multiple new examples:

```wl
In[3]:= fe[{"A rose by any other name", "It was the worst of times"}]

Out[3]= {{5.07088, 6.58043}, {12.5357, -7.86591}}
```

---

Train a feature extractor on a list of examples with multiple features:

```wl
In[1]:= fe = FeatureExtraction[{{"It was the best of times.", "Charles Dickens"}, {"A journey of a thousand miles begins with a single step.", "Laozi (attrib.)"}, {"To be or not to be, that is the question.", "William Shakespere"}}]

Out[1]= FeatureExtractorFunction[…]
```

Extract features from multiple new examples:

```wl
In[2]:= fe[{{"All the world’s a stage, and all the men and women merely players", "William Shakespere"}, {"Knowing others is intelligence; knowing yourself is true wisdom", "Laozi (attrib.)"}}]

Out[2]= {{5.39697, 15.7993}, {-11.2052, 2.75744}}
```

---

Train a feature extractor on a mixed-type dataset:

```wl
In[1]:= fe = FeatureExtraction[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}]

Out[1]= FeatureExtractorFunction[…]
```

Extract features from a new example:

```wl
In[2]:= fe[{"the nice cat", [image]}]

Out[2]= {-152.932, -60.3095, -79.2789}
```

---

Train a feature extractor from a list of associations:

```wl
In[1]:=
fe = FeatureExtraction[{<|"age" -> 32, "height" -> 160, "gender" -> "female"|>, 
	<|"height" -> 183, "age" -> 41, "gender" -> "female"|>, 
	<|"height" -> 123, "age" -> 30, "gender" -> "female"|>, 
	<|"height" -> 175, "age" -> 21, "gender" -> "male"|>, 
	<|"height" -> 150, "age" -> 11, "gender" -> "male"|>, 
	<|"age" -> 52, "height" -> 164, "gender" -> "female"|>}]

Out[1]= FeatureExtractorFunction[…]
```

Extract features from a new example:

```wl
In[2]:= fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]

Out[2]= {-1.41422, -1.41421, -0.922868, 0.872214}
```

Extract features from multiple new examples:

```wl
In[3]:= fe[{<|"age" -> 19, "height" -> 176, "gender" -> "male"|>, <|"age" -> 45, "height" -> 164, "gender" -> "female"|>}]

Out[3]= {{-1.41422, -1.41421, -0.922868, 0.872214}, {0.707105, 0.707107, 1.04929, 0.250438}}
```

---

Train a feature extractor from data given as feature lists:

```wl
In[1]:= FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]

Out[1]= FeatureExtractorFunction[…]
```

---

Train a feature extractor from a ``Tabular``:

```wl
In[1]:=
FeatureExtraction[Tabular[{
   <|"age" -> 32, "height" -> 160, "gender" -> "female"|>, 
   <|"age" -> 41, "height" -> 183, "gender" -> "female"|>, 
   <|"age" -> 30, "height" -> 123, "gender" -> "female"|>, 
   <|"age" -> 21, "height" -> 175, "gender" -> "male"|>, 
   <|"age" -> 11, "height" -> 150, "gender" -> "male"|>, 
   <|"age" -> 52, "height" -> 164, "gender" -> "female"|>}]]

Out[1]= FeatureExtractorFunction[…]
```

---

Train a feature extractor from a ``Dataset``:

```wl
In[1]:=
FeatureExtraction[Dataset[{Association["age" -> 32, "height" -> 160, "gender" -> "female"], 
  Association["age" -> 41, "height" -> 183, "gender" -> "female"], 
  Association["age" -> 30, "height" -> 123, "gender" -> "female"], 
  Association["age" -> 21, "height" -> 175, "gender" -> "male"], 
  Association["age" -> 11, "height" -> 150, "gender" -> "male"], 
  Association["age" -> 52, "height" -> 164, "gender" -> "female"]}]]

Out[1]= FeatureExtractorFunction[…]
```

---

Train a feature extractor from a dataset that contains missing values:

```wl
In[1]:= FeatureExtraction[{{1.4, Missing[], "A"}, {1.5, 50.2, "A"}, {Missing[], 42.3, "B"}, {5.4, 61.7, "B"}}]

Out[1]= FeatureExtractorFunction[…]
```

---

Define a feature extractor that requires no training:

```wl
In[1]:= fe = FeatureExtraction[None, "WordVectors"]

Out[1]= FeatureExtractorFunction[…]
```

Apply it on some text:

```wl
In[2]:= fe["hi there"]//Shallow

Out[2]//Shallow= {{-0.54313, 0.34427, 0.27125, 1.0487, -1.1642, -1.2722, 0.35781, -0.56527, -0.29879, 0.85179, «40»}, {0.68491, 0.32385, -0.11592, -0.35925, 0.49889, 0.042541, -0.40153, -0.36793, -0.61441, -0.41148, «40»}}
```

#### Extractor Specifications (10)

Specify the feature extractor ``"SentenceVector"`` on a single textual feature:

```wl
In[1]:= fe = FeatureExtraction[{"the lizard is green"}, "SentenceVector"]

Out[1]= FeatureExtractorFunction[…]
```

Apply it on some text:

```wl
In[2]:= fe[{"there is a large cat"}]//Short

Out[2]//Short= {{0.845454, 0.274286, «380», 0.167039, 0.416839}}
```

---

Train a feature extractor using the ``"StandardizedVector"`` method:

```wl
In[1]:= fe = FeatureExtraction[{{1.4, 42.1}, {1.5, 50.2}, {4.2, 42.3}, {5.4, 61.7}}, "StandardizedVector"]

Out[1]= FeatureExtractorFunction[…]
```

Extract features from a new example:

```wl
In[2]:= features = fe[{6.4, 32.1}]

Out[2]= {1.89497, -2.12517}
```

Since this feature extractor is invertible, the ``FeatureExtractorFunction`` property ``"OriginalData"`` can be used to perform the inverse extraction:

```wl
In[3]:= fe[features, "OriginalData"]

Out[3]= {6.4, 32.1}
```

---

Train a feature extractor on text using the ``"TFIDF"`` method followed by the ``"DimensionReducedVector"`` method:

```wl
In[1]:= fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, {"TFIDF", "DimensionReducedVector"}]

Out[1]= FeatureExtractorFunction[…]
```

Extract features on the training set:

```wl
In[2]:= fe[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Out[2]= {{-1.03721, 1.22635, 2.19043}, {-2.79602, 0.266957, -1.63574}, {0.851446, -3.0376, 0.304013}, {2.98179, 1.54429, -0.858706}}
```

---

Train a feature extractor on texts and images using the text-only ``"TFIDF"`` method:

```wl
In[1]:= fe = FeatureExtraction[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}, "TFIDF"]

Out[1]= FeatureExtractorFunction[…]
```

Features are extracted only from the text part; the image is returned unchanged:

```wl
In[2]:= fe[{"the nice cat", [image]}]

Out[2]= {{0., 0., 0., 0., 0.16906, 0., 0., 0., 0., 0., 0.16906}, [image]}
```

---

Specify the feature extraction on multiple features by position:

```wl
In[1]:= fe = FeatureExtraction[{{"Glucose", Molecule["Glucose"]}, {"Water", Molecule["Water"]}, {"Acetic Acid", Molecule["Acetic Acid"]}}, {1 -> {"SentenceVector", "DimensionReducedVector"}, 2 -> "MoleculeFeatures"}]

Out[1]= FeatureExtractorFunction[…]
```

Apply the feature extractor to a list containing one new example:

```wl
In[2]:= fe[{{"Sucrose", Molecule["Sucrose"]}}]

Out[2]= {{{-0.31503, -1.63424}, {1.5, -1.5}}}
```

A list of two items is interpreted as a single example with two features:

```wl
In[3]:= fe[{"Hydrochloric Acid", Molecule["Hydrochloric Acid"]}]

Out[3]= {{7.05778, -1.22937}, {2.75, -2.25}}
```

---

Train a feature extractor with the ``"IndicatorVector"`` method on only the second nominal variable:

```wl
In[1]:= fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, 2 -> "IndicatorVector"]

Out[1]= FeatureExtractorFunction[…]
```

The first nominal variable is dropped:

```wl
In[2]:= Normal@fe[{"Yes", "A"}]

Out[2]= {1., 0., 0.}
```

Use the ``Identity`` extractor method to copy the first variable:

```wl
In[3]:= fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> Identity}]

Out[3]= FeatureExtractorFunction[…]
```

The first variable is copied:

```wl
In[4]:= fe[{"Yes", "A"}]

Out[4]= {{1., 0., 0.}, "Yes"}
```

A variable can be copied multiple times:

```wl
In[5]:= fe = FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> "IndicatorVector", 1 -> Identity}]

Out[5]= FeatureExtractorFunction[…]

In[6]:= fe[{"Yes", "A"}]

Out[6]= {{1., 0., 0.}, {0., 0., 1.}, "Yes"}
```

---

Specify the feature extraction on multiple features by key:

```wl
In[1]:=
fe = FeatureExtraction[Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["Name" -> Association["ElementType" -> "String"], 
      "Molecule" -> Association["ElementType" -> "InertExpression"]], "KeyColumns" -> None, 
    "Backend" -> "WolframKernel"], "Options" -> {}, 
  "BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {{3, {0, 7, 12, 23}, "GlucoseWaterAcetic Acid"}, {}, 
            None}, "ElementType" -> "String"]], TabularColumn[
         Association["Data" -> {{Molecule[{"O", "C", "C", "O", "C", "O", "C", "O", "C", "O", "C", 
               "O", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H"}, 
              {Bond[{1, 2}, "Double"], Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], Bond[{3, 5}, 
                "Single"], Bond[{5, 6}, "Single"], Bond[{5, 7}, "Single"], Bond[{7, 8}, "Single"], 
               Bond[{7, 9}, "Single"], Bond[{9, 10}, "Single"], Bond[{9, 11}, "Single"], Bond[
                {11, 12}, "Single"], Bond[{2, 13}, "Single"], Bond[{3, 14}, "Single"], Bond[
                {4, 15}, "Single"], Bond[{5, 16}, "Single"], Bond[{6, 17}, "Single"], Bond[{7, 18}, 
                "Single"], Bond[{8, 19}, "Single"], Bond[{9, 20}, "Single"], Bond[{10, 21}, 
                "Single"], Bond[{11, 22}, "Single"], Bond[{11, 23}, "Single"], Bond[{12, 24}, 
                "Single"]}, {StereochemistryElements -> {Association["StereoType" -> "Tetrahedral", 
                  "ChiralCenter" -> 3, "Direction" -> "Counterclockwise"], Association[
                  "StereoType" -> "Tetrahedral", "ChiralCenter" -> 5, "Direction" -> "Clockwise"], 
                 Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 7, "Direction" -> 
                   "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
                  "ChiralCenter" -> 9, "Direction" -> "Counterclockwise"]}}], 
             Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], 
             Molecule[{"C", "C", "O", "O", "H", "H", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[
                {2, 3}, "Double"], Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], Bond[{1, 6}, 
                "Single"], Bond[{1, 7}, "Single"], Bond[{4, 8}, "Single"]}, {}]}, {}, None}, 
          "ElementType" -> "InertExpression", "CachedOriginalExpression" -> 
           {Molecule[{"O", "C", "C", "O", "C", "O", "C", "O", "C", "O", "C", "O", "H", "H", "H", 
              "H", "H", "H", "H", "H", "H", "H", "H", "H"}, {Bond[{1, 2}, "Double"], 
              Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], Bond[{3, 5}, "Single"], 
              Bond[{5, 6}, "Single"], Bond[{5, 7}, "Single"], Bond[{7, 8}, "Single"], 
              Bond[{7, 9}, "Single"], Bond[{9, 10}, "Single"], Bond[{9, 11}, "Single"], 
              Bond[{11, 12}, "Single"], Bond[{2, 13}, "Single"], Bond[{3, 14}, "Single"], 
              Bond[{4, 15}, "Single"], Bond[{5, 16}, "Single"], Bond[{6, 17}, "Single"], 
              Bond[{7, 18}, "Single"], Bond[{8, 19}, "Single"], Bond[{9, 20}, "Single"], 
              Bond[{10, 21}, "Single"], Bond[{11, 22}, "Single"], Bond[{11, 23}, "Single"], 
              Bond[{12, 24}, "Single"]}, {StereochemistryElements -> {Association["StereoType" -> 
                  "Tetrahedral", "ChiralCenter" -> 3, "Direction" -> "Counterclockwise"], 
                Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 5, "Direction" -> 
                  "Clockwise"], Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 7, 
                 "Direction" -> "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
                 "ChiralCenter" -> 9, "Direction" -> "Counterclockwise"]}}], 
            Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], 
            Molecule[{"C", "C", "O", "O", "H", "H", "H", "H"}, {Bond[{1, 2}, "Single"], 
              Bond[{2, 3}, "Double"], Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], 
              Bond[{1, 6}, "Single"], Bond[{1, 7}, "Single"], Bond[{4, 8}, "Single"]}, {}]}]]}}]]]],   {"Name"  -> { "SentenceVector", "DimensionReducedVector"}, "Molecule"  -> "MoleculeFeatures"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 3, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["Name" -> Association["Type" -> "Text"], 
       "Molecule" -> Association["Type" ->  ... sion" -> {14.3, 0}, "Date" -> DateObject[{2025, 4, 23, 12, 22, 26.111295}, 
      "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", 
    "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```
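The `Tabular` input above is the internal form of a three-row table with a `"Name"` column and a `"Molecule"` column. A comparable table can be written more readably as a list of associations, as sketched below. This sketch assumes that `Molecule` can resolve the names. The resolved structures, such as which form of glucose is returned, may differ from those stored above. `tab` and `feTab` are illustrative names, chosen so that `fe` is not overwritten.

```wl
(* Hypothetical readable construction of a similar table; requires Molecule name lookup *)
tab = Tabular[{
    <|"Name" -> "Glucose", "Molecule" -> Molecule["Glucose"]|>,
    <|"Name" -> "Water", "Molecule" -> Molecule["Water"]|>,
    <|"Name" -> "Acetic Acid", "Molecule" -> Molecule["Acetic Acid"]|>}];

(* Same per-column specification as In[1] *)
feTab = FeatureExtraction[tab,
   {"Name" -> {"SentenceVector", "DimensionReducedVector"}, "Molecule" -> "MoleculeFeatures"}];
```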

Use the feature extractor on a new example:

```wl
In[2]:= fe[<|"Name" -> "Hydrochloric Acid", "Molecule" -> Molecule["Hydrochloric Acid"]|>]

Out[2]= {{7.05778, -1.22937}, {2.75, -2.25}}
```

When the feature extractor is applied to a list, the features are assumed to be in the same order as originally specified:

```wl
In[3]:= fe[{"Hydrochloric Acid", Molecule["Hydrochloric Acid"]}]

Out[3]= {{7.05778, -1.22937}, {2.75, -2.25}}
```

---

Generate a feature extractor using a custom function:

```wl
In[1]:= data = {DateObject[{2014, 5, 5}, TimeObject[{9, 53, 6.30158}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2000, 1, 1}, TimeObject[{0, 0, 0.}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2006, 12}], DateObject[{2007, 8, 23}], DateObject[{2016, 4, 4}, TimeObject[{15, 59, 18.2738}, TimeZone -> -4.], TimeZone -> -4.]};

In[2]:= fe = FeatureExtraction[data, {AbsoluteTime[#], #["Year"]}&]

Out[2]=
FeatureExtractorFunction[Association["ExampleNumber" -> 5, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Date"]], 
     "Output" -> Association["f1" -> Associa ...     "Date" -> DateObject[{2020, 10, 7, 12, 13, 38.20122`8.33465221162347}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 4, "ProcessorType" -> "x86-64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Apply the extractor to the training set:

```wl
In[3]:= fe[data]

Out[3]= {{3.608297586301579*^9, 2014}, {3155698800, 2000}, {3373920000, 2006}, {3396816000, 2007}, {3.668795958273754*^9, 2016}}
```

Chain the custom extractor with the ``"StandardizedVector"`` method:

```wl
In[4]:= fe2 = FeatureExtraction[data, {{AbsoluteTime[#], #["Year"]}&, "StandardizedVector"}]

Out[4]=
FeatureExtractorFunction[Association["ExampleNumber" -> 5, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Date"]], 
     "Output" -> Association["f1" -> Associa ...   "Date" -> DateObject[{2020, 10, 7, 12, 13, 42.295471`8.378868843305996}, "Instant", 
      "Gregorian", 2.], "ProcessorCount" -> 4, "ProcessorType" -> "x86-64", 
    "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, "Evaluations" -> {}]]]

In[5]:= fe2[data]

Out[5]= {{0.915031, 0.933815}, {-1.5561, -1.48719}, {-0.364641, -0.449614}, {-0.239632, -0.276686}, {1.24534, 1.27967}}
```
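For the year column, the output above can be reproduced by hand. Subtract the mean, then divide by the standard deviation computed with denominator n (the population form). This reading is inferred from the numbers shown and does not come from a documented description of `"StandardizedVector"`. `StandardDeviation` uses denominator n - 1 and would give slightly different values.

```wl
years = {2014, 2000, 2006, 2007, 2016};
mu = Mean[years];
(* population standard deviation: divide by n, not n - 1 *)
sigma = Sqrt[Mean[(years - mu)^2]];
N[(years - mu)/sigma]
(* → {0.933815, -1.48719, -0.449614, -0.276686, 1.27967} *)
```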

---

Conform data prior to processing:

```wl
In[1]:= FeatureExtraction[{[image], [image], [image], [image]}, {"ConformedData", "ImageFeatures"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Image"]], 
     "Output" -> Association["f1" -> Associ ... 
    "Date" -> DateObject[{2025, 5, 2, 12, 56, 37.82621`8.33036780837186}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

---

Reduce the dimensionality of the output:

```wl
In[1]:= FeatureExtraction[{[image], [image], [image], [image]}, {"ImageFeatures", "DimensionReducedVector"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Image"]], 
     "Output" -> Association["f1" -> Associ ... 
    "Date" -> DateObject[{2025, 5, 2, 12, 56, 51.67523`8.46585739354372}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

#### Feature Types (10)

Create a feature extractor for textual data using the ``"SentenceVector"`` extractor with no training:

```wl
In[1]:= fe = FeatureExtraction[None, "SentenceVector"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ...    "Date" -> DateObject[{2025, 5, 2, 12, 58, 2.353738`7.1243331074927765}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

The input type is inferred from the specified extractor. Apply the feature extractor to some examples:

```wl
In[2]:= fe[{"it is not a cat", "what a nice dog", "here is a dog again"}]//Short

Out[2]//Short= {{0.387969, 0.299338, «380», 0.49624, 0.176009}, {«1»}, {«1»}}
```
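Sentence vectors can be compared with a distance such as `CosineDistance`. The quick check below reuses `fe` from In[1]. No particular values are claimed; only the shape of the result is fixed. `vecs` is an illustrative name.

```wl
vecs = fe[{"what a nice dog", "here is a dog again", "it is not a cat"}];
(* cosine distance from the first sentence to each of the other two *)
CosineDistance[vecs[[1]], #] & /@ Rest[vecs]
```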

---

Create a feature extractor for examples with a text feature and an image feature, referred to by position:

```wl
In[1]:= fe = FeatureExtraction[None, {1 -> "SentenceVector", 2 -> "ImageFeatures"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"], 
       "f2" -> Association["Type" -> "Image"] ...     "Date" -> DateObject[{2025, 5, 2, 12, 58, 7.202704`7.610070554768168}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Features will be extracted from both parts:

```wl
In[2]:= fe[{"the nice cat", [image]}]//Short

Out[2]//Short= {{-0.0557431, 0.380375, «380», 0.381394, 0.373169}, {«1»}}
```

---

Train a feature extractor on textual data:

```wl
In[1]:= FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... te" -> DateObject[{2025, 5, 2, 12, 58, 
       21.181312`8.078527840842556}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

---

Train a feature extractor with the ``"IndicatorVector"`` method on nominal variables:

```wl
In[1]:= FeatureExtraction[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, "IndicatorVector"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 5, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Nominal"], 
       "f2" -> Association["Type" -> "Nomi ... te" -> DateObject[{2025, 5, 2, 12, 58, 
       27.420414`8.190648989381069}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```
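Conceptually, an indicator vector for a nominal value has one component per category seen in training, with 1 at the position of that value. The sketch below orders categories canonically, as `Union` does. That assumption is consistent with the earlier output `{0., 0., 1.}` for `"Yes"` (categories `"Maybe"`, `"No"`, `"Yes"`). `oneHot` is a hypothetical helper and is not part of the API.

```wl
(* Hypothetical helper: one-hot encode x against a fixed list of categories *)
oneHot[categories_List][x_] := N[Boole[# === x] & /@ categories]

cats = Union[{"Yes", "No", "No", "Maybe", "No"}];  (* {"Maybe", "No", "Yes"} *)
oneHot[cats]["Yes"]
(* → {0., 0., 1.} *)
```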

---

Train a feature extractor to compute term frequency-inverse document frequency vectors from texts:

```wl
In[1]:= fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... te" -> DateObject[{2025, 5, 2, 12, 58, 
       28.740862`8.211074770937966}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Compute the term frequency-inverse document frequency matrix of the training set, which is returned as a ``SparseArray``:

```wl
In[2]:= matrix = fe[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Out[2]=
SparseArray[Automatic, {4, 11}, 0., 
 {1, {{0, 4, 8, 12, 15}, {{2}, {5}, {6}, {11}, {2}, {5}, {8}, {10}, {1}, {2}, {3}, {4}, {1}, {9}, 
    {11}}}, {0.04051860175377196, 0.09762636345914723, 0.19525272691829446, 0.09762636345914723, 
   0.04051860175377196, 0.09762636345914723, 0.19525272691829446, 0.19525272691829446, 
   0.09762636345914723, 0.04051860175377196, 0.19525272691829446, 0.19525272691829446, 
   0.1359112118744991, 0.2718224237489982, 0.1359112118744991}}]
```

Visualize the matrix:

```wl
In[3]:= matrix//MatrixPlot

Out[3]= [image]
```

The ``"TFIDF"`` method can also be used on tokenized data (nominal bags):

```wl
In[4]:= FeatureExtraction[{{"the", "cat", "is", "grey"}, {"my" , "cat", "is", "fast"}, {"this", "dog", "is", "scary"}, {"the", "big", "dog"}}, "TFIDF", "ExtractedFeatures"]

Out[4]=
SparseArray[Automatic, {4, 10}, 0., 
 {1, {{0, 4, 8, 12, 15}, {{2}, {5}, {6}, {10}, {2}, {5}, {7}, {9}, {1}, {2}, {3}, {4}, {1}, {8}, 
    {10}}}, {0.07016635913458072, 0.1690602879414501, 0.3381205758829002, 0.1690602879414501, 
   0.07016635913458072, 0.1690602879414501, 0.3381205758829002, 0.3381205758829002, 
   0.1690602879414501, 0.07016635913458072, 0.3381205758829002, 0.3381205758829002, 
   0.22359586469675655, 0.4471917293935131, 0.22359586469675655}}]
```
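For reference, the sketch below computes the textbook tf-idf weighting on the same tokenized documents. Each weight is the raw term count multiplied by `Log[n/df]`, where `df` is the number of documents containing the term. It gives the same 4×10 shape as above. The values are not expected to coincide with those returned by `"TFIDF"`, because the method may apply its own term weighting and normalization. The variable names are illustrative.

```wl
docs = {{"the", "cat", "is", "grey"}, {"my", "cat", "is", "fast"},
   {"this", "dog", "is", "scary"}, {"the", "big", "dog"}};
vocab = Union[Flatten[docs]];   (* 10 distinct terms, sorted *)
n = Length[docs];
(* document frequency: number of documents containing each term *)
df = AssociationMap[Function[t, Count[docs, d_ /; MemberQ[d, t]]], vocab];
(* textbook weighting: raw term count times Log[n/df] *)
tfidf = N@Table[Count[d, t] Log[n/df[t]], {d, docs}, {t, vocab}];
Dimensions[tfidf]
(* → {4, 10} *)
```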

---

Train a feature extractor on a list of ``DateObject`` instances:

```wl
In[1]:= fe = FeatureExtraction[{DateObject[{2014, 5, 5, 9, 53, 6.30158}, "Instant", "Gregorian", -6.], DateObject[{2000, 1, 1, 0, 0, 0.}, "Instant", "Gregorian", -6.], DateObject[{2006, 12}, "Month", "Gregorian", -6.], DateObject[{2007, 8, 23}, "Day", "Gregorian", -6.], DateObject[{2016, 4, 4, 15, 59, 18.2738}, "Instant", "Gregorian", -4.]}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 5, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Date"]], 
     "Output" -> Association["f1" -> Associa ...     "Date" -> DateObject[{2025, 5, 2, 12, 58, 45.276009`8.40844311446409}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Extract features from a new ``DateObject``:

```wl
In[2]:= fe[DateObject[{2003, 1, 2}, "Day", "Gregorian", -6.]]

Out[2]= {-2.58564, -1.82676, 1.38014, 0.692171}
```

A string date can also be given:

```wl
In[3]:= fe["2nd of January 2003"]

Out[3]= {-2.58568, -1.8268, 1.38016, 0.69216}
```

---

Train a feature extractor on a list of ``Graph`` instances:

```wl
In[1]:= fe = FeatureExtraction[{[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 15, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Graph"]], 
     "Output" -> Association["f1" -> Assoc ... te" -> DateObject[{2025, 5, 2, 12, 58, 
       49.824971`8.450022030391994}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Extract features from a new graph:

```wl
In[2]:= fe[[image]]

Out[2]= {1.63311, -2.5484, 0.160699, -0.652795, 0.164148, 1.03672}
```

---

Train a feature extractor on a list of ``TimeSeries`` instances:

```wl
In[1]:=
FeatureExtraction[{TemporalData[TimeSeries, {CompressedData["«979»"], {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, 
    {1, "Day"}]}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {DateFunction -> Automatic, ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, 
   ValueDimensions -> 1}}, True, 12.2], TemporalData[TimeSeries, {CompressedData["«903»"], {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, 
    {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {CompressedData["«893»"], {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, 
    {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {CompressedData["«749»"], 
  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {CompressedData["«757»"], 
  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2]}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 5, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "TimeSeries", 
         "Dimensions" -> {"Varying"}]],  ... ate" -> DateObject[{2025, 5, 2, 12, 58, 
       54.570435`8.48953239014126}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

---

Train a feature extractor on ``Molecule`` data:

```wl
In[1]:= FeatureExtraction[{Molecule["Sucrose"], Molecule["Pentane"]}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 2, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Molecule"]], 
     "Output" -> Association["f1" -> Ass ... te" -> DateObject[{2025, 5, 2, 12, 58, 
       57.716833`8.513877466367845}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

---

Train a feature extractor on a list of ``Audio`` instances:

```wl
In[1]:= FeatureExtraction[{Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Violin"], {i, 0, 12}]]], Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Trumpet"], {i, 0, 12}]]]}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 2, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Audio"]], 
     "Output" -> Association["f1" -> Associ ... ate" -> DateObject[{2025, 5, 2, 12, 58, 
       59.98649`8.530628423799303}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

#### Information (3)

Get ``Information`` from a trained ``FeatureExtractorFunction``:

```wl
In[1]:=
Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]]

Out[1]=
InformationData[Association["ObjectType" -> "FeatureExtractorFunction", 
  "InputTypes" -> {"Numerical", "Nominal"}, "OutputTypes" -> 
   Row[{"NumericalVector", Style[Row[{" (", Row[{3}, "×"], ")"}], GrayLevel[0.5]]}], 
  "FunctionMemory" -> Quantity[67., "Kilobytes"], "ExampleNumber" -> Quantity[4, "Examples"], 
  "TrainingTime" -> Quantity[36.1, "Milliseconds"], "Invertibility" -> "Impossible", 
  "MissingValues" -> "Imputed"], True]
```

---

Find the available properties:

```wl
In[1]:=
Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]], "Properties"]

Out[1]= {"ExampleNumber", "FeatureNames", "FeatureNumber", "FeatureTypes", "FunctionMemory", "InputTypes", "Invertibility", "MaxTrainingMemory", "MissingValues", "OutputTypes", "PerformanceGoal", "Properties", "TrainingTime"}
```

---

Get information about the input and output types:

```wl
In[1]:=
Information[FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]], {"InputTypes", "OutputTypes"}]

Out[1]= {<|"age" -> "Numerical", "gender" -> "Nominal"|>, <|"(f1f2)" -> "NumericalVector"|>}
```

### Options (4)

#### FeatureNames (2)

Train a feature extractor and give a name to each feature:

```wl
In[1]:= fe = FeatureExtraction[{{2.3, "male"}, {4.8, Missing[]}, {Missing[], "female"}, {5.2, "female"}}, FeatureNames -> {"age", "gender"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "gender" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 0, 
       39.042795`8.344115878952366}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Use the association format to extract features from a new example:

```wl
In[2]:= fe[<|"age" -> 3.3, "gender" -> "male"|>]

Out[2]= {-0.914498, -1.73205, -1.73205}
```

The list format can still be used:

```wl
In[3]:= fe[{3.3, "male"}]

Out[3]= {-0.914498, -1.73205, -1.73205}
```

---

Use ``FeatureNames`` to name the features and refer to them by name in ``FeatureExtraction[examples, {spec1 -> ext1, …}]``:

```wl
In[1]:= fe = FeatureExtraction[{{"A", "female"}, {"B", "female"}, {"C", "male"}, {"B", "male"}}, {"class" -> Identity, "gender" -> "IndicatorVector"}, FeatureNames -> {"class", "gender"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["class" -> Association["Type" -> "Nominal"], 
       "gender" -> Association["Type" - ...     "Date" -> DateObject[{2025, 5, 2, 16, 0, 6.835085`7.5873189072760585}, "Instant", "Gregorian", 
      1.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Extract features on a new example using the names to specify the features:

```wl
In[2]:= fe[<|"gender" -> "female", "class" -> "B"|>]

Out[2]= {"B", {1., 0.}}
```

#### FeatureTypes (2)

Train a feature extractor with the ``"IndicatorVector"`` method on a simple dataset:

```wl
In[1]:= fe = FeatureExtraction[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Numerical"], 
       "f2" -> Association["Type" -> "No ... Date" -> DateObject[{2025, 5, 2, 13, 0, 
       46.75861`8.422436568959782}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

The first feature has been interpreted as numerical. Since the ``"IndicatorVector"`` method only acts on nominal features, the first feature is unchanged:

```wl
In[2]:= fe[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}]

Out[2]= {{1, {1., 0.}}, {2, {1., 0.}}, {2, {0., 1.}}, {1, {0., 1.}}}
```

Use ``FeatureTypes`` to enforce the interpretation of the first feature as nominal:

```wl
In[3]:= fe = FeatureExtraction[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector", FeatureTypes -> <|1 -> "Nominal"|>]

Out[3]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Nominal"], 
       "f2" -> Association["Type" -> "Nomi ... Date" -> DateObject[{2025, 5, 2, 13, 0, 
       48.73237`8.440392508994217}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Now both features are encoded as indicator vectors:

```wl
In[4]:= fe[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}] // MatrixForm

Out[4]//MatrixForm=
(1.  0.  1.  0.
 0.  1.  1.  0.
 0.  1.  0.  1.
 1.  0.  0.  1.)
```

---

When a feature extractor is created without training data, the expected data type is inferred from the specified extractor:

```wl
In[1]:= fe = FeatureExtraction[None, "SentenceVector"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ...    "Date" -> DateObject[{2025, 5, 6, 13, 37, 59.424057`8.526537271705928}, "Instant", "Gregorian", 
      1.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Specifying ``FeatureTypes`` overrides this inference:

```wl
In[2]:= fe2 = FeatureExtraction[None, "SentenceVector", FeatureTypes -> {"first" -> "Numerical", "second" -> "Text"}]

Out[2]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["first" -> Association["Type" -> "Numerical"], 
       "second" -> Association["Type" ... "Date" -> DateObject[{2025, 5, 6, 13, 38, 
       0.791516`6.6510346874379}, "Instant", "Gregorian", 1.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Apply to named features:

```wl
In[3]:= fe2[<|"first" -> 1, "second" -> "Good morning!"|>]//Short

Out[3]//Short= {1, {-0.182758, «382», 0.408377}}
```

### Applications (3)

#### Image Search (1)

Construct a dataset of dog images:

```wl
In[1]:= dataset = {[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]};
```

Train an extractor function from this dataset:

```wl
In[2]:= fe = FeatureExtraction[dataset]

Out[2]= FeatureExtractorFunction[«1»]
```

Generate a ``NearestFunction`` on the extracted features of the dataset:

```wl
In[3]:= nf = Nearest[fe[dataset] -> Automatic]

Out[3]= NearestFunction[Hold[Nearest[CompressedData["«4678»"] -> Automatic]]]
```

Using the ``NearestFunction``, construct a function that returns the nearest image in the dataset:

```wl
In[4]:= nearestdog = dataset[[First@nf[fe[#]]]]&

Out[4]= dataset[[First[nf[fe[#1]]]]]&
```

Use this function on images that are not in the dataset:

```wl
In[5]:= nearestdog[[image]]

Out[5]= [image]

In[6]:= nearestdog[[image]]

Out[6]= [image]

In[7]:= nearestdog[[image]]

Out[7]= [image]
```

The feature extractor can also be used to remove images that are too similar to an image already kept:

```wl
In[8]:= features = # -> fe[#]& /@ dataset;

In[9]:= First /@ DeleteDuplicates[features, CosineDistance[#1[[2]], #2[[2]]] < 1 &]

Out[9]= {[image], [image], [image], [image], [image], [image]}
```
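The test passed to `DeleteDuplicates` decides whether a later element counts as a duplicate of an element already kept. The threshold therefore controls how aggressive the pruning is. The cutoff of 1 used in In[9] treats any pair with positive cosine similarity as duplicates. The toy sketch below uses hand-made two-dimensional vectors in place of extracted features.

```wl
(* Toy "features": a and b point in almost the same direction, c is orthogonal to a *)
toy = {"a" -> {1., 0.}, "b" -> {0.9, 0.1}, "c" -> {0., 1.}};
First /@ DeleteDuplicates[toy, CosineDistance[#1[[2]], #2[[2]]] < 0.1 &]
(* → {"a", "c"} *)
```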

#### Text Search (1)

Load the text of *Alice in Wonderland*:

```wl
In[1]:= alice = ExampleData[{"Text", "AliceInWonderland"}];
```

Split the text into sentences:

```wl
In[2]:= sentences = TextSentences[alice];
```

Train a feature extractor on these sentences:

```wl
In[3]:= fe = FeatureExtraction[sentences]

Out[3]=
FeatureExtractorFunction[Association["ExampleNumber" -> 551, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Assoc ... ate" -> DateObject[{2025, 5, 2, 13, 5, 
       13.008275`7.866794695115702}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Generate a ``NearestFunction`` with the sentences' features:

```wl
In[4]:= nf = Nearest[fe[sentences] -> Automatic]

Out[4]= NearestFunction[{384, <>}]
```

Using the ``NearestFunction``, construct a function that returns the nearest sentence in *Alice in Wonderland*:

```wl
In[5]:= nearestalice = sentences[[First@nf[fe[#]]]]&

Out[5]= sentences[[First[nf[fe[#1]]]]]&
```

Use this function with a few queries:

```wl
In[6]:= nearestalice["Alice and the Rabbit"]

Out[6]= "Next came the guests, mostly Kings and Queens, and among them Alice recognized the White Rabbit."

In[7]:= nearestalice["The cat and the Queen"]

Out[7]= "\"How do you like the Queen?\" said the Cat in a low voice."

In[8]:= nearestalice["Off her head"]

Out[8]= "\"Off with her head!\" the Queen shouted at the top of her voice."
```
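A `NearestFunction` can also return several matches, such as the three sentences closest to a query. This sketch reuses `fe`, `nf` and `sentences` from above. The sentences returned are not shown here, and `top3` is an illustrative name.

```wl
(* nf[x, k] returns the positions of the k nearest feature vectors *)
top3 = sentences[[nf[fe["Alice and the Rabbit"], 3]]]
```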

#### Imputation (1)

Load the ``"MNIST"`` dataset from ``ExampleData`` and keep the images:

```wl
In[1]:= digits = First /@ ExampleData[{"MachineLearning", "MNIST"}, "TestData"];

In[2]:= RandomSample[digits, 10]

Out[2]= {[image], [image], [image], [image], [image], [image], [image], [image], [image], [image]}
```

Flatten each image into a numerical vector, shuffle the data, and split it into a training set and a test set:

```wl
In[3]:= digits = Flatten[ImageData[#]]& /@ RandomSample[digits];

In[4]:=
trainingset = digits[[ ;; 9000]];
testset = digits[[9001 ;; ]];
```

The training set contains 9000 examples, each of dimension ``784`` (28×28 pixels):

```wl
In[5]:= Dimensions[trainingset]

Out[5]= {9000, 784}
```

Create a feature extractor using the ``"MissingImputed"`` method:

```wl
In[6]:= fe = FeatureExtraction[trainingset, "MissingImputed"]

Out[6]= FeatureExtractorFunction[«1»]
```

Replace some values of a test-set vector with ``Missing[]`` and visualize it, showing the missing pixels in gray:

```wl
In[7]:= vector = RandomChoice[testset];

In[8]:= toimage = Image[Partition[#, 28], ImageSize -> Tiny] &;

In[9]:=
vectormissing = vector;
vectormissing[[309 ;; 364]] = Missing[];
imagemissing = toimage[Replace[vectormissing, _Missing -> .5, {1}]]

Out[9]= [image]
```

Impute missing values using the ``FeatureExtractorFunction[…]``:

```wl
In[10]:= imputedimage = toimage[fe[vectormissing]]

Out[10]= [image]
```

Visualize the original image, the image with missing values, and the imputed image:

```wl
In[11]:= {toimage[vector], imagemissing, imputedimage}

Out[11]= {[image], [image], [image]}
```
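To quantify the imputation, compare its error on the masked pixels with a naive baseline that fills each missing pixel with its mean over the training set. This sketch reuses the variables defined above. It assumes that `fe` returns a 784-component vector, as the use of `toimage` in In[10] suggests. `missing`, `imputedError` and `baselineError` are illustrative names, and no particular values are claimed.

```wl
missing = 309 ;; 364;  (* the pixels that were replaced by Missing[] *)
(* error of the learned imputation on the masked pixels *)
imputedError = Norm[fe[vectormissing][[missing]] - vector[[missing]]];
(* baseline: per-pixel mean over the training set *)
baselineError = Norm[Mean[trainingset][[missing]] - vector[[missing]]];
{imputedError, baselineError}
```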

### Properties & Relations (4)

Train a feature extractor from data with named features:

```wl
In[1]:= fe = FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 6, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["age" -> Association["Type" -> "Numerical"], 
       "height" -> Association["Type" - ... ate" -> DateObject[{2025, 5, 2, 13, 5, 
       40.590446`8.360998801537427}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Unrecognized keys are ignored:

```wl
In[2]:= fe[<|"age" -> 19, "height" -> 176, "gender" -> "male"|>]

Out[2]= {-1.41422, -1.41421, -0.922868, 0.872214}

In[3]:= fe[<|"age" -> 19, "height" -> 176, "gender" -> "male", "weight" -> 32|>]

Out[3]= {-1.41422, -1.41421, -0.922868, 0.872214}
```
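A sketch that checks the same behavior for a batch of examples. The example associations and the extra keys ``"weight"`` and ``"id"`` are made up for illustration; removing those keys by hand with ``KeyDrop`` should leave the extracted vectors unchanged:

```wl
(* Retrain on the named-feature data used above *)
fe = FeatureExtraction[<|"age" -> {32, 41, 30, 21, 11, 52},
    "height" -> {160, 183, 123, 175, 150, 164},
    "gender" -> {"female", "female", "female", "male", "male", "female"}|>];

(* Hypothetical examples carrying keys that were not present at training time *)
examples = {
   <|"age" -> 19, "height" -> 176, "gender" -> "male", "weight" -> 32|>,
   <|"age" -> 45, "height" -> 158, "gender" -> "female", "id" -> 7|>};

(* Apply the extractor to the whole list, with and without the extra keys *)
withExtra = fe[examples];
withoutExtra = fe[KeyDrop[#, {"weight", "id"}] & /@ examples];
withExtra == withoutExtra
```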

---

``FeatureExtraction[…, "ExtractedFeatures"]`` is equivalent to ``FeatureExtract[…]`` :

```wl
In[1]:= data = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

In[2]:= FeatureExtraction[data, "TFIDF", "ExtractedFeatures"] == FeatureExtract[data, "TFIDF"]

Out[2]= True
```
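Assuming ``FeatureExtract[data, spec]`` trains an extractor on ``data`` and then applies it to the same ``data``, the two-step form below should give the same result. The variable names are illustrative:

```wl
data = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* Train the extractor explicitly, then apply it to the training data *)
fe = FeatureExtraction[data, "TFIDF"];
manual = fe[data];

(* Compare with the one-step FeatureExtract call *)
manual == FeatureExtract[data, "TFIDF"]
```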

---

The ``"FeatureDistance"`` property is equivalent to using ``FeatureDistance`` on the extractor:

```wl
In[1]:= fd1 = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF", "FeatureDistance"]

Out[1]=
FeatureDistance[Association["FeatureExtractorFunction" -> FeatureExtractorFunction[
    Association["ExampleNumber" -> 4, "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
       Association["Input" -> Association["f1" -> Association["T ... .098169`8.452396828979815}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
       "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
       "Evaluations" -> {}]]], "DistanceFunction" -> EuclideanDistance]]
```

Compute the ``FeatureExtractorFunction`` first:

```wl
In[2]:= fe = FeatureExtraction[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"]

Out[2]=
FeatureExtractorFunction[Association["ExampleNumber" -> 4, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... ate" -> DateObject[{2025, 5, 2, 13, 5, 
       52.008071`8.468645720895607}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Construct a feature distance for this feature extractor:

```wl
In[3]:= fd2 = FeatureDistance[fe]

Out[3]=
FeatureDistance[Association["FeatureExtractorFunction" -> FeatureExtractorFunction[
    Association["ExampleNumber" -> 4, "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
       Association["Input" -> Association["f1" -> Association["T ... .008071`8.468645720895607}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
       "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
       "Evaluations" -> {}]]], "DistanceFunction" -> EuclideanDistance]]
```

The two distance functions give identical results:

```wl
In[4]:= fd1["the cat is grey", "the big dog"]

Out[4]= 0.378314

In[5]:= fd2["the cat is grey", "the big dog"]

Out[5]= 0.378314
```
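The ``"DistanceFunction" -> EuclideanDistance`` entry in the outputs above suggests that the feature distance is the Euclidean distance between extracted vectors. The following sketch checks that assumption by computing the distance by hand; the tolerance ``10^-6`` is an arbitrary choice:

```wl
texts = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};
fe = FeatureExtraction[texts, "TFIDF"];
fd = FeatureDistance[fe];

(* Distance computed by the FeatureDistance object *)
d1 = fd["the cat is grey", "the big dog"];

(* Euclidean distance between the extracted vectors, computed by hand *)
d2 = EuclideanDistance[Normal[fe["the cat is grey"]], Normal[fe["the big dog"]]];

Abs[d1 - d2] < 10^-6
```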

---

Training a ``FeatureExtractorFunction`` on data creates a feature space adapted to that data:

```wl
In[1]:= fe = FeatureExtraction[{Molecule["Glucose"], Molecule["Sucrose"]}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 2, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Molecule"]], 
     "Output" -> Association["f1" -> Ass ... ate" -> DateObject[{2025, 5, 2, 13, 6, 
       41.248479`8.367982916945797}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Using different training data can result in a different feature space, so the same input is mapped to a different vector:

```wl
In[2]:=
molecules = Molecule /@ EntityList[EntityClass["Chemical", "AminoAcids"]];
fe2 = FeatureExtraction[molecules]

Out[2]=
FeatureExtractorFunction[Association["ExampleNumber" -> 20, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Molecule"]], 
     "Output" -> Association["f1" -> As ... ate" -> DateObject[{2025, 5, 2, 13, 6, 
       53.308538`8.479371746757444}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]

In[3]:= fe[Molecule["Water"]] === fe2[Molecule["Water"]]

Out[3]= False
```

Creating the extractor with no data (``None``) results in an untrained function that consistently maps inputs into the same fixed feature space:

```wl
In[4]:= fe3 = FeatureExtraction[None, "MoleculeFeatures"]

Out[4]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Molecule"]], 
     "Output" -> Association["f1" -> Ass ... 
    "Date" -> DateObject[{2025, 5, 2, 13, 7, 0.941114`6.726217223184653}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```
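A quick check of the consistency claim, using two separately created data-free extractors. Ethanol, entered as the SMILES string ``"CCO"``, is an arbitrary test molecule; if both extractors use the same fixed feature space, the comparison should return ``True``:

```wl
(* Two independently created, data-free molecule extractors *)
feA = FeatureExtraction[None, "MoleculeFeatures"];
feB = FeatureExtraction[None, "MoleculeFeatures"];

(* Ethanol, given as a SMILES string *)
mol = Molecule["CCO"];

feA[mol] == feB[mol]
```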

### Possible Issues (7)

Training an extractor on anonymous data will use automatic feature names:

```wl
In[1]:= feAutomatic = FeatureExtraction[{"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"}, "IndicatorVector"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 12, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Nominal"]], 
     "Output" -> Association["f1" -> Ass ... ate" -> DateObject[{2025, 5, 2, 13, 8, 
       15.889505`7.953685353067917}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]

In[2]:= Information[feAutomatic, "FeatureNames"]

Out[2]= {"f1"}
```

Using a different name when applying the function leaves the ``"f1"`` feature missing and gives an error:

```wl
In[3]:= feAutomatic[<|"Letter" -> "S"|>]
```

FeatureExtractorFunction::mlincfttp: Incompatible variable type (Nominal) and variable value ({Missing[]}).

```wl
Out[3]=
FeatureExtractorFunction[Association["ExampleNumber" -> 12, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Nominal"]], 
     "Output" -> Association["f1" -> Ass ... 025, 5, 2, 13, 8, 
       15.889505`7.953685353067917}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]][<|"Letter" -> "S"|>]
```

Feature names can be specified at training time:

```wl
In[4]:= feNamed = FeatureExtraction[{"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"}, "IndicatorVector", FeatureNames -> "Letter"]

Out[4]=
FeatureExtractorFunction[Association["ExampleNumber" -> 12, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["Letter" -> Association["Type" -> "Nominal"]], 
     "Output" -> Association["f1" -> ... e" -> DateObject[{2025, 5, 15, 11, 16, 
       22.334025`8.101541981097203}, "Instant", "Gregorian", 1.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Check the feature names of a ``FeatureExtractorFunction`` :

```wl
In[5]:= Information[feNamed, "FeatureNames"]

Out[5]= {"Letter"}
```

The custom name can now be used. Since ``"S"`` did not occur in the training data, all components of the indicator vector are zero:

```wl
In[6]:= feNamed[<|"Letter" -> "S"|>]

Out[6]= SparseArray[Automatic, {3}, 0., {1, {{0, 0}, {}}, {}}]
```
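As an alternative to retraining, the automatically assigned name can be used as the key. This sketch assumes that an association keyed by the automatic name is accepted in the same way as the bare value; the names ``letters`` and ``name`` are illustrative:

```wl
letters = {"C", "B", "C", "B", "A", "B", "B", "A", "B", "B", "A", "A"};
feAutomatic = FeatureExtraction[letters, "IndicatorVector"];

(* Look up the automatically assigned name ("f1" in the output above) *)
name = First[Information[feAutomatic, "FeatureNames"]];

(* Keying the value by that name should give the same vector as passing the bare value *)
Normal[feAutomatic[<|name -> "B"|>]] == Normal[feAutomatic["B"]]
```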

---

The ``FeatureExtraction`` property ``"ReconstructedData"`` can be used to obtain the data after extraction and reconstruction:

```wl
In[1]:= FeatureExtraction[{{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}, "DimensionReducedVector", "ReconstructedData"]

Out[1]= {{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}
```

Some feature extractors can only approximately invert the extraction:

```wl
In[2]:= FeatureExtraction[{{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}}, "DiscretizedVector", "ReconstructedData"]

Out[2]= {{1.39652, 1.31629, 5.36193, 5.2}, {1.4083, 1.49599, 5.91349, 5.2}, {1.322, 1.25194, 5.46787, 5.2}, {1.53559, 1.45366, 5.31705, 5.2}}
```

Some feature extractors cannot be inverted:

```wl
In[3]:= FeatureExtraction[{[image], [image], [image], [image]}, "ImageFeatures", "ReconstructedData"]
```

FeatureExtraction::imprec: The feature extractor cannot be inverted: output "ReconstructedData" cannot be computed.

```wl
Out[3]= FeatureExtraction[{[image], [image], [image], [image]}, "ImageFeatures", "ReconstructedData"]
```
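To quantify how close a reconstruction is, compare it with the original data. This sketch computes the largest absolute error for the two invertible extractors shown above; no specific error values are claimed, since they depend on how the data is reduced or discretized:

```wl
data = {{1.4, 1.4, 5.4, 5.2}, {1.5, 1.5, 6.4, 5.2}, {1.2, 1.2, 6.2, 5.2}, {1.6, 1.6, 4.3, 5.2}};

(* Reconstruct the data through two different extractors *)
reduced = FeatureExtraction[data, "DimensionReducedVector", "ReconstructedData"];
discretized = FeatureExtraction[data, "DiscretizedVector", "ReconstructedData"];

(* Largest absolute reconstruction error for each extractor *)
{Max[Abs[reduced - data]], Max[Abs[discretized - data]]}
```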

---

The property ``"ReconstructedData"`` cannot be used without training data:

```wl
In[1]:= FeatureExtraction[None, "DimensionReducedVector", "ReconstructedData"]
```

FeatureExtraction::mlnooutnone: Property {ReconstructedData} requires training examples to be evaluated.

```wl
Out[1]= FeatureExtraction[None, "DimensionReducedVector", "ReconstructedData"]
```

---

Some extractors can be created without training data:

```wl
In[1]:= FeatureExtraction[None, "LowerCasedText"]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... ate" -> DateObject[{2025, 5, 2, 13, 9, 
       15.672704`7.947718916506609}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Others require examples to initialize them:

```wl
In[2]:= FeatureExtraction[None, "StandardizedVector"]
```

FeatureExtraction::mlnoprocnone: Processor  requires training examples to be created.

```wl
Out[2]= FeatureExtraction[None, "StandardizedVector"]
```

Similarly, not all properties are supported without training data:

```wl
In[3]:= FeatureExtraction[None, "LowerCasedText", "FeatureDistance"]
```

FeatureExtraction::mlnooutnone: Property {FeatureDistance} requires training examples to be evaluated.

```wl
Out[3]= FeatureExtraction[None, "LowerCasedText", "FeatureDistance"]
```
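Whether an extractor can be built from ``None`` can be tested programmatically, since a failed construction returns unevaluated. In this sketch the list of methods is an arbitrary sample and ``Quiet`` suppresses the error message; based on the outputs above, ``"LowerCasedText"`` should map to ``True`` and ``"StandardizedVector"`` to ``False``:

```wl
(* A failed construction returns FeatureExtraction[...] unevaluated *)
specs = {"LowerCasedText", "StandardizedVector"};
creatable = AssociationMap[
   Head[Quiet[FeatureExtraction[None, #]]] === FeatureExtractorFunction &, specs]

(* The data-free text extractor can be applied directly *)
lowered = FeatureExtraction[None, "LowerCasedText"]["Hello World"]
```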

---

Extractors that do not match the data type are ignored:

```wl
In[1]:= fe = FeatureExtraction[{"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"}, {"LowerCasedText", "IndicatorVector"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 10, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Nominal"]], 
     "Output" -> Association["f1" -> Ass ...    "Date" -> DateObject[{2025, 5, 2, 13, 12, 42.333696`8.379261164494736}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

The input type is inferred as ``"Nominal"``, so the ``"LowerCasedText"`` extractor is ignored and ``"Yes"`` and ``"yes"`` remain distinct:

```wl
In[2]:= fe["Yes"] == fe["yes"]

Out[2]= False
```

Similarly, forcing the input type to ``"Text"`` causes the ``"IndicatorVector"`` extractor to be ignored:

```wl
In[3]:= fe = FeatureExtraction[{"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"}, {"LowerCasedText", "IndicatorVector"}, FeatureTypes -> "Text"]

Out[3]=
FeatureExtractorFunction[Association["ExampleNumber" -> 10, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associ ...    "Date" -> DateObject[{2025, 5, 2, 13, 12, 42.512242`8.381088986989328}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]

In[4]:= fe["Yes"]

Out[4]= "yes"
```
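The two behaviors can be compared side by side. Based on the outputs above, this sketch is expected to return ``{False, True}``: only the extractor forced to the ``"Text"`` type treats ``"Yes"`` and ``"yes"`` as the same input. The variable names are illustrative:

```wl
answers = {"No", "No", "no", "no", "no", "no", "yes", "no", "Yes", "Yes"};
specs = {"LowerCasedText", "IndicatorVector"};

(* Inferred type "Nominal": "LowerCasedText" is skipped *)
feNominal = FeatureExtraction[answers, specs];

(* Forced type "Text": "IndicatorVector" is skipped *)
feText = FeatureExtraction[answers, specs, FeatureTypes -> "Text"];

(* Does each extractor treat "Yes" and "yes" as the same input? *)
results = {feNominal["Yes"] == feNominal["yes"], feText["Yes"] == feText["yes"]}
```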

---

The ``"ConformedData"`` extractor requires additional information to operate in a data-free context:

```wl
In[1]:= FeatureExtraction[None, "ConformedData"]
```

FeatureExtraction::mlprocnonedta: Processor ConformedData requires additional context (via FeatureTypes or other processors) to be created.

```wl
Out[1]= FeatureExtraction[None, "ConformedData"]
```

Specify the feature type explicitly using ``FeatureTypes`` :

```wl
In[2]:= FeatureExtraction[None, "ConformedData", FeatureTypes -> "Image"]

Out[2]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Image"]], 
     "Output" -> Association["f1" -> Associ ... te" -> DateObject[{2025, 5, 2, 13, 14, 
       28.858597`8.212850194975799}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

The feature type can also be implicitly inferred from subsequent extractors:

```wl
In[3]:= FeatureExtraction[None, {"ConformedData", "ImageFeatures"}]

Out[3]=
FeatureExtractorFunction[Association["ExampleNumber" -> 0, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Image"]], 
     "Output" -> Association["f1" -> Associ ...     "Date" -> DateObject[{2025, 5, 2, 13, 14, 33.220233`8.27397765433888}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```
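The effect of supplying a feature type can be checked by looking at the heads of the results. Following the outputs above, this sketch should give ``{FeatureExtraction, FeatureExtractorFunction}``, where the unevaluated ``FeatureExtraction`` indicates that construction failed; ``Quiet`` only suppresses the error message:

```wl
(* Without any type information the construction fails and stays unevaluated *)
noType = Quiet[FeatureExtraction[None, "ConformedData"]];

(* With an explicit feature type it succeeds *)
withType = FeatureExtraction[None, "ConformedData", FeatureTypes -> "Image"];

Head /@ {noType, withType}
```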

---

The automatic feature extraction often applies a dimension reduction step:

```wl
In[1]:= fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}]

Out[1]=
FeatureExtractorFunction[Association["ExampleNumber" -> 3, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... ate" -> DateObject[{2025, 5, 2, 13, 14, 
       55.987417`8.50066540578063}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Explicitly specified feature extractors do not include dimension reduction and typically produce longer vectors:

```wl
In[2]:= fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}, "SentenceVector"]

Out[2]=
FeatureExtractorFunction[Association["ExampleNumber" -> 3, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... ate" -> DateObject[{2025, 5, 2, 13, 14, 
       58.124803`8.51693646730439}, "Instant", "Gregorian", 2.], "ProcessorCount" -> 10, 
    "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", "SystemWordLength" -> 64, 
    "Evaluations" -> {}]]]
```

Use the ``"DimensionReducedVector"`` extractor to add a dimension reduction step:

```wl
In[3]:= fe = FeatureExtraction[{"rhinos have horns", "deer have antlers", "fish have scales"}, {"SentenceVector", "DimensionReducedVector"}]

Out[3]=
FeatureExtractorFunction[Association["ExampleNumber" -> 3, 
  "Preprocessor" -> MachineLearning`MLProcessor["ToMLDataset", 
    Association["Input" -> Association["f1" -> Association["Type" -> "Text"]], 
     "Output" -> Association["f1" -> Associa ... 
    "Date" -> DateObject[{2025, 5, 2, 13, 15, 0.273291`6.18920031893276}, "Instant", "Gregorian", 
      2.], "ProcessorCount" -> 10, "ProcessorType" -> "ARM64", "OperatingSystem" -> "MacOSX", 
    "SystemWordLength" -> 64, "Evaluations" -> {}]]]
```

Dimension reduction must be trained on the available features and therefore cannot be applied when no data is provided:

```wl
In[4]:= fe = FeatureExtraction[None, {"SentenceVector", "DimensionReducedVector"}]
```

FeatureExtraction::mlnoprocnone: Processor MergeVectors requires training examples to be created.

```wl
Out[4]= FeatureExtraction[None, {"SentenceVector", "DimensionReducedVector"}]
```
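The difference in vector length can be checked directly. In this sketch the first sentence is an arbitrary test input; following the statement above, the vector from ``"SentenceVector"`` alone is expected to be longer than the one with the added dimension reduction step, though no specific lengths are claimed:

```wl
texts = {"rhinos have horns", "deer have antlers", "fish have scales"};

(* Sentence embeddings alone, then followed by a trained dimension reduction step *)
feFull = FeatureExtraction[texts, "SentenceVector"];
feReduced = FeatureExtraction[texts, {"SentenceVector", "DimensionReducedVector"}];

(* Compare the lengths of the feature vectors produced for one sentence *)
lengths = Length /@ {feFull[First[texts]], feReduced[First[texts]]}
```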

## See Also

* [`FeatureExtract`](https://reference.wolfram.com/language/ref/FeatureExtract.en.md)
* [`FeatureExtractor`](https://reference.wolfram.com/language/ref/FeatureExtractor.en.md)
* [`FeatureExtractorFunction`](https://reference.wolfram.com/language/ref/FeatureExtractorFunction.en.md)
* [`DimensionReduction`](https://reference.wolfram.com/language/ref/DimensionReduction.en.md)
* [`FeatureNearest`](https://reference.wolfram.com/language/ref/FeatureNearest.en.md)
* [`FeatureDistance`](https://reference.wolfram.com/language/ref/FeatureDistance.en.md)
* [`Classify`](https://reference.wolfram.com/language/ref/Classify.en.md)
* [`FeatureSpacePlot`](https://reference.wolfram.com/language/ref/FeatureSpacePlot.en.md)
* [`CreateVectorDatabase`](https://reference.wolfram.com/language/ref/CreateVectorDatabase.en.md)

## Related Guides

* [Machine Learning](https://reference.wolfram.com/language/guide/MachineLearning.en.md)
* [Unsupervised Machine Learning](https://reference.wolfram.com/language/guide/UnsupervisedMachineLearning.en.md)
* [Natural Language Processing](https://reference.wolfram.com/language/guide/NaturalLanguageProcessing.en.md)

## History

* [Introduced in 2016 (11.0)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn110.en.md) \| [Updated in 2017 (11.2)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn112.en.md) ▪ [2020 (12.1)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn121.en.md) ▪ [2020 (12.2)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn122.en.md) ▪ [2021 (12.3)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn123.en.md) ▪ [2025 (14.3)](https://reference.wolfram.com/language/guide/SummaryOfNewFeaturesIn143.en.md)