FeatureExtraction

FeatureExtraction[{example1,example2,…}]

generates a FeatureExtractorFunction[…] trained from the examples given.

FeatureExtraction[examples,extractor]

uses the specified feature extractor method.

FeatureExtraction[examples,{extractor1,extractor2,…}]

applies extractor1, extractor2, … in sequence to generate a feature extractor.

FeatureExtraction[examples,spec->ext]

uses the extractor methods specified by ext on parts of examples specified by spec.

FeatureExtraction[examples,{spec1->ext1,spec2->ext2,…}]

uses each extractor method ext1, ext2, … on the parts of examples specified by the corresponding spec1, spec2, ….

FeatureExtraction[examples,extractor,props]

gives the feature extraction properties specified by props.

Details and Options

  • FeatureExtraction can be used on many types of data, including numerical data, text, sounds, and images, as well as combinations of these.
  • Each example can be a single data element, a list of data elements, an association of data elements, or a Dataset object.
  • FeatureExtraction[examples] returns a FeatureExtractorFunction[…] that can be applied to specific data.
  • Possible feature extractor methods include:
    Automatic                      automatic extraction
    Identity                       give data unchanged
    "ConformedData"                conformed images, colors, dates, etc.
    "NumericVector"                numeric vector from any data
    f                              apply function f to each example
    {extractor1,extractor2,…}      use a sequence of extractors in turn
  • Additional feature extractor methods can also be used for each data type.
  • Numeric data:
    "DiscretizedVector"            discretized numerical data
    "DimensionReducedVector"       reduced-dimension numeric vectors
    "MissingImputed"               data with missing values imputed
    "StandardizedVector"           numeric data processed with Standardize
  • Nominal data:
    "IndicatorVector"              nominal data "one-hot encoded" with indicator vectors
    "IntegerVector"                nominal data encoded with integers
  • Text:
    "LowerCasedText"               text with each character lowercase
    "SegmentedCharacters"          text segmented into characters
    "SegmentedWords"               text segmented into words
    "TFIDF"                        term frequency-inverse document frequency vector
    "WordVectors"                  sequence of semantic vectors from a text (English only)
  • Images:
    "FaceFeatures"                 semantic vector from an image of a human face
    "ImageFeatures"                semantic vector from an image
    "PixelVector"                  vector of pixel values from an image
  • Audio objects:
    "AudioFeatures"                sequence of semantic vectors from an audio object
    "AudioFeatureVector"           semantic vector from an audio object
    "LPC"                          audio linear prediction coefficients
    "MelSpectrogram"               audio spectrogram with logarithmic frequency bins
    "MFCC"                         sequence of audio mel-frequency cepstral coefficient vectors
    "SpeakerFeatures"              sequence of semantic speaker vectors
    "SpeakerFeatureVector"         semantic vector for a speaker
    "Spectrogram"                  audio spectrogram
  • Feature extractor methods are applied only to data elements whose types they are compatible with. Other data elements are returned unchanged.
  • FeatureExtraction[examples] is equivalent to FeatureExtraction[examples,Automatic], which is typically equivalent to FeatureExtraction[examples,"NumericVector"].
  • The "NumericVector" method will typically convert examples to numeric vectors, impute missing data, and reduce the dimension using DimensionReduction.
  • In FeatureExtraction[examples,extractors,props], props can be a single property or a list of properties. Possible properties include:
    "ExtractorFunction"            FeatureExtractorFunction[…] (default)
    "ExtractedFeatures"            examples after feature extraction
    "ReconstructedData"            examples after extraction and inverse extraction
    "FeatureDistance"              FeatureDistance[…] generated from the extractor
  • In FeatureExtraction[examples,spec->ext] or FeatureExtraction[examples,{spec1->ext1,…}], possible forms for spec and for each of spec1, spec2, … include:
    All                            all parts of each example
    i                              i-th part of each example
    {i1,i2,…}                      parts i1, i2, … of each example
    "name"                         part with the specified name in each example
    {"name1","name2",…}            parts with names "name1", "name2", … in each example
  • Parts not mentioned in spec or in any of spec1, spec2, … are dropped for the purpose of extracting features.
  • In FeatureExtraction[examples,{spec1->ext1,…}], the extractors ext1, ext2, … are all applied separately to examples.
  • The following options can be given:
    FeatureNames       Automatic   names to assign to elements of each example
    FeatureTypes       Automatic   feature types to assume for elements of each example
    RandomSeeding      1234        what seeding of pseudorandom generators should be done internally
  • Possible settings for RandomSeeding include:
    Automatic                      automatically reseed every time the function is called
    Inherited                      use externally seeded random numbers
    seed                           use an explicit integer or string as a seed
  • FeatureExtraction[…,"ExtractedFeatures"] is equivalent to FeatureExtract[…].
  • FeatureExtraction[…,"FeatureDistance"] is equivalent to FeatureDistance[FeatureExtraction[…]].
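
The first equivalence can be illustrated with a minimal sketch. The dataset below is made up for illustration, and "StandardizedVector" is chosen only because it is a simple, deterministic method; outputs are not shown here.

```wolfram
(* Hypothetical numeric dataset, used only for illustration *)
data = {{1., 10.}, {2., 20.}, {3., 30.}, {4., 40.}};

(* Features of the training examples, requested as a property *)
viaProperty = FeatureExtraction[data, "StandardizedVector", "ExtractedFeatures"];

(* The same features, requested directly with FeatureExtract *)
viaExtract = FeatureExtract[data, "StandardizedVector"];

(* Both forms are expected to give one feature vector per example *)
{Dimensions[viaProperty], Dimensions[viaExtract]}
```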

Examples

Basic Examples  (3)

Train a FeatureExtractorFunction on a simple dataset:

Extract features from a new example:

Extract features from a list of examples:
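
The three steps above might look like the following sketch. The dataset and the new examples are made up for illustration, and the extracted values are not reproduced here.

```wolfram
(* Hypothetical training set: one numeric and one nominal value per example *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Train the extractor; the extraction method is chosen automatically *)
fe = FeatureExtraction[data]

(* Extract features from a single new example *)
fe[{2.1, "A"}]

(* Extract features from a list of examples *)
fe[{{2.1, "A"}, {4.7, "B"}}]
```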

Train a feature extractor on a dataset of images:

Use the feature extractor on the training set:

Construct a feature extractor from a numerical dataset using the "StandardizedVector" extractor method:

Use the feature extractor on the training set:

The property "ExtractedFeatures" can be used to perform this operation in one step:

Multiple properties can be queried:
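
A possible version of the "StandardizedVector" steps above, using a small made-up numeric dataset (the values are assumptions chosen only for illustration):

```wolfram
(* Hypothetical numeric dataset with two features per example *)
data = {{1.2, 30.}, {2.5, 41.}, {3.1, 28.}, {4.8, 52.}};

(* Train an extractor that standardizes each feature *)
fe = FeatureExtraction[data, "StandardizedVector"];

(* Apply it to the training set *)
fe[data]

(* Train and extract in one step with the "ExtractedFeatures" property *)
FeatureExtraction[data, "StandardizedVector", "ExtractedFeatures"]

(* Query several properties at once by giving a list *)
props = FeatureExtraction[data, "StandardizedVector",
   {"ExtractorFunction", "ExtractedFeatures"}]
```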

Scope  (12)

Train a feature extractor on textual data:

Use the feature extractor on new examples:

Train a feature extractor on a list of DateObject:

Extract features from a new DateObject:

A string date can also be given:
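
A sketch of the date workflow above. The dates are arbitrary, and the ISO-style string in the last line is an assumption about which date string formats are accepted.

```wolfram
(* Hypothetical list of training dates *)
dates = {DateObject[{2016, 8, 8}], DateObject[{2017, 9, 15}],
   DateObject[{2019, 4, 16}], DateObject[{2020, 3, 17}]};

(* Train the extractor on the dates *)
fe = FeatureExtraction[dates];

(* Extract features from a new DateObject *)
fe[DateObject[{2018, 1, 1}]]

(* Give the date as a string instead (format assumed to be recognized) *)
fe["2018-01-01"]
```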

Train a feature extractor to compute term frequency-inverse document frequency vectors from texts:

The term frequency-inverse document frequency matrix of the training set can be computed in a SparseArray:

Visualize the matrix:
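
The "TFIDF" steps above could be sketched as follows. The sentences are made up, and MatrixPlot is just one reasonable way to visualize the result.

```wolfram
(* Hypothetical training texts *)
texts = {"the cat is grey", "my cat is fast",
   "this dog is scary", "the big dog"};

(* Train an extractor that computes TF-IDF vectors *)
fe = FeatureExtraction[texts, "TFIDF"];

(* TF-IDF matrix of the training set: one row per text *)
tfidf = fe[texts]

(* Visualize the matrix *)
MatrixPlot[tfidf]
```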

The "TFIDF" method can also be used on tokenized data (nominal bags):

Train a feature extractor on text using the "TFIDF" method followed by the "DimensionReducedVector" method:

Extract features on the training set:
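
A minimal sketch of chaining a text method with a dimension reduction step, assuming a small made-up set of sentences:

```wolfram
(* Hypothetical training texts *)
texts = {"the cat is grey", "my cat is fast", "this dog is scary",
   "the big dog", "a grey dog runs", "the fast cat sleeps"};

(* Compute TF-IDF vectors first, then reduce their dimension *)
fe = FeatureExtraction[texts, {"TFIDF", "DimensionReducedVector"}];

(* Extract the reduced features of the training set *)
reduced = fe[texts]
```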

Generate a feature extractor using a custom function:

Apply the extractor on the training set:

Chain the custom extractor with the "StandardizedVector" method:
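
The custom-function steps above might be written as in this sketch. The function squares and the data are hypothetical; any function that maps one example to a feature should play the same role.

```wolfram
(* Hypothetical scalar examples *)
data = {1.2, 3.4, 5.6, 7.8};

(* Hypothetical custom extractor: map x to the vector {x, x^2} *)
squares = Function[x, {x, x^2}];

(* Use the function as the extractor method *)
fe = FeatureExtraction[data, squares];
fe[data]

(* Chain the custom function with a standardization step *)
feChained = FeatureExtraction[data, {squares, "StandardizedVector"}];
feChained[data]
```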

Train a feature extractor with the "IndicatorVector" method on nominal variables:

Extract features from a new example:

Train a feature extractor with the "IndicatorVector" method on only the second nominal variable:

The first nominal variable is dropped:

Use the Identity extractor method to copy the first variable:

The first variable is copied:

A variable can be copied multiple times:
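
The "IndicatorVector" and part-specification steps above can be sketched with a made-up nominal dataset. The names fe2, fe3, and fe4 are only labels for this illustration.

```wolfram
(* Hypothetical dataset with two nominal variables *)
data = {{"A", "x"}, {"B", "y"}, {"A", "z"}, {"B", "x"}};

(* One-hot encode all nominal variables *)
fe = FeatureExtraction[data, "IndicatorVector"];
fe[{"B", "z"}]

(* Encode only the second variable; the first one is dropped *)
fe2 = FeatureExtraction[data, 2 -> "IndicatorVector"];
fe2[{"B", "z"}]

(* Copy the first variable unchanged with Identity *)
fe3 = FeatureExtraction[data, {1 -> Identity, 2 -> "IndicatorVector"}];
fe3[{"B", "z"}]

(* The same part can be listed more than once *)
fe4 = FeatureExtraction[data,
   {1 -> Identity, 1 -> Identity, 2 -> "IndicatorVector"}];
fe4[{"B", "z"}]
```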

Train a feature extractor on a mixed-type dataset:

Extract features from a new example:

Train a feature extractor on texts and images using the "TFIDF" method:

Features will only be extracted from the text part:

Train a feature extractor from a dataset that contains missing values:

This feature extractor can extract features even when values are missing:
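
A sketch of training and applying an extractor when values are missing, assuming a small made-up numeric dataset:

```wolfram
(* Hypothetical dataset in which some values are missing *)
data = {{1.2, 3.4}, {Missing[], 2.1}, {2.2, Missing[]},
   {3.1, 4.0}, {2.9, 3.6}};

(* Train with the automatic method *)
fe = FeatureExtraction[data];

(* Features can still be extracted from an incomplete example *)
fe[{Missing[], 3.0}]
```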

Train a feature extractor using the "StandardizedVector" method:

Extract features from a new example:

Since this feature extractor is invertible, the FeatureExtractorFunction property "OriginalData" can be used to perform the inverse extraction:

Some feature extractors can only perform an approximation of the inverse extraction:

The FeatureExtraction property "ReconstructedData" can be used to obtain the data after extraction and reconstruction:
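
The inversion steps above might look like the following sketch. The data are made up, and "DimensionReducedVector" is used here as an assumed example of a method whose inverse is only approximate.

```wolfram
(* Hypothetical numeric dataset with three features per example *)
data = {{1., 10., 0.5}, {2., 22., 0.9}, {3., 29., 1.6},
   {4., 41., 2.1}, {5., 48., 2.4}};

(* Standardization is invertible *)
fe = FeatureExtraction[data, "StandardizedVector"];
features = fe[{2.5, 25., 1.2}];
recovered = fe[features, "OriginalData"]

(* Dimension reduction can only be inverted approximately *)
feReduced = FeatureExtraction[data, "DimensionReducedVector"];
feReduced[feReduced[{2.5, 25., 1.2}], "OriginalData"]

(* Extraction followed by reconstruction of the training set in one call *)
FeatureExtraction[data, "DimensionReducedVector", "ReconstructedData"]
```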

Some feature extractors cannot be inverted:

Train a feature extractor from a list of associations:

Extract features from a new example:
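
A minimal sketch of the association format, assuming hypothetical keys "age" and "city":

```wolfram
(* Hypothetical examples given as associations *)
data = {<|"age" -> 31, "city" -> "Paris"|>,
   <|"age" -> 45, "city" -> "Rome"|>,
   <|"age" -> 27, "city" -> "Paris"|>,
   <|"age" -> 52, "city" -> "Rome"|>};

(* Train the extractor on the associations *)
fe = FeatureExtraction[data];

(* New examples use the same keys *)
fe[<|"age" -> 38, "city" -> "Rome"|>]
```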

Options  (3)

FeatureNames  (2)

Train a feature extractor and give a name to each feature:

Use the association format to extract features from a new example:

The list format can still be used:

Use FeatureNames to set up names and refer to them in FeatureExtraction[examples,{spec1->ext1,…}]:
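
The FeatureNames steps above could be sketched as follows. The names "size" and "class" and the data are assumptions made for illustration.

```wolfram
(* Hypothetical dataset and feature names *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Name each feature when training *)
fe = FeatureExtraction[data, FeatureNames -> {"size", "class"}];

(* New examples can be given as associations that use those names *)
fe[<|"size" -> 2.3, "class" -> "A"|>]

(* The plain list format can still be used *)
fe[{2.3, "A"}]

(* Refer to the names in part specifications *)
feNamed = FeatureExtraction[data,
   {"size" -> "StandardizedVector", "class" -> "IndicatorVector"},
   FeatureNames -> {"size", "class"}];
feNamed[{2.3, "A"}]
```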

FeatureTypes  (1)

Train a feature extractor with the "IndicatorVector" method on a simple dataset:

The first feature has been interpreted as numerical. Since the "IndicatorVector" method only acts on nominal features, the first feature is unchanged:

Use FeatureTypes to enforce the interpretation of the first feature as nominal:
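
A sketch of the FeatureTypes steps above, assuming a made-up dataset whose first feature is an integer code:

```wolfram
(* Hypothetical dataset: the first feature looks numerical *)
data = {{1, "A"}, {2, "B"}, {3, "A"}, {1, "B"}};

(* By default the first feature is expected to be treated as numerical,
   so "IndicatorVector" leaves it unchanged *)
fe = FeatureExtraction[data, "IndicatorVector"];
fe[{2, "B"}]

(* Force both features to be interpreted as nominal *)
feNominal = FeatureExtraction[data, "IndicatorVector",
   FeatureTypes -> {"Nominal", "Nominal"}];
feNominal[{2, "B"}]
```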

Applications  (3)

Image Search  (1)

Construct a dataset of dog images:

Train an extractor function from this dataset:

Generate a NearestFunction on the extracted features of the dataset:

Using the NearestFunction, construct a function that displays the nearest image of the dataset:

Use this function on images that are not in the dataset:

This feature extractor function can also be used to delete image pairs that are too similar:

Text Search  (1)

Load the text of Alice in Wonderland:

Split the text into sentences:

Train a feature extractor on these sentences:

Generate a NearestFunction with the sentences' features:

Using the NearestFunction, construct a function that displays the nearest sentence in Alice in Wonderland:

Use this function with a few queries:
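
The text search workflow above might be assembled as in this sketch. The helper name nearestSentence and the queries are hypothetical, and building the NearestFunction from rules of features to sentences is one possible choice rather than the only one.

```wolfram
(* Load the text and split it into sentences *)
text = ExampleData[{"Text", "AliceInWonderland"}];
sentences = TextSentences[text];

(* Train a feature extractor on the sentences *)
fe = FeatureExtraction[sentences];

(* Index the sentences by their extracted features *)
nf = Nearest[fe[sentences] -> sentences];

(* Hypothetical helper: return the sentence closest to a query *)
nearestSentence[query_String] := First[nf[fe[query]]];

(* Try a few made-up queries *)
nearestSentence /@ {"Alice and the rabbit", "the queen plays croquet"}
```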

Imputation  (1)

Load the "MNIST" dataset from ExampleData and keep the images:

Convert images to numerical data and separate the dataset into a training set and a test set:

The dimension of the dataset is 784:

Create a feature extractor using the "MissingImputed" method:

Replace some values of a test-set vector by Missing[] and visualize it:

Impute missing values using the FeatureExtractorFunction[…]:

Visualize the original image, the image with missing values, and the imputed image:

Introduced in 2016 (11.0) | Updated in 2017 (11.2), 2020 (12.1)