Wolfram Language & System Documentation Center

FeatureExtraction

FeatureExtraction[examples]

generates a FeatureExtractorFunction[…] trained from the examples given.

FeatureExtraction[examples,spec]

uses the specified feature extractor method spec.

FeatureExtraction[examples,spec,props]

gives the feature extraction properties specified by props.

Details and Options

FeatureExtraction is typically used to define a function that processes raw data into usable features (e.g. for training a machine learning algorithm).
FeatureExtraction can be used on many types of data, including numerical, textual, sounds, images, graphs and time series, as well as combinations of these.
Possible values of examples are:

	{example₁,…}	a list of training examples
	Dataset[…]	a Dataset object
	Tabular[…]	a Tabular object
	None	no training examples

Each example_i can be a single data element, a list of data elements or an association of data elements.
Possible values for spec include:

	extractor	use the specified extractor method
	partextractor	apply the extractor to the specific example part
	{part₁extractor₁,…}	specify extractors for specific parts

Possible feature extractor methods extractor include:

	Automatic	automatic extraction
	Identity	give data unchanged
	"ConformedData"	conformed images, colors, dates, etc.
	"NumericVector"	numeric vector from any data
	"name"	a named extractor method
	f	applies function f to each example
	{extractor₁,extractor₂,…}	use a sequence of extractors in turn

Possible forms of part are:

	All	all parts of each example
	i	ith part of each example
	{i₁,i₂,…}	parts i₁, i₂, … of each example
	"key"	part with the specified key in each example
	{"key₁","key₂",…}	parts with names "key_i" in each example

When explicitly specifying parts, any unmentioned parts are dropped when extracting features.
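
The part specifications above can be sketched as follows. The dataset is hypothetical and chosen only to illustrate how unmentioned parts are dropped and how Identity keeps a part unchanged:

```wolfram
(* Hypothetical toy data: each example is {nominal, nominal}. *)
data = {{"A", "x"}, {"B", "y"}, {"A", "y"}, {"B", "x"}};

(* Apply "IndicatorVector" to part 2 only.
   Part 1 is not mentioned, so it is dropped from the output. *)
fe = FeatureExtraction[data, 2 -> "IndicatorVector"];
fe[{"A", "x"}]

(* Mention part 1 with Identity to keep it unchanged in the output. *)
fe2 = FeatureExtraction[data, {1 -> Identity, 2 -> "IndicatorVector"}];
fe2[{"A", "x"}]
```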

Extractors

FeatureExtraction[examples] is equivalent to FeatureExtraction[examples,Automatic], which is typically equivalent to FeatureExtraction[examples,"NumericVector"].
The "NumericVector" method will typically convert examples to numeric vectors, impute missing data and reduce the dimension using DimensionReduction.
Feature extractor methods specific to a single data type are applied only to data elements of a compatible type. Other data elements are returned unchanged.
Not all specific feature extractors are available when examples is None.
The specific extractors are:
Numeric data:

	"DiscretizedVector"	discretized numerical data
	"DimensionReducedVector"	reduced-dimension numeric vectors
	"MissingImputed"	data with missing values imputed
	"StandardizedVector"	numeric data processed with Standardize

Nominal data:

	"IndicatorVector"	nominal data "one-hot encoded" with indicator vectors
	"IntegerVector"	nominal data encoded with integers

Text:

	"LowerCasedText"	text with each character lowercase
	"SegmentedCharacters"	text segmented into characters
	"SegmentedWords"	text segmented into words
	"SentenceVector"	semantic vector from a text
	"TFIDF"	term frequency-inverse document frequency vector
	"WordVectors"	sequence of semantic vectors from a text (English only)

Images:

	"FaceFeatures"	semantic vector from an image of a human face
	"ImageFeatures"	semantic vector from an image
	"PixelVector"	vector of pixel values from an image

Audio objects:

	"AudioFeatures"	sequence of semantic vectors from an audio object
	"AudioFeatureVector"	semantic vector from an audio object
	"LPC"	audio linear prediction coefficients
	"MelSpectrogram"	audio spectrogram with logarithmic frequency bins
	"MFCC"	sequence of audio mel-frequency cepstral coefficient vectors
	"SpeakerFeatures"	sequence of semantic speaker vectors
	"SpeakerFeatureVector"	semantic vector for a speaker
	"Spectrogram"	audio spectrogram

Video objects:

	"VideoFeatures"	sequence of semantic vectors from a video object
	"VideoFeatureVector"	semantic vector from a video object

Graphs:

	"GraphFeatures"	numeric vector summarizing graph properties

Molecules:

	"AtomPairs"	Boolean vector from pairs of atoms and the path lengths between them
	"MoleculeExtendedConnectivity"	Boolean vector from enumerated molecule subgraphs
	"MoleculeFeatures"	numeric vector summarizing molecule properties
	"MoleculeTopologicalFeatures"	Boolean vector from circular atom neighborhoods

Properties

In FeatureExtraction[examples,extractors,props], props can be a single property or a list of properties. Possible properties include:

	"ExtractorFunction"	FeatureExtractorFunction[…] (default)
	"ExtractedFeatures"	examples after feature extraction
	"ReconstructedData"	examples after extraction and inverse extraction
	"FeatureDistance"	FeatureDistance[…] generated from the extractor

The "ExtractedFeatures" and "ReconstructedData" properties are not available when examples is None.
The "ReconstructedData" property can be computed only when every specified extractor is invertible.
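
A short sketch of requesting properties, assuming a small hypothetical numeric dataset and the "StandardizedVector" method (used here because it is described elsewhere on this page as invertible):

```wolfram
(* Hypothetical numeric training data. *)
data = {{1.2, 3.4}, {2.1, 0.5}, {0.3, 2.2}, {1.8, 1.1}};

(* Default property: the trained FeatureExtractorFunction. *)
fe = FeatureExtraction[data, "StandardizedVector", "ExtractorFunction"];

(* The training examples after feature extraction. *)
features = FeatureExtraction[data, "StandardizedVector", "ExtractedFeatures"];

(* The training examples after extraction followed by inverse extraction. *)
reconstructed = FeatureExtraction[data, "StandardizedVector", "ReconstructedData"];

(* A list of properties can be requested in a single call. *)
FeatureExtraction[data, "StandardizedVector",
  {"ExtractedFeatures", "ReconstructedData"}]
```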

Options

The following options can be given:

	FeatureNames	Automatic	names to assign to elements of the example_i
	FeatureTypes	Automatic	feature types to assume for elements of the example_i
	RandomSeeding	1234	what seeding of pseudorandom generators should be done internally

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed
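
The RandomSeeding settings can be illustrated with a small sketch. The data and the seed value 42 are arbitrary choices made for this example:

```wolfram
(* Hypothetical training data. *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Default setting: the internal generators are seeded with 1234. *)
fe1 = FeatureExtraction[data];

(* Explicit integer seed. *)
fe2 = FeatureExtraction[data, RandomSeeding -> 42];

(* Reseed automatically on every call. *)
fe3 = FeatureExtraction[data, RandomSeeding -> Automatic];

(* Use whatever seeding is in effect outside the call, e.g. from SeedRandom. *)
SeedRandom[7];
fe4 = FeatureExtraction[data, RandomSeeding -> Inherited];
```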

Examples


Basic Examples (3)

Train a FeatureExtractorFunction on a simple dataset:

Extract features from a new example:

Extract features from a list of examples:
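
As a minimal sketch of the three steps above, assuming a small hypothetical dataset of {number, label} pairs (the values are illustrative, not recovered from the original notebook):

```wolfram
(* Hypothetical training data: each example is {numeric, nominal}. *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Train a FeatureExtractorFunction with the default (Automatic) method. *)
fe = FeatureExtraction[data];

(* Extract features from one new example. *)
fe[{1.3, "A"}]

(* Extract features from a list of new examples. *)
fe[{{1.3, "A"}, {4.5, "B"}}]
```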

Train a feature extractor on a dataset of images:

Use the feature extractor on the training set:

Specify a specific extractor:

Scope (32)

Input Shape (9)

Train a feature extractor on a list of examples with a single feature:

Extract features from a new example:

Extract features from multiple new examples:

Train a feature extractor on a list of examples with multiple features:

Extract features from multiple new examples:

Train a feature extractor on a mixed-type dataset:

Extract features from a new example:

Train a feature extractor from a list of associations:

Extract features from a new example:

Extract features from multiple new examples:

Train a feature extractor from data given as feature lists:

Train a feature extractor from a Tabular:

Train a feature extractor from a Dataset:
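
A sketch of the Dataset case, using hypothetical rows with the made-up keys "age" and "city". A Tabular object is assumed to be handled analogously:

```wolfram
(* Hypothetical Dataset: each row is an association of named features. *)
ds = Dataset[{
    <|"age" -> 32, "city" -> "Paris"|>,
    <|"age" -> 45, "city" -> "Lima"|>,
    <|"age" -> 27, "city" -> "Paris"|>,
    <|"age" -> 51, "city" -> "Oslo"|>}];

(* Train on the Dataset directly. *)
fe = FeatureExtraction[ds];

(* Apply the extractor to a new example given as an association. *)
fe[<|"age" -> 40, "city" -> "Lima"|>]
```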

Train a feature extractor from a dataset that contains missing values:

Define a feature extractor that requires no training:

Apply it on some text:
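
A possible sketch of a training-free extractor, assuming that "LowerCasedText" is among the methods that can be created with None as the examples argument:

```wolfram
(* No training examples: the input type is inferred from the extractor. *)
fe = FeatureExtraction[None, "LowerCasedText"];

(* Apply it to a string. *)
fe["The Quick Brown Fox"]
```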

Extractor Specifications (10)

Specify the feature extractor "SentenceVector" on a single textual feature:

Apply it on some text:

Train a feature extractor using the "StandardizedVector" method:

Extract features from a new example:

Since this feature extractor is invertible, the FeatureExtractorFunction property "OriginalData" can be used to perform the inverse extraction:
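
The standardize-then-invert round trip described above might look like the following sketch, with hypothetical numeric data:

```wolfram
(* Hypothetical numeric training data. *)
data = {{1.4, 10.}, {1.5, 12.}, {2.3, 31.}, {5.4, 40.}};

(* Train a standardizing extractor. *)
fe = FeatureExtraction[data, "StandardizedVector"];

(* Extract features from a new example. *)
features = fe[{2.0, 20.}]

(* Invert the extraction with the "OriginalData" property. *)
fe[features, "OriginalData"]
```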

Train a feature extractor on text using the "TFIDF" method followed by the "DimensionReducedVector" method:

Extract features on the training set:
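
A sketch of chaining two extractors, assuming a handful of made-up sentences as the training set. The list form applies "TFIDF" first and then "DimensionReducedVector" to its output:

```wolfram
(* Hypothetical training texts. *)
texts = {
   "the cat sat on the mat",
   "the dog sat on the log",
   "cats and dogs are pets",
   "a log is not a mat",
   "pets sit on mats and logs",
   "the cat chased the dog"};

(* TF-IDF vectors followed by dimension reduction. *)
fe = FeatureExtraction[texts, {"TFIDF", "DimensionReducedVector"}];

(* Extract features on the training set. *)
fe[texts]
```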

Train a feature extractor on texts and images using the text-only "TFIDF" method:

Features will only be extracted from the text part:

Specify the feature extraction on multiple features by position:

Use the feature extractor on new features:

A list of two items will be assumed to be a single input of two features:

Train a feature extractor with the "IndicatorVector" method on only the second nominal variable:

The first nominal variable is dropped:

Use the Identity extractor method to copy the first variable:

The first variable is copied:

A variable can be copied multiple times:

Specify the feature extraction on multiple features by key:

Use the feature extractor on new features:

Using the feature extractor on a list will assume the same ordering of features as originally specified:
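
Key-based specification can be sketched as follows, with hypothetical keys "age" and "gender":

```wolfram
(* Hypothetical training data given as associations. *)
data = {
   <|"age" -> 32, "gender" -> "M"|>,
   <|"age" -> 45, "gender" -> "F"|>,
   <|"age" -> 27, "gender" -> "F"|>,
   <|"age" -> 51, "gender" -> "M"|>};

(* One extractor per key. *)
fe = FeatureExtraction[data,
   {"age" -> "StandardizedVector", "gender" -> "IndicatorVector"}];

(* New example given by key. *)
fe[<|"age" -> 40, "gender" -> "F"|>]

(* New example given as a list, in the order the features were specified. *)
fe[{40, "F"}]
```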

Generate a feature extractor using a custom function:

Apply the extractor on the training set:

Chain the custom extractor with the "StandardizedVector" method:
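
A sketch of a custom-function extractor and of chaining it with a built-in method. StringLength is used here only as an illustrative function, and the word list is made up:

```wolfram
(* Hypothetical training data: a list of strings. *)
words = {"a", "bb", "ccc", "dddd", "eeeee"};

(* Custom extractor: apply StringLength to each example. *)
fe = FeatureExtraction[words, StringLength];
fe[words]

(* Chain the custom function with "StandardizedVector". *)
fe2 = FeatureExtraction[words, {StringLength, "StandardizedVector"}];
fe2[words]
```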

Conform data prior to processing:

Reduce the dimensionality of the output:

Feature Types (10)

Create a feature extractor for textual data using the "SentenceVector" extractor with no training:

Input type is inferred from the specified extractor. Use the feature extractor on some examples:

Create a feature extractor for examples with implicit textual and image features:

Features will be extracted from both parts:

Train a feature extractor on textual data:

Train a feature extractor with the "IndicatorVector" method on nominal variables:

Train a feature extractor to compute term frequency-inverse document frequency vectors from texts:

The term frequency-inverse document frequency matrix of the training set can be computed in a SparseArray:

Visualize the matrix:

The "TFIDF" method can also be used on tokenized data (nominal bags):

Train a feature extractor on a list of DateObject instances:

Extract features from a new DateObject:

A string date can also be given:

Train a feature extractor on a list of Graph instances:

Extract features from a new graph:

Train a feature extractor on a list of TimeSeries instances:

Train a feature extractor on Molecule data:

Train a feature extractor on a list of Audio instances:

Information (3)

Get Information from a trained FeatureExtractorFunction:

Find the available properties:
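
The two Information queries above can be sketched on a small hypothetical extractor:

```wolfram
(* Hypothetical training data and a trained extractor. *)
fe = FeatureExtraction[{{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}}];

(* Summary of the trained extractor. *)
Information[fe]

(* List of the properties that can be queried. *)
Information[fe, "Properties"]
```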

Get information about the input and output types:

Options (4)

FeatureNames (2)

Train a feature extractor and give a name to each feature:

Use the association format to extract features from a new example:

The list format can still be used:
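
A sketch of naming features at training time, with the hypothetical names "size" and "class":

```wolfram
(* Hypothetical training data with names assigned to the two features. *)
fe = FeatureExtraction[
   {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}},
   FeatureNames -> {"size", "class"}];

(* New example in association format, using the names. *)
fe[<|"size" -> 1.3, "class" -> "A"|>]

(* The list format is still accepted. *)
fe[{1.3, "A"}]
```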

Use FeatureNames to set up names and refer to them in FeatureExtraction[examples,{spec₁→ext₁,…}]:

Extract features on a new example using the names to specify the features:

FeatureTypes (2)

Train a feature extractor with the "IndicatorVector" method on a simple dataset:

The first feature has been interpreted as numerical. Since the "IndicatorVector" method only acts on nominal features, the first feature is unchanged:

Use FeatureTypes to enforce the interpretation of the first feature as nominal:

Now both features are encoded as indicator vectors:
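
The effect of FeatureTypes described above can be sketched with a hypothetical dataset whose first feature is an integer code:

```wolfram
(* Hypothetical data: the first feature looks numerical, the second nominal. *)
data = {{1, "A"}, {2, "B"}, {1, "B"}, {3, "A"}};

(* Without FeatureTypes, only the nominal feature is expected to be encoded. *)
fe = FeatureExtraction[data, "IndicatorVector"];
fe[{1, "A"}]

(* Force both features to be interpreted as nominal. *)
fe2 = FeatureExtraction[data, "IndicatorVector",
   FeatureTypes -> {"Nominal", "Nominal"}];
fe2[{1, "A"}]
```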

Creating a feature extractor with no training infers the expected data type from the specific extractor:

Specifying the feature type will override the assumption:

Apply to named features:

Applications (3)

Image Search (1)

Construct a dataset of dog images:

Train an extractor function from this dataset:

Generate a NearestFunction on the extracted features of the dataset:

Using the NearestFunction, construct a function that displays the nearest image of the dataset:

Use this function on images that are not in the dataset:

This feature extractor function can also be used to delete image pairs that are too similar:

Text Search (1)

Load the text of Alice in Wonderland:

Split the text into sentences:

Train a feature extractor on these sentences:

Generate a NearestFunction with the sentences' features:

Using the NearestFunction, construct a function that displays the nearest sentence in Alice in Wonderland:

Use this function with a few queries:
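
The text search workflow above might be sketched as follows. The helper name nearestSentence and the query string are hypothetical, and TextSentences is assumed as the sentence splitter:

```wolfram
(* Load the text and split it into sentences. *)
text = ExampleData[{"Text", "AliceInWonderland"}];
sentences = TextSentences[text];

(* Train a feature extractor on the sentences. *)
fe = FeatureExtraction[sentences];

(* Build a NearestFunction mapping feature vectors back to sentences. *)
nf = Nearest[fe[sentences] -> sentences];

(* Hypothetical helper: return the sentence closest to a query. *)
nearestSentence[query_String] := First[nf[fe[query]]];

nearestSentence["The queen wants to cut off heads"]
```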

Imputation (1)

Load the "MNIST" dataset from ExampleData and keep the images:

Convert images to numerical data and separate the dataset into a training set and a test set:

The dimension of the dataset is 784:

Create a feature extractor using the "MissingImputed" method:

Replace some values of a test-set vector by Missing[] and visualize it:

Impute missing values using the FeatureExtractorFunction[…]:
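
The imputation step can be illustrated on a much smaller scale than MNIST. This sketch uses hypothetical two-dimensional data instead of 784-dimensional image vectors:

```wolfram
(* Hypothetical numeric training data without missing values. *)
data = {{1.0, 2.1}, {2.0, 3.9}, {3.0, 6.2}, {4.0, 8.1}, {5.0, 9.8}};

(* Train an extractor that imputes missing values. *)
fe = FeatureExtraction[data, "MissingImputed"];

(* Apply it to an example whose second value is missing. *)
fe[{3.5, Missing[]}]
```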

Visualize the original image, the image with missing values, and the imputed image:

Properties & Relations (4)

Train a feature extractor from data with named features:

Unrecognized keys will be ignored:

FeatureExtraction[…,"ExtractedFeatures"] is equivalent to FeatureExtract[…]:
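
A sketch of the stated equivalence, assuming a small hypothetical dataset and the default options for both functions:

```wolfram
(* Hypothetical training data. *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Features computed directly with FeatureExtract. *)
a = FeatureExtract[data];

(* The same request expressed as a FeatureExtraction property. *)
b = FeatureExtraction[data, Automatic, "ExtractedFeatures"];

(* Compare the two results. *)
a == b
```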

The "FeatureDistance" property is equivalent to using FeatureDistance on the extractor:

Compute the FeatureExtractorFunction first:

Construct a feature distance for this feature extractor:

The two distance functions are identical:

Creating a FeatureExtractorFunction on some training data creates a feature space representing those features:

Using different training data can result in a differently sized feature space:

Creating the same extractor with no data will result in an untrained function that consistently gives the same results in the same feature space:

Possible Issues (7)

Training an extractor on anonymous data will use automatic feature names:

Using custom names when applying the function will give a feature missing error:

Feature names can be specified at training time:

Check the feature names of a FeatureExtractorFunction:

The custom name can now be used:

The FeatureExtraction property "ReconstructedData" can be used to obtain the data after extraction and reconstruction:

Some feature extractors can only perform an approximation of the inverse extraction:

Some feature extractors cannot be inverted:

The property "ReconstructedData" cannot be used without training data:

Some extractors can be created without needing data:

Others require examples to initialize them:

Similarly, not all properties are supported:

Extractors that do not match the data type are ignored:

The input type is "Nominal", so the "LowerCasedText" extractor ignores the input:

Similarly, forcing the input to "Text" will cause the "IndicatorVector" extractor to be ignored:

The "ConformedData" extractor requires additional information to operate in a data-free context:

Specifying the FeatureTypes explicitly:

The feature type can also be implicitly inferred from subsequent extractors:

The automatic feature extraction often applies a dimension reduction step:

Explicit feature extractors do not include dimension reduction and typically result in longer vectors:

Use the "DimensionReducedVector" method to add a dimension reduction step:

Dimension reduction must be trained on the available features and therefore cannot be applied when no data is provided:
