FeatureExtract

FeatureExtract[{example₁,example₂,…}]

extracts features for each of the example_i using a feature extractor trained on all the example_i.

FeatureExtract[examples,extractor]

extracts features using the specified feature extractor method.

FeatureExtract[examples,{extractor₁,extractor₂,…}]

extracts features by applying the extractor_i in sequence.

FeatureExtract[examples,specext]

uses the extractor methods specified by ext on parts of examples specified by spec.

FeatureExtract[examples,{spec₁->ext₁,spec₂->ext₂,…}]

uses the extractor methods ext_i on parts of examples specified by the spec_i.

Details and Options

FeatureExtract can be used on many types of data, including numerical, textual, sounds, images, graphs, time series and combinations of these.
Each example_i can be a single data element, a list of data elements, an association of data elements, or a Dataset object.
Possible feature extractor methods include:

	Automatic	automatic extraction
	Identity	give data unchanged
	"ConformedData"	conformed images, colors, dates, etc.
	"NumericVector"	numeric vector from any data
	f	applies function f to each example
	{extractor₁,extractor₂,…}	use a sequence of extractors in turn

Additional feature extractor methods can also be used for each data type.
Numeric data:

	"DiscretizedVector"	discretized numerical data
	"DimensionReducedVector"	reduced-dimension numeric vectors
	"MissingImputed"	data with missing values imputed
	"StandardizedVector"	numeric data processed with Standardize

Nominal data:

	"IndicatorVector"	nominal data "one-hot encoded" with indicator vectors
	"IntegerVector"	nominal data encoded with integers

Text:

	"LowerCasedText"	text with each character lowercase
	"SegmentedCharacters"	text segmented into characters
	"SegmentedWords"	text segmented into words
	"TFIDF"	term frequency-inverse document frequency vector
	"WordVectors"	sequence of semantic vectors from a text (English only)

Images:

	"FaceFeatures"	semantic vector from an image of a human face
	"ImageFeatures"	semantic vector from an image
	"PixelVector"	vector of pixel values from an image

Audio objects:

	"AudioFeatures"	sequence of semantic vectors from an audio object
	"AudioFeatureVector"	semantic vector from an audio object
	"LPC"	audio linear prediction coefficients
	"MelSpectrogram"	audio spectrogram with logarithmic frequency bins
	"MFCC"	sequence of audio mel-frequency cepstral coefficient vectors
	"SpeakerFeatures"	sequence of semantic speaker vectors
	"SpeakerFeatureVector"	semantic vector for a speaker
	"Spectrogram"	audio spectrogram

Video objects:

	"VideoFeatures"	sequence of semantic vectors from a video object
	"VideoFeatureVector"	semantic vector from a video object

Graphs:

	"GraphFeatures"	numeric vector summarizing graph properties

Molecules:

	"AtomPairs"	Boolean vector from pairs of atoms and the path lengths between them
	"MoleculeExtendedConnectivity"	Boolean vector from enumerated molecule subgraphs
	"MoleculeFeatures"	numeric vector summarizing molecule properties
	"MoleculeTopologicalFeatures"	Boolean vector from circular atom neighborhoods

Feature extractor methods are applied only to data elements whose types they are compatible with. Other data elements are returned unchanged.
FeatureExtract[examples] is typically equivalent to FeatureExtract[examples,"NumericVector"].
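The default behavior can be compared against the explicit "NumericVector" method with a small sketch. The dataset below is made up for illustration, and the two results should be expected to agree only in typical cases, as stated above.

```wolfram
(* Illustrative data: one numeric column and one nominal column *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Default extraction *)
FeatureExtract[data]

(* Explicit "NumericVector" extractor; typically gives the same result *)
FeatureExtract[data, "NumericVector"]
```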
In FeatureExtract[examples,specext] or FeatureExtract[examples,{spec₁ext₁,…}], possible forms for spec and the spec_i include:

	All	all parts of each example
	i	ith part of each example
	{i₁,i₂,…}	parts i₁, i₂, … of each example
	"name"	part with the specified name in each example
	{"name₁","name₂",…}	parts with names "name_i" in each example

Parts not mentioned in spec or the spec_i are dropped for the purpose of extracting features.
In FeatureExtract[examples,{spec₁->ext₁,…}], the ext_i are all applied separately to examples.
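As a rough illustration of part specifications, the sketch below applies extractors to selected parts of a made-up two-column dataset. The data values are assumptions chosen only for the example.

```wolfram
(* Illustrative data: part 1 is numeric, part 2 is nominal *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Only part 2 is mentioned, so part 1 is dropped from the result *)
FeatureExtract[data, 2 -> "IndicatorVector"]

(* Part 1 is copied unchanged with Identity; part 2 is one-hot encoded *)
FeatureExtract[data, {1 -> Identity, 2 -> "IndicatorVector"}]
```

Because each rule is applied separately to the examples, the same part can appear in more than one rule.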
The following options can be given:

	FeatureNames	Automatic	names to assign to elements of the example_i
	FeatureTypes	Automatic	feature types to assume for elements of the example_i
	RandomSeeding	1234	what seeding of pseudorandom generators should be done internally

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed

FeatureExtract[…] is equivalent to FeatureExtraction[…,"ExtractedFeatures"].
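The relationship with FeatureExtraction can be sketched as follows. The numeric list is made up, and the name extractor is an illustrative variable, not part of the API.

```wolfram
(* Illustrative numeric data *)
data = {1.4, 1.5, 2.3, 5.4};

(* These two calls are expected to return the same extracted features *)
FeatureExtract[data, "StandardizedVector"]
FeatureExtraction[data, "StandardizedVector", "ExtractedFeatures"]

(* Without a property, FeatureExtraction returns a FeatureExtractorFunction
   that can be reused on data not seen during training *)
extractor = FeatureExtraction[data, "StandardizedVector"];
extractor[3.0]
```

FeatureExtract is convenient when only the features of the training examples are needed; keeping the FeatureExtractorFunction is the better fit when the same transformation must later be applied to new examples.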

Examples

Basic Examples (4)

Extract features from a simple dataset:

Extract features from images:

Standardize numerical values using the "StandardizedVector" extractor method:

Extract TFIDF vectors on characters by chaining the extractor methods "SegmentedCharacters" and "TFIDF":
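A minimal sketch of such a chain, assuming a small made-up list of sentences (the variable name texts is illustrative):

```wolfram
(* Illustrative sentences *)
texts = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* First split each text into characters, then compute TFIDF vectors over those characters *)
FeatureExtract[texts, {"SegmentedCharacters", "TFIDF"}]
```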

Scope (10)

Extract features from a list of DateObject:

Train a feature extractor on a list of Graph:

Train a feature extractor on a list of TimeSeries:

Compute term frequency-inverse document frequency vectors from texts:

By default, texts will be segmented into words. This gives the same result:
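The equivalence described here can be sketched with a few made-up sentences. The variable name texts is illustrative, and the two calls are expected to give the same vectors.

```wolfram
(* Illustrative sentences *)
texts = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* "TFIDF" applied directly to strings segments them into words first *)
FeatureExtract[texts, "TFIDF"]

(* The same pipeline with the word segmentation step written explicitly *)
FeatureExtract[texts, {"SegmentedWords", "TFIDF"}]
```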

Extract features on text using the "TFIDF" method followed by the "DimensionReducedVector" method:

Extract features with the "IndicatorVector" method on nominal variables:

Extract features with the "IndicatorVector" method on the second nominal variable only:

Use the Identity extractor method to copy the first variable as well:

A variable can be copied multiple times:

Extract features on a mixed-type dataset:

Extract features on texts and images using the "TFIDF" method:

Features have only been extracted from the text part, since "TFIDF" does not apply to images.

Extract features from a dataset that contains missing values:
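One way this might look, assuming missing entries are marked with Missing[] in an otherwise made-up dataset:

```wolfram
(* Illustrative data with one missing numeric value and one missing nominal value *)
data = {{1.4, "A"}, {Missing[], "A"}, {2.3, "B"}, {5.4, Missing[]}};

(* Default extraction on data that contains missing values *)
FeatureExtract[data]
```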

Extract features from a dataset formatted as a list of associations:
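A short sketch with made-up records; the keys "age" and "gender" are illustrative names, not required by FeatureExtract:

```wolfram
(* Illustrative examples, each given as an association *)
data = {
   <|"age" -> 32, "gender" -> "M"|>,
   <|"age" -> 25, "gender" -> "F"|>,
   <|"age" -> 41, "gender" -> "F"|>,
   <|"age" -> 29, "gender" -> "M"|>
   };

(* Default extraction over all keys *)
FeatureExtract[data]

(* Keys can be used as part specifications; only "gender" is kept here *)
FeatureExtract[data, "gender" -> "IndicatorVector"]
```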

Options (2)

FeatureNames (1)

Use FeatureNames to name features, and refer to their names in part specifications:
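A sketch of this usage, where the feature names "size" and "class" and the data are made up for illustration:

```wolfram
(* Illustrative data: part 1 is numeric, part 2 is nominal *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Name the two parts, then refer to them by name in the part specifications *)
FeatureExtract[data,
 {"size" -> Identity, "class" -> "IndicatorVector"},
 FeatureNames -> {"size", "class"}]
```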

FeatureTypes (1)

Extract features with the "IndicatorVector" method on a simple dataset:

The first feature has been interpreted as numerical, and since the "IndicatorVector" method only acts on nominal features, the first feature is unchanged.

Use FeatureTypes to enforce the interpretation of the first feature as nominal:
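The contrast between the automatic and the enforced interpretation can be sketched as follows, using a made-up dataset whose first column holds small integers:

```wolfram
(* Illustrative data: the first part would normally be read as numerical *)
data = {{1, "A"}, {2, "A"}, {3, "B"}, {1, "B"}};

(* Automatic types: only the second part is one-hot encoded *)
FeatureExtract[data, "IndicatorVector"]

(* Force both parts to be treated as nominal, so both are one-hot encoded *)
FeatureExtract[data, "IndicatorVector", FeatureTypes -> {"Nominal", "Nominal"}]
```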

Applications (1)

Dataset Visualization (1)

Construct a dataset of dog images:

Extract features from this dataset:

Reduce the dimension of the extracted vectors to 2:

Visualize the images at their feature positions:

A similar visualization can be directly obtained using FeatureSpacePlot:
