FeatureExtraction
FeatureExtraction[{example1,example2,…}]
generates a FeatureExtractorFunction[…] trained from the examples given.
FeatureExtraction[examples,extractor]
uses the specified feature extractor method.
FeatureExtraction[examples,{extractor1,extractor2,…}]
applies extractor1, extractor2, … in sequence to generate a feature extractor.
FeatureExtraction[examples,spec->ext]
uses the extractor methods specified by ext on parts of examples specified by spec.
FeatureExtraction[examples,{spec1->ext1,spec2->ext2,…}]
uses extractor method ext1 on the parts of examples specified by spec1, ext2 on the parts specified by spec2, and so on.
FeatureExtraction[examples,extractor,props]
gives the feature extraction properties specified by props.
Details and Options
- FeatureExtraction can be used on many types of data, including numerical, textual, sounds, images, graphs and time series, as well as combinations of these.
- Each example can be a single data element, a list of data elements, an association of data elements, or a Dataset object.
- FeatureExtraction[examples] returns a FeatureExtractorFunction[…] that can be applied to specific data.
- Possible feature extractor methods include:
    Automatic                      automatic extraction
    Identity                       give data unchanged
    "ConformedData"                conformed images, colors, dates, etc.
    "NumericVector"                numeric vector from any data
    f                              applies function f to each example
    {extractor1,extractor2,…}      use a sequence of extractors in turn
- Additional feature extractor methods can also be used for each data type.
- Numeric data:
    "DiscretizedVector"            discretized numerical data
    "DimensionReducedVector"       reduced-dimension numeric vectors
    "MissingImputed"               data with missing values imputed
    "StandardizedVector"           numeric data processed with Standardize
- Nominal data:
    "IndicatorVector"              nominal data "one-hot encoded" with indicator vectors
    "IntegerVector"                nominal data encoded with integers
- Text:
    "LowerCasedText"               text with each character lowercase
    "SegmentedCharacters"          text segmented into characters
    "SegmentedWords"               text segmented into words
    "TFIDF"                        term frequency-inverse document frequency vector
    "WordVectors"                  sequence of semantic vectors from a text (English only)
- Images:
    "FaceFeatures"                 semantic vector from an image of a human face
    "ImageFeatures"                semantic vector from an image
    "PixelVector"                  vector of pixel values from an image
- Audio objects:
    "AudioFeatures"                sequence of semantic vectors from an audio object
    "AudioFeatureVector"           semantic vector from an audio object
    "LPC"                          audio linear prediction coefficients
    "MelSpectrogram"               audio spectrogram with logarithmic frequency bins
    "MFCC"                         sequence of audio mel-frequency cepstral coefficient vectors
    "SpeakerFeatures"              sequence of semantic speaker vectors
    "SpeakerFeatureVector"         semantic vector for a speaker
    "Spectrogram"                  audio spectrogram
- Video objects:
    "VideoFeatures"                sequence of semantic vectors from a video object
    "VideoFeatureVector"           semantic vector from a video object
- Graphs:
    "GraphFeatures"                numeric vector summarizing graph properties
- Molecules:
    "AtomPairs"                    Boolean vector from pairs of atoms and the path lengths between them
    "MoleculeExtendedConnectivity" Boolean vector from enumerated molecule subgraphs
    "MoleculeFeatures"             numeric vector summarizing molecule properties
    "MoleculeTopologicalFeatures"  Boolean vector from circular atom neighborhoods
- Feature extractor methods are applied only to data elements whose types they are compatible with. Other data elements are returned unchanged.
- FeatureExtraction[examples] is equivalent to FeatureExtraction[examples,Automatic], which is typically equivalent to FeatureExtraction[examples,"NumericVector"].
- The "NumericVector" method will typically convert examples to numeric vectors, impute missing data, and reduce the dimension using DimensionReduction.
- In FeatureExtraction[examples,extractors,props], props can be a single property or a list of properties. Possible properties include:
    "ExtractorFunction"            FeatureExtractorFunction[…] (default)
    "ExtractedFeatures"            examples after feature extraction
    "ReconstructedData"            examples after extraction and inverse extraction
    "FeatureDistance"              FeatureDistance[…] generated from the extractor
- In FeatureExtraction[examples,spec->ext] or FeatureExtraction[examples,{spec1->ext1,…}], possible forms for spec and for each of spec1, spec2, … include:
    All                            all parts of each example
    i                              the ith part of each example
    {i1,i2,…}                      parts i1, i2, … of each example
    "name"                         part with the specified name in each example
    {"name1","name2",…}            parts named "name1", "name2", … in each example
- Parts not mentioned in spec (or in any of spec1, spec2, …) are dropped for the purpose of extracting features.
- In FeatureExtraction[examples,{spec1->ext1,…}], the extractors ext1, ext2, … are all applied separately to examples.
- The following options can be given:
    FeatureNames     Automatic     names to assign to elements of each example
    FeatureTypes     Automatic     feature types to assume for elements of each example
    RandomSeeding    1234          what seeding of pseudorandom generators should be done internally
- Possible settings for RandomSeeding include:
    Automatic                      automatically reseed every time the function is called
    Inherited                      use externally seeded random numbers
    seed                           use an explicit integer or string as a seed
- FeatureExtraction[…,"ExtractedFeatures"] is equivalent to FeatureExtract[…].
- FeatureExtraction[…,"FeatureDistance"] is equivalent to FeatureDistance[FeatureExtraction[…]].
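The property forms described above can be sketched as follows. The dataset is made up for illustration, and the comments describe what the details above lead one to expect rather than verified output:

```wolfram
(* Illustrative numeric data; the values are not from the original page *)
data = {{1.2, 10.}, {2.3, 12.}, {3.1, 9.}, {4.8, 15.}};

(* Default property: a FeatureExtractorFunction[…] *)
fe = FeatureExtraction[data, "StandardizedVector"];

(* "ExtractedFeatures" is described above as equivalent to FeatureExtract *)
features1 = FeatureExtraction[data, "StandardizedVector", "ExtractedFeatures"];
features2 = FeatureExtract[data, "StandardizedVector"];

(* Several properties can be requested at once by giving a list *)
{fe2, features3} = FeatureExtraction[data, "StandardizedVector",
   {"ExtractorFunction", "ExtractedFeatures"}];

(* An explicit seed, for methods that involve randomness *)
feSeeded = FeatureExtraction[data, RandomSeeding -> 42];
```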
Examples
Basic Examples (3)
Train a FeatureExtractorFunction on a simple dataset:
Extract features from a new example:
Extract features from a list of examples:
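A minimal sketch of these three steps, using a small made-up dataset in which each example is a {number, category} pair:

```wolfram
(* Illustrative training data *)
data = {{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}};

(* Train the extractor with the default (Automatic) method *)
fe = FeatureExtraction[data];

(* Extract features from a single new example *)
fe[{2.1, "A"}]

(* Extract features from a list of new examples *)
fe[{{2.1, "A"}, {4.8, "B"}}]
```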
Train a feature extractor on a dataset of images:
Use the feature extractor on the training set:
Construct a feature extractor from a numerical dataset using the "StandardizedVector" extractor method:
Use the feature extractor on the training set:
The property "ExtractedFeatures" can be used to perform this operation in one step:
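A sketch of the "StandardizedVector" workflow on a made-up numerical dataset, including the one-step form:

```wolfram
(* Illustrative numerical data *)
data = {{1.2, 10.}, {2.3, 12.}, {3.1, 9.}, {4.8, 15.}};

(* Train the extractor, then apply it to the training set *)
fe = FeatureExtraction[data, "StandardizedVector"];
fe[data]

(* Same operation in one step, using the "ExtractedFeatures" property *)
FeatureExtraction[data, "StandardizedVector", "ExtractedFeatures"]
```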
Scope (14)
Train a feature extractor on textual data:
Use the feature extractor on new examples:
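For textual data, a sketch with a few invented sentences might look like this:

```wolfram
(* Illustrative short texts *)
texts = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* Train with the default method *)
fe = FeatureExtraction[texts];

(* Apply the extractor to new texts *)
fe[{"a dog", "a cat today"}]
```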
Train a feature extractor on a list of DateObject:
Extract features from a new DateObject:
A string date can also be given:
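A sketch for dates, with arbitrary illustrative values. The string format in the last line is an assumption; any date string the system can interpret should serve the same purpose:

```wolfram
(* Illustrative dates *)
dates = {DateObject[{2016, 1, 15}], DateObject[{2016, 6, 3}],
   DateObject[{2017, 3, 21}], DateObject[{2018, 11, 8}]};

fe = FeatureExtraction[dates];

(* A new DateObject *)
fe[DateObject[{2017, 8, 1}]]

(* A date given as a string (format assumed to be interpretable) *)
fe["August 1, 2017"]
```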
Train a feature extractor on a list of Graph:
Extract features from a new graph:
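A sketch for graphs, using a handful of built-in graph constructors as stand-in training data:

```wolfram
(* Illustrative training graphs *)
graphs = {CompleteGraph[5], CycleGraph[6], StarGraph[7],
   PathGraph[Range[5]], GridGraph[{3, 3}], PetersenGraph[5, 2]};

fe = FeatureExtraction[graphs];

(* Extract features from a graph not in the training set *)
fe[WheelGraph[6]]
```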
Train a feature extractor on a list of TimeSeries:
Extract features from a new TimeSeries:
Train a feature extractor to compute term frequency-inverse document frequency vectors from texts:
The term frequency-inverse document frequency matrix of the training set can be computed in a SparseArray:
The "TFIDF" method can also be used on tokenized data (nominal bags):
Train a feature extractor on text using the "TFIDF" method followed by the "DimensionReducedVector" method:
Extract features on the training set:
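A sketch of the "TFIDF" workflow on plain text, followed by the chained form. The texts are invented, and the tokenized (nominal bag) variant is not shown:

```wolfram
(* Illustrative texts *)
texts = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* TFIDF alone: applying the extractor to the training set gives the TFIDF matrix *)
feTfidf = FeatureExtraction[texts, "TFIDF"];
tfidf = feTfidf[texts];

(* TFIDF followed by dimension reduction, applied in sequence *)
feChain = FeatureExtraction[texts, {"TFIDF", "DimensionReducedVector"}];
feChain[texts]
```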
Generate a feature extractor using a custom function:
Apply the extractor on the training set:
Chain the custom extractor with the "StandardizedVector" method:
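A sketch of a custom-function extractor. Log is used here only as an arbitrary example of a function applied to each example; the data is made up and kept positive so that Log is well defined:

```wolfram
(* Illustrative positive numerical data *)
data = {{1., 2.}, {3., 4.}, {5., 6.}, {7., 8.}};

(* The function is applied to each example *)
fe = FeatureExtraction[data, Log];
fe[data]

(* Chain the custom function with "StandardizedVector" *)
feChain = FeatureExtraction[data, {Log, "StandardizedVector"}];
feChain[data]
```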
Train a feature extractor with the "IndicatorVector" method on nominal variables:
Extract features from a new example:
Train a feature extractor with the "IndicatorVector" method on only the second nominal variable:
The first nominal variable is dropped:
Use the Identity extractor method to copy the first variable:
A variable can be copied multiple times:
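The spec->ext forms used in this group can be sketched as follows. The data is made up, and the sketch assumes that automatic type detection treats both columns as nominal:

```wolfram
(* Illustrative data with two nominal variables *)
data = {{"A", "x"}, {"B", "y"}, {"A", "z"}, {"B", "x"}};

(* "IndicatorVector" on all nominal variables *)
feAll = FeatureExtraction[data, "IndicatorVector"];
feAll[{"A", "y"}]

(* Only the second variable; the first is dropped *)
feSecond = FeatureExtraction[data, 2 -> "IndicatorVector"];
feSecond[{"A", "y"}]

(* Identity copies the first variable unchanged *)
feCopy = FeatureExtraction[data, {1 -> Identity, 2 -> "IndicatorVector"}];
feCopy[{"A", "y"}]

(* The same part can be listed more than once *)
feTwice = FeatureExtraction[data, {1 -> Identity, 1 -> Identity, 2 -> "IndicatorVector"}];
feTwice[{"A", "y"}]
```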
Train a feature extractor on a mixed-type dataset:
Extract features from a new example:
Train a feature extractor on texts and images using the "TFIDF" method:
Features will only be extracted from the text part:
Train a feature extractor from a dataset that contains missing values:
This feature extractor can extract features even when values are missing:
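A sketch with missing values, using Missing[] to mark absent entries in a made-up dataset:

```wolfram
(* Illustrative data with missing entries *)
data = {{1.2, "A"}, {Missing[], "B"}, {3.4, Missing[]}, {2.2, "A"}, {4.1, "B"}};

fe = FeatureExtraction[data];

(* New examples may also contain missing values *)
fe[{Missing[], "A"}]
```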
Train a feature extractor using the "StandardizedVector" method:
Extract features from a new example:
Since this feature extractor is invertible, the FeatureExtractorFunction property "OriginalData" can be used to perform the inverse extraction:
Some feature extractors can only perform an approximation of the inverse extraction:
The FeatureExtraction property "ReconstructedData" can be used to obtain the data after extraction and reconstruction:
Some feature extractors cannot be inverted:
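The inverse-extraction steps can be sketched as below. The data is invented, and "DimensionReducedVector" is chosen here as an assumed example of a method whose inverse is only approximate:

```wolfram
(* Illustrative numerical data *)
data = {{1.2, 10., 0.3}, {2.3, 12., 0.1}, {3.1, 9., 0.7},
   {4.8, 15., 0.2}, {2.9, 11., 0.5}, {3.6, 13., 0.4}};

(* "StandardizedVector" is invertible *)
fe = FeatureExtraction[data, "StandardizedVector"];
features = fe[{2.0, 11., 0.6}];

(* Recover the original example from its features *)
fe[features, "OriginalData"]

(* Extraction followed by reconstruction, in one call *)
FeatureExtraction[data, "DimensionReducedVector", "ReconstructedData"]
```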
Options (3)
FeatureNames (2)
Train a feature extractor and give a name to each feature:
Use the association format to extract features from a new example:
The list format can still be used:
Use FeatureNames to set up names and refer to them in FeatureExtraction[examples,{spec1->ext1,…}]:
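A sketch of the FeatureNames option. The names "age" and "gender" and the data are illustrative:

```wolfram
(* Illustrative data: {age, gender} *)
data = {{23, "F"}, {35, "M"}, {41, "F"}, {29, "M"}};

fe = FeatureExtraction[data, FeatureNames -> {"age", "gender"}];

(* Association format, using the assigned names *)
fe[<|"age" -> 30, "gender" -> "F"|>]

(* List format still works *)
fe[{30, "F"}]

(* Names can be used as part specifications *)
feNamed = FeatureExtraction[data,
   {"age" -> "StandardizedVector", "gender" -> "IndicatorVector"},
   FeatureNames -> {"age", "gender"}];
feNamed[{30, "F"}]
```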
FeatureTypes (1)
Train a feature extractor with the "IndicatorVector" method on a simple dataset:
The first feature has been interpreted as numerical. Since the "IndicatorVector" method only acts on nominal features, the first feature is unchanged:
Use FeatureTypes to enforce the interpretation of the first feature as nominal:
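A sketch of the FeatureTypes option on a made-up dataset whose first column is numeric in form but meant as a category:

```wolfram
(* Illustrative data; the first column holds integer codes *)
data = {{1, "A"}, {2, "B"}, {1, "B"}, {2, "A"}};

(* Without FeatureTypes, the first feature is interpreted as numerical *)
fe = FeatureExtraction[data, "IndicatorVector"];
fe[{1, "A"}]

(* Force both features to be treated as nominal *)
feNominal = FeatureExtraction[data, "IndicatorVector",
   FeatureTypes -> {"Nominal", "Nominal"}];
feNominal[{1, "A"}]
```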
Applications (3)
Image Search (1)
Construct a dataset of dog images:
Train an extractor function from this dataset:
Generate a NearestFunction on the extracted features of the dataset:
Using the NearestFunction, construct a function that displays the nearest image of the dataset:
Use this function on images that are not in the dataset:
This feature extractor function can also be used to delete image pairs that are too similar:
Text Search (1)
Load the text of Alice in Wonderland:
Split the text into sentences:
Train a feature extractor on these sentences:
Generate a NearestFunction with the sentences' features:
Using the NearestFunction, construct a function that displays the nearest sentence in Alice in Wonderland:
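The text search steps can be sketched as follows. The helper name nearestSentence and the query string are hypothetical, and loading the text requires access to ExampleData:

```wolfram
(* Load the text and split it into sentences *)
text = ExampleData[{"Text", "AliceInWonderland"}];
sentences = TextSentences[text];

(* Train a feature extractor on the sentences *)
fe = FeatureExtraction[sentences];

(* NearestFunction mapping feature vectors back to sentences *)
nf = Nearest[fe[sentences] -> sentences];

(* Hypothetical helper: nearest sentence to a query string *)
nearestSentence[query_String] := First[nf[fe[query]]]

nearestSentence["Alice and the rabbit"]
```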
Imputation (1)
Load the "MNIST" dataset from ExampleData and keep the images:
Convert images to numerical data and separate the dataset into a training set and a test set:
The dimension of the dataset is 784:
Create a feature extractor using the "MissingImputed" method:
Replace some values of a test-set vector by Missing[] and visualize it:
Impute missing values using the FeatureExtractorFunction[…]:
Visualize the original image, the image with missing values, and the imputed image:
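A sketch of the imputation workflow. The subset size (1100 images), the number of masked pixels (200) and the gray level used to display missing pixels are arbitrary choices, not taken from the original page:

```wolfram
(* Keep only the images from the MNIST training data *)
images = Keys[ExampleData[{"MachineLearning", "MNIST"}, "Data"]];

(* Convert a random subset to numeric vectors; split into training and test sets *)
vectors = Flatten[ImageData[#]] & /@ RandomSample[images, 1100];
{train, test} = TakeDrop[vectors, 1000];

(* Each 28x28 image becomes a vector of length 784 *)
Length[First[train]]

fe = FeatureExtraction[train, "MissingImputed"];

(* Replace 200 randomly chosen pixels of one test vector with Missing[] *)
vector = First[test];
masked = ReplacePart[vector, List /@ RandomSample[Range[784], 200] -> Missing[]];

(* Impute the missing values *)
imputed = fe[masked];

(* Original, masked (missing pixels shown as mid-gray) and imputed images *)
Image[Partition[#, 28]] & /@ {vector, masked /. _Missing -> 0.5, imputed}
```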
Wolfram Research (2016), FeatureExtraction, Wolfram Language function, https://reference.wolfram.com/language/ref/FeatureExtraction.html (updated 2021).