NormalizationLayer

NormalizationLayer[]

represents a trainable net layer that normalizes its input data across the second and subsequent dimensions and applies an independent scaling and bias to each component of the first dimension.

NormalizationLayer[aggregationlevels]

normalizes data across the specified aggregation levels and applies a learned scaling and bias on the remaining levels.

NormalizationLayer[aggregationlevels,scalinglevels]

applies a learned scaling and bias at the specified scaling levels.

Details and Options

NormalizationLayer is used to perform data normalization, followed by a learnable affine transformation.
The aggregationlevels determine which levels of the input are aggregated over to compute the mean and variance statistics, while the scalinglevels determine which levels of the input have component-specific scaling values and biases applied to them.
Possible values of aggregationlevels are:

	a	aggregate level a only
	a₁;;a₂	aggregate levels a₁ through a₂
	{a₁,…}	aggregate levels a_i

The default aggregation levels are 2;;All.
Possible values of scalinglevels are:

	"Complement"	all levels except the aggregation levels (default)
	"Same"	the same levels as the aggregation levels
	s	level s only
	s₁;;s₂	levels s₁ through s₂
	{s₁,…}	levels s_i

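As a minimal sketch of how the two level specifications interact, the following builds two layers on a hypothetical input of dimensions {3, 4, 5} (the shape and the names layerA and layerB are illustrative only) and inspects the shape of the learned scaling array, which is expected to follow the dimensions of the scaling levels.

```wolfram
(* Hypothetical input shape {3, 4, 5}, chosen only for illustration *)

(* Aggregate over levels 2 and 3; scale and shift along level 1 *)
layerA = NetInitialize@NormalizationLayer[{2, 3}, 1, "Input" -> {3, 4, 5}];

(* Aggregate over level 3 only; "Same" puts the scaling on level 3 as well *)
layerB = NetInitialize@NormalizationLayer[3, "Same", "Input" -> {3, 4, 5}];

(* The learnable arrays follow the dimensions of the scaling levels *)
Dimensions /@ {NetExtract[layerA, "Scaling"], NetExtract[layerB, "Scaling"]}
```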
The following optional parameters can be included:

"Epsilon"	0.001	stability parameter
"GroupNumber"	None	whether to split the data into groups
LearningRateMultipliers	Automatic	learning rate multipliers for the scaling and/or bias parameters
"Unbiased"	False	whether to normalize the standard deviation by the length minus one (instead of the length)
Method	"Standardize"	the normalization method to use

Possible settings for Method include:

	"RMS"	use the root mean square	x_i/√(mean(x²)+ε)
	"Standardize"	use Standardize	(x_i-μ)/√(σ²+ε)

With aggregation levels {a₁,…,a_n}, by setting "GroupNumber"->{g₁,…,g_n}, the a_j level is split into g_j groups, which are normalized independently. The dimension of the level a_j must be an integer multiple of g_j.
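A short sketch of a grouped layer, following the group normalization form given later on this page. The sizes (6 channels split into 2 groups, 4×4 spatial dimensions) and the name groupNorm are assumptions made for illustration.

```wolfram
(* Aggregate over all levels, scale along level 1,
   and split level 1 (6 channels) into 2 groups of 3 *)
groupNorm = NetInitialize@NormalizationLayer[All, 1,
    "GroupNumber" -> {2, 1, 1}, "Input" -> {6, 4, 4}];

(* Apply to random data; the output keeps the input dimensions *)
x = RandomReal[{-1, 1}, {6, 4, 4}];
Dimensions[groupNorm[x]]
```

Here 6 is an integer multiple of 2, as the divisibility rule requires.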
The following learnable parameters can be specified:

	"Biases"	Automatic	learnable bias parameters
	"Scaling"	Automatic	learnable scaling parameters

With Automatic settings, scaling and bias parameters are initialized automatically when NetInitialize or NetTrain is used. With None settings, scaling and adding biases can be disabled.
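The parameter settings can be sketched as follows. The input dimensions {3, 8} and the explicit values are hypothetical; passing plain lists as explicit parameter values is assumed to be accepted.

```wolfram
(* Automatic scaling, bias disabled; NetInitialize fills in the scaling *)
noBias = NetInitialize@NormalizationLayer["Biases" -> None, "Input" -> {3, 8}];

(* Both parameters given explicitly, so no initialization step is needed *)
explicit = NormalizationLayer["Scaling" -> {1, 2, 3},
    "Biases" -> {0, 0, 0}, "Input" -> {3, 8}];

(* Either layer can now be applied directly to data *)
x = RandomReal[{-1, 1}, {3, 8}];
Dimensions /@ {noBias[x], explicit[x]}
```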
If scaling and bias parameters have been initialized or disabled, NormalizationLayer[…][input] explicitly computes the output from applying the layer to input.
NormalizationLayer[…][{input₁,input₂,…}] explicitly computes outputs for each of the input_i.
When it cannot be inferred from other layers in a larger net, the option "Input"->{d₁,d₂,…} can be used to fix the input dimensions of NormalizationLayer.
NetExtract can be used to extract the scaling, biases and epsilon parameters from a NormalizationLayer object.
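For instance, a layer with explicitly specified parameters can be inspected as sketched below. The values and the input dimensions {2, 5} are illustrative assumptions; Normal is used only to convert the extracted arrays to ordinary lists.

```wolfram
layer = NormalizationLayer["Scaling" -> {2, 2}, "Biases" -> {0, 1},
    "Epsilon" -> 0.01, "Input" -> {2, 5}];

(* Extract each parameter by name *)
scaling = Normal[NetExtract[layer, "Scaling"]]
biases = Normal[NetExtract[layer, "Biases"]]
epsilon = NetExtract[layer, "Epsilon"]
```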
NormalizationLayer is typically used inside NetChain, NetGraph, etc.
NormalizationLayer exposes the following ports for use in NetGraph etc.:

	"Input"	an array of rank greater than 1
	"Output"	an array of rank greater than 1

Consider an input array x that can be decomposed into d_a×d_b×… subarrays x_A across the aggregation levels {a,b,…} and into d_i×d_j×… subarrays x_I across the scaling levels {i,j,…}. The scaling and bias parameters γ and β have dimensions (d_i×d_j×…).
The output of the "Standardize" method is obtained via γ (x-μ)/√(σ²+ε) + β, where the mean array is given by μ = mean(x_A) and the variance by σ² = mean((x_A-μ)²), with ε the "Epsilon" stability parameter.
The output of the "RMS" method is obtained via γ x/√(rms²+ε) + β, where the RMS array is given by rms = √(mean(x_A²)).
The output always has the same dimensions as the input.
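The "Standardize" computation can be checked by hand with a sketch like the one below. It assumes that the stability parameter is added to the biased variance under the square root; the input shape {2, 5} and the names eps, manual and difference are illustrative. If the assumption holds, the difference should be at the level of single-precision rounding.

```wolfram
eps = 0.001;
x = RandomReal[{-1, 1}, {2, 5}];

(* Unit scaling and zero bias, so only the normalization itself is tested *)
layer = NormalizationLayer["Scaling" -> {1, 1}, "Biases" -> {0, 0},
    "Epsilon" -> eps, "Input" -> {2, 5}];

(* Manual version: per row, subtract the mean and divide by
   Sqrt[biased variance + eps] (assumed placement of eps) *)
manual = Map[(# - Mean[#])/Sqrt[Mean[(# - Mean[#])^2] + eps] &, x];

(* Largest deviation between the layer and the manual computation *)
difference = Max[Abs[layer[x] - manual]]
```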
The default NormalizationLayer[] corresponds to instance normalization in Ulyanov et al., "Instance Normalization: The Missing Ingredient for Fast Stylization", 2016, with the channel dimension at the first level and spatial dimensions at subsequent levels.
Method"RMS" can be use to recreate the RMSNorm described in Zhang et al., "Root Mean Square Layer Normalization", 2019. »
NormalizationLayer[2;;All,"Same","Input"{n,Automatic}] corresponds to layer normalization in Ba et al., "Layer Normalization", 2016, with the channel dimension at the last level and the time dimension at the first level.
NormalizationLayer[All,1,"GroupNumber"{n,1,1,…},"Input"{m*n,…}] corresponds to group normalization Wu et al., "Group Normalization", 2018, with the channel dimension at the first level and spatial dimensions at subsequent levels.
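As a rough sketch, the layer normalization form above can be instantiated with concrete, hypothetical dimensions (a sequence of 10 vectors of length 8). The RMSNorm-style variant shown next to it, with the bias disabled, is an assumed configuration rather than one prescribed by this page.

```wolfram
(* Layer normalization: statistics and affine parameters on the feature level *)
layerNorm = NetInitialize@NormalizationLayer[2 ;; All, "Same",
    "Input" -> {10, 8}];

(* RMSNorm-style variant: RMS method, no bias (assumed configuration) *)
rmsNorm = NetInitialize@NormalizationLayer[2 ;; All, "Same",
    Method -> "RMS", "Biases" -> None, "Input" -> {10, 8}];

(* Both layers preserve the input dimensions *)
x = RandomReal[{-1, 1}, {10, 8}];
Dimensions /@ {layerNorm[x], rmsNorm[x]}
```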
Options[NormalizationLayer] gives the list of default options to construct the layer. Options[NormalizationLayer[…]] gives the list of default options to evaluate the layer on some data.
Information[NormalizationLayer[…]] gives a report about the layer.
Information[NormalizationLayer[…],prop] gives the value of the property prop of NormalizationLayer[…]. Possible properties are the same as for NetGraph.

Examples


Basic Examples (2)

Create a NormalizationLayer:

Create an initialized NormalizationLayer that takes a rank-3 array:

Apply the layer to an input:

Scope (3)

Create an instance normalization layer that takes an RGB image and returns an RGB image:

Apply the layer to an image:

NormalizationLayer automatically threads over batches of inputs:

Create a NormalizationLayer that takes a sequence of two-dimensional vectors and returns the sequence of normalized vectors scaled by a factor of 2 and shifted by a constant bias:

Apply layer normalization to some sinusoidal input:

Visualize the values before and after normalization:

Create an initialized NormalizationLayer that takes a rank-6 array and normalizes across all dimensions:

Apply the layer to an input:

Options (9)

"Biases" (2)

Create a NormalizationLayer with the biases explicitly specified:

Extract the biases:

Create a NormalizationLayer without any bias:

The layer does not have any learnable bias:

"Epsilon" (1)

Create a NormalizationLayer with the "Epsilon" parameter explicitly specified:

Extract the "Epsilon" parameter:

"GroupNumber" (2)

NormalizationLayer computes the variance and mean across the entire aggregation level:

Specify that the aggregation level has to be split into two groups:

Each group is normalized independently:

This is equivalent to reshaping the input before the normalization:

In combination with applying the normalization one level deeper:

Method (1)

Create the RMSNorm layer from Zhang et al., "Root Mean Square Layer Normalization":

"Scaling" (2)

Create a NormalizationLayer with the scaling parameter explicitly specified:

Extract the scaling parameters:

Create a NormalizationLayer without any scaling:

The layer does not have any learnable scaling parameter:

"Unbiased" (1)

The default variance estimator is typically biased in neural net applications:

This is equivalent to:

Define a NormalizationLayer with an unbiased variance estimator:

The variance computation is now equivalent to Variance:

Properties & Relations (1)

NormalizationLayer is, in general, not the same as Standardize:

Initialize the layer with unit "Scaling" and zero "Biases":

Compute data standardization:

The stability parameter and the biased variance estimator introduce a difference:

Define a NormalizationLayer without regularization:

The result is closer to Standardize:
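The comparison in this section might be written along the following lines. This is a sketch under stated assumptions: that "Epsilon"->0 is an accepted setting for removing the regularization, and that "Unbiased"->True matches the sample standard deviation used by Standardize. The vector length and the names layer and difference are illustrative.

```wolfram
x = RandomReal[{-1, 1}, 6];

(* One channel of length 6: unit scaling, zero bias,
   no stability term, unbiased variance estimator *)
layer = NormalizationLayer["Scaling" -> {1}, "Biases" -> {0},
    "Epsilon" -> 0, "Unbiased" -> True, "Input" -> {1, 6}];

(* Remaining deviation from Standardize, mostly single-precision rounding *)
difference = Max[Abs[First[layer[{x}]] - Standardize[x]]]
```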

Possible Issues (1)

NormalizationLayer cannot be initialized until all its input and output dimensions are known:
