This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

# Standard Scaler

## Description

The standard scaler scales the given data set, so that all features will have a user specified mean and variance. In case the user does not provide a specific mean and standard deviation, the standard scaler transforms the features of the input data set to have mean equal to 0 and standard deviation equal to 1. Given a set of input data $x_1, x_2,… x_n$, with mean:

and standard deviation:

The scaled data set $z_1, z_2,…,z_n$ will be:

where $\textit{std}$ and $\textit{mean}$ are the user specified values for the standard deviation and mean.

## Operations

StandardScaler is a Transformer. As such, it supports the fit and transform operation.

### Fit

StandardScaler is trained on all subtypes of Vector or LabeledVector:

• fit[T <: Vector]: DataSet[T] => Unit
• fit: DataSet[LabeledVector] => Unit

### Transform

StandardScaler transforms all subtypes of Vector or LabeledVector into the respective type:

• transform[T <: Vector]: DataSet[T] => DataSet[T]
• transform: DataSet[LabeledVector] => DataSet[LabeledVector]

## Parameters

The standard scaler implementation can be controlled by the following two parameters:

Parameters Description
Mean

The mean of the scaled data set. (Default value: 0.0)

Std

The standard deviation of the scaled data set. (Default value: 1.0)