This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.
$$ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \newcommand\rfrac[2]{^{#1}\!/_{#2}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} $$
Important: Maven artifacts which depend on Scala are now suffixed with the Scala major version, e.g. "2.10" or "2.11". Please consult the migration guide on the project Wiki.

Standard Scaler

Description

The standard scaler scales the given data set, so that all features will have a user specified mean and variance. In case the user does not provide a specific mean and standard deviation, the standard scaler transforms the features of the input data set to have mean equal to 0 and standard deviation equal to 1. Given a set of input data $x_1, x_2,… x_n$, with mean:

and standard deviation:

The scaled data set $z_1, z_2,…,z_n$ will be:

where $\textit{std}$ and $\textit{mean}$ are the user specified values for the standard deviation and mean.

Operations

StandardScaler is a Transformer. As such, it supports the fit and transform operation.

Fit

StandardScaler is trained on all subtypes of Vector or LabeledVector:

  • fit[T <: Vector]: DataSet[T] => Unit
  • fit: DataSet[LabeledVector] => Unit

Transform

StandardScaler transforms all subtypes of Vector or LabeledVector into the respective type:

  • transform[T <: Vector]: DataSet[T] => DataSet[T]
  • transform: DataSet[LabeledVector] => DataSet[LabeledVector]

Parameters

The standard scaler implementation can be controlled by the following two parameters:

Parameters Description
Mean

The mean of the scaled data set. (Default value: 0.0)

Std

The standard deviation of the scaled data set. (Default value: 1.0)

Examples

// Create standard scaler transformer
val scaler = StandardScaler()
.setMean(10.0)
.setStd(2.0)

// Obtain data set to be scaled
val dataSet: DataSet[Vector] = ...

// Learn the mean and standard deviation of the training data
scaler.fit(dataSet)

// Scale the provided data set to have mean=10.0 and std=2.0
val scaledDS = scaler.transform(dataSet)