This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

Standard Scaler

Description

The standard scaler scales the given data set so that all features have a user-specified mean and standard deviation. If the user does not specify a mean and standard deviation, the standard scaler transforms the features of the input data set to have a mean of 0 and a standard deviation of 1. Given a set of input data $x_1, x_2, \ldots, x_n$, with mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

and standard deviation:

$$\sigma_x = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The scaled data set $z_1, z_2, \ldots, z_n$ will be:

$$z_i = \textit{std} \left( \frac{x_i - \bar{x}}{\sigma_x} \right) + \textit{mean}$$

where std and mean are the user-specified values for the standard deviation and mean of the scaled data.
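The formulas above can be checked on a small data set with plain Scala. The following is only a minimal sketch of the arithmetic, not Flink's implementation: the object `StandardScalingSketch` and its methods are hypothetical names introduced for illustration, and the code assumes a non-empty input whose values are not all identical (otherwise the standard deviation is 0 and the division is undefined).

```scala
object StandardScalingSketch {
  /** Mean of the values, matching the definition of x-bar above. */
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  /** Standard deviation as defined above (divides by n, not n - 1). */
  def stdDev(xs: Seq[Double]): Double = {
    val m = mean(xs)
    math.sqrt(xs.map(x => (x - m) * (x - m)).sum / xs.size)
  }

  /** Applies z_i = targetStd * ((x_i - mean) / stdDev) + targetMean to every value. */
  def scale(xs: Seq[Double],
            targetMean: Double = 0.0,
            targetStd: Double = 1.0): Seq[Double] = {
    val m = mean(xs)
    val s = stdDev(xs) // assumed to be non-zero
    xs.map(x => targetStd * ((x - m) / s) + targetMean)
  }

  def main(args: Array[String]): Unit = {
    // This sample has mean 5 and standard deviation 2.
    val xs = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    println(scale(xs))                                      // defaults: mean 0, std 1
    println(scale(xs, targetMean = 10.0, targetStd = 2.0))  // mean 10, std 2
  }
}
```

For this sample, the default scaling maps 2.0 to -1.5 and 9.0 to 2.0. With a target mean of 10 and a target standard deviation of 2, every value is simply shifted by 5, because the sample already has a standard deviation of 2.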

Operations

StandardScaler is a Transformer. As such, it supports the fit and transform operations.

Fit

StandardScaler is trained on all subtypes of Vector or LabeledVector:

  • fit[T <: Vector]: DataSet[T] => Unit
  • fit: DataSet[LabeledVector] => Unit

Transform

StandardScaler transforms all subtypes of Vector or LabeledVector into the respective type:

  • transform[T <: Vector]: DataSet[T] => DataSet[T]
  • transform: DataSet[LabeledVector] => DataSet[LabeledVector]
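The split between fit and transform can be illustrated without Flink. The sketch below is an assumption-laden model of the two phases, not the library's code: `FitTransformSketch` and `FittedStats` are hypothetical names, a plain `Seq[Double]` stands in for Flink's `Vector`, and statistics are computed per feature (per column), each of which is assumed to have a non-zero standard deviation in the training data.

```scala
object FitTransformSketch {
  /** Hypothetical fitted state: one mean and one standard deviation per feature. */
  final case class FittedStats(means: Seq[Double], stds: Seq[Double])

  /** "fit": learn per-feature statistics from the training rows. */
  def fit(rows: Seq[Seq[Double]]): FittedStats = {
    val n = rows.size.toDouble
    val features = rows.head.indices
    val means = features.map(j => rows.map(_(j)).sum / n)
    val stds = features.map { j =>
      math.sqrt(rows.map(r => math.pow(r(j) - means(j), 2)).sum / n)
    }
    FittedStats(means, stds)
  }

  /** "transform": rescale any rows using the statistics learned by fit. */
  def transform(rows: Seq[Seq[Double]],
                stats: FittedStats,
                targetMean: Double = 0.0,
                targetStd: Double = 1.0): Seq[Seq[Double]] =
    rows.map(_.zipWithIndex.map { case (x, j) =>
      targetStd * ((x - stats.means(j)) / stats.stds(j)) + targetMean
    })

  def main(args: Array[String]): Unit = {
    val training = Seq(Seq(1.0, 10.0), Seq(3.0, 30.0))
    val stats = fit(training)
    println(transform(training, stats))              // the training data itself
    println(transform(Seq(Seq(2.0, 40.0)), stats))   // unseen data, same statistics
  }
}
```

Keeping the learned statistics separate from their application is what lets the same scaling be applied to data that was not part of the training set.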

Parameters

The standard scaler implementation can be controlled by the following two parameters:

Parameter   Description
Mean        The mean of the scaled data set. (Default value: 0.0)
Std         The standard deviation of the scaled data set. (Default value: 1.0)

Examples

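import org.apache.flink.api.scala._
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.preprocessing.StandardScaler

// Create a standard scaler transformer with a target mean of 10.0
// and a target standard deviation of 2.0
val scaler = StandardScaler()
  .setMean(10.0)
  .setStd(2.0)

// Obtain data set to be scaled
val dataSet: DataSet[Vector] = ...

// Learn the mean and standard deviation of the training data
scaler.fit(dataSet)

// Scale the provided data set to have mean=10.0 and std=2.0
val scaledDS = scaler.transform(dataSet)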
