The standard scaler scales the given data set, so that all features will have a user-specified mean and variance. In case the user does not provide a specific mean and standard deviation, the standard scaler transforms the features of the input data set to have mean equal to 0 and standard deviation equal to 1. Given a set of input data $x_1, x_2, \ldots, x_n$, with mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and standard deviation:

$$\sigma_x = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The scaled data set $z_1, z_2, \ldots, z_n$ will be:

$$z_i = \mathit{std}\left(\frac{x_i - \bar{x}}{\sigma_x}\right) + \mathit{mean}$$

where $\mathit{std}$ and $\mathit{mean}$ are the user-specified values for the standard deviation and mean.
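
As a worked illustration of the formula above, the following is a minimal plain-Scala sketch for a single feature. It does not use the FlinkML API; the object `StandardScalingSketch`, the method `scale`, and its parameter names are hypothetical and chosen only for this example. Rejecting a zero standard deviation is an assumption of the sketch, not a statement about how the library behaves.

```scala
// Hypothetical helper for illustration only; not part of FlinkML.
object StandardScalingSketch {

  /** Rescales `xs` to the requested mean and standard deviation, using the
    * population (1/n) standard deviation from the formula above. */
  def scale(xs: Seq[Double],
            targetMean: Double = 0.0,
            targetStd: Double = 1.0): Seq[Double] = {
    require(xs.nonEmpty, "input must not be empty")
    val n = xs.size.toDouble
    // Mean of the input data
    val mean = xs.sum / n
    // Population standard deviation of the input data
    val sigma = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / n)
    // Assumption of this sketch: constant features are rejected
    require(sigma > 0.0, "standard deviation must be non-zero")
    // z_i = std * ((x_i - mean) / sigma) + mean
    xs.map(x => targetStd * (x - mean) / sigma + targetMean)
  }
}
```

For the input `Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)` the mean is 5 and the standard deviation is 2, so calling `StandardScalingSketch.scale(..., targetMean = 10.0, targetStd = 2.0)` shifts every value by 5 and yields `7, 9, 9, 9, 10, 10, 12, 14`.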
`StandardScaler` is a `Transformer`. As such, it supports the `fit` and `transform` operations.
`StandardScaler` is trained on all subtypes of `Vector` or `LabeledVector`:

* `fit[T <: Vector]: DataSet[T] => Unit`
* `fit: DataSet[LabeledVector] => Unit`
`StandardScaler` transforms all subtypes of `Vector` or `LabeledVector` into the respective type:

* `transform[T <: Vector]: DataSet[T] => DataSet[T]`
* `transform: DataSet[LabeledVector] => DataSet[LabeledVector]`
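
The split between the two operations can be pictured with a small plain-Scala model: `fit` only learns statistics and returns `Unit`, while `transform` applies those statistics to whatever data it is given. This is a sketch for a single feature under stated assumptions. The class `ScalerSketch` is hypothetical and is not the FlinkML implementation, and throwing when `transform` is called before `fit` is a choice made for this example only.

```scala
// Hypothetical plain-Scala model of the fit/transform split; not FlinkML's API.
final class ScalerSketch(targetMean: Double = 0.0, targetStd: Double = 1.0) {

  // Statistics (mean, standard deviation) learned by fit; empty until then.
  private var stats: Option[(Double, Double)] = None

  /** Learns the mean and population standard deviation of the training data. */
  def fit(training: Seq[Double]): Unit = {
    require(training.nonEmpty, "training data must not be empty")
    val n = training.size.toDouble
    val mean = training.sum / n
    val sigma = math.sqrt(training.map(x => (x - mean) * (x - mean)).sum / n)
    require(sigma > 0.0, "standard deviation must be non-zero")
    stats = Some((mean, sigma))
  }

  /** Scales any data set with the statistics learned by fit. */
  def transform(data: Seq[Double]): Seq[Double] = {
    val (mean, sigma) = stats.getOrElse(
      throw new IllegalStateException("fit must be called before transform"))
    data.map(x => targetStd * (x - mean) / sigma + targetMean)
  }
}
```

A possible usage: after `val s = new ScalerSketch(10.0, 2.0)` and `s.fit(Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0))`, the call `s.transform(Seq(5.0, 7.0))` reuses the training mean 5 and standard deviation 2 and returns `10.0, 12.0`, even though the transformed data differs from the training data.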
The standard scaler implementation can be controlled by the following two parameters:
Parameters | Description |
---|---|
Mean | The mean of the scaled data set. (Default value: 0.0) |
Std | The standard deviation of the scaled data set. (Default value: 1.0) |
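// Imports for the Flink Scala API and the FlinkML types used below
import org.apache.flink.api.scala._
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.preprocessing.StandardScaler

// Create standard scaler transformer
val scaler = StandardScaler()
  .setMean(10.0)
  .setStd(2.0)

// Obtain data set to be scaled
val dataSet: DataSet[Vector] = ...

// Learn the mean and standard deviation of the training data
scaler.fit(dataSet)

// Scale the provided data set to have mean=10.0 and std=2.0
val scaledDS = scaler.transform(dataSet)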