This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.

MinMax Scaler

Description

The MinMax scaler scales the given data set so that all values lie within a user-specified range [min,max]. If the user does not provide minimum and maximum values for the scaling range, the MinMax scaler transforms the features of the input data set to lie in the [0,1] interval. Given a set of input data $x_1, x_2, \ldots, x_n$, with minimum value:

$$x_{min} = \min(\{x_1, x_2, \ldots, x_n\})$$

and maximum value:

$$x_{max} = \max(\{x_1, x_2, \ldots, x_n\})$$

The scaled data set $z_1, z_2, \ldots, z_n$ will be:

$$z_{i} = \frac{x_{i} - x_{min}}{x_{max} - x_{min}} \left( \textit{max} - \textit{min} \right) + \textit{min}$$

where $\textit{min}$ and $\textit{max}$ are the user-specified minimum and maximum values of the target range.
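
To make the scaling rule concrete, here is a minimal plain-Scala sketch that applies it to a single sequence of numbers. It does not use Flink. The object name `MinMaxFormulaSketch` and the function `minMaxScale` are hypothetical names chosen for illustration, and rejecting constant input is an assumption of this sketch rather than documented behavior of the scaler.

```scala
object MinMaxFormulaSketch {

  /** Rescales `xs` linearly so that its smallest value maps to `min`
    * and its largest value maps to `max`. */
  def minMaxScale(xs: Seq[Double], min: Double = 0.0, max: Double = 1.0): Seq[Double] = {
    require(xs.nonEmpty, "input must not be empty")
    val xMin = xs.min
    val xMax = xs.max
    // Assumption of this sketch: constant input is rejected, because the
    // rule would divide by zero when xMax == xMin.
    require(xMax > xMin, "input must contain at least two distinct values")
    xs.map(x => (x - xMin) / (xMax - xMin) * (max - min) + min)
  }

  def main(args: Array[String]): Unit = {
    val xs = Seq(2.0, 4.0, 10.0)
    println(minMaxScale(xs))             // values 0.0, 0.25, 1.0
    println(minMaxScale(xs, min = -1.0)) // values -1.0, -0.5, 1.0
  }
}
```

With the default range the smallest input maps to 0.0 and the largest to 1.0. Changing only `min` to -1.0 stretches the same relative positions over [-1,1].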

Operations

MinMaxScaler is a Transformer. As such, it supports the fit and transform operations.

Fit

MinMaxScaler is trained on all subtypes of Vector or LabeledVector:

  • fit[T <: Vector]: DataSet[T] => Unit
  • fit: DataSet[LabeledVector] => Unit
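
Conceptually, fitting amounts to recording the minimum and maximum seen in the training data. The following plain-Scala sketch illustrates this under the assumption that the statistics are collected separately for each feature (vector component). It uses ordinary arrays instead of Flink's `Vector` and `DataSet` types, and `MinMaxFitSketch` and `featureMinMax` are hypothetical names, not part of the Flink API.

```scala
object MinMaxFitSketch {

  /** Returns (per-feature minima, per-feature maxima) of `data`.
    * All rows are assumed to have the same length. */
  def featureMinMax(data: Seq[Array[Double]]): (Array[Double], Array[Double]) = {
    require(data.nonEmpty, "training data must not be empty")
    // Combine rows pairwise, keeping the smaller (or larger) value per position.
    val mins = data.reduce((a, b) => a.zip(b).map { case (x, y) => math.min(x, y) })
    val maxs = data.reduce((a, b) => a.zip(b).map { case (x, y) => math.max(x, y) })
    (mins, maxs)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(Array(1.0, 20.0), Array(4.0, 10.0), Array(0.0, 15.0))
    val (mins, maxs) = featureMinMax(data)
    println(mins.mkString(", ")) // 0.0, 10.0
    println(maxs.mkString(", ")) // 4.0, 20.0
  }
}
```

A pairwise reduction is used here because it mirrors how such statistics can be combined across partitions of a distributed data set.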

Transform

MinMaxScaler transforms all subtypes of Vector or LabeledVector into the respective type:

  • transform[T <: Vector]: DataSet[T] => DataSet[T]
  • transform: DataSet[LabeledVector] => DataSet[LabeledVector]
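
The sketch below illustrates what a transform step could look like once per-feature minima and maxima are known. It is plain Scala, not Flink code: `Labeled` is a hypothetical stand-in for `LabeledVector`, and `scaleVector` and `scaleLabeled` are illustrative names. That the label passes through unchanged is an assumption of this sketch.

```scala
object MinMaxTransformSketch {

  // Hypothetical stand-in for Flink's LabeledVector, used only in this sketch.
  case class Labeled(label: Double, features: Array[Double])

  /** Scales each feature of `v` into [min, max] using previously learned
    * per-feature minima (`featMin`) and maxima (`featMax`).
    * Features with featMax == featMin are not handled in this sketch. */
  def scaleVector(v: Array[Double], featMin: Array[Double], featMax: Array[Double],
                  min: Double = 0.0, max: Double = 1.0): Array[Double] = {
    require(v.length == featMin.length && v.length == featMax.length, "dimension mismatch")
    Array.tabulate(v.length) { i =>
      (v(i) - featMin(i)) / (featMax(i) - featMin(i)) * (max - min) + min
    }
  }

  /** Assumption: only the features are rescaled; the label is kept as-is. */
  def scaleLabeled(lv: Labeled, featMin: Array[Double], featMax: Array[Double],
                   min: Double = 0.0, max: Double = 1.0): Labeled =
    lv.copy(features = scaleVector(lv.features, featMin, featMax, min, max))

  def main(args: Array[String]): Unit = {
    val featMin = Array(0.0, 10.0)
    val featMax = Array(4.0, 20.0)
    val scaled = scaleLabeled(Labeled(1.0, Array(1.0, 15.0)), featMin, featMax, min = -1.0)
    println(scaled.label)                   // 1.0
    println(scaled.features.mkString(", ")) // -0.5, 0.0
  }
}
```

The output has the same shape as the input, which matches the signatures above: a vector maps to a vector and a labeled vector maps to a labeled vector.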

Parameters

The MinMax scaler implementation can be controlled by the following two parameters:

| Parameter | Description |
|-----------|-------------|
| Min | The minimum value of the range for the scaled data set. (Default value: 0.0) |
| Max | The maximum value of the range for the scaled data set. (Default value: 1.0) |

Examples

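import org.apache.flink.api.scala._
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.preprocessing.MinMaxScaler

// Create a MinMax scaler that maps values into [-1.0, 1.0]
// (Max is left at its default value of 1.0)
val minMaxScaler = MinMaxScaler()
  .setMin(-1.0)

// Obtain data set to be scaled
val dataSet: DataSet[Vector] = ...

// Learn the minimum and maximum values of the training data
minMaxScaler.fit(dataSet)

// Scale the provided data set to have min=-1.0 and max=1.0
val scaledDS = minMaxScaler.transform(dataSet)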