This documentation is for an out-of-date version of Apache Flink. We recommend you use the latest stable version.
$$ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \newcommand\rfrac[2]{^{#1}\!/_{#2}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} $$
Important: Maven artifacts which depend on Scala are now suffixed with the Scala major version, e.g. "2.10" or "2.11". Please consult the migration guide on the project Wiki.

Distance Metrics

Description

Different metrics of distance are convenient for different types of analysis. Flink ML provides built-in implementations for many standard distance metrics. You can create custom distance metrics by implementing the DistanceMetric trait.

Built-in Implementations

Currently, FlinkML supports the following metrics:

Metric Description
Euclidean Distance $$d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}$$
Squared Euclidean Distance $$d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2$$
Cosine Similarity $$d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}$$
Chebyshev Distance $$d(\x, \y) = \max_{i}\left(\left \vert x_i - y_i \right\vert \right)$$
Manhattan Distance $$d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert$$
Minkowski Distance $$d(\x, \y) = \left( \sum_{i=1}^{n} \left( x_i - y_i \right)^p \right)^{\rfrac{1}{p}}$$
Tanimoto Distance $$d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}$$ with $\x$ and $\y$ being bit-vectors

Custom Implementation

You can create your own distance metric by implementing the DistanceMetric trait.

class MyDistance extends DistanceMetric {
  override def distance(a: Vector, b: Vector) = ... // your implementation for distance metric
}

object MyDistance {
  def apply() = new MyDistance()
}

val myMetric = MyDistance()