$$ \newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \newcommand\rfrac[2]{^{#1}\!/_{#2}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} $$

Multiple Linear Regression


Multiple linear regression tries to find a linear function which best fits the provided input data. Given a set of input data with its value $(\mathbf{x}, y)$, multiple linear regression finds a vector $\mathbf{w}$ such that the sum of the squared residuals is minimized:

Written in matrix notation, we obtain the following formulation:

This problem has a closed form solution which is given by:

However, in cases where the input data set is so huge that a complete parse over the whole data set is prohibitive, one can apply stochastic gradient descent (SGD) to approximate the solution. SGD first calculates for a random subset of the input data set the gradients. The gradient for a given point $\mathbf{x}_i$ is given by:

The gradients are averaged and scaled. The scaling is defined by $\gamma = \frac{s}{\sqrt{j}}$ with $s$ being the initial step size and $j$ being the current iteration number. The resulting gradient is subtracted from the current weight vector giving the new weight vector for the next iteration:

The multiple linear regression algorithm computes either a fixed number of SGD iterations or terminates based on a dynamic convergence criterion. The convergence criterion is the relative change in the sum of squared residuals:


MultipleLinearRegression is a Predictor. As such, it supports the fit and predict operation.


MultipleLinearRegression is trained on a set of LabeledVector:

  • fit: DataSet[LabeledVector] => Unit


MultipleLinearRegression predicts for all subtypes of Vector the corresponding regression value:

  • predict[T <: Vector]: DataSet[T] => DataSet[LabeledVector]

If we call predict with a DataSet[LabeledVector], we make a prediction on the regression value for each example, and return a DataSet[(Double, Double)]. In each tuple the first element is the true value, as was provided from the input DataSet[LabeledVector] and the second element is the predicted value. You can then use these (truth, prediction) tuples to evaluate the algorithm’s performance.

  • predict: DataSet[LabeledVector] => DataSet[(Double, Double)]


The multiple linear regression implementation can be controlled by the following parameters:

Parameters Description

The maximum number of iterations. (Default value: 10)


Initial step size for the gradient descent method. This value controls how far the gradient descent method moves in the opposite direction of the gradient. Tuning this parameter might be crucial to make it stable and to obtain a better performance. (Default value: 0.1)


Threshold for relative change of the sum of squared residuals until the iteration is stopped. (Default value: None)


Learning rate method used to calculate the effective learning rate for each iteration. See the list of supported learning rate methods. (Default value: LearningRateMethod.Default)


// Create multiple linear regression learner
val mlr = MultipleLinearRegression()

// Obtain training and testing data set
val trainingDS: DataSet[LabeledVector] = ...
val testingDS: DataSet[Vector] = ...

// Fit the linear model to the provided data

// Calculate the predictions for the test data
val predictions = mlr.predict(testingDS)