# Multiple Linear Regression

## Description

Multiple linear regression tries to find a linear function which best fits the provided input data. Given a set of input data with its value $(\mathbf{x}, y)$, multiple linear regression finds a vector $\mathbf{w}$ such that the sum of the squared residuals is minimized:

Written in matrix notation, we obtain the following formulation:

This problem has a closed form solution which is given by:

However, in cases where the input data set is so huge that a complete parse over the whole data set is prohibitive, one can apply stochastic gradient descent (SGD) to approximate the solution. SGD first calculates for a random subset of the input data set the gradients. The gradient for a given point $\mathbf{x}_i$ is given by:

The gradients are averaged and scaled. The scaling is defined by $\gamma = \frac{s}{\sqrt{j}}$ with $s$ being the initial step size and $j$ being the current iteration number. The resulting gradient is subtracted from the current weight vector giving the new weight vector for the next iteration:

The multiple linear regression algorithm computes either a fixed number of SGD iterations or terminates based on a dynamic convergence criterion. The convergence criterion is the relative change in the sum of squared residuals:

## Operations

MultipleLinearRegression is a Predictor. As such, it supports the fit and predict operation.

### Fit

MultipleLinearRegression is trained on a set of LabeledVector:

• fit: DataSet[LabeledVector] => Unit

### Predict

MultipleLinearRegression predicts for all subtypes of Vector the corresponding regression value:

• predict[T <: Vector]: DataSet[T] => DataSet[(T, Double)]

## Parameters

The multiple linear regression implementation can be controlled by the following parameters:

Parameters Description
Iterations

The maximum number of iterations. (Default value: 10)

Stepsize

Initial step size for the gradient descent method. This value controls how far the gradient descent method moves in the opposite direction of the gradient. Tuning this parameter might be crucial to make it stable and to obtain a better performance. (Default value: 0.1)

ConvergenceThreshold

Threshold for relative change of the sum of squared residuals until the iteration is stopped. (Default value: None)

LearningRateMethod

Learning rate method used to calculate the effective learning rate for each iteration. See the list of supported learning rate methods. (Default value: LearningRateMethod.Default)