RIDGE REGRESSION

Performs a PWAS constrained ridge regression on a set of data.

Function of MOBY-DIC TOOLBOX.

Description

Given a matrix X ($n_{data} \times n_{dim}$) and a vector Y ($n_{data} \times 1$), this routine finds a PWAS function f that minimizes the least-squares residual

$$ || f(X) - Y ||^2 \qquad (1) $$

The function f is expressed as a weighted sum of alpha basis functions:

$$ f(x) = \sum_{j=1}^{N_{bs}} w_j \alpha_j(x) \qquad (2) $$

Substituting (2) into (1), the optimization problem can be recast in terms of the weight vector w, where $G_{ij} = \alpha_j(x_i)$ ($x_i$ being the i-th row of X) and $d = Y$:

$$ \min_w || Gw - d ||^2 $$
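
The recasting above can be illustrated with a minimal 1-D sketch. It is not the toolbox implementation: it assumes a uniform partition of [0, 1] and uses hat functions as a stand-in for the alpha basis (each one equals 1 at its own vertex and 0 at all the others). The data and all variable names except X, Y, P, G, d and w are made up for the example, and the implicit expansion in `X - v` requires MATLAB R2016b or later.

```matlab
% Illustrative 1-D sketch, not the toolbox code.
rng(0);                                  % reproducible noise
X = linspace(0, 1, 50)';                 % ndata x 1 inputs
Y = sin(2*pi*X) + 0.05*randn(size(X));   % ndata x 1 noisy outputs

P = 5;                                   % number of subdivisions of the domain
v = linspace(0, 1, P + 1);               % vertices of the uniform partition (row vector)
h = v(2) - v(1);                         % width of each interval

% Assumed basis: hat function j is 1 at vertex j and 0 at every other vertex.
G = max(0, 1 - abs(X - v) / h);          % G(i,j) = alpha_j(x_i), size ndata x (P+1)
d = Y;

w = G \ d;                               % weights minimizing ||G*w - d||^2
residual = norm(G*w - d)^2;              % value of the cost (1) at the optimum
```

With this basis each weight w(j) is the value of f at vertex v(j), which is why constraints on f can later be written as constraints on w.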

A zero- or first-order Tikhonov regularization is also applied in order to filter noise. Moreover, constraints on the weights w can be imposed in order to reproduce desired behaviours.

The optimization problem that is finally solved is therefore the following:

$$ \min_w || Gw - d ||^2 + || \lambda L w ||^2 $$
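
As a worked step, when no constraints are active the regularized cost is an ordinary least-squares problem on a stacked system, and its minimizer satisfies the corresponding normal equations. This holds for any regularization matrix L; the usual choices (assumed here, since the document does not spell them out) are L equal to the identity for zero order and L equal to a first-difference operator for first order.

$$ || Gw - d ||^2 + || \lambda L w ||^2 = \left\| \left[ \begin{array}{c} G \\ \lambda L \end{array} \right] w - \left[ \begin{array}{c} d \\ 0 \end{array} \right] \right\|^2 \quad \Longrightarrow \quad \left( G^T G + \lambda^2 L^T L \right) w = G^T d $$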

subject to:

$$ A w \leq B$$

$$ A_{eq} w = B_{eq} $$
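
A minimal sketch of the regularized problem without the constraints, again on an assumed 1-D uniform partition with hat functions standing in for the alpha basis. The first-difference matrix used for L is an assumption about what first-order regularization looks like in one dimension; it is not taken from the toolbox. Handling the constraints A w <= B and Aeq w = Beq would additionally require a constrained least-squares or QP solver, which is left out here.

```matlab
% Illustrative 1-D sketch of first-order Tikhonov regularization (unconstrained case).
rng(1);
X = rand(40, 1);                         % ndata x 1 inputs in [0, 1]
Y = X.^2 + 0.1*randn(40, 1);             % noisy outputs

v = linspace(0, 1, 11);                  % 11 vertices, 10 intervals
h = v(2) - v(1);
N = numel(v);                            % number of basis functions
G = max(0, 1 - abs(X - v) / h);          % G(i,j) = alpha_j(x_i) for the assumed hat basis
d = Y;

L = diff(eye(N));                        % (N-1) x N first differences: (L*w)(j) = w(j+1) - w(j)
lambda = 0.5;                            % regularization parameter, fixed by hand here

% Solve min ||G*w - d||^2 + ||lambda*L*w||^2 as one stacked least-squares problem.
w = [G; lambda*L] \ [d; zeros(N - 1, 1)];

% Same solution from the normal equations, shown only as a cross-check.
wNormal = (G'*G + lambda^2*(L'*L)) \ (G'*d);
```

The stacked form is generally preferable numerically, because it avoids forming G'*G explicitly.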

If the parameter lambda is not provided, it is estimated by minimizing a cost function. If a test set is provided, lambda is chosen so as to minimize the quadratic error on the test set; otherwise, a Generalized Cross Validation (GCV) technique is adopted to select the best value for lambda (see function GCVcost).
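
The test-set criterion can be sketched as follows. This is only an illustration under stated assumptions: a 1-D uniform partition, hat functions in place of the alpha basis, a first-difference L, and a plain grid search over candidate values of lambda. How the toolbox actually searches for lambda is not described here, and the GCV branch is not reproduced.

```matlab
% Illustrative sketch: pick lambda by minimizing the squared error on a test set.
rng(2);
f0 = @(x) sin(2*pi*x);                         % made-up ground truth
X  = rand(60, 1);  Y  = f0(X)  + 0.1*randn(60, 1);   % training set
Xt = rand(30, 1);  Yt = f0(Xt) + 0.1*randn(30, 1);   % test set

v = linspace(0, 1, 21);  h = v(2) - v(1);  N = numel(v);
basis = @(x) max(0, 1 - abs(x - v) / h);       % assumed hat basis, one column per vertex
G  = basis(X);                                 % training regressor matrix
Gt = basis(Xt);                                % test regressor matrix
L  = diff(eye(N));                             % assumed first-order regularization matrix

lambdas = logspace(-3, 1, 30);                 % candidate values (arbitrary grid)
err = zeros(size(lambdas));
for k = 1:numel(lambdas)
    w = [G; lambdas(k)*L] \ [Y; zeros(N - 1, 1)];   % fit on the training set only
    err(k) = norm(Gt*w - Yt)^2;                     % ||f(Xt) - Yt||^2 on the test set
end

[bestErr, kBest] = min(err);
lambdaBest = lambdas(kBest);                   % lambda with the smallest test error
```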

Syntax

[fpwas info] = ridgeRegression(X,Y,P,[Xt],[Yt])

X must be a [ndata x ndim] matrix and Y a [ndata x 1] array.

P defines the simplicial partition on which the PWAS function is defined. If P is a scalar, each dimension of the domain is subdivided into P intervals. If it is a vector, it specifies the number of subdivisions for each dimension individually. P can also be a cell array whose i-th element contains the i-th component of the vertices of the simplicial partition (for a non-uniform partition). The domain of the PWAS function is automatically extrapolated from the input data X and Xt.

Xt and Yt (optional) represent a test dataset used to estimate the optimal value of the Tikhonov parameter lambda. lambda is chosen so as to minimize ||f(Xt) - Yt||^2, where f is the PWAS function obtained from the training dataset (X,Y). If Xt and Yt are not provided, lambda is chosen with a GCV approach. Xt must be a [ndatatest x ndim] matrix and Yt a [ndatatest x 1] array.
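
The three forms of P and the optional test set could be used as in the sketch below. The data are synthetic and the partition values are arbitrary; the calls follow the syntax given above and are guarded so that the script also runs when the toolbox is not on the path. The vertex coordinates in the cell-array form are assumed to span the range of the data.

```matlab
% Synthetic 2-D dataset (made up for the example).
rng(3);
X  = rand(200, 2);  Y  = sin(X(:,1))  + X(:,2).^2;    % training set: [ndata x 2], [ndata x 1]
Xt = rand(50, 2);   Yt = sin(Xt(:,1)) + Xt(:,2).^2;   % test set: [ndatatest x 2], [ndatatest x 1]

if exist('ridgeRegression', 'file')      % run only if the toolbox is available
    % Scalar P: 4 subdivisions along every dimension; lambda chosen by GCV.
    [fpwas, info] = ridgeRegression(X, Y, 4);

    % Vector P: 4 subdivisions along x1 and 8 along x2.
    [fpwas, info] = ridgeRegression(X, Y, [4 8]);

    % Cell-array P: explicit (non-uniform) vertex coordinates for each dimension.
    [fpwas, info] = ridgeRegression(X, Y, {[0 0.1 0.5 1], [0 0.5 1]});

    % With a test set: lambda chosen by minimizing the error on (Xt, Yt).
    [fpwas, info] = ridgeRegression(X, Y, 4, Xt, Yt);
end
```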

fpwas is a pwas object defining the pwas function obtained after the regression.

info is a struct with the following fields:

[fpwas info] = ridgeRegression(X,Y,P,[Xt],[Yt],D)

As above, but the domain of the PWAS function is specified explicitly instead of being extrapolated from the data. D is a $2 \times n_{dim}$ matrix of the form:

$$\left[ \begin{array}{cccc} x_{min}^1 & x_{min}^2 & \ldots & x_{min}^{n_{dim}}\\ x_{max}^1 & x_{max}^2 & \ldots & x_{max}^{n_{dim}} \end{array} \right] $$

[fpwas info] = ridgeRegression(X,Y,P,[Xt],[Yt],options)

options is a structure with the following fields:

                - type: it can be either 'bounds' or 'equality'. If it
                        is 'bounds' constraints in the form
                        lbound <= f(x) <= ubound are imposed. lbound and
                        ubound can be provided through fields lbound and
                        ubound. If they are not provided, they are set
                        to the minimum and maximum values of the data
                        Y and Yt contained in the dataset.
                        If type is 'equality', equality constraints in
                        the form f(x) = k can be imposed for x lying on a
                        hyperplane parallel to the domain components.
                - lbound: lower bound for constraints of type 'bounds'
                - ubound: upper bound for constraints of type 'bounds'
                - variables: needed for equality constraints. It is an
                             array of strings indicating for which set of
                             points you want to impose the constraints. For
                             example if you want to impose a constraint
                             on f, for x1 = 3 and x3 = 2, variables must
                             be ['x1 = 3', 'x3 = 2'].
                - value: needed for equality constraints. It is the
                         value you want the function to assume in
                         correspondence of the points defined in field
                         variables. To impose the constraint f(x) = 0
                         for x2 = 1 and x4 = 5, you have to set
                         variables = ['x2 = 1', 'x4 = 5'] and value = 0.

[fpwas info] = ridgeRegression(X,Y,P,[Xt],[Yt],D,options)

As above, with both the domain D and the constraint options specified.
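
A sketch of the full call with an explicit domain and constraints of type 'bounds'. The data, the domain and the bound values are made up for the example; only the field names type, lbound and ubound come from the description above. The call is guarded so that the script also runs when the toolbox is not on the path.

```matlab
% Synthetic 2-D dataset (made up for the example).
rng(4);
X  = rand(200, 2);  Y  = sin(X(:,1))  + X(:,2).^2;
Xt = rand(50, 2);   Yt = sin(Xt(:,1)) + Xt(:,2).^2;

% Domain: first row holds the minima, second row the maxima, one column per dimension.
D = [0 0;
     1 1];

% Bound constraints lbound <= f(x) <= ubound.
options = struct();
options.type   = 'bounds';
options.lbound = 0;                      % illustrative lower bound
options.ubound = 2;                      % illustrative upper bound

if exist('ridgeRegression', 'file')      % run only if the toolbox is available
    [fpwas, info] = ridgeRegression(X, Y, 4, Xt, Yt, D, options);
end
```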

Acknowledgements

Contributors:

Copyright is with: