Penalized linear least-square fitting problem. More...

#include <utils.h>

Public Types
enum	APPROXSTATUS { AS_OK, AS_NOMEM, AS_INVRANGE, AS_NODATA, AS_SINGULAR }
	status codes returned by status() function and by fitting routines. More...

Public Member Functions
	CSplineApprox (const std::vector< double > &_xvalues, const std::vector< double > &_knots)
	initialize workspace for xvalues=x, knots=X in the above formulation. More...

APPROXSTATUS	status () const
	return status of the object

APPROXSTATUS	loadyvalues (const std::vector< double > &_yvalues)
	initialize fitting process for given data points y. More...

APPROXSTATUS	fitData (const double lambda, std::vector< double > splineValues, double rmserror=NULL, double *edf=NULL)
	perform actual fitting for y loaded previously, with given smoothing parameter lambda. More...

APPROXSTATUS	fitDataOptimal (std::vector< double > splineValues, double rmserror=NULL, double edf=NULL, double lambda=NULL)
	perform fitting with adaptive choice of smoothing parameter lambda, to minimize AIC. More...

APPROXSTATUS	fitDataOversmooth (const double deltaAIC, std::vector< double > splineValues, double rmserror=NULL, double edf=NULL, double lambda=NULL)
	perform an 'oversmooth' fitting with adaptive choice of smoothing parameter lambda. More...

APPROXSTATUS	computeRegressionAtPoints (const std::vector< double > &xpoints, std::vector< double > *ypoints)
	convenience function to evaluate regression function at a given set of points, after fitData*** was called and solution for weights has been obtained

Private Member Functions
void	initRoughnessMatrix ()
	compute integrals over products of second derivatives of basis functions, and transform R to M+singValues

void	computeYvalues (std::vector< double > *splineValues)
	compute Y-values at spline knots X[k] after the weights w have been determined, and also two endpoint derivatives

APPROXSTATUS	fitDataSingular (std::vector< double > splineValues, double rmserror, double *edf)
	In the unfortunate case that the fit matrix appears to be singular, another algorithm is used which is based on the GSL multifit routine, which performs SVD of bsplineMatrix. More...

Private Attributes
APPROXSTATUS	mystatus
	internal status of the object

const size_t	numDataPoints
	number of x[i],y[i] pairs (original data)

const size_t	numKnots
	number of X[k] knots in the fitting spline

const size_t	numBasisFnc
	number of b-spline basis functions = numKnots+2

gsl_vector *	knots
	b-spline knots X[k], k=0..numKnots-1

gsl_vector *	xvalues
	x[i], i=0..numDataPoints-1

gsl_vector *	yvalues
	y[i], overwritten each time loadYvalues is called

gsl_vector *	weightCoefs
	w_p, weight coefficients for basis functions to be found in the process of fitting

gsl_vector *	zRHS
	z_p = C^T y, right hand side of linear system

gsl_matrix *	bsplineMatrix
	matrix "C_ip" used in fitting process; i=0..numDataPoints-1, p=0..numBasisFnc-1

gsl_matrix *	LMatrix
	lower triangular matrix L is Cholesky decomposition of matrix A = C^T C, of size numBasisFnc*numBasisFnc

gsl_matrix *	MMatrix
	matrix "M" which is the transformed version of roughness matrix "R_pq" of integrals of product of second derivatives of basis functions; p,q=0..numBasisFnc-1

gsl_vector *	singValues
	part of the decomposition of the roughness matrix

gsl_vector *	MTz
	pre-computed M^T z

gsl_vector *	tempv
	some routines require temporary storage

gsl_bspline_workspace *	bsplineWorkspace
	workspace for b-spline evaluation

gsl_vector *	bsplineValues
	to compute values of all b-spline basis functions at a given point x

gsl_bspline_deriv_workspace *	bsplineDerivWorkspace
	workspace for derivative computation

gsl_matrix *	bsplineDerivValues
	to compute values and derivatives of basis functions

double	ynorm2
	\|y\|^2 - used to compute residual sum of squares (RSS)

Detailed Description

Penalized linear least-square fitting problem.

Approximate the data series {x[i], y[i], i=0..numDataPoints-1} with spline defined at {X[k], Y[k], k=0..numKnots-1} in the least-square sense, possibly with additional penalty term for 'roughness' (curvature).

Initialized once for a given set of x, X, and may be used to fit multiple sets of y with arbitrary degree of smoothing $\lambda$ .

Internally, the approximation is performed by multi-parameter linear least-square fitting: minimize

$\sum_{i=0}^{numDataPoints-1} (y_i - \hat y(x_i))^2 + \lambda \int [\hat y''(x)]^2 dx,$

where

$\hat y(x) = \sum_{p=0}^{numBasisFnc-1} w_p B_p(x)$

is the approximated regression for input data, $ B_p(x) $ are its basis functions and $ w_p $ are weights to be found.

Basis functions are b-splines with knots at X[k], k=0..numKnots-1; the number of basis functions is numKnots+2. Equivalently, the regression can be represented by clamped cubic spline with numKnots control points; b-splines are only used internally.

LLS fitting is done by solving the following linear system: $(\mathsf{A} + \lambda \mathsf{R}) \mathbf{w} = \mathbf{z}$ , where A and R are square matrices of size numBasisFnc, w and z are vectors of the same size, and $\lambda$ is a scalar (smoothing parameter).

A = C^T C, where C_ip is a matrix of size numDataPoints*numBasisFnc, containing value of p-th basis function at x[i], p=0..numBasisFnc-1, i=0..numDataPoints-1. z = C^T y, where y[i] is vector of original data points R is the roughness penalty matrix: $R_pq = \int B''_p(x) B''_q(x) dx$ .

Internally several arrays are initialized, the largest being C_ip matrix; in case of out-of-memory error no work is done and status is set to AS_NOMEM.

Member Enumeration Documentation

enum smile::CSplineApprox::APPROXSTATUS

status codes returned by status() function and by fitting routines.

Everything other than AS_OK or AS_SINGULAR means that return arrays are uninitialized

Enumerator
AS_OK	everything is fine
AS_NOMEM	not enough memory in initializing arrays
AS_INVRANGE	invalid range of input x and X values (the former should lie within the latter)
AS_NODATA	no y-values have been set
AS_SINGULAR	linear system is [near-]singular, in this case a backup SVD-based algorithm is used to fit but it doesn't deal with smoothing

Constructor & Destructor Documentation

smile::CSplineApprox::CSplineApprox	(	const std::vector< double > &	_xvalues,
		const std::vector< double > &	_knots
	)

initialize workspace for xvalues=x, knots=X in the above formulation.

knots must be sorted in ascending order, and all xvalues must lie between knots.front() and knots.back()

Member Function Documentation

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitData	(	const double	lambda,
		std::vector< double > *	splineValues,
		double *	rmserror = `NULL`,
		double *	edf = `NULL`
	)

perform actual fitting for y loaded previously, with given smoothing parameter lambda.

return spline values Y at knots X, and if necessary, rms error and EDF (equivalent degrees of freedom) in corresponding output parameters (if they are not NULL). Y has length of numKnots+2, the last two values are spline derivatives at endpoints (to pass to initialization of clamped spline): the internal fitting process uses b-splines, not natural cubic splines, therefore the endpoint derivatives are non-zero. EDF is equivalent number of free parameters in fit, increasing smoothing decreases EDF: 2<=EDF<=numKnots.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataOptimal	(	std::vector< double > *	splineValues,
		double *	rmserror = `NULL`,
		double *	edf = `NULL`,
		double *	lambda = `NULL`
	)

perform fitting with adaptive choice of smoothing parameter lambda, to minimize AIC.

AIC (Akaike information criterion) is defined as log(rmserror*numDataPoints) + 2*EDF/(numDataPoints-EDF-1) . return spline values Y, rms error, equivalent degrees of freedom (EDF) and best-choice value of lambda.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataOversmooth	(	const double	deltaAIC,
		std::vector< double > *	splineValues,
		double *	rmserror = `NULL`,
		double *	edf = `NULL`,
		double *	lambda = `NULL`
	)

perform an 'oversmooth' fitting with adaptive choice of smoothing parameter lambda.

The difference in AIC (Akaike information criterion) between the solution with no smoothing and the returned solution is equal to deltaAIC (i.e. smooth more than optimal amount defined above). return spline values Y, rms error, equivalent degrees of freedom and best-choice value of lambda.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataSingular	(	std::vector< double > *	splineValues,
		double *	rmserror,
		double *	edf
	)

private

In the unfortunate case that the fit matrix appears to be singular, another algorithm is used which is based on the GSL multifit routine, which performs SVD of bsplineMatrix.

It is much slower and cannot accomodate nonzero smoothing. This routine is automatically called from all fitData*** if mystatus==AS_SINGULAR.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::loadyvalues ( const std::vector< double > & _yvalues )

initialize fitting process for given data points y.

returns AS_INVRANGE if the number of elements in yvalues != xvalues.size()

The documentation for this class was generated from the following files:

utils.h
utils.cpp

Public Types

Public Member Functions

Private Member Functions

Private Attributes

Detailed Description

Member Enumeration Documentation

Constructor & Destructor Documentation

Member Function Documentation