SMILE  v2.5
Schwarzschild Modelling Interactive expLoratory Environment
Public Types | Public Member Functions | Private Member Functions | Private Attributes | List of all members
smile::CSplineApprox Class Reference

Penalized linear least-square fitting problem. More...

#include <utils.h>

Public Types

enum  APPROXSTATUS {
  AS_OK, AS_NOMEM, AS_INVRANGE, AS_NODATA,
  AS_SINGULAR
}
 status codes returned by status() function and by fitting routines. More...
 

Public Member Functions

 CSplineApprox (const std::vector< double > &_xvalues, const std::vector< double > &_knots)
 initialize workspace for xvalues=x, knots=X in the above formulation. More...
 
APPROXSTATUS status () const
 return status of the object
 
APPROXSTATUS loadyvalues (const std::vector< double > &_yvalues)
 initialize fitting process for given data points y. More...
 
APPROXSTATUS fitData (const double lambda, std::vector< double > *splineValues, double *rmserror=NULL, double *edf=NULL)
 perform actual fitting for y loaded previously, with given smoothing parameter lambda. More...
 
APPROXSTATUS fitDataOptimal (std::vector< double > *splineValues, double *rmserror=NULL, double *edf=NULL, double *lambda=NULL)
 perform fitting with adaptive choice of smoothing parameter lambda, to minimize AIC. More...
 
APPROXSTATUS fitDataOversmooth (const double deltaAIC, std::vector< double > *splineValues, double *rmserror=NULL, double *edf=NULL, double *lambda=NULL)
 perform an 'oversmooth' fitting with adaptive choice of smoothing parameter lambda. More...
 
APPROXSTATUS computeRegressionAtPoints (const std::vector< double > &xpoints, std::vector< double > *ypoints)
 convenience function to evaluate regression function at a given set of points, after fitData*** was called and solution for weights has been obtained
 

Private Member Functions

void initRoughnessMatrix ()
 compute integrals over products of second derivatives of basis functions, and transform R to M+singValues
 
void computeYvalues (std::vector< double > *splineValues)
 compute Y-values at spline knots X[k] after the weights w have been determined, and also two endpoint derivatives
 
APPROXSTATUS fitDataSingular (std::vector< double > *splineValues, double *rmserror, double *edf)
 In the unfortunate case that the fit matrix appears to be singular, another algorithm is used which is based on the GSL multifit routine, which performs SVD of bsplineMatrix. More...
 

Private Attributes

APPROXSTATUS mystatus
 internal status of the object
 
const size_t numDataPoints
 number of x[i],y[i] pairs (original data)
 
const size_t numKnots
 number of X[k] knots in the fitting spline
 
const size_t numBasisFnc
 number of b-spline basis functions = numKnots+2
 
gsl_vector * knots
 b-spline knots X[k], k=0..numKnots-1
 
gsl_vector * xvalues
 x[i], i=0..numDataPoints-1
 
gsl_vector * yvalues
 y[i], overwritten each time loadYvalues is called
 
gsl_vector * weightCoefs
 w_p, weight coefficients for basis functions to be found in the process of fitting
 
gsl_vector * zRHS
 z_p = C^T y, right hand side of linear system
 
gsl_matrix * bsplineMatrix
 matrix "C_ip" used in fitting process; i=0..numDataPoints-1, p=0..numBasisFnc-1
 
gsl_matrix * LMatrix
 lower triangular matrix L is Cholesky decomposition of matrix A = C^T C, of size numBasisFnc*numBasisFnc
 
gsl_matrix * MMatrix
 matrix "M" which is the transformed version of roughness matrix "R_pq" of integrals of product of second derivatives of basis functions; p,q=0..numBasisFnc-1
 
gsl_vector * singValues
 part of the decomposition of the roughness matrix
 
gsl_vector * MTz
 pre-computed M^T z
 
gsl_vector * tempv
 some routines require temporary storage
 
gsl_bspline_workspace * bsplineWorkspace
 workspace for b-spline evaluation
 
gsl_vector * bsplineValues
 to compute values of all b-spline basis functions at a given point x
 
gsl_bspline_deriv_workspace * bsplineDerivWorkspace
 workspace for derivative computation
 
gsl_matrix * bsplineDerivValues
 to compute values and derivatives of basis functions
 
double ynorm2
 |y|^2 - used to compute residual sum of squares (RSS)
 

Detailed Description

Penalized linear least-square fitting problem.

Approximate the data series {x[i], y[i], i=0..numDataPoints-1} with spline defined at {X[k], Y[k], k=0..numKnots-1} in the least-square sense, possibly with additional penalty term for 'roughness' (curvature).

Initialized once for a given set of x, X, and may be used to fit multiple sets of y with arbitrary degree of smoothing $ \lambda $.

Internally, the approximation is performed by multi-parameter linear least-square fitting: minimize

\[ \sum_{i=0}^{numDataPoints-1} (y_i - \hat y(x_i))^2 + \lambda \int [\hat y''(x)]^2 dx, \]

where

\[ \hat y(x) = \sum_{p=0}^{numBasisFnc-1} w_p B_p(x) \]

is the approximated regression for input data, $ B_p(x) $ are its basis functions and $ w_p $ are weights to be found.

Basis functions are b-splines with knots at X[k], k=0..numKnots-1; the number of basis functions is numKnots+2. Equivalently, the regression can be represented by clamped cubic spline with numKnots control points; b-splines are only used internally.

LLS fitting is done by solving the following linear system: $ (\mathsf{A} + \lambda \mathsf{R}) \mathbf{w} = \mathbf{z} $, where A and R are square matrices of size numBasisFnc, w and z are vectors of the same size, and $ \lambda $ is a scalar (smoothing parameter).

A = C^T C, where C_ip is a matrix of size numDataPoints*numBasisFnc, containing value of p-th basis function at x[i], p=0..numBasisFnc-1, i=0..numDataPoints-1. z = C^T y, where y[i] is vector of original data points R is the roughness penalty matrix: $ R_pq = \int B''_p(x) B''_q(x) dx $.

Internally several arrays are initialized, the largest being C_ip matrix; in case of out-of-memory error no work is done and status is set to AS_NOMEM.

Member Enumeration Documentation

status codes returned by status() function and by fitting routines.

Everything other than AS_OK or AS_SINGULAR means that return arrays are uninitialized

Enumerator
AS_OK 

everything is fine

AS_NOMEM 

not enough memory in initializing arrays

AS_INVRANGE 

invalid range of input x and X values (the former should lie within the latter)

AS_NODATA 

no y-values have been set

AS_SINGULAR 

linear system is [near-]singular, in this case a backup SVD-based algorithm is used to fit but it doesn't deal with smoothing

Constructor & Destructor Documentation

smile::CSplineApprox::CSplineApprox ( const std::vector< double > &  _xvalues,
const std::vector< double > &  _knots 
)

initialize workspace for xvalues=x, knots=X in the above formulation.

knots must be sorted in ascending order, and all xvalues must lie between knots.front() and knots.back()

Member Function Documentation

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitData ( const double  lambda,
std::vector< double > *  splineValues,
double *  rmserror = NULL,
double *  edf = NULL 
)

perform actual fitting for y loaded previously, with given smoothing parameter lambda.

return spline values Y at knots X, and if necessary, rms error and EDF (equivalent degrees of freedom) in corresponding output parameters (if they are not NULL). Y has length of numKnots+2, the last two values are spline derivatives at endpoints (to pass to initialization of clamped spline): the internal fitting process uses b-splines, not natural cubic splines, therefore the endpoint derivatives are non-zero. EDF is equivalent number of free parameters in fit, increasing smoothing decreases EDF: 2<=EDF<=numKnots.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataOptimal ( std::vector< double > *  splineValues,
double *  rmserror = NULL,
double *  edf = NULL,
double *  lambda = NULL 
)

perform fitting with adaptive choice of smoothing parameter lambda, to minimize AIC.

AIC (Akaike information criterion) is defined as log(rmserror*numDataPoints) + 2*EDF/(numDataPoints-EDF-1) . return spline values Y, rms error, equivalent degrees of freedom (EDF) and best-choice value of lambda.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataOversmooth ( const double  deltaAIC,
std::vector< double > *  splineValues,
double *  rmserror = NULL,
double *  edf = NULL,
double *  lambda = NULL 
)

perform an 'oversmooth' fitting with adaptive choice of smoothing parameter lambda.

The difference in AIC (Akaike information criterion) between the solution with no smoothing and the returned solution is equal to deltaAIC (i.e. smooth more than optimal amount defined above). return spline values Y, rms error, equivalent degrees of freedom and best-choice value of lambda.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataSingular ( std::vector< double > *  splineValues,
double *  rmserror,
double *  edf 
)
private

In the unfortunate case that the fit matrix appears to be singular, another algorithm is used which is based on the GSL multifit routine, which performs SVD of bsplineMatrix.

It is much slower and cannot accomodate nonzero smoothing. This routine is automatically called from all fitData*** if mystatus==AS_SINGULAR.

CSplineApprox::APPROXSTATUS smile::CSplineApprox::loadyvalues ( const std::vector< double > &  _yvalues)

initialize fitting process for given data points y.

returns AS_INVRANGE if the number of elements in yvalues != xvalues.size()


The documentation for this class was generated from the following files: