SMILE
v2.5
Schwarzschild Modelling Interactive expLoratory Environment
|
Penalized linear least-square fitting problem. More...
#include <utils.h>
Public Types | |
enum | APPROXSTATUS { AS_OK, AS_NOMEM, AS_INVRANGE, AS_NODATA, AS_SINGULAR } |
status codes returned by status() function and by fitting routines. More... | |
Public Member Functions | |
CSplineApprox (const std::vector< double > &_xvalues, const std::vector< double > &_knots) | |
initialize workspace for xvalues=x, knots=X in the above formulation. More... | |
APPROXSTATUS | status () const |
return status of the object | |
APPROXSTATUS | loadyvalues (const std::vector< double > &_yvalues) |
initialize fitting process for given data points y. More... | |
APPROXSTATUS | fitData (const double lambda, std::vector< double > *splineValues, double *rmserror=NULL, double *edf=NULL) |
perform actual fitting for y loaded previously, with given smoothing parameter lambda. More... | |
APPROXSTATUS | fitDataOptimal (std::vector< double > *splineValues, double *rmserror=NULL, double *edf=NULL, double *lambda=NULL) |
perform fitting with adaptive choice of smoothing parameter lambda, to minimize AIC. More... | |
APPROXSTATUS | fitDataOversmooth (const double deltaAIC, std::vector< double > *splineValues, double *rmserror=NULL, double *edf=NULL, double *lambda=NULL) |
perform an 'oversmooth' fitting with adaptive choice of smoothing parameter lambda. More... | |
APPROXSTATUS | computeRegressionAtPoints (const std::vector< double > &xpoints, std::vector< double > *ypoints) |
convenience function to evaluate regression function at a given set of points, after fitData*** was called and solution for weights has been obtained | |
Private Member Functions | |
void | initRoughnessMatrix () |
compute integrals over products of second derivatives of basis functions, and transform R to M+singValues | |
void | computeYvalues (std::vector< double > *splineValues) |
compute Y-values at spline knots X[k] after the weights w have been determined, and also two endpoint derivatives | |
APPROXSTATUS | fitDataSingular (std::vector< double > *splineValues, double *rmserror, double *edf) |
In the unfortunate case that the fit matrix appears to be singular, another algorithm is used which is based on the GSL multifit routine, which performs SVD of bsplineMatrix. More... | |
Private Attributes | |
APPROXSTATUS | mystatus |
internal status of the object | |
const size_t | numDataPoints |
number of x[i],y[i] pairs (original data) | |
const size_t | numKnots |
number of X[k] knots in the fitting spline | |
const size_t | numBasisFnc |
number of b-spline basis functions = numKnots+2 | |
gsl_vector * | knots |
b-spline knots X[k], k=0..numKnots-1 | |
gsl_vector * | xvalues |
x[i], i=0..numDataPoints-1 | |
gsl_vector * | yvalues |
y[i], overwritten each time loadYvalues is called | |
gsl_vector * | weightCoefs |
w_p, weight coefficients for basis functions to be found in the process of fitting | |
gsl_vector * | zRHS |
z_p = C^T y, right hand side of linear system | |
gsl_matrix * | bsplineMatrix |
matrix "C_ip" used in fitting process; i=0..numDataPoints-1, p=0..numBasisFnc-1 | |
gsl_matrix * | LMatrix |
lower triangular matrix L is Cholesky decomposition of matrix A = C^T C, of size numBasisFnc*numBasisFnc | |
gsl_matrix * | MMatrix |
matrix "M" which is the transformed version of roughness matrix "R_pq" of integrals of product of second derivatives of basis functions; p,q=0..numBasisFnc-1 | |
gsl_vector * | singValues |
part of the decomposition of the roughness matrix | |
gsl_vector * | MTz |
pre-computed M^T z | |
gsl_vector * | tempv |
some routines require temporary storage | |
gsl_bspline_workspace * | bsplineWorkspace |
workspace for b-spline evaluation | |
gsl_vector * | bsplineValues |
to compute values of all b-spline basis functions at a given point x | |
gsl_bspline_deriv_workspace * | bsplineDerivWorkspace |
workspace for derivative computation | |
gsl_matrix * | bsplineDerivValues |
to compute values and derivatives of basis functions | |
double | ynorm2 |
|y|^2 - used to compute residual sum of squares (RSS) | |
Penalized linear least-square fitting problem.
Approximate the data series {x[i], y[i], i=0..numDataPoints-1} with spline defined at {X[k], Y[k], k=0..numKnots-1} in the least-square sense, possibly with additional penalty term for 'roughness' (curvature).
Initialized once for a given set of x, X, and may be used to fit multiple sets of y with arbitrary degree of smoothing .
Internally, the approximation is performed by multi-parameter linear least-square fitting: minimize
where
is the approximated regression for input data, are its basis functions and are weights to be found.
Basis functions are b-splines with knots at X[k], k=0..numKnots-1; the number of basis functions is numKnots+2. Equivalently, the regression can be represented by clamped cubic spline with numKnots control points; b-splines are only used internally.
LLS fitting is done by solving the following linear system: , where A and R are square matrices of size numBasisFnc, w and z are vectors of the same size, and is a scalar (smoothing parameter).
A = C^T C, where C_ip is a matrix of size numDataPoints*numBasisFnc, containing value of p-th basis function at x[i], p=0..numBasisFnc-1, i=0..numDataPoints-1. z = C^T y, where y[i] is vector of original data points R is the roughness penalty matrix: .
Internally several arrays are initialized, the largest being C_ip matrix; in case of out-of-memory error no work is done and status is set to AS_NOMEM.
status codes returned by status() function and by fitting routines.
Everything other than AS_OK or AS_SINGULAR means that return arrays are uninitialized
smile::CSplineApprox::CSplineApprox | ( | const std::vector< double > & | _xvalues, |
const std::vector< double > & | _knots | ||
) |
initialize workspace for xvalues=x, knots=X in the above formulation.
knots must be sorted in ascending order, and all xvalues must lie between knots.front() and knots.back()
CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitData | ( | const double | lambda, |
std::vector< double > * | splineValues, | ||
double * | rmserror = NULL , |
||
double * | edf = NULL |
||
) |
perform actual fitting for y loaded previously, with given smoothing parameter lambda.
return spline values Y at knots X, and if necessary, rms error and EDF (equivalent degrees of freedom) in corresponding output parameters (if they are not NULL). Y has length of numKnots+2, the last two values are spline derivatives at endpoints (to pass to initialization of clamped spline): the internal fitting process uses b-splines, not natural cubic splines, therefore the endpoint derivatives are non-zero. EDF is equivalent number of free parameters in fit, increasing smoothing decreases EDF: 2<=EDF<=numKnots.
CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataOptimal | ( | std::vector< double > * | splineValues, |
double * | rmserror = NULL , |
||
double * | edf = NULL , |
||
double * | lambda = NULL |
||
) |
perform fitting with adaptive choice of smoothing parameter lambda, to minimize AIC.
AIC (Akaike information criterion) is defined as log(rmserror*numDataPoints) + 2*EDF/(numDataPoints-EDF-1) . return spline values Y, rms error, equivalent degrees of freedom (EDF) and best-choice value of lambda.
CSplineApprox::APPROXSTATUS smile::CSplineApprox::fitDataOversmooth | ( | const double | deltaAIC, |
std::vector< double > * | splineValues, | ||
double * | rmserror = NULL , |
||
double * | edf = NULL , |
||
double * | lambda = NULL |
||
) |
perform an 'oversmooth' fitting with adaptive choice of smoothing parameter lambda.
The difference in AIC (Akaike information criterion) between the solution with no smoothing and the returned solution is equal to deltaAIC (i.e. smooth more than optimal amount defined above). return spline values Y, rms error, equivalent degrees of freedom and best-choice value of lambda.
|
private |
In the unfortunate case that the fit matrix appears to be singular, another algorithm is used which is based on the GSL multifit routine, which performs SVD of bsplineMatrix.
It is much slower and cannot accomodate nonzero smoothing. This routine is automatically called from all fitData*** if mystatus==AS_SINGULAR.
CSplineApprox::APPROXSTATUS smile::CSplineApprox::loadyvalues | ( | const std::vector< double > & | _yvalues | ) |
initialize fitting process for given data points y.
returns AS_INVRANGE if the number of elements in yvalues != xvalues.size()