Curve fitting and splines

There are many ways to fit a curve. There are strategies that minimize \(\chi^2/{\rm d.o.f.}\) for when you know the functional form ahead of time, splines for when you don’t, and other methods. The AnalysisToolbox includes some routines that are helpful for this purpose. By default the Fitter is parallelized with DEFAULTTHREADS processes. Set nproc=1 if you want to turn off parallelization. Splines are not parallelized in this way, but should be fast because they wrap SciPy methods.

\(\chi^2\) minimization

In the module

import latqcdtools.statistics.fitting

one finds a Fitter class for carrying out fits. The Fitter class encapsulates all information relevant to a fit, like its \(x\)-data, \(y\)-data, the fit form, and so on. After constructing a fitter object, one can then use associated methods to try various kinds of fits. These are generally wrappers from scipy.optimize. An easy example is given in testing/fitting/simple_example.py, shown below.

import numpy as np
import matplotlib.pyplot as plt
from latqcdtools.statistics.fitting import Fitter
from latqcdtools.base.logger import set_log_level

set_log_level('DEBUG')

print("\n Example of a simple 3-parameter quadratic fit.\n")

# Here we define our fit function. we pass it its independent variable followed by the fit parameters we are
# trying to determine.
def fit_func(x, params):
    a = params[0]
    b = params[1]
    c = params[2]
    return a*x**2 + b*x + c

xdata, ydata, edata = np.genfromtxt("wurf.dat", usecols=(0,2,3), unpack=True)

# We initialize our Fitter object. If expand = True, fit_func has to look like
#            func(x, a, b, *args)
#        otherwise it has to look like
#            func(x, params, *args).
fitter = Fitter(fit_func, xdata, ydata, expand = False)

# Here we try a fit, using the 'curve_fit' method, specifying the starting guesses for the fit parameters. Since
# ret_pcov = True, we will get back the covariance matrix as well.
res, res_err, chi_dof, pcov = fitter.try_fit(start_params = [1, 2, 3], algorithms = ['curve_fit'], ret_pcov = True)

print(" a , b,  c : ",res)
print(" ae, be, ce: ",res_err)
print("chi2/d.o.f.: ",chi_dof)
print("       pcov: \n",pcov,"\n")

fitter.plot_fit()
plt.show()

Supported fit algorithms include

L-BFGS-B
TNC
Powell
COBYLA
SLSQP which are essentially wrappers for scipy fit functions. If one does not specify an algorithm, the try_fit method will attempt all of them and return the result of whichever one had the best \(\chi^2/{\rm d.o.f.}\)

IMPORTANT: The functions that you pass to these fitting routines have to be able to handle arrays! E.g. you pass it [x0, x1, ..., xN] and get back [f(x0), f(x1), ..., f(xN)]. It is written this way to force better performance; if it were a typical loop it would be slow. If you are having trouble figuring out how to write your function in a way to handle arrays, a good starting point can be to use np.vectorize.

The covariance matrix of the fit parameters are computed through error propagation of the covariance matrix of the \(y\)-data. The error is then obtained by

\(\sigma = \sqrt{\diag{\text{cov}}}\)

In some codes, such as gnuplot, it is customary to multiply this error by a further factor \(\chi^2/{\rm d.o.f.}\). The intuition behind this is that the error will be increased if the fit is poor. This is okay if you would like to be somewhat more conservative with your error bar, but it is strictly speaking not necessary. It also makes the error bar more difficult to interpret clearly, i.e. if your input data and errors were well estimated, then it’s not clear that the true fit parameters will fall within one \(\sigma\) of the estimators 67% of the time. The default behavior is not to include this factor, but in case you would like it in your fits, for example because you are feeling conservative, or for comparison with gnuplot, you can pass the option norm_err_chi2=True to your try_fit or do_fit call.

Splines

There are several methods in the toolbox to fit a 1D spline to some xdata and ydata. These can be found in latcqdtools.math.spline.py. The basic method is getSpline

getSpline(xdata, ydata, num_knots, order=3, rand=False, fixedKnots=None)

This is by default a wrapper for scipy.interpolate.LSQUnivariateSpline. Here you specify how many knots num_knots you want and the order order of the spline. By default, the spline will create a list of num_knots evenly spaced knots, but you can specify rand=True to have it pick the knot locations randomly. If you need to specify some knot locations yourself, this is accomplished by passing a list of specified knots to fixedKnots. Note that num_knots includes these fixed knots in its counting; hence if

len(fixedKnots)==num_knots

no knots will be generated randomly. There also exists the option to use natural splines.