# Curve fitting and splines

There are many ways to fit a curve. There are strategies that minimize $\chi^2/{\rm d.o.f.}$ for when you know the functional form ahead of time, splines for when you don't, and other methods. The AnalysisToolbox includes some routines that are helpful for this purpose.

By default the `Fitter` is [parallelized](../base/speedify.md) with `DEFAULTTHREADS` processes. Set `nproc=1` if you want to turn off parallelization. Splines are not parallelized in this way, but they should be fast because they wrap SciPy methods.

## $\chi^2$ minimization

In the module

```Python
import latqcdtools.statistics.fitting
```

one finds a `Fitter` class for carrying out fits. The `Fitter` class encapsulates all information relevant to a fit, like its $x$-data, $y$-data, the fit form, and so on. After constructing a `fitter` object, one can use its associated methods to try various kinds of fits. These are generally wrappers around methods from `scipy.optimize`. An easy example is given in `testing/fitting/simple_example.py`, shown below.

```Python
import numpy as np
import matplotlib.pyplot as plt
from latqcdtools.statistics.fitting import Fitter
from latqcdtools.base.logger import set_log_level

set_log_level('DEBUG')

print("\n Example of a simple 3-parameter quadratic fit.\n")

# Here we define our fit function. We pass it its independent variable followed by the fit parameters we are
# trying to determine.
def fit_func(x, params):
    a = params[0]
    b = params[1]
    c = params[2]
    return a*x**2 + b*x + c

xdata, ydata, edata = np.genfromtxt("wurf.dat", usecols=(0,2,3), unpack=True)

# We initialize our Fitter object. If expand = True, fit_func has to look like
#   func(x, a, b, *args),
# otherwise it has to look like
#   func(x, params, *args).
fitter = Fitter(fit_func, xdata, ydata, expand = False)

# Here we try a fit, using the 'curve_fit' method, specifying the starting guesses for the fit parameters. Since
# ret_pcov = True, we will get back the covariance matrix as well.
res, res_err, chi_dof, pcov = fitter.try_fit(start_params = [1, 2, 3], algorithms = ['curve_fit'], ret_pcov = True)

print(" a , b, c : ",res)
print(" ae, be, ce: ",res_err)
print("chi2/d.o.f.: ",chi_dof)
print(" pcov: \n",pcov,"\n")

fitter.plot_fit()
plt.show()
```

Supported fit algorithms include

- L-BFGS-B
- TNC
- Powell
- COBYLA
- SLSQP

which are essentially wrappers for `scipy` fit functions. If one does not specify an algorithm, the `try_fit` method will attempt all of them and return the result of whichever one had the best $\chi^2/{\rm d.o.f.}$.

**IMPORTANT: The functions that you pass to these fitting routines have to be able to handle arrays!** That is, if you pass `[x0, x1, ..., xN]`, the function should return `[f(x0), f(x1), ..., f(xN)]`. It is written this way to force better performance; a typical Python loop would be slow. If you are having trouble figuring out how to write your function so that it handles arrays, a good starting point can be [np.vectorize](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html); a short sketch is given at the end of this section.

The covariance matrix of the fit parameters is computed through error propagation of the covariance matrix of the $y$-data. The error is then obtained as $\sigma = \sqrt{\mathrm{diag}(\mathrm{cov})}$. In some codes, such as `gnuplot`, it is customary to multiply this error by a further factor of $\chi^2/{\rm d.o.f.}$. The intuition behind this is that the error is inflated when the fit is poor. This is fine if you would like to be somewhat more conservative with your error bar, but strictly speaking it is not necessary. It also makes the error bar more difficult to interpret: even if your input data and errors were well estimated, it is no longer clear that the true fit parameters will fall within one $\sigma$ of the estimators roughly 68% of the time. The default behavior is not to include this factor, but if you would like it in your fits, for example because you are feeling conservative or for comparison with `gnuplot`, you can pass the option `norm_err_chi2=True` to your `try_fit` or `do_fit` call.
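To illustrate the array requirement mentioned above, here is a minimal sketch, separate from the AnalysisToolbox, showing how `np.vectorize` can make a scalar-only function array-capable. The piecewise function and its parameters are invented purely for this example. Keep in mind that `np.vectorize` is essentially a convenience loop, so writing your fit function directly in terms of `numpy` array operations, as `fit_func` above does, is preferable when possible.

```Python
import numpy as np

# A fit function that branches on x works on scalars but not on arrays:
def piecewise(x, params):
    a, b = params
    return a*x if x < 1.0 else a + b*(x - 1.0)

# np.vectorize wraps it so that an array [x0, ..., xN] maps to
# [f(x0), ..., f(xN)]. The 'params' argument is excluded from broadcasting.
piecewise_vec = np.vectorize(piecewise, excluded={'params'})

print(piecewise_vec(np.linspace(0.0, 2.0, 5), params=[1.0, 0.5]))
```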
## Splines

There are several methods in the toolbox to fit a 1D spline to some `xdata` and `ydata`. These can be found in the module `latqcdtools.math.spline`. The basic method is `getSpline`:

```Python
getSpline(xdata, ydata, num_knots, order=3, rand=False, fixedKnots=None)
```

This is by default a wrapper for `scipy.interpolate.LSQUnivariateSpline`. Here you specify how many knots `num_knots` you want and the order `order` of the spline. By default, the spline will create a list of `num_knots` evenly spaced knots, but you can specify `rand=True` to have it pick the knot locations randomly. If you need to specify some knot locations yourself, pass a list of those knots to `fixedKnots`. Note that `num_knots` includes these fixed knots in its counting; hence if

```Python
len(fixedKnots)==num_knots
```

no knots will be generated randomly. There is also an option to use natural splines.
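As a minimal usage sketch: the data, knot count, and order below are invented for illustration, and it is assumed that `getSpline` can be imported from `latqcdtools.math.spline` and that the returned object is callable like the underlying `scipy.interpolate.LSQUnivariateSpline`.

```Python
import numpy as np
from latqcdtools.math.spline import getSpline

# Toy data: a noisy sine curve.
xdata = np.linspace(0, 2*np.pi, 50)
ydata = np.sin(xdata) + np.random.normal(0, 0.05, len(xdata))

# Cubic spline with 4 evenly spaced knots (illustrative choices only).
spl = getSpline(xdata, ydata, num_knots=4, order=3)

# Evaluate the spline wherever you like, e.g. at x = pi/2.
print(spl(np.pi/2))
```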