Bootstrapping routines

Given a set of \(N\) measurements \(\{x_1,...,x_N\}\), the statistical bootstrap allows one to estimate the error in some function of the measurements \(f\). Sometimes this is advantageous to error propagation, since analytically calculating the error in the original function is too complicated. In the context of lattice field theory, this happens e.g. when fitting correlators and trying to get the error from a fit parameter. In the Analysistoolbox, these methods can be found in

import latqcdtools.statistics.bootstr

Just as with the jackknife, we stress that an advantage of the bootstrapping routines is that you can pass them arbitrary functions.

Ordinary bootstrap

Starting with our original measurements, one builds a bootstrap sample by drawing \(N\) data from the original sample with replacement. One repeats this process \(K\) times. From bootstrap sample \(i\), one gets an estimate of the mean of interest. Averaging the \(K\) means from each bootstrap sample gives a bootstrap mean. The method

bootstr(func, data, numb_samples, sample_size = 0, same_rand_for_obs = False, conf_axis = 1, return_sample = False,
        seed = None, err_by_dist = False, args=(), nproc=DEFAULTTHREADS)

accomplishes this for an arbitrary \(f\) func.

By default, the bootstrap sample size is equal to the original number of measurements. We resample with replacement. The size can be adjusted with the sample_size argument. By default the bootstrap is parallelized with DEFAULTTHREADS processes. Set nproc=1 if you want to turn off parallelization.

Gaussian bootstrap

The Gaussian bootstrap method,

bootstr_from_gauss(func, data, data_std_dev, numb_samples, sample_size = 1, same_rand_for_obs = False,
                   return_sample = False, seed = None, err_by_dist = True, useCovariance = False,
                   Covariance = None, args = (), nproc = DEFAULTTHREADS, asym_err=False)

will resample as follows: For each element of data, random data will be drawn from normal distributions with means equal to the values in data and standard deviations from data_std_dev. This defines one Gaussian bootstrap sample, and the function func is applied to the sample. This process is repeated numb_samples times.

By default, the Gaussian bootstrap returns the median and 68-percentiles from the sample. You can return the standard deviation instead by switching err_by_dist to False. You also have the option to get back asymmetric quantiles/errors using asymm_err=True.