Statistics

A collection of useful methods for statistics calculations can be found in

import latqcdtools.statistics.statistics

Mean, median, and standard error

There are wrappers for np.mean and np.median called std_mean and std_median. The advantage of using the wrappers is that you don’t have to specify the axis. There is also std_err, which calculates the error bar of the mean of the array, assuming the data are independent.
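
To make the "error bar of the mean" concrete, here is a minimal standalone sketch using only the Python standard library. It is not the library's implementation: the helper name standard_error is hypothetical, and the use of the n-1 normalization for the sample standard deviation is an assumption about what std_err computes.

```python
import math
import statistics

def standard_error(data):
    """Error bar of the mean, assuming independent measurements.

    Hypothetical helper for illustration; not part of latqcdtools.
    """
    n = len(data)
    # Sample standard deviation (n-1 in the denominator, an assumption)
    # divided by sqrt(n).
    return statistics.stdev(data) / math.sqrt(n)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
mean = statistics.mean(data)
err = standard_error(data)
print(f"{mean} +/- {err:.4f}")
```

If the data are correlated, this formula underestimates the error bar; the integrated autocorrelation time discussed further down quantifies by how much.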

Error propagation

The function

error_prop_func(x, func, means, errors, grad=None, args=())

can be used to automatically propagate the error bars errors of the measurements means through the function func. Here x is a variable that func depends on and that carries no error. If you like, you can specify the gradient grad of func yourself; otherwise it will be calculated numerically.
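
The idea behind this kind of propagation can be sketched in a few lines. The example below is a standalone illustration, not the code of error_prop_func: it assumes uncorrelated errors, a calling convention func(x, params), and a central-difference gradient with a hypothetical step parameter eps.

```python
import math

def propagate_error(x, func, means, errors, eps=1e-6):
    """Return (value, error) of func(x, means) for uncorrelated errors.

    Hypothetical helper: sigma_f^2 = sum_i (df/dp_i)^2 * e_i^2, with the
    derivatives estimated by central differences.
    """
    variance = 0.0
    for i, err in enumerate(errors):
        step = eps * max(1.0, abs(means[i]))
        up = list(means)
        up[i] += step
        down = list(means)
        down[i] -= step
        deriv = (func(x, up) - func(x, down)) / (2.0 * step)
        variance += (deriv * err) ** 2
    return func(x, means), math.sqrt(variance)

def line(x, p):
    # Straight line with intercept p[0] and slope p[1].
    return p[0] + p[1] * x

# Intercept 1.0 +/- 0.1 and slope 0.5 +/- 0.2, evaluated at x = 2.
value, error = propagate_error(2.0, line, [1.0, 0.5], [0.1, 0.2])
print(value, error)
```

For this linear function the result can be checked by hand: the error is sqrt(0.1^2 + (2*0.2)^2) = sqrt(0.17).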

Gaussian difference test (Z-test)

A Gaussian difference test can be used to check whether two measurements are statistically consistent with one another. Suppose you have two measurements x1 and x2 with respective error bars e1 and e2, and assume both are drawn from Gaussian distributions with the same mean. A call to

gaudif(x1,e1,x2,e2)

returns the q-value, which is the probability, under that assumption, that x1 and x2 would be at least as far apart as was observed.
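
For reference, the two-sided q-value of a Gaussian difference test can be written with the complementary error function. The sketch below is a standalone illustration with a hypothetical function name; it assumes the two measurements are independent, so that their errors add in quadrature.

```python
import math

def gaussian_difference(x1, e1, x2, e2):
    """Two-sided q-value for the difference of two Gaussian measurements.

    Hypothetical helper for illustration; assumes independent measurements.
    """
    sigma = math.sqrt(e1**2 + e2**2)   # error bar of the difference x1 - x2
    z = abs(x1 - x2) / sigma           # difference in units of its error bar
    return math.erfc(z / math.sqrt(2.0))

# The difference 0.5 equals the combined error bar sqrt(0.3^2 + 0.4^2) = 0.5,
# so this is a one-sigma discrepancy.
q = gaussian_difference(1.0, 0.3, 1.5, 0.4)
print(f"q = {q:.4f}")
```

A small q (for example below 0.05) signals tension between the two measurements; identical central values give q = 1.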

Student difference test (T-test)

If you have only a small number of data, a more accurate measure of the tension between two means is given by the Student difference test. This can be called with

studif(x1,e1,ndat1,x2,e2,ndat2)

where ndat1 is the number of measurements leading to mean x1 and ndat2 is the number of measurements leading to mean x2. For large enough ndat1 and ndat2, the results should be similar to the Gaussian difference test.
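
The sketch below shows one common form of the Student difference test and checks that it approaches the Gaussian result for large samples. It is an illustration only: the pooled-variance form (equal underlying variances) is an assumption and may not match what studif does internally, and the function names and the Simpson-rule integration with a hypothetical steps parameter are choices made here to stay within the standard library.

```python
import math

def student_pdf(t, nu):
    """Probability density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(t * t / nu))

def student_difference(x1, e1, n1, x2, e2, n2, steps=2000):
    """Two-sided q-value, assuming equal underlying variances (pooled form)."""
    # Recover per-sample variances from the error bars of the means.
    var1 = e1**2 * n1
    var2 = e2**2 * n2
    nu = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / nu
    t = abs(x1 - x2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    # Simpson's rule for the integral of the pdf from 0 to t (steps is even).
    h = t / steps
    total = student_pdf(0.0, nu) + student_pdf(t, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * student_pdf(k * h, nu)
    return 1.0 - 2.0 * total * h / 3.0

q_small = student_difference(1.0, 0.3, 5, 1.5, 0.4, 5)
q_large = student_difference(1.0, 0.3, 1000, 1.5, 0.4, 1000)
print(q_small, q_large)
```

With only five measurements per mean the q-value is larger than the Gaussian one for the same means and error bars, reflecting the heavier tails of the Student distribution; with a thousand measurements the two agree closely.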

Integrated autocorrelation time

The integrated autocorrelation time is the ratio between the estimated variance of the sample mean and what this variance would have been if the data were independent; i.e. \(\sigma^2_{\bar{X}}=\frac{\sigma^2}{N}\tau_{\text{int}}.\) A call to

getTauInt(timeSeries, nbins, tpickMax, acoutfileName)

returns an estimate for \(\tau_{\text{int}}\), its error bar, its bias, and the Monte Carlo time separation at which it found this \(\tau_{\text{int}}\). It takes a time series timeSeries, the number of jackknife bins nbins used in the calculation, and tpickMax, an estimate of the largest value that the autocorrelation time could take. (You can play around with tpickMax a bit if you are having trouble getting it to run. Usually I pick a little less than half the size of the series.) The results are saved by default in acoutfileName=acor.d, but you can change this as well.
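
To illustrate the definition, the following standalone sketch estimates \(\tau_{\text{int}}\) as 1 plus twice the sum of the normalized autocorrelation function up to a fixed cutoff. It is deliberately simpler than getTauInt: there is no jackknife error bar, bias estimate, or automatic choice of the cutoff, and the names tau_int and t_max are hypothetical. The test series is an AR(1) process with coefficient 0.5, for which the exact value is (1+0.5)/(1-0.5) = 3.

```python
import random

def tau_int(series, t_max):
    """Naive estimate: tau_int = 1 + 2 * sum_{t=1}^{t_max} C(t)/C(0).

    Hypothetical helper for illustration; not the latqcdtools algorithm.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n
    tau = 1.0
    for t in range(1, t_max + 1):
        # Autocovariance at Monte Carlo time separation t.
        ct = sum(dev[i] * dev[i + t] for i in range(n - t)) / (n - t)
        tau += 2.0 * ct / c0
    return tau

# Synthetic correlated data: x_{k+1} = a * x_k + Gaussian noise.
random.seed(1)
a = 0.5
series = [0.0]
for _ in range(20000):
    series.append(a * series[-1] + random.gauss(0.0, 1.0))

tau = tau_int(series, 20)
print(f"tau_int ~ {tau:.2f}")
```

With this convention independent data give \(\tau_{\text{int}}=1\), consistent with the variance formula above.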