Statistics
A collection of useful methods for statistics calculations can be found in
import latqcdtools.statistics.statistics
Mean, median, and standard error
There are wrappers for np.mean
and np.median
called std_mean
and std_median
. The advantage of using the wrappers is that you don’t have to specify the Axis.
There is also a std_err
, which calculates the error bar of the mean of the array, assuming
the data are independent.
Error propagation
The function
error_prop_func(x, func, means, errors, grad=None, args=())
can be used to automatically propagate the errors
of measurements means
into the function func
. Here x
is a variable the function depends on that does not have error. If you like, you can specify the gradient grad
of func
yourself; otherwise this will be calculated numerically.
Gaussian difference test (Z-test)
A Gaussian difference test can be used to check whether two
measurements are statistically consistent with one another. Given are two measurements x1
and x2
drawn from Gaussian distributions with the same mean and respective error bars e1
and e2
. A call to
gaudif(x1,e1,x2,e2)
returns the q-value, which is the likelihood that x1
and x2
are at least as far apart as
was observed.
Student different test (T-test)
In the case you have a small number of data, a more correct measure of the tension between two means is given by the Student difference test. This can be called with
studif(x1,e1,ndat1,x2,e2,ndat2)
where ndat1
is the number of measurements leading to mean x1
and ndat2
is the number of
measurements leading to mean x2
. For large enough ndat1
and ndat2
, the results should
be similar to the Gaussian difference test.
Integrated autocorrelation time
The integrated autocorrelation time is the ratio between the estimated variance of the sample mean and what this variance would have been if the data were independent; i.e. \(\sigma^2_{\bar{X}}=\frac{\sigma^2}{N}\tau_{\text{int}}.\) A call to
getTauInt(timeSeries, nbins, tpickMax, acoutfileName)
returns an estimate for \(\tau_{\text{int}}\), its error bar, its bias, and the Monte Carlo time
separation at which it found this \(\tau_{\text{int}}\). It takes a time series timeSeries
, a
number of jackknife bins needed for the calculation nbins
, and an estimate for the largest that
the autocorrelation time could be. (You can play around with this latter variable a bit if you
are having trouble to get it to run. Usually I pick a little less than half the size of the series.)
The results are saved by default in acoutfileName=acor.d
, but you can change this as well.