# Statistics 

A collection of useful methods for statistics calculations can be found in
```Python
import latqcdtools.statistics.statistics
```

## Mean, median, and standard error 
There are wrappers for `np.mean` and `np.median` called `std_mean` 
and `std_median`. The advantage of using the wrappers is that you don't have to specify the Axis. 
There is also a `std_err`, which calculates the error bar of the mean of the array, assuming 
the data are independent.

## Error propagation
The function  
```Python
error_prop_func(x, func, means, errors, grad=None, args=())
```
can be used to automatically propagate the `errors` of measurements `means` into the function `func`. Here `x`
is a variable the function depends on that does not have error. If you like, you can specify the gradient `grad`
of `func` yourself; otherwise this will be calculated numerically.

## Gaussian difference test (Z-test) 
A Gaussian difference test can be used to check whether two 
measurements are statistically consistent with one another. Given are two measurements `x1` 
and `x2` drawn from Gaussian distributions with the same mean and respective error bars `e1` 
and `e2`. A call to
```Python
gaudif(x1,e1,x2,e2)
```
returns the q-value, which is the likelihood that `x1` and `x2` are at least as far apart as 
was observed.

## Student different test (T-test)
In the case you have a small number of data, a more correct measure of the tension between two means
is given by the Student difference test. This can be called with
```Python
studif(x1,e1,ndat1,x2,e2,ndat2)
```
where `ndat1` is the number of measurements leading to mean `x1` and `ndat2` is the number of
measurements leading to mean `x2`. For large enough `ndat1` and `ndat2`, the results should
be similar to the Gaussian difference test.

## Integrated autocorrelation time
The integrated autocorrelation time is the ratio between 
the estimated variance of the sample mean and what this variance would have been if the data were 
independent; i.e. $\sigma^2_{\bar{X}}=\frac{\sigma^2}{N}\tau_{\text{int}}.$ 
A call to
```Python
getTauInt(timeSeries, nbins, tpickMax, acoutfileName)
```
returns an estimate for $\tau_{\text{int}}$, its error bar, its bias, and the Monte Carlo time 
separation at which it found this $\tau_{\text{int}}$. It takes a time series `timeSeries`, a 
number of jackknife bins needed for the calculation `nbins`, and an estimate for the largest that 
the autocorrelation time could be. (You can play around with this latter variable a bit if you 
are having trouble to get it to run. Usually I pick a little less than half the size of the series.) 
The results are saved by default in `acoutfileName=acor.d`, but you can change this as well.