cosmo montecarlo cluster stat hdr

function ds_z=cosmo_montecarlo_cluster_stat(ds,nbrhood,varargin)
% compute random-effect cluster statistics corrected for multiple comparisons
%
% ds_z=cosmo_montecarlo_cluster_stat(ds,nh,varargin)
%
% Inputs:
%   ds                  dataset struct
%   nbrhood             neighborhood structure, typically from
%                       cosmo_cluster_neighborhood
%   'niter',niter       number of null iterations. More null
%                       iterations leads to a more precise estimate of the
%                       cluster statistics, but also takes longer. Except
%                       for testing scenarios, this value should usually be
%                       at least 1000, but 10,000 or more is recommended
%                       for publication-quality analyses.
%   'h0_mean'           Mean under the null hypothesis
%                       Required if and only if a one-sample t-test is
%                       performed, i.e. all values in ds.sa.targets are the
%                       same and all values in ds.sa.chunks are different
%                       - If ds contains correlation differences, then
%                         h0_mean=0 is a typical value
%                       - If ds contains classification accuracies, then
%                         h0_mean=1/C, with C the number of classes
%                         (conditions), is a typical value (assuming that
%                         the partitioning scheme is balanced)
%   'feature_stat',f    (optional) statistic for features, one of 'auto' or
%                       'none':
%                       - 'auto': Compute cluster-based statistics (one- or
%                                 two-sample t-test, one-way ANOVA, or
%                                 repeated measures ANOVA; see Notes
%                                 section below how to set the .sa.targets
%                                 and .sa.chunks attributes.
%                       - 'none': Do not compute statistics, but instead
%                                 use the input data from ds directly. In
%                                 this case:
%                                 * no statistic is computed;
%                                 * ds.samples must be a row vector.
%                                 * h0_mean is required.
%                                 * the 'null' option is required
%                                 * 'niter' must not be provided.
%                                 * when using the 'tfce' cluster_stat
%                                   option (the default), 'dh' must be
%                                   provided explicitly.
%                                 This option is intended for use when
%                                 null data and feature-wise statistics
%                                 have already been computed.
%                       Default: 'auto'.
%   'cluster_stat',s    (optional) statistic for clusters, one of 'tfce'
%                       'max', 'maxsum', 'maxsize'. Default: 'tfce'
%                       (Threshold-Free Cluster Enhancement; see
%                       References). For more details on the other options,
%                       see the help of cosmo_measure_clusters.
%   'dh',dh             (optional) Threshold step (only if cluster_stat is
%                       'tfce'). The default value of dh=0.1 should be fine
%                       in most (if not all) cases.
%                       Exception: when using 'feature_stat','none',
%                       the 'dh' option is required when using 'tfce'.
%                       For typical use cases, a value so that 100*dh is
%                       in the same order of magnitude as the range
%                       (maximum minus minimum) of the input (in .samples)
%                       may be a reasonable compromise between speed and
%                       accuracy
%   'p_uncorrected'     (optional) Uncorrected (feature-wise) p-value (only
%                       if cluster_stat is not 'tfce')
%   'null', null_data   (optional) 1xP cell with null datasets, for example
%                       from cosmo_randomize_targets. Each dataset in
%                       null_data{i} must have the same size as ds, but
%                       must contain null data. For example, if a
%                       one-sample t-test across participants is used to
%                       test classification  accuracies against chance
%                       level (with the 'h0_mean' option), then each
%                       dataset null_data{i} should contain classification
%                       accuracies for each participant that were computed
%                       using randomized values for .sa.targets.
%                       If this option is provided, then null data is
%                       sampled from null_data. According to Stelzer et al
%                       (see References), about 100 null datasets per
%                       participant are sufficient for good estimates of
%                       data under the null hypothesis.
%                       If this option is not provided, null data is based
%                       on the contents of ds, which is less precise (more
%                       conservative, in other words has less power).
%                       Exception: when using 'feature_stat','none', then
%                       the recommendation for the number of null_data
%                       elements is the same as the recommendation of
%                       'niter' if the 'feature_stat' option is not 'none'.
%   'progress',p        Show progress every p steps (default: 10). Use
%                       p=false to not show progress.
%   'nproc', np         If the Matlab parallel processing toolbox, or the
%                       GNU Octave parallel package is available, use
%                       np parallel threads. (Multiple threads may speed
%                       up computations).
%                       If parallel processing is not available, or if
%                       this option is not provided, then a single thread
%                       is used.
%   'seed',s            Use seed s for pseudo-random number generation. If
%                       this option is provided, then this function behaves
%                       deterministically. If this option is omitted (the
%                       default), then repeated calls with the same input
%                       may lead to slightly different results.
%
% Output:
%   ds_z                Dataset struct with (two-tailed) z-scores for each
%                       feature corrected for multiple comparisons.
%                       The type of feature-wise statistic (one-sample
%                       t-test, two-sample t-test, one-way ANOVA, or
%                       repeated-measures ANOVA) depends on the contents of
%                       ds.sa.targets and ds.sa.chunks (see Notes).
%                       For example, at alpha=0.05, these can be
%                       interpreted as:
%                       - For a one-tailed test:
%                           z < -1.6449   statistic is below expected mean
%                          |z|<  1.6449   statistic is not significant
%                           z >  1.6449   statistic is above expected mean
%                       - For a two-tailed test:
%                           z < -1.9600  statistic is below expected mean
%                          |z|<  1.9600  statistic is not significant
%                           z >  1.9600  statistic is above expected mean
%                       where |z| denotes the absolute value of z.
%                       Use normcdf to convert the z-scores to p-values.
%
% Example:
%     % Generate tiny synthetic dataset representing 6 subjects, one
%     % condition, 6 voxels
%     ds=cosmo_synthetic_dataset('ntargets',1,'nchunks',6);
%     %
%     % Define clustering neighborhood
%     nh=cosmo_cluster_neighborhood(ds,'progress',false);
%     %
%     % Set options for monte carlo based clustering statistic
%     opt=struct();
%     opt.cluster_stat='tfce';  % Threshold-Free Cluster Enhancement;
%                               % this is a (very reasonable) default
%     opt.niter=50;             % this is way too small except for testing;
%                               % should usually be >=1000;
%                               % better is >=10,000
%     opt.h0_mean=0;            % test against mean of zero
%     opt.seed=1;               % should usually not be used, unless exact
%                               % replication of results is required
%                               %  replication
%     opt.progress=false;       % do not show progress
%                               % (only for in this example)
%     %
%     % Apply cluster-based correction
%     z_ds=cosmo_montecarlo_cluster_stat(ds,nh,opt);
%     %
%     % Show z-scores of each feature corrected for multiple comparisons;
%     % One feature has abs(z_ds.samples)>1.96, indicating it
%     % survives correction for multiple comparisons
%     cosmo_disp(z_ds.samples)
%     ||[ 0.929         0      1.76         0      1.29         0 ]
%
%
% Notes:
%   - In the input dataset, ds.sa.targets should indicate the condition (or
%     class), whereas ds.sa.chunks should indicate (in)dependency of
%     measurements. In a second-level (group) analysis, ds.sa.chunks should
%     be set so that different subjects have different values in
%     ds.sa.chunks; and different conditions should have different values
%     in ds.sa.targets. The following table illustrates how .sa.targets and
%     .sa.chunks should be set to perform within- or between-group
%     statistical tests:
%
%     ds.sa.targets'   ds.sa.chunks'
%     --------------   -------------
%     [1 1 1 1 1 1]    [1 2 3 4 5 6]    Six subjects, one condition;
%                                       one-sample t-test against the null
%                                       hypothesis of a mean of h0_mean
%                                       (h0_mean is required)
%     [1 1 1 2 2 2]    [1 2 3 4 5 6]    Six subjects, two groups;
%                                       two-sample (unpaired) t-test
%                                       against the null hypothesis of no
%                                       differences between the two groups
%                                       (h0_mean is not allowed)
%     [1 1 2 2 3 3]    [1 2 3 4 5 6]    Six subjects, three groups;
%                                       one-way ANOVA against the null
%                                       hypothesis of no differences
%                                       between the three groups.
%                                       (h0_mean is not allowed)
%     [1 1 1 2 2 2]    [1 2 3 1 2 3]    Three subjects, two conditions;
%                                       paired t-test against the null
%                                       hypothesis of no differences
%                                       between samples with targets=1 and
%                                       those with targets=2.
%                                       (h0_mean is not allowed)
%     [1 1 2 2 3 3]    [1 2 1 2 1 2]    Two subjects, three conditions;
%                                       repeated-measures ANOVA against
%                                       the null hypothesis of no
%                                       differences between samples with
%                                       targets=1, targets=2, and targets=3
%                                       (h0_mean is not allowed)
%
%   - As illustrated above, if the 'h0_mean' option is provided, then a
%     one-sample t-test against a mean of h0_mean is performed. Use-cases
%     involve testing one group of subjects against a mean of zero for
%     correlation differences (if maps were obtained using a searchlight
%     with cosmo_correlation_measure), or against a mean of 1/C (if maps
%     were obtained using a searchlight with classification analysis
%     with C classes, with each class occuring equally often in the
%     training set).
%   - The permutations used with the 'h0_mean' option involve randomly
%     flipping the sign of samples (over all features) after subtracting
%     'h0_mean' (This preserves spatial smoothness in individual samples).
%     This approach is fast but somewhat conservative, and it becomes
%     more conservative with larger effect sizes. For the most accurate
%     (but also slowest to compute) results, use the 'null' data option
%     instead.
%   - The number of iterations determines the precision of the estimates
%     z-scores. For publication-quality papers, 10,000 iterations is
%     recommended.
%   - The neighborhood used for clustering can, for almost all use cases,
%     be computed using cosmo_cluster_neighborhood.
%   - The rationale for returning z-scores (instead of p-values) that are
%     corrected for multiple comparisons is that extreme values are the
%     most significant; when visualized using an external package, a
%     threshold can be applied to see which features (e.g. voxels or nodes)
%     survive a particular cluster-corrected significance threshold.
%   - Versions of this function from before 23 November 2016 incorrectly
%     producted z-scores corresponding to one-tailed, rather than
%     two-tailed, probablity values.
%   - p-values are computed by dividing as (r+1) / (niter+1), with r the
%     number of times that the original data as less then the null
%     distributions. This follows the recommendation of North et al (2002).
%
% References:
%   - Stephen M. Smith, Thomas E. Nichols (2009), Threshold-free
%     cluster enhancement: Addressing problems of smoothing, threshold
%     dependence and localisation in cluster inference, NeuroImage, Volume
%     44, 83-98.
%   - Maris, E., Oostenveld, R. Nonparametric statistical testing of EEG-
%     and MEG-data. Journal of Neuroscience Methods (2007).
%   - Johannes Stelzer, Yi Chen and Robert Turner (2013). Statistical
%     inference and multiple testing correction in classification-based
%     multi-voxel pattern analysis (MVPA): Random permutations and cluster
%     size control. NeuroImage, Volume 65, 69-82.
%   - North, Bernard V., David Curtis, and Pak C. Sham. "A note on the
%     calculation of empirical P values from Monte Carlo procedures." The
%     American Journal of Human Genetics 71.2 (2002): 439-441.
%
% See also: cosmo_cluster_neighborhood
%
% #   For CoSMoMVPA's copyright information and license terms,   #
% #   see the COPYING file distributed with CoSMoMVPA.           #