function ds_avg=cosmo_average_samples(ds, varargin)
% average subsets of samples by unique combinations of sample attributes
%
% ds_avg=cosmo_average_samples(ds, ...)
%
% Inputs:
% ds dataset struct with field:
% .samples NS x NF
% .sa with fields .targets and .chunks
% 'ratio', ratio ratio (between 0 and 1) of samples to select for
% each average. Not compatible with 'count' (default: 1).
% 'count', c number of samples to select for each average.
% Not compatible with 'ratio'.
% 'resamplings',s Maximum number of times each sample in ds is used for
% averaging. Not compatible with 'repeats' (default: 1)
% 'repeats', r Number of times an average is computed for each unique
% combination of targets and chunks. Not compatible with
% 'resamplings'
% 'seed', d Use seed d for pseudo-random sampling (optional); d
% can be any integer between 1 and 2^32-1.
% If this option is omitted, then different calls to this
% function may (usually: will) return different results.
% 'split_by',fs A cell with fieldnames by which the dataset is split
% prior to averaging each bin.
% Default: {'targets','chunks'}.
%
%
% Returns
% ds_avg dataset struct with field:
% .samples ('repeats'*unq) x NF, where
% unq is the number of unique combinations of values in
% sample attribute as indicated by 'split_by' (by
% default, data is split by 'targets' and 'chunks').
% Each sample is an average from samples that share the
% same values for these attributes. The number of times
% each sample is used to compute average values differs
% by one at most.
% .sa Based on averaged samples.
% .fa,.a Same as in ds (if present).
%
% Examples:
% % generate simple dataset with 3 times (2 targets x 3 chunks)
% ds=cosmo_synthetic_dataset('nreps',3);
% size(ds.samples)
% %|| [ 18 6 ]
% cosmo_disp([ds.sa.targets ds.sa.chunks])
% %|| [ 1 1
% %|| 2 1
% %|| 1 2
% %|| : :
% %|| 2 2
% %|| 1 3
% %|| 2 3 ]@18x2
% % average each unique combination of chunks and targets
% ds_avg=cosmo_average_samples(ds);
% cosmo_disp([ds_avg.sa.targets ds_avg.sa.chunks]);
% %|| [ 1 1
% %|| 1 2
% %|| 1 3
% %|| 2 1
% %|| 2 2
% %|| 2 3 ]
% %
% % for each unique target-chunk combination, select 50% of the samples
% % randomly and average these; repeat the random selection process 4
% % times. Each sample in 'ds' is used twice (=.5*4) as an element
% % to compute an average. The output has 24 samples
% ds_avg2=cosmo_average_samples(ds,'ratio',.5,'repeats',4);
% cosmo_disp([ds_avg2.sa.targets ds_avg2.sa.chunks],'edgeitems',5);
% %|| [ 1 1
% %|| 1 1
% %|| 1 1
% %|| 1 1
% %|| 1 2
% %|| : :
% %|| 2 2
% %|| 2 3
% %|| 2 3
% %|| 2 3
% %|| 2 3 ]@24x2
%
% Notes:
% - this function averages feature-wise; the output has the same features
% as the input.
% - it can be used to average data from trials safely without circular
% analysis issues.
% - as a result the number of trials in each chunk and target is
% identical, so balancing of partitions is not necessary for data from
% this function.
% - the default behaviour of this function computes a single average for
% each unique combination of chunks and targets.
% - if the number of samples differs for different combinations of chunks
% and targets, then some samples may not be used to compute averages,
% as the least number of samples across combinations is used to set
% - As illustration, consider a dataset with the following number of
% samples for each unique targets and chunks combiniation
%
% .sa.chunks .sa.targets number of samples
% ---------- ----------- -----------------
% 1 1 12
% 1 2 16
% 2 1 15
% 2 2 24
%
% The least number of samples is 12, which determines how many averages
% are computed. Different parameters result in a different number of
% averages; some examples:
%
% parameters number of output samples for each
% combination of targets and chunks
% ---------- ---------------------------------
% 'count', 2 6 averages from 2 samples [*]
% 'count', 3 4 averages from 3 samples [*]
% 'ratio', .25 4 averages from 3 samples [*]
% 'ratio', .5 2 averages from 6 samples [*]
% 'ratio', .5, 'repeats', 3 6 averages from 6 samples
% 'ratio', .5, 'resamplings', 3 12 averages from 6 samples
%
% [*]: not all samples in the input are used to compute averages from
% the output.
%
% Briefly, 'ratio' or 'count' determine, together with the least number
% of samples, how many samples are averaged for each output sample.
% 'resamplings' and 'repeats' determine how many averages are taken,
% based on how many samples are averaged for each output sample.
% - To compute averages based on other sample attributes than 'targets'
% and 'chunks', use the 'split_by' option
%
% See also: cosmo_balance_partitions
%
% # For CoSMoMVPA's copyright information and license terms, #
% # see the COPYING file distributed with CoSMoMVPA. #