function chunks=cosmo_chunkize(ds,nchunks_out)
% assigns chunks that are as balanced as possible based on targets.
%
% chunks_out=cosmo_chunkize(ds,nchunks_out)
%
% Inputs:
%   ds             A dataset struct with fields:
%     .sa.targets  Px1 targets with class labels for P samples
%     .sa.chunks   Px1 initial chunks for P samples; different values mean
%                  that the corresponding data can be assumed to be
%                  independent
%   nchunks_out    scalar indicating how many different chunks should be
%                  assigned.
%
% Output:
%   chunks_out     Px1 chunks assigned, in the range 1:nchunks_out. It is
%                  required that N=numel(unique(ds.sa.chunks)) is greater
%                  than or equal to nchunks_out.
%
%
% Example:
% % ds is an MEEG dataset with 48 samples
% ds=cosmo_synthetic_dataset('type','timelock','nreps',8);
% %
% % with no chunks set, this function gives an error
% ds.sa=rmfield(ds.sa,'chunks');
% cosmo_chunkize(ds)
% %|| error('dataset has no field .sa.chunks. ...');
% %
% % set chunks so that all samples are assumed to be independent
% ds.sa.chunks=(1:size(ds.samples,1))';
% %
% % show initial dataset targets and chunks
% cosmo_disp([ds.sa.targets ds.sa.chunks])
% %|| [ 1 1
% %|| 2 2
% %|| 1 3
% %|| : :
% %|| 2 46
% %|| 1 47
% %|| 2 48 ]@48x2
% %
% % Re-assign chunks pseudo-randomly in the range 1:4.
% % Samples (rows) with the same original chunk value
% % will still have the same chunk value (but the reverse is not
% % necessarily true)
% ds.sa.chunks=cosmo_chunkize(ds,4);
% %
% % sanity check
% cosmo_check_dataset(ds);
% % Show result
% cosmo_disp([ds.sa.targets ds.sa.chunks])
% %|| [ 1 1
% %|| 2 1
% %|| 1 2
% %|| : :
% %|| 2 3
% %|| 1 4
% %|| 2 4 ]@48x2
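% %
% % A possible way to inspect the balance (a sketch using only the
% % built-in accumarray function; 'counts' is an illustrative name and
% % targets and chunks are assumed to be positive integers):
% % element (i,j) is the number of samples with target i in chunk j
% counts=accumarray([ds.sa.targets ds.sa.chunks],1);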
%
% Notes:
%   - This function is intended for MEEG datasets, or other datasets
%     where each trial can be assumed to be 'independent' of other trials.
%   - To indicate independence between all trials in a dataset ds, use:
%       ds.sa.chunks=(1:size(ds.samples,1))';
%     prior to using this function.
%   - When this function is used prior to classification using partitioning
%     (with cosmo_nchoosek_partitioner or cosmo_nfold_partitioner),
%     it is recommended to apply cosmo_balance_partitions to
%     that partitioning.
%   - Usage for fMRI datasets is not recommended, unless you really know
%     what you are doing. Rather, for fMRI datasets the chunks are usually
%     assigned manually so that each run has a different chunk value.
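%
% A minimal usage sketch for the notes above, assuming a dataset ds with
% .sa.targets set and that cosmo_nfold_partitioner and
% cosmo_balance_partitions accept the arguments shown:
%     % treat all trials as independent, then group them into 4 chunks
%     ds.sa.chunks=(1:size(ds.samples,1))';
%     ds.sa.chunks=cosmo_chunkize(ds,4);
%     %
%     % n-fold partitioning over the new chunks, followed by balancing
%     % of targets across the partitions
%     partitions=cosmo_nfold_partitioner(ds);
%     partitions=cosmo_balance_partitions(partitions,ds);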
%
% # For CoSMoMVPA's copyright information and license terms, #
% # see the COPYING file distributed with CoSMoMVPA. #