cosmo chunkize hdr

function chunks=cosmo_chunkize(ds,nchunks_out)
% assigns chunks that are as balanced as possible based on targets.
%
% chunks_out=cosmo_chunkize(ds,nchunks_out)
%
% Inputs:
%   ds             A dataset struct with fields:
%    .sa.targets   Px1 targets with class labels for P samples
%    .sa.chunks    Px1 initial chunks for P samples; different values mean
%                  that the corresponding data can be assumed to be
%                  independent
%   nchunks_out    scalar indicating how many different chunks should be
%                  assigned.
%
% Output:
%   chunks_out     Px1 chunks assigned, in the range 1:nchunks_out. It is
%                  required that N=numel(unique(ds.sa.chunks)) is greater
%                  than or equal to nchunks_out.
%
%
% Example:
%     % ds is an MEEG dataset with 48 samples
%     ds=cosmo_synthetic_dataset('type','timelock','nreps',8);
%     %
%     % with no chunks set, this function gives an error
%     ds.sa=rmfield(ds.sa,'chunks');
%     cosmo_chunkize(ds)
%     %|| error('dataset has no field .sa.chunks. ...');
%     %
%     % set chunks so that all samples are assumed to be independent
%     ds.sa.chunks=(1:size(ds.samples,1))';
%     %
%     % show initial dataset targets and chunks
%     cosmo_disp([ds.sa.targets ds.sa.chunks])
%     %|| [ 1         1
%     %||   2         2
%     %||   1         3
%     %||   :         :
%     %||   2        46
%     %||   1        47
%     %||   2        48 ]@48x2
%     %
%     % Re-assign chunks pseudo-randomly in the range 1:4.
%     % Samples (rows) with the same original chunk value
%     % will still have the same chunk value (but the reverse is not
%     % necessarily true)
%     ds.sa.chunks=cosmo_chunkize(ds,4);
%     %
%     % sanity check
%     cosmo_check_dataset(ds);
%     % Show result
%     cosmo_disp([ds.sa.targets ds.sa.chunks])
%     %|| [ 1         1
%     %||   2         1
%     %||   1         2
%     %||   :         :
%     %||   2         3
%     %||   1         4
%     %||   2         4 ]@48x2
%
% Notes:
%  - This function is intended for MEEG datasets, or other datasets
%    where each trial can be assumed to be 'independent' of other trials.
%  - To indicate independence between all trials in a dataset ds, use:
%      ds.sa.chunks=(1:size(ds.samples,1))';
%    prior to using this function
%  - When this function is used prior to classification using partitioning
%    (with cosmo_nchoosek_partitioner or cosmo_nfold_partitioner),
%    it is recommended to apply cosmo_balance_partitions to
%    that partitioning
%  - Use with fMRI datasets is not recommended unless you really know
%    what you are doing. For fMRI datasets, chunks are usually assigned
%    manually so that each run has a different chunk value.
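%  - A minimal sketch of the workflow recommended above (chunkize, then
%    partition, then balance), assuming ds is an MEEG dataset with
%    .sa.targets set; the n-fold scheme and the value 4 are illustrative
%    choices only:
%      ds.sa.chunks=(1:size(ds.samples,1))';
%      ds.sa.chunks=cosmo_chunkize(ds,4);
%      partitions=cosmo_nfold_partitioner(ds);
%      partitions=cosmo_balance_partitions(partitions,ds);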
%
% #   For CoSMoMVPA's copyright information and license terms,   #
% #   see the COPYING file distributed with CoSMoMVPA.           #