cosmo independent samples partitioner hdr

function partitions=cosmo_independent_samples_partitioner(ds,varargin)
% Compute partitioning scheme based on dataset with independent samples
%
% partitions=cosmo_independent_samples_partitioner(ds,...)
%
% Inputs:
%   ds                          dataset structure with fields .samples,
%                               .sa.targets and .sa.chunks. Since this
%                               function is intended for the case that all
%                               rows (patterns) in .samples are
%                               independent, it is required that all values
%                               in .sa.chunks are unique.
%   'fold_count',c              Return partitions with c folds.
%   'test_count', tc            } Return partitions so that in each test
%   'test_ratio', tr            } set there are tc samples (or (tr*100)%
%                               } of the samples) per unique target. These
%                               } two options are mutually exclusive.
%   'seed',s                    (optional) use seed s for pseudo-random
%                               number generator (default: s=1). If
%                               provided, then this function behaves
%                               pseudo-randomly but deterministically, and
%                               different calls return the same output.
%                               If s=0, then repeated calls to this
%                               function give different outputs.
%   'max_fold_count'            Safety limit to the maximum number of folds
%                               that can be returned (default: 10000).
%                               Setting this to a larger value may require
%                               too much memory, slowing down or crashing
%                               the machine.
% Output:
%   partitions                  Struct with fields .train_indices and
%                               .test_indices, both cells of size 1 x c.
%                               Each element in .test_indices has, per
%                               unique target, tc (when using 'test_count')
%                               or tr * min_count (when using 'test_ratio';
%                               min_count is the minimum number of samples
%                               over classes) elements; each element in
%                               .train_indices has min_count-tc elements
%                               per unique target.
%                               In other words, the resulting partitions
%                               are balanced for both training and test
%                               set.
%
% Examples:
%     % make simple dataset with 9 samples, 3 features
%     ds=struct();
%     ds.samples=randn(9,3);
%     ds.sa.targets=[1 1 1 1 1 2 2 2 2]';
%     ds.sa.chunks=(1:9)';
%     %
%     % Partitioning scheme with 5 folds; in each fold, 1 sample per
%     % target is used for testing
%     partitions=cosmo_independent_samples_partitioner(ds,...
%                                             'test_count',1,...
%                                             'fold_count',5);
%     cosmo_disp(partitions)
%     %|| .test_indices
%     %||   { [ 2    [ 4    [ 2    [ 5    [ 1
%     %||       6 ]    8 ]    7 ]    8 ]    7 ] }
%     %|| .train_indices
%     %||   { [ 1    [ 1    [ 1    [ 1    [ 3
%     %||       3      3      3      2      4
%     %||       5      5      5      4      5
%     %||       7      6      6      6      6
%     %||       8      7      8      7      8
%     %||       9 ]    9 ]    9 ]    9 ]    9 ] }
%     %
%     % As above, but now with 2 samples per target used for testing
%     partitions=cosmo_independent_samples_partitioner(ds,...
%                                             'test_count',2,...
%                                             'fold_count',5);
%     cosmo_disp(partitions)
%     %|| .test_indices
%     %||   { [ 1    [ 4    [ 2    [ 1    [ 1
%     %||       2      5      3      5      5
%     %||       6      7      6      6      6
%     %||       7 ]    8 ]    7 ]    8 ]    7 ] }
%     %|| .train_indices
%     %||   { [ 3    [ 1    [ 1    [ 2    [ 3
%     %||       5      3      5      4      4
%     %||       8      6      8      7      8
%     %||       9 ]    9 ]    9 ]    9 ]    9 ] }
%     %
%     % Now use 30% of the samples per target for testing,
%     % and return 20 folds.
%     partitions=cosmo_independent_samples_partitioner(ds,...
%                                             'test_ratio',0.3,...
%                                             'fold_count',20);
%     cosmo_disp(partitions)
%     %|| .test_indices
%     %||   { [ 3    [ 4    [ 1   ... [ 5    [ 1    [ 4
%     %||       6 ]    7 ]    8 ]       6 ]    9 ]    7 ]   }@1x20
%     %|| .train_indices
%     %||   { [ 1    [ 2    [ 2   ... [ 1    [ 3    [ 1
%     %||       2      3      4         2      4      2
%     %||       4      5      5         3      5      5
%     %||       7      6      6         7      6      6
%     %||       8      8      7         8      7      8
%     %||       9 ]    9 ]    9 ]       9 ]    8 ]    9 ]   }@1x20
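%     %
%     % A minimal sketch (not part of the original examples) to check
%     % that the test sets are balanced, using only base MATLAB
%     % functions. The variable names unq_targets, nfolds, test_counts
%     % and test_targets are illustrative only.
%     partitions=cosmo_independent_samples_partitioner(ds,...
%                                             'test_count',1,...
%                                             'fold_count',5);
%     unq_targets=unique(ds.sa.targets);
%     nfolds=numel(partitions.test_indices);
%     % test_counts(k,j) is how often target j occurs in test set k
%     test_counts=zeros(nfolds,numel(unq_targets));
%     for k=1:nfolds
%         test_targets=ds.sa.targets(partitions.test_indices{k});
%         for j=1:numel(unq_targets)
%             test_counts(k,j)=sum(test_targets==unq_targets(j));
%         end
%     end
%     % Given the description of the output above, every entry in
%     % test_counts is expected to equal the 'test_count' value (here: 1)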
%
% Notes:
% - Unless the number of targets and chunks is very small, the number of
%   partitions returned by this function (=c) is less than the total number
%   of possible partitions. In these cases, a random subset of possible
%   partitions is chosen, with the constraint that no combination of train
%   and test indices is repeated in partitions. No attempt is made to
%   balance the number of times each sample is used for training and/or
%   testing.
% - This function behaves, by default, pseudo-randomly and
%   deterministically; different calls to this function, with the same
%   inputs, result in the same output. To get different outputs for
%   different calls, set the 'seed' option to 0.
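%
%   A minimal sketch of this seed behaviour (not from the original
%   documentation; it assumes a dataset ds as in the Examples section,
%   and the variable names are illustrative only):
%     args={'test_count',1,'fold_count',5};
%     % default seed: two calls are expected to give identical output
%     p1=cosmo_independent_samples_partitioner(ds,args{:});
%     p2=cosmo_independent_samples_partitioner(ds,args{:});
%     same_default=isequal(p1,p2);
%     % seed 0: repeated calls may give different output
%     q1=cosmo_independent_samples_partitioner(ds,args{:},'seed',0);
%     q2=cosmo_independent_samples_partitioner(ds,args{:},'seed',0);
%     same_unseeded=isequal(q1,q2);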
%
% See also: cosmo_nfold_partitioner, cosmo_nchoosek_partitioner
%
% #   For CoSMoMVPA's copyright information and license terms,   #
% #   see the COPYING file distributed with CoSMoMVPA.           #