function partitions=cosmo_independent_samples_partitioner(ds,varargin)
% Compute partitioning scheme based on dataset with independent samples
%
% partitions=cosmo_independent_samples_partitioner(ds,...)
%
% Inputs:
%   ds                  dataset structure with fields .samples,
%                       .sa.targets and .sa.chunks. Since this
%                       function is intended for the case that all
%                       rows (patterns) in .samples are
%                       independent, it is required that all values
%                       in .sa.chunks are unique.
%   'fold_count',c      Return partitions with c folds.
%   'test_count', tc  } Return partitions so that in each test
%   'test_ratio', tr  } set there are tc samples (or a fraction tr
%                     } of the samples) per unique target. These two
%                     } options are mutually exclusive.
%   'seed',s            (optional) Use seed s for the pseudo-random
%                       number generator (default: s=1). Unless
%                       s=0, this function behaves pseudo-randomly
%                       but deterministically, and different calls
%                       with the same inputs return the same
%                       output. If s=0, repeated calls to this
%                       function give different outputs.
%   'max_fold_count'    Safety limit on the maximum number of folds
%                       that can be returned (default: 10000).
%                       Setting this to a larger value may require
%                       too much memory and slow down or crash the
%                       machine.
% Output:
%   partitions          Struct with fields .train_indices and
%                       .test_indices, each a cell with c elements
%                       (one per fold).
%                       Each element in .test_indices has, for each
%                       unique target, tc (when using 'test_count')
%                       or tr * min_count (when using 'test_ratio';
%                       min_count is the minimum number of samples
%                       over classes) elements; each element in
%                       .train_indices has the remaining samples,
%                       up to min_count, for each unique target.
%                       In other words, the resulting partitions
%                       are balanced for both training and test
%                       set.
%
% Examples:
% % make simple dataset with 9 samples, 3 features
% ds=struct();
% ds.samples=randn(9,3);
% ds.sa.targets=[1 1 1 1 1 2 2 2 2]';
% ds.sa.chunks=(1:9)';
% %
% % Partition scheme with 5 folds; in each fold, 1 sample of each
% % target is used for testing
% partitions=cosmo_independent_samples_partitioner(ds,...
% 'test_count',1,...
% 'fold_count',5);
% cosmo_disp(partitions)
% %|| .test_indices
% %|| { [ 2 [ 4 [ 2 [ 5 [ 1
% %|| 6 ] 8 ] 7 ] 8 ] 7 ] }
% %|| .train_indices
% %|| { [ 1 [ 1 [ 1 [ 1 [ 3
% %|| 3 3 3 2 4
% %|| 5 5 5 4 5
% %|| 7 6 6 6 6
% %|| 8 7 8 7 8
% %|| 9 ] 9 ] 9 ] 9 ] 9 ] }
% %
% % As above, but now with 2 samples of each target used for testing
% partitions=cosmo_independent_samples_partitioner(ds,...
% 'test_count',2,...
% 'fold_count',5);
% cosmo_disp(partitions)
% %|| .test_indices
% %|| { [ 1 [ 4 [ 2 [ 1 [ 1
% %|| 2 5 3 5 5
% %|| 6 7 6 6 6
% %|| 7 ] 8 ] 7 ] 8 ] 7 ] }
% %|| .train_indices
% %|| { [ 3 [ 1 [ 1 [ 2 [ 3
% %|| 5 3 5 4 4
% %|| 8 6 8 7 8
% %|| 9 ] 9 ] 9 ] 9 ] 9 ] }
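% %
% % Illustrative sketch (an informal sanity check, not part of the
% % original examples): for the partitions above, train and test
% % indices in each fold are expected to be disjoint, and each test
% % set is expected to contain 2 samples of each target.
% % The variable names k, tr and te are only used for illustration.
% for k=1:numel(partitions.test_indices)
%     tr=partitions.train_indices{k};
%     te=partitions.test_indices{k};
%     % no sample may be used for both training and testing
%     assert(isempty(intersect(tr,te)));
%     % test set is balanced: 2 samples for target 1 and for target 2
%     assert(sum(ds.sa.targets(te)==1)==2);
%     assert(sum(ds.sa.targets(te)==2)==2);
% end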
% %
% % Now use 30% of the samples of each target for testing,
% % and return 20 folds.
% partitions=cosmo_independent_samples_partitioner(ds,...
% 'test_ratio',0.3,...
% 'fold_count',20);
% cosmo_disp(partitions)
% %|| .test_indices
% %|| { [ 3 [ 4 [ 1 ... [ 5 [ 1 [ 4
% %|| 6 ] 7 ] 8 ] 6 ] 9 ] 7 ] }@1x20
% %|| .train_indices
% %|| { [ 1 [ 2 [ 2 ... [ 1 [ 3 [ 1
% %|| 2 3 4 2 4 2
% %|| 4 5 5 3 5 5
% %|| 7 6 6 7 6 6
% %|| 8 8 7 8 7 8
% %|| 9 ] 9 ] 9 ] 9 ] 8 ] 9 ] }@1x20
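% %
% % Illustrative sketch of the 'seed' option as documented above
% % (the variable names args, p1, p2 and p3 are only used for
% % illustration): with the default seed, repeated calls with the same
% % inputs should return identical partitions.
% args={'test_count',1,'fold_count',5};
% p1=cosmo_independent_samples_partitioner(ds,args{:});
% p2=cosmo_independent_samples_partitioner(ds,args{:});
% assert(isequal(p1,p2));
% %
% % With 'seed',0 the output is not deterministic, so it may differ
% % between calls; no fixed output is shown here for that reason.
% p3=cosmo_independent_samples_partitioner(ds,args{:},'seed',0);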
%
% Notes:
% - Unless the number of samples is very small, the number of
% partitions returned by this function (=c) is less than the total number
% of possible partitions. In these cases, a random subset of possible
% partitions is chosen, with the constraint that no combination of train
% and test indices is repeated across partitions. No attempt is made to
% balance the number of times each sample is used for training and/or
% testing.
% - This function behaves, by default, pseudo-randomly and
% deterministically; different calls to this function, with the same
% inputs, result in the same output. To get different outputs for
% different calls, set the 'seed' option to 0.
%
% See also: cosmo_nfold_partitioner, cosmo_nchoosek_partitioner
%
% # For CoSMoMVPA's copyright information and license terms, #
% # see the COPYING file distributed with CoSMoMVPA. #