Cross-validation part 1: N-Fold Partitioner
Before we can do cross-validation, we need to partition the data into different sets of training and testing folds. In the standard leave-one-run-out cross-validation scheme, we make N partitions (for N runs). Each run takes a turn as the test data, while the classifier is trained on all the other runs. This means that for every fold we need one set of sample indices for training and another for testing. Below is an incomplete function that computes the partitions for a given set of chunk values (the .sa.chunks sample attribute). Your task is to complete the function by writing the missing for-loop.
function partitions = cosmo_nfold_partitioner(chunks)
% generates an n-fold partition scheme
%
% partitions=cosmo_nfold_partitioner(chunks)
%
% Input:
% - chunks        Px1 chunk indices for P samples. It can also be a
%                 dataset with field .sa.chunks.
%
% Output:
% - partitions    A struct with fields .train_indices and .test_indices.
%                 Each of these is a 1xQ cell for Q partitions, where
%                 .train_indices{k} and .test_indices{k} contain the
%                 sample indices for the k-th fold.
%
% Example:
% % simple partitioning scheme with 3 chunks with two samples each
% % (chunk values are not necessarily in increasing order)
% p=cosmo_nfold_partitioner([3 1 2 3 2 1]);
% cosmo_disp(p);
% %|| .train_indices
% %||   { [ 1    [ 1    [ 2
% %||       3      2      3
% %||       4      4      5
% %||       5 ]    6 ]    6 ] }
% %|| .test_indices
% %||   { [ 2    [ 3    [ 1
% %||       6 ]    5 ]    4 ] }
%
% % show the same with a dataset struct
% ds=struct();
% ds.samples=randn(6,99); % 6 samples, 99 features
% ds.sa.targets=[1 2 1 2 1 2]'; % conditions; ignored by this function
% ds.sa.chunks=[3 1 2 3 2 1]'; % used for partitioning
% p=cosmo_nfold_partitioner(ds);
% cosmo_disp(p);
% %|| .train_indices
% %||   { [ 1    [ 1    [ 2
% %||       3      2      3
% %||       4      4      5
% %||       5 ]    6 ]    6 ] }
% %|| .test_indices
% %||   { [ 2    [ 3    [ 1
% %||       6 ]    5 ]    4 ] }
%
%
% % Example of an unbalanced partitioning scheme. Generally it is
% % advised to balance the partitions before using them for MVPA.
% % (see cosmo_balance_partitions)
% ds=struct();
% ds.samples=randn(7,99); % 7 samples (1 extra), 99 features
% ds.sa.targets=[1 2 1 2 1 2 2]';
% ds.sa.chunks= [1 1 3 3 3 3 3]';
% p=cosmo_nfold_partitioner(ds);
% cosmo_disp(p);
% %|| .train_indices
% %||   { [ 3    [ 1
% %||       4      2 ]
% %||       5
% %||       6
% %||       7 ]        }
% %|| .test_indices
% %||   { [ 1    [ 3
% %||       2 ]    4
% %||              5
% %||              6
% %||              7 ] }
%
%
% Note:
% - For cross-validation it is recommended to balance partitions using
% cosmo_balance_partitions.
% - More advanced partitioning is provided by cosmo_nchoosek_partitioner.
%
% See also: cosmo_balance_partitions, cosmo_nchoosek_partitioner
%
% # For CoSMoMVPA's copyright information and license terms, #
% # see the COPYING file distributed with CoSMoMVPA. #
Hint: cosmo_nfold_partitioner_skl
Solution: cosmo_nfold_partitioner
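The sketch below shows one possible way to fill in the loop. It is an illustration and not the reference solution. It assumes the input is either a chunks vector or a dataset struct with .sa.chunks. The function name nfold_partitioner_sketch is hypothetical and was chosen so that it does not shadow CoSMoMVPA's cosmo_nfold_partitioner. Each unique chunk value, in sorted order, takes one turn as the test set. This reproduces the fold order shown in the help examples.

```matlab
function partitions = nfold_partitioner_sketch(chunks)
% Leave-one-chunk-out partitioning. Hypothetical name; a sketch, not
% CoSMoMVPA's own cosmo_nfold_partitioner.

    % accept a dataset struct as well as a plain chunks vector
    if isstruct(chunks)
        chunks = chunks.sa.chunks;
    end
    chunks = chunks(:);             % force a column vector

    unq = unique(chunks);           % sorted unique chunk values
    nfolds = numel(unq);

    train_indices = cell(1, nfolds);
    test_indices = cell(1, nfolds);

    % the missing for-loop: each chunk takes a turn as the test set,
    % all other chunks form the training set
    for k = 1:nfolds
        is_test = chunks == unq(k);
        test_indices{k} = find(is_test);
        train_indices{k} = find(~is_test);
    end

    % assign fields one by one; passing cells to struct() would
    % create a struct array instead
    partitions = struct();
    partitions.train_indices = train_indices;
    partitions.test_indices = test_indices;
end
```

For example, nfold_partitioner_sketch([3 1 2 3 2 1]) gives test_indices {[2;6], [3;5], [1;4]}, which matches the first example in the help text.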
Extra exercise: write a split-half partitioner that returns only two partitions of approximately equal size (for example, using odd and even chunks).
Hint: cosmo_oddeven_partitioner_hdr
Solution: cosmo_oddeven_partitioner
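A minimal sketch of such a split-half partitioner follows. The name oddeven_partitioner_sketch is hypothetical, and the odd/even rule is an assumption. The unique chunk values are sorted and assigned alternately to the two halves (1st, 3rd, ... versus 2nd, 4th, ...), so the rule also works when chunk values are not consecutive integers. The function returns two folds, and each half serves once as the test set. CoSMoMVPA's own cosmo_oddeven_partitioner may differ in its details.

```matlab
function partitions = oddeven_partitioner_sketch(chunks)
% Split-half partitioning. Hypothetical name; a sketch, not CoSMoMVPA's
% own cosmo_oddeven_partitioner. Sorted unique chunk values are
% alternately assigned to an "odd" and an "even" half.

    if isstruct(chunks)
        chunks = chunks.sa.chunks;
    end
    chunks = chunks(:);

    unq = unique(chunks);
    if numel(unq) < 2
        error('need at least two unique chunks');
    end

    % pos(i) is the position of chunks(i) in the sorted unique values;
    % positions 1, 3, 5, ... form the odd half
    [~, pos] = ismember(chunks, unq);
    in_odd = mod(pos, 2) == 1;

    odd_idx = find(in_odd);
    even_idx = find(~in_odd);

    % fold 1: train on odd half, test on even half; fold 2: the reverse
    partitions = struct();
    partitions.train_indices = {odd_idx, even_idx};
    partitions.test_indices = {even_idx, odd_idx};
end
```

With chunks [3 1 2 3 2 1], chunks 1 and 3 form one half (samples 1, 2, 4, 6) and chunk 2 forms the other (samples 3, 5). The halves are therefore only approximately equal when the number of chunks is odd.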
Extra advanced exercise: write a (K,N)-fold partitioner that returns all partitions for N chunks such that K chunks are in the test set and the remaining (N-K) chunks are in the training set.
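The sketch below shows one way to approach this exercise. It relies on MATLAB's built-in nchoosek(v,k), which returns every combination of k elements of the vector v as the rows of a matrix. The name nchoosek_partitioner_sketch is hypothetical. This is not CoSMoMVPA's cosmo_nchoosek_partitioner, which offers more options. With k=1 the sketch reduces to the leave-one-chunk-out (n-fold) scheme.

```matlab
function partitions = nchoosek_partitioner_sketch(chunks, k)
% (K,N)-fold partitioning. Hypothetical name; a sketch, not CoSMoMVPA's
% own cosmo_nchoosek_partitioner. Every combination of k out of the
% N unique chunks is used once as the test set.

    if isstruct(chunks)
        chunks = chunks.sa.chunks;
    end
    chunks = chunks(:);

    unq = unique(chunks);
    n = numel(unq);
    if k < 1 || k >= n
        error('k must satisfy 1 <= k < N (number of unique chunks)');
    end

    % each row of combis holds the k chunk values of one test set;
    % unq has at least two elements here, so nchoosek treats it as a
    % vector of values rather than as a scalar count
    combis = nchoosek(unq, k);
    nfolds = size(combis, 1);

    train_indices = cell(1, nfolds);
    test_indices = cell(1, nfolds);

    for f = 1:nfolds
        is_test = ismember(chunks, combis(f, :));
        test_indices{f} = find(is_test);
        train_indices{f} = find(~is_test);
    end

    partitions = struct();
    partitions.train_indices = train_indices;
    partitions.test_indices = test_indices;
end
```

For chunks [3 1 2 3 2 1] and k=2, there are nchoosek(3,2)=3 folds. The first fold tests on chunks 1 and 2 (samples 2, 3, 5, 6) and trains on chunk 3 (samples 1, 4).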