Cross-validation part 1: N-Fold Partitioner

Before we can do cross-validation, we need to partition the data into different sets of training and test folds. In the standard leave-one-run-out cross-validation scheme, we make N partitions (for N runs): each run serves in turn as the test data, while the classifier is trained on all the other runs. This means that for every fold we need one set of sample indices for training and another for testing. Below is an incomplete function that computes the partitions for a given set of chunk sample attributes (.sa.chunks). Your task is to complete the function by writing the missing for-loop.

function partitions = cosmo_nfold_partitioner(chunks)
% generates an n-fold partition scheme
%
% partitions=cosmo_nfold_partitioner(chunks)
%
% Input
%  - chunks          Px1 chunk indices for P samples. It can also be a
%                    dataset with field .sa.chunks
%
% Output:
%  - partitions      A struct with fields .train_indices and .test_indices.
%                    Each of these is a 1xQ cell for Q partitions, where
%                    .train_indices{k} and .test_indices{k} contain the
%                    sample indices for the k-th fold.
%
% Example:
%     % simple partitioning scheme with 3 chunks with two samples each
%     % (chunk values are not necessarily in increasing order)
%     p=cosmo_nfold_partitioner([3 1 2 3 2 1]);
%     cosmo_disp(p);
%     %|| .train_indices
%     %||   { [ 1    [ 1    [ 2
%     %||       3      2      3
%     %||       4      4      5
%     %||       5 ]    6 ]    6 ] }
%     %|| .test_indices
%     %||   { [ 2    [ 3    [ 1
%     %||       6 ]    5 ]    4 ] }
%
%     % show the same with a dataset struct
%     ds=struct();
%     ds.samples=randn(6,99); % 6 samples, 99 features
%     ds.sa.targets=[1 2 1 2 1 2]'; % conditions; ignored by this function
%     ds.sa.chunks=[3 1 2 3 2 1]';  % used for partitioning
%     p=cosmo_nfold_partitioner(ds);
%     cosmo_disp(p);
%     %|| .train_indices
%     %||   { [ 1    [ 1    [ 2
%     %||       3      2      3
%     %||       4      4      5
%     %||       5 ]    6 ]    6 ] }
%     %|| .test_indices
%     %||   { [ 2    [ 3    [ 1
%     %||       6 ]    5 ]    4 ] }
%
%
%     % Example of an unbalanced partitioning scheme. Generally it is
%     % advised to balance the partitions before using them for MVPA.
%     % (see cosmo_balance_partitions)
%     ds=struct();
%     ds.samples=randn(7,99); % 7 samples (1 extra), 99 features
%     ds.sa.targets=[1 2 1 2 1 2 2]';
%     ds.sa.chunks= [1 1 3 3 3 3 3]';
%     p=cosmo_nfold_partitioner(ds);
%     cosmo_disp(p);
%     %|| .train_indices
%     %||   { [ 3    [ 1
%     %||       4      2 ]
%     %||       5
%     %||       6
%     %||       7 ]        }
%     %|| .test_indices
%     %||   { [ 1    [ 3
%     %||       2 ]    4
%     %||              5
%     %||              6
%     %||              7 ] }
%
%
% Note:
%  - for cross-validation it is recommended to balance partitions using
%    cosmo_balance_partitions.
%  - More advanced partitioning is provided by cosmo_nchoosek_partitioner.
%
% See also: cosmo_balance_partitions, cosmo_nchoosek_partitioner
%
% #   For CoSMoMVPA's copyright information and license terms,   #
% #   see the COPYING file distributed with CoSMoMVPA.           #

Hint: cosmo_nfold_partitioner_skl

Solution: cosmo_nfold_partitioner
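For reference, here is a minimal sketch of the kind of for-loop the exercise asks for. It is written as a standalone script rather than as the body of cosmo_nfold_partitioner, and it is not necessarily identical to the linked solution. It assumes that chunks is either a numeric vector or a dataset struct with a .sa.chunks field, and it uses the chunk values from the first docstring example. Because unique returns sorted values, fold k tests the samples belonging to the k-th smallest chunk value, which matches the docstring output (fold 1 tests samples 2 and 6, both in chunk 1).

```matlab
% chunk sample attributes from the first docstring example
chunks = [3 1 2 3 2 1]';

% also accept a dataset struct with a .sa.chunks field
if isstruct(chunks)
    chunks = chunks.sa.chunks;
end
chunks = chunks(:);               % ensure a column vector

unq = unique(chunks);             % sorted unique chunk values
nfolds = numel(unq);              % one fold per chunk

partitions = struct();
partitions.train_indices = cell(1, nfolds);
partitions.test_indices = cell(1, nfolds);

for k = 1:nfolds
    % samples in the k-th chunk form the test set ...
    is_test = chunks == unq(k);
    partitions.test_indices{k} = find(is_test);
    % ... and all remaining samples form the training set
    partitions.train_indices{k} = find(~is_test);
end
```

Each fold's indices can then be used to slice a dataset, for example ds.samples(partitions.train_indices{k}, :) for training and ds.samples(partitions.test_indices{k}, :) for testing.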

Extra exercise: write a split-half partitioner that returns only two partitions of approximately equal size (for example, using odd and even chunks).

Hint: cosmo_oddeven_partitioner_hdr

Solution: cosmo_oddeven_partitioner
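One possible sketch of a split-half (odd-even) partitioner, again as a standalone script. It assumes that "odd" and "even" refer to the position of each chunk in the sorted list of unique chunk values rather than to the chunk values themselves, so the chunks are split into two halves of (nearly) equal size even when chunk values are not consecutive. The chunk values below are a hypothetical example with four chunks. Each half serves once as the training set and once as the test set, giving two partitions; the linked solution may behave differently.

```matlab
% hypothetical example: four chunks with two samples each
chunks = [3 1 2 3 2 1 4 4]';

unq = unique(chunks);                        % sorted unique chunk values
% assumption: "odd" chunks are those at odd positions in unq
is_odd_chunk = mod(1:numel(unq), 2) == 1;

odd_samples = find(ismember(chunks, unq(is_odd_chunk)));
even_samples = find(ismember(chunks, unq(~is_odd_chunk)));

% two folds: train on odd and test on even chunks, then the reverse
partitions = struct();
partitions.train_indices = {odd_samples, even_samples};
partitions.test_indices = {even_samples, odd_samples};
```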

Extra advanced exercise: write a (K,N)-fold partitioner that returns all possible partitions of N chunks in which K chunks form the test set and the remaining (N-K) chunks form the training set.
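A sketch of the (K,N)-fold idea using MATLAB's built-in nchoosek, which, given a vector and K, returns every K-element combination of that vector as a row. The chunk values and K below are hypothetical; with N=4 chunks and K=2 there are nchoosek(4,2)=6 folds. CoSMoMVPA's cosmo_nchoosek_partitioner (mentioned in the docstring above) provides this kind of partitioning; this sketch does not reproduce its interface.

```matlab
% hypothetical example: four chunks with two samples each
chunks = [1 1 2 2 3 3 4 4]';
K = 2;                                   % number of chunks in each test set

unq = unique(chunks);                    % sorted unique chunk values
nchunks = numel(unq);
if K < 1 || K >= nchunks
    error('K must be between 1 and %d', nchunks - 1);
end

% each row lists the positions (in unq) of the test chunks for one fold
combis = nchoosek(1:nchunks, K);
nfolds = size(combis, 1);

partitions = struct();
partitions.train_indices = cell(1, nfolds);
partitions.test_indices = cell(1, nfolds);

for f = 1:nfolds
    % samples in any of the K selected chunks form the test set
    is_test = ismember(chunks, unq(combis(f, :)));
    partitions.test_indices{f} = find(is_test);
    % samples in the remaining (N-K) chunks form the training set
    partitions.train_indices{f} = find(~is_test);
end
```

Setting K = 1 reduces this to the leave-one-chunk-out scheme of the main exercise.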
