Double dipping
% Warning: this exercise shows the *bad* practice of double dipping
% (also known as circular analysis). You must never, ever use
% double dipping to interpret results for a real analysis that you
% would publish.

nfeatures = 100;
nsamples_per_class = 200;
nclasses = 2;
niter = 1000;

% compute number of samples
nsamples = nclasses * nsamples_per_class;

% set targets (class labels 1..nclasses, repeated nsamples_per_class times)
targets = repmat((1:nclasses)', nsamples_per_class, 1);

% allocate space for output
% (column 1: double dipping accuracy, column 2: independent accuracy)
accuracies = zeros(niter, 2);

for iter = 1:niter
    % generate random Gaussian training data of size nsamples x nfeatures
    % and assign the result to a variable 'train_data'
    % >@@>
    train_data = randn(nsamples, nfeatures);
    % <@@<

    % for the double dipping test data, assign 'double_dipping_test_data'
    % to be the same as the training data.
    %
    % *** WARNING ***
    % For real data analyses (that you would publish in a paper) you
    % must never do double dipping analysis; its results are invalid.
    % ****************
    % >@@>
    double_dipping_test_data = train_data;
    % <@@<

    % for the independent data, generate random Gaussian data (of the
    % same size as train_data) and assign it to a variable
    % 'independent_test_data'
    % >@@>
    independent_test_data = randn(nsamples, nfeatures);
    % <@@<

    % compute class label predictions for both test sets using
    % cosmo_classify_lda. Store the predictions in
    % 'double_dipping_pred' and 'independent_pred', respectively
    % >@@>
    double_dipping_pred = cosmo_classify_lda(train_data, targets, ...
                                             double_dipping_test_data);
    independent_pred = cosmo_classify_lda(train_data, targets, ...
                                          independent_test_data);
    % <@@<

    % compute classification accuracies. Because all data are pure noise,
    % the labels carry no information and chance level is 1/nclasses.
    double_dipping_acc = mean(double_dipping_pred == targets);
    independent_acc = mean(independent_pred == targets);

    % store accuracies in the iter-th row of the 'accuracies' matrix
    % >@@>
    accuracies(iter, :) = [double_dipping_acc, independent_acc];
    % <@@<
end

% show histogram of the accuracies across iterations
hist(accuracies, 100);
legend({'double dipping', 'independent'});
xlabel('classification accuracy');
ylabel('number of iterations');
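
% Optional summary (a sketch added for illustration, not part of the
% original exercise). 'summarize_accuracies' is a hypothetical helper,
% defined here as an anonymous function, that reports the chance level
% (1/nclasses), the mean accuracy per column of 'accuracies' (column 1:
% double dipping, column 2: independent) and the fraction of iterations
% above chance. With purely random data one would expect the independent
% accuracies to be centred near chance, whereas the double-dipping
% accuracies are typically inflated well above it.
summarize_accuracies = @(acc, nclasses) struct( ...
    'chance', 1 / nclasses, ...                          % chance level
    'mean_acc', mean(acc, 1), ...                        % mean per column
    'frac_above_chance', mean(acc > 1 / nclasses, 1));   % share above chance

% only print the summary if the simulation above has been run
if exist('accuracies', 'var') && exist('nclasses', 'var')
    summary = summarize_accuracies(accuracies, nclasses);
    fprintf('chance level: %.3f\n', summary.chance);
    fprintf('mean accuracy (double dipping, independent): %.3f, %.3f\n', ...
            summary.mean_acc);
    fprintf('fraction of iterations above chance: %.3f, %.3f\n', ...
            summary.frac_above_chance);
end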
