AllDataWithBottleAttribute.zip contains all standardised spectra in Weka ARFF format, each as a single set of instances.

rawSSMFiles.zip contains all the original, unstandardised spectra in their SSM format.

DefinedSplits.zip contains all of the saved splits produced by sampling from the first set of zips. Usage examples are in the code, in papers.Large17ForgedAlcohol.

//BUG:
//Post submission, it was found that there is a double precision error, resulting
//in (on the order of) one in every few thousand instances being classified differently.
//Here, train/test splits are defined by loading in the full dataset and splitting
//in memory. In the experiments that produced the published results, splits were made
//locally and saved to file, before being read back in to classify on the cluster.
//The method used to save the splits wrote values out to 6 decimal places, whereas splits
//created in memory keep full double precision.
//On average, results should be the same. If anything, the extra precision should
//mean results created by splitting and classifying in memory are on average slightly
//better than the published results, though nowhere near significantly so.

tl;dr: Use the saved splits for an identical recreation of the published results. Use the provided sampling methods to sample from the full datasets for maximal precision.

The method papers.Large17ForgedAlcohol.paperExampleCode() has a boolean 'useSavedSplits' to toggle between these two options, once the paths to the data have been set to their locations on your machine.
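The precision issue described in the BUG note can be illustrated with a small, self-contained sketch. It does not use the project code or Weka: the class SplitPrecisionDemo, its methods, the attribute value and the threshold are all hypothetical, chosen only to show how writing a value to 6 decimal places and reading it back can flip a decision that sits very close to a boundary.

```java
import java.util.Locale;

public class SplitPrecisionDemo {

    // Hypothetical helper: mimics saving a value as text with
    // 6 decimal places and then reading it back from file.
    public static double roundTripSixDecimals(double value) {
        String written = String.format(Locale.ROOT, "%.6f", value);
        return Double.parseDouble(written);
    }

    // Toy stand-in for a classifier decision: a single threshold test.
    // Real classifiers are more complex, but many reduce to comparisons like this.
    public static int classify(double attribute, double threshold) {
        return attribute < threshold ? 0 : 1;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the datasets.
        double inMemory = 0.12345649;                      // full double precision
        double fromFile = roundTripSixDecimals(inMemory);  // truncated to 6 d.p.
        double threshold = 0.1234562;

        System.out.println("in memory: " + inMemory
                + " -> class " + classify(inMemory, threshold));
        System.out.println("from file: " + fromFile
                + " -> class " + classify(fromFile, threshold));
    }
}
```

With these values the in-memory attribute falls on one side of the threshold and the round-tripped attribute on the other. Such near-boundary cases are rare, which is consistent with only a small fraction of instances being affected.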