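B. Machine Learning, Chapter 3, Regression & Model Assessment, Section 3.6.1 k-Fold Cross-Validation. In the code snippets that compute the 10-fold cross-validation, the training set is assembled incorrectly and contains redundant data. The first slice starts at the held-out fold instead of ending there, so the training set includes the test fold, repeats every row after it, and omits every row before it.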
Instead of
train = pd.concat([soldata[splits[i]:], soldata[splits[i + 1]:]])
it should be
train = pd.concat([soldata[:splits[i]], soldata[splits[i + 1] :]])
The effect is fairly minor when the whole dataset is used, but when only a subset of the data is used, the error varies significantly with the choice of k.
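To make the difference concrete, here is a minimal sketch on toy data. It assumes `splits` holds row offsets for the fold boundaries, as the snippet suggests; the 20-row `soldata` stand-in, `k = 4`, and the names `train_bad` and `train_ok` are made up for illustration and are not from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for soldata: 20 rows instead of the real dataset
soldata = pd.DataFrame({"x": np.arange(20)})
k = 4
# Assumed form of `splits`: row offsets marking fold boundaries
splits = [i * len(soldata) // k for i in range(k + 1)]  # [0, 5, 10, 15, 20]

i = 1  # hold out rows 5..9
test = soldata[splits[i] : splits[i + 1]]

# Current snippet: both slices start at or after the held-out fold
train_bad = pd.concat([soldata[splits[i] :], soldata[splits[i + 1] :]])
# Proposed fix: rows before the fold plus rows after it
train_ok = pd.concat([soldata[: splits[i]], soldata[splits[i + 1] :]])

# Count repeated rows and rows shared with the test fold
bad_dupes = int(train_bad.index.duplicated().sum())
bad_leak = int(train_bad.index.isin(test.index).sum())
ok_dupes = int(train_ok.index.duplicated().sum())
ok_leak = int(train_ok.index.isin(test.index).sum())

print(len(train_bad), bad_dupes, bad_leak)  # 25 10 5
print(len(train_ok), ok_dupes, ok_leak)     # 15 0 0
```

With the current slicing, the toy training set has 25 rows for a 20-row dataset: 10 are repeats and 5 belong to the held-out fold. With the proposed slicing it has the expected 15 rows and shares none with the test fold.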