Issue via Email #250

@whitead

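B. Machine Learning, Chapter 3, Regression & Model Assessment, Section 3.6.1 k-Fold Cross-Validation. In the code snippets that compute 10-fold cross-validation, the training set is assembled incorrectly. The first slice starts at splits[i] instead of ending there, so the training set includes the test fold itself, duplicates every row from splits[i + 1] onward, and omits the rows before splits[i].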

Instead of

train = pd.concat([soldata[splits[i]:], soldata[splits[i + 1]:]])

it should be

train = pd.concat([soldata[:splits[i]], soldata[splits[i + 1] :]])

The effect is fairly minor when the whole dataset is used, but when only a subset of the data is used, the error varies significantly with the choice of k.
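A minimal sketch of the difference, for anyone who wants to check it quickly. It uses a toy 20-row stand-in for soldata and evenly spaced splits with k = 4; both are assumptions for illustration, not the book's actual data or split code. Only the two pd.concat lines are taken from above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for soldata: 20 rows, each with a unique id.
soldata = pd.DataFrame({"id": np.arange(20)})
k = 4
# Assumed fold boundaries: evenly spaced row offsets [0, 5, 10, 15, 20].
splits = [int(j * len(soldata) / k) for j in range(k + 1)]

report = []
for i in range(k):
    test = soldata[splits[i]:splits[i + 1]]
    # Original line: the first slice starts at the test fold.
    buggy = pd.concat([soldata[splits[i]:], soldata[splits[i + 1]:]])
    # Proposed fix: everything before the fold plus everything after it.
    fixed = pd.concat([soldata[:splits[i]], soldata[splits[i + 1]:]])
    report.append(
        {
            "fold": i,
            "buggy_rows": len(buggy),
            "buggy_test_rows_in_train": int(buggy["id"].isin(test["id"]).sum()),
            "buggy_duplicate_rows": int(buggy["id"].duplicated().sum()),
            "fixed_rows": len(fixed),
            "fixed_test_rows_in_train": int(fixed["id"].isin(test["id"]).sum()),
        }
    )

print(pd.DataFrame(report))
```

With the fix, every fold trains on the same number of rows and none of them come from the test fold. With the original line, the test fold always ends up in the training set, and the training set size changes from fold to fold.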
