ENH Creating a synthetic example dataset

This follows from a Gitter conversation with @adrinjalali, @hildeweerts, and @MiroDudik, where we agreed it would be useful to have a synthetic dataset available for our examples. @adrinjalali suggested the following code, which uses `sklearn`'s `make_classification` with a different `class_sep` for each group:

```
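import numpy as np
from numpy.random import RandomState
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Shared generator so every call below draws from one reproducible stream
rng = RandomState(seed=42)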

X_women, y_women = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=4,
    n_classes=2,
    class_sep=1,
    random_state=rng,
)

X_men, y_men = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=4,
    n_classes=2,
    class_sep=2,
    random_state=rng,
)

X_unspecified, y_unspecified = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=4,
    n_classes=2,
    class_sep=0.5,
    random_state=rng,
)

# Stack the three groups and record each sample's group membership
X = np.r_[X_women, X_men, X_unspecified]
y = np.r_[y_women, y_men, y_unspecified]
gender = np.repeat(["Woman", "Man", "Unspecified"], 500)

X_train, X_test, y_train, y_test, gender_train, gender_test = train_test_split(
    X, y, gender, test_size=0.3, random_state=rng
)
```
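
One rough way to check what the differing `class_sep` values do is to fit a single model on the pooled data and compare test accuracy per group. The sketch below is only an illustration: `LogisticRegression` is an arbitrary stand-in model, and `class_sep_by_group` and `accuracy_by_group` are names made up here. The groups with larger `class_sep` should be easier to classify in isolation, but the sketch makes no promise about the pooled numbers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(seed=42)

# Same per-group class_sep values as in the snippet above
class_sep_by_group = {"Woman": 1, "Man": 2, "Unspecified": 0.5}

X_parts, y_parts, gender_parts = [], [], []
for group, class_sep in class_sep_by_group.items():
    X_g, y_g = make_classification(
        n_samples=500,
        n_features=20,
        n_informative=4,
        n_classes=2,
        class_sep=class_sep,
        random_state=rng,
    )
    X_parts.append(X_g)
    y_parts.append(y_g)
    gender_parts.append(np.full(500, group))

X = np.concatenate(X_parts)
y = np.concatenate(y_parts)
gender = np.concatenate(gender_parts)

X_train, X_test, y_train, y_test, gender_train, gender_test = train_test_split(
    X, y, gender, test_size=0.3, random_state=rng
)

# Stand-in model (an assumption, not part of the proposal)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Fraction of correct test predictions within each group
accuracy_by_group = {
    group: float(
        np.mean(y_pred[gender_test == group] == y_test[gender_test == group])
    )
    for group in class_sep_by_group
}
print(accuracy_by_group)
```
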
@MiroDudik suggested extending this to include at least two sensitive features and one control feature, so that we can use it in nearly all of our examples.
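
As a starting point for discussion, here is a minimal sketch of what that extension might look like. Everything beyond the `make_classification` call is an assumption: the function name `make_synthetic_dataset`, the second sensitive feature `age_group` and its values, the `class_sep` scaling per subgroup, and the two-valued `control` feature drawn independently of the labels are all placeholders, not an agreed design.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification


def make_synthetic_dataset(n_per_group=250, seed=42):
    """Hypothetical helper: two sensitive features plus one control feature."""
    rng = np.random.RandomState(seed)

    # Illustrative values only: base class_sep per gender, scaled per age group
    gender_sep = {"Woman": 1.0, "Man": 2.0, "Unspecified": 0.5}
    age_scale = {"Younger": 1.0, "Older": 0.5}

    X_parts, y_parts, gender_parts, age_parts = [], [], [], []
    for (gender, sep), (age, scale) in product(
        gender_sep.items(), age_scale.items()
    ):
        # One make_classification call per (gender, age_group) subgroup
        X_g, y_g = make_classification(
            n_samples=n_per_group,
            n_features=20,
            n_informative=4,
            n_classes=2,
            class_sep=sep * scale,
            random_state=rng,
        )
        X_parts.append(X_g)
        y_parts.append(y_g)
        gender_parts.append(np.full(n_per_group, gender))
        age_parts.append(np.full(n_per_group, age))

    X = np.concatenate(X_parts)
    y = np.concatenate(y_parts)
    gender = np.concatenate(gender_parts)
    age_group = np.concatenate(age_parts)

    # Placeholder control feature, sampled independently of X, y, and the groups
    control = rng.choice(["A", "B"], size=len(y))
    return X, y, gender, age_group, control


X, y, gender, age_group, control = make_synthetic_dataset()
print(X.shape, y.shape)
```

Generating each subgroup separately keeps the per-subgroup difficulty explicit, at the cost of one `make_classification` call per combination of sensitive feature values.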

@fairlearn/fairlearn-maintainers, are there any objections to putting this in the `fairlearn.datasets` module?

ENH Creating a synthetic example dataset #793
