Skip to content

Conversation

@EdAbati
Copy link
Contributor

@EdAbati EdAbati commented Mar 25, 2025

Description

Towards #1522
(thanks @StefanieSenger for your refactoring PR, made this work way easier :) )

This PR makes the utils _input_validations.py narwhals compatible. Since all the rest doesn't expect narwhals yet, I convert everything to native before the return. This should be updated in the future :)

FYI @TamaraAtanasoska I made the PR to narwhals-integration that is behind main (but branched of main): that's why you see some unrelated commits. We should update narwhals-integration to see a cleaner diff.

Tests

  • no new tests required
  • new tests added
  • existing tests adjusted

Documentation

  • no documentation changes needed
  • user guide added or updated
  • API docs added or updated
  • example notebook added or updated

Screenshots

Tagging the narwhalifying experts @FBruzzesi @MarcoGorelli for suggestions on how to improve this

Comment on lines 12 to 18
# TODO: sklearn.utils.validation.check_array does not support it
# def pandas_nullable_constructor(obj) -> IntoDataFrame:
# return pd.DataFrame(obj).convert_dtypes(dtype_backend="numpy_nullable") # type: ignore[no-any-return]

# TODO: sklearn.utils.validation.check_array does not support it
# def pandas_pyarrow_constructor(obj) -> IntoDataFrame:
# return pd.DataFrame(obj).convert_dtypes(dtype_backend="pyarrow") # type: ignore[no-any-return]
Copy link
Contributor Author

@EdAbati EdAbati Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesing one.

To reproduce:

import pandas as pd
from sklearn.utils.validation import check_array

df = pd.DataFrame({"alpha": ["a", "a", "b"], "beta": [1, 2, 1]})
df = df.convert_dtypes(dtype_backend="pyarrow")
# or
# df = df.convert_dtypes(dtype_backend="numpy_nullable")

check_array(df, dtype=None)

Did you encounter this already @FBruzzesi? Maybe in scikit-lego?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe scikit-lego was very early days of Narwhals, and we are not testing with different pandas dtype_backend 🙈

@TamaraAtanasoska
Copy link
Contributor

Hi @EdAbati, just acknowledging today that I saw this PR! I think that the frequency of my engagement here is not enough to keep everything up to date on the spin-off branch fully unfortunately, I was expecting a different situation these weeks but here we are 😄. We have a maintainer meeting tomorrow, I might discuss to move the merges directly into main, as we are very likely going to postpone the patch release that we planned for. Not sure how fast we will move, but it would be nice to wrap up the first stable edition of the integration by the summer.

@EdAbati EdAbati changed the base branch from narwhals-integration to main March 25, 2025 14:32
@EdAbati
Copy link
Contributor Author

EdAbati commented Mar 25, 2025

No problem at all @TamaraAtanasoska :)

I changed the base branch to main to make the review easier then.
Let me know if I should switch back to the integration one

Copy link
Contributor

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdAbati I found some time to skim through this again! Left a couple of comments actually one relevant comment and it does look 🔥 😁

@EdAbati
Copy link
Contributor Author

EdAbati commented May 4, 2025

Apologies for the delay, but I finally found time to address the comments! 😄

Thank you for the review everyone, please let me know if there is anything else that doesn't look right

Comment on lines +123 to +124
# TODO: remove this once we supports all narwhals inputs in this module
default_backend=nw.Implementation.PANDAS,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add this in places were narwhals is not yet supported. Since we import polars in the tests, it becomes the default backend.

Comment on lines +23 to +31
if get_polars():
return Implementation.POLARS
if get_pandas():
return Implementation.PANDAS
if get_pyarrow():
return Implementation.PYARROW
raise ImportError(
"No supported dataframe backend found. Please install either polars, pandas or pyarrow."
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is everyone happy about this priority?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants