GH-16147: Add new python support#16817
Draft
tomasfryda wants to merge 115 commits into
…th the old version from the time dev-python-base was built
- utilsPY.py: replace `import imp` with `import importlib.util` and
convert imp.find_module('numpy') try/except to find_spec None-check
- pyunit_h_pubdev-8768_synthetic_data.py: same migration for
sklearn_gbmi presence check
- kmeans_aic_bic_diagnostics.ipynb: remove import imp, add importlib.util, convert two try/except imp.find_module blocks (pandas, seaborn) to find_spec None-check pattern with if-guard imports
- turbofan_NOPASS_phm.ipynb: replace import imp and four-library try/except block with h2o_only flag using all(find_spec()) check
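The migration pattern described in the bullets above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `module_available`, the placeholder module names, and the `all_optional_present` flag are assumptions made for the example.

```python
import importlib.util


def module_available(name):
    """Return True if top-level module `name` could be imported.

    importlib.util.find_spec returns None for a missing top-level module
    instead of raising, which replaces the old try/except around
    imp.find_module (the imp module was removed in Python 3.12).
    """
    return importlib.util.find_spec(name) is not None


# Single presence check with an if-guard import (hypothetical module names).
if module_available("json"):
    import json  # only imported when actually present

# Flag derived from several optional libraries, in the spirit of the
# all(find_spec()) check mentioned above (names are placeholders).
all_optional_present = all(
    module_available(m) for m in ("json", "surely_not_installed_pkg_xyz")
)
```

Note that `find_spec` on a dotted name imports the parent package first, so this sketch is meant for top-level module names only.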
…BYTE-01)
- Tests confirm tuple-precedence bug: current code returns ast.Tuple, not ast.Compare
- Behavioural tests verify correct True/False returns for skip/non-skip opcodes
- RED: test_should_be_skipped_uses_set_literal fails (Tuple vs Set comparator)
- Replace tuple expression with set literal so membership test is correct
- Before: `return instr in "COPY_FREE_VARS", "RESUME", "PUSH_NULL"` always truthy
- After: `return instr in {"COPY_FREE_VARS", "RESUME", "PUSH_NULL"}` correct bool
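The precedence problem can be reproduced in isolation. The sketch below parses both expressions with `ast` and then calls two stand-in functions; the function names are hypothetical and only mirror the before/after lines quoted above.

```python
import ast

buggy_src = 'instr in "COPY_FREE_VARS", "RESUME", "PUSH_NULL"'
fixed_src = 'instr in {"COPY_FREE_VARS", "RESUME", "PUSH_NULL"}'

# The comma binds more loosely than `in`, so the buggy form is a 3-tuple
# whose first element is the comparison.
buggy_node = ast.parse(buggy_src, mode="eval").body
fixed_node = ast.parse(fixed_src, mode="eval").body


def should_be_skipped_buggy(instr):
    # Returns (bool, "RESUME", "PUSH_NULL"): a non-empty tuple, always truthy.
    return instr in "COPY_FREE_VARS", "RESUME", "PUSH_NULL"


def should_be_skipped_fixed(instr):
    # Real membership test against a set literal.
    return instr in {"COPY_FREE_VARS", "RESUME", "PUSH_NULL"}
```

The first tuple element of the buggy form is additionally a substring test against `"COPY_FREE_VARS"`, so even that element is not the intended membership check.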
…BYTE-02..05)
- Remove _unpack_opargs try/except block (private API removed in Python 3.12)
- Remove import sys (no longer needed after rewrite)
- Rewrite _disassemble_lambda to use dis.get_instructions() public API
- Normalize RETURN_CONST (3.12+) to LOAD_CONST + RETURN_VALUE
- Normalize LOAD_FAST_LOAD_FAST (3.13+) to two LOAD_FAST entries
- Fix _call_bc: only skip PRECALL if instruction is actually PRECALL (3.12+ guard)
- BINARY_OPS table confirmed correct, no changes needed (D-05)
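The two normalizations listed above can be illustrated on a plain list of `(opname, argval)` pairs. This is a sketch under assumptions: the function name `normalize` is hypothetical, and it assumes the fused `LOAD_FAST_LOAD_FAST` instruction exposes its two variable names as a 2-tuple in `argval`, which is how `dis` reports it on CPython 3.13.

```python
def normalize(instructions):
    """Expand fused opcodes into the older two-instruction shapes.

    `instructions` is an iterable of (opname, argval) pairs, e.g. built
    from dis.get_instructions(). Returns a new list of pairs.
    """
    out = []
    for opname, argval in instructions:
        if opname == "RETURN_CONST":
            # 3.12+: one instruction that loads a constant and returns it.
            out.append(("LOAD_CONST", argval))
            out.append(("RETURN_VALUE", None))
        elif opname == "LOAD_FAST_LOAD_FAST":
            # 3.13+: one instruction that pushes two locals.
            first, second = argval
            out.append(("LOAD_FAST", first))
            out.append(("LOAD_FAST", second))
        else:
            out.append((opname, argval))
    return out


# Synthetic stream, so the example behaves the same on every Python version.
stream = [
    ("LOAD_FAST_LOAD_FAST", ("a", "b")),
    ("BINARY_OP", 0),
    ("RETURN_CONST", None),
]
normalized = normalize(stream)
```

On a live interpreter the input could be produced with `((i.opname, i.argval) for i in dis.get_instructions(some_lambda))`; downstream code then only needs to understand the older opcode names.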
…a (GH-16147)
- Update version warning string from 3.11.x to 3.14.x (META-01)
- Add Python 3.12, 3.13, 3.14 classifiers to setup.py (META-02)
- Confirmed no bare datatable module-level imports in tests (DEP-03 satisfied by test-requirements.txt)
…ath (VAL-05) The py3.7-changed-only CI stage copies test files to a separate directory, breaking the relative ../../h2o/astfun.py path. Use h2o.__file__ to locate astfun.py via the installed package, which works regardless of workspace layout.
…L-02, VAL-03, VAL-04) Modern numpy (1.23+) passes str to converter callbacks, not bytes. The legacy .decode() call dates from 2014-2016 py2/3 compat work and breaks on Python 3.12+ where newer numpy is required. Removing .decode() works on the numpy versions used across all supported Python versions in CI.
Older numpy on Py3.7-3.10 still passes bytes to converters; numpy 1.23+ on Py3.11+ passes str. Handle both via isinstance(s, bytes) check.
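A converter that tolerates both input types might look like the sketch below. The names `to_text` and `parse_flag` are hypothetical; only the `isinstance(s, bytes)` check comes from the commit message.

```python
def to_text(s):
    """Accept either bytes (older numpy) or str (newer numpy)."""
    if isinstance(s, bytes):
        return s.decode()
    return s


def parse_flag(s):
    """Hypothetical converter callback: map 'yes'/'no' cells to floats."""
    return 1.0 if to_text(s).strip() == "yes" else 0.0
```

Because the check happens inside the callback, the same converter can be passed to the loader regardless of which numpy version is installed.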
…AL-02..04) On Py3.12+, native xgboost is pinned to 3.2.0 which diverges from H2O's bundled xgboost output. Compare H2O against native (xgboost 1.7.6) on Py<3.12, and against a baseline value gathered from a Py3.11 run on Py>=3.12. H2O's bundled xgboost is consistent across Python versions, so the same baseline applies to all rows.
…(VAL-02..04) pandas.util.testing was deprecated in pandas 1.0 (Jan 2020) and removed in pandas 2.0 (April 2023). pandas.testing has been the canonical path since pandas 1.0, so the migration works on all Python versions in CI.
…ce we didn't build them yet
…AL-04) Newer sklearn (1.6+, used on Py3.12+) made check_is_fitted stricter: it now requires either a trailing-underscore attribute or __sklearn_is_fitted__() method. The H2O wrapper sets self._estimator after fit() but had neither signal, so pipeline.predict() raised NotFittedError despite fit() having succeeded. Adding the protocol method works on all sklearn 1.x versions.
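The protocol method itself is small. The class below is a stand-in written for illustration, not the H2O wrapper; it only assumes, as the message states, that the wrapper stores the trained model in `self._estimator` after `fit()`.

```python
class WrapperSketch:
    """Stand-in for a wrapper that keeps its fitted model in _estimator."""

    def __init__(self):
        self._estimator = None

    def fit(self, X, y=None):
        # A real wrapper would train a model here; object() is a placeholder.
        self._estimator = object()
        return self

    def __sklearn_is_fitted__(self):
        # sklearn's check_is_fitted calls this hook when it is defined,
        # so no trailing-underscore attribute is needed.
        return self._estimator is not None


unfitted = WrapperSketch()
fitted = WrapperSketch().fit([[0.0], [1.0]])
```

With scikit-learn installed, `sklearn.utils.validation.check_is_fitted(fitted)` should then pass, while the same call on `unfitted` should raise `NotFittedError`.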
…VAL-02..04) statsmodels<0.14 returned NaN for Tweedie loglike(eql=False), causing the AIC comparison to silently pass via NaN propagation. statsmodels 0.14+ implements the actual Wright-Bessel based likelihood, exposing a real disagreement: H2O estimates phi by ML (golden-section), statsmodels uses Pearson chi^2 / df_resid in GLM.fit() by default. Same per-observation log-likelihood formula, different phi → ~32 llf gap (~62 in AIC) over 380 obs, exactly the curvature cost of evaluating ML at a non-ML phi. Apply the same pattern the Gamma sub-test uses (lines 95-104): pin H2O's dispersion to sm.scale via fix_dispersion_parameter + init_dispersion_parameter so both implementations evaluate Tweedie density at the same phi.
Python 3.14 added LOAD_FAST_BORROW and LOAD_FAST_BORROW_LOAD_FAST_BORROW opcodes — variants of LOAD_FAST that skip the refcount increment when the following instruction won't consume the reference. Functionally equivalent to LOAD_FAST for AST extraction (we only need the variable name). Without this normalization, lambdas like `lambda x: x.mean()` fail on Py3.14 with "Unexpected bytecode disassembly" because the disassembler emits LOAD_FAST_BORROW where Py3.13 emitted LOAD_FAST.
…ibrary phi The Tweedie ML-phi sanity block in test_glm_aic_tweedie_no_regularization asserted |H2O ML phi - statsmodels Pearson scale| / sm < 0.25. That's the wrong reference frame: H2O's `dispersion_parameter_method="ml"` and statsmodels' default GLM.fit() scale (Pearson chi^2 / df_resid) are different estimators by construction. On prostate.csv (n=380) they legitimately differ by ~40% (ML phi=2.116 vs Pearson=3.465), and the ML phi is in fact the better likelihood by ~32 nats — the previous assertion was firing on correct behaviour. Replace with a self-consistency check on H2O alone: the ML phi must, by definition, yield a loglikelihood >= the loglikelihood at any other phi. `glm_no_reg` is already trained with phi pinned to the Pearson scale; assert glm_ml_phi.loglikelihood() >= glm_no_reg.loglikelihood(). This catches a genuine golden-section regression (non-optimal phi) and does not depend on statsmodels' estimator choice.
…n_is_fitted__ Commit 6596cc5 tightened the fit signal from `self._estimator is not None` to also require `self._estimator._model_json is not None`. That works for plain estimators where `_estimator` is an H2OEstimator subclass, but breaks H2OAutoML: the AutoML wrapper has no `_model_json` of its own — model JSON lives on the leader. As a result, sklearn Pipelines containing H2OAutoMLClassifier raise NotFittedError on predict/score after a successful fit. Fix __sklearn_is_fitted__ to accept either signal: `_model_json` on the estimator directly (plain) or `.leader` on the estimator (AutoML). Mirror the same fallback in `classes_` so sklearn scorers that read `estimator.classes_` work for fitted AutoML classifiers too.
…sklearn 1.8 On sklearn 1.8, ClassifierMixin/RegressorMixin no longer carry `_estimator_type` — the type is exposed only via __sklearn_tags__. Wrappers built by make_classifier / make_regressor add the matching mixin to bases but use is_generic=False, so the dynamic __init__ doesn't take an `estimator_type` argument and the base wrapper never records `_estimator_type` on the instance. Result on sklearn 1.8: getattr(self, '_estimator_type', None) returns None, super().__sklearn_tags__() returns BaseEstimator's defaults (estimator_type=None), and the type-specific branches in our __sklearn_tags__ override never fire. is_classifier(self) then returns False, the params_as_h2o_frames decorator skips asfactor() on y, and algorithms that only support binomial classification (e.g. AdaBoost) end up training as regression and crashing in predict. Add a final MRO-based fallback: if neither _estimator_type nor the upstream tags expose the type, look for ClassifierMixin / RegressorMixin in type(self).__mro__.
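The MRO-based fallback can be sketched without importing sklearn. The two mixin classes below are local stand-ins for sklearn's `ClassifierMixin` and `RegressorMixin`, and the function name is hypothetical; the real wrapper would consult the actual sklearn classes and try `_estimator_type` and the upstream tags first.

```python
class ClassifierMixin:
    """Local stand-in for sklearn.base.ClassifierMixin."""


class RegressorMixin:
    """Local stand-in for sklearn.base.RegressorMixin."""


def estimator_type_from_mro(obj):
    """Last-resort lookup: infer the estimator type from the class hierarchy."""
    mro = type(obj).__mro__
    if ClassifierMixin in mro:
        return "classifier"
    if RegressorMixin in mro:
        return "regressor"
    return None


class BaseWrapper:
    """Stand-in for the generic wrapper base class."""


class SketchClassifier(ClassifierMixin, BaseWrapper):
    pass


class SketchRegressor(RegressorMixin, BaseWrapper):
    pass
```

Because the mixin is part of the class bases, this lookup keeps working even when no `_estimator_type` attribute is recorded on the instance.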
- Changes.md: move GH-16147 entries (Improvement + Breaking change) into the new 3.46.0.11 section. The Python 3.12-3.14 support, base_score pin, na_values kwarg, and iter_h2oframes additions were previously filed under the already-published 3.46.0.10 section, which would have made the headline feature appear to ship in binaries that do not contain it.
- welcome.rst: numpy support tier corrected to 'numpy<2 on Python 3.7-3.11 and numpy>=2 on Python 3.12+'. The previous '3.11+' claim contradicted the test-requirements pin numpy==1.26.4 on Py3.11.
- downloading.rst: drop the stale 'pip install future' instruction; this PR removed the last past.utils.old_div user from the runtime.
warnings.showwarning bypasses the warnings filter, so users on
unsupported Pythons could not silence the import-time message via
warnings.simplefilter('ignore'), warnings.catch_warnings(), or
PYTHONWARNINGS=ignore. Switch to warnings.warn(... UserWarning ...)
which honors the standard filter machinery.
Pre-existing bug; widened impact by adding Py3.15+ to the unsupported
range alongside this PR's 3.7-3.14 support expansion.
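The behavioural difference can be checked with the standard library alone. In this sketch the function name and the message text are placeholders, not the strings used by h2o; the point is only that a warning raised through `warnings.warn` is subject to the active filters.

```python
import warnings


def warn_unsupported_python(version):
    """Hypothetical import-time check that goes through the filter machinery."""
    warnings.warn(
        "Python %s is outside the supported range (placeholder message)" % version,
        UserWarning,
        stacklevel=2,
    )


# A user who opts out sees nothing.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore")
    warn_unsupported_python("3.15")

# A user who opts in still gets the warning, with the expected category.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")
    warn_unsupported_python("3.15")
```

`stacklevel=2` attributes the warning to the caller rather than to the helper, which tends to make the reported location more useful.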
…fix, overflow guard
- __sklearn_tags__: add a distribution-based fallback so generic wrappers (make_estimator with type='estimator') that derive classifier-ness from H2O's distribution= param resolve correctly on sklearn 1.6+. Without this, sklearn's is_classifier() would silently return False while BaseEstimatorMixin.is_classifier() returns True, so Pipeline/GridSearchCV(scoring='accuracy') would fall back to regression metrics.
- __sklearn_is_fitted__: read _leader_id (cheap attribute) instead of the leader @property (server round-trip via h2o.get_model on every check_is_fitted call; sklearn invokes this from every predict/score/decision_function).
- _classes_array: also catch OverflowError when coercing canonical int-looking domains to np.int64 (e.g. INT64_MAX+1 round-trips through int() but overflows the dtype). Falls through to a string ndarray instead of propagating.
- Tests cover the three new code paths (MRO fallback, distribution fallback, AutoML _leader_id) plus a regression guard asserting __sklearn_is_fitted__ never reads the `leader` property.
…tified fr= ctor
- H2OPartitionIterator gains a Template Method default for _fold_assignment_numpy so KFold/StratifiedKFold only need to override _compute_fold_column and _fold_h2oframe.
- Contract-change FutureWarning + large-frame UserWarning are now latched at module scope. A GridSearchCV over N hyperparameter sets used to emit N identical warnings; now exactly one.
- iter_legacy() added as a one-release DeprecationWarning-gated shim yielding the pre-3.46.0.11 (train_mask_frame, test_mask_frame) contract for callers that indexed back into the H2OFrame with masks.
- fold_assignments / .masks @property aliases preserve the old attribute access for one release (DeprecationWarning).
- H2OStratifiedKFold accepts an optional fr= ctor arg. iter_h2oframes on stratified now raises NotImplementedError if fr is missing; previously it silently yielded slices of the response column only, breaking the documented sklearn-compat migration path.
- Large-frame estimate fixed: report max-per-yield (n_folds-1)/n_folds × n × 10 bytes, not the meaningless "per fold" figure.
- Tests reset module latches per test, cover stratified raise/work, iter_legacy DeprecationWarning, property alias.
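The module-scope latch described above can be sketched as follows. All names and the message text are hypothetical; the sketch only shows the mechanism of emitting a warning once per process and resetting the latch between tests.

```python
import warnings

# Module-scope latch: flips to True the first time the warning is emitted.
_contract_warning_emitted = False


def _warn_contract_change_once():
    """Emit the contract-change warning at most once per process."""
    global _contract_warning_emitted
    if _contract_warning_emitted:
        return
    _contract_warning_emitted = True
    warnings.warn(
        "iteration contract changed (placeholder message)",
        FutureWarning,
        stacklevel=3,
    )


def _reset_latches():
    """Test helper: re-arm the latch so each test starts from a clean state."""
    global _contract_warning_emitted
    _contract_warning_emitted = False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for _ in range(5):  # e.g. five hyperparameter sets in a grid search
        _warn_contract_change_once()
    after_loop = len(caught)
    _reset_latches()
    _warn_contract_change_once()
    after_reset = len(caught)
```

An explicit latch is independent of the user's warning filters, so the once-only behaviour holds even under `simplefilter("always")`.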
The new as_data_frame() NA-default FutureWarning previously surfaced from h2o's own display / DMatrix-conversion code paths (head, repr, xgboost training), making it look like h2o-py was yelling at itself when the warning was intended for user code calling as_data_frame directly. Pass na_values=[''] (the new default, made explicit) at:
- frame.py:3869, 3893, 3906: internal XGBoost DMatrix conversion
- frame.py:5039: H2OFrame._to_str (repr / head display)
Align convert_with_polars default with as_data_frame: null_values defaults to [''] instead of '', so direct polars-path callers see the same NA semantics.
…t arg-drop
astfun.py:
- _call_func_ex_bc PUSH_NULL skip is now version-gated to Py3.14+ and to the flags=0 branch only. The previous unconditional skip could consume a PUSH_NULL belonging to an earlier expression on Py3.11-3.13.
- Free-function calls (LOAD_GLOBAL / LOAD_FAST / LOAD_CONST callable) with non-empty args/kwargs now raise loudly with a recovery hint instead of silently producing an ExprNode with zero children. Pre-PR, `lambda x: somefunc(x)` produced bare 'somefunc' Rapids.
- BINARY_OPS[26] comment corrected: NB_SUBSCR was folded into BINARY_OP in Py3.14 (BINARY_SUBSCR survives as a distinct opcode on 3.12/3.13).
- Extract _PUSH_NULL constant; four literal sites replaced.
expr.py:
- ExprNode._arg_to_expr now unwraps numpy 2.x scalars in slice start/stop/step before arithmetic so `slice(np.int64(0), np.int64(5), np.int64(2))` produces a Rapids-parseable "[0:5:2]" form instead of "np.int64(...)" leaking into the AST.
pyunit_astfun_bytecode_shapes.py (NEW):
- dis.get_instructions-anchored tests that the running Python's bytecode actually contains the expected opcode (CALL_KW on Py3.13+, BINARY_OP/NB_SUBSCR on Py3.14+) before asserting on lambda_to_expr output.
- Version-independent guards on BINARY_OPS[26]=="cols" and the cross-version Rapids form of x[0].
…nce, document tree_method pyunit_native_comparison.py: on xgboost>=2.0 the pre-PR code replaced the H2O-vs-native cross-check with a "train H2O twice and compare to itself" determinism check — a regression in bundled xgboost4j 1.6 would have passed silently. Skip explicitly (with the existing TODO to install xgboost 1.7.6 alongside 3.x in Py3.11+ images) instead of running a vacuous assertion. pyunit_xgboost_reweight_tree.py: pandas check_less_precise=3 translates to BOTH atol AND rtol set to 0.5*10^-(N+1)=5e-4 per the pandas 2.0 migration notes. The previous translation used rtol=1e-3 alone, dropping the absolute component (matters for SHAP contributions near zero where |value| << 1) AND doubling the relative tolerance. pyunit_H2OXGBoost_native_XGBoost_mixed_multinomial_compare_large_sparse_singleNode.py: document why tree_method='exact' (not 'auto') is the correct native pin. H2O's XGBoostModel.getActualTreeMethod resolves tree_method=auto to TreeMethod.exact for single-node runs with rows < 4M; the pre-PR 'auto' had native xgboost resolving to approx/hist for sparse data, an apples-to-oranges comparison. Added a 7-line comment citing the Java resolution logic. No behavior change.
The ml_llf >= pearson_llf - 1e-6 check is correct in principle (ML must dominate by definition), but the absolute 1e-6 was too tight for loglikelihood values in the 1e3-1e5 range. Golden-section search converges to ~eps*(b-a); any bump in H2O's dispersion_epsilon could trip the bound spuriously without indicating a real ML-search regression. Allow slack = max(1e-6, abs(pearson_llf) * 1e-8) — still strict enough to catch a real regression in absolute terms, scales with the LL magnitude.
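The scaled tolerance is easy to state in code. The function name below is hypothetical; the slack formula is the one quoted in the message.

```python
def ml_dominates(ml_llf, pearson_llf):
    """Check that the ML log-likelihood is not below the Pearson-phi one.

    The slack has an absolute floor of 1e-6 and otherwise scales with the
    magnitude of the log-likelihood (relative 1e-8).
    """
    slack = max(1e-6, abs(pearson_llf) * 1e-8)
    return ml_llf >= pearson_llf - slack


# At |llf| ~ 1e5 the slack is 1e-3, so a 5e-4 shortfall is tolerated ...
tolerated = ml_dominates(-1e5 - 5e-4, -1e5)
# ... while a shortfall of a whole unit still fails.
rejected = ml_dominates(-1e5 - 1.0, -1e5)
```

For small log-likelihoods the `max` keeps the original absolute 1e-6 bound, so the check does not become looser there.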
…f-check
Both tests previously loosened H2O-vs-lifelines rtol to 1e-4 on
Py3.11+ for an un-bisected drift, which silently absorbed any
regression below that threshold and gave no signal to track trend.
Two changes per file:
1. _diagnose_baseline_drift prints max_abs / max_rel / argmax row
before each lifelines assertion. Surfaces the actual drift
magnitude in CI logs so a future bisect can compare green vs
failing builds without re-running the failing test.
2. _check_h2o_baseline_self_consistency promotes the Breslow identity
S_0(t) = exp(-cumsum(h_0(t))) from soft print to HARD assertion at
atol=1e-12. Pure math, independent of lifelines — a real H2O
regression in baseline_hazard_frame / baseline_survival_frame
computation trips this regardless of cross-library drift.
Measured locally with lifelines 0.30.3 + scipy 1.15.3 + pandas 2.3.3:
rossi hazard max_rel=5.2e-9, survival max_rel=1.2e-9, breslow=1.7e-15
shelter hazard max_rel=2.3e-6, survival max_rel=4.6e-6, breslow=7.0e-15
Based on these:
- rossi _BASELINE_RTOL on Py3.11+ tightened from 1e-4 to 1e-6
(200x headroom over observed worst case).
- shelter rtol left at 1e-4 — strata cases drift up to 4.6e-6, so
tightening risks false-fail on CI where BLAS thread order differs.
- Breslow self-check asserted at 1e-12 in both (5 OOM headroom).
Also seeded the previously unseeded np.random.normal in
with_strata_and_weights so the test inputs (not just the H2O/lifelines
comparison) are reproducible across runs.
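The Breslow self-check reduces to a cumulative sum and an exponential. The sketch below uses plain Python lists and hypothetical function names; it assumes, as the identity above is written, that the hazard values are per-time-point increments rather than an already cumulative hazard.

```python
import math
from itertools import accumulate


def survival_from_hazard(hazard):
    """S_0(t) = exp(-cumsum(h_0(t))) evaluated at each time point."""
    return [math.exp(-c) for c in accumulate(hazard)]


def breslow_max_abs_error(hazard, survival):
    """Largest absolute deviation of `survival` from the Breslow identity."""
    expected = survival_from_hazard(hazard)
    return max(abs(e - s) for e, s in zip(expected, survival))


# Toy inputs (not taken from the rossi or shelter datasets).
hazard = [0.1, 0.2, 0.3]
consistent = [math.exp(-0.1), math.exp(-0.3), math.exp(-0.6)]
drifted = [consistent[0], consistent[1] + 1e-6, consistent[2]]

ok = breslow_max_abs_error(hazard, consistent) <= 1e-12
broken = breslow_max_abs_error(hazard, drifted) <= 1e-12
```

Since both sides of the identity come from the same model, the check needs no reference library and can use a much tighter tolerance than the cross-library comparison.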
… drop whitespace strip
docker/scripts/install_python_version:
- Use `set -euxo pipefail` explicitly so the script keeps fail-fast semantics even if invoked as `bash install_python_version ...` (which would drop the shebang -ex flags).
- Replace `rm *-requirements*.txt` glob (runs in Docker default `/`) with explicit filenames, which narrows the blast radius if anything else matching the pattern lands in the working dir.
h2o-py/h2o/model/extensions/feature_interaction.py: replace manual pd.ExcelWriter(...) + writer.close() with `with ... as writer:` so the writer is closed even if to_excel raises mid-loop.
docker/Jenkinsfile-build-docker: revert the cosmetic trailing-tab strip on line 132. Keeps the PR diff focused.
With shap>=0.40 (Py3.11 now pins 0.46.0, Py3.12+ 0.51.0) the demos use the 20,640-row california dataset, and shap.force_plot over the full training set runs hierarchical clustering with optimal leaf ordering, which scales ~cubically and ran past the 9,800s stage kill (py3.11-demo-notebooks: shap_values_drf/gbm terminated, cloud marked bad). Subsample to 1,000 rows (~3s) as shap's own warning advises; boston's 506 rows on Py3.7 are unaffected.
…IVE (Py3.12+) 'lambda col: not col' failed on 3.13/3.14 (TO_BOOL inserted before UNARY_NOT was unrecognized) and 'lambda col: +col' failed on 3.12+ (UNARY_POSITIVE is now CALL_INTRINSIC_1/INTRINSIC_UNARY_POSITIVE). Normalize both in _disassemble_lambda; extract the thrice-copied reverse-kwnames walk into _read_kwnames_args; cross-reference the two encodings of subscription. Verified identical Rapids ASTs across CPython 3.9/3.11/3.12/3.13/3.14.
iter_legacy() called _fold_h2oframe(), which raises NotImplementedError when fr= was not passed — the only constructor shape that existed before this branch. The result was never used (masks only need the fold column); drop the call and cover the stratified legacy path in the CV pyunit.
frame.py said 3.46.0.10 and cross_validation.py said 3.46.0.11 — both already shipped without this change, and Changes.md filed the entries under the released 3.46.0.11 section. Move them to a new 3.46.0.12 (in development) section and align every docstring/warning citation.
…ites h2o.ls(), h2o.as_list() and the sklearn wrapper's _to_numpy() still hit the default-NA path, so the one-shot warning could be burned by an internal call the user cannot act on (as_list had no na_values passthrough at all). Pass na_values=[''] internally and expose na_values= on as_list.
The per-coefficient comparison only printed on mismatch, so coef_tolerance had no effect. Collect offenders and assert. Drop the meaningless data= kwarg from the array-API sm.GLM calls.
…ead fallback sklearn >= 1.6 reads classes_ on every score call; resolving via the leader property round-tripped the server each time, contradicting the perf rationale in __sklearn_is_fitted__. Cache per fitted estimator. The ImportError fallback for NotFittedError was unreachable (sklearn.base is imported unconditionally) and would have broken the hasattr contract.
…parity claims __getitem__ docstring now covers the numpy-array row-selection semantics (including the deliberate asymmetry with Python lists) and Ellipsis; the as_data_frame docstring stops claiming full pandas/polars parity (NA recognition is consistent, all-NA dtype is not) and documents that na_values is ignored on the use_pandas=False path. Drop the redundant _np_ndarray() wrapper; soften the speculative driver-memory estimate.
…xing The headline NA-handling change was only covered via a mock; add real-cluster tests: literal 'NA'/'None' levels survive the default, the legacy list restores coercion, pandas and polars paths agree on NA masks, and the FutureWarning fires exactly once and only for default calls. Cover the new numpy __getitem__ branches (bare array, bool mask, tuple, Ellipsis, error dtypes) including the documented row-vs-column asymmetry.
…, doc polish
- XMLTestReporter: per-JVM unique fallback token instead of the shared 'unknown' constant, which would reintroduce the file-write collision the PID suffix prevents.
- expr.py: resolve numpy once at module load instead of importing per Rapids argument; skip the str copy for exact-str inputs.
- install_python_version: '|| true' on the causalml-filter grep (set -e kills the build if grep selects zero lines); note the skip branch is defensive.
- pyunit_native_comparison: document why this test skips on xgboost>=2 while the convert-helper comparison tests pin defaults and keep running.
- pyunit_xgboost_reweight_tree: fix the 10x-off tolerance formula in the comment (code value was already correct).
- README: mirror welcome.rst's numpy<2 / numpy>=2 guidance.
- verify_requirements: skip distributions with missing Version metadata.
- demos: reword version-adaptive comments to be reader-facing (drop test matrix / pin / env-rebuild references).
DEFAULT_PYTHON_VERSION drives the Build H2O-3 stage (jar, wheel, generated bindings, R pkg), the fallback for stages without an explicit pythonVersion (Flow), the env baked into the jdk test images, and the release pipeline. Re-root the image chain accordingly: dev-r-base and dev-mojocompat now build FROM dev-python-3.14, the warmup script and jdk-others-base install 3.14, and the image-build stash list persists dev-python-3.14. Pre-verified locally under CPython 3.14.5: full bindings codegen is byte-identical to the committed files (the build stage git-diff gate), and the wheel builds with setuptools 82, installs, and imports cleanly.
CPython 3.12+ emits SyntaxWarning for invalid escape sequences like \c, \l, \d (Rd doc markup inside plain Python strings) and documents that they will become SyntaxErrors in a future release. Double only the invalid sequences (token-level fix, valid escapes untouched) — provably content-preserving: \c already evaluated to backslash+c. Verified zero SyntaxWarnings under -W always and byte-identical codegen output.
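The content-preserving claim can be checked by compiling both spellings from source text. The helper names below are hypothetical, and the sample string is made up rather than taken from the generator. The category check accepts both `SyntaxWarning` (3.12+) and `DeprecationWarning` (used for the same diagnostic on earlier versions).

```python
import warnings


def escape_warnings(source):
    """Compile `source` and return the escape-related warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<sketch>", "exec")
    return [
        w for w in caught
        if issubclass(w.category, (SyntaxWarning, DeprecationWarning))
    ]


def evaluate(source):
    """Execute `source` quietly and return the value bound to `s`."""
    namespace = {}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        exec(compile(source, "<sketch>", "exec"), namespace)
    return namespace["s"]


# Source text as it would appear in a file:
before = "s = '\\code{x}'"    # file contains: s = '\code{x}'   (invalid \c)
after = "s = '\\\\code{x}'"   # file contains: s = '\\code{x}'  (doubled)

same_value = evaluate(before) == evaluate(after)
```

Both spellings evaluate to a backslash followed by `code{x}`; only the first one triggers the warning.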
Replace direct 'setup.py bdist_wheel' invocation (deprecated by setuptools) with 'python -m build --wheel --no-isolation' for h2o-py, h2o-py-mlflow-flavor and h2o-py-cloud-extensions; add [build-system]-only pyproject.toml files. Metadata deliberately stays in setup.py: the conda recipes harvest it via load_setup_py_data() and install through legacy 'setup.py install' on the pinned-old release toolchain — moving to [project] must ship with the conda modernization and a test release. The client variant is selected via H2O_PY_CLIENT env var under PEP 517 (argv --client kept for conda build.sh). Remove '[bdist_wheel] universal = 1' (py2 support is gone; setuptools warns this will become a build error): wheels are now py3-none-any. Updated the live name references (release pipeline, install-clients, local_dev, ec2 script with old-release fallback); CI test stages use wildcard globs and are unaffected. Verified on CPython 3.14.5 + setuptools 82: all four wheels build warning-free with correct names (h2o / h2o_client), main wheel installs and imports.
…stages invokeStage() in defineTestStages.groovy had its own DEFAULT_PYTHON = '3.7', shadowing buildConfig.DEFAULT_PYTHON_VERSION. Stages without an explicit pythonVersion (R smokes, Java smokes, Flow) activated /envs/h2o_env_python3.7 inside the rebuilt dev-r-*/dev-jdk-* images, which now only contain the 3.14 env — failing with 'No such file' before any results were produced (PR-16817 #102). Wire the default to buildConfig.DEFAULT_PYTHON_VERSION so there is a single source of truth. Hadoop stages pin pythonVersion '3.7' explicitly against their own image family and are unaffected. Also clear the 9 invalid-escape SyntaxWarnings in scripts/run.py (same token-level backslash doubling as the R generator fix; regexes unchanged).
The exists-check + makedirs pattern in _process_response races when parallel clients save into the same directory: the assembly-to-mojo2 stage runs 6 test processes sharing one freshly-wiped results dir, and two concurrent download_mojo calls hit the window between the check and makedirs — FileExistsError surfaced as a misleading 'Cannot write to file'. Flaky on any Python version; use os.makedirs(exist_ok=True).
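The racy and race-free variants differ by one call. The function names below are hypothetical stand-ins, not the body of `_process_response`.

```python
import os
import tempfile


def ensure_dir_racy(path):
    """Check-then-create: another process can create `path` in between."""
    if not os.path.exists(path):
        os.makedirs(path)  # may raise FileExistsError under concurrency


def ensure_dir(path):
    """Idempotent: succeeds whether or not the directory already exists."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "results", "mojo")
    ensure_dir(target)
    ensure_dir(target)  # second call is a no-op, as a concurrent caller would see
    created = os.path.isdir(target)
    try:
        os.makedirs(target)  # what the losing process in the race effectively runs
        second_plain_call_failed = False
    except FileExistsError:
        second_plain_call_failed = True
```

`exist_ok=True` still raises if the path exists but is not a directory, so genuine write problems are not masked.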
#16380