Skip to content

Conversation

@BenCurran98
Copy link
Contributor

This is to fix the issue mentioned here where XGBoost wasn't working when users had CUDA v13 present. Now the behaviour of the package is to attempt to load GPU-compatible artifacts for XGBoost via XGBoost_GPU_jll, but if it can't find a compatible artifact it falls back to the standard CPU build in XGBoost_jll

@BenCurran98
Copy link
Contributor Author

Also the underlying XGBoost library has been updated to v2.1.4

@ExpandingMan
Copy link
Collaborator

Thanks! A few minor concerns, see above.

@BenCurran98
Copy link
Contributor Author

@ExpandingMan I can't see any comments above? :)

src/Lib.jl Outdated
# only enable GPU support if there is a valid binary compatible with this system
# should we place a warning here?
if XGBoost_GPU_jll.is_available()
lib_xgboost = XGBoost_GPU_jll.libxgboost
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the symbol to lib_xgboost? Is the generator now generating that instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if there would be any confusion if I used the same symbol as the libraries exported by the JLL's. I can change it back if you like though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's even exported here, but if it were one could just use import to avoid masking.

Please change this back, it's more verbose and should be unnecessary.

test/runtests.jl Outdated

has_cuda() && @testset "cuda" begin
# only test GPU cababilities if an appropriate device + artifact is available
has_cuda() && XGBoost_GPU_jll.is_available() && @testset "cuda" begin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any scenario in which we would expect has_cuda() to be true but the xgboost cuda binaries not to be available? I would think that we'd want this to fail anyway if XGBoost_GPU_jll.is_available() fails here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where the user has an incompatible CUDA version with XGBoost_GPU_jll we won't want it to fail I think? Although that's more to stop it failing if e.g. a user has CUDA 13 which isn't supported yet by XGBoost.jl, rather than specifically for the test suite. I can condition this bit just on the artifact being available though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the PR yet, just trying to see if I can help with the issue.

XGBoost has a build_info function that exposes build-time information like the CUDA version used for compiling XGBoost. If it's a CPU-only build, it should be reflected in the build_info as well. You can access this field without touching CTK13-specific functions (hence no runtime error).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel pretty comfortable saying that if has_cuda() is true but the library isn't available the tests should fail anyway. Pkg is supposed to handle ensuring it's installed and if it's missing something would have gone wrong.

I would say let's make this has_cuda() only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the PR yet, just trying to see if I can help with the issue.

XGBoost has a build_info function that exposes build-time information like the CUDA version used for compiling XGBoost. If it's a CPU-only build, it should be reflected in the build_info as well. You can access this field without touching CTK13-specific functions (hence no runtime error).

I can't quite see where this function is defined in here?

@ExpandingMan
Copy link
Collaborator

Man, this github review feature feels so broken. Ok can you see it now?

@BenCurran98
Copy link
Contributor Author

Man, this github review feature feels so broken. Ok can you see it now?

Lol yep, thanks :)

@ExpandingMan
Copy link
Collaborator

Alright, I think this is fine, but I will allow a chance for more comments before merging.

@ExpandingMan
Copy link
Collaborator

Alright, let's do this. Thanks @BenCurran98 .

@ExpandingMan ExpandingMan merged commit 00faf62 into dmlc:master Sep 19, 2025
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants