ENH: minor speedups in non-tree KernelExplainer #3983

tylerjereddy · 2025-01-30T19:59:29Z

A few minor simplifications/adjustments that seem to give about 5-10% better performance for non-tree KernelExplainer for the scenario described at Query, ENH: faster KernelExplainer #3943.
I don't think we have formal asv-style benchmarks in this project, but informal testing with 5 trials each (times in seconds) for the reproducer in the above issue (on an ARM Mac):
- master: 52.803, 50.711, 51.503, 50.613, 51.348
- this branch: 47.557, 47.128, 48.197, 47.651, 47.216
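For reference, the trial times above can be reduced to a single figure. This is only a back-of-the-envelope calculation on the ten numbers listed, not a formal benchmark, and the variable names are illustrative:

```python
from statistics import mean

# Wall-clock times in seconds, copied from the two lists above.
master = [52.803, 50.711, 51.503, 50.613, 51.348]
branch = [47.557, 47.128, 48.197, 47.651, 47.216]

master_mean = mean(master)
branch_mean = mean(branch)

# Relative reduction in mean runtime, as a percentage of the master mean.
reduction_pct = 100 * (master_mean - branch_mean) / master_mean

print(f"master: {master_mean:.2f} s, branch: {branch_mean:.2f} s, "
      f"reduction: {reduction_pct:.1f}%")
```

The mean runtime drops by roughly 7.5%, which sits inside the 5-10% range quoted above.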
This is less impactful than the improvements at ENH: faster non-tree KernelExplainer #3944, but also less intrusive, and the improvements should compound with each other.
I added postdoc @arhall0 as a co-author on the commit, since they did the initial profiling work to find the ~10% bottleneck.
Since it appears that we only really need to identify the first point of difference between the arrays, it is probably possible to write a more efficient early-break algorithm, but that would likely add complexity or require a compiled backend, so this is just a start to bump things a little bit.
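One way to picture the early-break idea without a compiled backend is to compare the background data in row blocks and stop at the first block that differs. This is a rough sketch, not part of this PR: `varies_early_exit` and `chunk_rows` are hypothetical names, and it assumes numeric inputs that `np.allclose` can broadcast.

```python
import numpy as np


def varies_early_exit(x, data, chunk_rows=256):
    """Return True as soon as some block of rows in `data` differs from `x`.

    Hypothetical helper: `x` is one sample (1-D or shape (1, n)) and `data`
    is a 2-D numeric background array with n columns.
    """
    data = np.asarray(data)
    for start in range(0, data.shape[0], chunk_rows):
        block = data[start:start + chunk_rows]
        # Broadcasts x against the block; NaNs in matching positions count as equal.
        if not np.allclose(x, block, equal_nan=True):
            return True  # stop scanning at the first differing block
    return False


background = np.zeros((1000, 3))
sample = np.zeros(3)
print(varies_early_exit(sample, background))  # no row differs

background[5, 1] = 1.0
print(varies_early_exit(sample, background))  # first block already differs
```

The block size trades Python loop overhead against wasted comparisons, so whether this beats a single `np.allclose` call would need measuring on the reproducer.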

Checklist

All pre-commit checks pass.
Unit tests added (if fixing a bug or adding a new feature)

* A few minor simplifications/adjustments that seem to give about 5-10% better performance for non-tree `KernelExplainer` for the scenario described at shapgh-3943.
* I don't think we have formal `asv`-style benchmarks in this project, but informal testing with 5 trials each (times in seconds) for the reproducer in the above issue (on an ARM Mac):
  - `master`: 52.803, 50.711, 51.503, 50.613, 51.348
  - this branch: 47.557, 47.128, 48.197, 47.651, 47.216
* This is less impactful than the improvements at shapgh-3944, but also less intrusive, and the improvements should compound with each other.

Co-authored-by: Aaron R Hall <arhall@lanl.gov>

tylerjereddy · 2025-01-30T20:16:24Z

shap/explainers/_kernel.py

+            return 0 if np.allclose(i, j, equal_nan=True) else 1
+        elif hasattr(i, "dtype") and hasattr(j, "dtype"):
+            if np.issubdtype(i.dtype, np.number) and np.issubdtype(j.dtype, np.number):
+                return 0 if np.allclose(i, j, equal_nan=True) else 1


This part of the changes was needed because some "perfectly-normal" NumPy arrays in the testsuite were not caught by the condition above after removing the frompyfunc business (which, of course, doesn't actually make CPython code any faster).

Obviously, make sure you/we are comfortable that the current testsuite is robust for this codepath. It did seem to have many failures without this shim, so possibly coverage is "ok."
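To illustrate the point about `frompyfunc`, here is a small sketch comparing the per-element pattern with a single whole-array call. `not_equal_scalar` is a hypothetical stand-in for a per-element comparison, not the project's `not_equal`, and the arrays are made up:

```python
import numpy as np


def not_equal_scalar(i, j):
    # Hypothetical per-element comparison: 1 if the two scalars differ, else 0.
    return 0 if np.isclose(i, j, equal_nan=True) else 1


x = np.array([[1.0, np.nan]])                   # one sample, two features
data = np.array([[1.0, np.nan], [1.0, 2.0]])    # two background rows

# Old pattern: frompyfunc still calls the Python function once per element pair.
mismatches = np.frompyfunc(not_equal_scalar, 2, 1)(x, data)
varying_old = np.sum(mismatches) > 0

# New pattern: one vectorized comparison over the broadcast arrays.
varying_new = not np.allclose(x, data, equal_nan=True)

print(varying_old, varying_new)
```

Both approaches flag the group as varying here, but the second avoids a Python-level call for every element.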

CloseChoice · 2025-02-08T14:55:14Z

shap/explainers/_kernel.py

+            return 0 if np.allclose(i, j, equal_nan=True) else 1
+        elif hasattr(i, "dtype") and hasattr(j, "dtype"):
+            if np.issubdtype(i.dtype, np.number) and np.issubdtype(j.dtype, np.number):
+                return 0 if np.allclose(i, j, equal_nan=True) else 1


Thanks for the comment. I started a new conversation on these lines 'cause I somehow couldn't comment + suggest on the 4 lines on the previous one. Isn't there also a case where i and j are not a subtype of np.number? Is there a chance that we return None here? I can't think of one but just to be sure could we do:

Suggested change

-            return 0 if np.allclose(i, j, equal_nan=True) else 1
-        elif hasattr(i, "dtype") and hasattr(j, "dtype"):
-            if np.issubdtype(i.dtype, np.number) and np.issubdtype(j.dtype, np.number):
-                return 0 if np.allclose(i, j, equal_nan=True) else 1
+            return 0 if np.allclose(i, j, equal_nan=True) else 1
+        elif hasattr(i, "dtype") and hasattr(j, "dtype"):
+            if np.issubdtype(i.dtype, np.number) and np.issubdtype(j.dtype, np.number):
+                return 0 if np.allclose(i, j, equal_nan=True) else 1
+            return 0 if i == j else 1

To have the general comparison as a fallback for all cases where if/else branches might be missing?
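The concern about an implicit `None` can be reproduced in isolation. The function below is a simplified, hypothetical stand-in that keeps only the `dtype` branch from the diff, so it is not the real `not_equal`; it just shows how a nested `if` without a fallback falls through:

```python
import numpy as np


def not_equal_without_fallback(i, j):
    # Simplified stand-in: only the dtype branch, with no final return.
    if hasattr(i, "dtype") and hasattr(j, "dtype"):
        if np.issubdtype(i.dtype, np.number) and np.issubdtype(j.dtype, np.number):
            return 0 if np.allclose(i, j, equal_nan=True) else 1
    # Non-numeric dtypes fall through here, so Python returns None implicitly.


numeric = np.array([1.0, 2.0])
strings = np.array(["a", "b"])

print(not_equal_without_fallback(numeric, numeric))
print(not_equal_without_fallback(strings, strings))
```

The second call returns `None` rather than 0 or 1, which is the case the fallback is meant to close.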

CloseChoice · 2025-02-08T15:03:17Z

shap/explainers/_kernel.py

                    x_group = x_group.todense()
-                num_mismatches = np.sum(np.frompyfunc(self.not_equal, 2, 1)(x_group, self.data.data[:, inds]))
-                varying[i] = num_mismatches > 0
+                varying[i] = self.not_equal(x_group, self.data.data[:, inds])


nice improvement

codecov · 2025-02-08T15:23:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.71%. Comparing base (baace0f) to head (b3c8bb8).
Report is 39 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3983      +/-   ##
==========================================
+ Coverage   64.67%   64.71%   +0.04%     
==========================================
  Files          92       92              
  Lines       12862    12884      +22     
==========================================
+ Hits         8318     8338      +20     
- Misses       4544     4546       +2


* Expand the handling of array/object types in `not_equal()` based on reviewer feedback. Add a related test that fails when using the naive `i == j` fallback.

for more information, see https://pre-commit.ci

tylerjereddy · 2025-02-09T01:02:34Z

shap/explainers/_kernel.py

+                return 0 if np.allclose(i, j, equal_nan=True) else 1
+            if np.issubdtype(i.dtype, np.bool_) and np.issubdtype(j.dtype, np.bool_):
+                return 0 if np.allclose(i, j, equal_nan=True) else 1
+            return 0 if all(i == j) else 1


I tried to address #3983 (comment), but found that I actually needed a slight adjustment to the fallback for array-like inputs with all(). I also added custom handling for bool_, which seems "ok" with np.allclose().

tylerjereddy · 2025-02-09T01:09:39Z

tests/explainers/test_kernel.py

    np.testing.assert_allclose(sigm(shap_values.values.sum(1) + explainer.expected_value), pred, atol=1e-04)
+
+
+@pytest.mark.parametrize("dt", [np.bool_, np.object_])


This test is a bit weird in the sense that it doesn't really "fail before" and "pass after," but it does fail if the fallback I recently added uses return 0 if i == j else 1 instead of return 0 if all(i == j) else 1 because for the np.object_ dtype you'll get the typical ValueError: The truth value of an array with more than one element is ambiguous.

So it is more of a sniff test for things slipping through the cracks, but it isn't super picky beyond that, and in fact the original returning of None noted above in some cases still allows the test to pass. It may be possible to cook the test up in a way that is more stringent than that, but it is at least doing something.
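The difference between the two fallbacks is easy to reproduce outside the explainer. This is a minimal sketch with made-up 1-D `np.object_` arrays, not the test added in this PR:

```python
import numpy as np

i = np.array([True, False], dtype=np.object_)
j = np.array([True, False], dtype=np.object_)

# Naive fallback: `i == j` is an element-wise array, so using it directly
# as a condition raises the "ambiguous truth value" ValueError.
naive_error = None
try:
    naive = 0 if i == j else 1
except ValueError as exc:
    naive_error = str(exc)

# Fallback with all(): iterate over the element-wise results instead.
robust = 0 if all(i == j) else 1

print(naive_error)
print(robust)
```

Python's built-in `all()` iterates over the first axis only, which is why this sketch sticks to 1-D arrays.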

* Relax the stringency of the numeric closeness check in `test_explainer_non_number_dtype`, which was failing in CI.

tylerjereddy · 2025-02-09T01:14:47Z

tests/explainers/test_kernel.py

+    rf.fit(X, y)
+    explainer = shap.KernelExplainer(model=rf.predict_proba, data=X, random_state=seed)
+    shap_values = explainer(X)
+    np.testing.assert_allclose(shap_values.values.max(), 0.26548, rtol=1e-2)


The rtol needed for CI to be happy is rather high; conversely, the assertion here isn't really needed for the purpose of the test, which is mostly to fail if there is a fundamental logic issue in not_equal, like using i == j when all(i == j) is needed.

CloseChoice

LGTM, very nice.

CloseChoice requested changes Feb 8, 2025

tylerjereddy and others added 2 commits February 8, 2025 17:56

MAINT: PR 3983 revisions

d9e83d7

* Expand the handling of array/object types in `not_equal()` based on reviewer feedback. Add a related test that fails when using the naive `i == j` fallback.

[pre-commit.ci] auto fixes from pre-commit.com hooks

703335a

for more information, see https://pre-commit.ci

MAINT, TST: PR 3983 revisions

b3c8bb8

* Relax the stringency of the numeric closeness check in `test_explainer_non_number_dtype`, which was failing in CI.

CloseChoice approved these changes Feb 11, 2025

CloseChoice merged commit f1808f5 into shap:master Feb 11, 2025
18 checks passed

tylerjereddy deleted the treddy_issue_3943_close_speedup branch February 11, 2025 17:40
