Reimplement NN ensemble using PyTorch#926
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage Diff (main vs. #926):

| | main | #926 | +/- |
|---|---|---|---|
| Coverage | 99.63% | 99.63% | |
| Files | 103 | 103 | |
| Lines | 8241 | 8242 | +1 |
| Hits | 8211 | 8212 | +1 |
| Misses | 30 | 30 | |
|
|
Selecting the PyTorch variant (CPU or CUDA x.y or ROCm or...) when setting up the development environment using

The problem is that

But fortunately, it is possible to have some degree of control over the resolution by setting up "extras" and then declaring a "conflict" between them. This causes uv to "fork" the resolution into different "branches", each having its own dependency tree.

So in commit e629963, I added two new extras:

The end result is that these two extras can be used to select the PyTorch variant at

Here are examples of how this works now:

1.
|
|
I refined the above solution by adding an Maybe not ideal, but it works. |
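After syncing with one of the extras, it can be useful to confirm which PyTorch build actually ended up in the environment. The following is a minimal sketch, not part of this PR. It assumes that the installed wheel carries a local version suffix such as `+cpu` or `+cu126`, which is typical for wheels from the PyTorch package indexes but not guaranteed. The helper names `torch_variant` and `installed_torch_variant` are hypothetical.

```python
from importlib import metadata


def torch_variant(version: str) -> str:
    """Return the local build tag of a PyTorch version string.

    Assumption: the build variant is encoded as a local version suffix,
    e.g. "2.7.0+cu126". A version without a suffix is reported as
    "unknown" because the variant cannot be read from the string alone.
    """
    _, sep, local = version.partition("+")
    return local if sep else "unknown"


def installed_torch_variant() -> str:
    """Look up the installed torch distribution without importing it."""
    try:
        return torch_variant(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print(installed_torch_variant())
```

Reading the distribution metadata avoids importing `torch` itself, so the check stays fast and also works when the package is missing.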
|
I ran benchmarks using the Annif-tutorial YSO-NLF dataset on the annif-data-kk server (it has 6 CPUs). The script used and the output data are in the benchmarking branch.
train
eval
Compared to the TensorFlow implementation, PyTorch requires twice as much memory in training and is slightly slower (107% in user time); in inference the situation is the opposite: PyTorch is faster (~98%) and takes less memory.
|
Thanks @juhoinkinen! The doubling of RAM usage is interesting. First hypothesis: maybe PT uses higher-precision floats than TF? I'll investigate.
…upports CUDA 12.6 and 13.0 + 13.2 but no longer 12.8); declare tqdm dependency
…gs to match reality
n_samples = len(dataset)
n_eval = min(self.EARLY_STOP_EVAL_ROWS, n_samples)
eval_indices = rng.choice(n_samples, size=n_eval, replace=False)
eval_inputs, eval_targets = dataset.get_subset(eval_indices.tolist())
criterion = nn.BCEWithLogitsLoss()
early_stopping = EarlyStopping(patience=self.EARLY_STOPPING_PATIENCE)

for epoch in range(max_epochs):
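The review context above shows an `EarlyStopping(patience=...)` object being created before the epoch loop, but not how it works internally. The following is a minimal sketch of patience-based early stopping on a higher-is-better score such as nDCG. The method name `should_stop` and the helper `epochs_run` are assumptions made for illustration and may not match the class in this PR.

```python
class EarlyStopping:
    """Stop once a higher-is-better score has failed to improve
    for `patience` consecutive checks (hypothetical interface)."""

    def __init__(self, patience: int = 2) -> None:
        self.patience = patience
        self.best_score = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, score: float) -> bool:
        if score > self.best_score:
            # New best score: remember it and reset the counter.
            self.best_score = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(scores, patience=2):
    """Return how many epochs run before stopping, given per-epoch scores."""
    stopper = EarlyStopping(patience=patience)
    for epoch, score in enumerate(scores, start=1):
        if stopper.should_stop(score):
            return epoch
    return len(scores)
```

With `patience=2`, training continues through one non-improving epoch and stops on the second in a row, so a single noisy evaluation does not end training prematurely.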
  except ImportError:
-     raise ValueError(
-         "Keras and TensorFlow not available, cannot use " + "nn_ensemble backend"
-     )
+     raise ValueError("PyTorch not available, cannot use nn_ensemble backend")
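The pattern in this diff, turning a missing optional dependency into a `ValueError` that names the backend, can be sketched in a self-contained form as follows. The helper `require_backend_module` is hypothetical and uses `importlib` so the example runs without PyTorch installed; the PR itself presumably wraps a plain `import` statement.

```python
import importlib


def require_backend_module(module_name: str, backend: str):
    """Import `module_name`, or raise ValueError naming the backend that needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package is caught here as well.
        raise ValueError(
            f"{module_name} not available, cannot use {backend} backend"
        ) from err
```

Chaining with `from err` keeps the original import failure in the traceback, which helps when the package is installed but broken rather than absent.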
|
I dropped the last two commits (related to the schemathesis test failures) from this PR branch and moved them into their own PR #941, to be merged first. |
|
|
Results of current implementation measured like before: train
eval
|
|
🚀🚀🚀

On Jun 10, 2026, at 9:03 AM, Juho Inkinen wrote:

Results of current implementation measured like before:

train

| | Before (main) -j1 | After (this PR) -j1 | Before (main) -j6 | After (this PR) -j6 |
|---|---|---|---|---|
| user time (seconds) | 2810.63 | 2815.48 | 2948.25 | 2973.60 |
| percent CPU | 106% | 107% | 571% | 576% |
| wall time | 44:26.96 | 44:35.79 | 8:45.21 | 8:50.06 |
| max RSS | 3_368_876 | 3_744_964 | 2_599_604 | 2_955_204 |
| model disk size (bytes) | 1_304_759_580 | 1_247_592_467 | (same as -j1) | (same as -j1) |

eval

| | Before (main) -j1 | After (this PR) -j1 | Before (main) -j6 | After (this PR) -j6 |
|---|---|---|---|---|
| user time | 475.29 | 474.49 | 485.92 | 494.97 |
| percent CPU | 99% | 99% | 498% | 505% |
| wall time | 7:58.65 | 7:57.95 | 1:38.66 | 1:38.75 |
| max RSS | 2_666_460 | 2_189_608 | 2_105_688 | 1_788_876 |
| nDCG | 0.4805 | 0.4774 | 0.4775 | 0.4755 |
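To make the before/after comparison easier to read, the relative changes can be computed directly from the figures quoted above. This is a small sketch that covers only the `-j1` runs and assumes that the first two value columns of each table are the before/after pair for `-j1`. The function name `relative_change` is chosen here for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the -j1 columns quoted above.
TRAIN_J1 = {
    "user time (s)": (2810.63, 2815.48),
    "max RSS": (3_368_876, 3_744_964),
    "model disk size (bytes)": (1_304_759_580, 1_247_592_467),
}
EVAL_J1 = {
    "user time (s)": (475.29, 474.49),
    "max RSS": (2_666_460, 2_189_608),
    "nDCG": (0.4805, 0.4774),
}

for phase, table in (("train", TRAIN_J1), ("eval", EVAL_J1)):
    for metric, (before, after) in table.items():
        print(f"{phase:5s} {metric:24s} {relative_change(before, after):+.1f}%")
```

Expressed this way, the user-time differences are well under one percent in both phases, while max RSS is the metric that moves most: up in training and down in evaluation.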
|
|
I think this is now good enough. Internal benchmarks show that with the Finto AI YSO models, results are on average better than with the old NN ensemble, though on some data sets they got worse. There is still another model variant (torch_nn_split) implemented as a prototype in the nn-ensemble-experiments repository. It is a bit more complex than this one and I know it works better on some data sets (esp. KOKO). Maybe it can be integrated with this backend as a variant in the future. |
|
Here are the results from a new test that @mfakaehler and I carried out. We used the classic NN ensemble based on Keras/TensorFlow with Annif 1.4.1 and, for the new PyTorch-based NN ensemble, Annif 1.5.0.dev0.
train (25,000 tocs)
The nn-train.mdb and nn-model.pt files are much smaller in the new PyTorch NN than in the classic NN. The runtime is also shorter (though this is, of course, partly due to the early stopping functionality). Note: the new PyTorch NN ensemble stopped after the 4th epoch due to the early stopping functionality. Message:
eval (40,885 tocs)
The new PyTorch NN is 2 minutes 30 seconds faster in evaluation than the classic NN. The difference in performance is small, but the new PyTorch NN scores slightly better than the classic NN.
index (40,885 tocs)
The classic NN requires 0.097 seconds per document. The new PyTorch NN requires 0.155 seconds per document. Thank you very much for this development @osma and best regards to Helsinki! |
This PR reimplements the NN ensemble using PyTorch instead of Keras/TensorFlow.
To test this, you will have to use `uv sync --group all --extra torch-cpu` or similar (see comments below).

Some notes about the implementation:

- … `top_k_categorical_accuracy`, but this was not easily available in PyTorch, so I switched to using the nDCG metric computed for a random subset (n=512 documents) of the given train set, and this metric is used for early stopping
- … `max-epochs` parameter), but tracks nDCG on a small sample (n=512) of the train set and stops when scores start to decline (with patience=2)
- … `all` for installing all extras (a substitute for `--all-extras`, which won't work anymore) as well as special extras for selecting the PyTorch variant. There are `torch-cu126` and `torch-cu130` extras for now (for CUDA 12.6 and 13.0, respectively), but I think the setup could quite easily be extended to other PyTorch variants such as CUDA 13.2, ROCm or Intel XPU, though obviously these would require more configuration in `pyproject.toml`.

Fixes #895
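The implementation notes describe computing nDCG on a random subset (n=512) of the train set as the early stopping metric. The following is a rough, dependency-free sketch of that idea, not the code in this PR. It assumes binary relevance (a label is either correct or not), uses `random.Random.sample` from the standard library as a stand-in for the NumPy sampling shown in the review, and all function names are hypothetical.

```python
import math
import random


def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(scores, gold):
    """nDCG for one document.

    scores: predicted score per label index
    gold:   set of correct label indices (binary relevance assumed)
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    gains = [1.0 if i in gold else 0.0 for i in ranked]
    ideal = dcg([1.0] * min(len(gold), len(ranked)))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def mean_ndcg_on_sample(all_scores, all_gold, sample_size=512, seed=0):
    """Average nDCG over a random sample of at most `sample_size` documents."""
    n_eval = min(sample_size, len(all_scores))
    if n_eval == 0:
        return 0.0
    rng = random.Random(seed)
    # Sample document indices without replacement.
    indices = rng.sample(range(len(all_scores)), n_eval)
    return sum(ndcg(all_scores[i], all_gold[i]) for i in indices) / n_eval
```

Evaluating on a fixed-size sample keeps the per-epoch cost bounded regardless of how large the train set is, at the price of a noisier score, which is one reason to pair it with a patience setting.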