Reimplement NN ensemble using PyTorch #926

Merged
osma merged 53 commits into main from issue895-nn-ensemble-pytorch on Jun 10, 2026

@osma

@osma osma commented Jan 13, 2026

Member

This PR reimplements the NN ensemble using PyTorch instead of Keras/TensorFlow.

To test this, you will have to use uv sync --group all --extra torch-cpu or similar (see comments below).

Some notes about the implementation:

  • the neural network architecture has been radically simplified; it turned out that a much simpler model (separate linear models for each concept) gives better results than the old MLP-based model
  • the old code displayed top_k_categorical_accuracy, but this was not easily available in PyTorch, so I switched to the nDCG metric, computed for a random subset (n=512 documents) of the given train set; this metric is also used for early stopping
  • the progress bar shown during training now uses tqdm, so it looks a bit different from the Keras one; it is also displayed on stderr, not on stdout as the old one was
  • the code implements early stopping; it can train for up to 50 epochs (configurable with the max-epochs parameter), but it tracks nDCG on a small sample (n=512) of the train set and stops when scores start to decline (with patience=2)
  • the old code showed a detailed error message when model loading failed; I couldn't figure out (yet) how to do that with PyTorch models, but the model is stored with metadata (python version, torch version etc.) that may be helpful in implementing such an error message later on if it turns out to be necessary. In general, the models should be pretty much PyTorch-version-agnostic so there may not be a need for this.
  • This PR sets up a dependency group all for installing all extras (a substitute for --all-extras, which won't work anymore) as well as special extras for selecting the PyTorch variant. For now there are torch-cu126 and torch-cu130 extras (for CUDA 12.6 and 13.0, respectively), but I think the setup could quite easily be extended to other PyTorch variants such as CUDA 13.2, ROCm or Intel XPU, though obviously these would require more configuration in pyproject.toml.
  • This NN ensemble will not make use of a GPU anyway; the model is trained and inference is performed using CPU only. The model is so small that using GPU computation would not bring any practical benefit. But the infrastructure for GPU use is now in place for other PyTorch based backends such as EBM or XTransformer that would benefit from GPU computing.
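
The nDCG-on-a-sample metric mentioned in the notes above can be sketched roughly as follows. This is a minimal pure-Python illustration with binary relevance, not the code from this PR; the function names `dcg`, `ndcg` and `sampled_ndcg` and the data layout (a list of `(scores, gold)` pairs) are assumptions made for the example.

```python
import math
import random


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance values."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(scores, gold):
    """nDCG for one document (binary relevance).

    scores: dict mapping concept -> predicted score
    gold:   set of concepts that are correct for the document
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    gains = [1.0 if concept in gold else 0.0 for concept in ranked]
    ideal = dcg([1.0] * len(gold))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def sampled_ndcg(documents, n_eval=512, seed=0):
    """Mean nDCG over a random subset of at most n_eval documents."""
    rng = random.Random(seed)
    subset = rng.sample(documents, min(n_eval, len(documents)))
    if not subset:
        return 0.0
    return sum(ndcg(scores, gold) for scores, gold in subset) / len(subset)


# Hypothetical toy data: two documents scored against three concepts.
docs = [
    ({"a": 0.9, "b": 0.1, "c": 0.5}, {"a"}),  # correct concept ranked first
    ({"a": 0.9, "b": 0.1, "c": 0.5}, {"b"}),  # correct concept ranked last
]
print(sampled_ndcg(docs))
```

Evaluating only a fixed-size sample keeps the per-epoch cost constant regardless of the size of the train set, which is presumably why a small n was chosen.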

Fixes #895

@osma osma self-assigned this Jan 13, 2026
@osma osma force-pushed the issue895-nn-ensemble-pytorch branch from 5bdbf64 to d82a54a Compare January 13, 2026 11:40
@codecov

codecov Bot commented Jan 13, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.63%. Comparing base (14b7443) to head (e318f64).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #926   +/-   ##
=======================================
  Coverage   99.63%   99.63%           
=======================================
  Files         103      103           
  Lines        8241     8242    +1     
=======================================
+ Hits         8211     8212    +1     
  Misses         30       30           

@osma osma force-pushed the issue895-nn-ensemble-pytorch branch from d82a54a to da479eb Compare January 15, 2026 12:25
@osma

osma commented Jan 15, 2026

Member Author

Selecting the PyTorch variant (CPU or CUDA x.y or ROCm or...) when setting up the development environment with uv sync has been a headache, but I think I've found a workable solution. It's not super elegant, but at least it seems to work.

The problem is that uv sync wants to perform "universal resolution", that is, resolve all the transitive dependencies once and for all, then write the result into the uv.lock file. This can be parameterized by OS, Python version and some other factors, but not by anything that the user could set when running uv sync. Since different PyTorch variants have different dependencies (e.g. CUDA libraries), dependencies for each of them would have to be resolved separately.

But fortunately, it is possible to have some degree of control over the resolution by setting up "extras" and then declaring a "conflict" between them. This causes uv to "fork" the resolution into different "branches", each having their own dependency tree.

So in commit e629963, I added two new extras: torch-cpu (CPU only) and torch-cu128 (CUDA 12.8 GPU), and declared a conflict between them, i.e., you can't install both extras at the same time. (This will unfortunately cause --all-extras to stop working, which is a shame, since it means that lots of specific --extra parameters are needed in typical situations.) These extras are then tied to specific PyTorch package indexes and thus different variants of the torch package.

The end result is that these two extras can be used to select the PyTorch variant at uv sync time. The torch dependency is still also defined for the nn extra, without a specific index. This means that installing only the nn extra will install whatever is the default PyTorch variant (on Linux it is a CUDA variant).
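
For readers unfamiliar with uv's conflicting extras, the arrangement described above might look roughly like this in pyproject.toml. This is a simplified sketch, not the actual configuration from commit e629963: the index names `pytorch-cpu` and `pytorch-cu128` are made up for the example, version constraints are omitted, and the real nn extra most likely lists more packages than torch.

```toml
[project.optional-dependencies]
nn = ["torch"]            # no index pinned: resolves to the default PyTorch build
torch-cpu = ["torch"]
torch-cu128 = ["torch"]

[tool.uv]
# The two variant extras cannot be enabled together, so uv forks the resolution.
conflicts = [
    [
        { extra = "torch-cpu" },
        { extra = "torch-cu128" },
    ],
]

[tool.uv.sources]
# Each extra pulls torch from its own package index.
torch = [
    { index = "pytorch-cpu", extra = "torch-cpu" },
    { index = "pytorch-cu128", extra = "torch-cu128" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true
```

Marking the indexes as explicit means they are consulted only for packages that are pinned to them under tool.uv.sources, so other dependencies keep resolving from the default index.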

Here are examples of how this works now:

1. uv sync without extras

This installs 439MB of dependencies, no PyTorch.

$ uv sync
Resolved 212 packages in 1.71s
      Built annif @ file:///home/oisuomin/git/Annif
Prepared 1 package in 261ms
Uninstalled 1 package in 0.21ms
Installed 1 package in 0.50ms
 ~ annif==1.5.0.dev0 (from file:///home/oisuomin/git/Annif)

$ du -sh .venv
439M	.venv

2. uv sync with just the nn extra

This installs the default PyTorch CUDA variant, for a total of 2.2GB of dependencies.

$ uv sync --extra nn
Resolved 212 packages in 0.77ms
Installed 6 packages in 96ms
 + lmdb==1.7.5
 + mpmath==1.3.0
 + networkx==3.6.1
 + setuptools==80.9.0
 + sympy==1.14.0
 + torch==2.9.1

$ du -sh .venv
2.2G	.venv

3. uv sync with both nn and torch-cpu extras

This switches to the CPU-only variant of PyTorch. Dependencies are now only 1.2GB.

$ uv sync --extra nn --extra torch-cpu
Resolved 212 packages in 0.78ms
Uninstalled 1 package in 69ms
Installed 1 package in 93ms
 - torch==2.9.1
 + torch==2.9.1+cpu

$ du -sh .venv
1.2G	.venv

4. uv sync with both nn and torch-cu128 extras

This installs the PyTorch CUDA 12.8 variant and lots of nvidia-* library packages, for a whopping 7.0GB of dependencies. (I wonder why this isn't the same as the default PyTorch CUDA build that got installed in step 2 above?)

$ uv sync --extra nn --extra torch-cu128
Resolved 212 packages in 0.77ms
Uninstalled 1 package in 72ms
Installed 17 packages in 97ms
 + nvidia-cublas-cu12==12.8.4.1
 + nvidia-cuda-cupti-cu12==12.8.90
 + nvidia-cuda-nvrtc-cu12==12.8.93
 + nvidia-cuda-runtime-cu12==12.8.90
 + nvidia-cudnn-cu12==9.10.2.21
 + nvidia-cufft-cu12==11.3.3.83
 + nvidia-cufile-cu12==1.13.1.3
 + nvidia-curand-cu12==10.3.9.90
 + nvidia-cusolver-cu12==11.7.3.90
 + nvidia-cusparse-cu12==12.5.8.93
 + nvidia-cusparselt-cu12==0.7.1
 + nvidia-nccl-cu12==2.27.5
 + nvidia-nvjitlink-cu12==12.8.93
 + nvidia-nvshmem-cu12==3.3.20
 + nvidia-nvtx-cu12==12.8.90
 - torch==2.9.1+cpu
 + torch==2.9.1+cu128
 + triton==3.5.1

$ du -sh .venv
7.0G	.venv

@osma osma mentioned this pull request Jan 15, 2026
@osma

osma commented Jan 16, 2026

Member Author

I refined the above solution by adding an all dependency group (because --all-extras cannot be used anymore). Now a basic developer install with all CPU-only extra features can be installed with:

uv sync --group all --extra torch-cpu

Maybe not ideal, but it works.

@osma osma requested a review from juhoinkinen January 16, 2026 13:54
@osma osma added this to the 1.5 milestone Jan 16, 2026
@osma osma marked this pull request as ready for review January 16, 2026 15:54
@osma osma changed the title [WIP] Reimplement NN ensemble using PyTorch Reimplement NN ensemble using PyTorch Jan 16, 2026
@juhoinkinen

juhoinkinen commented Jan 22, 2026

Member

I ran benchmarks using the Annif-tutorial YSO-NLF dataset on the annif-data-kk server (it has 6 CPUs).

The script used and the output data are in the benchmarking branch.

train

| | Before (main) -j1 | After (this PR) -j1 | Before (main) -j6 | After (this PR) -j6 |
|---|---|---|---|---|
| user time (seconds) | 2810.63 | 3023.01 | 2948.25 | 3208.04 |
| percent CPU | 106% | 112% | 571% | 538% |
| wall time | 44:26.96 | 45:36.19 | 8:45.21 | 10:10.04 |
| max RSS | 3_368_876 | 7_076_980 | 2_599_604 | 6_764_364 |
| model disk size (bytes) | 1_304_759_580 | 1_131_495_858 | (same as -j1) | (same as -j1) |

eval

| | Before (main) -j1 | After (this PR) -j1 | Before (main) -j6 | After (this PR) -j6 |
|---|---|---|---|---|
| user time (seconds) | 475.29 | 471.15 | 485.92 | 473.70 |
| percent CPU | 99% | 99% | 498% | 507% |
| wall time | 7:58.65 | 7:53.83 | 1:38.66 | 1:34.24 |
| max RSS | 2_666_460 | 2_176_184 | 2_105_688 | 1_840_860 |
| nDCG | 0.4805 | 0.4750 | 0.4775 | 0.4691 |

Compared to the TensorFlow implementation, PyTorch requires about twice as much memory in training and is slightly slower (about 107% of the user time). In inference the situation is the opposite: PyTorch is slightly faster (about 98% of the user time) and uses less memory.
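
The percentages in this summary can be rechecked from the tables with a few lines of Python. The sketch below assumes that the eval columns follow the same order as the train table (main -j1, this PR -j1, main -j6, this PR -j6), which the CPU percentages suggest; the helper name `after_vs_before` is made up for the example.

```python
# Figures copied from the benchmark tables: (before on main, after with this PR).
train_user_time = {"-j1": (2810.63, 3023.01), "-j6": (2948.25, 3208.04)}
train_max_rss = {"-j1": (3_368_876, 7_076_980), "-j6": (2_599_604, 6_764_364)}
eval_user_time = {"-j1": (475.29, 471.15), "-j6": (485.92, 473.70)}


def after_vs_before(pairs):
    """Return the after/before ratio as a percentage for each parallelism setting."""
    return {jobs: 100.0 * after / before for jobs, (before, after) in pairs.items()}


for name, pairs in [
    ("train user time", train_user_time),
    ("train max RSS", train_max_rss),
    ("eval user time", eval_user_time),
]:
    for jobs, percent in after_vs_before(pairs).items():
        print(f"{name} {jobs}: {percent:.1f}% of main")
```

Under that assumption, training user time comes out at roughly 108% to 109% of main, training memory at a bit over 200%, and evaluation user time at roughly 97% to 99%.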

@osma

osma commented Jan 22, 2026

Member Author

Thanks @juhoinkinen ! The RAM usage doubling is interesting. First hypothesis: Maybe PT uses higher precision floats than TF? I'll investigate.

This comment was marked as outdated.

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Comment thread annif/backend/nn_ensemble.py
Comment thread annif/backend/nn_ensemble.py Outdated
Comment thread README.md Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Comment thread annif/backend/nn_ensemble.py Outdated
Comment thread annif/backend/nn_ensemble.py
Comment thread tests/test_backend_nn_ensemble.py
Comment thread README.md

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Comment on lines +374 to +377
n_samples = len(dataset)
n_eval = min(self.EARLY_STOP_EVAL_ROWS, n_samples)
eval_indices = rng.choice(n_samples, size=n_eval, replace=False)
eval_inputs, eval_targets = dataset.get_subset(eval_indices.tolist())
Comment on lines +386 to +389
criterion = nn.BCEWithLogitsLoss()
early_stopping = EarlyStopping(patience=self.EARLY_STOPPING_PATIENCE)

for epoch in range(max_epochs):
Comment thread annif/backend/__init__.py
Comment on lines 50 to +51
 except ImportError:
-    raise ValueError(
-        "Keras and TensorFlow not available, cannot use " + "nn_ensemble backend"
-    )
+    raise ValueError("PyTorch not available, cannot use nn_ensemble backend")
@osma

osma commented Jun 2, 2026

Member Author

I dropped the last two commits (related to the schemathesis test failures) from this PR branch and moved them into their own PR #941, to be merged first.

@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026


@osma osma requested a review from juhoinkinen June 2, 2026 14:12
@juhoinkinen

Member

Results of the current implementation, measured in the same way as before:

train

| | Before (main) -j1 | After (this PR) -j1 | Before (main) -j6 | After (this PR) -j6 |
|---|---|---|---|---|
| user time (seconds) | 2810.63 | 2815.48 | 2948.25 | 2973.60 |
| percent CPU | 106% | 107% | 571% | 576% |
| wall time | 44:26.96 | 44:35.79 | 8:45.21 | 8:50.06 |
| max RSS | 3_368_876 | 3_744_964 | 2_599_604 | 2_955_204 |
| model disk size (bytes) | 1_304_759_580 | 1_247_592_467 | (same as -j1) | (same as -j1) |

eval

| | Before (main) -j1 | After (this PR) -j1 | Before (main) -j6 | After (this PR) -j6 |
|---|---|---|---|---|
| user time (seconds) | 475.29 | 474.49 | 485.92 | 494.97 |
| percent CPU | 99% | 99% | 498% | 505% |
| wall time | 7:58.65 | 7:57.95 | 1:38.66 | 1:38.75 |
| max RSS | 2_666_460 | 2_189_608 | 2_105_688 | 1_788_876 |
| nDCG | 0.4805 | 0.4774 | 0.4775 | 0.4755 |

@mjsuhonos

mjsuhonos commented Jun 10, 2026 via email

Contributor

@osma

osma commented Jun 10, 2026

Member Author

I think this is now good enough. Internal benchmarks show that with the Finto AI YSO models, results are on average better than with the old NN ensemble, though on some data sets they got worse.

There is still another model variant (torch_nn_split) implemented as a prototype in the nn-ensemble-experiments repository. It is a bit more complex than this one and I know it works better on some data sets (esp. KOKO). Maybe it can be integrated with this backend as a variant in the future.

@osma osma merged commit f6030a3 into main Jun 10, 2026
15 checks passed
@osma osma deleted the issue895-nn-ensemble-pytorch branch June 10, 2026 10:03
@san-uh

san-uh commented Jun 12, 2026


Here are the results from a new test that @mfakaehler and I carried out, too.
This is a repeat of the test whose results we posted on 23 February 2026. For details of the test conditions (test case settings, single models, nn-ensemble parameters and technical settings), please see #926 (comment) above.

We used the classic NN ensemble based on Keras/TensorFlow with Annif 1.4.1 and, for the new NN ensemble based on PyTorch, Annif version 1.5.0.dev0.

train (25,000 tocs)

| | nn classic 25k Keras/TensorFlow (main) -j80 | nn new 25k PyTorch (this PR) -j80 |
|---|---|---|
| real time | 66m12,713s | 17m53,712s |
| model disk size | 1024 MB (nn-train.mdb), 2.6 GB (nn-model.keras) | 99 MB (nn-train.mdb), 8.8 MB (nn-model.pt) |

The nn-train.mdb and nn-model.pt files are much smaller in the new PyTorch neural network than in the classic neural network. The runtime is also shorter (though this is, of course, partly due to the early stopping functionality).

Note: The new PyTorch NN ensemble stopped after the 4th epoch because of the early stopping functionality. Message:
Backend nn_ensemble: Epoch 4/50: NDCG=0.9869
Backend nn_ensemble: Model no longer improving, using best epoch 2.
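
This log is what one would expect from early stopping with patience=2: the best score is seen at epoch 2, epochs 3 and 4 fail to improve on it, and training stops. The sketch below illustrates that logic only. The `EarlyStopping` class here is a hypothetical stand-in, not the class from annif/backend/nn_ensemble.py, and the per-epoch scores other than the last one are invented for the example.

```python
class EarlyStopping:
    """Track the best score and signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def step(self, epoch, score):
        """Record the score of an epoch; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_training(scores_per_epoch, max_epochs=50, patience=2):
    """Simulate a training loop; return (last epoch run, best epoch)."""
    stopper = EarlyStopping(patience)
    last_epoch = 0
    for epoch, score in enumerate(scores_per_epoch[:max_epochs], start=1):
        last_epoch = epoch
        if stopper.step(epoch, score):
            break
    return last_epoch, stopper.best_epoch


# Hypothetical nDCG values per epoch; only the value for epoch 4 is taken from the log.
print(run_training([0.9850, 0.9875, 0.9871, 0.9869]))
```

A real training loop would also keep a copy of the model weights from the best epoch and restore them when stopping, as the "using best epoch 2" message implies.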

eval (40,885 tocs)

| | nn classic 25k Keras/TensorFlow (main) -j80 | nn new 25k PyTorch (this PR) -j80 |
|---|---|---|
| real time | 12m53,328s | 10m23,170s |
| F1@5 (doc avg) | 0.3941 | 0.4075 |
| F1@10 (doc avg) | 0.2880 | 0.2946 |
| NDCG@10 | 0.6383 | 0.6613 |

The new nn PyTorch is 2 minutes 30 seconds faster in evaluation than nn classic.

There is only a small difference in quality, but the new PyTorch NN scores slightly better than the classic NN on all three metrics.

index (40,885 tocs)

| | nn classic 25k Keras/TensorFlow (main) -j80 | nn new 25k PyTorch (this PR) -j80 |
|---|---|---|
| real time | 66m12,713s | 105m58,835s |

The nn classic requires 0.097 seconds per document. The new nn PyTorch requires 0.155 seconds per document.
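
The per-document figures follow from dividing the real time by the number of documents. A small sketch of that arithmetic, with a hypothetical helper `parse_real_time` that accepts the comma decimal separator used in the timings above:

```python
import re


def parse_real_time(text):
    """Convert a shell `time` string such as '66m12,713s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


N_DOCS = 40885  # number of documents (tocs) indexed

for label, real in [("nn classic", "66m12,713s"), ("nn new PyTorch", "105m58,835s")]:
    per_doc = parse_real_time(real) / N_DOCS
    print(f"{label}: {per_doc:.3f} s per document")
```

This gives roughly 0.097 and 0.156 seconds per document, which matches the figures quoted above up to rounding.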

Thank you very much for this development @osma and best regards to Helsinki!
