Skip to content

Conversation

@niekdejonge
Copy link
Contributor

@niekdejonge niekdejonge commented Mar 22, 2024

Adding a module to MZMine that is able to make ms2deepscore (https://github.com/matchms/ms2deepscore) predictions.

  • Load model
  • Add test for making correct predictions for tensor
  • Implement embedding creation. (without directly calculating cosine)
  • Add method for tensorizing spectrum
  • Create flexible method for tensorizing spectrum, based on model settings (the additional metadata is still fixed)
  • Add test for correct tensorization
  • Add integration test for correct embedding generation from spectrum
  • Add dot product calculations to create similarity scores
  • Add module to MZMine to use ms2deepscore for molecular networking and library searching
  • Add a parameter to add the path to the ms2deepscore model (instead of default test model)
  • Add automatic download of ms2deepscore model (from zenodo?)
  • Add minSignals (maybe using SignalFiltersParameters)

@niekdejonge niekdejonge marked this pull request as draft March 22, 2024 12:43
@niekdejonge niekdejonge marked this pull request as ready for review April 3, 2024 14:56
@niekdejonge niekdejonge marked this pull request as draft April 3, 2024 14:56
@niekdejonge
Copy link
Contributor Author

@robinschmid Maybe a good moment for an intermediate code review.

It seems to all work pretty well. The model makes predictions and they are the same as the model in python!
Before I start adding a method for tensorizing spectra and creating a mzmine module, could you maybe do a quick code review to check if there is anything weird I am doing?

@robinschmid
Copy link
Member

I will look at it tomorrow. Sounds great so far.

robinschmid
robinschmid previously approved these changes Apr 4, 2024
Copy link
Member

@robinschmid robinschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great so far

@robinschmid
Copy link
Member

Mapping the SpectralTensors might be best done if the model can run in parallel natively but if that does not work you could do:

List<SpectralTensor> list =  ...
list.stream.parallel().map(tensor -> process to get new tensor).toList() 

This will give you a new list

@niekdejonge niekdejonge requested a review from robinschmid July 1, 2024 15:42
@niekdejonge
Copy link
Contributor Author

@robinschmid The current version has all basic functionality. MS/MS spectral networking (ms2deepscore) works. We can select a model file. I have added a test file to the resources/models so you can use this for testing (but the results will be meaningless). The larger models I tested myself also work fine. And visualization in interactive ion identity molecular networking is working.

Could you let me know if everything is implemented in a correct way and what steps we should still do?

public static final OptionalParameter<DirectoryParameter> downloadDirectory = new OptionalParameter<>(
new DirectoryParameter("Download model directory",
"The directory in which the model is stored",
Objects.requireNonNull(FileAndPathUtil.resolveInMzmineDir("models")).toString()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should set the default to the users/.mzmine/models folder. However, I was not able to check this, since it seems to be overwritten by the last set directory. (which I did not no how to reset) So please double check if this is correct. @robinschmid

public static final FileNameParameter ms2deepscoreModelFile = new FileNameParameter(
"MS2Deepscore model",
"The file location of the MS2Deepscore model, click download to download the model.",
FileSelectionType.OPEN, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal if this field is updated automatically when we set the download folder. Or at least that the last folder is updated to reflect the location of the downloaded files. Is something like that easily doable?

"The file location of the MS2Deepscore model, click download to download the model.",
FileSelectionType.OPEN, true);

public static final FileNameParameter ms2deepscoreSettingsFile = new FileNameParameter(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave the user now has a lot of options. Like where to store the downloaded models, or to use a different model etc. An alternative is to remove that option altogether and just always downloading the model to a predefined location. What do you think is most user friendly? @robinschmid

@niekdejonge
Copy link
Contributor Author

niekdejonge commented Jul 3, 2024

@robinschmid @ansgarkorf Everything is working now. There is an option to download a ms2deepscore model in the module, predictions are made and it can be visualized in the network overview. I left a few comments regarding steps that could be improved for setting the parameters, but I didn't know how to implement them.

One more thing, I did not build in any check that it can only run on LC/MSMS data. Could one of you add this maybe?

I will be on holidays for about a month. So feel free to make changes and/or merge the PR if you feel like it is ready for release.

@SteffenHeu
Copy link
Member

SteffenHeu commented Jul 11, 2024

Thank you for all the work @niekdejonge and enjoy your holiday! Did @robinschmid and you discuss why MS2Deepscore does not implement the SpectralSimilarityFunction interface so we can use it everywhere where we use cosine similarity?

@robinschmid
Copy link
Member

we have to check it. And see if we can come up with a good strategy for this. Not sure if the interface design is good.
Definitely need to check how we can combine it in one module in the end. Very different parameters needed for cosine / modified cosine than for MS2Deepscore.

@SteffenHeu
Copy link
Member

The input takes a parameter set, so i think that should not be a problem, or?

@robinschmid
Copy link
Member

MS2Deepscore works so differently that it does not really fit in. No mz tolerance, no minimum matched signals.

public abstract SpectralSimilarity getSimilarity(ParameterSet parameters, MZTolerance mzTol,
      int minMatch, DataPoint[] library, DataPoint[] query);

Best is to send a batch of spectral data to calculate the embeddings - then the similarity is a simple vector comparison by cosine sim. So the spectral alignment step is not needed. I think it might be good to have two interfaces somehow - one for vector based and one for spectral based.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants