Skip to content

Conversation

@wchu207
Copy link
Contributor

@wchu207 wchu207 commented Feb 12, 2025

3 main parts to this:

  1. RIType for features, calculated by RICalculationModule
  2. RI added DBEntryField, uses RIRecord type to handle different GC column types
  3. Additions to spectral library search, to optionally use the retention indices

@robinschmid
Copy link
Member

Thanks for your pull request. I will take a look at it and try to provide feedback.

Are you already using this module / custom mzmine version in your work?
Do you have some example files and maybe a batch file that you can share with us?

@wchu207
Copy link
Contributor Author

wchu207 commented Feb 13, 2025

Thanks for your pull request. I will take a look at it and try to provide feedback.

Are you already using this module / custom mzmine version in your work? Do you have some example files and maybe a batch file that you can share with us?

I don't think I can provide you with the scans of the breath samples since that's medical information, but here's the batch file & reference files used to construct the retention index scales.

RI batch file.zip



// This holds information from the RI field in spectral libraries for gas chromatography
// The RI field is of form "s=RI/CI/n_samples n=RI/CI/n_samples p=RI/CI/n_samples" (CI denotes confidence interval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this imported from the NIST RI library?
Which file is this specifically?

I know they provide much more information about each individual method so maybe we can load more info and turn this into something like an RILibrary that can help any other compound annotation module to also match RI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, this is custom-made. When I export MSP files from NIST, this is all I can get. In MSSearch this is under the field "Experimental RI median (plus-minus) deviation"

*/
public class RICalculationParameters extends SimpleParameterSet {

public static final FeatureListsParameter featureLists = new FeatureListsParameter("Feature lists", 1, Integer.MAX_VALUE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use the default constructor new FeatureListParameter and get the same behavior

@wchu207 wchu207 requested a review from robinschmid March 6, 2025 17:01
@SteffenHeu
Copy link
Member

Hi @wchu207, Robin is on a conference this week, so he will look into this PR again afterwards. In the meantime: We have added a parser for the time in this PR: #2430. Would be great if you could check if it works for this PR and swap to it.

* tolerance given in constructor) are in minutes in methods such as getToleranceRange or
* checkWithinTolerance
*/
public class RITolerance {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the javadoc I changed the text

@robinschmid
Copy link
Member

Hi @wchu207 sorry for the delay. We had some big features in the pipeline.

I think the PR can be merged after these two items:

  1. I made a PR to your branch applying the formatter, license header, and simplifying the RIRecord and harmonizing the types
  2. Can you write a short documentation page for this that can be linked from the ParameterSet class in its constructor: You can use this template or look at other modules: https://github.com/mzmine/mzmine_documentation/blob/master/docs/contribute_docu_template.md

You can find examples in the documentation:
https://mzmine.github.io/mzmine_documentation/module_docs/id_lipid_annotation/lipid-annotation.html

Simplify RIRecord and style, header, type adjustments
@wchu207
Copy link
Contributor Author

wchu207 commented Apr 7, 2025

I've accepted the pull request. Somehow this has broken RIMaxType, RIMinType, RIDiffType. I suspect that the way GCAlignerModule evaluates row bindings is involved somehow, so I'll look into that.

I'll work on the documentation after I fix that, hopefully tomorrow.

@wchu207
Copy link
Contributor Author

wchu207 commented Apr 7, 2025

I've identified the bug. RICalculationModule was modified to add the summary row types via outputList.addRowBinding(new Type().createDefaultRowBindings()) instead of adding them as feature types. Makes sense. Except GCAlignerModule creates a wholly new feature list and naively adds them as row types, omitting the feature type listeners.

I don't have an elegant fix for this. Using the summary types as feature types would be redundant and clutter things. Modifying GCAlignerModule to add the bindings would lead to the same bindings being added once per pre-existing list. We can't check if the bindings are already present since they're objects unique to each pre-existing list. One compromise would be to check if the type has been added, and if not, always add the listeners for that type, but that would mean always adding listeners and presumably there's a reason we don't do that in addRowType

@robinschmid
Copy link
Member

Oh you are right. My fault.
This is something that we never did before. Although binding a row type to another feature type should be possible somehow.

For now you can just add them as feature types again and this will resolve it. They are only added if the user requests that so no big issue

@wchu207
Copy link
Contributor Author

wchu207 commented Apr 23, 2025

Oh I just checked in and saw you requested changes wrt a javadoc. Fixed.

Also, I noticed you changed the method for selecting the scale in RICalculationTask to a search for the nearest one. The previous method's purpose wasn't obvious -- it would sort the list to newest-first, then find the first scale that was made before the sample was taken; i.e. find the nearest scale in the past.

We designed it this way because sometimes the column would degrade to the point of needing to be replaced, after which we would take new measurements for the straight-chain alkanes. This creates scales that are only applicable for runs that occur after the column was changed.

Do you mind if I revert that change? Or maybe include an option to switch between these modes? The latter option would also let me put in a description of this, which should help with clarity

@robinschmid
Copy link
Member

We just pushed a new release and I am about to merge your PR into mzmine and this will generate a dev release.
Therefore, I added some changes like more error messages and logs and wanted to notify you.
During this process I found that the module would silently run on a dataset where the samples were measured before the RI ladder. I know people should do it the opposite way but I thought it would be nice to handle this missing case by just using the closest ladder.

But you are right it makes sense to always use the previous and only in this special case just use the closest ladder.
I will change this.

@robinschmid
Copy link
Member

you can now also add the time to the file names and it will use it to decide which scale to use.
image

@robinschmid robinschmid merged commit 2733239 into mzmine:master Apr 24, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants