@joefutrelle
Contributor

This new command provides the ability to produce lists of bins on the backend based on filter parameters, and optionally to take action to move bins between datasets.

@joefutrelle joefutrelle added the enhancement New feature or request label Nov 25, 2024
@joefutrelle joefutrelle self-assigned this Nov 25, 2024
@joefutrelle joefutrelle modified the milestones: v4.3, 4.4 Nov 25, 2024
@mike-kaimika
Collaborator

@joefutrelle it looks like it does what it says it does from the help text, but I'd consider:

  1. Renaming it to something other than "listbins", because it has the potential to be destructive with the --remove-dataset argument. E.g., "managebins"
  2. Splitting the add and remove flags into a separate command

I could be persuaded against it, but with something named "list" I think of it as not destructive and that the --remove-dataset flag wouldn't actually physically delete something. You could interpret that as "find me these matching records, but remove this dataset from the list of returned results"

@joefutrelle
Contributor Author

This is on track to become a kind of "swiss army knife" for dealing with filtered lists of bins, and I think that approach might be easier to maintain than a set of commands all of which take all the filtering params. So I'm inclined to go for just renaming it to something that reflects this multi-purpose usage, like bintool (since it's used for things other than "managing" bins). If we go with 2 instead, we just need a common codebase for the filtering args and the impl of filtering so we're not repeating ourselves.
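
One way to get that common codebase, sketched here with plain `argparse` rather than Django's management framework: a single helper registers the filter options and a single function applies them, so each command only adds its own action flags. Apart from `--remove-dataset`, the option names (`--dataset`, `--instrument`, `--start`, `--end`, `--add-dataset`) and the helper names are assumptions for illustration, not the actual listbins interface, and bins are modeled as plain dicts.

```python
import argparse
from datetime import datetime


def add_filter_arguments(parser):
    """Register the filter options shared by every bin command (hypothetical names)."""
    parser.add_argument("--dataset", help="only bins in this dataset")
    parser.add_argument("--instrument", type=int, help="only bins from this instrument number")
    parser.add_argument("--start", type=datetime.fromisoformat, help="earliest sample time")
    parser.add_argument("--end", type=datetime.fromisoformat, help="latest sample time")


def filter_bins(bins, opts):
    """Apply the shared filters to an iterable of bin dicts; unset filters match everything."""
    for b in bins:
        if opts.dataset and opts.dataset not in b["datasets"]:
            continue
        if opts.instrument is not None and b["instrument"] != opts.instrument:
            continue
        if opts.start and b["timestamp"] < opts.start:
            continue
        if opts.end and b["timestamp"] > opts.end:
            continue
        yield b


def build_parser():
    parser = argparse.ArgumentParser(prog="bintool")
    add_filter_arguments(parser)
    # Action flags live outside the shared helper, so a read-only command can omit them.
    parser.add_argument("--add-dataset")
    parser.add_argument("--remove-dataset")
    return parser
```

With this split, either option works: a single multi-purpose command calls `build_parser()`, while separate list and modify commands would each call `add_filter_arguments` and `filter_bins` without duplicating the filtering logic.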

@mbrosnahan

A concern(?) is potential conflict with the normal accession process, which is based on polling the directories associated with a given time series.
If listbins were used to remove a time series association but the bin files remain in the directory associated with the removed time series, the association will be reestablished on the next accession.
The converse case is less obvious, i.e., where a time series association is made with a particular bin but its files are not present in the time series' data directories. I can experiment, but my guess is that unless the bin is present in the directories of a given time series, images from the bin will not be served through time-series associated links.

This tool is still really helpful and needed for some of our use cases, but it is important to spell out the intended behavior and if/how actions should be tied to the organization of data files.

@joefutrelle
Contributor Author

joefutrelle commented Nov 25, 2024

> If listbins were used to remove a time series association but bin files remain in the directory associated with the removed time series, association will be reestablished on next accession.

True, but why would an IFCB data manager put themselves in this situation?

Edit: not true, existing bins (ones that have an entry in the database) are skipped during accession.

@mike-kaimika
Collaborator

@joefutrelle I'm good with calling it bintools. Agreed on needing to pull out a common library/class if there were separate commands that all needed to do filtering

To cover the other scenarios mentioned, you could add additional flags, like is_deleted to indicate a soft delete and prevent accession from re-adding the bin on the next sync. And then the opposite, missing_files or something to that effect, to indicate that physical files are missing.

But those would likely be out of scope for this particular tool/command

@joefutrelle
Contributor Author

OK, awaiting additional feedback / discussion from @mbrosnahan (or others) before proceeding further.

@mbrosnahan

I tinkered a bit with the habon-ifcb instance to evaluate how listbins (now bintool) behaves and its potential for conflict with regular accession behavior.

Experiment 1:
Used listbins to assign a time series [NEW] to bins present in/associated with another time series [OLD]. Bin data is present in the [OLD] data directory but not in [NEW].
Result: Records of bins are visible on the [NEW] time series page but image data is not (no collage). Records and images are both visible/accessible via the OLD webpage.

Experiment 2:
Used listbins to remove [OLD] time series from bins assigned to [NEW] in experiment 1.
Result: Bins no longer visible or accessible in dataset OLD. This is true both via Web interface/timeline and when using listbins function to filter/query dataset OLD.

Experiment 3:
Used Dataset Management (Web UI) tool to repeat accession of [OLD].
Result: NEW bins added back to OLD. This is contrary to @joefutrelle's note above.

Experiment 4:
Used listbins to repeat remove-dataset of bins from OLD then confirmed not present via both listbins query and refresh of webpage. Next, used Web UI accession tool again but checked 'Newest data only'.
Result: NEW bins not added back to OLD. This is in agreement with @joefutrelle note.

Experiment 5:
Repeated Experiment 4 but used 'manage.py syncdataset --newest OLD'. Same result as Experiment 4. Bins not added back to time series.

Experiment 6:
Repeated Experiment 3 but used 'manage.py syncdataset OLD'. Same result as Experiment 3. Bins added back (listbins action overwritten by syncdataset tool).

@mbrosnahan

Is all of the above stable and as intended? If yes, we will move forward with an implementation of listbins/bintool that allows credentialed NHABON network contributors to share IFCB data via our WHOI-based ifcbdb instance. The data sharer app ingests outside IFCB operators' bins via AWS, checks integrity, then publishes through the habon-ifcb instance on operator-configured time-series endpoints.
Bins pushed by an outside operator will be logged, and the operator will have the ability to manage/reassign bins to different time series without direct handling/intervention by the habon-ifcb admin team.

Example scenario:
Outside operator BEN deploys IFCB with data sharer app and configures to stream results to habon-ifcb dashboard time-series 'BEN_ts1'. Data are pushed as created to habon-ifcb data system and published at https://habon-ifcb.whoi.edu/timeline?dataset=BEN_ts1
BEN redeploys IFCB at another location but mistakenly pushes data to BEN_ts1 when intended time series was BEN_ts2. Via data sharer web portal, BEN reassigns bins from BEN_ts1 to BEN_ts2. Data sharer app propagates these reassignments so that appropriate bins are only accessible via https://habon-ifcb.whoi.edu/timeline?dataset=BEN_ts2
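
The reassignment step in this scenario could be sketched roughly as follows, modeling each bin's dataset membership as a set. The `Bin` dataclass, the `reassign` helper, the cutoff time, and the bin pids are all hypothetical; this is not the data sharer app's or bintool's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Bin:
    pid: str
    timestamp: datetime
    datasets: set = field(default_factory=set)


def reassign(bins, old, new, start):
    """Move bins sampled at or after `start` from dataset `old` to dataset `new`."""
    moved = []
    for b in bins:
        if old in b.datasets and b.timestamp >= start:
            b.datasets.add(new)      # add first so the bin is never left without a dataset
            b.datasets.discard(old)
            moved.append(b.pid)
    return moved


# Hypothetical bins: the second was collected after the redeployment.
bins = [
    Bin("D20241101T000000_IFCB999", datetime(2024, 11, 1), {"BEN_ts1"}),
    Bin("D20241120T000000_IFCB999", datetime(2024, 11, 20), {"BEN_ts1"}),
]
moved = reassign(bins, "BEN_ts1", "BEN_ts2", start=datetime(2024, 11, 15))
```

Going by Experiments 1, 3, and 6 above, changing the association alone is not enough: images are served under BEN_ts2 only if the bin files are in its data directories, and a full sync of BEN_ts1 would re-add the bins if their files remain in BEN_ts1's directories.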

@joefutrelle
Contributor Author

Apologies for the misstatement above. Here's the actual logic that will re-add a bin when it already exists. Everything is working as intended; @mbrosnahan you're bringing up new use cases that aren't covered in the current design which is why you're seeing behavior you don't want.

    b, created = Bin.objects.get_or_create(pid=pid, defaults={
        'timestamp': timestamp,
        'sample_time': timestamp,
        'instrument': instrument,
        'skip': True,  # in case accession is interrupted
    })
    if not created:
        self.dataset.bins.add(b)
        continue

The current design is based on the idea that a dataset is the mechanism for associating directories with bins, not the bin object. If you want arbitrary combinations of bins, datasets, and directories, there would need to be a fundamental change to how we model and implement the location of the data for a bin, as well as the management tools around them.

I can see some ways to handle some of these scenarios without such a fundamental redesign. For instance you can have more than one dataset point to the same directory and then you can add/remove the bin from either or both datasets without breaking the link to the data or requiring syncing more than once.
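
A toy model of that behavior, in plain Python rather than the Django ORM, might look like the sketch below. The `Dataset` class, the `sync` function, the directory path, and the pids are stand-ins invented for illustration; the only thing taken from the real code is the rule that a full sync associates every bin it finds with the dataset being synced, whether or not the bin already exists.

```python
class Dataset:
    """Toy stand-in for a dataset: a name, the directories it polls, and its bins."""

    def __init__(self, name, directories):
        self.name = name
        self.directories = list(directories)
        self.bins = set()


def sync(dataset, files_by_directory, known_bins):
    """Full sync: every bin found in the dataset's directories ends up in the dataset."""
    for directory in dataset.directories:
        for pid in files_by_directory.get(directory, []):
            # Like get_or_create: register the bin if it is new ...
            known_bins.add(pid)
            # ... and associate it with the dataset either way.
            dataset.bins.add(pid)


files = {"/data/old": ["bin1", "bin2"]}   # hypothetical directory and pids
known = set()

old = Dataset("OLD", ["/data/old"])
new = Dataset("NEW", ["/data/old"])       # second dataset polling the same directory

sync(old, files, known)
sync(new, files, known)

old.bins.discard("bin2")                  # remove-dataset: bin2 now belongs only to NEW
sync(old, files, known)                   # a later full sync of OLD re-adds bin2
```

Because both datasets poll the same directory, the bin's data stays reachable from whichever dataset it belongs to. The last line shows the caveat: a full sync of OLD still restores any association that was removed while the files remain in that directory.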

@joefutrelle
Contributor Author

I'm inclined to merge this, since it's working as intended, and work on new features and capabilities in another PR.

@mbrosnahan

Thanks, Joe and Mike, for discussion here. Wonderful that listbins add/remove dataset functions are available now!

@joefutrelle joefutrelle merged commit 2e68ff5 into master Dec 3, 2024
@joefutrelle joefutrelle deleted the listbins branch December 3, 2024 13:50
