Add custom codecs for RDKit Molecules and Biotite AtomArrays #243

Merged: cwognum merged 3 commits into main from feat/custom-object-codecs on Jan 10, 2025

Conversation

@cwognum cwognum commented Jan 10, 2025

Collaborator

Changelogs

  • Add a custom object codec for RDKit Molecules and Biotite AtomArrays

Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix, chore, documentation or test (or ask a maintainer to do it for you).

This PR builds on the discussion in #241.

Custom numcodecs-compatible object codecs for Zarr seem like a natural interface point for documenting and implementing conversion to and from drug-discovery-specific formats.

You would use it like this:

import zarr
import datamol as dm
from polaris.dataset.zarr._codecs import RDKitMolCodec

root = zarr.open("test.zarr", mode="w")
root.empty("molecules", shape=100, dtype=object, object_codec=RDKitMolCodec(), chunks=(5,))
root["molecules"][0] = dm.to_mol("C1=CC=CC=C1")
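
For readers unfamiliar with object codecs, here is a minimal sketch of the underlying idea: a chunk of variable-length objects is flattened into one byte buffer on write and split back into objects on read. This is not the implementation from this PR. The class name `LengthPrefixedObjectCodec`, its `codec_id`, and the length-prefixed layout are all hypothetical, and the sketch uses only the standard library with strings standing in for molecules. A real numcodecs codec would subclass `numcodecs.abc.Codec` and work on NumPy object arrays, and an RDKit codec could plausibly serialize each item with `Mol.ToBinary()` and restore it with `Chem.Mol(binary)`, though that is an assumption here.

```python
import struct
from typing import Callable, Iterable, List


class LengthPrefixedObjectCodec:
    """Toy stand-in for a numcodecs-style object codec (hypothetical, stdlib only)."""

    codec_id = "toy_length_prefixed"  # hypothetical identifier, not a registered codec

    def __init__(
        self,
        to_bytes: Callable[[object], bytes],
        from_bytes: Callable[[bytes], object],
    ):
        # Per-item serializers; for molecules these would wrap a binary format.
        self._to_bytes = to_bytes
        self._from_bytes = from_bytes

    def encode(self, items: Iterable[object]) -> bytes:
        """Pack a chunk of objects into a single byte buffer."""
        blobs = [self._to_bytes(item) for item in items]
        # Layout: <item count>, then <length><payload> for every item.
        parts = [struct.pack("<I", len(blobs))]
        for blob in blobs:
            parts.append(struct.pack("<I", len(blob)))
            parts.append(blob)
        return b"".join(parts)

    def decode(self, buf: bytes) -> List[object]:
        """Split a buffer produced by encode() back into objects."""
        view = memoryview(buf)
        (count,) = struct.unpack_from("<I", view, 0)
        offset = 4
        items = []
        for _ in range(count):
            (length,) = struct.unpack_from("<I", view, offset)
            offset += 4
            items.append(self._from_bytes(bytes(view[offset:offset + length])))
            offset += length
        return items


# Stand-in serializers: SMILES strings encoded as UTF-8 instead of real molecules.
codec = LengthPrefixedObjectCodec(
    to_bytes=lambda s: s.encode("utf-8"),
    from_bytes=lambda b: b.decode("utf-8"),
)
chunk = ["C1=CC=CC=C1", "CCO", ""]
encoded = codec.encode(chunk)
decoded = codec.decode(encoded)
```

The point of the design is that Zarr only ever sees opaque bytes per chunk, so the domain-specific conversion lives entirely in the two per-item callables.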

@cwognum cwognum added the feature label Jan 10, 2025

@jstlaurent jstlaurent left a comment

Contributor

Nice!

Comment thread polaris/dataset/zarr/_codecs.py Outdated
Comment thread polaris/dataset/zarr/_codecs.py Outdated
Comment thread polaris/dataset/zarr/_codecs.py
Comment thread polaris/dataset/zarr/_codecs.py
cwognum and others added 2 commits January 10, 2025 14:35
Co-authored-by: Julien St-Laurent <jstlaurent@users.noreply.github.com>

@Andrewq11 Andrewq11 left a comment

Contributor

I'm glad we went down that rabbit hole and figured this out. Kudos to you and @jstlaurent!

Just left a single comment but it looks good to me 🚀

Comment thread polaris/dataset/zarr/codecs.py
@cwognum cwognum merged commit ffca9da into main Jan 10, 2025
@cwognum cwognum deleted the feat/custom-object-codecs branch January 10, 2025 20:09
@cwognum cwognum self-assigned this Jan 10, 2025
Labels

feature: Annotates any PR that adds new features; used in the release process

3 participants