@andreasprlic andreasprlic commented Dec 14, 2025

Summary:

This MR adds support for generating correct hgvs_p names for variants that affect selenocysteine positions.

Previously these variants yielded p.?; now it is possible to create names like NP_065184.2:p.Sec462Arg or NP_065184.2:p.U127del.

Details:

  1. Implemented selenocysteine auto-detection in RefTranscriptData.__init__:
  • After getting the protein accession, fetches the reference protein sequence
  • Checks whether it contains 'U' (selenocysteine)
  • If found, re-translates using TranslationTable.selenocysteine
  • Stores the translation table actually used in self.translation_table

  2. Updated VariantMapper.c_to_p to use the detected translation table:
  • Passes reference_data.translation_table to AltSeqBuilder so variant sequences use the correct table
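For reviewers who want the gist without opening the diff, the detection logic described above can be sketched roughly as follows. This is a self-contained illustration, not the code in this MR: the `TranslationTable` enum here is a hypothetical stand-in for the one named above, `translate` and `detect_translation_table` are made-up helper names, the codon table covers only the codons in the toy sequence, and it assumes the simplification that every in-frame TGA reads as 'U' under the selenocysteine table.

```python
from enum import Enum


class TranslationTable(Enum):
    """Hypothetical stand-in for the translation-table enum named in this MR."""
    standard = "standard"
    selenocysteine = "selenocysteine"


# Deliberately tiny codon table: only the codons used in the toy CDS below.
_CODONS = {"ATG": "M", "CGT": "R", "TGC": "C", "TGA": "*", "TAA": "*"}


def translate(cds, table):
    """Translate codon by codon; under the selenocysteine table, TGA reads as U."""
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        aa = _CODONS[codon]
        if codon == "TGA" and table is TranslationTable.selenocysteine:
            aa = "U"
        protein.append(aa)
    return "".join(protein)


def detect_translation_table(ref_protein_seq):
    """Pick the selenocysteine table if the reference protein contains 'U'."""
    if "U" in ref_protein_seq:
        return TranslationTable.selenocysteine
    return TranslationTable.standard


# Toy data (assumed, not taken from any real transcript).
cds = "ATGTGACGTTAA"
ref_protein = "MUR"

# Step 1: detect the table from the reference protein and re-translate.
table = detect_translation_table(ref_protein)
aa_seq = translate(cds, table)

# Step 2: the same table would then be handed to whatever builds the
# variant (alt) sequence, so ref and alt are translated consistently.
alt_cds = "ATGTGATGCTAA"  # CGT>TGC at the third codon
alt_aa_seq = translate(alt_cds, table)
```

The point of carrying the detected table forward is that the reference and variant proteins are then translated under the same rules; translating the variant with the standard table would turn the selenocysteine codon into a premature stop.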

Unfortunately, I also needed to address linting issues in the files, since I could not commit otherwise.

@reece reece left a comment

Nice work, as usual @andreasprlic. Tests pass for me and the code makes sense.

Please address the minor comment below. Sorry you got tangled up in the rollout of ruff.
