Tags: luispedro/argNorm

v1.1.0

RLS Version 1.1.0

Improves NCBI mappings (particularly when data has been hAMRonized)

v1.0.0

RLS Version 1.0.0

Tag current release as version 1.0 to mark paper publication. No major
changes since v0.8

v0.8.0

RLS Version 0.8.0

BUG FIXES
- Fixed reference gene preprocessing for abricate's resfinder option: the previous version omitted part of the reference genes.

USER-FACING CHANGES

- Added columns to output with English names for AROs, drugs, and drug classes (`ARO_name`, `confers_resistance_to_names`, `resistance_to_drug_classes_names`)
- Added `--version` argument

v0.7.0

RLS Version 0.7.0

USER-FACING CHANGES

1. Add version in output
2. Mappings updated for CARD and ARO v4.0.0
3. Improved manual curation
4. Added `Cut_Off` column to outputs
5. Added `hamronization` as a subcommand to the CLI
6. Support amrfinderplus v4 alongside amrfinderplus v3
7. Better error messages if output file is missing

INTERNAL IMPROVEMENTS

1. Update `confers_resistance_to()` to use `regulates`, `part_of`, and `participates_in` ARO relationships
2. Updated to modern build systems (uv and pixi)
3. Add testing for Python 3.13

v0.6.0

RLS Version 0.6.0

The main change is GROOT support.

Full Changelog:

- argNorm supports the GROOT v1.1.2 ARG annotation tool: https://github.com/will-rowe/groot
- GROOT support is via the `GrootNormalizer` (for use in python scripts) and the `groot` tool parameter with the `groot-db`, `groot-core-db`, `groot-argannot`, `groot-card`, and `groot-resfinder` `db` parameters in the CLI.

Other
-----

- `__version__` attribute added to the package (accessible as `argnorm.__version__` or `argnorm.lib.__version__`)
- Use atomic writing for outputs (https://github.com/untitaker/python-atomicwrites/tree/master)
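
The idea behind atomic output writing can be sketched with the standard library alone. This is neither the python-atomicwrites API linked above nor argNorm's own code; `atomic_write_text` is a hypothetical helper that writes to a temporary file in the destination directory and then renames it into place, so a reader never sees a half-written output.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path so readers never observe a partial file.

    Hypothetical helper for illustration; not part of argNorm.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise os.replace cannot rename it in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # Rename over the destination; the old file (if any) is replaced.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process is interrupted before the rename, the previous output file (or its absence) is left untouched.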

funcscan integration
--------------------

- argNorm has been included as an nf-core module: https://nf-co.re/modules/argnorm/
- argNorm will also be available on the funcscan pipeline: nf-core/funcscan#410

DB harmonisation
----------------

- The SARG db link in `crude_db_harmonisation` was changed to https://raw.githubusercontent.com/xinehc/args_oap/a3e5cff4a6c09f81e4834cfd9a31e6ce7d678d71/src/args_oap/db/sarg.fasta because the old link (Galaxy instance, http://smile.hku.hk/SARGs) is down
- RGI outputs in `crude_db_harmonisation` are concatenated so that the frequencies of `perfect`, `strict`, and `loose` hits can be calculated from the concatenated file
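
Counting hit categories in a concatenated RGI table could look roughly like the sketch below. It assumes tab-separated output with a `Cut_Off` column; `cut_off_frequencies` is a hypothetical helper, not a function in `crude_db_harmonisation`, and the rows are toy data rather than real results.

```python
import csv
import io
from collections import Counter


def cut_off_frequencies(tsv_text):
    """Count rows per Cut_Off category (lowercased) in tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["Cut_Off"].lower() for row in reader)


# Toy rows standing in for concatenated RGI output.
example = (
    "ORF_ID\tCut_Off\tARO\n"
    "gene_a\tPerfect\t3000557\n"
    "gene_b\tStrict\t3007380\n"
    "gene_c\tLoose\t3000175\n"
    "gene_d\tLoose\t3004797\n"
)
counts = cut_off_frequencies(example)
```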

v0.5.0

RLS Version 0.5.0

Updated the categorization and improved manual curation.

USER-FACING CHANGES

Improved drug categorization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `drugs_to_drug_classes()` now also uses the 'has_part' ARO relationship to get drug classes for antibiotic mixtures. For an antibiotic mixture, the drug classes of the drugs linked through 'has_part' are returned rather than 'antibiotic mixture' (ARO:3000707).
- 'antibiotic mixture' will not be reported as a drug class, rather the individual antibiotic classes making up the antibiotic mixture will be reported.
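
The mixture handling described above can be illustrated on a toy ontology. The term names (`drug_a`, `class_x`, `mix_ab`, and so on) are invented for this sketch and are not ARO terms, and `drug_classes` is a hypothetical stand-in, not argNorm's `drugs_to_drug_classes()`.

```python
# Toy ontology: child -> parent ("is_a") and mixture -> components ("has_part").
IS_A = {
    "drug_a": "class_x",
    "drug_b": "class_y",
    "mix_ab": "antibiotic mixture",
    "class_x": "antibiotic molecule",
    "class_y": "antibiotic molecule",
    "antibiotic mixture": "antibiotic molecule",
}
HAS_PART = {"mix_ab": ["drug_a", "drug_b"]}


def drug_class(drug):
    """Return the ancestor that sits directly below 'antibiotic molecule'."""
    term = drug
    while IS_A[term] != "antibiotic molecule":
        term = IS_A[term]
    return term


def drug_classes(drug):
    """Resolve mixtures through has_part instead of reporting the mixture term."""
    if drug in HAS_PART:
        return sorted({drug_class(part) for part in HAS_PART[drug]})
    return [drug_class(drug)]
```

In this toy graph, following only `is_a` from `mix_ab` would yield 'antibiotic mixture', whereas going through `has_part` yields the classes of the component drugs.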

Improved curation
~~~~~~~~~~~~~~~~~

- **manual curation (argannot)**: `(Tet)tetH:EF460464:6286-7839:1554` was incorrectly annotated, due to a loose RGI hit, as ARO:3004797 (a beta-lactamase). It is now manually curated to ARO:3000175.
- **Improved curation**:
    - resfinder_curation: grdA_1_QJX10702 -> 3007380 & EstDL136_1_JN242251 -> 3000557
    - megares_curation: MEG_2865|Drugs|Phenicol|Chloramphenicol_hydrolase|ESTD -> 3000557

Bugfixes
~~~~~~~~

- `confers_resistance_to()` now gets drug information even if it is encoded at a higher level in the ARO. For example, OXA-19 previously returned only cephalosporin and penam, but now also returns oxacillin (from the AMR gene family).
- `drugs_to_drug_classes()` now returns only the immediate child of 'antibiotic molecule' as the drug class (previously this did not hold in certain corner cases).
- **inconsistent ARO versions** (deeparg, megares, resfinderfg & sarg curation): ARO:3004445 -> ARO:3005440. The ARO number for the RSA2 gene changed in the ARO, and the version of ARO bundled with argNorm was out of sync.
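
The OXA-19 fix amounts to collecting drug annotations from a term and from all of its ancestors. A minimal sketch on a toy graph follows; `gene_family` is a placeholder for the parent term, and `collect_resistance` is a hypothetical function, not argNorm's `confers_resistance_to()`.

```python
# Toy graph based on the OXA-19 example; the parent term name is a placeholder.
PARENTS = {"OXA-19": ["gene_family"], "gene_family": []}
CONFERS = {
    "OXA-19": {"cephalosporin", "penam"},
    "gene_family": {"oxacillin"},
}


def collect_resistance(term):
    """Collect drugs annotated on the term itself or on any ancestor."""
    drugs = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a term twice
        seen.add(current)
        drugs |= CONFERS.get(current, set())
        stack.extend(PARENTS.get(current, []))
    return sorted(drugs)
```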

INTERNAL CHANGES

- AROs were previously handled as integers in the `get_aro_mapping_table()` function, which caused problems for ARO numbers with leading zeros such as 'ARO:0010004'. AROs are now treated as strings so that leading zeros are preserved.
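
The leading-zero problem is easy to reproduce in isolation. `format_aro` below is a hypothetical helper used only to show the difference between integer and string handling.

```python
def format_aro(number):
    """Build an ARO identifier from whatever representation is passed in."""
    return f"ARO:{number}"


# Converting to int discards the leading zeros ...
as_int = format_aro(int("0010004"))
# ... while keeping the value as a string preserves them.
as_str = format_aro("0010004")
```

Here `as_int` is `ARO:10004`, which no longer matches the identifier `ARO:0010004` held in `as_str`.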

v0.4.0

RLS Version 0.4.0

Major changes:
- Bundle a specific version of ARO with the package instead of downloading it from the internet (ensures reproducibility)
- Add missing ARO mappings to manual curation.
- The command-line tool accepts database/tool names case-insensitively (by @sebastianLedzianowski)
- `lib.map_to_aro` returns `None` if there is no mapping (raises an exception if the name is missing)
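
Case-insensitive handling of tool names can be sketched as follows. The tuple is an illustrative subset only, and `canonical_tool_name` is a hypothetical helper rather than the code used by the argNorm CLI.

```python
# Illustrative subset of names; not the full list accepted by the CLI.
KNOWN_TOOLS = ("argsoap", "abricate", "deeparg", "resfinder", "amrfinderplus")


def canonical_tool_name(name):
    """Map user input such as 'DeepARG' to its lowercase canonical form."""
    key = name.strip().lower()
    if key not in KNOWN_TOOLS:
        raise ValueError(f"unsupported tool: {name!r}")
    return key
```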

v0.3.0

RLS Version 0.3.0

Full Changelog

HANDLING GENE CLUSTERS & REVERSE COMPLEMENTS IN RESFINDER
- Resfinder has gene clusters which can't be passed through RGI using 'contig' mode.
- Gene clusters were identified and were manually assigned ARO numbers.
- A separate file with manual curation for gene clusters and RCs was created, and their AROs were updated after concatenating RGI results and genes not in RGI results.
- 40 gene clusters present.
- 9 genes in reverse complement form also present.
- RC genes were manually curated.
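
For context, a sequence stored in reverse complement form reads as the gene only after it is reverse-complemented. The sketch below shows that operation for plain A/C/G/T sequences; it is a generic illustration, not the procedure used to curate the RC genes, and `reverse_complement` is a hypothetical helper.

```python
# Translation table for the four unambiguous nucleotides (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a quick sanity check when handling such entries.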

USING AMINO ACID FILE FOR ARGANNOT & RESFINDER RATHER THAN NUCLEOTIDE FILE
- ARG-ANNOT and Resfinder consist of coding sequences. Previously, the data was not handled properly because contig mode was used when passing coding sequences to RGI. Now, the amino acid versions of ARG-ANNOT & Resfinder are used with protein mode when running the database in RGI.
- ARG-ANNOT AA file is available online. Resfinder AA file is generated using biopython.
- One-to-many ARO mappings, such as NG_047831:101-955 to Erm(K) and almG in ARG-ANNOT, are eliminated because protein mode is used
- A total of 10 ARO mappings changed in ARG-ANNOT

ARGNORM.LIB: MAKING ARGNORM MORE USABLE AS A LIBRARY
- Introduce `argnorm.lib` module
- Users can import the `map_to_aro` function from `argnorm.lib`. The function takes a gene name as input, maps the gene to the ARO and returns a pronto term object with the ARO mapping.
- The `get_aro_mapping_table` function, previously within the BaseNormalizer class, has also been moved to `lib.py` to give users the ability to access the mapping tables being used for normalization.
- With the introduction of `lib.py`, users will be able to access core mapping utilities through `argnorm.lib`, drug categorization through `argnorm.drug_categorization`, and the traditional normalizers through `argnorm.normalizers`.

v0.2.0

RLS Version 0.2.0

ARO Mapping & Normalization

- Updated mappings and manual curation tables for latest RGI
- Hamronized ResFinderFG support
- Removed python syntax in output

Drug Categorization

- Improved drug categorization by using superclasses whenever direct drug categorization is not possible
- Added better column headings for drug categorization (confers_resistance_to and resistance_to_drug_class)

Testing

- Improved pytest testing
- Added integration tests

v0.1.0

RLS Version 0.1.0

- Added hamronized support for AMRFinderPlus
- Fixed ARO:nan issue (added manually curated mapping tables and integrated it with normalizers)
- Added drug categorization feature and integrated it with normalizers
- Added AMRFinderPlusNormalizer, ResFinderNormalizer
- Added specific smoke tests for ARGSOAPNormalizer, DeepARGNormalizer, AbricateNormalizer, AMRFinderPlusNormalizer and ResFinderNormalizer