Releases: jftuga/deidentification

v1.3.2

03 May 12:10
ab95888

What's Changed

  • Add fraction symbol normalization by @jftuga in #16
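One way fraction symbols can be normalized is sketched below. This is only an illustration using the standard library's `unicodedata` module, not the code from #16; the function name `normalize_fractions` is hypothetical.

```python
import unicodedata


def normalize_fractions(text: str) -> str:
    """Replace single-character fractions such as '½' with ASCII like '1/2'.

    Hypothetical helper: an assumption about what "fraction symbol
    normalization" could look like, not the project's implementation.
    """
    out = []
    for ch in text:
        # Unicode tags vulgar fractions with a "<fraction>" decomposition.
        if unicodedata.decomposition(ch).startswith("<fraction>"):
            # NFKC expands '½' to '1', FRACTION SLASH (U+2044), '2'.
            expanded = unicodedata.normalize("NFKC", ch)
            out.append(expanded.replace("\u2044", "/"))
        else:
            out.append(ch)
    return "".join(out)


print(normalize_fractions("add ½ cup and ¾ tsp"))
```

Only characters that Unicode marks as fractions are expanded, so the rest of the text is left untouched.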

Full Changelog: v1.3.1...v1.3.2

v1.3.1

24 Mar 12:23
6e5b7fe

What's Changed

  • Use single quotes inside a double-quoted f-string by @danielmlow in #14
  • add support for 'her' object pronoun by @jftuga in #15

New Contributors

Full Changelog: v1.3.0...v1.3.1

v1.3.0

10 Jan 21:06
72a5a7d

add --exclude option

Add ability to exclude entities from de-identification with -x, --exclude.

  • This uses a comma as the delimiter to allow for multiple entities.
  • Comma can be overridden by setting the DEIDENTIFY_EXCLUDE_DELIM environment variable.

The Python API can also use this option by setting DeidentificationConfig.excluded_entities to a Python set.
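The delimiter handling described above can be sketched roughly as follows. The helper name `parse_excluded_entities` is hypothetical and the whitespace stripping is an assumption; only the comma default, the DEIDENTIFY_EXCLUDE_DELIM variable, and the lower-casing come from these release notes.

```python
import os


def parse_excluded_entities(raw: str) -> set[str]:
    """Split an --exclude style string into a lower-cased set of entities.

    Hypothetical helper, not the project's code. Falls back to a comma when
    DEIDENTIFY_EXCLUDE_DELIM is unset or empty.
    """
    delim = os.environ.get("DEIDENTIFY_EXCLUDE_DELIM") or ","
    # Stripping surrounding whitespace is an assumption made for convenience.
    return {part.strip().lower() for part in raw.split(delim) if part.strip()}


print(parse_excluded_entities("Alice Smith, BOB"))
```

A set built this way has the same shape as the value the notes say DeidentificationConfig.excluded_entities expects.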


Improve Python API

  • reset all internal variables at the beginning of the deidentify method
  • lower-case all config.excluded_entities
  • added API testing with api_test.py

v1.2.1

04 Jan 22:55
1a3fd96

prepare for PyPI deployment

  • create and/or update files for PyPI
  • Created Makefile and get_project_name.py to deploy to test and prod PyPI servers
  • updated install instructions in README.md
  • set minimum Python version to 3.10

allow for multiple languages

  • allow for multiple languages in the future by making GENDER_PRONOUNS a dict which uses the DeidentificationLanguages Enum-style class as keys
  • moved helper classes to deidentification_constants.py to avoid a circular dependency
  • DeidentificationLanguages now maps the default DeidentificationConfig.replacement word to a language-specific noun, such as PERSON
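The language-keyed layout described above might look something like this sketch. The class name `Languages`, the pronoun entries, and the helper `default_replacement` are all assumptions for illustration; the real DeidentificationLanguages class and GENDER_PRONOUNS contents may differ.

```python
from enum import Enum


class Languages(Enum):
    """Hypothetical stand-in for DeidentificationLanguages.

    Each member's value is the default replacement noun for that language.
    """
    ENGLISH = "PERSON"


# Pronoun tables keyed by language; the entries here are illustrative only.
GENDER_PRONOUNS = {
    Languages.ENGLISH: {"he": "HE/SHE", "she": "HE/SHE", "him": "HIM/HER"},
}


def default_replacement(language: Languages) -> str:
    """Return the language-specific replacement noun."""
    return language.value


print(default_replacement(Languages.ENGLISH))
print(GENDER_PRONOUNS[Languages.ENGLISH]["she"])
```

Keying the pronoun table by language means a new language only needs a new enum member and a new dict entry, with no change to the lookup code.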

v1.2.0

03 Jan 22:13
26c0fa7

Model Download

  • When a spaCy model has not been downloaded, advise the user on how to manually download it.
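A minimal sketch of this kind of check is shown below. The function name `load_model_or_advise` and the message wording are hypothetical; the sketch relies only on the fact that `spacy.load` raises `OSError` when a model is not installed and that `python -m spacy download` is spaCy's download command.

```python
from typing import Any, Callable


def load_model_or_advise(model_name: str, loader: Callable[[str], Any]) -> Any:
    """Load a model with `loader`, or exit with manual download advice.

    Hypothetical helper, not the project's code. In real use, `loader`
    would be `spacy.load`, which raises OSError for a missing model.
    """
    try:
        return loader(model_name)
    except OSError:
        raise SystemExit(
            f"spaCy model '{model_name}' is not installed.\n"
            f"Download it manually with:\n"
            f"    python -m spacy download {model_name}"
        )
```

Passing the loader as a parameter keeps the sketch runnable without spaCy installed; a caller with spaCy available would use `load_model_or_advise("en_core_web_sm", spacy.load)`, where the model name is just an example.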

v1.1.2

03 Jan 03:19
afb0cd1

Small Bug Fixes

  • get_identified_elements() will now always return pronouns
    • Previously, if multiple passes were needed in deidentify(), get_identified_elements() did not return any pronouns.
  • use self.text instead of self.replaced_text in get_identified_elements()
  • Include small refinements to README.md

v1.1.0

02 Jan 13:41
b2d8f1e

CLI Improvements

  • added third-party VeryPrettyTable module as a dependency
  • documented the CLI program, deidentify, in README.md
  • added -t to the CLI to save detected entities to a JSON file
  • added -d for debug mode to the CLI
  • use the third-party chardet module to detect file character encodings for input files
  • updated Deidentification class to accommodate these CLI options

v1.0.0

02 Jan 01:40
1c10533

1.0.0