Data anonymizer from CSV/Database to CSV file. (more sources/outputs to come)
- Python 3.9
- poetry (optional)
- Clone the repository
- Install dependencies
poetry install
or
pip install -r requirements.txt
- Create a config.yml file with the configuration for source, output and rules to apply. (see Config) (example config)
- Run:
python -m anonymize -c config.yml
Sources (see sources.py)
List of supported databases.
source:
  type: db
  uri: postgres://postgres:pass@localhost:5432/postgres
  table: mydata

source:
  type: csv
  path: /path/to/data.csv
  separator: '|' # Optional, default is ','

Outputs (see outputs.py)
output:
  type: csv
  path: /path/to/output.csv
  separator: '|' # Optional, default is ','

Rules (see rules.py)
- The rules will validate the column name and the method, and then apply the method to the column
- If the column is not found in the source, it will be ignored.
- If a column is not found in the rules list, it will be kept as is.
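The behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in rules.py: the names `apply_rules`, `METHODS`, `destroy`, `mask_left` and `mask_right` are hypothetical, rows are assumed to be plain dictionaries, and the masking semantics (mask the first or last `n_chars` characters) are an assumption.

```python
from typing import Callable, Dict, List


def destroy(value: str, replace_with: str = "CONFIDENTIAL") -> str:
    """Replace the whole value with a fixed string."""
    return replace_with


def mask_left(value: str, n_chars: int, mask_char: str = "*") -> str:
    """Assumed semantics: mask the first n_chars characters."""
    n = min(n_chars, len(value))
    return mask_char * n + value[n:]


def mask_right(value: str, n_chars: int, mask_char: str = "*") -> str:
    """Assumed semantics: mask the last n_chars characters."""
    n = min(n_chars, len(value))
    return value[: len(value) - n] + mask_char * n


# Hypothetical registry mapping the `method` key of a rule to a function.
METHODS: Dict[str, Callable[..., str]] = {
    "destroy": destroy,
    "mask_left": mask_left,
    "mask_right": mask_right,
}


def apply_rules(rows: List[dict], rules: List[dict]) -> List[dict]:
    # Validate every rule before touching any data.
    for rule in rules:
        if "column" not in rule:
            raise ValueError("rule is missing a column name")
        if rule.get("method") not in METHODS:
            raise ValueError(f"unknown method: {rule.get('method')!r}")

    result = []
    for row in rows:
        new_row = dict(row)  # leave the input row untouched
        for rule in rules:
            column = rule["column"]
            if column not in new_row:
                continue  # column missing from the source: rule is ignored
            # Everything except column/method is passed on as a parameter.
            params = {k: v for k, v in rule.items() if k not in ("column", "method")}
            new_row[column] = METHODS[rule["method"]](new_row[column], **params)
        result.append(new_row)  # columns without a rule are kept as is
    return result
```

For example, `apply_rules([{"email": "a@b.com", "age": "30"}], [{"column": "email", "method": "destroy"}, {"column": "phone", "method": "destroy"}])` replaces `email`, keeps `age` unchanged and silently skips the rule for the missing `phone` column.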
Available algorithms are those provided by the hashlib module.
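As a rough sketch of what a salted hash over a column value could look like with hashlib: the function name `hash_value` is hypothetical, and prepending the salt to the value is an assumption, since the way the project combines salt and value is not stated here.

```python
import hashlib


def hash_value(value: str, algorithm: str = "md5", salt: str = "") -> str:
    """Return the hex digest of salt + value using a hashlib algorithm."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm)
    # Assumption: the salt is simply prepended to the value.
    digest.update((salt + value).encode("utf-8"))
    # Note: variable-length algorithms (shake_*) need a length argument
    # for hexdigest() and are not handled by this sketch.
    return digest.hexdigest()
```

The same input, algorithm and salt always give the same digest, so hashed columns can still be used to join or group records. Keep in mind that md5 is fast to brute-force, so a secret salt matters for low-entropy values such as card numbers.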
rules:
  - column: credit_card
    method: hash
    algorithm: md5
    salt: my_very_secret_salt

Available types are: email, firstname, lastname, fullname.
rules:
  - column: name
    method: fake
    faker_type: firstname

rules:
  - column: email
    method: mask_right
    n_chars: 5
    mask_char: x

rules:
  - column: birthdate
    method: mask_left
    n_chars: 4
    mask_char: "*"

The destroy name is inspired by postgresql_anonymizer
rules:
  - column: email
    method: destroy
    replace_with: "SOME VALUE" # Optional, default is "CONFIDENTIAL"

Shuffle letters and numbers separately (example: abc1.2!3 -> skM4.9!0)
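The example suggests that each letter is replaced by a random letter and each digit by a random digit, while punctuation stays in place. A minimal sketch under that assumption follows; the name `shuffle_value` and the optional `rng` parameter are hypothetical, and the project's actual implementation may differ.

```python
import random
import string
from typing import Optional


def shuffle_value(value: str, rng: Optional[random.Random] = None) -> str:
    """Replace ASCII letters with random letters and digits with random digits."""
    rng = rng or random.Random()
    out = []
    for ch in value:
        if ch.isascii() and ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        elif ch.isascii() and ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)  # separators and other characters are kept
    return "".join(out)
```

Because length and separators are preserved, the output keeps the overall shape of the input (for instance, the `@` and `.` of an email address stay in place). Passing a seeded `random.Random` makes the result reproducible, which can help in tests.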
rules:
  - column: email
    method: shuffle

- 🍴 Fork the repository
- ⬇️ Install dev dependencies:
poetry install --with=dev
or
pip install -r requirements-dev.txt
- 🌳 Create a branch
git checkout -b feature/my-feature
- 🔧 Make your changes
- ✅ Run formatting, linting and tests
poe all (see pyproject.toml)
- 🔃 Create a pull request
- Add database output
- Validation (especially for database sources)
- More rules (rounding, etc.)
- Destroy
- Shuffle