GitHub - bvader/elastic-pii

Introduction:

The prevalence of high-entropy logs in distributed systems has significantly raised the risk of PII (Personally Identifiable Information) leaking into logs, which can cause security and compliance issues. This two-part blog covers how to identify and manage that risk using Elasticsearch. We explore using NLP (Natural Language Processing) and pattern matching to detect, assess, and, where feasible, redact PII from logs being ingested into Elasticsearch.
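
To make the pattern-matching half of that idea concrete, here is a minimal sketch in plain Python. It is not the code from this repository and does not use Elasticsearch. The pattern set, the names `PII_PATTERNS`, `detect_pii`, and `redact_pii`, and the sample log line are all assumptions made for illustration. Real PII detection would need much broader and more carefully validated patterns.

```python
import re

# Hypothetical patterns for illustration only; they are deliberately loose
# and cover just three kinds of PII.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect_pii(message):
    """Return a dict of pattern name -> list of matches found in the message."""
    found = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(message)
        if matches:
            found[name] = matches
    return found


def redact_pii(message):
    """Replace every match with a placeholder such as <EMAIL>."""
    for name, pattern in PII_PATTERNS.items():
        message = pattern.sub(f"<{name}>", message)
    return message


if __name__ == "__main__":
    # Made-up log line, not taken from any real system.
    log = "login failed for jane.doe@example.com from 10.1.2.3 ssn 123-45-6789"
    print(detect_pii(log))
    print(redact_pii(log))
```

Pattern matching like this only catches PII with a predictable shape. Free-form entities such as person names or locations have no fixed format, which is where the NLP side of the approach comes in.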

In Part 1 of this blog, which is available on Elastic Observability Labs, we cover the following:

Review the techniques and tools available to manage PII in our logs
Understand the roles of NLP / NER (Named Entity Recognition) in PII detection
Build a composable processing pipeline to detect and assess PII
Sample logs and run them through the NER Model
Assess the results of the NER Model

In Part 2 of this blog (coming soon), we will cover the following:

Redact PII using NER and the redact processor
Apply field-level security to control access to the un-redacted data
Enhance the dashboards and alerts
Production considerations and scaling
How to run these processes on incoming or historical data

Repository contents (42 commits):

elastic
python
.gitignore
README.md
