bvader/elastic-pii

Introduction:

The prevalence of high-entropy logs in distributed systems has significantly raised the risk of PII (Personally Identifiable Information) seeping into our logs, which can result in security and compliance issues. This two-part blog covers how to identify and manage that risk using Elasticsearch. We will explore using NLP (Natural Language Processing) and pattern matching to detect, assess, and, where feasible, redact PII from logs that are being ingested into Elasticsearch.

In Part 1 of this blog, which can be found on Elastic Observability Labs, we will cover the following:

  • Review the techniques and tools we have available to manage PII in our logs
  • Understand the roles of NLP and NER (Named Entity Recognition) in PII detection
  • Build a composable processing pipeline to detect and assess PII
  • Sample logs and run them through the NER Model
  • Assess the results of the NER Model
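
To make the "detect and assess" steps above more concrete, here is a minimal Python sketch of the pattern-matching half of the approach. It is not the pipeline built in the blog: the pattern set, the function names `detect_pii` and `assess`, and the sample log lines are all made up for illustration, and real deployments would need broader, validated patterns alongside an NER model.

```python
import re

# Illustrative patterns only (assumptions, not the blog's actual pattern set).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(message):
    """Return a sorted list of PII labels whose pattern matches the message."""
    return sorted(
        label for label, pattern in PII_PATTERNS.items() if pattern.search(message)
    )


def assess(messages):
    """Count how many messages contain each kind of PII."""
    counts = {}
    for message in messages:
        for label in detect_pii(message):
            counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical sample log lines, invented for this sketch.
sample_logs = [
    "login ok user=jane.doe@example.com ip=10.1.2.3",
    "payment processed order=1234",
    "profile update ssn=123-45-6789",
]

if __name__ == "__main__":
    for line in sample_logs:
        print(detect_pii(line), line)
    print(assess(sample_logs))
```

Pattern matching like this works well for PII with a fixed shape (emails, IP addresses, ID numbers). Names, locations, and organizations have no fixed shape, which is where an NER model comes in.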

In Part 2 of this blog (Coming Soon), we will cover the following:

  • Redact PII using NER and the redact processor
  • Apply field-level security to control access to the un-redacted data
  • Enhance the dashboards and alerts
  • Production considerations and scaling
  • How to run these processes on incoming or historical data
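
As a rough preview of the redaction step listed above, the sketch below builds an ingest pipeline definition that uses the Elasticsearch redact processor with two built-in Grok patterns. It is an assumption-laden illustration, not the Part 2 pipeline: the description text and the `message` field name are hypothetical, and the NER-driven part of the redaction is not shown.

```python
import json

# Hypothetical pipeline body; the description and field name are assumptions.
redact_pipeline = {
    "description": "Illustrative PII redaction sketch (not the blog's pipeline)",
    "processors": [
        {
            "redact": {
                # Field holding the raw log line (assumed to be "message").
                "field": "message",
                # Grok patterns; matches are replaced by <client> and <email>.
                "patterns": [
                    "%{IP:client}",
                    "%{EMAILADDRESS:email}",
                ],
            }
        }
    ],
}

# Serialize to the JSON body that would be sent when creating the pipeline.
pipeline_json = json.dumps(redact_pipeline, indent=2)

if __name__ == "__main__":
    print(pipeline_json)
```

The printed JSON could be sent as the body of a `PUT _ingest/pipeline/<name>` request, where the pipeline name is yours to choose. Keeping the definition as plain data makes it easy to add or reorder processors as the pipeline grows.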
