
imweijh/logstash-deduplication

 
 


Link to the post: https://wiadrodanych.pl/big-data/deduplikacja-zdarzen-logstash-redis/

I originally intended to borrow that method, but later found that Logstash's built-in fingerprint filter combined with the memcached filter can handle deduplication on its own:


  ### begin: dedupe with fingerprint + memcached
  fingerprint {
    source => [ "cjdate", "gddm", "zqdm", "wtid", "cjid" ]
    target => "[fingerprint]"
    concatenate_sources => true   # hash all source fields (names and values) as one concatenated string
    method => "SHA1"
  }

  memcached {
    id => "memcached-getfp"
    hosts => ["127.0.0.1:11211"]
    namespace => "dedupe"
    get => { "%{fingerprint}" => "[cjexists]" }
  }

  if [cjexists] {
    # memcached returned a value for this fingerprint, so the event is a
    # duplicate: tag it (you could also drop {} it outright).
    mutate { add_tag => [ "dedupe" ] }
  } else {
    memcached {
      id => "memcached-setfp"
      hosts => ["127.0.0.1:11211"]
      namespace => "dedupe"
      ttl => 28800                          # always set a ttl, or memcached memory grows unbounded; 28800 s = 8 h
      set => { "gddm" => "%{fingerprint}" } # which field is stored doesn't matter; here it's the shareholder code (gddm)
    }
  }
  ### end: dedupe with fingerprint + memcached
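
For comparison, the Redis-based approach from the linked post (the one this repo's ruby filter implements) can be sketched roughly as below. This is a minimal sketch, not the repo's exact code: the `dedupe:` key prefix, the TTL, and the use of `SET` with `NX`/`EX` are illustrative assumptions, and the redis gem must be installed into Logstash's JRuby environment.

  ruby {
    init => "
      require 'redis'
      @redis = Redis.new(host: '127.0.0.1', port: 6379)
    "
    code => "
      key = 'dedupe:' + event.get('fingerprint').to_s
      # SET with nx: true succeeds only for the first event carrying this key;
      # ex: gives the same 8-hour expiry as the memcached ttl above.
      first = @redis.set(key, 1, nx: true, ex: 28800)
      event.tag('dedupe') unless first
    "
  }

Either way the duplicate is only tagged; swap the tagging for event.cancel here (or drop {} in the memcached version) to discard duplicates instead.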
  

About

Deduplicating events in Logstash using ruby filter and Redis
