Flume
What is Flume?
• Is a distributed service for collecting, aggregating, and
moving large amounts of log and event data to a centralized data store
• Is an open-source Apache Software Foundation project
• Has the following features:
– Simple
– Reliable
– Fault tolerant
– Used for online analytic applications
Flume: Architecture
[Diagram: Web Server → Agent (Source → Channel → Sink) → HDFS]
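The source, channel, and sink of an agent are declared and wired together in a properties file. The following is a minimal sketch rather than a reference configuration: the agent name a1 and the component names r1, c1, and k1 are hypothetical, and a Sequence Generator source and Logger sink stand in for the web server and HDFS so the flow can be tried without external systems.

```properties
# Hypothetical agent "a1" with one source (r1), one channel (c1), one sink (k1).
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: generates a stream of counter events, handy for smoke tests.
a1.sources.r1.type = seq

# Channel: buffers events in memory between the source and the sink.
a1.channels.c1.type = memory

# Sink: logs each event instead of writing it to HDFS.
a1.sinks.k1.type = logger

# Wiring: a source may feed several channels (plural key),
# while a sink drains exactly one channel (singular key).
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```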
Flume Sources (Consume Events)
• Avro source
• Exec source
• Spooling Directory source
• Sequence Generator source
• Syslog source
• HTTP source
• Custom source
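Each source type is selected with a type property and then takes its own settings. The fragments below sketch three of the listed sources, assuming a hypothetical agent a1; the source names, directory, and ports are illustrative, and the lines that declare the sources and bind them to a channel are left out for brevity.

```properties
# Spooling Directory source: ingests files dropped into a directory.
# The directory path is an assumption for this example.
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /var/log/app/spool

# Avro source: listens for Avro events, for example from another agent's Avro sink.
a1.sources.avroIn.type = avro
a1.sources.avroIn.bind = 0.0.0.0
a1.sources.avroIn.port = 4141

# Syslog TCP source: accepts syslog messages over TCP.
a1.sources.sys.type = syslogtcp
a1.sources.sys.host = 0.0.0.0
a1.sources.sys.port = 5140
```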
Flume Channels (Hold Events)
• Memory channel
• JDBC channel
• File channel
• Custom channel
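The choice of channel is mostly a trade-off between speed and durability. As a hedged sketch, again assuming a hypothetical agent a1 with illustrative names, sizes, and paths: a memory channel is fast but loses buffered events if the agent stops, while a file channel persists events to local disk.

```properties
# Memory channel: events are held in RAM.
# capacity is the maximum number of buffered events;
# transactionCapacity is the maximum number of events per transaction.
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000
a1.channels.mem.transactionCapacity = 1000

# File channel: events are written to disk and survive an agent restart.
# Both directories are assumptions for this example.
a1.channels.disk.type = file
a1.channels.disk.checkpointDir = /var/flume/checkpoint
a1.channels.disk.dataDirs = /var/flume/data
```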
Flume Sinks (Deliver Events)
• HDFS sink
• Logger sink
• Avro sink
• IRC sink
• File Roll sink
• Null sink
• HBase sink
• AsyncHBaseSink
• ElasticSearchSink
• Custom sink
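Sinks follow the same pattern as sources: a type property plus type-specific settings. The fragments below sketch three of the listed sinks for a hypothetical agent a1; the sink names, directory, host name, and port are assumptions, and each sink would still need a channel binding to be usable.

```properties
# File Roll sink: writes events to files on the local file system.
a1.sinks.local.type = file_roll
a1.sinks.local.sink.directory = /var/flume/out

# Avro sink: forwards events to an Avro source on another agent.
# The host name is hypothetical.
a1.sinks.forward.type = avro
a1.sinks.forward.hostname = collector.example.com
a1.sinks.forward.port = 4141

# Null sink: discards every event it receives, useful for testing a flow.
a1.sinks.drop.type = null
```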
Configuring Flume
1. Create a configuration file (flume.conf).
2. Store the file in the flume-ng/conf directory.
3. Configure the individual components (sources, channels, and sinks) in the file.
4. (Optional) Edit flume-env.sh.
5. Verify the installation by running the following command:
$ flume-ng help
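Putting the steps together, a flume.conf for the web server to HDFS flow might look like the sketch below. It is an illustration under stated assumptions, not a tuned production setup: the agent name a1, the component names, the log path, the local directories, and the NameNode address are all hypothetical. Note that the Exec source gives no delivery guarantee if the agent fails, so it is used here only for simplicity.

```properties
# flume.conf: tail a web server log and deliver the lines to HDFS.
# One way to start the agent (the --name value must match the "a1" prefix):
#   flume-ng agent --conf conf --conf-file conf/flume.conf --name a1

a1.sources = weblog
a1.channels = disk
a1.sinks = hdfsOut

# Exec source: runs a command and turns each output line into an event.
# The log path is an assumption.
a1.sources.weblog.type = exec
a1.sources.weblog.command = tail -F /var/log/httpd/access_log
a1.sources.weblog.channels = disk

# File channel: buffers events on local disk.
a1.channels.disk.type = file
a1.channels.disk.checkpointDir = /var/flume/checkpoint
a1.channels.disk.dataDirs = /var/flume/data

# HDFS sink: writes plain-text files into one directory per day.
# The NameNode address is hypothetical.
a1.sinks.hdfsOut.type = hdfs
a1.sinks.hdfsOut.channel = disk
a1.sinks.hdfsOut.hdfs.path = hdfs://namenode:8020/flume/weblogs/%Y-%m-%d
a1.sinks.hdfsOut.hdfs.fileType = DataStream
a1.sinks.hdfsOut.hdfs.writeFormat = Text
# Use the agent's clock to resolve the date escapes in hdfs.path.
a1.sinks.hdfsOut.hdfs.useLocalTimeStamp = true
# Roll to a new file every 5 minutes; disable size- and count-based rolling.
a1.sinks.hdfsOut.hdfs.rollInterval = 300
a1.sinks.hdfsOut.hdfs.rollSize = 0
a1.sinks.hdfsOut.hdfs.rollCount = 0
```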