TRabl StreamProcessing
TRabl StreamProcessing
Tilmann Rabl
Berlin Big Data Center
www.dima.tu-berlin.de | bbdc.berlin | rabl@tu-berlin.de
1 © 2013 Berlin Big Data Center • All Rights Reserved © DIMA 2017
Agenda
Introduction to Streams
• Use cases
• Stream Processing 101
• Traffic Monitoring
– High event rates: millions events / sec
– High query rates: thousands queries / sec
– Queries: filtering, notifications, analytical
Source: http://theroadtochangeindia.wordpress.com/2011/01/13/better-roads/
1Cobb: http://www.hybridcars.com/tech-experts-put-the-brakes-on-autonomous-cars/
4
4 © DIMA 2017
Why is this hard?
7 © DIMA 2017
What is a Stream?
• Unbounded data
– Conceptually infinite, ever growing set of data items / events
– Practically continuous stream of data, which needs to be processed / analyzed
• Push model
– Data production and procession is controlled by the source
– Publish / subscribe model
• Concept of time
– Often need to reason about when data is produced and when processed data should be
output
– Time agnostic, processing time, ingestion time, event time
This part is largely based on Tyler Akidau‘s great blog on streaming - https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
8
8 © DIMA 2017
Stream Models
S = si, si+1, … si = <data item, timestamp>
• Turnstile
– Elements can come and go
– Underlying model is a vector of elements (domain)
– si is an update (increment or decrement) to a vector element
– Traditional database model
– Flexible model for algorithms
• Cash register
– Similar to turnstile, but elements cannot leave
• Time series
– si is is a new vector entry
– Vector is increasing
– This is what all big stream processing engines use
9
9 © DIMA 2017
Event Time
• Event time
– Data item production time
• Ingestion time
– System time when data item is received
• Processing time
– System time when data item is processed
11
11 © DIMA 2017
Time Agnostic Processing II
12
12 © DIMA 2017
Approximate Processing
13
13 © DIMA 2017
Windows
• Fixed
– Also tumbling
• Sliding
– Also hopping
• Session
– Based on activity
14
14 © DIMA 2017
Processing Time Windows
15
15 © DIMA 2017
Event Time Windows
16
16 © DIMA 2017
Basic Stream Operators
• Windowed Aggregation
– E.g., average speed
– Sum of URL accesses
– Daily highscore
Aggregate
• Windowed Join
– Correlated observations in timeframe
– E.g., temperature in time
9
12
10
17
17 © DIMA 2017
Flink’s Windowing
• Windows can be any combination of (multiple) triggers & evictions
– Arbitrary tumbling, sliding, session, etc. windows can be constructed.
• Examples:
dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5)));
dataStream.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)));
18
18 © DIMA 2017
Example Analysis: Windowed Aggregation
(2) StockPrice(HDP, 23.8)
StockPrice(SPX, 2113.9)
(4) StockPrice(FTSE, 6931.7)
StockPrice(HDP, 25.2)
22 © DIMA 2017
8 Requirements of Big Streaming
• Keep the data moving • Integrate stored and streaming data
– Streaming architecture – Hybrid stream and batch
25
25 © DIMA 2017
Map Reduce
• Framework / programming model by Google
– Presented 2004 at OSDI'04
• Inspired by map and reduce functions in functional languages / MPI
– Second order functions
• Simple parallelization model for shared nothing architectures (“commodity hardware”)
• Apache Hadoop
– Open-source implementation
– Initiated at Yahoo
Map: Computation Reduce: Aggregation
For each input create list of output values Combine all intermediate values for one key
Example: Example:
For each word in a sentence emit a k/v pair Sum up all values for the same key
indicating one occurrence of the word (“Hello”,(“1”, “1”, “1”, “1”)) -> (“Hello”,(“4”))
(key, “hello world”) -> (“hello”,”1”), (“world”,”1”) Signature
Signature reduce (key, list(value)) -> list(value’)
map (key, value) -> list(key’, value’)
26
26 © DIMA 2017
MR Data Flow
k a a 1
k b MAP b 1
a 1 1 1 REDUCE a 3
k a a 1
b 1 1 b 2
k c MAP c 1 REDUCE
c 1 c 1
k e e 1
k a a 1 d 1 e 2
k d MAP REDUCE
d 1 e 1 1 d 1
k e e 1
27
27 © DIMA 2017
MR / Batch Processing
28
28 © DIMA 2017
MR / Batch Processing
29
29 © DIMA 2017
MR / Batch Window Processing
30
30 © DIMA 2017
MR Discussion
31
31 © DIMA 2017
How to keep data moving?
Discretized Streams (mini-batch)
Stream
discretizer
while (true) { Job Job Job Job
// get next few records
// issue batch computation
}
Native streaming
Long-standing
while (true) { operators
// process next record
}
32
32 © DIMA 2017
Discussion of Mini-Batch
• Easy to implement
• Easy consistency and fault-tolerance
• Hard to do event time and sessions
35
35 © DIMA 2017
Handle Imperfections – Watermarks
• Data items arrive early, on-time, or late
• Solution: Watermarks
– Perfect or heuristic measure on when window is complete
3737
37 © DIMA 2017
Lessons Learned from Batch
batch-2 batch-1
© DIMA 201738
38
38
Taking Snapshots – the naïve way
t1 t2
execution snapshots
Initial approach (e.g., Naiad)
• Pause execution on t1,t2,..
• Collect state
• Restore execution
© DIMA 201739
39
39
Asynchronous Snapshots in Flink
snapshotting snapshotting
t1 t2
Propagating markers/barriers
snap - t1 snap - t2
[Carbone et. al. 2015] “Lightweight Asynchronous Snapshots for Distributed Dataflows”, Tech. Report. http://arxiv.org/abs/1506.08603
40
40 © DIMA 2017
Automatic partitioning and scaling
• 3 Types of Parallelization
41
41 © DIMA 2017
Big Data Streaming Systems
42 © DIMA 2017
Streaming Systems Overview
Closed Source Open Source
Cloud DataFlow
(BigTable)
Naiad
StreamInsights Streaming
Systems
InfoSphere
Stream
Processing Esper
Language (SPL) Academia Aurora
AWS
NiagaraCQ
Kinesis
CQL
43
43 © DIMA 2017
Closed Source/Commercial Systems
Cloud DataFlow: • Unified primitives for batch and stream processing
• Runs in Google‘s cloud only
• Open Source SDK (programs can run on other systems)
• Check out the Apache Beam Project! (http://beam.apache.org/)
44
44 © DIMA 2017
Open Source Systems by Apache (1/2)
• Reliable handling of huge numbers of concurrent reads and writes
• Can be used as data-source / data-sink for Storm, Samza, Flink, Spark and many more systems
• Fault tolerant: Messages are persisted on disk and replicated within the cluster. Messages (reads and
writes) can be repeated
45
45 © DIMA 2017
Open Source Systems by Apache (2/2)
Spark implements a batch execution engine
• The execution of a job graph is done in stages
• Operator outputs are materialized in memory (or disk) until the consuming operator is
ready to consume the materialized data
Spark uses Discretized Streams (D-Streams)
• Streams are interpreted as a series of deterministic batch-processing jobs
• Micro batches have a fixed granularity
• All windows defined in queries must be multiples of this granularity
Aurora
• First design and implementation that parallelizes stream computation including rich operation and windowing semantics
• Windows are always specified as ranges on some measure
• Was continued in Borealis Project
NiagaraCQ
• Focuses more on scalability than on the flexibility
• Provides various optimizations techniques to share common computation within and across queries
• Only time-based windows are possible
CQL
• Continuous query language
• Implemented by the STREAM DSMS at Stanford
• Captures a wide range of streaming application in an SQL-like query language
47
47 © DIMA 2017
Cloud-Based Streaming Systems (example)
48
48 © DIMA 2017
Storm, Spark Streaming, and Flink
49 © DIMA 2017
Big Data Analytics Ecosystem
Hive Cascading Giraph
Applications &
Languages Mahout Pig Crunch
MapReduce Flink
Data processing
engines Spark Storm Tez
52
52 © DIMA 2017
Storm Bolt Example
Source: https://storm.apache.org/documentation/Tutorial.html
53
53 © DIMA 2017
Building a Storm Topology
1) Use the TopologyBuilder class to connect spouts and bolts:
builder.setSpout(“name”,new MySpout());
builder.setBolt(“name”,new MyBolt());
54
54 Source: Allen et al., Storm Applied: Strategies for Real-Time Event Processing © DIMA 2017
Storm – Trident
• High-level abstraction built on top of Storm core:
– operators like filter, join, groupBy, ...
• Stream-oriented API + UDFs
• Stateful, incremental processing
• Micro-Batch oriented (ordered & partitionable)
• Exactly-once semantics
• Trident topology compiled into spouts and bolt
55
55 © DIMA 2017
Storm – Heron
• New real-time streaming system based on Storm
• Introduced June 2015 by Twitter (SIGMOD)
• Fully compatible with Storm API
• Container-based implementation
• Back pressure mechanism
• Easy debugging of heron topologies through UI
• better performance than Storm (latency + throughput)
56
56 © DIMA 2017
Apache Spark
• In memory abstraction for big data processing
– Resilient Distributed Data Sets
– Fault-tolerance through lineage
– Richt APIs for all kind of processing
Client
Loop outside the system, in
Step Step Step Step Step driver program
57
57 © DIMA 2017
Spark Job
58
58 © DIMA 2017
Spark Streaming
59
59 © DIMA 2017
Apache Flink
Apache Flink is an open source platform for scalable batch and stream data processing.
http://flink.apache.org
60
60 © DIMA 2017
Architecture
• Hybrid MapReduce and MPP database runtime
• Pipelined/Streaming engine
– Complete DAG deployed
Worker 1 Worker 2
Job Manager
Worker 3 Worker 4
61
61 © DIMA 2017
Built-in vs. driver-based looping
Client
Loop outside the system,
in driver program
Step Step Step Step Step
Iterative program looks
Client like many independent
jobs
Step Step Step Step Step
63 63
63 © DIMA 2017
Some Benchmark Results
Initially performed by Yahoo!
Engineering, Dec 16, 2015,
http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
https://data-artisans.com/extending-the-yahoo-streaming-benchmark/
64 © DIMA 2017
Processing Time vs Event Time
DEMO - STREAMING
„Inspired“ by
https://github.com/dataArtisans/oscon
65 © 2013 Berlin Big Data Center • All Rights Reserved
65 © DIMA 2017
Stream Optimizations
66 © DIMA 2017
Overview
• 11 Optimizations (numbered from 2 to 12 )
67
67 © DIMA 2017
Reordering and Elimination
68
68 © DIMA 2017
Operator Separation
69
69 © DIMA 2017
Fusion
70
70 © DIMA 2017
Fission
71
71 © DIMA 2017
Placement
72
72 © DIMA 2017
Load Balancing
73
73 © DIMA 2017
State Sharing
Chaining again…
Share memory among several operators instead of copying the data
74
74 © DIMA 2017
Batching
* Zaharia, Matei, et al. Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters. In: Proceedings of the
4th USENIX conference on Hot Topics in Cloud Ccomputing. USENIX Association, 2012. S. 10-10.
https://www.usenix.org/system/files/conference/hotcloud12/hotcloud12-final28.pdf
75
75 © DIMA 2017
Algorithm Selection & Load Shedding
76
76 © DIMA 2017
Cost Model
• Traditional cost-based query optimization is based on cardinality estimation
➟ inadequate for unbounded streams
• Possible solution: rate-based cost estimation
– (Viglas et al.: Rate-based query optimization for streaming information sources, SIGMOD 2002)
• Challenges:
– Fluctuating streams
– Data-parallel processing
78
78 © DIMA 2017
Thank You
Contact:
Tilmann Rabl
rabl@tu-berlin.de We are hiring!
79 © 2013 Berlin Big Data Center • All Rights Reserved © DIMA 2017
Further Reading
Historical papers on STREAM, Aurora, TelegraphCQ, Borealis, CQL, ...
• Papers and blogs on Storm, Heron, Flink, Spark Streaming, ...
• Alexandrov, Alexander, et al. The Stratosphere platform for big data analytics. The VLDB Journal-The International Journal on Very Large Data Bases, 2014, 23. Jg., Nr.
6, S. 939-964.
• Zaharia, Matei, et al. Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters. In: Proceedings of the 4th USENIX conference on
Hot Topics in Cloud Ccomputing. USENIX Association, 2012. S. 10-10.
• Murray, Derek G., et al. Naiad: a timely dataflow system. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013. S. 439-
455.
Windows & Semantics
• Ghanem et al.: Incremental Evaluation of Sliding-Window Queries over Data Streams, TKDE 19(1), 2007
• Tucker et al.: Exploiting Punctuation Semantics in Continuous Data Streams, TKDE 15(3), 2003
• Krämer et al.: Sematics and Implementation of Continuous Sliding Window Queries over Data Streams, TODS 34(1), 2009
• Botan et al.: SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems, VLDB 2010
CEP:
• Wu et al.: High-Performance Complex Event Processing over Streams, SIGMOD 2006
• Schultz-Moeller et al.: Distributed Complex Event Processing with Query Rewriting, DEBS 2009
Fault Tolerance:
• Hwang et al.: High-availability algorithms for distributed stream processing, ICDE 2005
• Zaharia et al.: Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters. HotCloud, 2012.
• Fernandez et al.: Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management, SIGMOD 2013
Partitioning & Optimization:
• Hirzel et al.: A Catalog of Stream Processing Optimizations, ACM Comp. Surveys 46(4), 2014.
• Gedik et al.: Elastic Scaling for Data Streams, TPDS 25(6), 2014.
• Viglas et al.: Rate-Based Query Optimization for Streaming Information Sources, SIGMOD 2002
80
80 © DIMA 2017