
Week5 Lesson6

The document provides an overview of Spark Streaming, highlighting how it supports real-time analytics by converting data streams into discrete RDDs. It lists supported streaming data sources such as Kafka, Flume, and Twitter, and explains DStreams and sliding windows for processing the data. Key takeaways include that the same transformations used on batch RDDs can be applied to DStreams, and that sliding windows enable calculations over windows of time.


Spark Streaming

After this video you will be able to:

• Summarize how Spark reads streaming data
• List several sources of streaming data supported by Spark
• Describe Spark’s sliding windows
Spark Streaming

[Diagram: the Spark stack, with the Spark SQL, Spark Streaming, MLlib, and GraphX libraries running on top of Spark Core.]

• Scalable processing for real-time analytics
• Data streams converted to discrete RDDs
• Has APIs for Scala, Java, and Python
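
As a rough sketch of the setup in the Python API (assuming PySpark with the pyspark.streaming module; the application name and the 2-second batch interval are just example values matching the batch length used later in the lesson):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# A SparkContext plus a StreamingContext with a 2-second batch interval:
# incoming data is grouped into one RDD every 2 seconds.
sc = SparkContext(appName="StreamingSketch")
ssc = StreamingContext(sc, 2)

# DStreams and their transformations are defined next (see the examples
# below); only then is the streaming computation started:
# ssc.start()             # begin receiving and processing data
# ssc.awaitTermination()  # keep running until the stream is stopped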
Spark Streaming Sources
• Kafka
• Flume
• HDFS
• S3
• Twitter
• Socket
• …etc.
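
The socket and file-based sources can be attached straight from the core streaming API; Kafka, Flume, and Twitter need separate connector packages whose setup depends on the Spark version. A sketch, assuming the StreamingContext ssc from above and placeholder host, port, and directory values:

# TCP socket source: one DStream element per line of text received on the socket.
lines_from_socket = ssc.socketTextStream("localhost", 9999)

# File-based source: watches a directory (HDFS, S3, or local) and streams the
# contents of new files as they are added.
lines_from_files = ssc.textFileStream("hdfs:///user/spark/incoming")

# Kafka, Flume, and Twitter streams are created through helper classes in the
# corresponding spark-streaming-* connector packages (e.g. KafkaUtils).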
Creating and Processing DStreams

[Diagram: data arriving from a streaming source (elements 10, 9, 8, ..., 1) is discretized into a DStream, a sequence of RDDs; applying a transformation to the DStream yields another DStream of RDDs, and applying an action produces results.]
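
To make the transformation and action steps concrete, here is a minimal word-count sketch over the socket DStream created above (flatMap, map, and reduceByKey are the same operations used on batch RDDs; pprint is an output action that prints a sample of each batch's result):

# Transformations: applied to every RDD in the DStream, producing new DStreams.
words = lines_from_socket.flatMap(lambda line: line.split(" "))
pairs = words.map(lambda word: (word, 1))
word_counts = pairs.reduceByKey(lambda a, b: a + b)

# Output action: print the first elements of each batch's word counts.
word_counts.pprint()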
Creating and Processing DStreams

[Diagram: the same pipeline, now labeled with a batch length of 2 seconds, so each RDD in the DStream contains the data received during one 2-second batch.]
Creating and Processing DStreams

[Diagram: the same pipeline with a sliding window applied to the DStream (window size: 4, sliding interval: 2), so each action runs over a window of recent data and the window slides forward as new batches arrive.]
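
A sketch of the windowed version, reusing the pairs DStream from the word-count example. The slide's window size of 4 and sliding interval of 2 are read here as 4 seconds and 2 seconds (an assumption, since the units are not stated); with a 2-second batch length, each window then spans two RDDs:

# Sliding window: each result covers the last 4 seconds of data and is
# recomputed every 2 seconds; both values must be multiples of the batch length.
windowed_counts = pairs.window(4, 2).reduceByKey(lambda a, b: a + b)
windowed_counts.pprint()

# With all DStreams defined, start the streaming computation.
ssc.start()
ssc.awaitTermination()

The streaming API also offers windowed operations such as reduceByKeyAndWindow that combine the windowing and the reduce into a single step.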
Main Take-Aways
• Spark uses DStreams to make discrete RDDs from streaming data.
• The same transformations and calculations applied to batch RDDs can be applied to DStreams.
• DStreams can create a sliding window to perform calculations on a window of time.
