● The interpreter is the first layer: Spark uses a modified Scala interpreter to interpret the code.
● Spark creates an operator graph when you enter your code in the Spark console.
● When we call an action on a Spark RDD, Spark submits the operator graph to the DAG Scheduler (see the sketch after this list).
● The DAG Scheduler divides the operators into stages of tasks. A stage contains tasks based on the partitions of the input data. The DAG Scheduler pipelines operators together; for example, consecutive map operators are scheduled in a single stage.
● The stages are passed on to the Task Scheduler, which launches tasks through the cluster manager. The Task Scheduler is unaware of the dependencies between stages.
● The workers execute the tasks on the slave nodes.
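To make the flow above concrete, here is a minimal sketch (assuming a local Spark installation and a hypothetical HDFS input path) in which the transformations only build up the operator graph, and the final action is what submits that graph to the DAG Scheduler:

```scala
import org.apache.spark.sql.SparkSession

object OperatorGraphSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("operator-graph-sketch")
      .master("local[*]")                       // illustrative local setup
      .getOrCreate()
    val sc = spark.sparkContext

    // Transformations are lazy: each call only adds a node to the operator graph.
    val lines  = sc.textFile("hdfs:///tmp/input.txt")   // hypothetical path
    val words  = lines.flatMap(_.split("\\s+"))
    val pairs  = words.map(w => (w, 1))
    val counts = pairs.reduceByKey(_ + _)

    // The action submits the operator graph to the DAG Scheduler, which splits
    // it into stages and hands the resulting tasks to the Task Scheduler.
    counts.take(10).foreach(println)

    spark.stop()
  }
}
```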
A DAG is a finite directed graph with no directed cycles: there are finitely many vertices and edges, each edge is directed from one vertex to another, and the vertices can be arranged in a sequence such that every edge points from an earlier vertex to a later one. The DAG model is a strict generalization of the MapReduce model, and DAG-based execution can perform better global optimization than systems like MapReduce. The benefit of the DAG becomes clearer in more complex jobs.
Apache Spark's DAG visualization allows the user to dive into any stage and expand its details. In the stage view, the details of all RDDs belonging to that stage are shown. The scheduler splits the Spark RDD into stages based on the transformations applied. (You can refer to this link to learn about RDD Transformations and Actions in detail.) Each stage is composed of tasks, based on the partitions of the RDD, which perform the same computation in parallel. The "graph" here refers to the navigation between RDDs, and "directed" and "acyclic" refer to how that navigation is done.
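As an illustration of how the scheduler derives stages from the transformations applied, the RDD lineage can be inspected with `toDebugString`; the indentation in its output marks the shuffle boundaries that separate stages. This is only a sketch, reusing the `sc` SparkContext and the hypothetical input path from the earlier example:

```scala
// Narrow transformations (flatMap, map) are pipelined into a single stage;
// the wide transformation (reduceByKey) introduces a shuffle and therefore
// a new stage, with one task per partition in each stage.
val counts = sc.textFile("hdfs:///tmp/input.txt")   // hypothetical path
  .flatMap(_.split("\\s+"))
  .map(w => (w, 1))
  .reduceByKey(_ + _)

// Prints the RDD lineage; indented blocks correspond to separate stages.
println(counts.toDebugString)
```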
Need for DAG
The limitations of Hadoop MapReduce were a key reason for introducing the DAG in Spark. Computation in MapReduce proceeds in three steps:
● The data is read from HDFS.
● Map and Reduce operations are then applied.
● The computed result is written back to HDFS.
Each MapReduce operation is independent of the others, and Hadoop has no knowledge of which MapReduce job will come next. For iterative workloads, it is often unnecessary to read and write the intermediate results between two MapReduce jobs; in such cases, storage in HDFS and disk I/O are wasted.
In a multi-step workflow, each job is blocked from starting until the previous job has completed. As a result, a complex computation can take a long time even on a small data volume.
In Spark, by contrast, a DAG (Directed Acyclic Graph) of consecutive computation stages is formed. In this way, the execution plan is optimized, e.g. to minimize shuffling data around. In MapReduce, such optimization has to be done manually by tuning each MapReduce step.
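For example, Spark can keep a working dataset in memory and reuse it across several jobs, instead of writing intermediate results back to HDFS after every step as MapReduce does. A minimal sketch, assuming the `sc` SparkContext from the earlier example and a hypothetical CSV file whose third column is a numeric rating:

```scala
// Cache the parsed dataset in memory once; later jobs reuse the in-memory
// partitions instead of re-reading from HDFS for each step.
val ratings = sc.textFile("hdfs:///tmp/ratings.csv")   // hypothetical path
  .map(_.split(",")(2).toDouble)
  .cache()

// Two actions over the same cached data: no intermediate result is written
// back to stable storage between them, unlike chained MapReduce jobs.
val count = ratings.count()
val total = ratings.sum()
println(s"mean rating = ${total / count}")
```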