The Directed Acyclic Graph (DAG) architecture is a fundamental component of Spark's execution model,
playing a crucial role in optimizing and parallelizing data processing tasks. Let's break down what it is and
how it works:
A DAG is a graph (a set of nodes and edges) with two key properties:
Directed: The edges have a direction, indicating a flow from one node to another.
Acyclic: There are no cycles in the graph, meaning you cannot start at a node and follow the edges to
return to the same node.
Nodes: Represent RDD (Resilient Distributed Dataset) transformations or operations (e.g., map,
filter, groupBy, join).
Edges: Represent dependencies between these transformations. An edge from node A to node B
means that the output of transformation A is required as input for transformation B.
How Spark Uses DAGs
1. Logical Plan: When you write Spark code using transformations on RDDs or DataFrames/Datasets, Spark doesn't execute the transformations immediately. Instead, it builds a logical plan, which is essentially the series of transformations you've defined.
2. DAG Construction: Spark analyzes the logical plan and constructs a DAG. The DAG represents the dependencies between the transformations: it shows the order in which the transformations need to be executed and how the data flows between them.
3. Optimization: Spark optimizes the plan to find the most efficient way to execute it. (For DataFrames/Datasets, the Catalyst optimizer rewrites the logical and physical plans; the DAG scheduler then divides the work into stages.) This includes:
o Stage creation: Spark breaks the DAG into stages. A stage is a set of tasks that can be executed in parallel without shuffling data. Shuffles (data movement across the network) occur between stages.
o Pipeline optimization: Spark can combine multiple transformations within a stage into a single operation, reducing overhead.
o Operator pushdown: Where possible, Spark pushes operations like filters down to the data source, reducing the amount of data that needs to be read and processed.
4. Physical Plan: After optimization, Spark creates a physical plan, which specifies how the
transformations will be executed on the cluster. This includes choosing specific algorithms for each
transformation and determining how the data will be partitioned and shuffled.
5. Task Scheduling: Spark's task scheduler uses the physical plan to create tasks (the smallest units of
work) and distribute them to executors on the cluster. The DAG is used to determine the order in
which tasks need to be executed, respecting the dependencies between transformations.
6. Execution: The executors run the tasks, processing the data according to the physical plan.
Role of DAGs in Spark's Execution Model
Parallelism: The DAG enables Spark to identify independent transformations that can be executed in parallel. This significantly speeds up processing, especially for large datasets.
Optimization: The DAG allows Spark to optimize the execution plan by rearranging
transformations, combining operations, and minimizing data shuffling. This results in more efficient
processing.
Fault Tolerance: The DAG helps Spark track the dependencies between transformations. If a task
fails, Spark can use the DAG to determine which other tasks need to be re-executed.
Lazy Evaluation: Spark uses lazy evaluation, meaning that transformations are not executed until an
action (like collect, count, or write) is performed. This allows Spark to build the DAG and optimize
the execution plan before actually processing the data.
Example:
Scala
val data = sc.textFile("data.txt")
val filteredData = data.filter(_.contains("error"))
val mappedData = filteredData.map(_.toUpperCase)
val counts = mappedData.count() // Action triggers execution
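To see the DAG Spark builds, you can inspect an RDD's lineage with toDebugString, or call explain() on a DataFrame to view the optimized logical and physical plans. The sketch below reuses the names from the example above and assumes a SparkSession named spark is in scope.
Scala
import org.apache.spark.sql.functions.col

println(mappedData.toDebugString)   // prints the RDD lineage, i.e. the chain of transformations in the DAG

// For DataFrames/Datasets, explain(true) prints the parsed, analyzed, optimized, and physical plans
spark.read.text("data.txt")
  .filter(col("value").contains("error"))
  .explain(true)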
1. Cluster Managers in Spark
A cluster manager is responsible for allocating resources (CPU, memory) to Spark applications. Spark supports several cluster managers:
Standalone: A simple cluster manager that comes with Spark itself. Easy to set up for testing and
small deployments.
YARN (Yet Another Resource Negotiator): A popular cluster manager often used in Hadoop
environments. Allows Spark to run alongside other Hadoop workloads.
Mesos: Another general-purpose cluster manager (deprecated in recent Spark releases).
Kubernetes: A container orchestration system that can also manage Spark clusters.
2. Standalone Mode
In standalone mode, the cluster manager consists of:
Master:
o Resource Allocation: The Master is the central resource manager. It keeps track of available
resources (workers) in the cluster.
o Application Scheduling: When a Spark application (driver) submits a job, the Master
negotiates with workers to allocate resources (executors) to the application.
o Worker Management: The Master monitors the health of worker nodes. If a worker fails,
the Master can reschedule tasks on other workers.
o Application Monitoring: The Master provides a web UI that shows the status of running and
completed applications.
Worker:
o Resource Provisioning: Workers register themselves with the Master and offer their
resources (CPU cores, memory).
o Executor Launching: When the Master allocates resources to an application, the Workers
launch executors for that application.
o Task Execution: Executors run the tasks assigned to them by the driver.
o Resource Reporting: Workers periodically report their status and resource usage to the
Master.
3. Spark on YARN
When running Spark on YARN, YARN's ResourceManager acts as the cluster manager:
ResourceManager:
o Resource Negotiation: The ResourceManager negotiates with YARN's NodeManagers to
allocate containers (which will host Spark executors) to Spark applications.
o Application Management: The ResourceManager manages the lifecycle of Spark
applications running on YARN.
o Container Allocation: It allocates containers to the Spark application's ApplicationMaster.
NodeManager:
o Container Management: NodeManagers run on each node in the YARN cluster and manage
the containers allocated to them by the ResourceManager.
o Resource Monitoring: NodeManagers monitor the resource usage (CPU, memory) of the
containers running on their nodes.
o Container Launching/Stopping: They launch and stop containers as instructed by the
ResourceManager.
ApplicationMaster (in cluster deploy mode, the Spark driver runs here; in client mode the driver runs on the submitting machine):
o Resource Request: The ApplicationMaster requests executor containers from the ResourceManager on behalf of the Spark application.
o Executor Management: The ApplicationMaster is responsible for launching and managing
the Spark executors within the allocated containers.
o Job Scheduling: The driver (running in the ApplicationMaster) schedules the tasks of the
Spark application.
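As a rough illustration, the master URL passed to Spark selects which cluster manager is used. The sketch below is illustrative only (host names and resource values are assumptions); in practice these settings are usually supplied via spark-submit rather than hard-coded.
Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cluster-manager-demo")
  .master("spark://master-host:7077")            // standalone master
  // .master("yarn")                             // YARN (requires HADOOP_CONF_DIR/YARN_CONF_DIR)
  // .master("k8s://https://k8s-apiserver:6443") // Kubernetes
  .config("spark.executor.memory", "4g")
  .getOrCreate()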
4. Spark Logs
Spark generates various logs that are essential for debugging and monitoring:
Driver Logs: These logs contain information about the Spark driver program, including job
submissions, task scheduling, and application progress. They are crucial for understanding the
overall execution flow.
Executor Logs: These logs contain information about the tasks executed by the executors, including
errors, resource usage, and performance metrics. They are helpful for diagnosing issues with
individual tasks.
History Server Logs: After an application completes, its event logs (if event logging is enabled) remain available through the Spark History Server, allowing you to review them later. This is very useful for debugging completed applications.
YARN Logs (if using YARN): YARN logs provide information about container allocation, resource
usage, and application status from the YARN perspective.
Log Levels: Spark uses standard log levels (DEBUG, INFO, WARN, ERROR) to categorize log messages.
You can configure the log level to control the verbosity of the logs.
Spark logs are invaluable for:
Debugging: Identifying and resolving errors in your Spark applications.
Performance tuning: Analyzing performance bottlenecks and optimizing your code.
Monitoring: Tracking the progress of running applications and identifying potential issues.
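As a small example, you can control runtime log verbosity from your application (assuming a SparkSession named spark); finer-grained control is done through the log4j configuration.
Scala
// Adjust log verbosity at runtime; valid levels include DEBUG, INFO, WARN, and ERROR
spark.sparkContext.setLogLevel("WARN")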
Next, let's look at Spark performance tuning, stream processing fundamentals, and the crucial concepts of event time and stateful processing.
1. Spark Performance Tuning
Optimizing Spark applications is crucial for efficient execution. Here's a breakdown of key areas:
Data Partitioning:
o Importance: Distributing data evenly across the cluster is essential for parallel processing. Uneven partitioning (data skew) can lead to bottlenecks.
o Strategies (illustrated in the sketch after this list):
Increase partitions: If your data isn't well distributed, adjust the number of partitions using repartition() or coalesce(). repartition() performs a full shuffle, while coalesce() only reduces the number of partitions without a full shuffle, so it can be faster when shrinking.
Partitioning by key: If you frequently perform operations like groupBy(), partition
your data by the key you'll be grouping by. This minimizes shuffling during
aggregations.
Salting: For extreme skew, where some keys are vastly more frequent than others,
"salting" can help. This involves adding random prefixes to the skewed keys to
distribute them more evenly.
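Here is a minimal sketch of the partitioning strategies above, assuming a DataFrame named orders with a skewed customer_id column (both names are hypothetical).
Scala
import org.apache.spark.sql.functions._

// Partition by the key you will group on, to reduce shuffling during later aggregations
val repartitioned = orders.repartition(200, col("customer_id"))

// Salting: append a random suffix (0-9) to hot keys so they spread across partitions;
// aggregate on the salted key first, then aggregate again on the original key
val salted = orders.withColumn(
  "salted_key",
  concat(col("customer_id"), lit("_"), (rand() * 10).cast("int").cast("string"))
)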
Data Serialization:
o Importance: Efficient serialization reduces the amount of data that needs to be shuffled
across the network.
o Best Practices:
Kryo: Use Kryo serialization, which is significantly faster and more compact than
Java serialization. Configure it in your SparkConf.
Avoid Java serialization: Stick to Kryo unless you have a specific reason to use Java
serialization.
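A minimal sketch of enabling Kryo, assuming a hypothetical application class SalesRecord; registering your classes avoids Kryo writing full class names with every record.
Scala
import org.apache.spark.SparkConf

case class SalesRecord(productId: String, quantity: Int)   // hypothetical application class

val conf = new SparkConf()
  .setAppName("kryo-demo")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[SalesRecord]))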
Caching:
o Importance: Caching frequently used DataFrames or RDDs in memory can dramatically
speed up processing.
o Strategies:
cache() or persist(): Use these methods to cache data. persist() allows you to specify
storage levels (memory only, memory and disk, etc.).
Use judiciously: Don't cache everything. Cache only the data that is reused multiple
times. Caching too much can lead to memory pressure and garbage collection issues.
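A minimal sketch of caching a reused DataFrame, assuming a hypothetical DataFrame named lookup; persist() lets you pick a storage level, and unpersist() releases the memory when you're done.
Scala
import org.apache.spark.storage.StorageLevel

lookup.persist(StorageLevel.MEMORY_AND_DISK)  // spill to disk if it doesn't fit in memory
lookup.count()                                // an action materializes the cache
// ... reuse 'lookup' in several joins/aggregations ...
lookup.unpersist()                            // release the cached data when no longer needed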
Shuffling:
o Importance: Shuffle operations (like groupBy(), join(), reduceByKey()) are expensive
because they require data to be moved across the network.
o Minimization: Try to minimize shuffles by:
Broadcasting small tables: If you're joining a large table with a small table,
broadcast the small table to all executors to avoid a full shuffle. Use broadcast() from
org.apache.spark.sql.functions.
Pre-aggregating: If possible, perform aggregations before shuffling.
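A minimal sketch of a broadcast join, assuming a large transactions DataFrame and a small productDim lookup table (both names are hypothetical); broadcasting the small side avoids shuffling the large one.
Scala
import org.apache.spark.sql.functions.broadcast

// The small dimension table is shipped to every executor, so 'transactions' is not shuffled
val enriched = transactions.join(broadcast(productDim), Seq("product_id"))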
Resource Allocation:
o Executors: Configure the number and size of executors appropriately. More executors can
increase parallelism, but too many can lead to overhead.
o Cores: Adjust the number of cores per executor.
o Memory: Allocate sufficient memory to executors and drivers.
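A minimal sketch of setting executor resources through configuration (the values are illustrative, not recommendations); in practice these are most often passed as spark-submit flags such as --num-executors and --executor-memory.
Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("resource-allocation-demo")
  .config("spark.executor.instances", "10")  // number of executors
  .config("spark.executor.cores", "4")       // cores per executor
  .config("spark.executor.memory", "8g")     // memory per executor
  .config("spark.driver.memory", "4g")       // driver memory
  .getOrCreate()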
Code Optimization:
o Avoid unnecessary shuffles: Be mindful of operations that trigger shuffles.
o Use efficient data structures: Use appropriate data structures in your transformations.
o Optimize data formats: Parquet or ORC files are generally more efficient than text files for
analytical queries.
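For the data-format point, here is a minimal sketch of converting a CSV dataset to Parquet (paths are illustrative, and a SparkSession named spark is assumed).
Scala
// Columnar formats like Parquet enable compression, column pruning, and predicate pushdown
val raw = spark.read.option("header", "true").csv("/data/sales_csv")
raw.write.mode("overwrite").parquet("/data/sales_parquet")

val sales = spark.read.parquet("/data/sales_parquet")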
Monitoring and Profiling:
o Spark UI: Use the Spark UI to monitor the performance of your applications and identify
bottlenecks.
o Profiling tools: Use profiling tools to analyze the performance of your code.
2. Stream Processing Fundamentals
Micro-batch vs. Continuous Streaming:
o Micro-batch: The stream is divided into small batches, and each batch is processed as a
separate job. Good for throughput, reasonable latency.
o Continuous Streaming: Data is processed more continuously, with lower latency.
Exactly-Once Semantics: Ensures that each record in the stream is processed exactly once, even in the presence of failures. Structured Streaming can provide this end-to-end when used with replayable sources (such as Kafka) and idempotent or transactional sinks.
Fault Tolerance: Streaming systems should be fault-tolerant, meaning they can recover from
failures without losing data. Checkpointing is a key mechanism for fault tolerance.
Windowing: Allows you to perform operations on data within specific time intervals (e.g.,
calculating the average temperature over the last hour).
State Management: Streaming applications often need to maintain state (e.g., running counts,
aggregations). This state needs to be managed reliably.
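The sketch below ties several of these ideas together: it writes a hypothetical streaming DataFrame named events to the console, with a checkpoint location for fault tolerance and an explicit trigger choosing micro-batch (or, commented out, continuous) execution. Paths and intervals are illustrative.
Scala
import org.apache.spark.sql.streaming.Trigger

val query = events.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/events") // state + progress saved here for recovery
  .trigger(Trigger.ProcessingTime("10 seconds"))           // micro-batch every 10 seconds
  // .trigger(Trigger.Continuous("1 second"))              // experimental low-latency continuous mode
  .start()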
3. Event Time and Stateful Processing
Event Time:
o Importance: The time an event actually occurred, as opposed to the time it was ingested by
the system (processing time). Crucial for accurate analysis, especially when dealing with late
data.
o Use: Use event time for windowing and other time-based operations. This ensures that data is
processed according to when it happened, not when it arrived.
Stateful Processing:
o Necessity: Many streaming applications require state to perform operations. Examples
include:
Aggregations: Maintaining running totals, counts, etc.
Joins: Buffering data from one stream until a matching record arrives from another
stream.
Deduplication: Tracking which records have already been processed.
o Management: Spark provides mechanisms to manage state reliably and fault-tolerantly:
State Store: Spark stores state in a fault-tolerant state store (e.g., HDFS, RocksDB).
Checkpoints: The state store is checkpointed periodically to allow for recovery in
case of failures.
o Considerations:
State size: Large state can impact performance. Try to minimize the amount of state
you need to maintain.
State cleanup: Clean up state that is no longer needed to prevent unbounded growth.
Time-to-live (TTL) can be used to remove old state.
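As a small example of bounded stateful processing, the sketch below deduplicates a hypothetical streaming DataFrame named events (with eventId and eventTime columns); the watermark lets Spark discard deduplication state for events older than the threshold, preventing unbounded growth.
Scala
// Streaming deduplication with watermark-bounded state
val deduped = events
  .withWatermark("eventTime", "10 minutes")   // state for events older than the watermark is dropped
  .dropDuplicates("eventId", "eventTime")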
Watermarks are crucial in handling late data in streaming applications, especially when dealing with
windowed aggregations.
The Problem: Late Data
In real-world streaming scenarios, data doesn't always arrive in perfect order. Sometimes, data for an earlier
time window might arrive after the window has already been processed. This is called "late data."
If you're performing aggregations over time windows (e.g., calculating the total sales for each product every
10 minutes), late data can cause incorrect results. If a sale for a product arrives late, after the 10-minute
window has closed, that sale won't be included in the total, leading to an undercount.
How Watermarks Solve the Problem
Watermarks provide a mechanism to deal with late data gracefully. They essentially tell the streaming
system: "We expect data up to this point in time. Data arriving after this point, for earlier windows, is
considered late, and we'll handle it accordingly."
Here's how watermarks work:
1. Defining a Watermark: You configure a watermark by specifying a delay threshold. For example, you might set a watermark of "10 minutes." This means: "We expect events to arrive no more than 10 minutes after the latest event time we have seen so far."
2. Tracking Event Time: The streaming system tracks the event time of each data record (the time the
event actually occurred, not the time it arrived in the system).
3. Watermark Generation: Based on the observed event times and the delay threshold, the system
generates a watermark. The watermark represents the earliest point in time for which we are
reasonably confident that all data has arrived. It's calculated as: Watermark = Max(Event Time
Observed) - Delay Threshold
4. Window Closure: The streaming system uses the watermark to determine when a window can be
considered complete. A window is closed when the watermark passes the end time of that window.
For example, if you have a 10-minute window from 10:00 to 10:10, and the watermark reaches
10:10, the system knows it has likely received all the data for that window (within the 10-minute
tolerance).
5. Handling Late Data: Data that arrives after the watermark has passed the end time of its window is
considered late. You have a few options for how to handle it:
o Drop it: The simplest approach, but you lose the late data.
o Update the results (if supported): Some aggregations (e.g., count, sum) can be updated after a window's end time has passed. This is more complex but provides more accurate results. In Structured Streaming, data that is late but still within the watermark delay updates the window's result when the output mode is update (or complete); data later than the watermark is dropped.
o Store it for later processing: You could store the late data in a separate location and process
it later.
Why Watermarks Matter
Accuracy: Watermarks improve the accuracy of windowed aggregations by providing a way to account for late data.
Completeness: They help ensure that you're not missing significant portions of your data in your
analysis.
Efficiency: They allow the streaming system to close windows and release resources, preventing
unbounded memory usage.
Control: They give you control over how to handle late data, allowing you to choose the best
strategy for your application.
Example:
Let's say you have a 10-minute window and a watermark of 5 minutes.
Data arrives for the 10:00-10:10 window.
The watermark advances as new data arrives.
When the maximum observed event time reaches 10:15, the watermark becomes 10:10 (10:15 minus the 5-minute delay), and the 10:00-10:10 window is closed.
If data for the 10:00-10:10 window arrives after that point, it's considered late.
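In Structured Streaming, the scenario above maps to withWatermark() plus a window() aggregation. The sketch below assumes a streaming DataFrame named sales with product_id, quantity, and timestamp columns.
Scala
import org.apache.spark.sql.functions._

val lateTolerantTotals = sales
  .withWatermark("timestamp", "5 minutes")                            // tolerate data up to 5 minutes late
  .groupBy(col("product_id"), window(col("timestamp"), "10 minutes")) // 10-minute windows
  .agg(sum("quantity").alias("total_sales"))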
Finally, let's cover Structured Streaming in Apache Spark: its core concepts, practical usage, transformations, and input/output mechanisms.
1. Core Concepts
Unbounded Tables: The fundamental idea is to treat a data stream as a continuously growing
(unbounded) table. New data arriving is like adding rows to this table. This allows you to use
familiar table-like operations (SQL queries, DataFrame/Dataset APIs) on streaming data.
Incremental Processing: Instead of re-processing the entire stream every time new data arrives,
Structured Streaming performs incremental processing. It only processes the new data since the last
processing cycle. This is much more efficient.
Micro-batch or Continuous Streaming:
o Micro-batch: The stream is divided into small, discrete batches (like tiny tables). Spark
processes each batch as a separate job. This provides good throughput and reasonable latency
(often in the hundreds of milliseconds to seconds range).
o Continuous Streaming: A newer, experimental mode that processes data more continuously with lower latency (often in the low-milliseconds range). It avoids the batching overhead but supports a more limited set of operations, sources, and sinks.
DataFrame/Dataset APIs: You use the same DataFrame and Dataset APIs (familiar from batch
Spark) to perform operations on the streaming data. This unifies batch and stream processing,
making it easier to learn and use.
State Management: Streaming applications often need to maintain state (e.g., running counts,
aggregations over time windows). Structured Streaming provides mechanisms to manage this state
reliably and fault-tolerantly.
2. Structured Streaming in Action (Example)
Imagine a stream of sales data. Each record might contain (product_id, quantity, timestamp). You could use
Structured Streaming to:
1. Read the stream: Read the incoming sales data from a source like Kafka.
2. Transform: Calculate the total sales for each product within a 10-minute window.
3. Write: Store the aggregated sales data in a database or display it on a dashboard.
Structured Streaming handles the complexities of:
Windowing: Grouping sales by product and time window.
State Management: Maintaining the running totals for each product.
Fault Tolerance: Restarting processing from the last checkpoint if the application fails.
3. Transformations on Streams
Just like with batch DataFrames/Datasets, you can apply a wide range of transformations to streaming
DataFrames/Datasets:
map(): Apply a function to each element.
filter(): Keep only elements that satisfy a condition.
groupBy(): Group data based on one or more columns. Crucial for aggregations.
agg(): Perform aggregations (e.g., count(), sum(), avg()) on grouped data.
join(): Combine data from two streams (or a stream and a static DataFrame). More complex, but
powerful.
window(): Define time windows (e.g., tumbling, sliding) for aggregations. Essential for time-based
analysis.
flatMap(): Similar to map(), but can produce multiple output elements for each input element.
Example Transformation (Windowed Count):
Scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
// ... (Read the stream into a DataFrame called 'sales') ...
val windowedCounts = sales
  .groupBy(col("product_id"), window(col("timestamp"), "10 minutes")) // Define a 10-minute tumbling window
  .agg(sum("quantity").alias("total_sales"))                          // Calculate total sales in each window
// ... (Write the results) ...
4. Input and Output
Input Sources: Structured Streaming can read data from various sources:
o File-based: Text files, Parquet, JSON, etc. (for batch or micro-batch streaming)
o Message queues: Kafka (most common); Kinesis (via external connectors)
o Sockets: For testing or simple streams
o Other: Custom sources can be developed.
Output Sinks: Where the processed streaming data is written:
o File-based: Same formats as input.
o Databases: JDBC connections.
o Console: For debugging and simple applications.
o Foreach: Allows custom logic to be applied to each micro-batch. Very flexible.
o Memory: For testing.
o Other: Custom sinks can be developed.
Checkpointing: Essential for fault tolerance. Spark saves the state of the streaming application
periodically, so it can restart from the last checkpoint in case of failures.
Watermarking: Helps manage late data in windowed aggregations. It tells Spark when it can be
reasonably sure that all data for a particular window has arrived.
Latency vs. Throughput: Micro-batch streaming generally provides higher throughput but higher
latency. Continuous streaming aims for lower latency but might have slightly lower throughput. The
choice depends on your application's requirements.
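Putting the pieces together, here is a minimal end-to-end sketch that reads from Kafka and writes to the console with checkpointing. Broker addresses, the topic name, and the checkpoint path are illustrative, and the spark-sql-kafka connector package must be on the classpath.
Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-to-console").getOrCreate()

// Source: a Kafka topic (the 'value' column arrives as binary and is cast to string)
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "sales")
  .load()

val messages = raw.selectExpr("CAST(value AS STRING) AS value", "timestamp")

// Sink: the console, with a checkpoint location so the query can recover after failures
val query = messages.writeStream
  .format("console")
  .outputMode("append")
  .option("checkpointLocation", "/tmp/checkpoints/kafka-to-console")
  .start()

query.awaitTermination()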