Chapter 1. Key Technologies for Big Data Stream Computing

Dawei Sun, Guangyan Zhang, Weimin Zheng, and Keqin Li
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

1.1 Introduction

Big data computing is a new trend for future computing, with the quantity of data growing and the speed of data increasing. In general, there are two main mechanisms for big data computing, i.e., big data stream computing and big data batch computing. Big data stream computing is a model of straight-through computing, such as Storm [1] and S4 [2], which do for stream computing what Hadoop does for batch computing, while big data batch computing is a model of storing then computing, such as the MapReduce framework [3] open sourced by the Hadoop implementation [4].

Essentially, big data batch computing is not sufficient for many real-time application scenarios, where a data stream changes frequently over time and the latest data are the most important and most valuable. For example, when analyzing data from real-time transactions (e.g., financial trades, email messages, user search requests, sensor data tracking), a data stream grows monotonically over time as more transactions take place. Ideally, a real-time application environment can be supported by big data stream computing. Generally, big data stream computing has the following defining characteristics [5, 6].

(1) The input data stream is a real-time data stream and needs real-time computing, and the results must be updated every time the data changes.
(2) Incoming data arrive continuously at volumes that far exceed the capabilities of individual machines.
(3) Input streams incur multi-staged computing at low latency to produce output streams, where any incoming data entry is ideally reflected in the newly generated results in output streams within seconds.

1.1.1 Stream Computing

Stream computing, the long-held dream of "high real-time computing" and "high-throughput computing", with programs that compute continuous data streams, has opened up a new era of future computing due to big data, which is a dataset that is large, fast, dispersed, unstructured, and beyond the ability of available hardware and software facilities to undertake its acquisition, access, analytics, and application in a reasonable amount of time and space [7, 8]. Stream computing is a computing paradigm that reads data from collections of software or hardware sensors in stream form and computes continuous data streams, where the feedback results should be a real-time data stream as well. A data stream is a sequence of data sets, a continuous stream is an infinite sequence of data sets, and parallel streams have more than one stream to be processed at the same time. Stream computing is an effective way to support big data by providing extremely low-latency processing with massively parallel processing architectures, and it is becoming the fastest and most efficient way to obtain useful knowledge from big data, allowing organizations to react quickly when problems appear or to predict new trends in the near future [9, 10].

A big data input stream has the characteristics of high speed, real time, and large volume for applications such as sensor networks, network monitoring, microblogging, web exploring, social networking, and so on.
These data sources often take the form of continuous data streams, and timely analysis of such a data stream is very important, as the life cycle of most of the data is very short [8, 11, 12]. Furthermore, the volume of data is so high that there is not enough space to store it, and not all data need to be stored. Thus, the storing-then-computing model of batch computing does not fit at all. Nearly all data in big data environments have the features of streams, and stream computing has appeared to solve the dilemma of big data computing by computing data online within real-time constraints [13]. Consequently, the stream computing model will be a new trend for high-throughput computing in the big data era.

1.1.2 Application Background

Big data stream computing is able to analyze and process data in real time to gain an immediate insight, and it is typically applied to the analysis of vast amounts of data in real time and to processing them at high speed. Many application scenarios require big data stream computing. For example, in the financial industry, big data stream computing technologies can be used in risk management, marketing management, business intelligence, and so on. In the Internet, big data stream computing technologies can be used in search engines, social networking, and so on. In the Internet of Things, big data stream computing technologies can be used in intelligent transportation, environmental monitoring, and so on.

Usually, a big data stream computing environment is deployed in a highly distributed clustered environment, as the amount of data is infinite, the rate of the data stream is high, and the results should be fed back in real time.

1.1.3 Chapter Organization

The remainder of this chapter is organized as follows. In Section 1.2, we introduce data stream graphs, the system architecture for big data stream computing (BDSC), and key technologies for BDSC systems. In Section 1.3, we present the system architecture and key technologies of four popular example BDSC systems: Twitter Storm, Yahoo! S4, and Microsoft TimeStream and Naiad. Finally, we discuss grand challenges and future directions in Section 1.4.

1.2 Overview of a Big Data Stream Computing System

In this section, we first present some related concepts and definitions of directed acyclic graphs and stream computing. Then, we introduce the system architecture for stream computing and the key technologies for BDSC systems in big data stream computing environments.

1.2.1 DAG and Stream Computing

In stream computing, multiple continuous parallel data streams can be represented by a task topology, also named a data stream graph, which is usually described by a directed acyclic graph (DAG) [5, 14-16]. A measurable data stream graph view can be defined by Definition 1.

Definition 1. A data stream graph G is a directed acyclic graph, composed of a set of vertices and a set of directed edges, with a logical structure and a special function, and is denoted as G = (V(G), E(G)), where V(G) = {v_1, v_2, ..., v_n} is a finite set of n vertices, which represent tasks, and E(G) = {e_{1,2}, e_{1,3}, ..., e_{n-1,n}} is a finite set of directed edges, which represent data streams between vertices. If e_{i,j} ∈ E(G), then v_i, v_j ∈ V(G), v_i ≠ v_j, and (v_i, v_j) is an ordered pair, where a data stream comes from v_i and goes to v_j.

The in-degree of vertex v_i is the number of its incoming edges, and the out-degree of vertex v_i is the number of its outgoing edges. A source vertex is a vertex whose in-degree is zero, and an end vertex is a vertex whose out-degree is zero. A data stream graph G has at least one source vertex and one end vertex. For the example data stream graph with eleven vertices shown in Figure 1, the vertex set is {v_1, v_2, ..., v_11}, the directed edge set is the set of edges drawn in the figure, the source vertices are v_1 and v_2, and the end vertex is v_11. For instance, vertex v_5 has in-degree one and out-degree two.

Figure 1. A data stream graph
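As an illustration of Definition 1, the following sketch shows one way such a data stream graph could be represented in code. It is not taken from any of the systems discussed in this chapter; the class and method names (DataStreamGraph, addEdge, sourceVertices, and so on) are our own, and the small graph built in main is only an example in the spirit of Figure 1.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Illustrative representation of a data stream graph G = (V(G), E(G)) from Definition 1. */
final class DataStreamGraph {
    private final Map<String, List<String>> children = new HashMap<>(); // v_i -> successors
    private final Map<String, Integer> indeg = new HashMap<>();         // v_i -> in-degree

    void addVertex(String v) {
        children.putIfAbsent(v, new ArrayList<>());
        indeg.putIfAbsent(v, 0);
    }

    /** Directed edge e_{i,j}: a data stream flowing from vertex vi to vertex vj. */
    void addEdge(String vi, String vj) {
        addVertex(vi);
        addVertex(vj);
        children.get(vi).add(vj);
        indeg.merge(vj, 1, Integer::sum);
    }

    int inDegree(String v)  { return indeg.getOrDefault(v, 0); }
    int outDegree(String v) { return children.getOrDefault(v, Collections.emptyList()).size(); }

    /** Source vertices have in-degree zero; end vertices have out-degree zero. */
    List<String> sourceVertices() {
        return children.keySet().stream()
                .filter(v -> inDegree(v) == 0).sorted().collect(Collectors.toList());
    }

    List<String> endVertices() {
        return children.keySet().stream()
                .filter(v -> outDegree(v) == 0).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DataStreamGraph g = new DataStreamGraph();   // a small graph in the spirit of Figure 1
        g.addEdge("v1", "v3"); g.addEdge("v2", "v3");
        g.addEdge("v3", "v4"); g.addEdge("v3", "v5");
        g.addEdge("v4", "v6"); g.addEdge("v5", "v6");
        System.out.println("sources = " + g.sourceVertices() + ", ends = " + g.endVertices());
        System.out.println("out-degree of v3 = " + g.outDegree("v3"));   // 2
    }
}
```

An adjacency-list representation is chosen because the degree, source-vertex, and end-vertex queries used throughout Section 1.2.1 then reduce to simple lookups.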
Definition 2. A sub-graph sub-G of the data stream graph G is a sub-graph consisting of a subset of the vertices with the edges in between. For vertices v_i and v_j in the sub-graph sub-G and any vertex v in the data stream graph G, v must also be in sub-G if v is on a directed path from v_i to v_j; that is, if v_i, v_j ∈ V(sub-G) and v ∈ V(G), and v ∈ V(p(v_i, v_j)), then v ∈ V(sub-G). A sub-graph sub-G is logically equivalent to, and can be substituted by, a single vertex, provided that reducing that sub-graph to a single logical vertex does not create a graph with a cycle, which would no longer be a DAG.

Definition 3. A path p(v_i, v_j) from vertex v_i to vertex v_j is a subset of E(G) that meets the following conditions: there exist edges e_{i,k} ∈ p(v_i, v_j) and e_{l,j} ∈ p(v_i, v_j), and any directed edge e_{k,l} in the path p(v_i, v_j) has the properties that if k ≠ i, then there exists m such that e_{m,k} ∈ p(v_i, v_j), and if l ≠ j, then there exists m such that e_{l,m} ∈ p(v_i, v_j).

The latency l_p(v_s, v_e) of a path from vertex v_s to vertex v_e is the sum of the latencies of both the vertices and the edges on the path, as given by (1):

l_p(v_s, v_e) = Σ_{v_i ∈ p(v_s, v_e)} l(v_i) + Σ_{e_{i,j} ∈ p(v_s, v_e)} l(e_{i,j}).    (1)

A critical path, also called the longest path, is a path with the longest latency from a source vertex v_s to an end vertex v_e in a data stream graph G, which is also the latency of the data stream graph G. If there are m paths from source vertex v_s to end vertex v_e in data stream graph G, then the latency l(G) of data stream graph G is given by (2):

l(G) = max{l_{p_1}(v_s, v_e), l_{p_2}(v_s, v_e), ..., l_{p_m}(v_s, v_e)},    (2)

where l_{p_i}(v_s, v_e) is the latency of the ith path from vertex v_s to vertex v_e.

Definition 4. In data stream graph G, if there is an edge e_{i,j} from vertex v_i to vertex v_j, then vertex v_i is a parent of vertex v_j, and vertex v_j is a child of vertex v_i.

Definition 5. The throughput t(v_i) of vertex v_i is the average rate of successful data stream computing in a big data environment, and is usually measured in bits per second (bps). We identify the source vertex v_s as being in the first level, the children of the source vertex v_s as being in the second level, and so on, with the end vertex v_e in the last level. The throughput t(level_i) of the ith level can be calculated by (3):

t(level_i) = Σ_{j=1}^{n_i} t(v_j),    (3)

where n_i is the number of vertices in the ith level. If data stream graph G has m levels, then the throughput t(G) of the data stream graph G is the minimum throughput over all m levels, as described by (4):

t(G) = min{t(level_1), t(level_2), ..., t(level_m)},    (4)

where t(level_i) is the throughput of the ith level in data stream graph G.
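The path latency of Definition 3 and the level-based throughput of Definition 5 can both be computed directly from such a graph. The sketch below is again purely illustrative: it takes the graph as a plain adjacency map together with example vertex latencies, edge latencies (keyed as "vi->vj"), and vertex throughputs, assumes consistent units (e.g., milliseconds and bps), and treats the level of a vertex as one more than the largest level among its parents, which is one reasonable reading of Definition 5.

```java
import java.util.*;

/** Illustrative computations for Definitions 3 and 5: critical-path latency and graph throughput. */
final class DagMetrics {

    /** l(G): the largest path latency (vertex latencies plus edge latencies) in the DAG. */
    static long criticalPathLatency(Map<String, List<String>> children,
                                    Map<String, Long> vertexLatency,
                                    Map<String, Long> edgeLatency) {   // edge key "vi->vj"
        Map<String, Long> memo = new HashMap<>();
        long worst = 0;
        for (String v : children.keySet()) {
            worst = Math.max(worst, longestFrom(v, children, vertexLatency, edgeLatency, memo));
        }
        return worst;
    }

    // Longest latency of any path starting at v; memoized, valid because the graph is acyclic.
    private static long longestFrom(String v, Map<String, List<String>> children,
                                    Map<String, Long> vLat, Map<String, Long> eLat,
                                    Map<String, Long> memo) {
        if (memo.containsKey(v)) return memo.get(v);
        long best = vLat.getOrDefault(v, 0L);
        for (String c : children.getOrDefault(v, Collections.emptyList())) {
            long viaChild = vLat.getOrDefault(v, 0L)
                    + eLat.getOrDefault(v + "->" + c, 0L)
                    + longestFrom(c, children, vLat, eLat, memo);
            best = Math.max(best, viaChild);
        }
        memo.put(v, best);
        return best;
    }

    /** t(G): minimum over all levels of the summed vertex throughputs in that level. */
    static long graphThroughput(Map<String, List<String>> children, Map<String, Long> throughput) {
        Map<String, Integer> indeg = new HashMap<>();
        children.keySet().forEach(v -> indeg.putIfAbsent(v, 0));
        children.forEach((v, cs) -> cs.forEach(c -> indeg.merge(c, 1, Integer::sum)));

        Map<String, Integer> level = new HashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        indeg.forEach((v, d) -> { if (d == 0) { level.put(v, 1); ready.add(v); } });

        while (!ready.isEmpty()) {   // traversal in topological order (cf. Definition 6 below)
            String v = ready.poll();
            for (String c : children.getOrDefault(v, Collections.emptyList())) {
                level.merge(c, level.get(v) + 1, Math::max);
                if (indeg.merge(c, -1, Integer::sum) == 0) ready.add(c);
            }
        }

        Map<Integer, Long> perLevel = new TreeMap<>();
        level.forEach((v, l) -> perLevel.merge(l, throughput.getOrDefault(v, 0L), Long::sum));
        return perLevel.values().stream().min(Long::compare).orElse(0L);
    }
}
```

For instance, feeding the small example graph above (as plain maps) with a latency of 1 on every vertex and edge yields a critical-path latency of 7, corresponding to the longest chain v1 → v3 → v4 → v6.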
Definition 6. A topological sort TS(G) = (v_{i1}, v_{i2}, ..., v_{in}) of the vertices V(G) in data stream graph G is a linear ordering of its vertices such that for every directed edge e_{i,j} ∈ E(G) from vertex v_i to vertex v_j, v_i comes before v_j in the topological ordering. A topological sort is possible if and only if the graph has no directed cycle, that is, it needs to be a directed acyclic graph. Any directed acyclic graph has at least one topological sort.

Definition 7. A graph partitioning GP(G) = {GP_1, GP_2, ..., GP_m} of the data stream graph G is a topological-sort-based split of the vertex set V(G) and the corresponding directed edges. A graph partitioning should meet the non-overlapping and covering properties, that is, for all i ≠ j with i, j ∈ [1, m], GP_i ∩ GP_j = ∅, and ∪_{i=1}^{m} GP_i = V(G).

1.2.2 System Architecture for Stream Computing

In big data stream computing environments, stream computing is a model of straight-through computing. As shown in Figure 2, the input data stream arrives in real-time data stream form, all continuous data streams are computed in real time, and the results must also be updated in real time. The volume of data is so high that there is not enough space for storage, and not all data need to be stored. Most data will be discarded, and only a small portion of the data will be permanently stored on hard disks.

Figure 2. A big data stream computing environment

1.2.3 Key Technologies for BDSC Systems

Due to the distinct features of data streams in a big data environment, namely real time, volatility, burstiness, irregularity, and infinity, a well-designed big data stream computing (BDSC) system always optimizes its system structure, data transmission, application interfaces, high availability, and so on [17-19].

1.2.3.1 System Structure

Symmetric structure and master-slave structure are the two main system structures for BDSC systems, as shown in Figure 3 and Figure 4, respectively. In a symmetric-structure system, as shown in Figure 3, the functions of all nodes are the same, so it is easy to add a new node or to remove an unused node, which improves the scalability of the system. However, some global functions such as resource allocation, fault tolerance, and load balancing are hard to achieve without a global node. In the S4 system, the global functions are achieved with the help of the distributed coordination service ZooKeeper.

Figure 3. Symmetric structure

In a master-slave-structure system, as shown in Figure 4, one node is the master node and the other nodes are slave nodes. The master node is responsible for global control of the system, such as resource allocation, fault tolerance, and load balancing. Each slave node has a special function; it receives a data stream from the master node, processes the data stream, and sends the results back to the master node. Usually, the master node is the bottleneck of the master-slave structure system: if it fails, the whole system will not work.

Figure 4. Master-slave structure

1.2.3.2 Data Stream Transmission

Push and pull are the two main forms of data stream transmission in a BDSC system. In a push system, once an upstream node gets a result, it immediately pushes the result data to its downstream nodes. In this way, the upstream data are immediately sent to downstream nodes.
However, if some downstream nodes are busy or have failed, some data will be discarded.

In a pull system, a downstream node requests data from an upstream node. If some data need to be further processed, the upstream node sends the data to the requesting downstream node. In this way, the upstream data are stored in upstream nodes until the corresponding downstream nodes request them; some data may wait a long time for further processing and may lose their timeliness.

1.2.3.3 Application Interfaces

An application interface is used to design a data stream graph; it is the bridge between a user and a BDSC system. Usually, a good application interface is flexible and efficient for users. Currently, most BDSC systems provide MapReduce-like interfaces; e.g., the Storm system provides Spout and Bolt as an application interface, and a user can design a data stream graph with Spouts and Bolts. Some other BDSC systems provide SQL-like interfaces and graphical user interfaces.

1.2.3.4 High-Availability

State backup and recovery is the main method to achieve high availability in a BDSC system. There are three main high-availability strategies, i.e., the passive standby strategy, the active standby strategy, and the upstream backup strategy.

In the passive standby strategy (see Figure 5), each primary node periodically sends checkpoint data to a backup node. If the primary node fails, the backup node takes over from the last checkpoint. Usually, this strategy achieves precise recovery.

Figure 5. Passive standby

In the active standby strategy (see Figure 6), the secondary nodes compute all data streams in parallel with their primaries. Usually, the recovery time of this strategy is the shortest.

Figure 6. Active standby

In the upstream backup strategy (see Figure 7), upstream nodes act as backups for their downstream neighbors by preserving data streams in their output queues while their downstream neighbors compute them. If a node fails, its upstream nodes replay the logged data streams on a recovery node. Usually, the runtime overhead of this strategy is the lowest.

Figure 7. Upstream backup

A comparison of the three main high-availability strategies, i.e., the passive standby strategy, the active standby strategy, and the upstream backup strategy, in terms of runtime overhead and recovery time is shown in Figure 8. The recovery time of the upstream backup strategy is the longest, while the runtime overhead of the passive standby strategy is the greatest.

Figure 8. Comparison of high-availability strategies in runtime overhead and recovery time
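To make the upstream backup strategy more concrete, the following sketch (our own simplified illustration, not code from any of the systems discussed later, with hypothetical names and a sequence-number-based acknowledgement) shows an upstream node that keeps every emitted tuple in its output queue until the downstream node acknowledges it, and that can replay the unacknowledged tuples to a recovery node after a downstream failure.

```java
import java.util.*;
import java.util.function.Consumer;

/** Illustrative upstream backup: keep sent tuples until the downstream node acknowledges them. */
final class UpstreamBackupNode {
    private final Map<Long, String> unacked = new LinkedHashMap<>(); // seq -> tuple, in send order
    private long nextSeq = 0;

    /** Send a tuple downstream and remember it until it is acknowledged. */
    void emit(String tuple, Consumer<String> downstream) {
        unacked.put(nextSeq++, tuple);
        downstream.accept(tuple);
    }

    /** Downstream confirms it has fully processed everything up to and including seq ("trim"). */
    void ack(long seq) {
        unacked.keySet().removeIf(s -> s <= seq);
    }

    /** On downstream failure, replay the logged tuples to a recovery node. */
    void replayTo(Consumer<String> recoveryNode) {
        unacked.values().forEach(recoveryNode::accept);
    }

    public static void main(String[] args) {
        UpstreamBackupNode up = new UpstreamBackupNode();
        List<String> downstream = new ArrayList<>();
        up.emit("t1", downstream::add);
        up.emit("t2", downstream::add);
        up.emit("t3", downstream::add);
        up.ack(0);                         // t1 processed; t2 and t3 remain backed up upstream
        List<String> recovery = new ArrayList<>();
        up.replayTo(recovery::add);        // downstream failed: replay t2 and t3
        System.out.println(recovery);      // [t2, t3]
    }
}
```

The "trim" acknowledgement indicated in Figure 7 corresponds to the ack(seq) call, which lets the upstream node discard everything its downstream neighbor has already processed.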
1.3 Example BDSC Systems

In this section, the system architecture and key technologies of four popular BDSC system instances are presented. These systems are Twitter Storm, Yahoo! S4, and Microsoft TimeStream and Naiad, which are specially designed for big data stream computing.

1.3.1 Twitter Storm

Storm is an open-source, distributed big data stream computing system licensed under the Eclipse Public License. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing real-time big data computing. The Storm platform has the features of simplicity, scalability, fault tolerance, and so on. It can be used with any programming language, and is easy to set up and operate [1, 20, 21].

1.3.1.1 Task Topology

In big data stream computing environments, the logic for an application is packaged in the form of a task topology. Once a task topology is designed and submitted to the system, it will run forever until the user kills it. A task topology can be described as a directed acyclic graph and comprises spouts and bolts, as shown in Figure 9. A spout is a source of streams in a task topology; it reads a data stream (in tuples) from an external source and emits it into bolts. Spouts can emit more than one data stream. The processing of a data stream in a task topology is done in bolts. Anything can be done in bolts, such as filtering, aggregations, joins, and so on. Some simple functions can be achieved by a single bolt, while complex functions are achieved by many bolts. The logic should be designed by the user. For example, transforming a stream of tweets into a stream of trending images requires at least two steps: a bolt to do a rolling count of retweets for each image, and one or more bolts to stream out the top N images. Bolts can also emit more than one stream. Each edge in the directed acyclic graph represents a bolt subscribing to the output stream of some other spout or bolt.

Figure 9. Task topology of Storm

A data stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed big data stream computing environment. A task topology can process data streams in arbitrarily complex ways. Repartitioning the streams between each stage of the computation is needed. Task topologies are inherently parallel and run across a cluster of machines. Any vertex in a task topology can be created in many instances; all of those instances simultaneously process the data stream, and different parts of the topology can be allocated to different machines. A good allocation strategy will greatly improve system performance. A data stream grouping defines how a stream should be partitioned among a bolt's tasks, as spouts and bolts execute in parallel as many tasks across the cluster. There are seven built-in stream groupings in Storm, namely shuffle grouping, fields grouping, all grouping, global grouping, none grouping, direct grouping, and local-or-shuffle grouping, and a custom stream grouping to meet special needs can also be implemented via the CustomStreamGrouping interface.

1.3.1.2 Fault-Tolerance

Fault tolerance is an important feature of Storm. If a worker dies, Storm automatically restarts it. If a node dies, the worker is restarted on another node. In Storm, Nimbus and the Supervisors are designed to be stateless and fail-fast whenever any unexpected situation is encountered, and all state information is stored in the ZooKeeper server. If Nimbus or the Supervisors die, they restart as if nothing had happened. This means you can kill Nimbus and the Supervisors without affecting the health of the cluster or its task topologies. When a worker dies, the Supervisor restarts it. If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine. When a machine dies, the tasks assigned to that machine will time out and Nimbus will reassign those tasks to other machines. When Nimbus or a Supervisor dies, they restart as if nothing had happened, and no worker processes are affected by the death of Nimbus or the Supervisors.
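The stream groupings mentioned in Section 1.3.1.1 can be illustrated with a small, framework-independent sketch. The code below is not Storm's API; it only mimics the idea behind the two most common groupings: shuffle grouping, which spreads tuples across a bolt's parallel tasks, and fields grouping, which hashes a chosen field so that tuples with the same value always reach the same task.

```java
import java.util.*;

/** Illustrative stream groupings: how a tuple is assigned to one of a bolt's parallel tasks. */
final class StreamGrouping {
    private final Random random = new Random();

    /** Shuffle grouping: tuples are distributed across tasks at random (roughly evenly). */
    int shuffle(int numTasks) {
        return random.nextInt(numTasks);
    }

    /** Fields grouping: tuples with the same key always go to the same task (hash mod tasks). */
    int byField(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        StreamGrouping grouping = new StreamGrouping();
        int tasks = 4;
        // The same field value always maps to the same task, so per-key state stays consistent.
        System.out.println("\"storm\" -> task " + grouping.byField("storm", tasks));
        System.out.println("\"storm\" -> task " + grouping.byField("storm", tasks)); // same task
        System.out.println("random  -> task " + grouping.shuffle(tasks));
    }
}
```

This is the property the rolling-count example in Section 1.3.1.1 relies on: with fields grouping on the image id, all retweets of one image are counted by the same task.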
1.3.1.3 Reliability

In Storm, the reliability mechanisms guarantee that every spout tuple will be fully processed by the corresponding topology. Storm does this by tracking the tree of tuples triggered by every spout tuple and determining when that tree of tuples has been successfully completed. Every topology has a "message timeout" associated with it. If Storm fails to detect that a spout tuple has been completed within that timeout, it fails the tuple and replays it later. The reliability mechanisms of Storm are completely distributed, scalable, and fault-tolerant.

Storm uses mod hashing to map a spout tuple id to an acker task. Since every tuple carries with it the spout tuple ids of all the trees it exists within, it knows which acker tasks to communicate with. When a spout task emits a new tuple, it simply sends a message to the appropriate acker telling it that its task id is responsible for that spout tuple. Then, when an acker sees that a tree has been completed, it knows which task id to send the completion message to.

An acker task stores a map from a spout tuple id to a pair of values. The first value is the task id that created the spout tuple, which is used later to send completion messages. The second value is a 64-bit number called the "ack val". The ack val is a representation of the state of the entire tuple tree, no matter how big or how small: it is simply the XOR of all tuple ids that have been created and/or acked in the tree. When an acker task sees that an ack val has become 0, it knows that the tuple tree is completed.

1.3.1.4 Storm Cluster

A Storm cluster is superficially similar to a Hadoop cluster. Whereas on Hadoop you run "MapReduce jobs", on Storm you run "topologies". As shown in Figure 10, there are two kinds of nodes in a Storm cluster, i.e., the master node and the worker nodes.

Figure 10. Storm cluster

The master node runs the Nimbus daemon, which is similar to Hadoop's "JobTracker". In Storm, the Nimbus node is responsible for distributing code around the cluster, assigning tasks to machines, monitoring for failures, and so on. Each worker node runs a Supervisor daemon. The Supervisor listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it. Each worker process executes a subset of a topology. Usually, a running topology consists of many worker processes spread across many machines.

The coordination between Nimbus and the Supervisors is done through a ZooKeeper cluster. Additionally, the Nimbus daemon and the Supervisor daemons are fail-fast and stateless; all state is kept in the ZooKeeper server. This means that if you kill Nimbus or the Supervisors, they will start back up as if nothing had happened.
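The "ack val" mechanism of Section 1.3.1.3 can be sketched in a few lines. The following fragment is a simplified illustration of the idea rather than Storm's implementation: each tuple id is XORed into a 64-bit value once when the tuple is created and once when it is acked, so the value returns to zero exactly when every tuple in the tree has been acknowledged.

```java
import java.util.Random;

/** Illustrative "ack val": XOR of all tuple ids created and acked within one tuple tree. */
final class AckTracker {
    private long ackVal = 0;

    void created(long tupleId) { ackVal ^= tupleId; }  // a new tuple joins the tree
    void acked(long tupleId)   { ackVal ^= tupleId; }  // the same id cancels out when acked
    boolean treeComplete()     { return ackVal == 0; } // zero => every created tuple was acked

    public static void main(String[] args) {
        Random ids = new Random();
        AckTracker tracker = new AckTracker();
        long spoutTuple = ids.nextLong(), childA = ids.nextLong(), childB = ids.nextLong();
        tracker.created(spoutTuple);
        tracker.created(childA);
        tracker.created(childB);
        tracker.acked(spoutTuple);
        tracker.acked(childA);
        System.out.println(tracker.treeComplete()); // false: childB has not been acked yet
        tracker.acked(childB);
        System.out.println(tracker.treeComplete()); // true: the ack val is back to zero
    }
}
```

Because XOR is commutative and associative, the order in which creation and ack messages arrive does not matter, which is part of what allows the mechanism to remain fully distributed.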
1.3.2 Yahoo! S4

S4 is a general-purpose, distributed, scalable, fault-tolerant, pluggable platform that allows programmers to easily develop applications for computing continuous unbounded streams of big data. The core part of S4 is written in Java. The implementation is modular and pluggable, and S4 applications can be easily and dynamically combined to create more sophisticated stream processing systems. S4 was initially released by Yahoo! Inc. in October 2010 and has been an Apache Incubator project since September 2011. It is licensed under the Apache License 2.0 [22-26].

1.3.2.1 Processing Element

The computing unit of S4 is the processing element (PE). As shown in Figure 11, each instance of a processing element is identified by four components, i.e., its functionality, the types of events it consumes, its keyed attribute, and the value of the keyed attribute. Each processing element processes exactly those events which correspond to the value on which it is keyed.

Figure 11. Processing element

A special class of processing elements is the set of keyless processing elements, which have no keyed attribute or value. This type of processing element processes all events of the type with which it is associated. Usually, keyless processing elements are used at the input layer of an S4 cluster, where events are assigned a key.

1.3.2.2 Processing Nodes

Processing nodes (PNs) are the logical hosts of processing elements. Many processing elements work within a processing element container, as shown in Figure 12. A processing node is responsible for listening to events, dispatching events, and emitting output events. In addition, the routing model, load balancing model, failover management model, transport protocols, and ZooKeeper are deployed in the communication layer.

Figure 12. Processing node

All events are routed to processing nodes by S4 according to a hash function. Every keyed processing element can be mapped to exactly one processing node, based on the value of the hash function applied to the value of the keyed attribute of that processing element. However, keyless processing elements may be instantiated on every processing node. The event listener of a processing node always listens for events from S4. If an event is allocated to a processing node, it is routed to an appropriate processing element within that processing node.

1.3.2.3 Fail-over, Checkpointing, and Recovery Mechanism

In S4, a fail-over mechanism provides a high-availability environment. When a node dies, a corresponding standby node is used. In order to minimize state loss when a node dies, a checkpointing and recovery mechanism is employed by S4.

In order to improve the availability of the S4 system, S4 provides a fail-over mechanism to automatically detect failed nodes and redirect the data stream to a standby node. If you have n partitions and start m nodes, with m > n, you get m - n standby nodes. For instance, if there are 7 live nodes and 4 partitions available, then 4 of the nodes pick the available partitions in ZooKeeper, and the remaining 3 nodes will be standby nodes. Each active node consistently receives messages for the partition it picked, as shown in Figure 13(a). ZooKeeper detects node failures and notifies the other nodes. As shown in Figure 13(b), the node assigned partition 1 fails. The unassigned nodes compete for the partition assignment and only one of them picks it up. The other nodes are notified of the new assignment and can reroute the data stream for partition 1, as shown in Figure 13(c).
Figure 13. Fail-over mechanism

If a node is unreachable after a session timeout, ZooKeeper identifies the node as dead. The session timeout is specified by the client upon connection, and is at minimum twice the heartbeat specified in the ZooKeeper ensemble configuration.

In order to minimize state loss when a node dies, a checkpointing and recovery mechanism is employed by S4. The states of processing elements are periodically checkpointed and stored. Whenever a node fails, the checkpoint information is used by the recovery mechanism to restore the state of the failed node on the corresponding standby node, so that most of the previous state of the failed node is available on the standby node.

1.3.2.4 System Architecture

In S4, a decentralized and symmetric architecture is used; all nodes share the same functionality and responsibilities (see Figure 14). There is no central node with specialized responsibilities. This greatly simplifies deployment and maintenance.
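Finally, the keyed routing described in Sections 1.3.2.1 and 1.3.2.2 can be illustrated with a short sketch. It is not the S4 API; it only shows the core idea that the hash of the keyed attribute's value decides which processing node (and hence which PE instance) handles an event, using a hypothetical word-count PE as the example.

```java
import java.util.*;

/** Illustrative keyed routing: an event's keyed attribute value determines its processing node. */
final class KeyedEventRouter {
    private final int numNodes;
    private final Map<String, Long> wordCounts = new HashMap<>(); // state of a word-count PE

    KeyedEventRouter(int numNodes) { this.numNodes = numNodes; }

    /** Every event with the same key value is routed to the same processing node. */
    int route(String keyValue) {
        return Math.floorMod(keyValue.hashCode(), numNodes);
    }

    /** Example PE function: count word occurrences (keyed attribute = the word itself). */
    void onWordEvent(String word) {
        wordCounts.merge(word, 1L, Long::sum);
    }

    public static void main(String[] args) {
        KeyedEventRouter router = new KeyedEventRouter(4);
        for (String word : List.of("stream", "storm", "stream")) {
            int node = router.route(word);      // same word -> same node -> same PE instance
            router.onWordEvent(word);           // in a real cluster this would run on that node
            System.out.println(word + " -> node " + node);
        }
        System.out.println(router.wordCounts);  // counts: stream=2, storm=1 (map order may vary)
    }
}
```

Because every event with the same key value lands on the same node, per-key state such as the word counts never has to be shared across nodes, which fits the symmetric, coordinator-free routing of the architecture described in Section 1.3.2.4.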
