Tracing of large‐scale actor systems
M Ciołczyk, M Wojakowski, M Malawski
Concurrency and Computation: Practice and Experience, 2018 • Wiley Online Library
Summary
In large‐scale and distributed actor systems, message processing within an actor sometimes fails, often because of failures that occurred earlier in the system. In such cases, tracing the failure back to its origin is difficult, since existing monitoring tools only collect metrics and statistical information about system execution. In this paper, we describe a new tool for tracing distributed actor systems, Akka Tracing Tool, a library that allows users to generate a trace graph of messages. To address the distributed nature of the environment, we propose an efficient data collection mechanism based on the one‐way replication technique implemented in CouchDB, a popular document database. The tool was evaluated on a real application, a car traffic simulation, in a distributed environment of up to 50 nodes set up in the Amazon Web Services (AWS) computing cloud. The measured overhead when tracing all messages was between 39% and 45% on average. The library also proved to be scalable with respect to the number of nodes in the actor system and to be user‐friendly. Owing to these properties, we expect that the tool can simplify finding errors and speed up the development process of actor systems.
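To illustrate how a trace graph of messages can lead from a failed message back to its origin, here is a minimal sketch in Python. It is not the Akka Tracing Tool's API or data format: the `TraceRecord` fields, the `path_to_origin` helper, and the actor names are all hypothetical. It assumes only that each traced message records which earlier message caused it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TraceRecord:
    """One traced message. All field names are hypothetical."""
    msg_id: str
    parent_id: Optional[str]  # message whose handling sent this one, if any
    sender: str
    receiver: str
    failed: bool = False


def path_to_origin(records: List[TraceRecord], msg_id: str) -> List[TraceRecord]:
    """Return the causal chain from the root message down to msg_id."""
    by_id: Dict[str, TraceRecord] = {r.msg_id: r for r in records}
    path: List[TraceRecord] = []
    seen = set()  # guards against malformed, cyclic parent links
    current: Optional[str] = msg_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)
        record = by_id[current]
        path.append(record)
        current = record.parent_id
    path.reverse()  # oldest message first
    return path


# Hypothetical records, loosely modelled on a traffic simulation.
records = [
    TraceRecord("m1", None, "generator", "intersection-1"),
    TraceRecord("m2", "m1", "intersection-1", "road-7"),
    TraceRecord("m3", "m2", "road-7", "intersection-2", failed=True),
    TraceRecord("m4", "m1", "intersection-1", "road-8"),
]

chain = path_to_origin(records, "m3")
print(" -> ".join(r.msg_id for r in chain))
```

In this sketch the records sit in one in-memory list. In a distributed deployment they would first have to be gathered from every node, which is the role the abstract assigns to CouchDB's one‐way replication.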