Werner et al., 2020

This document discusses the development of a PLC-based 'flight recorder' for troubleshooting in machine and plant manufacturing, addressing the challenges of monitoring and debugging complex control software. It presents concepts for recording and replaying runtime data to analyze errors, emphasizing the importance of capturing input, output, and internal states without affecting real-time performance. The paper also outlines industrial use cases and requirements derived from interviews with manufacturers, aiming to enhance fault analysis and improve software quality management.


Supporting troubleshooting in machine and plant manufacturing by backstepping of PLC-control software

Bernhard Werner
CODESYS GmbH
Kempten, Germany
b.werner@codesys.com

Birgit Vogel-Heuser
Chair of Automation and Information Systems
Technical University Munich
Garching, Germany
vogel-heuser@tum.de

Simon Ziegltrum
Chair of Automation and Information Systems
Technical University Munich
Garching, Germany
s.ziegltrum@tum.de

Herbert Gröbl
DORST Technologies GmbH & Co. KG
Kochel am See, Germany
herbert.groebl@dorst.de

Claus Botzenhardt
MULTIVAC Sepp Haggenmüller GmbH & Co. KG
Wolfertschwenden, Germany
claus.botzenhardt@multivac.de

Abstract—Increasingly flexible production systems realize functions using a combination of versatile sensors and actuators with complex control software. Hence, quality control of software and debugging of sporadic and difficult-to-find errors are becoming more and more expensive. Therefore, a method to monitor and replay the behavior of the system would be highly beneficial. For the realization of a PLC-based "flight recorder" for machines or plants, technical as well as industrial requirements are analyzed. Interviews conducted at two representative machine and plant manufacturers allow the derivation of a comprehensive set of diverse use cases, which could be used as a benchmark set for similar concepts in the future. Based on the use cases, two different concepts for the recorder functionality were developed, prototypically implemented, and tested according to the machine and plant manufacturers’ requirements. New programming language elements necessary for the implementation of this functionality, like an operator to identify the currently running task and the possibility to control a core dump from within the program, have already found their way into the public software development environment (IDE) of CODESYS. After an evaluation using a real laboratory plant, the concepts were iteratively improved. Finally, insights into remaining research challenges and beneficial future applications for the developed methods are given.

Keywords—PLC, Replay, Troubleshooting, Backstepping, Computer and Control Systems

I. MOTIVATION AND INTRODUCTION

Today, reliability and availability [1] play an increasingly important role in automated Production Systems (aPS), as increasing global competition severely penalizes production downtime and lack of quality. These systems are getting ever more complex, and especially the complexity of the control software [2] is almost overwhelming. Therefore, quality management is already a crucial task in machine and plant manufacturing (M&P).

In research, many approaches have been proposed to handle the systems’ complexity, e.g., using model-driven engineering methods [3], component architectures [4], and approaches for distributed systems [5]. Most M&P companies use IEC 61131-3 [6] to realize the required complex and flexible functionality of the aPS in software. In order to still guarantee a high level of quality, system tests are typically carried out to identify faults before commissioning. Such approaches focus on the late design phase. Nevertheless, even in operation, unexpected and therefore untested and unsupervised faults can still occur due to design flaws or hardware wear. Such events still need to be identified and fixed. If such an error happens (e.g., incorrectly sorted workpieces), the primary cause that subsequently led to the error often remains unclear. Extensive damage during a failure or hasty repair attempts can obfuscate the cause of a problem and, therefore, drastically increase the probability that hidden weak points are still present after repair.

To be able to analyze error causes by monitoring the execution of a control program, it is necessary to have knowledge of the previous runtime data of the control, i.e., input and output data, as well as internal states, to reconstruct the course of events afterward. However, in most cases, this has not been possible up until now due to insufficient or incomplete historical runtime data.

In order to handle this challenge and to enable a reliable error analysis, this paper presents a kind of flight recorder functionality (“Replay-function”) that can track, recreate, and depict the data mentioned above. For this purpose, runtime data of the machine and plant control, such as the process image (input and output data from sensors and actuators) and internal states, is recorded. It has to be clarified how this data acquisition and storage can take place without influencing the runtime characteristics of the machine too much. Otherwise, the real-time capability of the control could be at risk. A selection of relevant data, as well as an efficient concept for data acquisition and storage, must be developed.

In this contribution, a concept to handle the recording of process data in order to support the subsequent fault analysis for M&P is introduced. The paper is structured as follows: Chapter II summarizes related work and state-of-the-art methodologies for data acquisition. In Chapter III, the concept is illustrated and evaluated in a laboratory plant in Chapter IV. The findings of this work are summarized, and an outlook for further work is given in Chapter V.

II. FUNDAMENTALS AND RELATED WORK

This chapter provides a brief introduction of the operating principles of Programmable Logic Controllers

978-1-7281-6389-5/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: SRM Institute of Science and Technology. Downloaded on July 05,2023 at 13:56:37 UTC from IEEE Xplore. Restrictions apply.
(PLC), related work, and technologies for the realization of the concept to be introduced, focusing on approaches for data acquisition and analysis, as well as software testing.

The cyclic operating principle of a PLC (cp. Fig. 1) needs to be introduced in brief to highlight the importance of recording input and output data for backstepping. The cycle time Tc is the whole time the PLC requires to digitize and copy the consistent process image (Slot “I”), execute a control task in which the new values for the output variables are calculated based on the input values (Slot “X”) and, after that, write all the output values to the process image (Slot “O”). The execution time Tx can fluctuate depending on the application that is executed, while the cycle time Tc is fixed. Slot “W” represents the laxity and, therefore, the buffer until a new cycle starts. During the execution slot (“X”), no variables contained in the process image can be set or read by the PLC. Only internal variables (e.g., counter variables) can be changed during this time slot [7].

Fig. 1. Time model of the cyclic PLC behavior [7]

A. Data Acquisition and Analysis

The realization of a procedure for the subsequent step-by-step analysis of the control program’s behavior is the subject of the work of Prähofer et al. [8]. In contrast to this approach, our concept can be used for data analysis directly on the PLC.

Solutions for partial data acquisition from control software and its memory that fulfill the applications’ real-time requirements through functions and plug-ins are already on the market. Company A, one of the manufacturers involved in the research project, has experience in recording and displaying selected state variables in a manner similar to using an oscilloscope or logic analyzer. Both solutions can be triggered based on a variety of conditions and are suitable for optimization of a process, but they are limited to a small number of variables and therefore restrict root cause analysis of unexpected errors to a tiny part of the process image.

As suggested in Prähofer et al. [8], the sequence of the control code can be reconstructed based on an initial status image and the evaluation of read accesses to input variables of the PLC. This concept only allows reconstructing software reactions that have already been observed and were therefore induced by hardware behavior like sensors, including malfunctions. The real-time applicability of the approach to industrial-scale systems has not been proven yet. In particular, recording techniques like these that rely on periodic snapshots can have a severe effect on execution time, which will be shown later.

Provided that data recording was successful, deterministic replay debugging allows reproducing an error event deterministically. Schatz and Prähofer [9] provided a concept that allows the determination of a searched event. For this purpose, a trace search algorithm using a trace log as an input and an example application are presented. Wirth et al. [10] devise an approach for dynamically analyzing, visualizing, and exploring PLC programs and their reactive behavior. A tool that visualizes a trace and supports the offline debugging of PLC applications is introduced in Berger et al. [11]. All presented approaches offer tools to find specific events in recordings. However, an ergonomic concept fully integrated into an IDE, which presents the trace in a way programmers are used to, i.e., like the step mode known from debuggers, is still missing.

B. Software Testing

Quality improvement of control software is an important topic in M&P and, therefore, many approaches for testing in this field already exist. A distinction is made between static and dynamic testing. While static approaches investigate, e.g., the structure of the control code [11] or integrated system models [13], dynamic testing is clustered into model-based test generation, formal verification, and virtual commissioning.

Timing diagrams based on a timed automata structure are investigated in several approaches, such as Vyatkin and Bouzon [14] or Katzke and Vogel-Heuser [15]. To achieve a formal verification of the specific behavior and to check whether the specific behavior of the timing diagram matches a given path in a function block, in [14], they are formalized by using net condition and event systems.

In Hametner et al. [16], processes of Test-Driven Development for the development of business software are adapted to fulfill the requirements of control software. The approach enables systematic testing by a model-based test case generation using UML.

Dynamic approaches establish a relation between executed code and the associated test cases by recording traces during the execution. This can be used for the prioritization of test cases by identifying system changes [17], [18]. These works cannot be fully applied to M&P, because essential requirements like the mechatronic character of M&P or real-time restrictions are not considered in these computer science-based approaches. These approaches do not allow interrupting or interacting with a test case manually during runtime, and runtime overhead is also not in scope. The approach investigated in [19] seems to be feasible for M&P, but it considers neither test automation nor testing for newly introduced bugs during software patches, also called regression testing.

Despite many research projects which proved the benefit of a consistent replay of PLC software for several possible use cases, a holistic approach to record and depict all necessary data is still missing. All of the considered techniques either lack proof of being real-time capable on an industrial scale or do not cover all required data. Finally, an ergonomic way is needed to replay and browse the recording. Therefore, a new concept will be introduced to fill those deficiencies in the following chapters.

III. INDUSTRIAL USE CASES AND REQUIREMENTS

In addition to the general requirements gathered, two representative M&P partners were included in the project, providing realistic use cases concerning industrial scalability and detailed requirements. Complementary to the functional restrictions from Section II focusing on the recording itself, further requirements regarding the context of the data, like detection and size of interesting time frames, will be derived from typical practical examples. Also, domain experts of these companies were interviewed concerning a publicly available use case to discuss the

value of the new approach in Section VI. The conducted interviews created a comprehensive set of diverse use cases, which could be used as a benchmark set for similar concepts in the future.

A. Company A – Manufacturer for Metal Powder Presses

The compression of metal powder into automotive parts is an economical but also technologically very complex manufacturing process in which the quality of the manufactured parts is highly important. The production machines used in this process are equipped with up to 15 closed-loop controlled axes. The task of trajectory planning incorporates the technological know-how to coordinate these closed-loop controlled axes to produce a perfect, crack-free product. The process represents a typical example for a cyclic production of discrete products. The main goal of company A is to provide traceability of code execution in case of a fault. Therefore, as a minimum requirement, all input and output changes shall be recorded and stored until a trigger, marking the end of a production cycle within the control code or an emergency stop. With this recorded data, the initial cause of the emergency stop shall be identified, especially for sporadic and rare faults.

Use case adaptor change (UC1 – low sample rate, many surveilled IO variables)

For this use case, a continuous tracing of data is required for a minimum of 15 seconds. The process to exchange the press adaptor tool is triggered manually by an operator. After the data collection, the data analysis shall be performed offline to identify the source of the malfunction. Typical error causes are polluted sensors, worn-out brakes, or stuck actuators, which can cause interruptions or damage to the expensive tools during the complex exchange procedure. Even if a sequential process like an adaptor change typically runs relatively slowly, e.g., under 100 Hz, and is, therefore, less sensitive concerning jitter of cycle time, thousands of internal and external variables – mostly boolean, some integer – have to be recorded since many states and modules of a plant are included. Additional functions like setup or calibration code are also affected.

Use case automatic mode (UC2 – high sample rate, few but big surveilled IO and internal variables)

In case of one production cycle (closure of a press), the tracing must start at the beginning of the closure. Data on prior cycles may be deleted at that time. One production cycle will last less than 10 seconds. In this operating mode, states and current values, e.g., of the closed-loop controlled axes, are relevant. The following values should be traced:

- Position and force values
- Control value of the controller
- Hydraulic pressure, volume flow

The trigger conditions to stop the recording should be, for example, a workpiece with bad quality (e.g., detected by an operator and indicated via input from a human-machine interface) and the abnormal behavior of an axis (e.g., deviation from standard trajectory detected within the control code). While only a small number of variables is of interest for production monitoring, hydraulic process control often has to be executed with cycle rates above 1 kHz, and typical numerical control requires large floating point variables, which accumulate a lot of data over a short time. A low impact of the recording instrumentation on real-time behavior is crucial for this use case.

B. Company B – Packaging Machine Manufacturer

Company B is a global leader in the field of packaging machines, headquartered in Germany and mainly producing packaging machines for applications in the food industry, but also for healthcare as well as for consumer and industrial goods. As a use case, a fault analysis after a trouble report from their customer was analyzed. This error type occurs from time to time throughout their product line and is well understood by highly skilled technicians but is hard to diagnose remotely and therefore would benefit tremendously from a remote diagnosis.

Use case screw break (UC3 – low sample rate, usually unknown which variables are relevant, difficult to find but costly errors)

A commutation error arises on the servo and, therefore, it rotates in the wrong direction after wearing of the screw break. The error could be detected by recording the range sensor data. The error occurs sporadically and with increasing frequency, starting from once per month up to several times per week. The symptoms of this defect are similar to a defective motor, which is why an unskilled service technician could try to replace the expensive servo motor and servo amplifier, including a new cable set. However, the error cannot be resolved like this, and a support request is usually sent to the manufacturer shortly after.

This use case raises several challenges. Within the scope of online evaluation, it takes a long time until the fault occurs once again. Online evaluation is especially not feasible for sporadic faults, as a trigger is hard to identify, and in the case of an error, relevant variables have often not been selected for recording. Therefore, a more appropriate backstepping in case of an error has to be realized. To meet these challenges, the following requirements are defined:

- recorded program sequence can be replayed on the application engineer’s computer offline
- debugging is possible during the post-processing of the program
- scopes can be recorded during the program sequence

With this concept, several advantages can be achieved: The replay function replaces time-consuming and costly waiting for the fault to occur at the customer site and its data recording by the service technician. If necessary, in the event of an error, the last ten minutes of the whole machine state should be saved. Furthermore, the replay function allows reproducing the fault at any time, even if the machine has already been provisionally fixed.

Throughout the industrial study, three different but representative use cases, each with a specific set of requirements for a replay concept, could be identified. Due to their heterogeneous nature, possible partial solutions can be derived, combined, discussed, and validated in the following sections.

IV. REPLAY CONCEPT TO SUPPORT FAULT HANDLING

In the following, the main concept of the “flight recorder functionality” is proposed. After determining the requirements, the prototypical implementation of solution modules for the individual requirements in the form of

software prototypes has been worked on. While many building blocks are already known from other research fields, most of them have never been applied to the challenge of PLC data recording. Therefore, many concepts and prototypes had to be created by the research team, some of which will be discussed in the following. With the help of several interim evaluations, it was possible to refine the concepts incrementally with regard to data content, recording scope, recording control, and data analysis.

A. Content of Data

Based on the question of the later replay analysis, a differentiated selection of simultaneously observed information is required. In order to be able to cover as wide a range of questions as possible, the recording of several data sources should be examined more closely. The recording of signals to and from the PLC is relevant because it continuously monitors the status of the plant hardware and allows deviations from expected behavior to be dynamically reconstructed in the event of an error. Depending on the application, this includes a large number of boolean (UC1) and floating point values (UC2), some with very different change frequencies, rendering typical measures to reduce data sizes, like only saving accumulated averages or changed data, almost useless. Since errors can also be caused by software or by the interaction of hardware and software, e.g., due to unforeseen hardware behavior and missing or incorrect handling of hardware behavior in software, the recording of PLC-internal variables is also necessary for efficient troubleshooting using replay (UC3). Depending on the application, different variables are relevant here, e.g., the step position in sequence chains. The documentation of the exact change sequence of the internal variables turned out to be critical (UC1).

As prerequisites to overcome present technical limitations restraining the realization of the desired recording functionality, several new programming language constructs had to be introduced to IEC 61131-3 by the research team. Modern PLCs are often based on processors with several computing cores, which enables them to execute several independent processes simultaneously. In order to avoid race conditions and ensure data consistency, new operators for the atomic increment of variables had to be implemented in Structured Text (ST, one of the five IEC 61131-3 languages) and integrated into the CODESYS IDE (CODESYS software development environment).

B. Scope of Recording

During requirements analysis, a maximum recording time of two machine cycles plus the time needed to detect or notice an error that has occurred was identified as well suited. One machine cycle is the time required to process a single workpiece. The duration of a machine cycle can vary greatly depending on the technical process; in the case of the industrial partner’s problems, 10 minutes recording time was considered ideal by the industrial experts despite the significantly shorter cycle time (UC1, UC3). Depending on the cycle time of the PLC and the number and rate of changes of the variables, mechanisms were required to reduce the amount of stored data as much as possible.

In order to extend the technically possible recording time, the following different mechanisms for reducing the required storage space are implemented, tested, and further investigated in this work:

- Predefined sets of variables ("templates") for certain error frameworks, preconfigured, stored, and activated automatically or manually in the event of errors (UC2)
- Event-based or change-based recording (e.g., for binary data): only recording of value changes instead of recording in each cycle (UC1)
- Recording of the internal memory image at the beginning of the machine cycle and subsequent deterministic extrapolation using the I/O data and the control code (UC1, UC3)

While in feasibility experiments the reduction of the recorded data exclusively to changes could be identified as very advantageous, the negative effect of any complex preprocessing logic during the recording on the critical time behavior of the control can be regarded as fatal. Algorithms for lossless compression or even for reducing noise on analog sensors can have a negative impact on real-time capability. In particular, recording a complete memory image at the beginning of a machine cycle increased the jitter unacceptably (UC2). Due to the limited recording time, the memory dump would also have to be repeated regularly, which would have worsened the effects on the time behavior of the PLC even more drastically.

In the case of the last of the evaluated concepts, the old values of variables were only logged once each time variables were overwritten using instrumented code, including the destination address. By inverting past approaches of preceding research groups and further improving the concept to a single complete memory snapshot at the moment of the error detection with offline backward extrapolation of the software state, it was possible to reconstruct a complete program run without the need for a single complete memory snapshot during process runtime.

C. Control of Recording

In addition to the data itself, the time range under consideration is also important for troubleshooting. The recording must be started and stopped in time so that all relevant data for troubleshooting is still persistently available. The problem was identified that rare machine errors are usually not explicitly handled in the software and, therefore, cannot be detected automatically (UC3). The PLC would continue to run cyclically and generate data that would overwrite the relevant history of the error after a short time. For this reason, a specific software method was integrated into the framework, which aborts the recording at the time of its call and can thus be triggered by further logic within the software or by manual intervention. This enables a later system programmer to implement all developed concepts simultaneously (all use cases):

- Error message (automatic): if an error triggers a message, the recording can be stopped automatically by the error message.
- Value triggered: transition of a certain variable to a specified value (e.g., value overflow at position encoder of a motor).
- Manual triggering via HMI: if the system error does not trigger a message, skilled personnel should be able to stop the recording manually when the error is noticed.
- Further possibilities of the trigger, e.g., envelopes and anomaly detection.
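The change-based recording from Section B and the stop method from Section C can be illustrated together with a minimal sketch. It is written in Python for readability rather than in an IEC 61131-3 language, it is not the authors' implementation, and it says nothing about real-time behavior. The names `ChangeRecorder`, `Change`, `sample`, and `stop_when`, as well as the variables `door_closed` and `encoder`, are hypothetical. A bounded buffer stands in for the limited recording time: once it is full, the oldest changes are overwritten, which is why the recording has to be stopped soon after the error.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List


@dataclass(frozen=True)
class Change:
    cycle: int   # PLC cycle in which the change was observed
    name: str    # variable name (hypothetical identifier)
    value: Any   # value after the change


class ChangeRecorder:
    """Hypothetical change-based recorder with a bounded history."""

    def __init__(self, capacity: int,
                 stop_when: Callable[[Dict[str, Any]], bool]) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._log: Deque[Change] = deque(maxlen=capacity)
        self._last: Dict[str, Any] = {}
        self._stop_when = stop_when
        self.running = True

    def stop(self) -> None:
        """Abort recording; callable from program logic or an HMI handler."""
        self.running = False

    def sample(self, cycle: int, image: Dict[str, Any]) -> None:
        """Call once per cycle with the current process image."""
        if not self.running:
            return
        for name, value in image.items():
            # Store a value only if it differs from the last stored one.
            if name not in self._last or self._last[name] != value:
                self._log.append(Change(cycle, name, value))
                self._last[name] = value
        # Value-triggered stop, evaluated after the cycle was recorded.
        if self._stop_when(image):
            self.stop()

    def history(self) -> List[Change]:
        return list(self._log)


# Assumed trigger: an encoder value above 100 counts as an overflow.
recorder = ChangeRecorder(capacity=1000,
                          stop_when=lambda img: img["encoder"] > 100)
images = [
    {"door_closed": True, "encoder": 10},
    {"door_closed": True, "encoder": 10},    # nothing changed, nothing stored
    {"door_closed": False, "encoder": 150},  # trigger fires in this cycle
    {"door_closed": True, "encoder": 0},     # ignored, recording has stopped
]
for cycle, image in enumerate(images):
    recorder.sample(cycle, image)
```

In this sketch the manual HMI trigger corresponds to calling `recorder.stop()` directly, while the error-message and value triggers are expressed through the `stop_when` predicate.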

D. Data Analysis - r-value, i.e., the source of an assignment, is constant, or
Once the data had been recorded, tools were created to the r-value only contains data that is manipulated in the
facilitate access to the data for experts and thus enable the same task in which the access is recorded
beneficial use of the data for troubleshooting. Since many - r-value contains no function calls or only calls to
of the recorded data have a different context-sensitive functions without side effects
meaning, it must be possible to analyze the recorded data Each saved differential log-entry contains the following
by depicting the recorded values as graphs (e.g., sensor and information:
actuator data as well as internal variables) integrated with
the active state of the program at the time (e.g., active - an identification of the currently executed task
process step) and by jumping back and forth step by step - an identification of the source position
through the time curve. The playback of the recorded data, - the written value (except for logs that only record the
therefore, was realized prototypically in CODESYS in the control flow).
form of a debugging step mode, in which the control In the following code snippets, an example is given,
program runs step by step, and variables can be monitored where the source code in Fig. 2 produces the instrumented
by watches. The data evaluation and linking to the PLC code from Fig. 3. If the code is found to be executed only in
program using the demonstrator can also be carried out one task, then the task Id is a constant (0 in this case). To
remotely, offline and without connection to any PLC, e.g. efficiently determine the currently executed task, a new
in the engineering department of the manufacturer, by implicit operator was introduced, which is suppressed in this
linking the recorded data to simulate the value curves of the example for readability. All tasks use one combined write
plant sensors and actuators. Only the first reconstruction of log. The write log is stored to file together with the core
the data requires some initial computing time in the case of dump and emptied every time when a new core dump is
the last prototype, which could be optimized in the future. created.
V. PROTOTYPICALLY IMPLEMENTATION TO SUPPORT
FAULT ANALYSIS
The two most promising of the presented concepts were
fully implemented prototypically and will be introduced in
this chapter.
Fig. 2. Exemplary Source Code to generate Instrumentation Code
A. Variant I – Core Dump with Recorded Data
The first concept that has been implemented is more
closely related to the concept, as proposed by [8]. Starting
from a defined state, determined by a complete core dump
of the data, changing values are recorded (if necessary), as
well as the control flow of the program. At the time of the replayed execution, interpreter code is generated to process the logged data and reconstruct the program execution.

A core dump is triggered by special code in the PLC cycle. The data of the core dump is synchronously copied to RAM and then asynchronously written to a file on persistent storage. As soon as a core dump is finished, the next core dump is triggered. This way, a history of core dumps is created. Since the recorded data is also part of the core dump, this history can, in principle, be used to reconstruct the program execution for a very long time.

The code is instrumented with the following goals:
- record all entries to functions
- record all necessary write accesses to static data
- record the value of the THIS pointer in method calls
- record the path in the control flow (loops, IF statements).

One advantage of this concept is that many data accesses can be reconstructed at replay time using the core dump. For this purpose, it is important to distinguish between l-values and r-values. In general, an assignment consists of a left side (the l-value) and a right side (the r-value). The l-value is the target of the assignment, whereas the r-value is its source. The following scenarios were identified in which write accesses do not necessarily need to be recorded:
- the l-value, i.e., the target of an assignment, is not addressed via a pointer, or is addressed via a pointer that is only used in one task

Fig. 3. Produced Instrumentation Code

When the PLC is halted (at a stop, at a breakpoint, or because of an exception), the user can issue a command to load a replay from the PLC. When executing this command, the following actions are triggered:
1. the newest core dump from the runtime is loaded
2. the write log values are loaded
3. special interpreter code is generated for the application to process the log data in combination with interpreted write accesses
4. the interpreter code is executed, starting with the core dump values and producing a complete list of write accesses, which is then used for the replay control.

The interpreter executes all data accesses on the core dump. For recorded accesses, the values to be written are taken from the write log. For all other accesses, interpreter code is generated. For example, a constant value assignment a := 444 is processed directly by the interpreter by writing the value 444 to the memory position of the variable in the core dump. For each write access, an entry in a list of write accesses is generated, containing the old and new values of the write access and an identification of the corresponding source code position.

The following example (Fig. 4 and Fig. 5) illustrates the concept. In this example, only constant values are interpreted, and all other accesses are retrieved from the write log.
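The replay procedure of Variant I can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `WriteAccess` and `replay_forward`, the dictionary used as a stand-in for the core dump, and the tuple format of the program are assumptions made only for illustration. As in the example described above, constant assignments are interpreted directly, and every other written value is taken from the write log in recording order.

```python
from dataclasses import dataclass


@dataclass
class WriteAccess:
    """One entry of the list of write accesses (hypothetical layout)."""
    position: str  # identification of the source code position
    variable: str
    old: int       # value before the write
    new: int       # value after the write


def replay_forward(core_dump, program, write_log):
    """Replay writes on a copy of the core dump, oldest write first.

    program   -- sequence of (position, variable, constant); constant is
                 None when the written value had to be recorded at runtime
    write_log -- recorded values, in the order they were logged
    """
    memory = dict(core_dump)  # work on a copy, keep the dump intact
    recorded = iter(write_log)
    accesses = []
    for position, variable, constant in program:
        # Constants are interpreted; everything else comes from the log.
        new = constant if constant is not None else next(recorded)
        accesses.append(WriteAccess(position, variable, memory[variable], new))
        memory[variable] = new
    return memory, accesses


core_dump = {"a": 0, "b": 7}
program = [
    ("line 1", "a", 444),   # a := 444, interpreted directly
    ("line 2", "b", None),  # value of b taken from the write log
    ("line 3", "a", None),  # value of a taken from the write log
]
final_memory, accesses = replay_forward(core_dump, program, [12, 445])
```

After the call, `final_memory` corresponds to the memory state at the moment the application stopped, and `accesses` holds the old value, new value, and source position of every write, which is the information the replay control needs for stepping in either direction.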

Authorized licensed use limited to: SRM Institute of Science and Technology. Downloaded on July 05,2023 at 13:56:37 UTC from IEEE Xplore. Restrictions apply.
Fig. 4. Source Code

Fig. 5. Textual representation of Interpreter Code

After the execution of the interpreter code, the core dump has the same state as the memory at the time when the process was stopped and the replay was executed. In addition, the interpreter has produced a complete list of write accesses, starting at the moment of the core dump, ending at the stop of the application, and containing the old data, the new data, and the source position of each write access. This list can then be used to move forward and backward to any point in the execution at which data was manipulated.

B. Variant II – Core Dump Written at the End of the Execution

In the second concept, a core dump is only written once at the end of the execution (on user request, triggered from the application, or as a result of an exception in the code), and the list of write accesses is created backward from this state. Therefore, the replay log contains all overwritten data (the old data), and the new data is retrieved from the core dump.

The instrumented code illustrated in Fig. 6 looks similar to the one from Variant I, but instead of the value written, the current value that will be overwritten is logged to the write log. Since the overwritten value can never be known in advance, all write accesses in the code require instrumentation.

Fig. 6. Example of the Instrumented Code

The actions that take place when a replay is executed are similar, but not identical, to those of Variant I:
1. the newest core dump from the runtime is loaded
2. the write log values are loaded
3. the list of write accesses is generated directly (and backward) from the core dump and the write log and fed to the replay control.

C. User Interface for Log Control

The log control that is used to control the replay is the same for both concepts (Fig. 7). The goal was to provide a user interface similar to that of the usual debugging environment and to show the current values of variables as well as the execution position. In addition to normal debugging, stepping can be performed backward as well as forward, and the previous and the next assignments to be executed are presented to the application engineer.

Fig. 7. Screenshot of Log Control Interface

VI. FEASIBILITY STUDY AND EVALUATION OF THE REPLAY FUNCTIONALITY

After completion of the requirement analysis and concept development, several tests were carried out on a laboratory plant [20] to determine the technical feasibility, and the results were evaluated with experts from the application partners.

A. Application Example xPPU

The lab-sized M&P demonstrator extended Pick & Place Unit (xPPU, cp. Fig. 8) [21] was developed as a case study on the evolution in M&P automation within the Priority Programme 1593. It consists of mechanics of a reduced scale but uses authentic automation hardware. Workpieces are processed in four modules: they are fed from a stack storage and are routed differently depending on their properties. Either they are transported directly to the sorting unit by the crane, or they are first moved to the stamping unit for processing, followed by transportation to the sorting logistics unit. Due to the use of industrial automation hardware as well as a complex control logic written in IEC 61131-3 ST, experiments conducted at the xPPU are suitable to benchmark the developed prototypes and to transfer the results to the industrial use cases.

Fig. 8. Extended Pick and Place Unit (xPPU) [21]
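Before turning to the measurements, the backward reconstruction of Variant II can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the instrumentation produced by the prototype: the class `OldValueLog`, the function `reconstruct_backward`, the dictionary used as memory, and the bounded `deque` used as write log are all hypothetical. Every instrumented write logs the value it overwrites; at the end, the single core dump and the log are walked backward to rebuild the list of write accesses.

```python
from collections import deque


class OldValueLog:
    """Bounded write log that stores overwritten (old) values only."""

    def __init__(self, capacity):
        # When the log is full, the oldest entries are dropped.
        self.entries = deque(maxlen=capacity)

    def write(self, memory, position, variable, value):
        """Instrumented assignment: log the old value, then overwrite it."""
        self.entries.append((position, variable, memory[variable]))
        memory[variable] = value


def reconstruct_backward(final_dump, entries):
    """Rebuild (position, variable, old, new) tuples from the final dump.

    The new value of each write is whatever the memory holds at that point
    of the backward walk; the old value comes from the log.
    """
    memory = dict(final_dump)  # work on a copy of the core dump
    accesses = []
    for position, variable, old in reversed(entries):
        accesses.append((position, variable, old, memory[variable]))
        memory[variable] = old  # undo the write
    accesses.reverse()          # present the list in execution order
    return memory, accesses


memory = {"a": 0, "b": 7}
log = OldValueLog(capacity=10)
log.write(memory, "line 1", "a", 444)
log.write(memory, "line 2", "b", 12)
log.write(memory, "line 3", "a", 445)

# 'memory' now plays the role of the core dump written at the end.
earliest_state, accesses = reconstruct_backward(memory, log.entries)
```

With a capacity of ten, all three writes are recovered and `earliest_state` equals the initial memory. With a smaller capacity, only the most recent writes can be undone, which mirrors the limitation that the history reaches back only as far as the log does.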

B. Evaluation Procedure and Results

In Variant I, a complete memory image of the PLC was created cyclically and the variable values for all read accesses were stored. In this way, the amount of data to be recorded could be reduced by eliminating values that are subsequently no longer read (Table 1, Table 2). In both variants, additional log entries were needed to reconstruct the control flow. While the increase in code size depends solely on the logic of the control code, the data buffer can be scaled to accommodate a longer log period.

Without instrumentation, only current values of variables are stored in volatile memory, while in Variant I and II, additional buffer size can be allocated to maintain extended periods of recorded data. In the measurements, the buffer was tuned to accommodate approximately one process cycle. While the increase in size of the control code executable of 9 and 11 percent might be less important given the size and price of modern non-volatile storage, the size of the additionally necessary high-speed storage has to be discussed. Mass storage media like HDDs are not suitable to deal with the necessary data ingestion rates. Even modern high-speed mass storage such as flash or x-point drives is seldom used or even certified for industrial applications. Therefore, only SRAM or DRAM remains as a viable solution for the required technical specifications. However, depending on the desired recording time, e.g., one machine cycle, an increase of over 500% in necessary and expensive RAM has to be carefully balanced against the possible benefit. Even considering the amount of RAM integrated in industrial PCs, the maximum amount of RAM within PLCs is usually very limited. Depending on the frequency of core dumps in relation to incremental logs in Variant I, either approach can be superior concerning storage needs.

However, the enormous influence of Variant I on the maximum cycle time was classified as unacceptable for a real-time system (Table 3) in many cases by the industrial experts of M&P. The control code was executed on a CX2040 by Beckhoff, using a modified version of the runtime CODESYS Control. The process logic interacted with approx. 30 electropneumatic actuators and 70 digital and analog inputs; depending on the currently active module of the production chain, not all sensors were checked at all times. The PLC cycle time was fixed to 20 ms during the measurements. The measurements were averaged over several minutes. Even though the xPPU is significantly smaller and its logic simpler than the industrial use cases, all measurements were rated as transferable to industrial applications by experts of M&P as well as CODESYS. Since most manufacturers select their PLCs with minimal overhead in computational capacity for future functionality, the introduced increase in average execution time of 152% and 68% will usually require a dedicated upgrade of the hardware. However, considering the potential benefits of these new concepts, upgrades could still be economically reasonable. The disproportionate increase in maximum execution time of Variant I of over 800%, caused by the periodic core dumps, requires specific use cases to justify the added effort.

In Variant II, a complete memory dump was created only at the end of the recording, and the value changes were reconstructed from the overwritten variable values in the log. Hence, the influence of Variant II on the jitter of the maximum execution time is almost negligible. In the opinion of experts from the M&P project partners, the influence of the instrumentation of the deployment on the amount of code and data was considered acceptable, as was the increase of the average cycle time. In summary, the second method was evaluated as superior and as a successful proof of concept. All identified functional and industrial requirements were met.

TABLE 1 Influence on Code and Data Size, lower is better

                                          Code ca. [kBytes]   Data ca. [kBytes]
Without Replay-Instrumentation            485                 443
With Replay-Instrumentation (Variant I)   527 (+8.6%)         2,280 (+514.7%)
With Replay-Instrumentation (Variant II)  540 (+11.3%)        3,066 (+692.1%)

TABLE 2 Data Volume Replay per PLC Cycle, lower is better

             Logs / Cycle   Recorded Data / Cycle [Bytes]
Variant I    167            205
Variant II   180            190

TABLE 3 Influences of Replay on Real-Time Behavior, lower is better

                                 Average Execution Time Tx [µs]   Maximum Execution Time Tx [µs]
Without Replay-Code              31                               46
With Replay-Code (Variant I)     78 (+151.6%)                     4,288 (+832.2%)
With Replay-Code (Variant II)    52 (+67.7%)                      74 (+60.9%)

C. Discussion of Both Approaches

Variant I has the advantage that the number of recorded write accesses can be reduced. In practice, however, it was found to be challenging to have complete knowledge of the written data. Any access to memory via a pointer can alter any other data. There may even be other tasks running on a PLC with access to the same data as the PLC application. Even side effects like differences in the PLC firmware influence the scheduling. To be on the safe side, only constant values could be interpreted by the interpreter. These uncertainties are rare and can be further reduced by organizational precautions. Since the results are not used directly for safety-critical applications, the replay can still provide valuable insight into the development of error cascades that are otherwise hard to monitor. Another advantage of the concept is that it can trace back the execution for a very long time, due to the history of core dumps. However, the side effects of writing core dumps continuously were found to be too grave to be ignored. Furthermore, the memory reserved for the write log needs to be sufficiently large to hold all logged data between two core dumps, though this size can hardly be calculated precisely at compile time.

Variant II is simpler and more robust because no interpreter code is needed, and it can be scaled down more easily. Even a write log consisting of as few as ten write accesses may contain helpful information if, at the point of an exception, the last ten assignments can be reconstructed. No cyclic core dump is written during runtime and, therefore, the list of write accesses can only be reconstructed as far as the logged differential data updates reach back in time. The effort for the PLC is still bigger
than for a normal cycle since more log calls are necessary. However, there is no peak in PLC cycle time for creating the core dump as in Variant I. This behavior is usually an advantage in process control, where predictable and constant cycle times are desired.

In conclusion, Variant II offers a suitable solution for UC1 and UC2, where the variables of interest are known up front. Since the influence on required space and real-time capability of the PLC is small, the approach is suitable to run continuously in the background of a productive plant. In UC3, on the other hand, the variables of interest are either not known beforehand or are not necessarily actively processed by the control logic and are therefore not covered by Variant II. In this case, Variant I is still a conceivable option since it is capable of recording whole memory snapshots of a PLC, regardless of the final point of interest. This feature in turn comes at the cost of increased requirements on PLC hardware as well as increased jitter in PLC cycle times.

VII. CONCLUSION AND FUTURE WORK

Based on a study of two representative manufacturers, three use cases for the realization of the "flight recorder functionality" were derived. Combining existing partial solution blocks with new approaches, two different concepts were developed, prototypically implemented, and tested according to functional and industrial requirements and use cases.

After an evaluation using a laboratory plant, the concepts were discussed with M&P experts and improved iteratively. Finally, an approach was chosen in which all overwritten variable values are stored in a log by automatic instrumentation of the code. The available recording history depends on the log size and the number of write accesses of the machine logic but can be scaled within the available memory space. Starting from a recorded memory image, the program can be reconstructed offline and backward within the IDE and analyzed step by step.

In the course of the field study, it was shown that both implemented concepts are suitable for finding errors in the code, for example, in case of unexpected exceptions in the code execution. However, when it comes to errors in the execution of the machine, for example, because a sensor does not work, both methods are only suitable if the corresponding IO variables are manipulated by the software during the incident and are therefore instrumented. In this use case, an approach that records the IO points in every PLC cycle would be more suitable.

Future work should concentrate on how variables that are beneficial for error tracing could be identified in advance. Furthermore, applications in the field of factory automation typically have a machine cycle, a repeating sequence of similar work steps. The values of process inputs are supposed to follow patterns according to this cycle. The gathered data could be used to detect these patterns and perhaps to detect anomalies and, therefore, hardware problems before the machine fails.

ACKNOWLEDGMENT

We thank the Bavarian Research Foundation for the funding of the research project EFIMA, "Efficient troubleshooting for safe, versatile machine and plant automation".

REFERENCES

[1] B. Vogel-Heuser, A. Fay, I. Schaefer, and M. Tichy, "Evolution of software in automated production systems: Challenges and research directions," J. Syst. Softw., vol. 110, pp. 54–84, Dec. 2015.
[2] V. Vyatkin, "Software engineering in industrial automation: State-of-the-art review," IEEE T-II, vol. 9, no. 3, pp. 1234–1249, 2013.
[3] M. L. Alvarez, I. Sarachaga, A. Burgos, E. Estévez, and M. Marcos, "A methodological approach to model-driven design and development of automation systems," IEEE T-ASE, vol. 15, no. 1, pp. 67–79, 2018.
[4] R. Hametner, A. Zoitl, and M. Semo, "Automation component architecture for the efficient development of industrial automation systems," in IEEE CASE, 2010, pp. 156–161.
[5] F. Basile, P. Chiacchio, and D. Gerbasio, "On the implementation of industrial automation systems based on PLC," IEEE T-ASE, vol. 10, no. 4, pp. 990–1003, 2013.
[6] Programmable Controllers – Part 3: Programming Languages, Standard IEC 61131-3, International Electrotechnical Commission, 2003.
[7] D. Witsch and B. Vogel-Heuser, "PLC-Statecharts: An Approach to Integrate UML-Statecharts in Open-Loop Control Engineering – Aspects on Behavioral Semantics and Model-Checking," in 18th IFAC World Congress, 2011, pp. 7866–7872.
[8] H. Prähofer, R. Schatz, C. Wirth, and H. Mössenböck, "Deterministic Replay Debugging of IEC 61131-3 SoftPLC programs," in 2010 8th IEEE ICIT, 2010, pp. 1110–1117.
[9] R. Schatz and H. Prähofer, "Analyzing Long-Running Controller Applications for Specification Violations Based on Deterministic Replay," in 2012 38th Euromicro Conference on Software Engineering and Advanced Applications, 2012, pp. 55–62.
[10] C. Wirth, H. Prähofer, and R. Schatz, "A multi-level approach for visualization and exploration of reactive program behavior," in 2011 6th International Workshop on Visualizing Software for Understanding and Analysis, Williamsburg, VA, 2011, pp. 1–4.
[11] R. Berger, H. Prähofer, C. Wirth, and R. Schatz, "A tool for trace visualization and offline debugging of PLC applications," in Proceedings of 2012 IEEE ETFA, 2012, pp. 1–8.
[12] H. Prähofer, F. Angerer, R. Ramler, H. Lacheiner, and F. Grillenberger, "Opportunities and challenges of static code analysis of IEC 61131-3 programs," in IEEE ETFA, 2012, pp. 1–8.
[13] E. Estevez and M. Marcos, "Model-based validation of industrial control systems," IEEE T-II, vol. 8, no. 2, pp. 302–310, 2012.
[14] V. Vyatkin and G. Bouzon, "Using Visual Specifications in Verification of Industrial Automation Controllers," EURASIP Journal on Embedded Systems, vol. 2008, no. 3, pp. 1–9, 2008.
[15] U. Katzke and B. Vogel-Heuser, "Combining UML with IEC 61131-3 languages to preserve the usability of graphical notations in the software development of complex automation systems," in 10th IFAC IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Systems, Sep. 2007, pp. 90–94.
[16] R. Hametner, D. Winkler, T. Östreicher, S. Biffl, and A. Zoitl, "The adaptation of test-driven software processes to industrial automation systems," in IEEE INDIN, 2010.
[17] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, "Prioritizing test cases for regression testing," IEEE Trans. Softw. Eng., vol. 27, no. 10, pp. 929–948, 2001.
[18] H. Prähofer, R. Schatz, C. Wirth, and H. Mössenböck, "A comprehensive solution for deterministic replay debugging of SoftPLC applications," IEEE T-II, vol. 7, no. 4, pp. 641–651, Nov. 2011.
[19] B. Vogel-Heuser, S. Rösch, J. Fischer, T. Simon, S. Ulewicz, and J. Folmer, "Fault handling in PLC-based Industry 4.0 automated production systems as a basis for restart and self-configuration and its evaluation," Journal of Software Engineering and Applications, vol. 9, no. 1, pp. 1–43, 2016.
[20] B. Vogel-Heuser, C. Legat, J. Folmer, and S. Feldmann, "Researching Evolution in Industrial Plant Automation: Scenarios and Documentation of the Pick and Place Unit," mediaTUM, Munich, Germany, 2014.
[21] R. Heinrich, S. Koch, S. Cha, K. Busch, R. Reussner, and B. Vogel-Heuser, "Architecture-based Change Impact Analysis in Cross-disciplinary Automated Production Systems," Journal of Systems and Software (JSS), vol. 146, no. 1, pp. 167–185, 2018.
