
Mining Specifications

Glenn Ammons, Dept. of Computer Sciences, University of Wisconsin, Madison, Wisconsin, USA (ammons@cs.wisc.edu)
Rastislav Bodík, Dept. of Computer Sciences, University of Wisconsin, Madison, Wisconsin, USA (bodik@cs.wisc.edu)
James R. Larus, Microsoft Research, One Microsoft Way, Redmond, Washington, USA (larus@microsoft.com)

ABSTRACT

Program verification is a promising approach to improving program quality, because it can search all possible program executions for specific errors. However, the need to formally describe correct behavior or errors is a major barrier to the widespread adoption of program verification, since programmers historically have been reluctant to write formal specifications. Automating the process of formulating specifications would remove a barrier to program verification and enhance its practicality.

This paper describes specification mining, a machine learning approach to discovering formal specifications of the protocols that code must obey when interacting with an application program interface or abstract data type. Starting from the assumption that a working program is well enough debugged to reveal strong hints of correct protocols, our tool infers a specification by observing program execution and concisely summarizing the frequent interaction patterns as state machines that capture both temporal and data dependences. These state machines can be examined by a programmer, to refine the specification and identify errors, and can be utilized by automatic verification tools, to find bugs.

Our preliminary experience with the mining tool has been promising. We were able to learn specifications that not only captured the correct protocol, but also discovered serious bugs.

1. INTRODUCTION

It is difficult to imagine software without bugs. The richness and variety of errors require an equally diverse set of techniques to avoid, detect, and correct them. Testing currently is the detection method of choice. However, the high cost and inherent limitations of testing have led to a renewed interest in other approaches to finding bugs. One of the most promising directions is tools that systematically detect important classes of errors [1-3, 5, 6, 9, 10].

While program verification tools do not prevent programming errors, they can quickly and cheaply identify oversights early in the development process, when an error can be corrected by a programmer familiar with the code. Moreover, unlike testing, some verification tools can provide strong assurances that a program is free of a certain type of error. These tools, in general, statically compute an approximation of a program's possible dynamic behaviors and compare it against a specification of correct behavior. These specifications often are easy to develop for language-specific properties, such as avoiding null dereferences and staying within array bounds. Even when language properties are more difficult to express and check, they potentially apply to every program written in the language, so an investment in a verification tool can be amortized easily.

On the other hand, specifications particular to a program, say of its abstractions or datatypes, may be difficult and expensive to develop because of the complexity of these mechanisms and the limited number of people who understand them. Also, as these specifications may apply to only one program, their benefits are correspondingly reduced. Program verification is unlikely to be widely used without cheaper and easier ways to formulate specifications.

This paper explores one approach to automating much of the process of producing specifications. This technique, called specification mining, discovers some of the temporal and data-dependence relationships that a program follows when it interacts with an application programming interface (API) or abstract datatype (ADT). A specification miner observes these interactions in a running program and uses this empirical data to infer a general rule about how programs should interact with the API or ADT. These rules are concisely summarized as state machines that capture both temporal and data dependences. These state machines can be both examined by a programmer, to refine the specification and identify errors, and utilized by automatic verification tools, to find bugs.

Mining proceeds under the assumption that an executing program, which presumably has passed some tests, generally uses an API or ADT correctly, so that if a miner can identify the common behavior, it can produce a correct specification, even from programs that contain errors. Rather than start from the program's text, in which feasible and infeasible paths are intermixed and correct paths are indistinguishable, mining begins with traces of a program's run-time interaction with an API or ADT. These traces are not only limited to feasible paths, but in general do not contain errors.

The program in Figure 1 illustrates these points. The program uses the server-side socket API [7]. It generally observes the correct protocol: create a new socket s through a call to socket, prepare s to accept connections by calling bind and listen, call accept for each connection, service each connection, and finally call close to destroy s. Unfortunately, the program is buggy: if the return statement on line 14 is executed, s is never closed.

Even though a program is buggy, individual interaction traces can be correct. Figure 2 shows one such trace. If cond1 is rarely true, it might be difficult to invent a test to force the program to behave badly.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
POPL '02, Jan. 16-18, 2002, Portland, OR, USA.
Copyright 2002 ACM ISBN 1-58113-450-9/02/01 ...$5.00.
On the other hand, correct traces enable a miner to infer a specification of the correct protocol. A verification tool that uses the specification to examine all program paths (e.g. [2, 10]) could then find the rare bug.

    1   int s = socket(AF_INET, SOCK_STREAM, 0);
    2   ...
    3   bind(s, &serv_addr, sizeof(serv_addr));
    4   ...
    5   listen(s, 5);
    6   ...
    7   while (1) {
    8       int ns = accept(s, &addr, &len);
    9       if (ns < 0) break;
    10      do {
    11          read(ns, buffer, 255);
    12          ...
    13          write(ns, buffer, size);
    14          if (cond1) return;
    15      } while (cond2);
    16      close(ns);
    17  }
    18  close(s);

Figure 1: An example program using the socket API.

    1   socket(domain = 2, type = 1, proto = 0, return = 7)
    2   bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
    3   listen(so = 7, backlog = 5, return = 0)
    4   accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
    5   read(fd = 8, buf = 0x400320, len = 255, return = 12)
    6   write(fd = 8, buf = 0x400320, len = 12, return = 12)
    7   read(fd = 8, buf = 0x400320, len = 255, return = 7)
    8   write(fd = 8, buf = 0x400320, len = 7, return = 7)
    9   close(fd = 8, return = 0)
    10  accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 10)
    11  read(fd = 10, buf = 0x400320, len = 255, return = 13)
    12  write(fd = 10, buf = 0x400320, len = 13, return = 13)
    13  close(fd = 10, return = 0)
    14  close(fd = 7, return = 0)

Figure 2: Part of the input to our mining process: a trace of an execution of the program in Figure 1.

[Figure 3 is a state-machine diagram; only its edge labels survive extraction: socket(return = x), bind(so = x), listen(so = x), accept(so = x, return = y), read(fd = y), write(fd = y), close(fd = y), close(fd = x).]
Figure 3: The output of our mining process: a specification automaton for the socket protocol.

Our specification mining system is composed of four parts: tracer, flow dependence annotator, scenario extractor, and automaton learner (Figure 4). The tracer instruments programs so that they trace and record their interactions with an API or ADT, as well as compute their usual results. We implemented two tracers. One is a replacement for the C stdio library, which requires recompiling programs. The other is a more general executable editing tool that allows arbitrary tracing code to be inserted at call sites. The tracers produce traces in a standard form, so that the rest of the process is independent of the tracing technology.

Flow dependence annotation is the first step in refining the traces into interaction scenarios, which can be fed to the learner. It connects an interaction that produces a value with the interactions that consume the value. Next, the scenario extractor uses these dependences to extract interaction scenarios (small sets of dependent interactions) and puts the scenarios into a standard, abstract form. The automaton learner is composed of two parts: an off-the-shelf probabilistic finite state automaton (PFSA) learner and a postprocessor called the corer. The PFSA learner analyzes the scenario strings and generates a PFSA, which should be both small and likely to generate the scenarios. A PFSA is a nondeterministic finite automaton (NFA) in which each edge is labelled by an abstract interaction and weighted by how often the edge is traversed while generating or accepting scenario strings. Rarely-used edges correspond to infrequent behavior, so the corer removes them. The corer also discards the weights, leaving an NFA. A human can validate the NFA by inspection, at which point the NFA specification can be used for program verification. Figure 3 shows a specification for the socket protocol of the program in Figure 1.

This paper makes the following contributions:

- We introduce a new approach, called specification mining, for learning formal correctness specifications. Since specification is the portion of program verification still dependent primarily on people, automating this step can improve the appeal of verification and help improve software quality.

- We use the observation that common behavior is often correct behavior to refine the specification mining problem into a problem of probabilistic learning from execution traces.

- We develop and demonstrate a practical technique for probabilistic learning from execution traces. Our technique reduces specification mining to the problem of learning regular languages, for which off-the-shelf learners exist.
The rest of the paper is organized as follows. Section 2 develops a formal statement of the specification mining problem. Section 3 discusses tracers and the flow dependence annotator. Section 4 describes the scenario extractor and the automaton learner. Section 5 presents a dynamic checker for the mined specifications. Section 6 discusses the results of some experiments with the mining tool and the checker. Section 7 discusses related work and Section 8 concludes the paper.

[Figure 4 is a dataflow diagram; its stages survive extraction: Program -> Tracer -> Instrumented program; Test inputs + Instrumented program -> Run -> Traces; Traces -> Flow dependence annotator -> Annotated traces; Annotated traces + Scenario seeds -> Scenario extractor -> Abstract scenarios; Abstract scenarios -> Automaton learner -> Specification.]
Figure 4: Overview of our specification mining system.

2. THE PROBLEM

This section develops a formal statement of the specification mining problem. At its most ambitious, specification mining attempts to solve an unsolvable problem:

PROBLEM 2.1. Let I be the set of all traces of interactions with an API or ADT, and C ⊆ I be the set of all correct traces of interactions with the API or ADT. Given an unlabelled training set T of interaction traces¹ from I, find an automaton A that generates exactly the traces in C. A is called a specification. An algorithm that finds A is called a specification miner.

¹By an "unlabelled training set", we mean that no information is provided as to which of the elements in T are also in C.

The rest of this section examines successive restrictions of this problem, which lead to a problem that can be attacked with the methods of this paper. These simplifications were chosen to accommodate our techniques, and other restatements of the specification mining problem are certainly possible.

Problem 2.1 can not be solved because it places no restrictions on the set C. If C is not recursively enumerable, then A does not exist. In this paper, we require that C be generated by a finite-state automaton (that is, C is a regular language). This decision is forced by two practical considerations. First, model checkers require finite-state specifications. Second, the algorithms for learning finite-state automata are relatively well-developed.

It is not enough to simply say that C must be regular, because traces of most programs are not regular. For example, consider a C program (LinkedList) that takes a number n on the command line, constructs a linked list of size n (allocating the nodes with malloc), and then destroys the linked list, deallocating the nodes with free in first-allocated, last-freed order. Ignoring the finite arithmetic, the traces do not form a regular language. First, a regular language must be defined over a finite alphabet, but LinkedList can make an unbounded number of distinct malloc and free calls. Second, LinkedList always makes a number of malloc calls followed by an equal number of free calls, which is the canonical non-regular language.

Although LinkedList's traces do not form a regular language, its traces contain subtraces that do. Given a trace and an object o mentioned in that trace, consider the subtrace of the trace containing calls to malloc that return o and calls to free that are passed o. The subtrace is simply a malloc call, followed by a free call. If the trace mentions n objects, there is one such subtrace for each object. Each subtrace is exactly like all of the others, except for the particular object that it allocates and frees. Now replace that object in each subtrace with a standard name, say o_std. Now, all of the subtraces are identical, and the learner has a very strong hint that free should always follow malloc. We call the renamed subtraces interaction scenarios.
Our approach simplifies Problem 2.1 in two ways. First, the learner does not learn directly from traces. Instead, a preprocessor extracts interaction scenarios from the traces. The scenarios manipulate no more than k data objects, for some k; in the LinkedList example, k = 1. Second, the set CS of correct scenarios is required to be regular. The simplified specification mining problem can now be defined:

PROBLEM 2.2. Let IS be the set of all interaction scenarios with an API or ADT that manipulate no more than k data objects. Let CS ⊆ IS be the regular set of all correct scenarios. Given an unlabelled training set TS of interaction scenarios from IS, find a finite-state automaton AS that generates exactly the scenarios in CS.

Problem 2.2 is also impossible to solve. The careful reader may have noticed that the training set TS does not depend on CS. That is, no matter what CS is, any subset of IS is a valid training set! Obviously, under these conditions, there is no basis on which to choose AS. The definition of Problem 2.2 allowed the training sets to be chosen so liberally in order to allow "noisy" training sets that contain bad examples (that is, bugs) that are not in CS. A satisfactory definition of noise must wait until the problem has been simplified further. For now, we simplify the problem by assuming that the training "set" is in fact an infinite sequence of scenarios from CS alone, such that each element of CS occurs at least once:

PROBLEM 2.3. Let IS be the set of all interaction scenarios with an API or ADT that manipulate no more than k data objects. Let CS ⊆ IS be the regular set of all such correct scenarios. Finally, let TS = c0, c1, ... be an infinite sequence of elements from CS in which each element of CS occurs at least once.
For each n > 0, examine the first n elements of TS and produce a finite-state automaton ASn, such that the sequence of finite-state automata AS0, AS1, ... has this property: for some N ≥ 0, ASN generates exactly the scenarios in CS and ASn = ASN for all n ≥ N. We say that the sequence AS0, AS1, ... identifies CS in the limit.

Perhaps surprisingly, Problem 2.3 is also undecidable. Our definition of Problem 2.3 is inspired by E. Mark Gold's seminal paper on language identification in the limit [14], in which Gold shows that regular languages can not be identified in the limit [14, Theorem I.8].
His proof is too long to repeat here, but the idea of the proof is to present the members of an infinite regular language to the learner in such a way that the learner is forced to change its guess infinitely often, cycling through a never-ending sequence of finite sublanguages of the infinite language. Intuitively, the learner's dilemma is that any finite sequence of examples from the infinite language is also a sequence of examples from a finite language, and the learner has no basis for preferring one of these over the other. Since CS is a possibly infinite regular language, Gold's theorem applies to Problem 2.3.

Gold's paper did not end work on learning regular languages from examples. Subsequent work avoids the dilemma exploited in Gold's proof by providing the learner with extra information that allows it to justify choosing a less general automaton over a more general one (and vice versa). One class of approaches presents the learner with examples generated according to a probability distribution; this sort of approach is particularly interesting to us because it also gives the learner a method for dealing with noise in its input. The task of the learner is to learn a close approximation of the probability distribution:

    Let IS be the set of all interaction scenarios with an API or ADT that manipulate no more than k data objects. Let P and P̂ be probability distributions over IS. We say that P̂ is an ǫ-good approximation of P, for ǫ ≥ 0, if D(P, P̂) ≤ ǫ, where D(P, P̂) is some measure of distance between P and P̂.

Just as Problem 2.2 restricted CS to be a regular set, P must be restricted to a manageable class of distributions. We choose the distributions generated by probabilistic finite state automata (PFSAs). A PFSA is a probabilistic analogue of a nondeterministic finite state automaton. That is, a PFSA is a tuple (Σ, Q, qs, qf, p) where

- Σ is an output alphabet.
- Q is a set of states.
- qs ∈ Q is the start state of the automaton.
- qf ∈ Q is the final state of the automaton.
- p(q, q′, a) is a probability function, giving the probability of transitioning from q ∈ Q to q′ ∈ Q while outputting the symbol a ∈ Σ. Note that p(qf, q′, a) = 0 for all q′ ∈ Q and a ∈ Σ.

Thus, a PFSA generates a distribution that assigns positive probabilities to the strings in a regular language. Basing our definition on the standard definition for learning probabilistic finite automata [20], we can now give our final formulation of the specification mining problem:

PROBLEM 2.4. Let IS be the set of all interaction scenarios with an API or ADT that manipulate no more than k data objects. Let M be a target PFSA, and P_M be the distribution over IS that M generates. Intuitively, P_M assigns high probabilities to correct traces and low probabilities to incorrect traces.
Given a confidence parameter δ > 0 and an approximation parameter ǫ > 0, efficiently find with probability at least 1 − δ a PFSA M̂ such that its distribution P_M̂ is an ǫ-good approximation of P_M. "Efficiently" means that the mining algorithm must run in time polynomial in 1/ǫ, 1/δ, an upper bound n on the number of states of M, and the size of the alphabet Σ of IS.

Unfortunately, with reasonable distance metrics D, it has been shown that Problem 2.4 is not efficiently learnable [16]. An efficient solution has been found for the case where M and M̂ are required to be acyclic and deterministic [24]. Since many interesting specifications of program behavior contain loops, we chose to use a greedy PFSA learning algorithm that is not guaranteed to find an ǫ-good approximation of M, but in practice generates succinct specifications.
3. TRACING AND FLOW DEPENDENCE ANNOTATION

This section describes the tracing and flow dependence annotation that produce the input to the scenario extractor.

Tracing. A tracer instruments a program, so that running it produces a trace of its interactions with an API or ADT, as well as its usual results. This paper assumes that a tracer only records function calls and returns, although depending on the API/ADT, the mining system allows tracing other events, such as variable accesses or network messages.

    int instrumented_socket(int domain, int type, int proto)
    {
        int rc = socket(domain, type, proto);
        fprintf(the_trace_fp,
                "socket(domain = %d, type = %d, "
                "proto = %d, return = %d)\n",
                domain, type, proto, rc);
        return rc;
    }

Figure 5: Illustration of trace instrumentation (instrumented version of socket).

Figure 5 shows an illustration of the trace instrumentation, specifically the C code for an instrumented version of the socket call. This wrapper calls the real socket and records information about the interaction: the name of the call (socket), arguments, and return value. The entire socket API could be traced with an instrumented version of each function.

Our system currently uses two tracers. The first instruments the C stdio library, by capturing all library calls and macro invocations in that API. The second consists of two parts: Perl scripts that automatically generate instrumented versions of the function calls in the X11 API, and a tool that edits program executables to replace calls on these routines with calls to instrumented versions. The latter tool is based on the EEL Executable Editing Library [17] and is very general. It takes as input an executable, a library of instrumented functions, and a file specifying which calls in the executable to replace with calls to instrumented functions. The most time-consuming part of tracing an interface is writing the instrumented version of each API call, but we believe that this step is easily automated.

All tracers record interactions in the same format, so that the rest of the mining system is independent of the particular tracer used. An interaction skeleton is of the form

    interaction(attribute_0, ..., attribute_n)

where interaction names the interaction (that is, the name of a function) and attribute_i names the ith attribute of the interaction. Skeletons are just a convenient way of grouping interactions. They do not appear in traces.
An interaction instantiates a skeleton by assigning values to the attributes:

    interaction(attribute_0 = v_0, ..., attribute_n = v_n)

When tracing function calls, interaction attributes usually represent function arguments and return values, as in Figure 2. Structured data can be represented by flattening the structures. For example, given this C code

    struct S { int x; int y; };
    void f(S* s);

the tracer could record interactions with f with instances of this skeleton

    f(S, S_x, S_y)

By convention, this paper names traces with the letter T and interactions with variations of the letter t. The actual interactions in a trace of length n + 1 are numbered from 0 to n; for example, T = t_0, ..., t_n. The notation t.a denotes the a attribute of the interaction t.

    Definers:  socket.return, bind.so, listen.so, accept.return, close.fd
    Users:     bind.so, listen.so, accept.so, read.fd, write.fd, close.fd

Figure 7: Attributes of socket interactions that define and use their values.

[Figure 6 is a dataflow diagram: Traces -> Dependence analysis -> untyped trace with dependences -> Type inference -> Annotated traces.]
Figure 6: Detailed view of the flow dependence annotator.

Flow dependence annotation. Flow dependence annotation annotates each input trace with flow dependences and type assignments. The scenario extractor uses these annotations to extract scenarios (small sets of dependent interactions) from the trace and to put each scenario into a canonical form. The detailed view in Figure 6 shows that flow dependence annotation is a two-step process. First, dependence analysis marks the trace with flow dependences, which constrain how interactions may be reordered and identify related interactions that could be grouped into a scenario for the automaton learner. Next, type inference assigns a type to each interaction attribute in the trace. The scenario extractor uses the types to avoid naming conflicts when it puts a scenario into standard form. Dependence analysis and type inference both examine the entirety of each input trace, so their running time must be nearly linear.

The miner treats all values as abstract objects whose underlying representation is unknown. However, interactions can depend on results from other interactions. For example, in Figure 2, the bind call (line 2) depends on the socket call (line 1), because the bind call uses file descriptor 7 returned by the socket call. The order of these two interactions can not be reversed. By contrast, the interactions that manipulate file descriptor 8 (lines 4-9) could be exchanged with the interactions that manipulate file descriptor 10 (lines 10-13), since these groups of operations are independent of each other. More importantly, a scenario that contains all interactions related to the close on line 13 should not include the interactions on lines 4-9.

Flow dependences connect attributes that change the state of an abstract object (that is, attributes that define the object) to interaction attributes that depend on the state of an abstract object (that is, attributes that use the object). Ideally, the dependence analyzer would annotate a trace with flow dependences using no information beyond the trace itself. Our current system, however, relies on an expert to tell the analyzer which attributes of interactions may define objects, and which attributes may use objects. This work must be done once for each API/ADT. Extending the system to infer the sets of definers and users automatically is future work.

For simplicity, the examples in this paper assume that only socket-valued attributes of the interactions in Figure 2 carry dependences. Figure 7 lists attributes of interactions in Figure 2 that define and use socket values. We constructed this table as follows. For each socket, the kernel maintains a hidden data structure. Some of the fields of that structure carry the state of the socket: whether the socket is closed or open, whether or not it can accept connections, and so on. Other fields simply hold data: bytes that are outstanding, the port to which the socket is connected, and so on. Definers in Figure 7 typically modify one or more of the state fields of the data structure. Users typically read one or more of those fields. Fields of the structure that merely hold data are ignored.

Creating Figure 7 required expert knowledge. However, note that whenever the state fields of a socket's data structure change, the set of API calls that may follow also changes. For example, after a socket is closed, read and write calls are no longer allowed. The fact that close changes the state of the socket can be inferred from the trace: before a close, there are reads and writes; after a close, there are no reads and writes. That is, interactions that change state also change the sorts of interactions that may follow. In future work, we hope to replace the expert with an automatic tool that uses this fact to infer the sets of definers and users.

Given the lists of attributes that define or use objects, dependence analysis is a dynamic version of the reaching definitions problem. The analyzer traverses the trace T = t_0, ..., t_n in order from t_0 to t_n, maintaining a table M that maps values to attributes of actual interactions. Initially, M is empty. If t_i.a defines an object o, the annotator updates M to map o to t_i.a. If t_i.a uses an object o and M maps o to t_i′.a′, then the analyzer places a flow dependence from t_i.a to t_i′.a′. The running time of the algorithm scales linearly in the length of the trace. The space required scales linearly in the number of different values referenced by the trace.
For notational convenience, we introduce a relation df such that df(t_i.a, t_i′.a′) if and only if there is a flow dependence from t_i.a to t_i′.a′. The relation is extended from interaction attributes to interactions in the natural way: df(t_i, t_i′) holds if and only if there is some a and a′ such that df(t_i.a, t_i′.a′).

    Type(socket.return) = T0
    Type(bind.so)       = T0
    Type(listen.so)     = T0
    Type(accept.so)     = T0
    Type(accept.return) = T0
    Type(read.fd)       = T0
    Type(write.fd)      = T0
    Type(close.fd)      = T0

Figure 8: The only valid typing for the skeleton attributes used by the trace in Figure 2.

Type inference is the next step in the flow dependence annotator. Type inference assigns a type to each skeleton attribute that is involved in dependences. If a value never flows between an instance of one skeleton attribute and an instance of another skeleton attribute, then type inference assigns the skeleton attributes separate types. Strictly speaking, flow dependences alone give the scenario extractor enough information to extract scenarios and put them into a standard form. However, as Section 4 explains, the scenario extractor can use the assurance that values will never flow between certain attributes in a scenario to reduce naming conflicts. Type inference infers a typing that satisfies this condition:

    If df(t_i.a, t_i′.a′), then the typing gives the skeleton attribute of t_i.a and the skeleton attribute of t_i′.a′ the same type.

Figure 8 gives a typing for the skeleton attributes used by the socket trace in Figure 2. In this example, every skeleton attribute must have the same type because all socket attributes in the trace are on some dependence chain with an instance of a close.fd attribute.

The inference algorithm uses Tarjan's union-find algorithm [26] and requires time nearly linear in the trace. The type inferer starts with an initial typing that gives each skeleton attribute its own unique type. Then, the inferer visits each dependence df(t_i.a, t_i′.a′) and unifies the types of the skeleton attribute of t_i.a and the skeleton attribute of t_i′.a′. Type inference is complete when all dependences have been visited.

Abstract
4. SCENARIO EXTRACTION AND scenario strings
AUTOMATON LEARNING
This section explains how the scenario extractor and automaton Figure 9: Detailed view of the scenario extractor.
learner work. The first tool extracts interaction scenarios—small
sets of interdependent interactions—from annotated traces and pre-
pares them for the automaton learner. The second tool infers spec- Figure 9 is a detailed view of the scenario extractor. It receives
ifications from scenarios, not complete traces, for two reasons. two inputs. The first is a set of traces, annotated as described in
The primary reason is that scenarios are much shorter than traces Section 3. In addition, the user controls which scenarios will be
and the running time of our PFSA learner increases as the third extracted by supplying a set of scenario seeds. Each seed is an
power of the length of its input—this is typical for automaton learn- interaction skeleton. The extractor searches the input traces for
ers. interactions that match the seeds and extracts a scenario from each
Also, we restrict scenarios to refer to a small number of objects interaction. For example, suppose the extractor was given the trace
by bounding the size of the scenario. Section 2 argues that bound- of socket interactions in Figure 2 and accept(so, return) as
ing the number of objects makes specification mining tractable. the seed. The extractor would produce two scenarios, one around
Bounding the number of objects is not a severe limitation because the accept on line 4 and the other around the accept on line
verification tools can verify that the specification holds for multiple 10.
bindings of program objects to specification objects. For example, Extraction Producing scenarios from input traces is the first step of
although the protocol specified in Figure 1 mentions two objects, x the extraction process. Informally, a scenario is a set of interactions
and y, a tool that attempts to verify the program in Figure 1 might related by flow dependences. Formally, given an annotated trace
Formally, given an annotated trace T = t0, . . . , tn, a scenario is a set S ⊆ {t0, . . . , tn} with the property:

    If ti0 ∈ S, tin ∈ S, and ti0, . . . , tin is a chain of flow-dependent interactions in T, then tij ∈ S for any 0 ≤ j ≤ n.

The extractor builds a scenario around each interaction in the trace that matches a scenario seed. For any scenario S, seed(S) ∈ S is the interaction that initially matches the seed.

A user-tunable parameter N restricts the number of interactions in the extracted scenarios. Each scenario contains at most N ancestors and at most N descendants of the seed interaction. The extractor prefers ancestors and descendants whose position in the input trace is close to the position of the seed interaction.

Once an interaction ts matching a seed is found, the extractor uses a two-step algorithm to produce a scenario. First, the extractor constructs the sets:

    Sa = {N closest ancestors of ts}
    Sd = {N closest descendants of ts}
    Sad = {ts} ∪ Sa ∪ Sd

The extractor uses a simple prioritized worklist algorithm to construct the set of ancestors (descendants). The initial worklist is the set of immediate ancestors (descendants) of ts. Repeatedly, until the worklist is empty or N ancestors (descendants) are found, the extractor removes from the worklist the ancestor (descendant) whose position in the trace is nearest ts, adds it to the set of ancestors (descendants), and adds its immediate ancestors (descendants) to the worklist.

The result, Sad, is not necessarily a scenario, because interactions along some flow dependence chains from ancestors of ts to descendants of ts might be missing. Any such interactions must lie in the trace between the earliest ancestor ta in Sad and the latest descendant td in Sad, and must be reachable both by following flow dependences from some ancestor of ts and by following flow dependences in reverse from some descendant of ts. Thus, the extractor searches depth-first forwards from each element of Sa and backwards from each element of Sd to construct

    Sar = {t ∈ [ta, td] | ∃t′ ∈ Sa. t′ reaches t}
    Sdr = {t ∈ [ta, td] | ∃t′ ∈ Sd. t′ reaches-in-reverse t}

The final scenario is S = Sad ∪ (Sar ∩ Sdr). Figure 10 shows a scenario extracted from the trace in Figure 2 with N = 10, around the accept on line 4. The seed is marked. Also note that the interactions in S inherit the dependences from the annotated trace.

    1 socket(domain = 2, type = 1, proto = 0, return = 7)
    2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
    3 listen(so = 7, backlog = 5, return = 0)
    4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8) [seed]
    5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
    6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
    7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
    8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
    9 close(fd = 8, return = 0)

Figure 10: A scenario extracted from around line 4 of Figure 2, with N = 10.

Simplification. Given the extracted scenarios, simplification eliminates all interaction attributes that do not carry a flow dependence in any training trace. The typing inferred by the dependence annotator (see Section 3) assigns a type to a skeleton attribute if and only if an instance of that attribute is involved in a flow dependence somewhere in a trace. So, simplification preserves an interaction attribute if and only if the corresponding skeleton attribute is typed. Figure 11 is the simplified version of the scenario in Figure 10.

    1 socket(return = 7)
    2 bind(so = 7)
    3 listen(so = 7)
    4 accept(so = 7, return = 8) [seed]
    5 read(fd = 8)
    6 write(fd = 8)
    7 read(fd = 8)
    8 write(fd = 8)
    9 close(fd = 8)

Figure 11: The simplification of the scenario in Figure 10.

Standardization. Standardization converts a scenario into a scenario string for the PFSA learner. Standardization improves the performance of the PFSA learner by producing scenario strings so that similar scenarios receive similar strings. Figure 12 shows the result of standardizing the scenario in Figure 11. Standardization applies two transformations: naming and reordering.

    1 socket(return = x0:T0)                    (A)
    2 bind(so = x0:T0)                          (B)
    3 listen(so = x0:T0)                        (C)
    4 accept(so = x0:T0, return = x1:T0) [seed] (D)
    5 read(fd = x1:T0)                          (E)
    7 read(fd = x1:T0)                          (E)
    6 write(fd = x1:T0)                         (F)
    8 write(fd = x1:T0)                         (F)
    9 close(fd = x1:T0)                         (G)

Figure 12: Scenario string for the simplified scenario from Figure 11.

Naming replaces attribute values with symbolic variables. In Figure 12, value 7 is replaced with the symbolic name x0:T0, and value 8 is replaced with the symbolic name x1:T0. Naming exposes similarities between different scenarios by naming flow dependences. For example, a scenario extracted around line 10 of Figure 2 manipulates different socket values (7 and 10 instead of 7 and 8), but naming still calls one of these values x0:T0 and the other x1:T0.

When a value flows from one attribute to another, naming indicates the dependence by assigning the same name to both attributes. The dependence annotation typing (Section 3) guarantees that, if two skeleton attributes are assigned different types, values never flow between instances of those attributes. Thus, naming uses a separate namespace for attributes of each type. Figure 13 illustrates how separate namespaces help expose more similarities to the PFSA learner. Lines 2 and 3 of S0 and S1 are the same, but line 1 differs in each scenario. Assume that naming assigns names to each interaction in turn, starting at the seed interaction. Without types, naming treats lines 2 and 3 differently.

    Original:
      S0:                        S1:
      1 A(x=0, y=0) [seed]       1 E(x=0, v=1) [seed]
      2 B(x=0, y=0)              2 B(x=0, y=0)
      3 C(x=0, y=0)              3 C(x=0, y=0)
    Untyped:
      1 A(x=x0, y=x1)            1 E(x=x0, v=x1)
      2 B(x=x0, y=x1)            2 B(x=x0, y=x2)
      3 C(x=x0, y=x1)            3 C(x=x0, y=x2)
    Typed:
      1 A(x=x0:T0, y=x0:T1)      1 E(x=x0:T0, v=x0:T2)
      2 B(x=x0:T0, y=x0:T1)      2 B(x=x0:T0, y=x0:T1)
      3 C(x=x0:T0, y=x0:T1)      3 C(x=x0:T0, y=x0:T1)

Figure 13: Two nearly equivalent scenarios and their scenario strings, with untyped and typed naming.

Reordering standardizes the order of scenario interactions. A scenario contains interactions that are partially ordered by flow, anti, and output dependences. That is, each scenario corresponds to a directed acyclic graph (DAG). The order in which the interactions appear in the original traces is just one legal total order. Reordering puts two scenarios with the same DAG into the same total order, even when their trace order differs, so that a PFSA learner is presented with fewer distinct strings. In Figure 12, reordering swapped the write on line 6 with the read on line 7.

To a PFSA learner, each interaction in a scenario string is merely an atomic letter. To emphasize this point, the right-hand side of Figure 12 replaces each interaction with a shorthand letter. Standardization uses a small number of letters to represent a given set of scenarios. Using a small alphabet increases the PFSA learner's opportunities to find similarities in the scenario strings. Also, PFSA learners run more slowly with large alphabets.

The rest of this section discusses our standardization algorithm in detail. At a high level, standardization is a many-to-one mapping from simplified scenarios to scenario strings (Figure 14). Under this mapping, the preimage of a scenario string is a set of equivalent scenarios. Intuitively, equivalent scenarios manipulate abstract objects in the same way. In the following, we define equivalence, present our standardization algorithm, and show that equivalence characterizes the scenarios that standardization maps to the same scenario string.

[Figure 14 diagram: a set of equivalent simplified scenarios maps, under standardization, to a single scenario string.]

Figure 14: Standardization, as a many-to-one mapping.

Let S = s0, . . . , sn be a simplified scenario. A dependence-preserving permutation of S is a permutation σ of S such that if d(si, si′), then σ(i) < σ(i′). That is, the permutation does not swap the source and sink of any dependence. Figure 12 illustrates a dependence-preserving permutation that swaps the write on line 6 with the read on line 7.

A naming Γ of S replaces each value in S with a symbolic name, taken from a set X. If si.a is an attribute in S, Γ(si.a) is the symbolic name given to that attribute in Γ(S). We say that Γ is dependence-preserving if, for any si.a and si′.a′, d(si.a, si′.a′) ≡ Γ(si.a) = Γ(si′.a′).

Now let S0 = s0,0, . . . , s0,n and S1 = s1,0, . . . , s1,n be two simplified scenarios. S0 and S1 are equivalent iff there are dependence-preserving permutations σ0 of S0 and σ1 of S1 and dependence-preserving namings Γ0 of σ0(S0) and Γ1 of σ1(S1) such that Γ0(σ0(S0)) = Γ1(σ1(S1)) (Figure 15). In fact, the choice of σ0 and Γ0 is not important. We assert that if S0 and S1 are equivalent, then for any dependence-preserving σ0 of S0 and dependence-preserving Γ0 of σ0(S0), there is a dependence-preserving σ1 of S1 and a dependence-preserving Γ1 of σ1(S1) such that Γ0(σ0(S0)) = Γ1(σ1(S1)).

[Figure 15 diagram: S0 and S1 map, via dependence-preserving permutations σ0 and σ1 and namings Γ0 and Γ1, to the same scenario string.]

Figure 15: Equivalent scenarios.

    Naive(S)
      MaxSize := maximum size of a scenario
      X := a totally ordered set of MaxSize symbolic names
      AllStrings := ∅
      Permutes := all dependence-preserving permutations of S
      Foreach σ ∈ Permutes
        Namings := all dependence-preserving namings of σ(S) from X
        Foreach Γ ∈ Namings
          Add Γ(σ(S)) to AllStrings
      Return the lexicographically smallest element of AllStrings

Figure 16: Naive standardization algorithm.

Figure 16 presents a naive standardization algorithm. Naive tries all dependence-preserving permutations of S and all dependence-preserving namings of each permutation and returns the scenario string that comes first in lexicographic order. If Naive assigns S0 and S1 the same scenario string, then S0 and S1 are equivalent, since the algorithm has found permutations and namings that make them equal. And, if S0 and S1 are equivalent, then Naive generates the same AllStrings set for both of them. So, equivalence characterizes the preimage of Naive, as promised. However, the running time of the algorithm is exponential in |X| and |S|.

The algorithm in Figure 17 removes the exponential behavior in |X| by considering only one standard naming for each permuted scenario. This optimization is safe because if S0 and S1 are equivalent up to a dependence-preserving naming, then they differ only in their values, and StandardName does not depend on the identities of the values at attributes, but only on their types and the dependences that they carry.

StandardName draws names from separate name spaces for separate types.
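A minimal sketch of this per-type naming scheme, in illustrative Python (the real StandardName additionally names interactions outward from the seed, which this sketch omits):

```python
# Illustrative sketch: replace concrete attribute values with symbolic
# names drawn from a separate namespace per inferred type, so that
# equivalent scenarios receive identical scenario strings.

def name_scenario(scenario):
    # scenario: list of (call, [(attr, value, type), ...]) tuples
    namespaces = {}  # type -> {concrete value: symbolic name}
    out = []
    for call, attrs in scenario:
        named = []
        for attr, value, typ in attrs:
            space = namespaces.setdefault(typ, {})
            if value not in space:
                # next free name in this type's namespace: x0, x1, ...
                space[value] = f"x{len(space)}:{typ}"
            named.append((attr, space[value]))
        out.append((call, named))
    return out

scenario = [
    ("socket", [("return", 7, "T0")]),
    ("bind",   [("so", 7, "T0")]),
    ("accept", [("so", 7, "T0"), ("return", 8, "T0")]),
    ("close",  [("fd", 8, "T0")]),
]
# Value 7 becomes x0:T0 everywhere; accept's fresh value 8 becomes x1:T0,
# reproducing the names seen in Figure 12.
for call, attrs in name_scenario(scenario):
    print(call, attrs)
```

A scenario using values 7 and 10 instead of 7 and 8 yields exactly the same output, which is the point of naming: flow dependences, not concrete values, determine the string.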
The GetNextName operation returns the next available name in a name space in some fixed order, and resetting a name space causes it to begin again with the first name in the space.

StandardName names the seed interaction first, and then works outward. Because the name of a value must be chosen consistently, the constraints on naming increase as interactions are named. Interactions near seeds are most likely to be similar across scenarios, so they are named first.

    NameInteraction(s)
      Foreach attribute s.a
        Type := the type of s.a's skeleton attribute
        Value := the value at s.a
        NameSpace := name space for Type
        If NameSpace[Value] has not been set
          NameSpace[Value] := GetNextName(NameSpace)
        Replace Value with NameSpace[Value] in s.a

    StandardName(S = s0, . . . , sn)
      is := index of the seed in S
      NameInteraction(s[is])
      dist := 1
      While is − dist ≥ 0 or is + dist ≤ n
        If is − dist ≥ 0 NameInteraction(s[is − dist])
        If is + dist ≤ n NameInteraction(s[is + dist])
        dist := dist + 1

    Better(S)
      Reset all name spaces
      AllStrings := ∅
      Permutes := all dependence-preserving permutations of S
      Foreach σ ∈ Permutes
        Snamed := StandardName(σ(S))
        Add Snamed to AllStrings
      Return the lexicographically smallest element of AllStrings

Figure 17: Better standardization algorithm.

The worst-case running time of Better is still exponential in |S|. We cannot expect to do better in the worst case, because Better can be used to solve the DAG-isomorphism problem by encoding arbitrary DAGs as scenarios, and DAG-isomorphism is NP-complete. However, better performance is possible in the common case, since trace scenarios are not arbitrary DAGs. In particular, the interactions in a scenario have names and named attributes. Our final standardization algorithm (Figure 18) uses those names to reduce the number of permutations it considers.

    OfLeastSkeletons(S)
      return {s ∈ S | ¬∃s′. skeleton of s′ precedes skeleton of s lexicographically}

    RestrictedPermutations(S, Pos)
      Permutes := ∅
      Ready := {s ∈ S | ¬∃s′ ∈ S. d(s′, s)}
    * Selected := OfLeastSkeletons(Ready)
      Foreach s ∈ Selected
        Rest := RestrictedPermutations(S − {s}, Pos + 1)
        Foreach σr ∈ Rest
          σ := σr ∪ {Pos → s}
          Permutes := Permutes ∪ {σ}
      Return Permutes

    Standardize(S)
      Reset all name spaces
      AllStrings := ∅
      Permutes := RestrictedPermutations(S, 0)
      Foreach σ ∈ Permutes
        Snamed := StandardName(σ(S))
        Add Snamed to AllStrings
      Return the lexicographically smallest element of AllStrings

Figure 18: Final standardization algorithm.

Standardize considers only dependence-preserving permutations that put the skeletons of the interactions in the smallest possible lexicographic order. Although there can be an exponential number of such orderings, there is often only one. In that case, the set of interactions Selected (line *) always has one element, and the recursion never branches. With an appropriate implementation of OfLeastSkeletons (which sorts the interactions), the algorithm runs in that case in time proportional to n log n. In our experience, the time spent running Standardize is an insignificant part of the scenario extraction time.

4.2 Automaton learning

This section presents the algorithms and data structures used in learning the specification automaton. The automaton A is an NFA with edges labelled by standardized interactions, whose language includes the most common substrings of the scenario strings extracted from the training traces, plus other strings that the PFSA learner adds as it generalizes. Automaton learning has two steps. First, an off-the-shelf learner learns a PFSA. Then, the corer removes infrequently traversed edges and converts the PFSA into an NFA.

The PFSA learner is an off-the-shelf learner [22] that learns a PFSA that accepts the training strings, plus other strings. The learner is a variation on the classic k-tails algorithm [4]. Briefly, the k-tails algorithm works as follows. First, a retrieval tree is constructed from the input strings. The algorithm then computes all strings of length up to k (k-strings) that can be generated from each state in the trie. If two states qa and qb generate the same k-strings, they are merged. The process repeats until no more merges are possible. The PFSA learner modifies k-tails by comparing how likely two states are to generate the same k-strings.

The resulting PFSA accepts a superset of all the strings in the training scenarios, due to the generalizations performed by the learner. The parameter N that controls the size of the extracted scenarios is chosen by the user to be large enough to include all of the interesting behavior. It is therefore very likely that the ends of the training scenarios contain uninteresting behavior. This is in fact what we see experimentally: the typical PFSA has a "hot" core with a few transitions that occur frequently, with the core surrounded by a "cold" region with many transitions, each of which occurs infrequently. The corer whittles away the "cold" region, leaving just the "hot" core.

[Figure 19 diagram: a PFSA in which the only edge out of the start state and the only edge into the final state have weight 5, while three other edges have weight 10000.]

Figure 19: A PFSA for which dropping edges with low weights does not identify the hot core. Edge labels are omitted.

The corer cannot simply drop edges with low weights. Consider the PFSA in Figure 19 (edge labels are not important and are omitted). Four edges have a weight of 5, which is low compared to the three edges with a weight of 10000. However, any string through this PFSA must traverse the edge out of the start state and the edge into the end state. Despite their low weight, a string is more likely to traverse these edges than it is to traverse the edges with a weight of 10000. Thus, a better measure of an edge's "heat" is its likelihood of being traversed while generating a string from the PFSA. The problem of computing this measure is known as the Markov chain problem [15]. The problem reduces to inverting a square matrix with the number of rows and columns equal to the number of transitions in the PFSA.

After computing the heat of each edge, the corer removes all edges below a cutoff parameter, removes unreachable states from the PFSA, and drops the frequencies on the edges. The result is an NFA, which a human can validate by inspection.

5. VERIFICATION

This section discusses how verification tools can use the miner's specifications. Program verification tools distinguish programs that satisfy a specification from programs that do not. Before we can discuss these tools, we must clarify what we mean by "satisfying a specification".

[Figure 20 diagram: the language of A contains the scenario strings satisfying A; Standardization⁻¹ relates these to the simplified scenarios satisfying A, and Simplification⁻¹ relates those to the concrete scenarios satisfying A.]

Figure 20: Scenarios that satisfy A.

Let A be a specification. By construction, the language of A contains a set of scenario strings (Figure 20). The containment might be strict since the automaton learner can introduce strings into A that are not scenario strings. Because standardization is a many-to-one mapping (see Figure 14), each scenario string corresponds to a set of simplified scenarios. In turn, each simplified scenario corresponds to a set of concrete scenarios. Figure 20 shows the chain of mappings. We say that the scenarios of Figure 20 satisfy A.

Now let T be an interaction trace. We say that T satisfies A if for every seed interaction is ∈ T, there is an interaction scenario Sis seeded by is such that Sis satisfies A. We say that a program P satisfies a specification A if any interaction trace T of P's execution satisfies A.

Constructing program verification tools for specifications is outside the scope of this paper, but is the subject of ongoing research. There are two ways that such a tool could work. First, the tool could construct a scenario that satisfies A for each interaction seed encountered while simulating some abstraction of P, reporting an error if no such scenario can be constructed for some seed. Alternatively, the tool could first translate A into an automaton that generates traces instead of scenario strings. The trace automaton generates all traces that satisfy A. The verification tool would then exhaustively search for a trace that is not in A, reporting an error if one is found. Both sorts of tools must be able to simulate simplification and standardization.

    Satisfies(S, Spec)
      If S is in the language of Spec
        Return true
      Else Return false

    Verify(T = t0, . . . , tn, Spec, MaxSize)
      Loop:
      Foreach ti ∈ SeedsOf(T)
        Size := 0
        While Size ≤ MaxSize
          Scenarios := Extract∗(ti, Size)
          Foreach S ∈ Scenarios
            Sstd := Standardize(S)
            If Satisfies(Sstd, Spec)
              Next Loop
          Size := Size + 1
        Return Fails(ti)

Figure 21: Trace verification algorithm.

Figure 21 shows a trace verification algorithm (not a program verification algorithm) that works in the first way. This is the algorithm used in our experiments (see Section 6). Verify takes a trace, a specification, and a maximum scenario size. It attempts to verify that the trace satisfies the specification by extracting successively larger scenarios until it finds a satisfactory one or until it reaches the maximum scenario size. Because interactions in the trace are not necessarily ordered as they were in the training traces, the algorithm does not use exactly the same extraction algorithm as the learner. Instead, Extract∗(ti, Size) returns all scenarios seeded by ti with a total of exactly Size ancestors and descendants. The distance between the seed and its ancestors and descendants is not important.

6. EXPERIMENTAL RESULTS

This section presents the results of an experiment in mining specifications from traces of X11 programs.

We analyzed traces from programs that use the Xlib and X Toolkit Intrinsics libraries for the X11 windowing system. The traces record an interaction for each X library call and callback from the X library to client code. The interaction attributes include all arguments and return values of calls, plus the fields of the structures that represent X protocol events. The tracing tool uses the Executable Editing Library (EEL) [17] to instrument Solaris/SPARC executables.

Traces were collected from full runs of widely distributed programs that use the X11 selection mechanism. We studied the selection mechanism since the Inter-Client Communication Conventions Manual (ICCCM) [25] gives English descriptions of several rules for how well-behaved programs should use the mechanism. The experiment concentrated on a rule that specifies how programs obtain ownership of the selection: the rule says that a client calling XtOwnSelection or XSetSelectionOwner must pass in a timestamp derived from the X event that triggered the call.

Table 1 lists each program studied, its origin (either the X11 distribution or the X11 contrib directory), the number of static calls to the X library routines chosen as seeds, and the number of training scenarios extracted from each trace. One of the authors gathered the traces by running each program for a few minutes, while trying to exercise the selection code by doing cut-and-paste operations, as well as exercising as much other functionality as possible in a short time.

Specification mining depends on a sizable training set of well-debugged traces. In our case, the number of training traces was small, and as it turned out, several contained violations of the rule. As a result, the miner was not able to discover the rule when trained on all of the programs. In order to learn the rule, we needed to remove the buggy traces from the training set. We hypothesized that our miner could help find the bugs, even with a poor training set.
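The dynamic verifier that drives this validation is the Verify loop of Figure 21. A compressed sketch, with hypothetical stand-in helpers for Extract∗ and Standardize and a specification modelled as a plain set of accepted scenario strings rather than an NFA:

```python
# Sketch of Figure 21's trace-verification loop. extract_scenarios and
# standardize are illustrative stand-ins for the paper's Extract* and
# Standardize; spec_language stands in for NFA membership.

def verify(trace, seeds, spec_language, max_size,
           extract_scenarios, standardize):
    """Return the seeds for which no scenario of size <= max_size
    standardizes to a string accepted by the specification."""
    failures = []
    for seed in seeds:
        satisfied = False
        for size in range(max_size + 1):
            for scenario in extract_scenarios(trace, seed, size):
                if standardize(scenario) in spec_language:
                    satisfied = True    # this seed is covered
                    break
            if satisfied:
                break
        if not satisfied:
            failures.append(seed)
    return failures

# Toy usage: scenarios are windows of the trace around the seed, and
# "standardization" just keeps the call names.
trace = ["socket", "accept", "read", "close"]
extract = lambda tr, s, n: [tuple(tr[max(0, s - n):s + n + 1])]
spec = {("socket", "accept", "read")}
print(verify(trace, [1], spec, 2, extract, lambda sc: sc))  # []
print(verify(trace, [3], spec, 2, extract, lambda sc: sc))  # [3]
```

As in the paper's algorithm, the loop tries successively larger scenarios and reports a seed only when no scenario up to the maximum size satisfies the specification.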
    Name         Source    Static seeds   Scenarios
    bitmap       distrib   1              6
    xclipboard   distrib   2              2
    xconsole     distrib   1              1
    xcutsel      distrib   1              4
    xterm        distrib   1              6
    clipboard    contrib   1              2
    cxterm       contrib   1              9
    display      contrib   4              16
    e93          contrib   1              2
    kterm        contrib   1              4
    nedit        contrib   2              2
    pixmap       contrib   1              11
    rxvt         contrib   1              4
    ted          contrib   3              9
    test canvas  contrib   1              4
    ups          contrib   1              3
    xcb          contrib   1              11

Table 1: X11 client programs studied in the experiment.

Identifying the buggy traces without the miner would require inspecting each trace manually for bugs.

Using the miner, we predicted that, while we would have to inspect the first few traces, once a few correct traces had been collected, the miner's rule could be used to automatically validate the remaining traces. In this experiment, we arranged the client programs in random order and went through the following iterative process:

    Run the first program and gather a trace
    Mine a specification from the trace
    Expert examines the specification
    Expert extracts hot core
    If the specification is not correct
      Select another random order and start over
    For each remaining client program in order
      Run the program and gather a trace
      Verify the trace against the specification
      If verification succeeds
        Add the trace to the training set
        Generate a new specification
      Else
        Examine the scenarios that failed
        If no scenario violates the ICCCM rule
          Add the trace to the training set
          Generate a new, more general specification
        Else
          Report the bug

For each trace that fails to verify, the expert either marks it as buggy or includes it in the training set. The expert decides whether the initial specification is correct: in our experiment, we accepted the initial specification if we did not see any obvious bugs in the first set of training scenarios. The expert also needs to extract the hot core, since the training set is too small to use the corer.

The experiment tested three hypotheses:

Hypothesis 1. The process will find bugs and reduce the number of traces that the expert must inspect.

Hypothesis 2. The miner's final specification will match the rule in the ICCCM.

Hypothesis 3. The corer and the human will agree on which states in the final PFSA belong in the final specification.

    Name         Verifies?   Reason for failure   Action
    xcb          n/a         n/a                  accept
    bitmap       no          spec. too narrow     accept
    ups          no          bug!                 reject
    ted          no          spec. too narrow     accept
    rxvt         yes         n/a                  accept
    xterm        no          spec. too narrow     accept
    display      no          spec. too narrow     accept
    xcutsel      no          spec. too narrow     accept
    kterm        yes         n/a                  accept
    pixmap       yes         n/a                  accept
    cxterm       yes         n/a                  accept
    xconsole     no          benign violation     reject
    nedit        no          spec. too narrow     accept
    e93          no          bug!                 reject
    xclipboard   no          benign violation     reject
    clipboard    no          benign violation     reject

Table 2: Results of processing each client program, in the order in which they were processed.

Table 2 lists the client programs in the order in which they were processed. Out of the first six traces accepted (not including the initial trace), five were rejected by an overly narrow specification. At this point, the specification seemed to stabilize: out of the next four accepted, only one was initially rejected by the dynamic verifier. The expert did not have to inspect 4 out of the 16 traces, which supports the second part of Hypothesis 1. We conjecture that if the process had continued, the false rejection rate would have continued to drop.

Five of the programs violated the rule in the ICCCM. We found three programs with benign violations of the specification and two programs with bugs. The specification applies to programs that use the selection mechanism to do cut-and-paste, while the programs with benign violations used the selection mechanism to implement their own communication protocol. These violations indicate that the rule described by the ICCCM is not universally applicable and that the document should be clarified. Thus, the specification miner helped find bugs and a documentation omission (an unexpected benefit).

Figure 22 is the specification from the experiment. For legibility's sake, the figure omits some arguments. These arguments did not participate in dependences within the core of the specification. The specification is compact, with six states and nine edges. It also matches the English rule very closely, with most complexity arising from the several ways in which the X API receives an event. In addition, the specification exposes a common pattern in which the client calls XSetSelectionOwner repeatedly until XGetSelectionOwner indicates that the call was successful.

Our final hypothesis was that the corer and the expert would agree on which states in the final PFSA should be thrown out. The final PFSA had 27 states. The expert, who did not have access to the corer's results, threw out 15 of these and retained 12; the remaining twelve were merged to form the NFA in Figure 22. The corer and the expert disagreed on five out of the 27 states, or 19%. The corer assigned likelihoods lower than 6% to 13 of the 15 deleted states, and likelihoods higher than 13% to 9 of the 12 retained states. The other 2 deleted states had likelihoods of 13% and 20%, while the remaining retained states had likelihoods of 5%, 6%, and 9%. Thus,

7. RELATED WORK

Ernst et al. also proposed automatic deduction of formal specifications [11]. Their Daikon tool works by learning likely invariants.
[Figure 22 appears here as a state-transition drawing: an NFA with states START, 1–5, and END, whose edges are labeled by the following events.]

A = XNextEvent(time = X21_0)
B = XNextEvent(time = X21_0) or B = XtDispatchEvent(time = X21_0) or B = XIfEvent(time = X21_0)
C = XtDispatchEvent(time = X21_0) or C = XtEventHandler(time = X21_0) or C = XtLastTimeStampProcessed(time = X21_0)
D = XGetSelectionOwner
E = XSetSelectionOwner(time = X21_0)
F = XtOwnSelection(time = X21_0)
G = XtActionHookProc(time = X21_0)
H = XInternAtom

Figure 22: The NFA from the selection ownership specification.
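A mined specification like the NFA in Figure 22 can be handed to a verifier, but it can also be used directly as a checker for new traces. The following sketch is ours, not the authors' miner: the transition relation is a hypothetical, much-simplified fragment of Figure 22 (A for XNextEvent, D for XGetSelectionOwner, E for XSetSelectionOwner with the event's timestamp), invented for illustration.

```python
# Toy NFA acceptor: using a mined automaton to check API-call traces.
# The transition relation is a hypothetical simplification of Figure 22.
TRANSITIONS = {
    ("START", "A"): {"1"},   # A: an event supplies a fresh timestamp
    ("1", "D"): {"1"},       # D: querying the owner does not change state
    ("1", "E"): {"2"},       # E: set the selection owner with that timestamp
    ("2", "D"): {"END"},     # D: verify that ownership was acquired
}

def accepts(trace, start="START", finals=frozenset({"END"})):
    """Return True if some path through the NFA consumes the whole trace."""
    states = {start}
    for symbol in trace:
        states = set().union(*(TRANSITIONS.get((s, symbol), set())
                               for s in states))
        if not states:       # no transition on this symbol: reject early
            return False
    return bool(states & finals)

print(accepts(["A", "E", "D"]))   # conforming trace -> True
print(accepts(["E", "D"]))        # sets the owner without a timestamp -> False
```

A real checker would additionally track the data-dependence bindings (the `time = X21_0` constraint in the legend) rather than treating events as opaque symbols.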
7. RELATED WORK

Ernst et al. also proposed automatic deduction of formal specifications [11]. Their Daikon tool works by learning likely invariants involving program variables from dynamic traces. The resulting formal specifications are the key difference between their approach and ours. Daikon's specifications are arithmetic relationships that hold at specific program points (e.g., a precondition x < y at entry to a procedure f). By contrast, our specifications express temporal and data-dependence relationships among calls to an API. Our temporal specifications capture a different aspect of program behavior than Daikon's predicates on values and structures. The two forms of specifications are complementary, but naturally require radically different learning algorithms.

Recently, Ernst et al. presented techniques for suppressing parts of their learned specifications that are not useful to a programmer [19]. In the context of our temporal specifications, this result corresponds to appropriately selecting the heavy core of the initial PFSA.

Another related tool is Houdini [12], an annotation assistant for ESC/Java. Starting from an initial (guessed) candidate set of annotations, which are similar to those of Daikon, Houdini uses ESC/Java to refute invalid annotations. The focus of Houdini is on annotating points of a single program with true properties, while the focus of our tool is on discovering temporal properties that hold across all programs that use an interface.

Other authors have described tools that extract automaton-based models. Cook and Wolf describe a tool for extracting FA models of software development processes from traces of events [8]. Our work differs in that we extract specifications from program traces, which must be reduced to a simpler form before they are palatable for an FA learner. Ghosh et al. describe several techniques for learning the typical behavior of programs that make system calls [13, 18]. Since they intend their models for intrusion detection, the models need only characterize a particular program's behavior, while our miner finds rules that are generally applicable and understandable by humans. Wagner and Dean's intrusion detection system also extracts automaton models, but from source code, not traces [27]. Their system also extracts models that apply only to a single program. Finally, Reiss and Renieris also extract structure from traces [23], but they model the sequence of operations on individual objects, not the data and temporal dependences across several objects.

8. CONCLUSION

This paper addresses an important problem in the program-verification tool-chain, namely the problem of semi-automatic formulation of correctness specifications that could be accepted by model checkers and other similar tools. We have formulated the problem as a machine learning problem and provided an algorithm based on a reduction to finite automaton learning. While more experimental work remains ahead of us, initial experience is promising.

Acknowledgements

We thank Anand Raman, Peter Andreae, and Jon D. Patrick for allowing us to use their PFSA learner, as well as the anonymous referees for their many helpful comments on an early draft of this paper. This work was supported in part by the National Science Foundation with grants CCR-0093275 and CCR-0103670, the University of Wisconsin Graduate School, and donations from IBM and Microsoft Research.

9. REFERENCES

[1] Thomas Ball, Rupak Majumdar, Todd Millstein, and Sriram K. Rajamani. Automatic predicate abstraction of C programs. In Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation, volume 36 of ACM SIGPLAN Notices, pages 203–213, July 2001.
[2] Thomas Ball and Sriram K. Rajamani. Automatically validating temporal safety properties of interfaces. In Proceedings of the 8th International SPIN Workshop on Model Checking of Software, number 2057 in Lecture Notes in Computer Science, pages 103–122, May 2001.
[3] Thomas Ball and Sriram K. Rajamani. Bebop: a path-sensitive interprocedural dataflow engine. In Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, ACM SIGPLAN Notices, pages 97–103, July 2001.
[4] A. W. Biermann and J. A. Feldman. On the synthesis of finite-state machines from samples of their behaviour. IEEE Transactions on Computers, 21:591–597, 1972.
[5] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A static analyzer for finding dynamic programming errors. Software Practice and Experience, 30:775–802, 2000.
[6] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An empirical study of operating systems errors. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP18), pages 73–88, October 2001.
[7] Douglas E. Comer and David L. Stevens. Internetworking with TCP/IP. Client-Server Programming and Applications, BSD Socket Version. Prentice-Hall, Englewood Cliffs, NJ 07632, USA, 1993.
[8] Jonathan E. Cook and Alexander L. Wolf. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology, 7(3):215–249, July 1998.
[9] Robert DeLine and Manuel Fähndrich. Enforcing high-level protocols in low-level software. In Proceedings of the SIGPLAN '01 Conference on Programming Language Design and Implementation (PLDI), pages 59–69, June 2001.
[10] Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Bugs as deviant behavior: a general approach to inferring errors in system code. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP18), pages 57–72, October 2001.
[11] Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Transactions on Software Engineering, 27(2):1–25, February 2001.
[12] Cormac Flanagan and K. Rustan M. Leino. Houdini, an annotation assistant for ESC/Java. In FME 2001: Formal Methods for Increasing Software Productivity, Lecture Notes in Computer Science, 2001.
[13] Anup K. Ghosh, Christoph Michael, and Michael Shatz. A real-time intrusion detection system based on learning program behavior. In RAID 2000, volume 1907 of Lecture Notes in Computer Science, pages 93–109, 2000.
[14] E. Mark Gold. Language identification in the limit. Information and Control, 10:447–474, 1967.
[15] Michel Gondran and Michel Minoux. Graphs and Algorithms. John Wiley and Sons, 1984.
[16] Michael Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert E. Schapire, and Linda Sellie. On the learnability of discrete distributions. In Proceedings of the Twenty-Sixth ACM Symposium on Theory of Computing, pages 273–282, 1994.
[17] James R. Larus and Eric Schnarr. EEL: Machine-independent executable editing. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation (PLDI), pages 291–300, June 1995.
[18] Christoph Michael and Anup Ghosh. Using finite automata to mine execution data for intrusion detection: a preliminary report. In RAID 2000, volume 1907 of Lecture Notes in Computer Science, pages 66–79, 2000.
[19] Michael D. Ernst, Adam Czeisler, William G. Griswold, and David Notkin. Quickly detecting relevant program invariants. In Proceedings of the 22nd International Conference on Software Engineering, June 2000.
[20] Kevin P. Murphy. Passively learning finite automata. Technical Report 96-04-017, Santa Fe Institute, 1996.
[21] Anand Raman, Peter Andreae, and Jon Patrick. A beam search algorithm for PFSA inference. Pattern Analysis and Applications, 1(2), 1998.
[22] Anand V. Raman and Jon D. Patrick. The sk-strings method for inferring PFSA. In Proceedings of the Workshop on Automata Induction, Grammatical Inference and Language Acquisition at the 14th International Conference on Machine Learning (ICML97), 1997.
[23] S. P. Reiss and M. Renieris. Encoding program executions. In Proceedings of the 23rd International Conference on Software Engineering (ICSE-01), pages 221–232, Los Alamitos, California, May 12–19, 2001. IEEE Computer Society.
[24] Dana Ron, Yoram Singer, and Naftali Tishby. On the learnability and usage of acyclic probabilistic finite automata. In Proceedings of the 8th Annual Conference on Computational Learning Theory, pages 31–40. ACM Press, New York, NY, 1995.
[25] David Rosenthal. Inter-Client Communication Conventions Manual (ICCCM), version 2.0. X Consortium, Inc. and Sun Microsystems, 1994. Part of the X11R6 distribution.
[26] Robert Endre Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 22(2):215–225, 1975.
[27] David Wagner and Drew Dean. Intrusion detection via static analysis. In Proceedings of the 2001 IEEE Symposium on Security and Privacy, May 2001.