Mining Specifications
Glenn Ammons                                       Rastislav Bodı́k                       James R. Larus
              Dept. of Computer Sciences                        Dept. of Computer Sciences                  Microsoft Research
                University of Wisconsin                           University of Wisconsin                   One Microsoft Way
               Madison, Wisconsin, USA                           Madison, Wisconsin, USA                 Redmond, Washington, USA
               ammons@cs.wisc.edu                                   bodik@cs.wisc.edu                      larus@microsoft.com
ABSTRACT                                                                                These tools, in general, statically compute an approximation of
Program verification is a promising approach to improving program                    a program’s possible dynamic behaviors and compare it against a
quality, because it can search all possible program executions for                   specification of correct behavior. These specifications often are
specific errors. However, the need to formally describe correct be-                  easy to develop for language-specific properties—such as avoid-
havior or errors is a major barrier to the widespread adoptio n of                   ing null dereferences and staying within array bounds. Even when
program verification, since programmers historically have been re-                   language properties are more difficult to express and check, they
luctant to write formal specifications. Automating the process of                    potentially apply to every program written in the language, so an
formulating specifications would remove a barrier to program ver-                    investment in a verification tool can be amortized easily.
ification and enhance its practicality.                                                 On the other hand, specifications particular to a program, say of
   This paper describes specification mining, a machine learning                     its abstractions or datatypes, may be difficult and expensive to de-
approach to discovering formal specifications of the protocols that                  velop because of the complexity of these mechanisms and the lim-
code must obey when interacting with an application program in-                      ited number of people who understand them. Also, as these spec-
terface or abstract data type. Starting from the assumption that a                   ifications may apply to only one program, their benefits are corre-
working program is well enough debugged to reveal strong hints                       spondingly reduced. Program verification is unlikely to be widely
of correct protocols, our tool infers a specification by observing                   used without cheaper and easier ways to formulate specifications.
program execution and concisely summarizing the frequent inter-                         This paper explores one approach to automating much of the pro-
action patterns as state machines that capture both temporal and                     cess of producing specifications. This technique, called specifica-
data dependences. These state machines can be examined by a pro-                     tion mining, discovers some of the temporal and data-dependence
grammer, to refine the specification and identify errors, and can be                 relationships that a program follows when it interacts with an ap-
utilized by automatic verification tools, to find bugs.                              plication programming interface (API) or abstract datatype (ADT).
   Our preliminary experience with the mining tool has been                          A specification miner observes these interactions in a running pro-
promising. We were able to learn specifications that not only cap-                   gram and uses this empirical data to infer a general rule about how
tured the correct protocol, but also discovered serious bugs.                        programs should interact with the API or ADT. These rules are con-
                                                                                     cisely summarized as state machines that capture both temporal and
                                                                                     data dependences. These state machines can be both examined by
1.     INTRODUCTION                                                                  a programmer, to refine the specification and identify errors, and
   It is difficult to imagine software without bugs. The richness                    utilized by automatic verification tools, to find bugs.
and variety of errors require an equally diverse set of techniques                      Mining proceeds under the assumption that an executing pro-
to avoid, detect, and correct them. Testing currently is the detec-                  gram, which presumably has passed some tests, generally uses an
tion method of choice. However, the high cost and inherent limita-                   API or ADT correctly, so that if a miner can identify the common
tions of testing has lead to a renewed interest in other approaches                  behavior, it can produce a correct specification, even from pro-
to finding bugs. One of the most promising directions is tools that                  grams that contain errors. Rather than start from the program’s
systematically detect important classes of errors [1–3, 5, 6, 9, 10].                text, in which feasible and infeasible paths are intermixed and cor-
   While program verification tools do not prevent programming                       rect paths are indistinguishable, mining begins with traces of a pro-
errors, they can quickly and cheaply identify oversights early in                    gram’s run-time interaction with an API or ADT. These traces are
the development process, when an error can be corrected by a pro-                    not only limited to feasible paths, but in general do not contain er-
grammer familiar with the code. Moreover, unlike testing, some                       rors.
verification tools can provide strong assurances that a program is                      The program in Figure 1 illustrates these points. The program
free of a certain type of error.                                                     uses the server-side socket API [7]. It generally observes the cor-
                                                                                     rect protocol: create a new socket s through a call to socket,
                                                                                     prepare s to accept connections by calling bind and listen, call
                                                                                     accept for each connection, service each connection, and finally
Permission to make digital or hard copies of all or part of this work for            call close to destroy s. Unfortunately, the program is buggy: if
personal or classroom use is granted without fee provided that copies are            the return statement on line 16 is executed, s is never closed.
not made or distributed for profit or commercial advantage and that copies              Even though a program is buggy, individual interaction traces
bear this notice and the full citation on the first page. To copy otherwise, to      can be correct. Figure 2 shows one such trace. If cond1 is rarely
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
                                                                                     true, it might be difficult to invent a test to force the program to
POPL ’02, Jan. 16-18, 2002 Portland, OR USA                                          behave badly. On the other hand, correct traces enable a miner to
Copyright 2002 ACM ISBN 1-58113-450-9/02/01 ...$5.00.
 1   int s = socket(AF_INET, SOCK_STREAM, 0);                                     socket(return = x)
 2   ...
 3   bind(s, &serv_addr, sizeof(serv_addr));
 4   ...
 5   listen(s, 5);                                                                bind(so = x)
 6   ...
 7   while(1) {
 8     int ns = accept(s, &addr, &len);
 9     if (ns < 0) break;
10     do {                                                                       listen(so = x)
11        read(ns, buffer, 255);
12        ...
13        write(ns, buffer, size);
14        if (cond1) return;                                                      accept(so = x, return = y)
15     } while (cond2)
16     close(ns);
17   }
18   close(s);
      Figure 1: An example program using the socket API.                                        read(fd = y)       write(fd = y)
 1 socket(domain = 2, type = 1, proto = 0,
          return = 7)
 2 bind(so = 7, addr = 0x400120, addr_len = 6,
        return = 0)
 3 listen(so = 7, backlog = 5, return = 0)
 4 accept(so = 7, addr = 0x400200,                                                                        close(fd = y)
          addr_len = 0x400240, return = 8)
 5 read(fd = 8, buf = 0x400320, len = 255,
        return = 12)
 6 write(fd = 8, buf = 0x400320, len = 12,                                           close(fd = x)
         return = 12)
 7 read(fd = 8, buf = 0x400320, len = 255,                                Figure 3: The output of our mining process: a specification
        return = 7)                                                       automaton for the socket protocol.
 8 write(fd = 8, buf = 0x400320, len = 7,
         return = 7)
 9 close(fd = 8, return = 0)
10 accept(so = 7, addr = 0x400200,
          addr_len = 0x400240, return = 10)
                                                                          interactions—and puts the scenarios into a standard, abstract form.
11 read(fd = 10, buf = 0x400320, len = 255,                                  The automaton learner is composed of two parts: an off-the-shelf
        return = 13)                                                      probablistic finite state automaton (PFSA) learner and a postpro-
12 write(fd = 10, buf = 0x400320, len = 13,                               cessor called the corer. The PFSA learner analyzes the scenario
         return = 13)                                                     strings and generates a PFSA, which should be both small and
13 close(fd = 10, return = 0)                                             likely to generate the scenarios. A PFSA is a nondeterministic fi-
14 close(fd = 7, return = 0)
                                                                          nite automaton (NFA) in which each edge is labelled by an abstract
                                                                          interaction and weighted by how often the edge is traversed while
Figure 2: Part of the input to our mining process: a trace of an          generating or accepting scenario strings. Rarely-used edges corre-
execution of the program in Figure 1.                                     spond to infrequent behavior, so the corer removes them. The corer
                                                                          also discards the weights, leaving an NFA. A human can validate
                                                                          the NFA by inspection, at which point the NFA specification can
                                                                          be used for program verification. Figure 1 shows a specification for
infer a specification of the correct protocol. A verification tool that
                                                                          the socket protocol of the program in Figure 1.
uses the specification to examine all program paths (e.g. [2, 10])
                                                                             This paper makes the following contributions:
could then find the rare bug.
   Our specification mining system is composed of four parts:                • We introduce a new approach, called specifications mining,
tracer, flow dependence annotator, scenario extractor, and automa-             for learning formal correctness specifications. Since specifi-
ton learner (Figure 4). The tracer instruments programs so that they           cation is the portion of program verification still dependent
trace and record their interactions with an API or ADT, as well as             primarily on people, automating this step can improve the
compute their usual results. We implemented two tracers. One is                appeal of verification and help improve software quality.
a replacement for the C stdio library, which requires recompiling
programs. The other is a more general executable editing tool that           • We use the observation that common behavior is often cor-
allows arbitrary tracing code to be inserted at call sites. The tracers        rect behavior to refine the specifications mining problem into
produce traces in a standard form, so that the rest of the process is          a problem of probabilistic learning from execution traces.
independent of the tracing technology.                                       • We develop and demonstrate a practical technique for prob-
   Flow dependence annotation is the first step in refining the traces         abilistic learning from execution traces. Our technique re-
into interaction scenarios, which can be fed to the learner. It con-           duces specification mining to the problem of learning regular
nects an interaction that produces a value with the interactions that          languages, for which off-the-shelf learners exist.
consume the value. Next, the scenario extractor uses these depen-
dences to extract interaction scenarios—small sets of dependent           The rest of the paper is organized as follows. Section 2 develops
      Program            Tracer                       Instrumented       with malloc), and then destroys the linked list, deallocating the
                                                      program            nodes with free in first-allocated, last-freed order. Ignoring the
                                                                         finite arithmetic, the traces do not form a regular language. First,
                                                                         a regular language must be defined over a finite alphabet, but
                                                                         LinkedList can make an unbounded number of distinct mal-
     Test inputs         Run                          Traces             loc and free calls. Second, LinkedList always makes a num-
                                                                         ber of malloc calls followed by an equal number of free calls,
                                                                         which is the canonical non-regular language.
                                                                            Although LinkedList’s traces do not form a regular language,
                         Flow dependence              Annotated
                                                                         its traces contain subtraces that do. Given a trace and an object o
                         annotator                    traces
                                                                         mentioned in that trace, consider the subtrace of the trace contain-
                                                                         ing calls to malloc that return o and calls to free that are passed
                                                                         o. The subtrace is simply a malloc call, followed by a free call.
                                                      Abstract           If the trace mentions n objects, there is one such subtrace for each
     Scenario seeds      Scenario extractor
                                                      scenarios          object. Each subtrace is exactly like all of the others, except for
                                                                         the particular object that it allocates and frees. Now replace that
                                                                         object in each subtrace with a standard name, say ostd . Now, all
                                                                         of the subtraces are identical, and the learner has a very strong hint
                          Automaton                   Specification      that free should always follow malloc. We call the renamed
                          learner                                        subtraces interaction scenarios.
                                                                            Our approach simplifies Problem 2.1 in two ways. First, the
                                                                         learner does not learn directly from traces. Instead, a preprocessor
      Figure 4: Overview of our specification mining system.
                                                                         extracts interaction scenarios from the traces. The scenarios manip-
                                                                         ulate no more than k data objects, for some k; in the LinkedList
                                                                         example, k = 1. Second, the set CS of correct scenarios is required
a formal statement of the specification mining problem. Section 3        to be regular. The simplified specification mining problem can now
discusses tracers and the flow dependence annotator. Section 4 de-       be defined:
scribes the scenario extractor and the automaton learner. Section 5
presents a dynamic checker for the mined specifications. Section 6           P ROBLEM 2.2. Let IS be the set of all interaction scenarios
discusses the results of some experiments with the mining tool and       with an API or ADT that manipulate no more than k data objects.
the checker. Section 7 discusses related work and Section 8 con-         Let CS ⊆ IS be the regular set of all correct scenarios. Given an
cludes the paper.                                                        unlabelled training set TS of interaction scenarios from IS , find a
                                                                         finite-state automaton AS that generates exactly the scenarios in
                                                                         CS .
2.      THE PROBLEM
  This section develops a formal statement of the specification             Problem 2.2 is also impossible to solve. The careful reader may
mining problem. At its most ambitious, specification mining at-          have noticed that the training set TS does not depend on CS . That
tempts to solve an unsolvable problem:                                   is, no matter what CS is, any subset of IS is a valid training set!
                                                                         Obviously, under these conditions, there is no basis on which to
   P ROBLEM 2.1. Let I be the set of all traces of interactions with     choose SS . The definition of Problem 2.2 allowed the training sets
an API or ADT, and C ⊆ I be the set of all correct traces of in-         to be chosen so liberally in order to allow “noisy” training sets that
teractions with the API or ADT. Given an unlabelled training set T       contain bad examples (that is, bugs) that are not in CS . A satis-
of interaction traces1 from I, find an automaton A that generates        factory definition of noise must wait until the problem has been
exactly the traces in C. A is called a specification. An algorithm       simplified further. For now, we simplify the problem by assuming
that finds A is called a specification miner.                            that the training “set” is in fact an infinite sequence of scenarios
   The rest of this section examines successive restrictions of this     from CS alone, such that each element of CS occurs at least once:
problem, which lead to a problem that can be attacked with the
                                                                             P ROBLEM 2.3. Let IS be the set of all interaction scenarios
methods of this paper. These simplifications were choosen to acco-
                                                                         with an API or ADT that manipulate no more than k data objects.
modate our techniques, and other restatements of the specification
                                                                         Let CS ⊆ IS be the regular set of all such correct scenarios. Fi-
mining problem are certainly possible.
                                                                         nally, let TS = c0 , c1 , . . . be an infinite sequence of elements from
   Problem 2.1 can not be solved because it places no restrictions
                                                                         CS in which each element of CS occurs at least once.
on the set C. If C is not recursively enumerable, then A does not
                                                                            For each n > 0, examine the first n elements of TS and produce
exist. In this paper, we require that C be generated by a finite-
                                                                         a finite-state automaton ASn , such that the sequence of finite-state
state automaton (that is, C is a regular language). This decision
                                                                         automata AS0 , AS1 , . . . has this property: for some N ≥ 0, ASN
is forced by two practical considerations. First, model checkers re-
                                                                         generates exactly the scenarios in CS and ASn = ASN for all
quire finite-state specifications. Second, the algorithms for learning
                                                                         n ≥ N . We say that the sequence AS0 , AS1 , . . . identifies CS in
finite-state automata are relatively well-developed.
                                                                         the limit.
   It is not enough to simply say that C must be regular, because
traces of most programs are not regular. For example, consider              Perhaps surprisingly, Problem 2.3 is also undecidable. Our def-
a C program (LinkedList) that takes a number n on the com-               inition of Problem 2.3 is inspired by E Mark Gold’s seminal paper
mand line, constructs a linked list of size n (allocating the nodes      on language identification in the limit [14], in which Gold shows
1                                                                        that regular languages can not be identified in the limit [14, The-
  By an “unlabelled training set”, we mean that no information is
provided as to which of the elements in T are also in C.                 orem I.8]. His proof is too long to repeat here, but the idea of
                                                                          int instrumented_socket(int do-
the proof is to present the members of an infinite regular language       main, int type, int proto)
to the learner in such a way that the learner is forced to change         {
its guess infinitely often, cycling through a never ending sequence         int rc = socket(domain, type, proto);
of finite sublanguages of the infinite language. Intuitively, the           fprintf(the_trace_fp,
learner’s dilemma is that any finite sequence of examples from the                  "socket(domain = %d, type = %d, "
infinite language is also a sequence of examples from a finite lan-                 "proto = %d, return = %d)\n",
                                                                                    domain, type, proto, rc);
guage, and the learner has no basis for preferring one of these over        return rc;
the other. Since CS is a possibly infinite regular language, Gold’s       }
theorem applies to Problem 2.3.
   Gold’s paper did not end work on learning regular languages
from examples. Subsequent work avoids the dilemma exploited               Figure 5: Illustration of trace instrumentation (instrumented
in Gold’s proof by providing the learner with extra information that      version of socket).
allows it to justify choosing a less general automaton over a more
general one (and vice versa). One class of approaches presents the
learner with examples generated according to a probability distri-           Unfortunately, with reasonable distance metrics D, it has been
bution; this sort of approach is particularly interesting to us because   shown that Problem 2.4 is not efficiently learnable [16]. An ef-
it also gives the learner a method for dealing with noise in its in-      ficient solution has been found for the case where M and M are
put. The task of the learner is to learn a close approximation of the     required to be acyclic and deterministic [24]. Since many interest-
probability distribution:                                                 ing specifications of program behavior contain loops, we chose to
      Let IS be the set of all interaction scenarios with an              use a greedy PFSA learning algorithm that is not guaranteed to find
      API or ADT that manipulate no more than k data ob-                  an ǫ-good approximation of M , but in practice generates succinct
      jects. Let P and Pb be probability distributions over               specifications.
      IS . We say that Pb is an ǫ-good approximation of P ,
      for ǫ ≥ 0, if
                                                                          3. TRACING AND FLOW DEPENDENCE
                             D(P, Pb) ≤ ǫ                                    ANNOTATION
      where D(P, Pb) is some measure of distance between                     This section describes the tracing and flow dependence annota-
      P and Pb.                                                           tion that produce the input to the scenario extractor.
                                                                          Tracing A tracer instruments a program, so that running it pro-
   Just as Problem 2.2 restricted CS to be a regular set, P must be       duces a trace of its interactions with an API or ADT, as well as its
restricted to a manageable class of distributions. We choose the          usual results. This paper assumes that a tracer only records function
distributions generated by probabilistic finite state automata (PF-       calls and returns, although depending on the API/ADT, the mining
SAs). A PFSA is a probabilistic analogue of a nondeterministic            system allows tracing other events, such as variable accesses or net-
finite state automaton. That is, a PFSA is a tuple (Σ, Q, qs , qf , p)    work messages.
where                                                                        Figure 5 shows an illustration of the trace instrumentation,
   • Σ is an output alphabet.                                             specifically the C code for an instrumented version of the socket
                                                                          call. This wrapper calls the real socket and records information
   • Q is a set of states.                                                about the interaction: the name of the call (socket), arguments,
                                                                          and return value. The entire socket API could be traced with an
   • qs ∈ Q is the start state of the automaton.
                                                                          instrumented version of each function.
   • qf ∈ Q is the final state of the automaton.                             Our system currently uses two tracers. The first instruments the
                                                                          C stdio library, by capturing all library calls and macro invocations
   • p(q, q ′ , a) is a probability function, giving the probability of
                                                                          in that API. The second consists of two parts: Perl scripts that au-
     transitioning from q ∈ Q to q ′ ∈ Q while outputting the
                                                                          tomatically generate instrumented versions of the function calls in
     symbol a ∈ Σ. Note that p(qf , q ′ , a) = 0 for all q ∈ Q and
                                                                          the X11 API, and a tool that edits program executables to replace
     a ∈ Σ.
                                                                          calls on these routines with calls to instrumented versions. The lat-
   Thus, a PFSA generates a distribution that assigns positive prob-      ter tool is based on the EEL Executable Editing Library [17] and
abilities to the strings in a regular language. Basing our defini-        is very general. It takes as input an executable, a library of in-
tion on the standard definition for learning probabilistic finite au-     strumented functions, and a file specifying which calls in the exe-
tomata [20], we can now give our final formulation of the specifi-        cutable to replace with calls to instrumented functions. The most
cation mining problem:                                                    time-consuming part of tracing an interface is writing the instru-
                                                                          mented version of each API call, but we believe that this step is
   P ROBLEM 2.4. Let IS be the set of all interaction scenarios
                                                                          easily automated.
with an API or ADT that manipulate no more than k data objects.
                                                                             All tracers record interactions in the same format, so that the rest
Let M be a target PFSA, and P M be the distribution over IS that
                                                                          of the mining system is independent of the particular tracer used.
M generates. Intuitively, P M assigns high probabilities to correct
                                                                          An interaction skeleton is of the form
traces and low probabilities to incorrect traces.
   Given a confidence parameter δ > 0 and an approximation pa-                        interaction(attribute 0 , . . . , attribute n )
rameter ǫ > 0, efficiently find with probability at least 1 − δ a
PFSA M such that its distribution P M is an ǫ-good approximation          where interaction names the interaction (that is, the name of a
of P M . “Efficiently” means that the mining algorithm must run in        function) and attribute i names the ith attribute of the interaction.
time polynomial in 1/ǫ, 1/δ, an upper bound n on the number of            Skeletons are just a convenient way of grouping interactions. They
states of M , and the size of the alphabet Σ of I.                        do not appear in traces. An interaction instantiates a skeleton by
assigning values to the attributes:                                      Definers: socket.return
                                                                                   bind.so
      interaction(attribute 0 = v0 , . . . , attribute n = vn )                    listen.so
                                                                                   accept.return
When tracing function calls, interaction attributes usually represent              close.fd
function arguments and return values, as in Figure 2. Structured
data can be represented by flattening the structures. For example,       Users:     bind.so
given this C code                                                                   listen.so
struct S { int x; int y; };                                                         accept.so
void f(S* s);                                                                       read.fd
                                                                                    write.fd
the tracer could record interactions with f with instances of this                  close.fd
skeleton
f(S, S_x, S_y)                                                           Figure 7: Attributes of socket interactions that define and use
                                                                         their values.
   By convention, this paper names traces with the letter T and in-
teractions with variations of the letter t. The actual interactions in
a trace of length n + 1 are numbered from 0 to n; for example,              Flow dependences connect attributes that change the state of an
T = t0 , . . . , tn . The notation t.a denotes the a attribute of the    abstract object (that is, attributes that define the object) to interac-
interaction t.                                                           tion attributes that depend on the state of an abstract object (that
                                                                         is, attributes that use the object). Ideally, the dependence analyzer
                  Flow dependence annotator                              would annotate a trace with flow dependences using no information
                                                                         beyond the trace itself. Our current system, however, relies on an
                                                Untyped trace            expert to tell the analyzer which attributes of interactions may de-
   Traces          Dependence analysis          with dependences         fine objects, and which attributes may use objects. This work must
                                                                         be done once for each API/ADT. Extending the system to infer the
                                                                         sets of definers and users automatically is future work.
                   Type inference                                           For simplicity, the examples in this paper assume that only
                                                                         socket-valued attributes of the interactions in Figure 2 carry depen-
                                                                         dences. Figure 7 lists attributes of interactions in Figure 2 that de-
                                                                         fine and use socket values. We constructed this table as follows. For
                     Annotated                                           each socket, the kernel maintains a hidden data structure. Some of
                     traces                                              the fields of that structure carry the state of the socket: whether the
                                                                         socket is closed or open, whether or not it can accept connections,
  Figure 6: Detailed view of the flow dependence annotator.              and so on. Other fields simply hold data: bytes that are outstanding,
                                                                         the port to which the socket is connected, and so on. Definers in
                                                                         Figure 2 typically modify one or more of the state fields of the data
Flow dependence annotation Flow dependence annotation anno-              structure. Users typically read one or more of those fields. Fields
tates each input trace with flow dependences and type assignments.       of the structure that merely hold data are ignored.
The scenario extractor uses these annotations to extract scenarios—         Creating Figure 7 required expert knowledge. However, note that
small sets of dependent interactions—from the trace and to put each      whenever the state fields of a socket’s data structure change, the set
scenario into a canonical form. The detailed view in Figure 6 shows      of API calls that may follow also changes. For example, after a
that flow dependence annotation is a two-step process. First, de-        socket is closed, read and write calls are no longer allowed. The
pendence analysis marks the trace with flow dependences, which           fact that close changes the state of the socket can be inferred from
constrain how interactions may be reordered and identify related         the trace: before a close, there are reads and writes; after a
interactions that could be grouped into a scenario for the automa-       close, there are no reads and writes. That is, interactions that
ton learner. Next, type inference assigns a type to each interaction     change state also change the sorts of interactions that may follow.
attribute in the trace. The scenario extractor uses the types to avoid   In future work, we hope to replace the expert with an automatic
naming conflicts when it puts a scenario into standard form. De-         tool that uses this fact to infer the sets of definers and users.
pendence analysis and type inference both examine the entirety of           Given the lists of attributes that define or use objects, dependence
each input trace, so their running time must be nearly linear.           analysis is a dynamic version of the reaching definitions problem.
   The miner treats all values as abstract objects whose underly-        The analyzer traverses the trace T = t0 , . . . , tn in order from t0 to
ing representation is unknown. However, interactions can depend          tn , maintaining a table M that maps values to attributes of actual
on results from other interactions. For example, in Figure 2, the        interactions. Initially, M is empty. If ti .a defines an object o, the
bind call (line 2) depends on the socket call (line 1), because          annotator updates M to map o to ti .a. If ti .a uses an object o and
the bind call uses file descriptor 7 returned by the socket call.        M maps o to ti′ .a′ , then the analyzer places a flow dependence
The order of these two interactions can not be reversed. By con-         from ti .a to ti′ .a′ . The running time of the algorithm scales lin-
trast, the interactions that manipulate file descriptor 8 (lines 4–9)    early in the length of the trace. The space required scales linearly
could be exchanged with the interactions that manipulate file de-        in the number of different values referenced by the trace.
scriptor 10 (lines 10–13), since these groups of operations are in-         For notational convenience, we introduce a relation df such that
dependent of each other. More importantly, a scenario that contains      df (ti .a, ti′ .a′ ) if and only if there is a flow dependence from ti .a
all interactions related to the close on line 13 should not include      to ti′ .a′ . The relation is extended from interaction attributes to
the interactions on lines 4–9.                                           interactions in the natural way: df (ti , ti′ ) holds if and only if there
Type(socket.return) = T0                                                        bind y to more than one instance of ns as it simulates the loop in
Type(bind.so) = T0                                                              lines 7–17.
Type(listen.so) = T0                                                               The scenario extractor simplifies and standardizes the scenarios
Type(accept.so) = T0                                                            before passing them to the automaton learner, because our speci-
Type(accept.return) = T0                                                        fication mining system uses an off-the-shelf PFSA learner. An al-
Type(read.fd) = T0                                                              ternative, which we have not tried, is to design a special-purpose
Type(write.fd) = T0                                                             learner for scenarios. Both schemes have benefits and costs.
Type(close.fd) = T0                                                                There are several off-the-shelf learners that learn PFSAs and
                                                                                similar automata from strings. Since our design transforms sce-
                                                                                narios to strings, a new learner can be substituted for the learner
Figure 8: The only valid typing for the skeleton attributes used                currently used. If the new learner learns PFSAs, no changes to the
by the trace in Figure 2.                                                       mining system are necessary. If the learner does not learn PFSAs,
                                                                                the corer may need to be changed, but none of the components be-
                                                                                fore the automaton learner in Figure 4 would require modification.
is some f and f ′ such that df (ti .a, ti′ .a′ ).                               In our experience, this flexible design was helpful. Before settling
   Type inference is the next step in the flow dependence annotator.            on the PFSA learner as use it now, we tried and rejected one other
Type inference assigns a type to each skeleton attribute that is in-            PFSA learner [21].
volved in dependences. If a value never flows between an instance                  On the other hand, a special-purpose learner could defer deci-
of one skeleton attribute and an instance of another skeleton at-               sions that our mining system now must make before invoking the
tribute, then type inference assigns the skeleton attributes separate           off-the-shelf learner. For example, when the scenario extractor re-
types. Strictly speaking, flow dependences alone give the scenario              places the concrete values in a scenario with abstract names, it does
extractor enough information to extract scenarios and put them into             so without regard to the names given to values in other scenarios.
a standard form. However, as Section 4 explains, the scenario ex-               Although the extractor always names equivalent scenarios in the
tractor can use the assurance that values will never flow between               exact same way (see below for details), when two scenarios are
certain attributes in a scenario to reduce naming conflicts. Type               “close” but not equivalent, the extractor’s choice of names can pre-
inference infers a typing that satisfies this condition:                        vent the PFSA learner from merging states that it would be able to
                                                                                merge with a different naming.
       If df (ti .a, ti′ .a′ ), then the typing gives the skeleton
       attribute of ti .a and the skeleton attribute of ti′ .a the
       same type.
                                                                                4.1 Scenario extraction
   Figure 8 gives a typing for the skeleton attributes used by the                   Annotated            Scenario extractor
                                                                                     traces
socket trace in Figure 2. In this example, every skeleton attribute
must have the same type because all socket attributes in the trace                                            Extraction            scenarios
are on some dependence chain with an instance of a close.fd                        Scenario seeds
attribute.
   The inference algorithm uses Tarjan’s union-find algorithm [26]
and requires time nearly linear in the trace. The type inferer                                                Simplification        simplified scenarios
starts with an initial typing that gives each skeleton attribute
its own unique type. Then, the inferer visits each dependence
df (ti .a, ti′ .a′ ) and unifies the types of the skeleton attribute of ti .a
                                                                                                              Standardization
and the skeleton attribute of ti′ .a. Type inference is complete when
all dependences have been visited.
                                                                                                                 Abstract
4.    SCENARIO EXTRACTION AND                                                                                    scenario strings
      AUTOMATON LEARNING
   This section explains how the scenario extractor and automaton                       Figure 9: Detailed view of the scenario extractor.
learner work. The first tool extracts interaction scenarios—small
sets of interdependent interactions—from annotated traces and pre-
pares them for the automaton learner. The second tool infers spec-                 Figure 9 is a detailed view of the scenario extractor. It receives
ifications from scenarios, not complete traces, for two reasons.                two inputs. The first is a set of traces, annotated as described in
   The primary reason is that scenarios are much shorter than traces            Section 3. In addition, the user controls which scenarios will be
and the running time of our PFSA learner increases as the third                 extracted by supplying a set of scenario seeds. Each seed is an
power of the length of its input—this is typical for automaton learn-           interaction skeleton. The extractor searches the input traces for
ers.                                                                            interactions that match the seeds and extracts a scenario from each
   Also, we restrict scenarios to refer to a small number of objects            interaction. For example, suppose the extractor was given the trace
by bounding the size of the scenario. Section 2 argues that bound-              of socket interactions in Figure 2 and accept(so, return) as
ing the number of objects makes specification mining tractable.                 the seed. The extractor would produce two scenarios, one around
Bounding the number of objects is not a severe limitation because               the accept on line 4 and the other around the accept on line
verification tools can verify that the specification holds for multiple         10.
bindings of program objects to specification objects. For example,              Extraction Producing scenarios from input traces is the first step of
although the protocol specified in Figure 1 mentions two objects, x             the extraction process. Informally, a scenario is a set of interactions
and y, a tool that attempts to verify the program in Figure 1 might             related by flow dependences. Formally, given an annotated trace
 1 socket(domain = 2, type = 1, proto = 0,                                   1    socket(return = 7)
          return = 7)                                                        2    bind(so = 7)
 2 bind(so = 7, addr = 0x400120, addr_len = 6,                               3    listen(so = 7)
        return = 0)                                                          4    accept(so = 7, return = 8) [seed]
 3 listen(so = 7, backlog = 5, return = 0)                                   5    read(fd = 8)
 4 accept(so = 7, addr = 0x400200,                                           6    write(fd = 8)
          addr_len = 0x400240,                                               7    read(fd = 8)
          return = 8) [seed]                                                 8    write(fd = 8)
 5 read(fd = 8, buf = 0x400320, len = 255,                                   9    close(fd = 8)
        return = 12)
 6 write(fd = 8, buf = 0x400320, len = 12,
         return = 12)                                                            Figure 11: The simplification of the scenario in Figure 10.
 7 read(fd = 8, buf = 0x400320, len = 255,
        return = 7)
 8 write(fd = 8, buf = 0x400320, len = 7,                                    1      socket(return = x0:T0)                                       (A)
         return = 7)                                                         2      bind(so = x0:T0)                                             (B)
 9 close(fd = 8, return = 0)                                                 3      listen(so = x0:T0)                                           (C)
                                                                             4      accept(so = x0:T0, return = x1:T0) [seed]                    (D)
                                                                             5      read(fd = x1:T0)                                             (E)
                                                                             7      read(fd = x1:T0)                                             (E)
Figure 10: A scenario extracted from around line 4 of Figure 2,              6      write(fd = x1:T0)                                            (F)
with N = 10                                                                  8      write(fd = x1:T0)                                            (F)
                                                                             9      close(fd = x1:T0)                                            (G)
T = t0 , . . . , tn , a scenario is a set S ⊆ {t0 , . . . , tn } with the
property:                                                                   Figure 12: Scenario string for the simplified scenario from Fig-
                                                                            ure 11.
      If ti0 ∈ S, tin ∈ S, and ti0 , . . . , tin is a chain of
      flow dependent interactions in T , then tij ∈ S for any
      0 ≤ j ≤ n.
                                                                                  Sar   =   {t ∈ [ta , td ] | ∃t′ ∈ Sa .t′ reaches t}
The extractor builds a scenario around each interaction in the trace              Sdr   =   {t ∈ [ta , td ] | ∃t′ ∈ Sd .t′ reachesinreverse t}
that matches a scenario seed. For any scenario S, seed (S) ∈ S is
the interaction that initially matches the seed.
   A user-tunable parameter N restricts the number of interactions          The final scenario is S = Sad ∪ (Sar ∩ Sdr ). Figure 10 shows a
in the extracted scenarios. Each scenario contains at most N an-            scenario extracted from the trace in Figure 2 with N = 10, around
cestors and at most N descendants of the seed interaction. The              the accept on line 4. The seed is marked. Also note that the
extractor prefers ancestors and descendants whose position in the           interactions in S inherit the dependences from the annotated trace.
input trace is close to the position of seed interaction.                   Simplification Given the extracted scenarios, simplification elimi-
   Once an interaction ts matching a seed is found, the extractor           nates all interaction attributes that do not carry a flow dependence
uses a two-step algorithm to produce a scenario. First, the extractor       in any training traces. The typing inferred by the dependence an-
constructs the sets:                                                        notator (see Section 3) assigns a type to an skeleton attribute if and
                                                                            only if an instance of that attribute is involved in a flow dependence
                                                                            somewhere in a trace. So, simplification preserves an interaction at-
             Sa    = {N closest ancestors of ts }                           tribute if and only if the corresponding skeleton attribute is typed.
                                                                            Figure 11 is the simplified version of the scenario in Figure 10.
             Sd    = {N closest descendants of ts }
                                                                            Standardization Standardization converts a scenario into a sce-
            Sad    = {ts } ∪ Sa ∪ Sd                                        nario string for the PFSA learner. Standardization improves the
                                                                            performance of the PFSA learner by producing scenario strings so
                                                                            that similar scenarios receive similar strings.
The extractor uses a simple prioritized worklist algorithm to con-             Figure 12 shows the result of standardizing the scenario in Fig-
struct the set of ancestors (descendants). The initial worklist is the      ure 11. Standardization applies two transformations: naming and
set of immediate ancestors (descendants) of ts . Repeatedly, un-            reordering.
til the worklist is empty or N ancestors (descendants) are found,              Naming replaces attribute values with symbolic variables. In
the extractor removes from the worklist the ancestor (descendant)           Figure 12, value 7 is replaced with the symbolic name x0:T0, and
whose position in the trace is nearest ts , adds it to the set of ances-    value 8 is replaced with the symbolic name x1:T0. Naming ex-
tors (descendants), and adds its immediate ancestors (descendants)          poses similarities between different scenarios by naming flow de-
to the worklist.                                                            pendences. For example, a scenario extracted around line 10 of
   The result, Sad , is not necessarily a scenario, because interac-        Figure 2 manipulates different socket values (7 and 10 instead of
tions along some flow dependence chains from ancestors of ts to             7 and 8), but naming still calls one of these values x0:T0 and the
descendants of ts might be missing. Any such interactions must              other x1:T0.
lie in the trace between the earliest ancestor ta in Sad and the lat-          When a value flows from one attribute to another, naming indi-
est descendant td in Sad , and must be reachable both by following          cates the dependence by assigning the same name to both attributes.
flow dependences from some ancestor of ts and by following flow             The dependence annotation typing (section 3) guarantees that, if
dependences in reverse from some descendant of ts . Thus, the ex-           two skeleton attributes are assigned different types, values never
tractor searches depth-first forwards from each element of Sa and           flow between instances of those attributes. Thus, naming uses a
backwards from each element of Sd to construct                              separate namespace for attributes of each type. Figure 13 illus-
 Original   S0                                   S1                                                  S              S
                                                                                                         0              1
       1    A(x=0, y=0) [seed]                   E(x=0, v=1) [seed]
       2    B(x=0, y=0)                          B(x=0, y=0)                                      σ0                    σ1
       3    C(x=0, y=0)                          C(x=0, y=0)
 Untyped
       1    A(x=x0, y=x1)                        E(x=x0, v=x1)
       2    B(x=x0, y=x1)                        B(x=x0, y=x2)                                       Γ0             Γ1
       3    C(x=x0, y=x1)                        C(x=x0, y=x2)
   Typed
       1    A(x=x0:T0, y=x0:T1)                  E(x=x0:T0, v=x0:T2)                      Figure 15: Equivalent scenarios.
       2    B(x=x0:T0, y=x0:T1)                  B(x=x0:T0, y=x0:T1)
       3    C(x=x0:T0, y=x0:T1)                  C(x=x0:T0, y=x0:T1)
                                                                         Naive(S)
                                                                           MaxSize := maximum size of a scenario
Figure 13: Two nearly equivalent scenarios and their scenario              X := a totally ordered set of MaxSize symbolic names
strings, with untyped and typed naming.                                    AllStrings := ∅
                                                                           Permutes := all dependence-preserving permutations of S
                                                                           Foreach σ ∈ Permutes
                                                                              Namings := all dependence-preserving namings of σ(S) from X
                                                                              Foreach Γ ∈ Namings
                Equivalent     Standardization
                                                                                Add Γ(σ(S)) to AllStrings
                scenarios
                                                                           Return the lexicographically smallest element of AllStrings
        Simplified scenarios                     Scenario strings                  Figure 16: Naive standardization algorithm.
                                                                         swap the source and sink of any dependence. Figure 12 illustrates
   Figure 14: Standardization, as a many-to-one mapping.                 a dependence-preserving permutation that swaps the read on line
                                                                         6 with the write on line 7.
                                                                             A naming Γ of S replaces each value in S with a symbolic name,
                                                                         taken from a set X. If si .a is an attribute in S, Γ(si .a) is the
trates how separate namespaces help expose more similarities to          symbolic name given to that attribute in Γ(S). We say that Γ is
the PFSA learner. Lines 2 and 3 of S0 and S1 are the same, but           dependence-preserving if, for any si .a and si′ .a′ , d(si .a, si′ .a′ ) ≡
line 1 differs in each scenario. Assume that naming assigns names        Γ(si .a) = Γ(si .a′ ).
to each interaction in turn, starting at the seed interaction. Without       Now let S0 = s0,0 , . . . , s0,n and S1 = s1,0 , . . . , s1,n be
types, naming treats lines 2 and 3 differently.                          two simplified scenarios. S0 and S1 are equivalent iff there are
   Reordering standardizes the order of scenario interactions. A         dependence-preserving permutations σ0 of S0 and σ1 of S1 and
scenario contains interactions that are partially ordered by flow,       dependence-preserving namings Γ0 of σ0 (S0 ) and Γ1 of σ1 (S1 )
anti, and output dependences. That is, each scenario corresponds to      such that Γ0 (σ0 (S0 )) = Γ1 (σ1 (S1 )) (Figure 15). In fact, the
a directed acyclic graph (DAG). The order in which the interactions      choice of σ0 and Γ0 is not important. We assert that if S0 and
appear in the original traces is just one legal total order. Reorder-    S1 are equivalent, then for any dependence-preserving σ0 of S0
ing puts two scenarios with the same DAG into the same total order,      and dependence-preserving Γ0 of σ0 (S0 ), there is a dependence-
even when their trace order differs, so that a PFSA learner is pre-      preserving σ1 of S1 and a dependence-preserving Γ1 of σ1 (Γ1 )
sented with fewer distinct strings. In Figure 12, reordering swapped     such that Γ0 (σ0 (S0 )) = Γ1 (σ1 (S1 )).
the write on line 6 with the read on line 7.                                 Figure 16 presents a naive standardization algorithm. Naive
   To a PFSA learner, each interaction in a scenario string is merely    tries all dependence-preserving permutations of S and all
an atomic letter. To emphasize this point, the right-hand side of        dependence-preserving namings of each permutation and returns
Figure 12 replaces each interaction with a shorthand letter. Stan-       the scenario string that comes first in lexicographic order. If Naive
dardization uses a small number of letters to represent a given set      assigns S0 and S1 the same scenario string, then S0 and S1 are
of scenarios. Using a small alphabet increases the PFSA learner’s        equivalent, since the algorithm has found permutations and nam-
opportunities to find similarities in the scenario strings. Also, PFSA   ings that make them equal. And, if S0 and S1 are equivalent, then
learners run more slowly with large alphabets.                           Naive generates the same AllStrings set for both of them.
   The rest of this section discusses our standardization algorithm in   So, equivalence characterizes the preimage of Naive, as promised.
detail. At a high level, standardization is a many-to-one mapping        However, the running time of the algorithm is exponential in |X|
from simplified scenarios to scenario strings (Figure 14). Under         and |S|.
this mapping, the preimage of a scenario string is a set of equiva-          The algorithm in Figure 17 removes the exponential behavior in
lent scenarios. Intuitively, equivalent scenarios manipulate abstract    |X| by considering only one standard naming for each permuted
objects in the same way. In the following, we define equivalence,        scenario. This optimization is safe because if S0 and S1 are equiv-
present our standardization algorithm, and show that equivalence         alent up to a dependence-preserving naming, then they differ only
characterizes the scenarios that standardization maps to the same        in their values, and StandardName does not depend on the iden-
scenario string.                                                         tities of the values at attributes, but only on their types and the
   Let S = s0 , . . . , sn be a simplified scenario. A dependence-       dependences that they carry.
preserving permutation of S is a permutation σ of S such that if             StandardName draws names from separate name spaces for
d(si , si′ ), then σ(i) < σ(i′ ). That is, the permutation does not      separate types. The GetNextName operation returns the next
NameInteraction(s)                                                        OfLeastSkeletons(S)
  Foreach attribute s.a                                                     return {s ∈ S | ¬∃s′ .skeleton of s′ precedes
    Type := the type of s.a’s skeleton attribute                                   skeleton of s lexicographically}
    Value := the value at s.a
    NameSpace := name space for Type                                      RestrictedPermutations(S, Pos)
    If NameSpace[Value] has not been set                                    Permutes := ∅
      NameSpace[Value] := GetNextName(NameSpace)                            Ready := {s ∈ S | ¬∃s′ ∈ S.d(s′ , s)}
    Replace Value with NameSpace[Value] in s.a                            * Selected := OfLeastKinds(Ready)
                                                                            Foreach s ∈ Selected
StandardName(S = s0 , . . . , sn )                                             Rest := RestrictedPermutations(S − {s}, Pos + 1)
  is := index of the seed in S                                                 Foreach σr ∈ Rest
  NameInteraction(sis )                                                           σ := σr ∪ {Pos → s}
  dist := 1                                                                       Permutes := Permutes ∪ {σ}
  While is − dist ≥ 0 or is + dist ≤ n                                      Return Permutes
      If is − dist ≥ 0 NameInteraction(sis −dist )
      If is + dist ≤ n NameInteraction(sis +dist )                        Standardize(S)
      dist := dist + 1                                                      Reset all name spaces
                                                                            AllStrings := ∅
Better(S)                                                                   Permutes := RestrictedPermutations(S, 0)
  Reset all name spaces                                                     Foreach σ ∈ Permutes
  AllStrings := ∅                                                             Snamed := StandardName(σ(S))
  Permutes := all dependence-preserving permutations of S                     Add Snamed to AllStrings
  Foreach σ ∈ Permutes                                                      Return the lexicographically smallest element of AllStrings
     Snamed := StandardName(σ(S))
     Add Snamed to AllStrings
  Return the lexicographically smallest element of AllStrings                        Figure 18: Final standardization algorithm.
                                                                                                             start
          Figure 17: Better standardization algorithm.                                                       5
                                                                                                         5
                                                                                                                     10000
available name in a name space in some fixed order, and resetting a
name space causes it to begin again with the first name in the space.                                                        10000
                                                                                                     5               10000
StandardName names the seed interaction first, and then works
outward. Because the name of a value must be chosen consistently,                                            5
the constraints on naming increase as interactions are named. In-
teractions near seeds are most likely to be similar across scenarios,
                                                                                                             final
so they are named first.
   The worst-case running time of Better is still exponential in
                                                                          Figure 19: A PFSA for which dropping edges with low weights
|S|. We can not expect to do better in the worst-case, because
                                                                          does not identify the hot core. Edge labels are omitted.
Better can be used to solve the DAG-isomorphism problem by
encoding arbitrary DAGs as scenarios, and DAG-isomorphism is
NP-complete. However, better performance is possible in the com-
mon case, since trace scenarios are not arbitrary DAGs. In particu-       NFA.
lar, the interactions in a scenario have names and named attributes.         The PFSA learner is an off-the-shelf learner [22] that learns a
Our final standardization algorithm (Figure 18) uses those names          PFSA that accepts the training strings, plus other strings. The
to reduce the number of permutations it considers.                        learner is a variation on the classic k-tails algorithm [4]. Briefly,
   Standardize considers only dependence-preserving permu-                the k-tails algorithm works as follows. First, a retrieval tree is con-
tations that put the skeletons of the interactions in the smallest pos-   structed from the input strings. The algorithm then computes all
sible lexicographic order. Although there can be an exponential           strings of length up to k (k-strings) that can be generated from each
number of such orderings, there is often only one. In that case, the      state in the trie. If two states qa and qb generate the same k-strings,
set of interactions Selected (line *) always has one element, and         they are merged. The process repeats until no more merges are pos-
the recursion never branches. With an appropriate implementation          sible. The PFSA learner modifies k-tails by comparing how likely
of OfLeastKinds (which sorts the interactions), the algorithm             two states are to generate the same k-strings.
runs in that case in time proportional to n log n. In our experience,        The resulting PFSA accepts a superset of all the strings in the
the time spent running Standardize is an insignificant part of            training scenarios, due to the generalizations performed by the
the scenario extraction time.                                             learner. The parameter N that controls the size of the extracted
                                                                          is chosen by the user to be large enough to include all of the inter-
4.2 Automaton learning                                                    esting behavior. It is therefore very likely that the ends of the train-
   This section presents the algorithms and data structures used in       ing scenarios contain uninteresting behavior. This is in fact what
learning the specification automaton. The automaton A is an NFA           we see experimentally: the typical PFSA has a “hot” core with a
with edges labelled by standardized interactions, whose language          few transitions that occur frequently, with the core surrounded by
includes the most common substrings of the scenario strings ex-           a “cold” region with many transitions, each of which occurs infre-
tracted from the training traces, plus other strings that the PFSA        quently. The corer whittles away the “cold” region, leaving just the
learner adds as it generalizes. Automaton learning has two steps.         “hot” core.
First, an off-the-shelf learner learns a PFSA. Then, the corer re-           The corer can not simply drop edges with low weights. Consider
moves infrequently traversed edges and converts the PFSA into an          the PFSA in Figure 19 (edge labels are not important and are omit-
                                                                            -1                        Satisfies(S, Spec)
                                          -1                  Simplification
                           Standardization
       Language of A                                                             Concrete scenarios
                                                                                                        If S is in the language of Spec
                                       Simplified scenarios
                                       satisfying A                              satisfying A             Return true
        Scenario strings
        satisfying A
                                                                                                        Else Return false
                                                                                                      Verify(T = t0 , . . . , tn , Spec, MaxSize)
                                                                                                        Loop:
                                                                                                        Foreach ti ∈ SeedsOf(T )
                    Figure 20: Scenarios that satisfy A.                                                   Size := 0
                                                                                                           While Size ≤ MaxSize
                                                                                                              Scenarios := Extract∗ (ti , Size)
                                                                                                              Foreach S ∈ Scenarios
                                                                                                                 Sstd := Standardize(S)
ted). Four edges have a weight of 5, which is low compared to the                                                If Satisfies(Sstd , Spec)
three edges with a weight of 10000. However, any string through                                                  Next Loop
this PFSA must traverse the edge out of the start state and the edge                                          Size := Size + 1
into the end state. Despite their low weight, a string is more likely                                      Return Fails(ti )
to traverse these edges than it is to traverse the edges with a weight
of 10000. Thus, a better measure of an edge’s “heat” is its likeli-
hood of being traversed while generating a string from the PFSA.                                                   Figure 21: Trace verification algorithm.
The problem of computing this measure is known as the Markov
chain problem [15]. The problem reduces to inverting a square ma-
trix with the number of rows and columns equal to the number of                                       cessively larger scenarios until it finds a satisfactory one or until
transitions in the PFSA.                                                                              it reaches the maximum scenario size. Because interactions in the
   After computing the heat of each edge, the corer removes all                                       trace are not necessarily ordered as they were in the training traces,
edges below a cutoff parameter, removes unreachable states from                                       the algorithm does not use exactly the same extraction algorithm
the PFSA, and drops the frequencies on the edges. The result is an                                    as the learner. Instead, Extract∗ (ti , Size) returns all scenarios
NFA, which a human can validate by inspection.                                                        seeded by ti with a total of exactly Size ancestors and descen-
                                                                                                      dants. The distance between the seed and its ancestors and descen-
5.    VERIFICATION                                                                                    dants is not important.
   This section discusses how verification tools can use the miner’s
specifications. Program verification tools distinguish programs that                                  6. EXPERIMENTAL RESULTS
satisfy a specification from programs that do not. Before we can                                         This section presents the results of an experiment in mining spec-
discuss these tools, we must clarify what we mean by “satisfying a                                    ifications from traces of X11 programs.
specification”.                                                                                          We analyzed traces from programs that use the Xlib and X Toolkit
   Let A be a specification. By construction, the language of A con-                                  Intrinsics libraries for the X11 windowing system. The traces record
tains a set of scenario strings (Figure 20). The containment might                                    an interaction for each X library call and callback from the X library
be strict since the automaton learner can introduce strings into A                                    to client code. The interaction attributes include all arguments and
that are not scenario strings. Because standardization is a many-                                     return values of calls, plus the fields of the structures that represent
to-one mapping (see Figure 14), each scenario string corresponds                                      X protocol events. The tracing tool uses the Executable Editing
to a set of simplified scenarios. In turn, each scenario string corre-                                Library (EEL) [17] to instrument Solaris/SPARC executables.
sponds to a set of concrete scenarios. Figure 20 shows the chain of                                      Traces were collected from full runs of widely distributed pro-
mappings. We say that the scenarios of Figure 20 satisfy A.                                           grams that use the X11 selection mechanism. We studied the selec-
   Now let T be an interaction trace. We say that T satisfies A if for                                tion mechanism since the Interclient Communication Conventions
every seed interaction is ∈ T , there is an interaction scenario Sis                                  Manual (ICCCM) [25] gives English descriptions of several rules
seeded by is such that Sis satisfies A. We say that a program P sat-                                  for how well-behaved programs should use the mechanism. The
isfies a specification A if any interaction trace T of P ’s execution                                 experiment concentrated on a rule that specifies how programs ob-
satisfies A.                                                                                          tain ownership of the selection: the rule says that a client calling
   Constructing program verification tools for specifications is out-                                 XtOwnSelection or XSetSelectionOwner must pass in a
side the scope of this paper, but is the subject of ongoing research.                                 timestamp derived from the X event that triggered the call.
There are two ways that such a tool could work. First, the tool                                          Table 1 lists each program studied, its origin (either the X11 dis-
could construct a scenario that satisfies A for each interaction seed                                 tribution or the X11 contrib directory), the number of static calls
encountered while simulating some abstraction of P , reporting an                                     to the X library routines chosen as seeds, and the number of training
error if no such scenario can be constructed for some seed. Al-                                       scenarios extracted from each trace. One of the authors gathered
ternatively, the tool could first translate A into an automaton that                                  the traces by running each program for a few minutes, while trying
generates traces instead of scenario strings. The trace automaton                                     to exercise the selection code by doing cut-and-paste operations, as
generates all traces that satisfy A. The verification tool would then                                 well as exercising as much other functionality as possible in a short
exhaustively search for a trace that is not in A, reporting an error if                               time.
one is found. Both sorts of tools must be able to simulate simplifi-                                     Specification mining depends on a sizable training set of well-
cation and standardization.                                                                           debugged traces. In our case, the number of training traces was
   Figure 21 shows a trace verification algorithm (not a program                                      small, and as it turned out, several contained violations of the rule.
verification algorithm) that works in the first way. This is the al-                                  As a result, the miner was not able to discover the rule when trained
gorithm used in our experiments (see Section 6). Verify takes                                         on all of the programs. In order to learn the rule, we needed to
a trace, a specification, and a maximum scenario size. It attempts                                    remove the buggy traces from the training set. We hypothesized
to verify that the trace satisfies the specification by extracting suc-                               that our miner could help find the bugs, even with a poor training
           Name          Source    Static seeds   Scenarios                     Name          Verifies?   Reason for failure   Action
           bitmap        distrib   1              6                             xcb           n/a         n/a                  accept
           xclipboard    distrib   2              2                             bitmap        no          spec. too narrow     accept
           xconsole      distrib   1              1                             ups           no          bug!                 reject
           xcutsel       distrib   1              4                             ted           no          spec. too narrow     accept
           xterm         distrib   1              6                             rxvt          yes         n/a                  accept
           clipboard     contrib   1              2                             xterm         no          spec. too narrow     accept
           cxterm        contrib   1              9                             display       no          spec. too narrow     accept
           display       contrib   4              16                            xcutsel       no          spec. too narrow     accept
           e93           contrib   1              2                             kterm         yes         n/a                  accept
           kterm         contrib   1              4                             pixmap        yes         n/a                  accept
           nedit         contrib   2              2                             cxterm        yes         n/a                  accept
           pixmap        contrib   1              11                            xconsole      no          benign violation     reject
           rxvt          contrib   1              4                             nedit         no          spec. too narrow     accept
           ted           contrib   3              9                             e93           no          bug!                 reject
           test canvas   contrib   1              4                             xclipboard    no          benign violation     reject
           ups           contrib   1              3                             clipboard     no          benign violation     reject
           xcb           contrib   1              11
                                                                        Table 2: Results of processing each client program, in the order
   Table 1: X11 client programs studied in the experiment.              in which they were processed.
set. Identifying the buggy traces without the miner would require          Table 2 lists the client programs in the order in which they were
inspecting each trace manually for bugs.                                processed. Out of the first six traces accepted (not including the ini-
   Using the miner, we predicted that, while we would have to in-       tial trace), five were rejected by an overly narrow specification. At
spect the first few traces, once a few correct traces had been col-     this point, the specification seemed to stabilize: out of the next four
lected, the miner’s rule could be used to automatically validate the    accepted, only one was initially rejected by the dynamic verifier.
remaining traces. In this experiment, we arranged the client pro-       The expert did not have to inspect 4 out of the 16 the traces, which
grams in random order and went through the following iterative          supports the second part of Hypothesis 1. We conjecture that if the
process:                                                                process had continued, the false rejection rate would have contin-
                                                                        ued to drop.
Run the first program and gather a trace                                   Five of the programs violated the rule in the ICCCM. We found
Mine a specification from the trace                                     three programs with benign violations of the specification and two
Expert examines the specification                                       programs with bugs. The specification applies to programs that use
Expert extracts hot core                                                the selection mechanism to do cut-and-paste, while the programs
If the specification is not correct                                     with benign violations used the selection mechanism to implement
  Select another random order and start over                            their own communication protocol. These violations indicate that
For each remaining client program in order                              the rule described by the ICCCM is not universally applicable and
   Run the program and gather a trace                                   that the document should be clarified. Thus, the specification miner
   Verify the trace against the specification                           helped find bugs and an documentation omission (an unexpected
   If verification succeeds                                             benefit).
     Add the trace to the training set                                     Figure 22 is the specification from the experiment. For legibil-
     Generate a new specification                                       ity’s sake, the figure omits some arguments. These arguments did
   Else                                                                 not participate in dependences within the core of the specification.
      Examine the scenarios that failed                                 The specification is compact, with six states and nine edges. It also
      If no scenario violates the ICCCM rule                            matches the English rule very closely, with most complexity aris-
        Add the trace to the training set                               ing from the several ways in which the X API receives an event.
        Generate a new, more general specification                      In addition, the specification exposes a common pattern in which
      Else                                                              the client calls XSetSelectionOwner repeatedly until XGetSelec-
         Report the bug                                                 tionOwner indicates that the call was successful.
                                                                           Our final hypothesis was that the corer and the expert would
   For each trace that fails to verify, the expert either marks it as
buggy or includes it in the training set. The expert decides whether    agree on which states in the final PFSA should be thrown out. The
the initial specification is correct: in our experiment, we accepted    final PFSA had 27 states. The expert, who did not have access to the
the initial specification if we did not see any obvious bugs in the     corer’s results, threw out 15 of these and retained 12; the remaining
                                                                        twelve were merged to form the NFA in Figure 22. The corer and
first set of training scenarios. The expert also needs to extract the
                                                                        the expert disagreed on five out of the 27 states, or 19%. The corer
hot core, since the training set is too small to use the corer.
   The experiment tested three hypotheses:                              assigned likelihoods lower than 6% to 13 of the 15 deleted states,
                                                                        and likelihoods higher than 13% to 9 of the 12 retained states. The
Hypothesis 1 The process will find bugs and reduce the number of        other 2 deleted states had likelihoods of 13% and 20%, while the re-
     traces that the expert must inspect.                               maining retained states had likelihoods of 5%, 6%, and 9%. Thus,
Hypothesis 2 The miner’s final specification will match the rule in     7. RELATED WORK
     the ICCCM.
                                                                           Ernst et al. also proposed automatic deduction of formal specifi-
Hypothesis 3 The corer and the human will agree on which states         cations [11]. Their Daikon tool works by learning likely invariants
     in the final PFSA belong in the final specification.               involving program variables from dynamic traces. The resulting
                                                                           ture from traces [23], but they model the sequence of operations on
                                    G
                       START                                               individual objects, not the data and temporal dependences across
                                                3
                                                                           several objects.
               A                    C
                                                H
                                                                           8. CONCLUSION
                        B
                                                                              This paper addresses an important problem in the program-
                                            4
               1                                                           verification tool-chain, namely the problem of semi-automatic for-
                   E                                                       mulation of correctness specifications that could be accepted by
                                        F                                  model checkers and other similar tools. We have formulated the
                            5                                              problem as a machine learning problem and provided an algo-
   D       E
                                                                           rithm based a reduction to finite automaton learning. While some
                   D            E                                          more experimental work remains ahead of us, initial experience is
                                                                           promising.
               2                            END
   A = XNextEvent(time = X21_0)
                                                                           Acknowledgements
   B = XNextEvent(time = X21_0) or B = XtDispatchEvent(time = X21_0)       We thank Anand Raman, Peter Andreae, and Jon D. Patrick for al-
       or B = XIfEvent(time = X21_0)
   C = XtDispatchEvent(time = X21_0) or C = XtEventHandler(time = X21_0)   lowing us to use their PFSA learner, as well as the anonymous refer-
        or C = XtLastTimeStampProcessed(time = X21_0)                      ees for their many helpful comments on an early draft of this paper.
   D = XGetSelectionOwner                                                  This work was supported in part by the National Science Founda-
   E = XSetSelectionOwner(time = X21_0)
   F = XtOwnSelection(time = X21_0)
                                                                           tion with grants CCR-0093275 and CCR-0103670, the University
   G = XtActionHookProc(time = X21_0)                                      of Wisconsin Graduate School, and donations from IBM and Mi-
   H = XInternAtom                                                         crosoft Research.
Figure 22: The NFA from the selection ownership specification.
                                                                           9. REFERENCES
                                                                            [1] Thomas Ball, Rupak Majumdar, Todd Millstein, and
                                                                                Sriram K. Rajamani. Automatic predicate abstraction of C
formal specifications is the key difference between their approach              programs. In Proceedings of the 2001 ACM SIGPLAN
and ours. Daikon’s specifications are arithmetic relationships that             Conference on Programming Language Design and
hold at specific program points (e.g., a precondition x < y at entry            Implementation, volume 36 of ACM SIGPLAN Notices,
to a procedure f ). By contrast, our specifications express temporal            pages 203–213, July 2001.
and data-dependence relationships among calls to an API. Our tem-
                                                                            [2] Thomas Ball and Sriram K. Rajamani. Automatically
poral specifications capture a different aspect of program behavior             validating temporal safety properties of interfaces. In
than Daikon’s predicates on values and structures. The two forms                Proceedings of the 8th International SPIN Workshop on
of specifications are complementary, but naturally require radically            Model Checking of Software, number 2057 in Lecture Notes
different learning algorithms.                                                  in Computer Science, pages 103–122, May 2001.
   Recently, Ernst et al. presented techniques for suppressing parts
                                                                            [3] Thomas Ball and Sriram K. Rajamani. Bebop: a
of their learned specifications that are not useful to a program-
                                                                                path-sensitive interprocedural dataflow engine. In
mer [19]. In the context of our temporal specifications, this result
                                                                                Proceedings of the 2001 ACM SIGPLAN-SOGSOFT
corresponds to appropriately selecting the heavy core of the initial
                                                                                Workshop on Program Analysis for Software Tools and
PFSA.
                                                                                Engineering, ACM SIGPLAN Notices, pages 97–103, July
   Another related tool is Houdini [12], an annotation assistant for
                                                                                2001.
ESC/Java. Starting from an initial (guessed) candidate set of anno-
tations, which are similar to those of Daikon, Houdini uses ESC/Java        [4] A.W̃. Biermann and J.Ã. Feldman. On the synthesis of
to refute invalid annotations. The focus of Houdini is on annotating            finite-state machines from samples of their behaviour. IEEE
points of a single program with true properties, while the focus of             Transactions on Computers, 21:591–597, 1972.
our tool is on discovering temporal properties that hold across all         [5] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A
programs that use an interface.                                                 static analyzer for finding dynamic programming errors.
   Other authors have described tools that extract automaton-based              Software Practice and Experience, 30:775–802, 2000.
models. Cook and Wolf describe a tool for extracing FA models               [6] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem,
of software development processes from traces of events [8]. Our                and Dawson Engler. An empirical study of operating systems
work differs in that we extract specifications from program traces,             errors. In Proceedings of the 18th ACM Symposium on
which must be reduced to a simpler form before they are palat-                  Operating Systems Principles (SOSP18), pages 73–88,
able for an FA learner. Ghosh et al. describe several techniques                October 2001.
for learning the typical behavior of programs that make system              [7] Douglas E. Comer and David L. Stevens. Internetworking
calls [13, 18]. Since they intend their models for intrusion detec-             with TCP/IP. Client-server Programming and Applications,
tion, the models need only characterize a particular program’s be-              BSD Socket Version. Prentice-Hall, Englewood Cliffs, NJ
havior, while our miner finds rules that are generally applicable and           07632, USA, 1993.
understandable by humans. Wagner and Dean’s intrusion detection             [8] Jonathan E. Cook and Alexander L. Wolf. Discovering
system also extracts automaton models, but from source code, not                models of software processes from event-based data. ACM
traces [27]. Their system also extracts models that apply only to               Transactions on Software Engineering and Methodology,
a single program. Finally, Reiss and Renieris also extract struc-               7(3):215–249, July 1998.
 [9] Robert DeLine and Manuel Fähndrich. Enforcing high-level       [25] David Rosenthal. Inter-client communication conventions
     protocols in low-level software. In Proceedings of the               manual (ICCCM), version 2.0. X Consortium, Inc. and Sun
     SIGPLAN ’01 Conference on Programming Language                       Microsystems, 1994. Part of the X11R6 distribution.
     Design and Implementation (PLDI), pages 59–69, June 2001.       [26] Robert Endre Tarjan. Efficiency of a good but not linear set
[10] Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou,                union algorithm. Journal of the ACM, 22(2):215–225, 1975.
     and Benjamin Chelf. Bugs as deviant behavior: a general         [27] David Wagner and Drew Dean. Intrusion detection via static
     approach to inferring errors in system code. In Proceedings          analysis. In Proceedings of the 2001 IEEE Symposium on
     of the 18th ACM Symposium on Operating Systems                       Security and Privacy, May 2001.
     Principles (SOSP18), pages 57–72, October 2001.
[11] Michael D. Ernst, Jake Cockrell, William G. Griswold, and
     David Notkin. Dynamically discovering likely program
     invariants to support program evolution. IEEE Transactions
     in Software Engineering, 27(2):1–25, February 2001.
[12] Flanagan and Leino. Houdini, an annotation assistant for
     ESC/java. In International Symposium on FME 2001:
     Formal Methods for Increasing Software Productivity, LNCS,
     volume 1, 2001.
[13] Anup K. Ghosh, Christoph Michael, and Michael Shatz. A
     real-time intrusion detection system based on learning
     program behavior. In RAID 2000, volume 1907 of Lecture
     Notes in Computer Science, pages 93–109, 2000.
[14] E Mark Gold. Language identification in the limit.
     Information and Control, 10:447–474, 1967.
[15] Michel Gondran and Michel Minoux. Graphs and
     Algorithms. John Wiley and Sons, 1984.
[16] Michael Kearns, Yishay Mansour, Dana Ron, Ronitt
     Rubinfeld, Robert E. Schapire, and Linda Sellie. On the
     learnability of discrete distributions. In Proceedings of the
     Twenty-sixth ACM Symposium on Theory of Computing,
     pages 273–282, 1994.
[17] James R. Larus and Eric Schnarr. EEL: Machine-independent
     executable editing. In Proceedings of the SIGPLAN ’95
     Conference on Programming Language Design and
     Implementation (PLDI), pages 291–300, June 1995.
[18] Christoph Michael and Anup Ghosh. Using finite automata
     to mine execution data for intrusion detection: a preliminary
     report. In RAID 2000, volume 1907 of Lecture Notes in
     Computer Science, pages 66–79, 2000.
[19] William G. Griswold Michael D. Ernst, Adam Czeisler and
     David Notkin. Quickly detecting relevant program invariants.
     In Proceedings of the 22nd International Conference on
     Software Engineering, June 2000.
[20] Kevin P. Murphy. Passively learning finite automata.
     Technical Report 96-04-017, Santa Fe Institute, 1996.
[21] Anand Raman, Peter Andreae, and Jon Patrick. A beam
     search algorithm for pfsa inference. Pattern Analysis and
     Applications, 1(2), 1998.
[22] Anand V. Raman and Jon D. Patrick. The sk-strings method
     for inferring PFSA. In Proceedings of the workshop on
     automata induction, grammatical inference and language
     acquisition at the 14th international conference on machine
     learning (ICML97), 1997.
[23] S. P. Reiss and M. Renieris. Encoding program executions.
     In Proceedings of the 23rd International Conference on
     Software Engeneering (ICSE-01), pages 221–232, Los
     Alamitos, California, May12–19 2001. IEEE Computer
     Society.
[24] Dana Ron, Yoram Singer, and Naftali Tishby. On the
     learnability and usage of acyclic probabilistic finite
     automata. In Proceedings of the 8th Annual Conference on
     Computational Learning Theory, pages 31–40. ACM Press,
     New York, NY, 1995.