Sketch Based Interfaces: Early Processing For Sketch Understanding
Sketch Based Interfaces: Early Processing For Sketch Understanding
Understanding
                                                                                     1
   Sketch understanding overlaps in significant ways with the
extensive body of work on document image analysis gener-
ally (e.g., [2]) and graphics recognition in particular (e.g.,
[16]), where the task is to go from a scanned image of, say,
an engineering drawing, to a symbolic description of that
drawing.
   Differences arise because sketching is a realtime, interac-
tive process, and we want to deal with freehand sketches,
not the precise diagrams found in engineering drawings. As
a result we are not analyzing careful, finished drawings, but
are instead attempting to respond in real time to noisy, in-           Figure 1:    The stroke on the left contains both
complete sketches. The noise is different as well: noise in             curves and straight line segments. The points we
a freehand sketch is typically not the small-magnitude ran-            want to detect in the vertex detection phase are in-
domly distributed variation common in scanned documents.               dicated with large dots in the figure on the right.
There is also an additional source of very useful information          The beginning and the end points of the stroke are
in an interactive sketch: as we show below, the timing of              indicated with smaller dots.
pen motions can be very informative.
   Sketch understanding is a difficult task in general as sug-
gested by reports in previous systems (e.g., [9]) of a recogni-        3.1 Stroke Approximation
tion rate of 63%, even for a sharply restricted domain where             Stroke processing consists of detecting vertices at the end-
the objects to be recognized are limited to rectangles, circles,       points of linear segments of the stroke, then detecting and
lines, and squiggly lines (used to indicate text).                     characterizing curved segments of the stroke.
   Our domain–mechanical engineering design–presents the
additional difficulty that there is no fixed set of shapes to             3.1.1 Vertex detection
be recognized. While there are a number of traditional sym-
                                                                          We use the sketch in Fig. 1 as a motivating example of
bols with somewhat predictable geometries (e.g., symbols for
                                                                       what should be done in the vertex detection phase. Points
springs, pin joints, etc.), the system must also be able to deal
                                                                       marked in Fig. 1 indicate the corners of the stroke, where
with bodies of arbitrary shape that include both straight
                                                                       the local curvature is high.
lines and curves. As consequence, accurate early processing
                                                                          Note that the vertices are marked only at what we would
of the basic geometry–finding corners, fitting both lines and
                                                                       intuitively call the corners of the stroke (i.e., endpoints of
curves–becomes particularly important.
                                                                       linear segments). There are, by design, no vertices marked
                                                                       on curved portions of the stroke because we want to handle
3.   SYSTEM DESCRIPTION                                                these separately, modeling them with curves (as described
   Sketches can be created in our system using any of a vari-          below). This is unlike the well studied problem of piecewise
ety of devices that provide the experience of freehand draw-           linear approximation [13].
ing while capturing pen movement. We have used tradi-
tional digitizing tablets, a Wacom tablet that has an LCD-
display drawing surface (so the drawing appears under the
stylus), and a Mimio whiteboard system. In each case the
pen motions appear to the system as mouse movements,
with position sampled at rates between 30 and 150 points/sec,
depending on the device and software in use.
   In the description below, by a single stroke we mean the
set of points produced by the drawing implement between
the time it contacts the surface (mouse-down) and the time
it breaks contact (mouse-up). This single path may be com-
posed of multiple connected straight and curved segments                      Figure 2: Stroke representing a square.
(see, Fig. 1).
   Our approach to early processing consists of three phases              Our approach takes advantage of the interactive nature of
approximation, beautification, and basic recognition. Ap-              sketching, combining information from both stroke direction
proximation fits the most basic geometric primitives–lines              and speed data. Consider as an example the square in Fig. 2;
and curves–to a given set of pixels. The overall goal is to            Fig. 3 shows the direction, curvature (change in direction
approximate the stroke with a more compact and abstract                with respect to arc length) and speed data for this stroke.
description, while both minimizing error and avoiding over-            We locate vertices by looking for points along the stroke that
fitting. Beautification modifies the output of the approxi-               are minima of speed (the pen slows at corners) or maxima
mation layer, primarily to make it visually more appealing             of the absolute value of curvature.1
without changing its meaning, and secondarily to aid the                  While extrema in curvature and speed typically corre-
third phase, basic recognition. Basic recognition produces             spond to vertices, we cannot rely on them blindly because
interpretations of the strokes, as for example, interpreting a         noise in the data introduces many false positives. To deal
sequence of four lines as a rectangle or square. (Subsequent           with this we use average based filtering.
recognition, at the level of mechanical components, such as
springs, and pin joints is accomplished by another of our              1
                                                                         From here on for ease of description we use curvature to
systems [1]).                                                          mean the absolute value of the curvature data.
                                                                   2
     3                                         0.7                                                 1
                                                                                                  0.9
     2                                         0.6
0.8
     1                                         0.5
                                                                                                  0.7
                                                                                                  0.6
     0                                         0.4
0.5
    −1                                         0.3
                                                                                                  0.4
                                                                                                  0.3
    −2                                         0.2
0.2
    −3                                         0.1
                                                                                                  0.1
    −4                                          0                                                  0
         0   50   100   150       200   250          0   50         100   150         200   250         0   50    100   150   200   250
                        0.4
                                                                                                                                                 While average based filtering performs better than simply
                        0.3
                                                                                                                                              comparing the curvature data against a hard coded thresh-
                                                                                                                                              old, it is still clearly not free of empirical constants. As we
                        0.2
                                                                                                                                              explain when considering future work, scale space provides
                        0.1
                                                                                                                                              a better approach for dealing with noisy data without hav-
                                                                                                                                              ing to make a priori assumptions about the scale of relevant
                         0
                              0           50                  100               150               200            250                          features.
Figure 4: Curvature graph for the square in Fig. 2                                                                                            Application to speed change
with the threshold dividing it into regions.                                                                                                  Our experience is that curvature data alone rarely provides
                                                                                                                                              sufficient reliability. Noise is one problem, but variety in
                                                                                                                                              angle changes is another. Fig. 6 illustrates how curvature
                         1
                                                                                                                                              fit alone misses a vertex (at the upper right) because the
                        0.9
                                                                                                                                              curvature around that point was too small to be detected
                        0.8
                                                                                                                                              in the context of the other, larger curvatures. We solve this
                        0.7
                                                                                                                                              problem by incorporating speed data into our decision as an
                        0.6
                                                                                                                                              independent source of guidance.
                        0.5
                                                                                                                                                 Just as we did for the curvature data, we reduce the num-
                        0.4
                                                                                                                                              ber of false extrema by average based filtering, then look for
                        0.3
                                                                                                                                              speed minima. The intuition here is simply that pen speed
                        0.2
                                                                                                                                              drops when going around a corner in the sketch. Fig. 7
                        0.1
                                                                                                                                              shows (at left) the speed data for the sketch in Fig. 6, along
                         0
                                                                                                                                              with the polygon drawn from the speed-detected vertices (at
                              0           50                  100               150               200            250
                                                                                                                                              right).
Figure 5: Speed graph for the stroke in Fig. 2 with                                                                                           3
                                                                                                                                                An alternative approach is to detect consecutive almost-
the threshold dividing it into regions.                                                                                                       collinear edges (using some empirical threshold for collinear-
                                                                                                                                              ity) and combine them into one edge, removing the vertex
2
 The exact threshold has been determined empirically; for                                                                                     in between. Our hybrid fit scheme deals with the problem
curvature data the threshold is the mean, while for the speed                                                                                 without the need to decide what value to use for “almost-
the threshold is 90% of the mean.                                                                                                             collinear.”
                                                                                                                                          3
   Using speed data alone has its shortcomings as well. Poly-          computed as the average of the sum of the squares of the
lines formed from a combination of very short and long line            distances to the fit from each point in the stroke S:
segments can be problematic: the maximum speed reached                                         1 
along the short line segments may not be high enough to                                 εi =          ODSQ(s, Hi )
                                                                                              |S| s∈S
indicate the pen has started traversing another edge, with
the result that the entire short segment is interpreted as the         Here ODSQ stands for orthogonal distance squared, i.e., the
corner. This problem arises frequently when drawing thin               square of the distance from the stroke point to the relevant
rectangles, common in mechanical devices. Fig. 8 illustrates           line segment of the polyline defined by Hi . We compute the
this phenomena. In this figure, the speed fit misses the up-             error for Hi and for Hi ; the higher scoring of these two (i.e.,
per left corner of the rectangle because the pen failed to             the one with smaller least squares error) becomes Hi+1 , the
gain enough speed between the endpoints of the short verti-            next fit in the succession. This process continues until all
cal segment. The curvature fit, by contrast, detects all cor-           points in the speed and curvature fits have been used. The
ners, along with some other vertices that are artifacts due to         result is a set of hybrid fits.
hand dynamics during freehand sketching. This illustrates                 In selecting the best of the hybrid fits the problem is as
the utility of having both fits available.                              usual trading off more vertices in the fit against lower error.
                                                                       Here our approach is simple: We set an error upper bound
                                                                       and designate as our final fit Hf , the Hi with the fewest
                                                                       vertices that also has an error below the threshold.
                                                                   4
                                                                     Figure 10: At left the original sketch of a piece of
                                                                     metal revisited, and the final beautified output at
                                                                     right.
                                                                     sured against the original sketch, and look right to the user.
                                                                     Fig. 10 shows the original stroke for the metal piece we had
                                                                     before, and the output of the beautifier. Some examples of
                                                                     beautification are also present in Fig. 13.
                                                                 5
                                                                       Figure 12: An overtraced oval and a line along with
                                                                       and the system’s output.
                                                                   6
we believe requiring the user to draw things in a particu-           and toward earlier use of automated tools in the design cycle
lar fashion is unnecessary and reduces the natural feeling of        in particular.
sketching. Our goal is to make computers understand what
the user is doing rather than requiring the user to sketch in
a way that the computer can understand.
   Among the large body of work on beautification, Igarashi           7. REFERENCES
et al. [8] describes a system combining beautification with            [1] C. Alvarado. A natural sketching environment:
constraint satisfaction, focusing on exploiting features such             Bringing the computer into early stages of mechanical
as parallelism, perpendicularity, congruence and symmetry.                design. Master’s thesis, Massachusetts Institute of
The system infers geometric constraints by comparing the                  Technology, 2000.
input stroke with previous ones. Because sketches are inher-
                                                                      [2] H. S. Baird, H. Bunke, and K. Yamamoto. Structured
ently ambiguous, their system generates multiple interpreta-
                                                                          document image analysis. Springer-Verlag, Heidelberg,
tions corresponding to different ways of beautifying the in-
                                                                          1992.
put, and the most plausible interpretation is chosen among
these interpretations. The system is interactive, requiring           [3] M. Banks and E. Cohen. Realtime spline curves from
the user to do the selection, and doesn’t support curves. It              interactively sketched data. In SIGGRAPH,
is, nevertheless, more effective then ours at beautification,               Symposium on 3D Graphics, pages 99–107, 1990.
but beautification is not the main focus of our work and is            [4] A. Bentsson and J. Eklundh. Shape representation by
present for the purposes of completeness.                                 multiscale contour approximation. IEEE PAMI 13, p.
   The works in [15] and [3] describe methods for generating              85–93, 1992., 1992.
very accurate approximations to strokes known to be curves            [5] L. Eggli. Sketching with constraints. Master’s thesis,
with precision several orders of magnitude below the pixel                University of Utah, 1994.
resolution. The Bézier approximations we generate are less           [6] L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R.
precise but are sufficient for approximating free-hand curves.              Reddy. The hearsay-ii speech understanding system:
We believe techniques in [15] and [3] are excessively pre-                Integrating knowledge to resolve uncertainty.
cise for free-hand curves, and the real challenge is detecting            Computing Surveys, 12:213–253, 1980. Reprinted in:
curved regions in a stroke rather than approximating those                Readings in Artificial Intelligence, Bonnie L. Webber
regions down to the numerical machine precision.                          and Nils J. Nilssen (eds.)(1981), pp 349-389. Morgan
                                                                          Kaufman Pub. Inc., Los Altos, CA.
                                                                      [7] A. Forsberg, K. Herndon, and R. Zeleznik. Sketch: An
5.   FUTURE WORK                                                          interface for sketching 3d scenes. In Proceedings of
   We are working to link this early processing to other work             SIGGRAPH’96, pages 163–170, 1996.
in our group that has focused on recognition [1] of higher            [8] T. Igarashi, S. Matsuoka, S. Kawachiya, and
level mechanical objects. This will provide the opportunity               H. Tanaka. Interactive beautification: A technique for
to add model-based processing of the stroke, in which early               rapid geometric design. In UIST ’97, pages 105–114,
operations like vertex localization may be usefully guided by             1997.
knowledge of the current best recognition hypothesis.
                                                                      [9] J. A. Landay and B. A. Myers. Sketching interfaces:
   In addition, incorporating ideas from scale space theory
                                                                          Toward more human interface design. IEEE
looks like a promising way of detecting different scales in-
                                                                          Computer, vol. 34, no. 3, March 2001, pp. 56-64.
herent in the data and avoiding a priori judgments about
                                                                     [10] T. Lindeberg. Edge detection and ridge detection with
the size of relevant features. In the pattern recognition com-
                                                                          automatic scale selection. ISRN
munity [4], [11] and [10] apply some of the ideas from scale
                                                                          KTH/NA/P–96/06–SE, 1996., 1996.
space theory to similar problems. We are currently working
on ways of applying these techniques to speed and curvature          [11] A. Rattarangsi and R. T. Chin. Scale-based detection
data. We believe this may allow us to deal more effectively                of corners of planar curves. IEEE Transactionsos
with sketches that contain relevant details at a variety of               Pattern Analysis and Machine Intelligence,
scales. There is no guaranteed way of deciding which scales               14(4):430–339, Apr. 1992.
are important at the geometric level, so using constraints           [12] N. Redding. Implicit polynomials, orthogonal distance
and/or information provided by the domain of application                  regression, and closest point on a curve. IEEE
may help in scale selection.                                              Transactions on Pattern Analysis and Machine
   Humans naturally seem to slow down when they draw                      Intelligence, pages 191–199, 2000.
things carefully as opposed to casually, so another inter-           [13] R. Rosin. Techniques for assessing polygonal
esting research direction would be to explore the degree to               approximations of curves. 7th British Machine Vision
which one can use the time it takes to draw a stroke as an                Conf., Edinburgh, 1996.
indication of how careful and precise the user meant to be.          [14] D. Rubine. Specifying gestures by example. Computer
                                                                          Graphics, 25(4):329–337, 1991.
                                                                     [15] P. Schneider. Phoenix: An interactive curve design
6.   CONCLUSION                                                           system based on the automatic fitting of
   We have built a system capable of using multiple sources               hand-sketched curves. Master’s thesis, University of
of information to produce good approximations of freehand                 Washington, 1988.
sketches. Users can sketch on an input device as if drawing          [16] K. Tombre. Analysis of engineering drawings. In
on paper and have the computer detect the low level geome-                GREC 2nd international workshop, pages 257–264,
try, enabling a more natural interaction with the computer,               1997.
as a first step toward more natural user interfaces generally,
                                                                 7
Figure 13: Performance examples: The first two pair are sketches of a marble dispenser mechanism and a
toggle switch. The last two are sketches of the direction reversing mechanism in a tape player.