Abstraction
Abstraction
DOI 10.1007/s11023-007-9061-7
Introduction
123
170 T. Colburn, G.Shute
123
Abstraction in Computer Science 171
qualities of particular things. We can have an abstract idea of water that does not
include in it any particular idea of this or that lake, or this or that sea, or this or that
dripping faucet, etc. So abstraction in this sense concerns the distinction between the
general and the particular in the world of ideas. Importantly, this abstraction
proceeds by dropping (or ignoring, or neglecting) spatial and temporal features from
ideas.
A theory of abstract ideas may or may not be associated with a theory of
universals, or things that can be shared by many particular things. Plato’s theory of
forms, for example, held that there is a world inhabited by things that can be
regarded as the shared common nature of particular things in the world of
experience. Justice, for example, is that which is shared by all just acts. Just acts are
particular and perceptible, while justice itself is not, though it is just as ‘‘real’’.
Justice is an example of an immutable Platonic form. Although Plato also used the
term ‘‘idea’’ to refer to forms, modern language uses ‘‘universal’’ to distinguish the
essence of a form from the minds that may come in contact with it.
A theory of universals may be regarded as the ontological counterpart to a theory
of abstract ideas. Just as one can distinguish abstract ideas from nonabstract ideas by
appealing to general terms and their relation to particular terms, one can distinguish
abstract objects (apart from any ideas of them) from concrete objects. This
distinction between abstract and concrete objects can play out in modern treatments
of modal logic, for example the logic of counterfactual conditionals, or conditionals
of the form, ‘‘If it were the case that p, then it would be the case that q’’. Lewis
(1973), for example, bases his logic of counterfactual conditionals on a similarity
ordering of possible worlds: a counterfactual conditional is true in this world (in
which p is false) if every world which is sufficiently similar to this one but in which
p is true is also a world in which q is true. So the statement ‘‘If kangaroos had no
tails, they would fall over’’ means, for Lewis, something like, ‘‘In all worlds
sufficiently similar to this one in which kangaroos have no tails, they fall over’’.
Such an analysis begs for a characterization of these other, possible but not actual,
worlds. Are they concrete, like this one, or are they abstract? If abstract, what
exactly is the distinction between concrete and abstract?
As Lewis points out, not only is there no universal agreement on this distinction,
but various ways of characterizing the distinction can conflict with one another. For
example, saying that abstract objects have no spatiotemporal location and do not
enter into causal interaction conflicts with the idea that sets are abstract, since ‘‘a set
of located things does seem to have a location, though perhaps a divided location’’
(Lewis 1986, p. 83). And regarding the absence of causal interaction, Lewis asks
why can’t a cause have a set of effects?
A better way of characterizing abstract and concrete entities, for Lewis, is by
looking at the process by which an abstract entity is constructed out of concrete
ones, which is a process of ‘‘somehow subtracting specificity, so that an incomplete
description of the original concrete entity would be a complete description of the
abstraction’’ (pp. 84–85). In performing an abstraction, ’’[w]e are ignoring some of
its features, not introducing some new thing from which those features are absent’’
(p. 86). So again, abstraction as a process is about ignoring (or dropping, or
neglecting) features with which one is not concerned.
123
172 T. Colburn, G.Shute
There is of course a long tradition in the philosophy of mathematics, but whether the
treatment is traditional (Kline 1972; Hardy 1967) or more along the lines of the
modern structuralists (Benacerraf 1965; Resnik 1997; Shapiro 1997), it is clear that
the production of theorems and their proofs is the central activity of mathematics.
These theorems and proofs are best understood not in isolation, but as parts of larger
structures.
Group theory is a good example. Mathematicians begin with a definition of
(axioms for) the concept of a group, that is, a set G for which an associative binary
operation * has been defined where there is an identity element for * in G and all
elements of G have inverses in G with respect to *. Then they develop theorems that
are true for all groups, and they identify properties that some, but not all groups
have that allow further inferences. They also define concepts (like conjugacy) that
can be used to infer further properties of groups and their elements. Finally, they
123
Abstraction in Computer Science 173
develop tools (like matrix representations) that help them recognize groups and
allow results from other mathematical inference structures to be applied to group
theory. The main objective is to be able to reason in the abstract about groups and
their elements.
Group theory, viewed as an inference structure, is a collection of deductive
arguments, with places in the arguments for the set of elements and the binary
operation. If the hypotheses of the arguments are satisfied when terms from a non-
mathematical realm are substituted in the places, then the conclusions of the
arguments, with similar substitutions, can be asserted.
As we mentioned before, there is plenty of overlap between mathematics and
computer science, and this overlap can include theorem producing as a role for
computer science—for example, when reasoning about formal languages and the
automata that process them. However, the central activity of computer science is the
production of software, and this activity is characterized primarily not by the
creation and exploitation of inference structures, but by the modeling of interaction
patterns. The kind of interaction involved depends upon the level of abstraction
used to describe programs. At a basic level, software prescribes the interacting of a
certain part of computer memory, namely the program itself, and another part of
memory, called the program data, through explicit instructions carried out by a
processor. At a different level, software embodies algorithms that prescribe
interactions among subroutines, which are cooperating pieces of programs. At a still
different level, every software system is an interaction of computational processes.
Today’s extremely complex software is possible only through abstraction levels that
leave machine-oriented concepts behind. Still, these levels are used to describe
interaction patterns, whether they be between software objects or between a user
and a system.
For example, abstract data types (ADTs) are encapsulations of data with the
operations that can be performed on the data, such encapsulation enabling the
convenient interaction of a user (human or otherwise) with the data. Creating ADTs
is an example of what computer scientists call data abstraction. Consider the table
ADT taught in introductory computer science courses (see for example Carrano
2007). It is an interaction pattern whose origin lies outside computer science in
human interactions with telephone books and dictionaries. The heart of the
abstraction is the use of a key (a person’s name or a word) to access other
information (a person’s address or telephone number or a word’s meaning). It is
distinguished from other access abstractions in that the order of access in a table is
determined by its user whereas abstractions such as streams, stacks, queues, and
priority queues provide access in an order determined by the abstraction.
The concept of an ADT is carried further with the concept of an object, which is
central to object-oriented programming, the current dominant programming
paradigm in computer science. ADTs are passive entities in that they can only be
acted upon by a controlling program as though they were glorified machine data
types. Objects still encapsulate data and operations, but, as opposed to ADTs,
objects are active entities in the sense that they can be called upon to perform
operations on themselves, and they can invoke operations on other objects by
passing messages to them. Thus the user/ADT interaction pattern is replaced by the
123
174 T. Colburn, G.Shute
123
Abstraction in Computer Science 175
123
176 T. Colburn, G.Shute
virtue of the special meaning of its terms, but in virtue of the universal relations
between them. The specific quality of the things which the terms denote do not, as
such, play any part in the system’’ (p. 138). As all students of logic learn, for
example, it is the form of modus ponens that makes it valid, and not the meanings of
the sentences that comprise it. So mathematical abstraction in this sense eschews
any nonlogical meaning.
We can characterize this sort of formal mathematical abstraction as an aid to
discovering universal properties or relationships that hold within any individual
member of a broad class of entity (for example, groups). The same sort of
abstraction is employed when deducing the relationship between the lengths of the
sides of any right triangle, as in the Pythagorean theorem. But another kind of
abstraction is employed when deciding that certain properties, for example the color
of a right triangle, or the processing speed of a computing device, are irrelevant
when reasoning about the properties of the class of such things as a whole. This kind
of abstraction makes judgments about what is essential to a concept and what is not.
In the mathematical theory of groups, for example, all that is chosen for study is the
set G and operation * with the particular properties of interest. All other concrete
properties of groups are ignored. They are inessential details when considering, for
example, whether two groups are isomorphic. So there are at least two kinds of
abstraction in mathematics: the emphasis of form over content, and the neglect of
certain features in favor of others. As Cohen and Nagel put it, ‘‘A deductive system
is therefore doubly abstract: it abstracts from the specific qualities of a subject
matter, and it selects some relations and neglects others’’ (pp. 138–139).
Any science, including computer science, succeeds by constructing formal
mathematical models of their subject matter that eliminate inessential details.
Inasmuch as such details constitute information, albeit irrelevant information, we
might call such elimination information neglect. It is our contention, however, that
computer science is distinguished from mathematics in the use of a kind of
abstraction that computer scientists call information hiding. The complexity of
behaviour of modern computing devices makes the task of programming them
impossible without abstraction tools that hide, but do not neglect, details that are
essential in a lower-level processing context but inessential in a software design and
programming context. The reason they can be essential in one context and
inessential in another is because of abstraction and information hiding tools that
have evolved over the history of software development.
The concepts behind today’s notion of a stored program computer date at least from
the 19th century with the work of Charles Babbage, who designed an analytical
engine that would perform complex arithmetic calculations using its mill by
following instructions encoded in its store. These instructions, transmitted to the
store through the use of punched cards, amounted to a program for the mill.
Unfortunately, the design of the analytical engine was beyond the capability of
anyone to build at the time, and it remained just that, a design. Today, of course,
123
Abstraction in Computer Science 177
123
178 T. Colburn, G.Shute
From a software point of view, a bit, or binary digit, is the interpretation given to a
piece of hardware, called a flip-flop, that is always in one and only one of two
possible states. It is an abstraction that hides the details of how the flip-flop is built.
Machine language is composed entirely of bits. To specify nontrivial computational
processes in machine language is a practical impossibility for humans, and so
programming languages with higher levels of abstraction are necessary. One such
abstraction is the byte, or collection of 8 bits, that hides the details of how the bits
are arranged so that they can all be addressed together. Various machine data types,
like word, integer, longword, float, and double, are abstractions that hide the details
of how bytes are arranged to represent numerical values of various magnitudes and
precisions. A variable is an abstraction that hides the details of how a symbolic
token can serve as a location for changing data values in memory. A register is an
abstraction of a collection of bits that hides the details of how the bits are intimately
related to the processor and memory. A machine instruction is an abstraction of a
collection of bits that hides the details of how the bits represent processor operations
on registers and variables. These abstractions make programming in certain kinds of
languages, called assembly languages, possible.
At a higher level of detail, a subroutine, function, or procedure is an abstraction
of a segment of memory that hides the details of how the segment represents a piece
of a program that is passed certain parameter values and returns a value as a result.
A pointer is an abstraction that hides the memory address of a piece of data. These
abstractions make programming in higher level languages possible. We’ve
mentioned how ADTs are abstractions of data and operations. They can be built
on top of the data types, procedures, and pointers available in higher level
languages.
Sometimes pointers and variables are used by higher level language programmers
in such a way that the memory once referred to by a pointer is no longer accessible
by the program—it is garbage. A garbage collector is an abstraction of a special
process that collects garbage and makes it once again available to the original
progam, hiding from that program the details of how this is done.
We’ve also mentioned how objects are abstractions of small software computers.
They hide the details of how objects can send one another messages to invoke
operations, and how they can be set up to inherit attributes and behavior.
Programming with them requires a category of higher level languages called object-
oriented programming languages. Object-oriented programming is the latest in a
history of approaches to programming that attempts to make programs reusable.
The practice of literally reusing pieces of program text has a long tradition in
programmers consulting published texts or manuals that list programs implementing
various algorithms and data structures. Programmers also reuse software when they
make use of software libraries, for example, mathematical function libraries or
graphics routine libraries. This use of code libraries is an example of procedural
abstraction, or the ability to execute code through the calling of named procedures
that accept explicitly described parameters and return certain guaranteed results. It
is an example of abstraction because the details of how the procedure performs its
123
Abstraction in Computer Science 179
computation are hidden from the procedure’s caller; since the caller only makes use
of the procedure for its results, there is no need for it to know the internals.
A procedure makes its required parameters and guaranteed results known through
a public applications programming interface, or API. The API makes the procedure
able to be ‘‘plugged into’’ another program in the same way that a hardware
component can be plugged into, say, a slot on a motherboard. Using APIs makes a
program modular like a motherboard, and it makes procedures reusable.
Besides enhancing reusability, procedural abstraction also supports changeabil-
ity. Abstracting a procedure’s function from the way it performs the function has the
effect of decoupling the procedure’s caller from the procedure. Thus, if the internals
of the procedure ever need to change, the procedure’s caller is unaffected, provided
that the changed procedure still supports the same API as before. Enforcing
procedural abstraction by using APIs throughout a system greatly enhances the
system’s changeability.
Procedures written in machine languages, assembly languages, or higher level
object-oriented programming languages all exploit linguistic constructs that are
abstractions of aspects of memory and processor hardware required to specify useful
computational processes. A remarkable aspect of all of them is that, not only are
they used to describe such processes, but, owing to the complex chain of events
involved in their translation and execution, they also bring these processes about. To
bring them about requires a monolithic, organizing program, called an operating
system, that makes the system’s computational resources available to application
programs in a way that can only succeed through information hiding.
123
180 T. Colburn, G.Shute
123
Abstraction in Computer Science 181
123
182 T. Colburn, G.Shute
boxes. The original message is recovered on the right, and passed on to the higher
layer on the receive side.
The headers, in effect, allow communication between the peers so that they may
coordinate to achieve the objectives of their layer. The overall effect is that the two
peers on the send and receive side cooperatively implement an abstract service that
is used by the layer above. The layer above sends and receives data in a format that
is suitable to its own objectives, with the details of how that service is provided
hidden.
This layered organization is common in software architectures. It defines an
architecture that can be adapted to many different contexts. Each layer can be
changed to meet new needs. As long as the abstract interface is maintained, other
layers do not need to be changed. In the history of the Internet, there have been
many new kinds of applications added, each with software mainly in the application
layer. Various technologies have been incorporated into the Internet for faster data
transfer through changes to the physical layer. However, due to the independence of
layers achieved through information hiding, layers above are unaffected by the
changes, other than to enjoy increased performance.
123
Abstraction in Computer Science 183
in all the displays throughout the store. This sort of problem, in which an observed
subject (the inventory) needs to notify observing objects (store displays) about
changes that occur to it, has been solved often enough that there is a well-known
pattern, called Observer, that solves it.
Observer accomplishes its solution elegantly by decoupling the observed from
the observing object. In fact, the utility of most of the design patterns in use for
object oriented design lies in their decoupling objects of different classes by
reducing class dependencies, whether those dependencies have their origin in
hardware or software platform features, required object operations, required object
state, or required algorithms. To the extent that classes have these kinds of
dependencies on one another reduced, they exploit information hiding. Objects from
classes that are loosely coupled in this way result in reusable and changeable
software.
While a design pattern such as Observer is applied at the class level, the notion of
a pattern is applicable to larger scale software architecture as well. Architectural
patterns have the same objective: increase system flexibility by increasing the
independence of the system components. Modern web applications, for example,
make extensive use of the Model-View-Controller architecture, whereby enterprise
data and its structure, called the model, are strictly separated from the views given
to users. Model views, in turn, are strictly separated from the modules controlling
user interaction. The independence of model, view, and controller components in
web applications makes for agile development and change, and is achieved, again,
through information hiding.
Conclusions
Mathematics and computer science are similar in that the primary products of both
disciplines, and the models that they build, are abstract. There is, however, a
significant difference in the type, nature, and use of the abstractions.
The primary products of mathematics are inference structures, whereas the
primary products of computer science are interaction patterns. To compare the
activity of the mathematician engaged in the manipulation of inference structures
and the computer scientist engaged in the manipulation of interaction patterns is to
reveal the essence of a distinction between information neglect and information
hiding.
The activity of purely mathematical reasoning is characterized by the formal-
ization of inference structures through axiomatization and theorem proving. By
contrast, the activity of software development is characterized by modeling
interaction patterns in both formal and informal designs, and in various, mostly high
level, programming languages. These patterns deal with interactions between
elements of software, between elements of software and its users, and between
elements of software and a technological foundation that includes hardware,
programming languages, operating systems, and networks.
Although mathematicians continue to develop new abstractions and elaborate on
the inferences that can be made within existing abstractions, the abstractions
123
184 T. Colburn, G.Shute
References
Benacerraf, P. (1965). What numbers could not be. Philosophical Review, 74, 47–73.
Carrano, F. M. (2007). Data abstraction & problem solving with C++. Boston: Addison-Wesley.
Cohen, M. R., & Nagel, E. (1953). The nature of a logical or mathematical system. In H. Feigl & M.
Brodbeck (Eds.), Readings in the philosophy of science (pp. 129–147). New York: Appleton-
Century-Crofts.
Colburn, T. R., Fetzer, J. H., & Rankin, T. L. (1993). Program verification: Fundamental issues in
computer science. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Fetzer, J. H. (1988). Program verification: The very idea. Communications of the ACM, 31(9), 1048–
1063.
Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1995). Design patterns: Elements of reusable object-
oriented software. Boston: Addison-Wesley.
Hailperin, M., Kaiser, B., & Knight, K. (1999). Concrete abstractions: An introduction to computer
science. (Pacific Grove, CA: PWS Publishing). Also available online: http://www.gustavus.edu/
+max/concrete-abstractions.html.
Hardy, G. H. (1967). A mathematician’s apology. London: Cambridge University Press.
Kline, M. (1972). Mathematical thought from ancient to modern times. New York: Oxford University
Press.
Kurose, F. K., & Ross, K. W. (2003). Computer networking: A top-down approach featuring the internet.
(2nd ed.). Boston: Pearson Education.
Lewis, D. K. (1973). Counterfactuals. Cambridge, MA: Harvard University Press.
Lewis, D. K. (1986). On the plurality of worlds. Oxford: Basil Blackwell.
Lindholm, T., & Yellin, F. (1999). The Java virtual machine specification (2nd ed.). http://java.sun.com/
docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html.
Locke, J. (1706). An essay concerning human understanding. Illinois: Open Court Publishing (1962
reprint).
Resnik, M. D. (1997). Mathematics as a science of patterns. Oxford: Oxford University Press.
Russell, B. (1903). Principles of mathematics. Cambridge: Cambridge University Press.
Shapiro, S. (1997) Philosophy of mathematics: structure and ontology. New York: Oxford University
Press.
VMware (2006). VMware Inc. http://www.vmware.com/.
Whitehead, A. N., & Russell, B. (1910). Principia mathematica. Cambridge, England: Cambridge
University Press.
The Xen Virtual Machine Monitor (2006). University of Cambridge. http://www.cl.cam.ac.uk/Research/
SRG/netos/xen/.
123