Internet Linguistics
The Internet is now an integral part of contemporary life, and lin-
guists are increasingly studying its influence on language. In this
student-friendly guidebook, leading language authority Professor
David Crystal follows on from his landmark bestseller Language
and the Internet and presents the area as a new field: Internet
linguistics.
In his engaging trademark style, Crystal addresses the online
linguistic issues that affect us on a daily basis, incorporating
real-life examples drawn from his own studies and personal
involvement with Internet companies. He provides new linguis-
tic analyses of Twitter, Internet security, and online advertising,
explores the evolving multilingual character of the Internet, and
offers illuminating observations about a wide range of online
behaviour, from spam to exclamation marks.
Including many activities and suggestions for further research,
this is the essential introduction to a critical new field for students
of all levels of English language, linguistics and new media.
David Crystal is a freelance writer, lecturer and broadcaster,
based in Holyhead, North Wales. He is author of numerous books
including Just a Phrase I’m Going Through (Routledge 2009).
The first Routledge David Crystal Lectures DVD, The Future of
Language, was published in 2009.
‘Crystal draws on his wealth of expertise to shed light on the
important issues related to language form and use online.’
Mark Warschauer, University of California, Irvine, USA
‘David Crystal is a master linguist and master teacher. Given his
expertise on language and the internet, he is the ideal author for
this student text.’
Naomi S. Baron, American University, USA
‘Crystal provides a unique overview of authentic applications for
linguistics on the internet and the methodological issues raised in
the case-studies will be relevant for a wide range of projects that
readers may be working on. This will become essential reading for
students in this area.’
Charlotte Taylor, University of Portsmouth, UK
Internet Linguistics: A Student Guide
David Crystal
First published 2011
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Simultaneously published in the USA and Canada
by Routledge
270 Madison Avenue, New York, NY 10016
Routledge is an imprint of the Taylor & Francis Group, an informa business
This edition published in the Taylor & Francis e-Library, 2011.
© 2011 David Crystal
The right of David Crystal to be identified as author of this work has been asserted
by him in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilized
in any form or by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying and recording, or in any information
storage or retrieval system, without permission in writing from the publishers.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging in Publication Data
Crystal, David, 1941–
Internet linguistics : a student guide / David Crystal.
p. cm.
Includes index.
1. Computational linguistics. 2. Internet. 3. Internet—Social aspects. I. Title.
P98.5.I57C75 2011
004.601'4—dc22
2010034571
ISBN 0-203-83090-3 Master e-book ISBN
ISBN 13: 978-0-415-60268-6 (hbk)
ISBN 13: 978-0-415-60271-6 (pbk)
ISBN 13: 978-0-203-83090-1 (ebk)
CONTENTS
PREFACE
1 Linguistic perspectives
    Misconceptions
    Terminological caution
    Research challenges
2 The Internet as a medium
    Speech vs writing
    The Internet as a mixed medium
    Differences with speech
    Differences with writing
    A new medium
3 A microexample: Twitter
    Methodological issues
    Content issues
    Grammatical issues
    Pragmatic issues
    A variety in evolution
4 Language change
    Vocabulary
    Orthography
    Grammar
    Pragmatics
    Styles
5 A multilingual Internet
    Policy and technology
    Methodological issues
6 Applied Internet linguistics
    Problem areas
    The focus on ambiguity
    A lexicopedic approach
    The centrality of semantics
    An illustration
    Other aspects
7 A forensic case study
    An extract
    A case study
    Method
    Results and discussion
8 Towards a theoretical Internet linguistics
    Relevance and indexing
    New directions
9 Research directions and activities
    1 Debating roles (Chapter 1)
    2 Audio issues (Chapter 2)
    3 Distinctive forms (Chapter 2)
    4 Testing hypotheses (Chapters 2 and 3)
    5 Punctuation (Chapter 4)
    6 Spam (Chapter 4)
    7 Online translation (Chapter 5)
    8 Localization (Chapter 6)
    9 Taxonomy (Chapter 6)
    10 Semantic targeting (Chapters 6 and 7)
Notes
Further reading
Index
PREFACE
How does one write a student guide to a subject that does not
exist – or, at least, does not yet exist in such a recognized form
that it appears routinely as a course in university syllabuses or as
a chapter in anthologies of linguistics? Inevitably, it will be some-
thing of a personal account, informed by the various Internet
projects with which I have been involved. The situation reminds
me of the 1980s, when pragmatics was evolving as a field of
study, and the various published introductions differed widely
in their subject-matter. Internet linguistics is at that inchoate
stage now. I can easily imagine other introductions to the subject
– written perhaps by someone with a background in computa-
tional linguistics – which would look very different from this one.
My background is in descriptive linguistics, and it shows. But it
is an appropriate background to have, for the one thing Internet
language needs, more than anything else, is good descriptions.
A growing number of linguistics students, at undergraduate
and postgraduate levels, are now beginning to study the subject,
and I have written this book primarily for them. It will, I hope, also
be of interest to those who are taking a language course as part of
a degree in media or communication studies. I have assumed that
readers have completed an introductory course in linguistics, or
at least read an introduction to the subject, and are familiar with
the various domains that constitute the Internet, including the
most recent developments. They will not find here an exposition
of syntax or sociolinguistics, or of blogging or social networking.
It is an account written for people who are comfortable with the
basic tenets and methods of linguistics, well versed in Internet
activities, and curious about the relationship between the two. It
is also for those, within this population, who are fascinated by
the way Internet language is evolving, and want to research it. I
have therefore given as many pointers as I could to topics where
research is needed. My aim is not just to inform but to inspire
more linguists to work in this field, for, as will become apparent
– and surprising as it may seem – the subject is urgently in need
of them. In particular, I have illustrated my points almost entirely
from English, and this limitation needs to be overcome if the con-
clusions are to be robust.
This book is very different from my Language and the Inter-
net. The emphasis in that work was on the stylistic diversity of
the medium, so there was a focus on the linguistic features which
identify language varieties. In the present book, general issues of
characterization and methodology take centre stage. The descrip-
tive chapter on Twitter would not have been out of place in the
earlier book, but in other respects Internet linguistics tries to live
up to its title and provide a wider perspective which Language
and the Internet lacked. A certain amount of overlap has been
inevitable, but I hope it is not intrusive.
My thanks are due to those who reviewed this text on behalf
of the publisher, and also to Sacha Carton, Ian Saunders, and
others in the companies (AND, Crystal Semantics, Adpepper
Media) with whom I have had the opportunity to develop the
approaches described in Chapter 6. Above all, I owe an enor-
mous debt of gratitude to my wife and business partner Hilary,
who has shared my close encounter with the Internet, profession-
ally and privately, over the past 20 years.
David Crystal
July 2010
1
LINGUISTIC PERSPECTIVES
Wherever we find language, we find linguists. That is what lin-
guists are for: to seek out, describe, and analyse manifestations of
language everywhere. So when we encounter the largest database
of language the world has ever seen, we would expect to find
linguists exploring it, to see what is going on. It has begun to
happen. And a new field is emerging as a consequence: Internet
linguistics.
The name is not yet in universal use, partly because other terms
have been proposed to focus on the communicative function of
the Internet. In the 1990s, computer-mediated communication
(CMC) became widely known, a usage which was much rein-
forced when it appeared in the title of an influential online pub-
lication, the Journal of Computer-Mediated Communication.1
However, from a linguistic point of view, this term presented a
problem: it was too broad. It included all forms of communica-
tion, such as music, photographs, line-drawings, and video, as
well as language in the strict sense of the word. It is this ‘strict
sense’ that forms the foundation of any course on linguistics,
where linguists point out the important difference between spo-
ken, written, and signed language, on the one hand, and such
figurative notions as ‘the language of painting’ and ‘the language
of the face’, on the other.2 The terms language and communica-
tion are not synonymous.
The name computer-mediated communication is still widely
used, though, as are two other terms which have an even broader
remit. The emergence of mobile technology placed a certain strain
on the notion of ‘mediation by computer’. People do not really
feel they are holding a computer up to their ear when they talk
on their cellphone, notwithstanding the fact that a great deal of
computational processing is involved in making the arrangement
work. And the unease was increased by the proliferation of inter-
active speech devices. Whether a machine is talking to us (as with
satellite-navigation car instructions or airport tannoy announce-
ments) or we are talking to a machine (as with a telephone-book-
ing service or a voice-activated washing-machine) or reading an
e-book, we do not primarily think of the devices as ‘computers’.
Or, at least, they are very different ‘computers’ from the kind
we are used to seeing on our desks or carrying in our briefcases.
Many people have thus begun to use the more inclusive names
electronically mediated communication (EMC) or digitally medi-
ated communication (DMC). It is too soon to say which of these
will become standard – or, indeed, whether some other name
will emerge from cyberspace. Either way, from a linguistic point
of view they are still too broad, blurring the distinction between
language and other forms of communication.
I find Internet linguistics the most convenient name for the
scientific study of all manifestations of language in the electronic
medium. It provides the required focus, compared with human
communication as a whole (for which the name Internet semiot-
ics might be more appropriate). And it is certainly a more satis-
factory label than some of those which were proposed in the early
days of the Internet. Cyberspeak, Netspeak, and other -speak
coinages were often used in accounts aimed at a general public,3
but their weakness was that they placed undue emphasis on the
potential linguistic idiosyncrasy of the medium and suggested
that the medium was more homogeneous than it actually is. The
predominance of English on the Internet led to such names as
Netlish and Weblish, but -lish terms are far too restricting today,
given the increased e-presence of Chinese and other languages.
Electronic discourse and computer-mediated discourse also had
some use, and their focus on interaction and dialogue has kept
them alive in a social networking era. The e- prefix generated e-
language and e-linguistics, though neither seems to have caught
on; nor has cyberlinguistics. Sometimes it was the kind of activ-
ity that generated a new label, as in the case of searchlinguistics.
Internet linguistics, as I am using the term, includes them all, as
does netlinguistics. It is the study of language on the Internet – or
language@internet, as the title of an online journal has it.4
As a domain of academic enquiry, Internet linguistics is in
its infancy, but we can see how it is likely to develop. All the
recognized branches of linguistics are in principle available. We
can anticipate studies of Internet syntax, morphology, means
of transmission (phonological, graphological, multimedia),
semantics, discourse, pragmatics, sociolinguistics, psycholin-
guistics, and so on. A balance needs to be maintained between
the study of the formal properties of Internet language and the
study of its communicative purposes and effects. As descrip-
tive and theoretical findings accumulate, we can expect a fruit-
ful domain of applied Internet linguistics to emerge, providing
solutions to problems of language encountered by the vari-
ous users of the Internet, such as in search, e-advertising, and
online security. Indeed, as we shall see, a great deal of research
into Internet language has already been motivated by applied
considerations.
MISCONCEPTIONS
As has happened repeatedly in the history of language study, an
important part of the linguist’s job is to eliminate popular mis-
conceptions, and the Internet has certainly provided plenty of
these. The prophets of doom have been out in force, attributing
every contemporary linguistic worry to the new technology, and
predicting the disappearance of languages and a decline in spo-
ken and written standards. When we investigate the worries, we
invariably find they are based on myths. The moral panic that
accompanied the arrival of text-messaging (or SMS, the ‘short-
messaging service’) provides an illustration.
When text-messaging became popular in the UK, around the
year 2000, many people saw it as a linguistic disaster. Five years
later, when it began to be popular in the USA, the same reac-
tion appeared there. There was a widespread belief that texting
had evolved as a modern phenomenon, full of abbreviations that
were being used in homework and exams by a young generation
that had lost its sense of standards. A typical comment appeared
in the Daily Mail in 2007 from the broadcaster John Humphrys.
In an article headed ‘I h8 txt msgs: How texting is wrecking our
language’ he says that texters are ‘vandals who are doing to our
language what Genghis Khan did to his neighbours eight hun-
dred years ago. They are destroying it: pillaging our punctuation;
savaging our sentences; raping our vocabulary. And they must be
stopped.’ He was not alone. Other disparaging comments have
labelled the genre as ‘textese’, ‘slanguage’, and a ‘digital virus’.
It was difficult to counter these views in the absence of relevant
linguistic research. But several studies have now shown that the
hysteria about the linguistic novelty (and thus the dangers) of
text-messaging is misplaced. All the popular beliefs about texting
are wrong. To summarize the results of a growing literature:5
only a small part of text-messaging uses distinctive abbreviations
(textisms); these abbreviations are not a modern phenomenon;
they are not restricted to the young generation; young people do
not pour them into their homework and exams; and texting helps
rather than hinders literacy standards.
Text-messages are not ‘full of abbreviations’. In one American
study, less than 20 per cent of the text-messages showed abbrevi-
ated forms of any kind – about three per message. In a Norwe-
gian study, the proportion was even lower, with just 6 per cent
using abbreviations. In a collection I made myself, the figure was
about 10 per cent. People evidently swallowed whole the sto-
ries that appear from time to time asserting that youngsters use
nothing else but abbreviations when they text. The most famous
case was a story widely reported in 2003 claiming that a teen-
ager had written an essay so full of textisms that her teacher was
totally unable to understand it. An extract was posted online,
and quoted incessantly. The whole thing was a hoax – which
everyone believed.
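The percentages quoted above depend on how a 'message containing abbreviations' is counted. The following is a minimal Python sketch of one way such a proportion might be computed. The `TEXTISMS` set and the `SAMPLE` messages are invented for illustration and are not taken from any of the studies mentioned; a real investigation would need a far fuller inventory and a properly sampled corpus.

```python
import re

# Hypothetical inventory of textisms; illustrative only, not a research list.
TEXTISMS = {"c", "u", "b", "2", "lol", "brb", "msg", "xlnt", "gf", "cmb"}


def tokens(message: str) -> list[str]:
    """Split a message into lower-case word and numeral tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def has_textism(message: str) -> bool:
    """Return True if at least one token of the message is in TEXTISMS."""
    return any(tok in TEXTISMS for tok in tokens(message))


def share_with_textisms(messages: list[str]) -> float:
    """Return the proportion (0.0 to 1.0) of messages containing a textism."""
    if not messages:
        return 0.0
    return sum(has_textism(m) for m in messages) / len(messages)


# Invented sample messages, for illustration only.
SAMPLE = [
    "Running late, see you at eight",
    "c u at 8",
    "Can you pick up some milk?",
    "Thanks for the lift yesterday",
    "Happy birthday! Hope you have a great day",
]

print(f"{share_with_textisms(SAMPLE):.0%} of sample messages contain a textism")
```

A simple token match of this kind will miscount in places: the numeral 2 in 'meet at 2', for instance, would be treated as a rebus for 'to'. Decisions of that sort are part of the methodological care that any count of textisms requires.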
Nor are text-message abbreviations ‘a modern phenomenon’.
Many of them were being used in chatroom interactions that
predated the arrival of mobile phones. Several can be found in
pre-computer informal writing, dating back a hundred years or
more. The most noticeable feature is the use of single letters,
numerals, and symbols to represent words or parts of words,
as with b ‘be’ and 2 ‘to’. They are called rebuses, and they go
back centuries. Adults who condemn a ‘c u’ in a young person’s
texting have forgotten that they once did the same thing them-
selves when they played word games. Similarly, the use of initial
letters for whole words (n for ‘no’, gf for ‘girlfriend’, cmb ‘call
me back’) is not at all new. People have been initializing com-
mon phrases for ages. IOU is recorded from 1618. There is no
difference, apart from the medium of communication, between a
modern kid’s lol (‘laughing out loud’) and an earlier generation’s
SWALK (‘sealed with a loving kiss’).
Nor is the omission of letters – as in msg (‘message’) and xlnt
(‘excellent’) – a new phenomenon. Eric Partridge published
his Dictionary of Abbreviations in 1942. It contains dozens of
SMS-looking examples, such as agn ‘again’, mth ‘month’, and
gd ‘good’. Texters also use deviant spellings, such as wot ‘what’
and cos ‘because’. But they are by no means the first to use such
nonstandard forms. Several of these are so much part of English
literary tradition that they have been given entries in the Oxford
English Dictionary. Cos is there from 1828 and wot from 1829.
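The types of textism distinguished in the last few paragraphs (rebuses, initialisms, letter omissions, and nonstandard spellings) can be set out as a simple lookup. The Python sketch below is only an illustration: the category labels are informal, the word lists contain nothing beyond the examples cited in the text, and the `classify` function is a hypothetical helper rather than an established analytical tool.

```python
from typing import Optional

# Category lists use only examples cited in the text; they are far from complete.
CATEGORIES = {
    "rebus": {"b", "2", "c", "u"},
    "initialism": {"n", "gf", "cmb", "lol", "iou", "swalk"},
    "letter omission": {"msg", "xlnt", "agn", "mth", "gd"},
    "nonstandard spelling": {"wot", "cos"},
}


def classify(token: str) -> Optional[str]:
    """Return the category of a token, or None if it is not a listed textism."""
    lowered = token.lower()
    for category, forms in CATEGORIES.items():
        if lowered in forms:
            return category
    return None


for word in ["c", "u", "cmb", "xlnt", "wot", "tomorrow"]:
    print(word, "->", classify(word))
```

A form that could belong to more than one category would need a fuller treatment than a single lookup allows, which is one reason why descriptive work on larger samples is needed.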
The most important finding of the research studies is that tex-
ting does not erode children’s ability to read and write. On the
contrary, literacy improves. Strong positive links have been found
between the use of textisms and the skills underlying success in
standard English in pre-teenage children. Interestingly, the more
they used abbreviations, the higher they scored on tests of read-
ing and vocabulary. The children who were better at spelling and
writing used the most textisms. And the younger they received
their first phone, the higher their scores. Sample sizes are small,
but the results all point in the same direction.
These results surprise some people. But why should we be sur-
prised? Children could not be good at texting if they had not
already developed considerable literacy awareness. Before you
can write and play with abbreviated forms, you need to have a
sense of how the sounds of your language relate to the letters.
You need to know that there are such things as alternative spell-
ings. You need to have a good visual memory and good motor
skills. If you are aware that your texting behaviour is different,
you must have already intuited that there is such a thing as a
standard. If you are using such abbreviations as lol and brb (‘be
right back’), you must have developed a sensitivity to the com-
municative needs of your textees, because these forms show you
are responding to them.
It will be a while before the moral panic surrounding the lan-
guage of text-messaging dies down. It does not take long for a
myth to be established in the mind of the general public, but it
can take a lifetime to eradicate it. That is one of the chief respon-
sibilities of linguists – to demythologize. They need to build up
databases using larger samples, patiently publicize findings, and
try to establish a more positive climate. They can also contribute
to educational projects, suggesting ways in which the Internet
in general (and text-messaging in particular) can be introduced
into the classroom so as to facilitate learning about language. A
fruitful exercise is the ‘translation’ of text-messages into a more
formal kind of standard language, and vice versa, in order to
develop the student’s sense of the appropriateness of styles of
language in particular situations. Several schools also engage in
creative projects, such as the writing of text-messaging poetry.
What linguists cannot do is contribute professionally to
the debates which take place about the social, psychological,
legal, and other dangers associated with the Internet. Should a
teacher confiscate a mobile phone being used by a student in
class? Should parents control the amount of time their children
spend on their computer? Should employers monitor the use of
computers for work-unrelated activity? Should the Internet be
censored? Should advertising be controlled? How can we pre-
vent excessive keyboard or keypad use causing muscular dam-
age? There are many such questions, about which I (as a human
being) have my opinions; but these opinions do not relate to my
expertise as a linguist. Rather, they fall under the remit of sociol-
ogists, psychologists, physiologists, educationalists, lawyers, and
others. They are not part of an Internet linguistics, though applied
linguistic collaborations with these other domains are likely to
prove illuminating.
What I, as a linguist, see on the Internet is a remarkable
expansion of the expressive options available in a language – far
exceeding the kinds of stylistic expansion that took place with
the arrival of printing and broadcasting. These earlier media
introduced many new varieties of language, such as news articles,
advertisements, sports commentaries, and weather forecasts. The
same sort of thing has happened on the Internet, illustrated by
such new varieties as email, chat, texting, blogging, tweeting,
instant messaging, and social networking. The difference is that
the Internet is so much larger than the earlier media – it is capa-
ble of subsuming the worlds of print and broadcasting – and
changes more rapidly. We therefore need to learn to manage it,
and this point applies not only to Internet content but also to the
language in which the content is expressed.
It is not always easy to use language clearly and effectively
on the Internet. The interaction between sender and receiver is
different from traditional conversation. The anonymity of par-
ticipants alters familiar communicative expectations. Written
language on a screen does not behave in the same way as writ-
ing on a traditional page. We write it differently and we read it
differently. It is easy to be ambiguous, misleading, or offensive,
as is shown by the proliferation of netiquette guides which offer
advice about how people should behave online. In short, we need
to take care. But we cannot take care if we do not understand
the strengths and weaknesses of the various linguistic options
that are available to us. We need to understand how electroni-
cally mediated language works, how to exploit the strengths and
avoid the dangers, and this is where the developing branch of
Internet linguistics can make a significant contribution.
TERMINOLOGICAL CAUTION
Students of Internet linguistics need also to be aware that some
of the terminology they associate with the subject of linguistic
science appears on the Internet in a different guise. This is not
the first time this has happened. Linguistics has often proved to
be useful to other intellectual disciplines, which borrow its terms
and then change their meaning. The Internet has done the same,
notably with the words semantic and semantics.
Semantics began as a branch of linguistic science.6 Indeed, the
word science is used in its original definition: the French philolo-
gist Michel Bréal, who introduced the term in the 1890s, defined
it as ‘la science des significations’ – the science of meaning in
language. It came to be seen as a level of linguistic investiga-
tion, alongside phonetics, phonology, morphology, and syntax,
in such seminal works as Leonard Bloomfield’s Language; but
the abstract and indeterminate nature of ‘meaning’ meant that
it remained a neglected branch of linguistics for many decades.
The first full-scale linguistic treatment was John Lyons’ two-vol-
ume Semantics in 1977, now regarded as a classic statement of
the ‘state of the art’ within linguistics and linguistic philosophy.
In the meantime, in the absence of a linguistic characterization,
other fields found the notion of semantics useful and began to
employ it in individual ways.
The philosopher Charles Morris gave semantics a more general
interpretation in 1946, defining it as the interpretation of signs
in general – signs here being used in an abstract sense to include
everything that conveys information. It therefore included facial
expressions, bodily gestures, road signs, railway signals, and
other non-linguistic systems. Also in the 1940s, the term achieved
a certain notoriety in popular usage, where ‘it’s just semantics’
began to refer to an irritating or pointless quibble. Psychologist
Charles Osgood took the term in a different direction in 1953,
referring to the judgements people make about words, and devis-
ing a system of rating scales which he called a ‘semantic differ-
ential’ – whether words are judged as strong/weak, good/bad,
active/passive, and so on. Sometimes the term was narrowed, as
when it began to appear in medicine with reference to a clinical
syndrome – ‘semantic aphasia’, where people lose the ability to
use words after brain damage. Sometimes it was broadened, as
when Alfred Korzybski developed ‘general semantics’ in the 1930s
as a method of enabling people to avoid the ideological traps
built into language. But the term has achieved one of its widest
extensions in the notion of the ‘Semantic Web’, where it includes
all concepts and relationships within human knowledge.
‘The vision I have for the Web is about anything being poten-
tially connected with anything’, says the web’s inventor, Tim
Berners-Lee, on the first page of his biographical account, Weav-
ing the Web.7 The Semantic Web will evolve ‘without relying
on English or any natural language for understanding’, he says
a little later. There could be no broader definition of semantics
than that, and no definition that is further away from the original
linguistic intention. The Semantic Web is seen to be an evolu-
tion of the web: the existing web is human readable, whereas the
Semantic Web will be machine readable. Faced with the web in
its current form, it is the human user who has to specify, find,
and implement the links between one page or site and another; in
the Semantic Web, the links will be processed by computers with-
out human intervention. Both a linguistic and an encyclopedic
dimension will be involved. For example, to achieve a presence
for automobile on the Semantic Web, the linguistic definition (as
found in a dictionary) would include such features as ‘vehicle’,
‘wheels’, ‘drive’, and ‘road’; the encyclopedic account would
include such elements as the different makes of car, their cost,
and their safety record.
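The two dimensions of the automobile example can be pictured as a small data structure. The Python sketch below is an assumption made purely for illustration: the `Entry` class, its field names, the `shared_features` helper, and the `bicycle` entry are all hypothetical, and the Semantic Web itself relies on dedicated standards such as RDF rather than on anything resembling this code. The encyclopedic slots are left empty because the text names no specific makes, costs, or safety figures.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A hypothetical entry with a linguistic and an encyclopedic dimension."""
    term: str
    # Dictionary-style defining features.
    linguistic: set[str] = field(default_factory=set)
    # World knowledge, grouped under named headings.
    encyclopedic: dict[str, list[str]] = field(default_factory=dict)


automobile = Entry(
    term="automobile",
    linguistic={"vehicle", "wheels", "drive", "road"},
    # Headings follow the text; the values are deliberately left unfilled.
    encyclopedic={"makes": [], "cost": [], "safety record": []},
)

# An invented second entry, used only to show how features might be compared.
bicycle = Entry(
    term="bicycle",
    linguistic={"vehicle", "wheels", "ride", "road"},
)


def shared_features(a: Entry, b: Entry) -> set[str]:
    """Return the linguistic features that two entries have in common."""
    return a.linguistic & b.linguistic


print(sorted(shared_features(automobile, bicycle)))
```

Keeping the two dimensions in separate fields mirrors the distinction drawn above between what a dictionary would say about a word and what an encyclopedia would say about the thing it refers to.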
Semantics has achieved buzzword status on the Internet
these days, with many companies and approaches to knowledge
management calling themselves ‘semantic’ (see further, Chapter
6). It must not be assumed that they are all talking about the
same thing, or focusing on the same aspects of language. And this
cautionary note applies in principle to any use of a linguistic term
when found in the context of the Internet.
A rather different terminological question is what to call the
various entities which form Internet discourse, such as email,
blogs, chats, and tweets. A main aim of Internet linguistics is to
establish their linguistic character. They are often described as
genres, but that suggests a homogeneity which has not yet been
established. The same question-begging would arise if they were
called varieties or dialects or registers or any of the other terms
for situationally related uses of language provided by sociolin-
guistics and stylistics. Linguists have to demonstrate linguistic
coherence, not assume it. We need a term that is theoretically
neutral, from the linguistic point of view, and for the present
book I propose to use outputs. I shall talk about email, for exam-
ple, as being one of the outputs of Internet technology. The term
implies nothing about its linguistic character, or how it relates to
other outputs.
RESEARCH CHALLENGES
There are several properties of Internet language which constitute
a challenge to linguists wanting to explore this medium. First
of all, there is the amount of data it contains. There has never been a
language corpus as large as this one. It now contains more writ-
ten language than all the libraries in the world combined, and its
informational content is rapidly increasing as more parts of the
world come online, video storage grows (via such networks as
YouTube), and voice-over-Internet becomes routine.
Secondly, there is the diversity of the language encountered
on the Internet. The stylistic range has to recognize not only
web pages, but also the vast amount of material found in email,
chatrooms, virtual worlds, blogging, instant messaging, texting,
tweeting, and other outputs, as well as the increasing amount
of linguistic communication in social networking forums (over
170 in 2011) such as Facebook, MySpace, Hi5, and Bebo. Each
of these outputs presents different communicative perspectives,
properties, strategies, and expectations. It is difficult to find lin-
guistic generalizations that apply comfortably to Internet lan-
guage as a whole.
Part of the reason for this is another linguistically challeng-
ing property: the speed of change. It is not easy to keep pace
with the communicative opportunities offered by new tech-
nologies, let alone to explore them in the required linguistic
detail. By way of anecdotal illustration, the first edition of my
Language and the Internet appeared in 2001: it made no refer-
ence to blogging and instant messaging, which had achieved
little public presence at that time. A new edition of the book
was therefore quickly needed, and that appeared in 2006. It
included sections on the language of blogs and of instant
messages, but it made no reference to the social networking
sites, which had achieved little prominence, and certainly no
mention of Twitter, which arrived in the same year. Linguistic
studies of the Internet always run the risk of being out of date
as soon as they are written.
Even within a single output, it is difficult to keep pace. How
can we generalize about the linguistic style of emails? When
email first became prevalent, in the mid-1990s, the average age
of emailers was in the 20s. Today, it is in the late 30s: the aver-
age in the UK rose from 35.7 to 37.9 in the year October 2006
to October 2007, according to Nielsen Online.8 Doubtless simi-
lar increases are to be found in other countries. This means that
many emailers, for example, are now senior citizens – ‘silver surf-
ers’, as they are sometimes called. The consequence is that the
original colloquial and radical style of emails (with their deviant
spelling, punctuation, and capitalization) has been supplemented
by more conservative and formal styles, as older people intro-
duce their norms derived from the standard language.
Another example of rapid change comes from Twitter, which
uses a prompt to elicit a user response. In November 2009 the
nature of the prompt changed from ‘What are you doing?’ to
‘What’s happening?’ As the Twitter blog explained:
The fundamentally open model of Twitter created a new kind of informa-
tion network and it has long outgrown the concept of personal status
updates. Twitter helps you share and discover what’s happening now
among all the things, people, and events you care about. ‘What are you
doing?’ isn’t the right question anymore – starting today, we’ve short-
ened it by two characters. Twitter now asks, ‘What’s happening?’9
The blogger added: ‘We don’t expect this to change how any-
one uses Twitter’. But in fact a change from an inward-looking
question to an outward-looking one could not fail to alter the
content of the site. Twitter now has far fewer isolated postings
and far more semantic threads (see further, Chapter 3). In the
terminology of classical linguistics, we are faced with a new lan-
guage state (Saussure’s état de langue), which raises the question
of how we investigate the old ones.
For most people, the Internet became a reality following the
arrival of the web in 1991, and a searchable reality after the
arrival of Google in 1999. In that time, it went through several
changes, reflecting the technological developments of the time.
Each of these changes will have had linguistic consequences. For
example, the kinds of constraint which gave a particular linguis-
tic character to online games (MUDs, MOOs) in the 1990s have
long been superseded. This means that the language of those
games (1990s era) is in some ways like a period in the history of a
language, needing to be studied in its own terms. But defining the
boundaries of that period proves to be extremely difficult. The
start-point of a new language output is relatively easy to estab-
lish, as it is linked to the innovative technology: people conver-
sant with the history of the science can say with some precision
when the language we associate with text-messaging, blogging,
and tweeting began. What is more difficult is to identify end-
points, when a technology becomes outmoded or evolves into
something different. And even when one has a sense of start- and
end-points, tracking down the relevant data can be surprisingly
difficult.
The Internet is sometimes wonderfully specific about its tem-
poral identity, and at other times frustratingly inspecific. Beneath
every page there is information about when the page was created;
but only in a proportion of instances does that date appear on
screen. This can cause great confusion, when (for example) a
search for the population of a country yields several conflict-
ing figures, and it remains unclear whether these reflect a syn-
chronic or a diachronic perspective. When dates do appear, they
are sometimes incomplete: many news sites, for example, give
the day and the month, but not the year. There are techniques
for finding the creation date of a page, or the date when the page
was first spidered by a search engine or later updated, but they
are cumbersome for nonspecialists.
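One of the simpler signals of this kind is the Last-Modified header that a web server may send along with a page. The sketch below is illustrative only: the header values are invented, and a server may omit the header or report a date that reflects a later update rather than the original creation of the page.

```python
from email.utils import parsedate_to_datetime


def last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if it is absent."""
    value = headers.get("Last-Modified")
    return parsedate_to_datetime(value) if value else None


# Hypothetical response headers, invented for illustration.
headers = {
    "Content-Type": "text/html",
    "Last-Modified": "Wed, 21 Oct 2009 07:28:00 GMT",
}

stamp = last_modified(headers)
print(stamp.isoformat())

# A page that sends no such header leaves the question of its date open.
print(last_modified({"Content-Type": "text/html"}))
```

Even when a date is recovered in this way, it says when the server believes the file last changed, not when the wording on the page was first written.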
When the dates are available, linguists find themselves faced
with a different kind of problem: how to handle the unprec-
edented specificity? Linguists are used to being vague when it
comes to describing language change: a word is said to have
entered the language ‘in the early sixteenth century’ or in ‘the
1780s’. Indeed, with rare exceptions, it has been impossible to
identify the precise moment at which a new word or sense arrives
in a language. But the time-stamping of web pages, and the ability
to track changes, opens up a whole new set of opportunities. If I
introduce a new word such as digitextualization on my website
tomorrow at 09.42, it will be possible for lexicographers to say
that the first recorded use of this word was at 09.42 on that day.
This sort of chronological specificity has hitherto been of profes-
sional interest only to forensic linguists, concerned to identify
patterns of criminal interaction, but it will in future be of much
broader relevance. It is not yet clear how Internet linguistics will
handle this level of descriptive detail.
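As a rough illustration of what such specificity might look like in practice, the following sketch finds the earliest time-stamped record containing a given word. The records, dates, and the function name first_attestation are all hypothetical; a real study would also need to decide how far a page's time stamp can be trusted.

```python
import re
from datetime import datetime


def first_attestation(word, records):
    """Return the earliest timestamp among records whose text contains `word`."""
    target = word.lower()
    hits = [
        stamp
        for stamp, text in records
        if target in re.findall(r"[a-z]+", text.lower())
    ]
    return min(hits) if hits else None


# Invented (timestamp, text) records, deliberately not in chronological order.
records = [
    (datetime(2010, 5, 4, 11, 15), "More on digitextualization and its critics."),
    (datetime(2010, 5, 3, 9, 42), "Today I coin a word: digitextualization."),
    (datetime(2010, 5, 2, 18, 0), "Nothing new to report."),
]

first = first_attestation("digitextualization", records)
print(first)
```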
Finally, leaving aside questions of dating, some kinds of Inter-
net language present a rather different kind of challenge: inacces-
sibility. There is of course no problem in finding and download-
ing data from the pages of the web, within the various legal and
commercial constraints imposed by website-owners. But it is a
different matter when dealing with such outputs as email, chat,
and text-messages. People are notoriously reluctant to allow their
private e-communications to be accessed by passing linguists.
There are now some excellent corpora of emails and chatroom
interaction, but issues of reliability and representativeness have
yet to be fully explored, and some domains, such as text-messag-
ing, remain elusive, especially in languages other than English.
The research literature is characterized by a great deal of theo-
retical speculation but relatively few empirical studies.
Another research issue arises out of the practice of anonymity.
Normally, linguists take great pains to establish the situational
factors which motivate or condition a use of language. Factors
such as age, gender, class, and ethnicity are critical. But in a
medium where a large number of participants hide their identity,
or where we cannot trust the self-disclosed information about
themselves which they place online, it is difficult to know how
to interpret observed usage. Even fundamental distinctions, such
as whether a netizen is male or female, or a native or non-native
speaker, can be obscured. The Internet is not the first medium to
allow interaction between individuals who wish to remain anon-
ymous, of course, as we know from the history of telephone and
amateur radio; but it is certainly unprecedented in the scale and
range of situations in which people can hide their identity, espe-
cially in chatgroups, blogging, and social networking. The effect
of anonymity on linguistic behaviour also needs to be explored.
Operating behind a false persona seems to make people less
inhibited: they may feel emboldened to talk more and in differ-
ent ways from their real-world linguistic repertoire. They must
also expect to receive messages from others who are likewise
less inhibited, and be prepared for negative outcomes. There are
obviously inherent risks in talking to someone we do not know,
and instances of harassment, insulting or aggressive language,
and subterfuge are commonplace.
Ethical considerations also need to be taken into account: what
kinds of permission are needed to use Internet data? The same
questions that linguists had to address in the 1960s, in the early
days of corpus construction – such as the distinction between pub-
lic and private language – have risen again in electronic form. If
I send a message to the Internet, I have presumably let it go into
the public domain: do I then have any right to object if a linguist
includes it in a corpus? Who owns the text-messages in my mobile
phone archive: are they mine or the senders’? In an increasingly
litigious world, linguists need to take care that their data-collection
procedures are robust with respect to the question of ownership.
As the old saying goes: turn a challenge over and you see an
opportunity. The Internet offers linguists unprecedented opportu-
nities for original research. Because we are dealing with an elec-
tronic medium, we need to not only investigate the new kinds of
language introduced by the technology (blogging, tweeting, etc.),
but also reinterpret everything we already know about language
as realized through the older mediums of speech, writing, and
sign. Whatever facts were established about, say, the differences
between spoken and written vocabulary and grammar, these now
have to be revisited, because the way we use language on the Inter-
net is different in salient respects from the way we use it in tradi-
tional speech and writing. Which Internet styles of writing pro-
mote the use of abbreviations and emoticons? How does column
width affect discourse structure? Do hypertext links influence the
way a written text is organized? How does speech lag affect the
character of a spoken conversation on Skype or iChat? Every use
of language on the Internet will display features that do not cor-
respond to the features identifying that use in traditional speech or
writing. Written language has to be graphically translated10 so that
its content appears clearly on screen and can be easily accessed and
navigated. Spoken language too needs to be processed so that its
content can be indexed and navigated, with the possibilities here
dependent on progress in automatic speech recognition.
Even when the electronic medium simply scans texts for view-
ing on screen, it presents those texts in new ways, allowing us
to do new things with them. We can zoom in on an ancient
manuscript and see detail that was not easily visible before, or
carry out linguistic searches which were not practicable before.
Well-studied uses of speech and writing appear in fresh guises.
News journalism, for example, can look very different on screen
compared with how it would appear in the traditional medium
of print – paragraph size, for example, is often shorter. A poem
on a screen is a very different reading experience from one in a
printed book, especially when, as in text-messaging poetry, the
small screen allows only a small part of the poem to be seen at
any one time. The novelty is most apparent in the written lan-
guage, for the Internet to date has been a predominantly graphic
medium; but spoken language is also affected. Even the ‘listen
again’ feature in a broadcasting station offers new possibilities:
the programme is the same as it was on the radio, but the listener
now has the opportunity to stop it at will, to listen to something
a second time, to skip sections, and to move forwards and back-
wards along the timeline. The management of the auditory expe-
rience has transferred from the producer to the receiver.
The first step, then, in an Internet linguistics, is to establish the
properties of the medium which condition the language experi-
ence and behaviour of its users. The most illuminating way of
doing this, in my view, is to start by distinguishing it from the
familiar worlds of spoken and written language.
See also ‘Research directions and activities’, p. 151.
2
THE INTERNET AS A MEDIUM
Linguists have been used to thinking of language in terms of
speech and writing ever since the subject began. In due course,
signing was added to make it a triad. When these dimensions
are defined, great reliance is placed on the notion of medium (or
modality, in some studies) – for speech, the phonic medium (air);
for writing, the graphic medium (marks on a surface); for sign-
ing, the visual medium (hand movement and facial expression).
Now we have a fourth dimension of linguistic communication
– an electronic or digital medium.
How are we to compare mediums of communication? The
anthropological and zoological approaches to semiotics have
shown us the fruitfulness of a design-feature framework, in which
salient properties of communication are identified and used as a
basis of comparison. In linguistics, this procedure was first intro-
duced by Charles Hockett in his comparison of language with ani-
mal communication.1 The approach essentially asked the question:
what does Medium A do that Medium B does not do, and vice
versa? Language, for example, displays the property of displace-
ment (the ability to talk about events remote in space or time from
the situation of the speaker), whereas gibbon calls, for example, do
not (the cries reflect immediate environmental stimuli). Hockett’s
approach brought to light an important point: that not all animal
communication is the same. In relation to displacement, for exam-
ple, bee dancing shares some properties with human language.
We can apply this approach to the Internet. The aim is to
establish whether the electronic medium makes Internet lan-
guage different from that found in other mediums, and, if we
encounter differences, to examine why they are there. In view of
the technological range and speed of development of the Internet
(as summarized in Chapter 1), we must not expect the answers
to be the same for all outputs. The relationship of the language
to its associated technology will vary. The situations in which the
language is used will vary. But we will expect to find certain com-
mon properties, or at least parameters in terms of which different
outputs can be measured.
How are we to establish these properties? A first approximation
can be obtained by comparing the Internet with spoken and writ-
ten language. It has often been pointed out that the way we talk
about the Internet suggests an uncertainty over the relationship
with these two mediums. On the one hand we talk about having
an email ‘conversation’, entering a ‘chat’ room, and ‘tweeting’.
On the other hand we talk about ‘writing’ emails, ‘reading’ web
‘pages’, and sending ‘texts’. Is Internet language closer to speech
or to writing, or is it something entirely different?
SPEECH VS WRITING
After half a century of research in several general and applied lin-
guistic domains, such as grammar, lexicography, stylistics, and
foreign language teaching, the chief differences between speech
and writing have been clearly identified.2
Speech is time bound, dynamic, and transient; it is part of an
interaction in which both participants are usually present, and
the speaker has a particular addressee (or several addressees) in
mind. Writing is space bound, static, and permanent; it is the
result of a situation in which the writer is usually distant from the
reader, and often does not know who the reader is going to be.
With speech there is no time lag between production and recep-
tion, unless one is deliberately introduced by the recipient. The
spontaneity and speed of most speech exchanges make it difficult
to engage in complex advance planning. The pressure to think
while talking promotes looser construction, repetition, rephrasing,
and comment clauses (such as you know, you see, mind you). Into-
nation and pause divide long utterances into manageable chunks,
but sentence boundaries are often unclear. By contrast, with writ-
ing there is always a time lag between production and reception.
Writers must anticipate the effects of this lag, as well as the prob-
lems posed by having their language read and interpreted by many
recipients in diverse settings. Writing allows repeated reading and
close analysis, and promotes the development of careful organiza-
tion and compact expression, with often intricate sentence struc-
ture. Units of discourse (sentences, paragraphs) are usually easy to
identify through punctuation and layout.
With speech, because participants are usually face to face, they
can rely on such extralinguistic cues as facial expression and ges-
ture to aid meaning (feedback). The lexicon is often vague, using
words which refer directly to the situation (deictic expressions,
such as that one, in here, right now). With writing, lack of visual
contact means that participants cannot rely on context to make
their meaning clear; nor is there any immediate feedback. Most
writing therefore avoids the use of deictic expressions, which are
likely to be ambiguous.
Many words and constructions are characteristic of (espe-
cially informal) speech, such as contracted forms (isn’t). Lengthy
coordinate sentences are normal, and are often of considerable
complexity. There is nonsense vocabulary (e.g. thingamajig),
obscenity, and slang, some of which may appear as graphic
euphemism (f***). Writing displays different characteristics,
such as multiple instances of subordination in the same sentence,
elaborately balanced syntactic patterns, and the long (often
multi-page) sentences found in some legal documents. Certain
items of vocabulary are never spoken, such as the longer names
of chemical compounds.
Speech is very suited to social or ‘phatic’ functions, such as pass-
ing the time of day, or any situation where casual and unplanned
discourse is desirable. It is also good at expressing social relation-
ships and personal attitudes, due to the vast range of nuances which
can be expressed by the prosody and accompanying nonverbal fea-
tures. Writing is very suited to the recording of facts and the com-
munication of ideas, and to tasks of memory and learning. Written
records are easier to keep and scan, tables demonstrate relation-
ships between things, notes and lists provide mnemonics, and text
can be read at speeds which suit a person’s ability to learn.
With speech, there is an opportunity to rethink an utterance
while the other person is listening (starting again, adding a quali-
fication). However, errors, once spoken, cannot be withdrawn;
the speaker must live with the consequences. Interruptions and
overlapping speech are normal and highly audible. With writ-
ing, errors and other perceived inadequacies can be eliminated
in later drafts without the reader ever knowing they were there.
Interruptions, if they have occurred while writing, are also invis-
ible in the final product.
Unique features of speech include most of the prosody. The
many nuances of intonation, as well as contrasts of loudness,
tempo, rhythm, pause, and other tones of voice, cannot be
written down with much efficiency. Unique features of writing
include pages, lines, capitalization, spatial organization, and sev-
eral aspects of punctuation. Only a very few graphic conventions
relate to prosody, such as question marks and italics. Several
kinds of writing (e.g. timetables, graphs) cannot be read aloud
efficiently, but have to be assimilated visually.
When speech and writing are analysed in this way, it is plain
that it would be simplistic to treat them as two self-contained
and homogeneous entities. Varieties of language can be shown
to combine some of the above characteristics in different degrees.
It is more realistic to think of speech and writing as being the
end-points of a multidimensional continuum, within which vari-
eties can be located as being ‘more or less like speech’ and ‘more
or less like writing’. The varieties that form the Internet can be
approached in the same way.
THE INTERNET AS A MIXED MEDIUM
Internet language outputs vary with respect to their similarities
with speech and writing. At one extreme is the web, which in
many of its functions (such as reference publishing and advertis-
ing) is no different from traditional situations that use writing;
indeed, many varieties of written language (science, law, journal-
ism, etc.) can now be found on the web with little stylistic change
– none at all, if an exact digital copy has been made. In contrast,
email, chat, instant messaging, and texting, though expressed
through the medium of writing, display several of the core prop-
erties of speech. They are time governed, expecting or demand-
ing an immediate response; they are transient, in the sense that
messages may be immediately deleted (as in emails) or be lost to
attention as they scroll off the screen (as in chatgroups); and their
utterances display much of the urgency and energetic force which
is characteristic of face-to-face conversation.
In relation to speech, the visual interaction of such packages as
Skype and iChat, or the split screens used in some kinds of tex-
tual chat, is the closest we get to face-to-face interaction, though
the ever-present lag between message transmission and reception
denies it the simultaneity we encounter in everyday conversa-
tion. When the visual dimension is absent, instant messaging
can approximate to the dynamic give and take of a conversa-
tion, though lacking the property of simultaneous feedback (see
below). In chatgroups, the pressure on individuals to respond is
still there, but less strong because the responsibility is shared.
With social networking forums and Twitter conversations, there
is no obligatory time-based dynamic, though many participants
do respond to incoming messages promptly. With email, there is
greater flexibility over delaying a response. With blogging and
most web pages, responses are optional, even when solicited.
The outputs vary greatly with respect to their linguistic idiosyn-
crasy and complexity. At one extreme we find the web, which dis-
plays the same range of written constructions and graphic options
as would be found in the corresponding texts of traditional
print. Online government reports, newspaper editions, or literary
archives (such as Project Gutenberg) have a great deal in com-
mon with their offline equivalents (though there is never identity,
as screen and page offer different functionalities and constraints).
At the other extreme, the character limits of texting and tweet-
ing reduce the grammatical and graphic options, and the more
elaborate sentence patterns do not appear. In between, we find
outputs, such as blogging, that vary greatly in their constructional
and graphic complexity. Some blogs are highly crafted; others are
wildly erratic, when compared with the norms of the standard
written language. Emails vary enormously: some people are happy
to send messages with no revision at all, not caring if typing errors,
spelling mistakes, and other anomalies are included in their mes-
sages; others take as many pains to revise their messages as they
would in non-electronically mediated communication settings.
Internet outputs also vary greatly with respect to their commu-
nicative functions. There is a great deal of factual content on the
web, and in blogs and emails. Chatrooms and social networking
sites are highly variable: the more academic and professional they
are, the more likely they are to be factual in aim; the more social
they are, the more likely they are to contain sequences which
have negligible factual content. Instant message exchanges are
also highly variable, sometimes containing a great deal of infor-
mation, sometimes being wholly devoted to social chit-chat.
On the whole, Internet language is better seen as writing which
has been pulled some way in the direction of speech rather than
as speech which has been written down. However, expressing
the question in terms of the traditional dichotomy is misleading.
Internet language is identical to neither speech nor writing, but
selectively and adaptively displays properties of both. It is more
than an aggregate of spoken and written features. It does things
that neither of the other mediums does.
DIFFERENCES WITH SPEECH
Simultaneous feedback
The most important difference is the lack of simultaneous feed-
back. In a conversation, listeners perform an active role, using
vocalizations (such as mhm and really?), facial movement (such
as nodding and laughing), and gestures (such as hand movements
and shrugging) as a running commentary on the interaction.
Speakers unconsciously take note of this feedback and modify
their speech accordingly. The feedback acts as an index of ‘how
we are doing’. If we say something ambiguous or potentially
offensive, it can be queried straight away. If we are uncertain of
how to put something, we can check with our listeners.
In Internet situations, simultaneous feedback is invariably
absent. When someone is writing an email, there can be no such
feedback, because the recipient is unaware of the impending mes-
sage. Successive feedback will arrive, but not simultaneous. Even
in so-called ‘instant’ messaging, while the fragment of dialogue
is being typed there is no simultaneous feedback. And even in
an apparently face-to-face situation, such as two people sending
messages to a split screen at the same time, or a dialogue using
visual Skype, there is a lag which can cause conversational inter-
ference, making the participants unsure about the relationship
between turns. In an audio situation, people find themselves talk-
ing at the same time and having to repeat what they said when
it becomes apparent that the other person has not heard them.
Things will improve as the technology matures, but Internet con-
versations currently lack the kind of immediate mutual respon-
siveness that we take for granted in everyday dialogue.
Linguists need to explore the consequences of this. If users
of the Internet cannot rely on obtaining simultaneous feedback
from their interlocutors, what effect does this have on the way
they use language? To take one example: an important feature
of informal conversation is its use of reaction signals, comment
clauses, and tag questions (such as mhm, you know and isn’t it?),
which give the listener the option of providing feedback. In a sit-
uation where this feedback is missing, will such features continue
to be used, or will they be adapted in some way? My impression
is that they are generally absent, but we need descriptive studies.
And if they are absent, we need to analyse what the effect of this
will be. Some writers have suggested that the lack of these fea-
tures is one of the reasons why so many Internet interactions are
misperceived as abrupt, cold, distant, or antagonistic. Address-
ing someone on the Internet is a bit like having a telephone con-
versation in which a listener is giving us no reactions at all: it is
an uncomfortable and unnatural situation, and in the absence of
such feedback our own language becomes more awkward than it
might otherwise be.
Are people aware, when writing an email, that their language
is autonomous – that they are ‘on their own’? Judging by the
comments of neophyte emailers, the answer is no. Most of us
can recall cases where we sent an email, received an unexpect-
edly upset reply, and on rereading our message realized we had
said something we had not intended to say. People doubtless
learn from their mistakes. Netiquette guides repeatedly advise
that we read emails through before sending them, and similar
advice is relevant for all social networking situations. But the
guides are notably unhelpful when it comes to giving specific
advice about those aspects of grammar, vocabulary, orthogra-
phy, and style which will help or hinder the efficacy of an Inter-
net exchange. Most simply adopt old prescriptive attitudes,
repeating artificial grammatical shibboleths such as avoiding
the passive voice. We need more sophisticated, linguistically
informed accounts of why some Internet exchanges are more
successful than others.
Emoticons
It was an early awareness of the dangers of ambiguity which led
to the development of emoticons. Apart from in video interac-
tions, Internet exchanges lack the facial expressions, gestures,
and conventions of body posture and distance (the kinesics and
proxemics, as they are called in semiotics) which are so critical
in expressing personal opinions and attitudes, and in moderating
social relationships. The new symbols, such as the basic pairing
of :) and :( for positive and negative reactions respectively, were
intended to remove attitudinal ambiguity. Today there are over
60 emoticons usually offered by message exchange systems, and
some dictionaries list several hundred possibilities using ortho-
graphic features (such as constructing ~(_8^(|) to identify Homer
Simpson). However, despite the creative artistry, the semantic
role of emoticons has proved to be very limited. An individual
emoticon can still allow many readings – the basic smile, :), for
example, can mean sympathy, delight, amusement, and much
more – and these can be disambiguated only by referring to
the verbal context. Without care, moreover, they can increase
misunderstanding: adding a smile to an utterance which is ironic
can be taken negatively as well as positively.
Usage is therefore changing. Emoticons were never very fre-
quently used – one study showed only 13 per cent of emails
contained them3 – and they seemed to be used more by young
people. Some linguists have interpreted this to mean that adults
have better communicative skills: they do not need to rely on
the crude attitudinal approximations that emoticons provide. On
the other hand, adults are quite prepared to use an emoticon to
replace an entire utterance – an emoticon with a broad grin, for
example – in a situation where speed of response is at a premium,
such as an instant messaging exchange. A lot depends on the
output: an utterance consisting solely of an emoticon would be
unusual in Twitter, where there is an expectation that messages
should be to some degree semantically self-contained. There are
also sociolinguistic and stylistic factors constraining our use of
these symbols. Is it the case that the more serious the content,
the fewer the emoticons? Or the more formal the interaction, the
fewer the emoticons? Is there a correlation between emoticon use
and age, gender, or ethnicity? In one instant messaging study,4
three-quarters of the 16 females used emoticons, but only 1 of
the 6 males. We need more studies of who uses emoticons,
when, where, and why, in each kind of Internet activity.
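A first step in any such study is simply to count. The sketch below estimates the proportion of messages in a sample that contain at least one emoticon. The sample messages are invented, and the pattern is an assumption of mine covering only a few basic forms (eyes, optional nose, mouth); it would miss most of the larger inventories mentioned above and could wrongly match punctuation sequences such as a colon followed by a bracket.

```python
import re

# A deliberately small pattern: eyes, optional nose, mouth.
EMOTICON = re.compile(r"[:;=]-?[)(DPp]")


def emoticon_share(messages):
    """Return the fraction of messages containing at least one emoticon."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if EMOTICON.search(m)) / len(messages)


# Hypothetical sample, invented for illustration.
messages = [
    "See you at 5 :)",
    "Meeting moved to Tuesday.",
    "oh no :-(",
    "Thanks for the report.",
]

share = emoticon_share(messages)
print(share)
```

Breaking the same count down by age, gender, or output type would require metadata that, as noted earlier, is often unavailable or unreliable.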
Multiple conversations
In a traditional speech setting, it is impossible to hold a conver-
sation with more than one or two people at a time. Entering a
room in which several conversations are taking place simultane-
ously, we cannot pay attention to all of them or interact with
all of them. But in real-time multi-party settings on the Internet,
this is perfectly feasible and normal. In a chatroom, for example,
we observe messages from other participants scrolling down the
screen: there may be several conversations going on, on differ-
ent topics, and we can attend to them all, and respond to them,
depending only on our interest, motivation, and ability to type.
It is not clear how people communicate effectively, under such
circumstances. Short sentences, abbreviated words, punctuation
avoidance, and other strategies motivated by economy account
for some of the stylistic features of chat, but complex sentences
can be encountered, and there is a great deal of individual
variation.
Nor is it clear how participants cope with the vagaries of turn-
taking, when several people are involved, and when the order
in which messages (transmission units) appear on a screen is
dependent on factors that are beyond the control of the partici-
pants. Messages are posted to a screen linearly, in the order in
which they are received by the system. In a multi-user environ-
ment, they are coming in from various sources all the time, and
with different lags, because of the way packets of information
are sent electronically through different global routes. A reaction
to a particular stimulus (such as a response to a question) can be
separated by an unpredictable number of other utterances. Even
in a two-way interaction, such as an instant messaging exchange,
the usual linear organization of face-to-face conversation can be
disrupted by a range of factors. Participant N may briefly leave
the interaction, while Participant P, unaware of N’s absence,
continues to send messages. N then returns and ‘catches up’ in
a string of responses to P. If P has made three points (let us call
them 1, 2, and 3), then N’s responses to each point (1r, 2r, and
3r) will be seen as a block, so that on screen what we see is 1, 2,
3, 1r, 2r, 3r, and not (as in an offline conversation) 1, 1r, 2, 2r,
3, 3r. Such a situation is bound to have some effect on the way
the discourse grammar operates. Are responses governed by the
same rules of ellipsis as are found in face-to-face conversation?
What constraints might there be on the use of anaphoric pro-
nouns? Could sequence of tenses be affected? There are many
such questions awaiting investigation.
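The ordering effect described above can be illustrated with a minimal sketch. It assumes, as the text states, that a system posts messages in the order in which it receives them; the `Message` class and the arrival times are hypothetical, chosen only to reproduce the two sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    label: str
    arrival: float  # hypothetical arrival time at the system, in seconds


def screen_order(messages):
    """Return message labels in the order they would be posted: by arrival time."""
    return [m.label for m in sorted(messages, key=lambda m: m.arrival)]


# Offline-style exchange: N answers each point before P makes the next one.
alternating = [
    Message("P", "1", 0.0), Message("N", "1r", 1.0),
    Message("P", "2", 2.0), Message("N", "2r", 3.0),
    Message("P", "3", 4.0), Message("N", "3r", 5.0),
]

# N briefly leaves: P's three points arrive first, then N catches up in a block.
n_absent = [
    Message("P", "1", 0.0), Message("P", "2", 2.0), Message("P", "3", 4.0),
    Message("N", "1r", 6.0), Message("N", "2r", 6.5), Message("N", "3r", 7.0),
]

print(screen_order(alternating))  # ['1', '1r', '2', '2r', '3', '3r']
print(screen_order(n_absent))     # ['1', '2', '3', '1r', '2r', '3r']
```

Nothing in the second sequence records which point each response belongs to; that link has to be recovered from the language of the responses themselves.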
A basic question is: how often does this happen? In two-way
(dyadic) interactions, the figure seems to be quite low. In a Swed-
ish study of over 1,500 dyadic instant messaging utterances, only
10 per cent of the utterances were not adjacent to the utterance
to which they related.⁵ In data of my own, the corresponding
figure was 15 per cent, but with considerable variation across
conversations (from 4 per cent to 27 per cent). Even so, we are
not talking large numbers: the majority of utterances respect
adjacency. In the cases that do not, several factors seem to be
involved. The subject-matter of the conversation is one, as a dis-
cussion of a serious topic to which both parties are contributing
will motivate more sequences of utterances on each side, and
there will be more ‘talking at the same time’, with specific points
interacting in various ways. Overlapping is also likely to occur
at the point where one party introduces a change of topic, as in
utterance (U) 19 below:
15 H had you been to Steve’s house before?
16 L no,
17 L is cute
18 H isn’t it
19 H i’m working at home today
20 H the alarm man’s coming to reset the alarm
21 L bit scary with a 2 yr old – lots of light colours!
22 L ok, me too . . .
23 H isn’t it!
24 L dad too?
U21 continues the topic of U15–18, U22 responds to U19, U23
responds to U21, and U24 takes further U19. This is partly a
function of the time lag between utterances. If H had waited
longer before sending U19 and U20, L’s U21 would probably
have continued the theme of Steve’s house in the appropriate
place, and U22 would have followed U19. Instant messaging logs
are only a partial reflection of the discourse realities. The lag
makes them appear to be more incoherent than in fact they are.
It might be thought that disruption to turn-taking would inevi-
tably lead to a breakdown in communication, but analysis sug-
gests that this is not so. In the Swedish study referred to above, of
the 144 utterances coded as relating to a non-adjacent previous
utterance, 126 (87.5 per cent) caused no misunderstanding. In
my family data, there was no misunderstanding at all. Why is this
so? To begin with, the distance between a response utterance and
its preceding stimulus is not usually very great. Out of 122 utter-
ances in the H/L conversation, 18 were non-adjacent reactions
– illustrated by this sequence, where U30 replies to U28:
28 L no news?
29 H then home by 1 for lunch
30 H not yet
Ten of the 18 non-adjacent reactions were separated by a sin-
gle utterance in this way, and a further five by two utterances.
Only three involved greater degrees of separation (one of which
is illustrated above: U24 to U19), and there was no problem of
miscomprehension.
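The figures in this discussion can be checked with a short worked calculation. The sketch below uses only the counts and utterance numbers quoted in the text; the function name `separation` is an illustrative choice, and it assumes that utterances are numbered consecutively, as they are in the excerpts.

```python
def separation(stimulus: int, response: int) -> int:
    """Number of utterances intervening between a stimulus and its response."""
    return response - stimulus - 1


# Stimulus/response pairs taken from the quoted excerpts.
print(separation(28, 30))  # 1 (U29 intervenes)
print(separation(19, 24))  # 4 (U20 to U23 intervene)

# Counts reported for the H/L conversation.
total_utterances = 122
non_adjacent_by_gap = {"one": 10, "two": 5, "more than two": 3}
non_adjacent = sum(non_adjacent_by_gap.values())

print(non_adjacent)                                      # 18
print(round(100 * non_adjacent / total_utterances, 1))   # 14.8 (per cent)

# Swedish study: non-adjacent utterances that caused no misunderstanding.
print(round(100 * 126 / 144, 1))                         # 87.5 (per cent)
```

A measure of this kind counts only distance on screen. It says nothing about why a gap arose or whether it caused difficulty, which is where the linguistic analysis begins.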
An important factor is the sequencing of utterances. If N asks a
question, N expects a reply, and is capable of waiting for that reply
even though other utterances intervene. The intervening utter-
ances typically do not cause ambiguity because they are grammati-
cally and semantically unrelated. The relevant reply is signposted
through the use of response grammar and by lexical items belong-
ing to the semantic field of the question, as in this sequence:
150 N So will you be driving?
151 P I think Mike’s going to be there.
152 P That’ll make Jane smile.
153 P If the MOT’s OK.
Participants even seem to cope with ambiguous anaphoric refer-
ences and elliptical utterances, partly by remembering the lin-
guistic context, but also by using their knowledge of the situation
– something that is especially important in a situation where the
participants know each other well. For example, U24 in the H/L
dialogue could theoretically refer to U20, as it elides all reference
to the activity; but both parties know that it could only refer to
U19. Similarly, in U152 in the N/P dialogue, the that is theoreti-
cally ambiguous: it could refer to either the driving or to Mike.
But as both participants know about the relationship between
Mike and Jane, there is no problem.
If a participant feels that too much space has elapsed, several
strategies are available to reaffirm semantic order, such as intro-
ducing the topic again. P, for example, could say:
P Driving? If the MOT’s OK.
An example of this occurred in the H/L conversation, where after
several utterances H returned to an earlier theme:
97 H re M – that’s so exciting
This is especially likely to happen in forums, where several par-
ticipants are involved and the time lag between messages is some-
times considerable. An observation submitted then has to have
its target message identified, to avoid it being associated with the
wrong utterance – for example, by inserting the name of the mes-
sage-sender before a reply, as in ‘Rob: I agree’. How many other
such strategies are there? We need more studies of the techniques
interactants use to maintain their sense of discourse organization
in conversations involving multiple participants.
DIFFERENCES WITH WRITING
Hypertext links
The Internet is an association of computer networks with com-
mon standards which enable messages to be sent from any reg-
istered computer (or host) on one network to any host on any
other. The mechanism which allows this to happen is the hyper-
text link – the colour-coded element on screen that users click on
when they want to move from one part of the system to another.
Hypertextuality is the most fundamental functional property of
the Internet, without which the medium would not exist. It has
parallels in some of the conventions of traditional written text
– such as the footnote number, the cross-reference, and the bib-
liographical citation. These also motivate a reader to move from
one place in a text to another. But they are optional features. It is
perfectly possible to have a traditional text, such as a brochure,
which has no footnotes or cross-references at all. It is not pos-
sible to have an Internet without hypertext links.
The hypertextuality in the current state of the Internet is of a
very limited kind, dependent on the decisions made by individ-
ual site designers. In a fully hypertextual system, all documents
would be completely and automatically interrelated. In the present
system, links between sites are partial and often not reciprocated.
Site X might link to site Y, but Y does not link to X. Nor does the
existence of a link mean that it is achievable, as everyone knows
who has encountered a ‘page not found’ error message. But tech-
nical issues aside, several linguistic questions arise. How should
hypertext links be decided? It makes an interesting pedagogical
exercise for a class to take a page of text and discuss, in an ideal
hypertextual world, which elements would make the best links.
A related exercise is to look at a real Internet page and evaluate
whether links have been underused or overused. Just as we can
over-footnote a traditional text, so we can over-link a web page.
We need to ask how relevant or informative are the links on a
page, and these are linguistic questions. Berners-Lee put it like
this, when he wrote that the web ‘is more a social creation than
a technical one . . . to help people work together’.⁶ This seems to
place the issue firmly within the remit of those parts of linguistics
which deal with questions of discourse organization and audi-
ence – pragmatics, stylistics, and sociolinguistics.
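The point that links are partial and often not reciprocated (site X links to site Y, but Y does not link to X) can be made concrete with a small sketch. The link map below is invented for illustration, and the function names are hypothetical; a class exercise might build such a map by hand from a handful of real pages.

```python
# Hypothetical link map: each site maps to the set of sites it links to.
links = {
    "X": {"Y", "Z"},
    "Y": {"Z"},
    "Z": {"Y"},
}


def unreciprocated(link_map):
    """Return (source, target) pairs where the target does not link back."""
    return sorted(
        (src, dst)
        for src, targets in link_map.items()
        for dst in targets
        if src not in link_map.get(dst, set())
    )


def reciprocity(link_map):
    """Proportion of links that are reciprocated (0.0 if there are no links)."""
    total = sum(len(targets) for targets in link_map.values())
    if total == 0:
        return 0.0
    return (total - len(unreciprocated(link_map))) / total


print(unreciprocated(links))  # [('X', 'Y'), ('X', 'Z')]
print(reciprocity(links))     # 0.5
```

A count of this kind shows how partial the linking is, but not whether any given link is relevant or informative; that remains a question of discourse organization.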
Persistence
One of the most fundamental properties of traditional writ-
ing is its space-bound character – the fact that a piece of text is
static and permanent on the page. If something is written down,
repeated reference to it will be an encounter with an unchanged
text. By contrast, a page on the web often varies: its factual con-
tent can change in front of our eyes, as when news headlines
scroll across the screen or advertising pop-ups appear. This kind
of dynamic or animated language is not restricted to the Internet
– it has long been a feature of the neon signs in public advertis-
ing. What is different is the person-directed nature of the ani-
mation. We may even find a feature of our personal behaviour
highlighted, as an ad colourfully informs us that we have won a
huge sum of money.
Many web pages of course do have content which remains
unchanged on repeated viewing – in a newspaper archive, for
example, where the pages are an electronic replica of their
printed original. But there are also many pages which have
content that seems to be permanent, yet are found to have altered
on subsequent viewing because they have been refreshed by the
website-owner – as is routine with e-commerce pages, where the
introduction of new models and prices provides the reader with
content that is being continually updated. Outputs display differ-
ent kinds of persistence. Comments to a website stay on a page
for as long as the site exists, unless deleted for some reason by the
website-owner. Messages are ephemeral on instant messaging
unless a decision is made to log them. Emails stay until removed
by the receiver (but may of course still be present on the host
server). Archives of messages are routinely made in electronic
mailing lists, blogs, and tweets.
User reactions to the content of a page also interfere with the
traditional notion of the persistence of a written text. With sev-
eral Internet categories, such as email, there are opportunities to
‘interfere’ with a message in ways that are not possible in tradi-
tional writing. A recipient may take a message and intercalate
(or ‘frame’) responses to the various points that the sender made
(see p. 73). The original sentences may be altered or deleted. In a
chatroom or public forum, a third party might be involved, in the
form of a moderator, whose role is to censor undesirable content.
In all cases, the text can be modified with an ease and undetecta-
bility that is not possible when people try to alter a traditionally
written text.
Multiple authorship
Intercalated and moderated texts illustrate a multi-authorship
phenomenon which reaches its extreme in wiki-type pages, where
readers may alter an existing text as their inclination takes them.
The process raises important social and legal issues, but it also
has several linguistic consequences.
First of all, it makes texts pragmatically heterogeneous, as the
intentions behind the various contributions vary greatly. Wiki
articles on sensitive topics (such as politics or religion) illus-
trate this most clearly, with judicious observations competing
with contributions that range from mild through moderate to
severe in the subjectivity of the writers’ opinions. Texts are also
stylistically heterogeneous. Sometimes there are huge differences,
with standard and nonstandard language coexisting on the same
page, often because some of the contributors are plainly com-
municating in a second language in which they are nonfluent.
Traditional notions of stylistic coherence, with respect to level
of formality, technicality, and individuality, no longer apply,
though a certain amount of accommodation is apparent, with
contributors sensing the properties of each other’s style.
Cultural differences are especially important. People with dif-
ferent cultural backgrounds have different views on how formal
a piece of writing on the Internet should be, or how focused or
figurative it should be. One temperament requires that an author
gets to the point quickly and stays focused on it; another requires
a scene-setting preamble and allows diversions. One tempera-
ment is prone to vivid similes, metaphors, and personifications;
another avoids them. In a setting such as Wikipedia, we find cultural
differences affecting the willingness of people to change a
page – whether to add information, to clarify what is there, or to
delete it. Some countries (such as Japan) seem to privilege edit-
ing; others (such as France) seem reluctant to interfere with the
work of others.⁷ The differences appear at a detailed level. We
find pages which display a mix of contracted and uncontracted
forms (e.g. doesn’t vs does not), use conflicting conventions for
writing dates, times, and addresses, or vary in their preferences
over the use of colours. We need to know more about the diver-
sity of expectations and behaviour among people from different
cultures when they communicate on the Internet.
Multi-authorship also disturbs our sense of the physical iden-
tity of a text. How are we to define the boundaries of a text which
is ongoing? People can now routinely add to a text posted online,
either short-term, as in the immediate response to a news story,
or medium- or long-term, as in comments posted to a blog, bulle-
tin board, or other forum. Ferdinand de Saussure’s classical dis-
tinction between synchronic and diachronic does not adapt well
to the Internet, where everything is diachronic, time stampable to
a micro-level. In classical linguistics, texts are typically viewed as
synchronic entities, by which we mean we disregard the changes
that were made during the process of composition and treat the
finished product as if time did not exist. But with many Internet
texts there is no finished product. I can today post a message to
a forum discussion on page X created in 2004. From a linguistic
point of view, we cannot say that we now have a new synchronic
iteration of X, since the language has changed in the interim. I
might comment that the discussion reads like something ‘out of
Facebook’ – which is a comment that could be made only after
2005, when that network began.
The problem exists even when the person introducing the vari-
ous changes is the same. The author of the original text may
change it – altering a web page, or revising a blog posting. How
are we to view the relationship between the various versions?
The question is particularly relevant now that print-on-demand
(POD) texts are becoming common. It is possible for me to pub-
lish a book very quickly and cheaply, printing only a handful of
copies. Having produced my first print-run, I then decide to print
another, but make a few changes to the file before I send it to the
POD company. In theory (and probably increasingly common in
practice), I can print just one copy, make some changes, then print
another copy, make some more changes, and so on. The situation
is beginning to resemble medieval scribal practice, where no two
manuscripts were identical, or the typesetting variations between
copies of Shakespeare’s First Folio. The traditional terminology
of ‘first edition’, ‘second edition’, ‘first edition with corrections’,
ISBN numbering, and so on, is inadequate to account for the
variability we now encounter; but it is unclear what to put in its
place. The same problem is also present in archiving. The British
Library, for example, launched a Web Archiving Consortium in
2008.⁸ My website is included. But how is one to define the
relationship between the various time-stamped iterations of this site,
as they accumulate in the archive?
A NEW MEDIUM
The language of the Internet cannot be identified with either
spoken language or written language, even though it shares
some features with both. The electronic medium constrains and
facilitates human strategies of communication in unprecedented
ways. Among the constraints are limited message size, message
lag, and lack of simultaneous feedback. Among the facilitations
are hypertext links, emoticons, and the opportunities provided
by multiple conversations and multiply authored texts. But this
is only a partial account, which raises the general question: how
many such design-features are there?
Susan Herring has approached this problem by adopting the
notion of facets from the field of knowledge management.⁹ Facets
are parameters of contrast in relation to which outputs can be
defined, and are similar in conception to the notion of design-fea-
tures. Facets are grouped into two broad categories. Technologi-
cal facets characterize the medium, determined by the associated
computer hardware and software and by the character of the
protocols governing the various outputs. Social facets character-
ize the number, relationship, and behaviour of those using the
medium, the content and purpose of their communication, and
the language they use.
Under the technological heading the following variables are
recognized for online text (multimedia channels will need an
extension of the approach):
• Synchronicity: whether the activity operates in real time
(synchronic) or not (asynchronic)
• Granularity: the nature of the units transmitted by the sys-
tem, whether messages, characters, or lines
• Persistence: the period of time that messages remain on the
system after they are received
• Length: the number of characters that a system buffer allows
in a single message
• Channels: the multimedia channels involved (animated
graphics, video, audio)
• Identity: whether messages are anonymous or identified
• Audience: whether messages are publicly or privately
accessible
• Adaptation: whether the system allows content to be filtered,
quoted, or modified (cf. framing, p. 30)
• Format: the appearance of messages on screen, includ-
ing such variables as the order in which they appear, their
location in relation to other messages, and whether other
information is automatically appended
Under the heading of social facets, Herring identifies the follow-
ing variables:
• Participation structure: the number of active or potential
participants in an interaction, the amount they say, the speed
at which they say it, whether they are interacting privately or
publicly, and in real life or pseudo-life
• Participant characteristics: the usual range of factors identi-
fied by sociolinguists as relevant for language analysis, such
as age, gender, education, cultural background, beliefs, and
professional skills
• Purpose: the reason(s) for a message, whether sent by indi-
viduals or groups (e.g. playing a game, advertising a product,
teaching a language)
• Activities: the means whereby the purpose is achieved (e.g.
using text, sending photographs, adding sound, providing a
forum)
• Topic: the kind of content felt to be relevant or appropriate
to a message (cf. the common reference to a message being
‘off-topic’)
• Tone: the manner or spirit of an interaction (e.g. unemo-
tional, jocular, aggressive, persuasive)
• Norms of organization: the way participants organize them-
selves (e.g. control content via a moderator, admit new mem-
bers, distribute messages)
• Norms of social appropriateness: the behavioural standards
accepted by the participants (e.g. netiquette guidelines, spam
filters)
• Norms of language: the linguistic conventions recognized
by participants (e.g. use of abbreviations, insider jokes, non-
standard spellings)
• Code: the language(s) or language varieties used by the par-
ticipants, whether spoken or written (i.e. including scripts
and fonts)
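One way of putting an inventory of this kind to work is to treat each output as a record with one slot per facet, to be filled in as a description proceeds. The sketch below is a hypothetical coding scheme and not part of Herring's proposal: the class names and the `uncoded` helper are assumptions, and the two values entered for instant messaging are illustrative, based on the earlier observation that such messages are ephemeral unless logged.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class TechnologicalFacets:
    synchronicity: Optional[str] = None
    granularity: Optional[str] = None
    persistence: Optional[str] = None
    length: Optional[str] = None
    channels: Optional[str] = None
    identity: Optional[str] = None
    audience: Optional[str] = None
    adaptation: Optional[str] = None
    format: Optional[str] = None


@dataclass
class SocialFacets:
    participation_structure: Optional[str] = None
    participant_characteristics: Optional[str] = None
    purpose: Optional[str] = None
    activities: Optional[str] = None
    topic: Optional[str] = None
    tone: Optional[str] = None
    norms_of_organization: Optional[str] = None
    norms_of_social_appropriateness: Optional[str] = None
    norms_of_language: Optional[str] = None
    code: Optional[str] = None


@dataclass
class OutputProfile:
    name: str
    technological: TechnologicalFacets = field(default_factory=TechnologicalFacets)
    social: SocialFacets = field(default_factory=SocialFacets)

    def uncoded(self):
        """List the facets that have not yet been given a value."""
        return [
            f.name
            for group in (self.technological, self.social)
            for f in fields(group)
            if getattr(group, f.name) is None
        ]


# Illustrative values only; a real description would be based on corpus data.
im = OutputProfile("instant messaging")
im.technological.synchronicity = "synchronic"
im.technological.persistence = "ephemeral unless logged"

print(len(im.uncoded()))  # 17 of the 19 facets still to be described
```

Leaving a slot empty until there is evidence for it keeps the record honest about how much of an output has actually been described.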
Herring’s list is an inventory, valuable as a tool for promoting
the description and classification of Internet texts, within the
various outputs. The next step is to collect corpora and carry out
detailed descriptions, using parameters of this kind as guidelines.
We can talk about the uniqueness of Internet language in general
terms, but ultimately the only way to appreciate its character as a
new medium is to carry out a linguistic investigation of a sample
of data from an individual output. This invariably raises novel
methodological issues, at the same time identifying features that
are not encountered in analyses of ‘traditional’ speech and writ-
ing. A general account of the first Internet outputs – email, chat,
virtual worlds, the web, instant messaging, blogging, and text-messaging
– is already available in earlier works,¹⁰ so for present
purposes an apt illustration of the process can be found by taking
a more recent development – an output whose linguistic origins
lie in a combination of Internet and mobile phone.
See also ‘Research directions and activities’, pp. 151–3.