An authorship analysis of the Jack the Ripper letters

Andrea Nini
Linguistics and English Language, University of Manchester, UK

Correspondence: Andrea Nini, Linguistics and English Language, University of Manchester, Samuel Alexander Building, Oxford Rd, Manchester M13 9PL, UK.
E-mail: andrea.nini@manchester.ac.uk

Abstract
The Whitechapel murders that terrorized London in 1888 are still remembered to this day, thanks to the legend of their unapprehended perpetrator, Jack the Ripper. In addition to the gruesomeness of the murders, the name and the persona of the killer have been popularized by the over 200 letters signed as ‘Jack the Ripper’ that were received following the murders. The most supported theory on the authorship of these letters is that some of the earliest key texts were written by journalists to sell more newspapers and that the same person is responsible for writing the two most iconic earliest letters. The present article reports on an authorship clustering/verification analysis of the Jack the Ripper letters with a view to detecting the presence of one writer for the earliest and most historically important texts. After compiling the ‘Jack the Ripper Corpus’ consisting of the 209 letters linked to the case, a cluster analysis of the letters is carried out using the Jaccard distance of word 2-grams. The quantitative results and the discovery of certain shared distinctive lexicogrammatical structures support the hypothesis that the two most iconic texts responsible for the creation of the persona of Jack the Ripper were written by the same person. In addition, there is also evidence that a link exists between these texts and another of the key texts in the case, the Moab and Midian letter.

1 Introduction

On 31 August 1888, the murder of a prostitute in the Whitechapel area of London started a series of homicides that would be remembered for over a century: the Whitechapel murders. These murders were characterized by mutilations of increasing gruesomeness, such as disembowelment or removal of organs. Experts believe that between five and six murders were committed, culminating with the most violent one on 9 November 1888. Although the killings are traditionally attributed to a single person, commonly known as Jack the Ripper, there has never been definitive evidence to exclude the possibility that the murders were unconnected events, despite some modern research that has found the set of shared behavioural characteristics of the murders to be distinctive (Keppel et al., 2005).

Besides the investigative aspect, the Whitechapel murders case and the legend of Jack the Ripper have an important socio-cultural dimension. The mystery surrounding the identity of the killer has led to incredible and often unlikely speculations and even though the Whitechapel murders happened more than a century ago, the mystery has created a business that is still alive and generating revenue in the form of media products, books, and tours. These elements have contributed to the engraving of the mythology of Jack the Ripper into modern Western culture far more than the murders themselves, and several academic works have explored both the

Digital Scholarship in the Humanities, Vol. 33, No. 3, 2018. © The Author(s) 2018. Published by Oxford University Press on behalf of EADH. 621
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/
licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For
commercial re-use, please contact journals.permissions@oup.com
doi:10.1093/llc/fqx065 Advance Access published on 25 January 2018
Downloaded from https://academic.oup.com/dsh/article-abstract/33/3/621/4824843
by Carnegie Mellon University user
on 28 August 2018

sociological dimension of the mythology of Jack the Ripper to shed light on 19th century England and the beginning of the modern era (Walkowitz, 1982; Perry Curtis, 2001; Haggard, 2007) or have identified links between Jack the Ripper and Victorian literature (Tropp, 1999; Eighteen-Bisang, 2005; Storey, 2012).

The origin of the mythology of Jack the Ripper lies in the communication that the killer allegedly sent to the police or media during the time of the murders and in the following months and years. Although there is no evidence that the real killer was involved in the production of any of them, the more than 200 Jack the Ripper letters significantly contributed to the creation and popularization of the name and persona of Jack the Ripper. However, despite the large number of texts involved in the case, only a small number of the Jack the Ripper letters received substantial investigative or socio-cultural importance at the time.

Probably the most important text in the case is the ‘Dear Boss’ letter, which was received on 27 September 1888 by the Central News Agency of London. This letter is the first ever signed as ‘Jack the Ripper’ and it is responsible for the creation of the pseudonym. The letter claimed responsibility for the murder of Annie Chapman on 8 September 1888 and mentioned that an ear would be cut off from the next victim and sent to the police. Indeed, the murder of Chapman was followed by another murder in which part of one of the ears of the victim was removed, although this was never sent to the police. Because of this fact and its style and content, the letter was considered to be genuine and it became famous for introducing the persona of Jack the Ripper and for providing a name that the press could use to refer to the killer.

The second most important text is the ‘Saucy Jacky’ postcard, which was received on 1 October 1888 by the Central News Agency of London, signed again as ‘Jack the Ripper’. The postcard claimed responsibility for the double murder of Elizabeth Stride and Catherine Eddowes on the night of 30 September 1888. The postcard did not threaten future murders and presented an apology for not having sent an ear to the police. Together with the ‘Dear Boss’ letter, this postcard has also become iconic in the portrayal of Jack the Ripper and was taken more seriously than other letters because of the short window between the murders and the time the postcard was sent (Begg, 2004).

The police took these two texts seriously enough to produce and post copies outside of police stations on 3 October 1888 (Rumbelow, 1979; Sugden, 2002). Following that, on 4 October, the two texts were also published in many newspapers (Sugden, 2002), even though some newspapers had obtained the information of the name ‘Jack the Ripper’ and part of the texts already by 1 October (Perry Curtis, 2001).

Although much less popular than the other two texts, on 5 October the Central News Agency also received a third text, commonly known by experts as the ‘Moab and Midian’ letter. This text announced a triple event and justified the murders with religious motives. The peculiarity of this letter is that the original had never been sent to the police, as the journalist Tom Bulling of the Central News Agency decided to copy the text and send only the envelope to the police. The reasons behind this choice were not explained and to date they are still unknown.

Besides the three texts delivered to the Central News Agency, a large number of other letters and postcards were sent to several other recipients such as the press or the police between October 1888 and November 1888, that is, after the two iconic texts were made public by the police. During this period, 130 letters allegedly written by the killer were received, and the flow of letters continued for ten more years. Among these letters, another text that has become iconic and that was judged as important during the case is the ‘From Hell’ letter, which was received on 16 October by George Lusk, head of the Whitechapel Vigilance Committee, together with half of a kidney (Rumbelow, 1979).

In most of the letters, the author(s) mimicked the original ‘Dear Boss’ letter and ‘Saucy Jacky’ postcard in terms of taunting the police and using salient stylistic features, such as the laughter ‘ha ha’, or the salutation ‘Dear Boss’. Some of the letters were almost exact copies of ‘Dear Boss’, especially the ones that were received a year or more later, in conjunction with the anniversary of the murders or in conjunction with new murders in Whitechapel.


Since it is quite unlikely that the same person produced hundreds of letters spanning decades and sent from different places across the UK, it is commonly assumed that most of the letters were written by different individuals, who possibly had not been involved with any of the killings. Particularly interesting is the case of Maria Coroner, a 21-year-old girl who was caught sending one of those letters (Evans and Skinner, 2001). When questioned, she explained that she did so as she was fascinated by the case. It is likely that many of the writers of these letters acted for similar reasons, although the motives behind such actions will probably never be established. These hoax letters themselves represent an interesting mirror into the fears and problems of the people who wrote them (Remington, 2004). More importantly, these letters still exercise an impact on modern times. The Yorkshire Ripper hoaxer, for example, sent letters that borrowed several linguistic elements from the ‘Dear Boss’ letter (Ellis, 1994; Lewis, 1994).

Such a collection of letters also represents an invaluable data set for forensic linguistics and for authorship analysis. Linguistic analyses of the letters can be useful to provide new evidence for the Whitechapel murders case, since, as opposed to other sources of evidence nowadays corrupted by time, the language of the letters has reached us unchanged. The question of the authorship of the letters mostly focuses on the early ones, such as the ‘Dear Boss’ and ‘Saucy Jacky’ texts. The most common theory about the authorship of these texts is that journalists fabricated them to increase newspaper sales. The ‘enterprising journalist’ theory, more specifically, suggests that letters such as the ‘Dear Boss’ letter were actually works of fiction skilfully created to generate shock and ‘keep the business alive’ (Begg, 2004; Begg and Bennett, 2013). Evidence for the ‘enterprising journalist’ theory comes from the ‘Littlechild’ letter, in which Detective Chief Inspector John George Littlechild mentions that at Scotland Yard virtually everyone knew that the ‘Dear Boss’ letter was fabricated by Tom Bulling, a journalist of the Central News Agency itself, in collaboration with his manager (Rumbelow, 1979; Begg, 2004). At the time, the Central News Agency had been in a fierce competition with other news agencies and had a reputation for fabricating or embellishing news (Evans and Skinner, 2001; Begg, 2004). Another theory proposed by Cook (2009) suggests that a journalist named Frederick Best from the tabloid newspaper The Star was the actual author of the ‘Dear Boss’ letter.

As a first step to shed light on the authorship question of the Jack the Ripper letters, the present article reports on an authorship analysis of the texts received during and after the Whitechapel murders case that are connected to Jack the Ripper. The available data set lends itself to several authorship questions, such as the profiling of the anonymous author(s), or the comparison between some key letters and Bulling’s and Best’s writings. In the present article an initial exploration of the Jack the Ripper letters is performed with the general aim of finding out for which of the hundreds of texts there is evidence of common authorship, with special attention to the most important texts in the case mentioned above and to those earliest texts received before 1 October 1888, that is, before the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard became public knowledge.

Establishing whether some of the Jack the Ripper texts could be written by the same person is an important preliminary step as any future study, either involving profiling or comparison, would benefit from knowing if a number of questioned texts can be clustered together. In this sense, the authorship question tackled in the present study constitutes a useful starting point for any future authorship study on the Jack the Ripper letters.

2 Data

The data set used in the present study is a corpus that includes the texts connected to the Whitechapel murders: the Jack the Ripper Corpus (JRC) (see Supplementary Material). This corpus consists of the letters or postcards found and transcribed in the Appendix of Evans and Skinner (2001), who claim to have collected all of the texts involved in the Whitechapel murders related to Jack the Ripper from the Metropolitan Police files. These letters


were OCR-scanned from the book and the scans were manually checked for scanning errors. The corpus consists of 209 texts and 17,463 word tokens. The average length of a text in the corpus is eighty-three tokens (min = 7, max = 648, SD = 67.4).

The peculiarity of the JRC is that almost all of the texts in the corpus are comparable in terms of their broad situational parameters (Biber, 1994), as they are almost all written letters or postcards with similar linguistic purposes. For example, in terms of addressee, 67% of the texts were addressed to Scotland Yard; Sir Charles Warren, the head of London Metropolitan Police during that time; Inspector Abberline; or other law enforcement units. The remaining 33% were either of unknown addressee (13%), or were addressed to common citizens or to newspapers, news agencies, schools, or private firms (20%). The vast majority of the letters were postmarked or found in London, although other letters were postmarked or found in places all over the UK, such as Birmingham, Bradford, Dublin, Edinburgh, Liverpool, Manchester, or Plymouth. All of the letters were handwritten and a minority of them (4%) included drawings of various items, such as knives, skulls, or coffins. Finally, a large number of the letters (75%) were indeed signed as ‘Jack the Ripper’ or with variants of the name, such as ‘Jack the Whitechapel Ripper’, or ‘JR’, or ‘jack ripper and son’. Some other letters were not signed (11%) while the remaining letters used other pseudonyms, such as ‘Jim the Cutter’, ‘The Whore Killer’, or ‘Bill the Boweler’. The corpus ranges from 24 September 1888 to 14 October 1896, thus spanning more than 10 years after the murders. However, the majority of the texts, that is 62% of the corpus, were received during the period between October 1888 and November 1888.

Among the total set of 209 texts, the present analysis will pay special attention to those early texts that were received not later than 1 October 1888, before the content of the ‘Dear Boss’ and ‘Saucy Jacky’ was popularized by the police and the media and therefore hoaxers could have knowledge of it. By this date, according to Evans and Skinner’s (2001) collection, four texts were received:

• Text 1 (24 September, 128 word tokens): In this text the author admits to the killing of Chapman and presents the intention to stop killing. The letter is unsigned;
• Text 2 (27 September, 244 word tokens): The ‘Dear Boss’ letter;
• Text 3 (1 October, 57 word tokens): The ‘Saucy Jacky’ postcard; and
• Text 4 (1 October, 88 word tokens): This text threatens more murders and is signed as ‘Ripper’.

Even though the analysis will include all the JRC texts, these four texts are particularly important because any linguistic similarity that links them cannot be explained by influence from the media, an explanation that cannot be ruled out for the other texts. In the rest of this article, the four texts above will be called the ‘pre-publication’ texts, whereas the remaining 205 texts will be called the ‘post-publication’ texts.

3 Methodology

The authorship question considered for this study concerns finding out which texts in a corpus are likely to be written by the same author. Recently, this task has been called ‘author clustering’ and it has been tackled using hierarchical cluster analysis on frequencies of features (Gómez-Adorno et al., 2017). This authorship problem could be considered, however, just as a special case of ‘authorship verification’, a problem that has received considerable attention in the literature (Koppel and Schler, 2004; Koppel et al., 2012; Brocardo et al., 2013; Koppel, Schler and Argamon, 2013). The best solutions proposed to solve this type of problem involve the addition of distractor texts belonging to similar registers and the use of similarity metrics applied to feature sets consisting of frequencies of linguistic features.

The problem in applying any of these techniques to the JRC is that the JRC texts are too short to produce reliable frequencies, as the average text length for the corpus is only eighty-three word tokens. For this reason, in this case it is necessary to adopt a method that does not involve the computation of frequencies.

A solution to the problem of analysing short texts within a forensic linguistic context by considering


the presence or absence of features as opposed to their frequencies was initially proposed by Grant (2010) and then further described in Grant (2013) for text messages. Inspired by research in similarity between species in biology and ecology, and already applied to assess similarity in crime types, this approach consists in quantifying the similarity between two texts using the Jaccard coefficient, or the number of shared features between two texts divided by the total number of distinct features occurring in either text (Jaccard, 1912):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

After being successfully applied to the text messages case, methods using the Jaccard coefficient have been applied with good results to other registers, including newspaper articles (Juola, 2013), short emails (Johnson and Wright, 2014; Wright, 2017), and elicited personal narratives (Larner, 2014). These studies have analysed the presence/absence of combinations of words, mostly looking at word n-grams, that is, strings of words of length n collected using a moving window.

Within plagiarism detection research, word n-gram techniques based on similar mathematical principles are very common (Oakes, 2014, p. 65) on the grounds that the more shared strings there are in two documents, the more there is shared similarity of encoding of meanings and therefore the less likely it is that the documents are independent from each other, as explained by Coulthard (2004).

Word n-grams have been extensively adopted as linguistic features in traditional frequency-based stylometric methods for authorship attribution, although they are not deemed the best stylometric features, as they are often surpassed in efficacy by function words, simple word frequency, and, above all, character n-grams (Grieve, 2007; Stamatatos, 2009). Although word n-grams might not be extremely good features when frequency is taken under consideration, for a method involving presence/absence these features are much better than single words or function words because word strings are rarer and the power of a presence/absence method lies in the measurement and comparison of the linguistic uniqueness of each author on rare features. Character n-grams could also be good features but they are less amenable to interpretation, which can be a drawback depending on the ultimate goal of the research.

In addition to these methodological advantages, the use of word n-grams as features has theoretical support. Corpus linguistics (Sinclair, 1991; Biber, Conrad and Cortes, 2004; Hoey, 2005) and psycholinguistics/cognitive linguistics (Langacker, 1987; Barlow and Kemmer, 2000; Schmitt, 2004; Wray, 2005; Schmid, 2016) have long theorized that combination of words is at the core of language processing and empirical support has been found for these theories (Ellis and Simpson-Vlach, 2009; Tremblay et al., 2009).

Furthermore, there is also empirical support for a strong idiolectal effect in the production and processing of word combinations (Mollin, 2009; Barlow, 2013; Schmid and Mantlik, 2015; Günther, 2016). Wright (2017) reveals the idiolectal nature of certain word n-grams by taking one specific speech act as constant and then analysing how different authors realize this act, uncovering that each author resorts to their own idiosyncratic set of lexical choices to perform the same act.

In the present study, for the reasons explained above, the set of features that is taken under consideration is word n-grams, as the ultimate goal is to discover possible idiolectal encoding in the JRC letters. Because the JRC texts are short, presence or absence of word n-grams is considered, as opposed to their frequency. Among all the possible sizes of n-grams, word 2-grams are chosen as any n-gram of n > 2 is ultimately made up of n-grams of n = 2, meaning that word 2-grams return the most complete picture of the shared word combinations in two sets. Presence or absence of word n-grams is quantified using the Jaccard ‘distance’, as opposed to the coefficient, which can be defined as:

$$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$$

and which returns values between 0, or absolute identity, and 1, or absolute distance. The Jaccard distance is used so that a hierarchical cluster analysis can then be carried out. In this way, it is possible to first find out the major groups of texts that are more


similar to each other, and then it is possible to zoom in and explore smaller groups of letters, such as the pre-publication letters.

However, evidence of common authorship of two sets of documents can come not only from finding similarity but also from establishing that this similarity is distinctive (Grant, 2010, 2013). Although it is difficult to establish a universal threshold for distinctiveness, it is safe to assume that if a particular n-gram or lexicogrammatical structure does not occur at all or occurs extremely infrequently in a comparable reference corpus then this n-gram or structure is distinctive.

The comparison corpus used to assess distinctiveness should therefore include relevant population data (Turell and Gavaldà, 2013; Wright, 2017). If a smaller sub-sample of its texts is considered, the remainder of the JRC itself is indeed a corpus with relevant population data. However, because of its relatively small size, more data from 19th century English is necessary to find evidence of distinctiveness. Ideally, because of the pervasiveness of register variation, the perfect comparison corpus would be one including a large number of 19th century English letters of comparable communicative situation (Biber, 2012). However, in the absence of an extensive resource of this kind, the most comprehensive available set of general reference corpora was used instead, consisting of the largest available corpora of 19th century English:

• The 132 million word 19th century section of the Corpus of Historical American English (COHA);
• The 34 million word Corpus of Late Modern English Texts 3 (CLMET3), spanning from 1710 to 1920;
• The 19 million word Extended Old Bailey Corpus (EOBC), including the proceedings of the Old Bailey from 1720 to 1913.

In sum, the method adopted in this study involves the comparison of all the texts in the JRC to each other using the Jaccard distance and a set of comparison corpora to find whether there are texts that are similar and distinctive in their linguistic encoding.

In addition, since the analysis involves word n-gram ‘types’, the method faces problems when dealing with texts of different length, as the likelihood of any word or n-gram type being observed is correlated with text length. However, provided that the shared n-grams found are also highly distinctive, the evidence of common authorship is nonetheless valid despite differences in text lengths.

4 Results

Figure 1 reveals that the relationship between the percentage of texts using a 2-gram (occurring in at least two texts) and its frequency rank forms a Zipfian shape, as expected (Zipf, 1935). The graph shows that the top eight 2-grams appear in at least 20% of the corpus. Some of these are very frequent because they reflect common grammatical structures of English, such as ‘I am’, ‘I have’, ‘I will’. Two 2-grams reflect the influence of the signature and salutation of the ‘Dear Boss’ letter on the rest of the corpus: ‘jack the’ and ‘dear boss’. Finally, the high incidence of the 2-grams ‘I shall’ and ‘yours truly’ is probably explained by both the influence of the ‘Dear Boss’ letter and the register of the letters.

Because of their frequent occurrence and thus reduced discriminatory power, these top eight 2-grams were excluded from further analysis.

The distance between each pair of texts was quantified using the Jaccard distance based on the presence or absence of the remaining 1541 word 2-grams and a distance matrix was therefore generated. Figure 2 shows a histogram and boxplot of the Jaccard distances for all possible pairs of texts in the JRC.

As the histogram of Fig. 2 shows, the most frequent Jaccard distance and also the median distance is approximately 1, which generally speaking means that the texts in the JRC are not very similar to each other. Only 25% of the scores are lower than 0.98, which is marked in Fig. 2 by the leftmost edge of the boxplot, and only 6% of the scores are lower than 0.95, that is, the outliers in the boxplot of Fig. 2 indicated by circles.

The distance matrix was then used for a hierarchical cluster analysis that can be visualized through the radial dendrogram in Fig. 3.
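As a rough illustration of the steps described in this section, the following Python sketch builds word 2-gram sets, keeps only the 2-grams that occur in at least two texts, optionally drops the most widespread ones, and fills a Jaccard distance matrix. It is not the code used in the study: the whitespace tokenisation, the function names, the `drop_top` parameter, and the handling of empty sets are all assumptions made for the example, and the toy sentences are invented rather than taken from the JRC.

```python
from collections import Counter
from itertools import combinations


def word_bigrams(text):
    """Set of word 2-gram types in a text (lowercased, whitespace-tokenised: an assumption)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def select_features(bigram_sets, min_texts=2, drop_top=0):
    """Keep 2-grams found in at least `min_texts` texts, minus the `drop_top` most widespread."""
    doc_freq = Counter(bg for s in bigram_sets for bg in s)
    shared = [(bg, n) for bg, n in doc_freq.items() if n >= min_texts]
    # Rank by number of texts (descending); ties are broken alphabetically.
    shared.sort(key=lambda item: (-item[1], item[0]))
    return {bg for bg, _ in shared[drop_top:]}


def jaccard_distance(a, b):
    """1 - |A n B| / |A u B|; two empty sets count as maximally distant (an assumption)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(texts, min_texts=2, drop_top=0):
    """Symmetric matrix of Jaccard distances over the selected 2-gram types."""
    sets = [word_bigrams(t) for t in texts]
    features = select_features(sets, min_texts, drop_top)
    sets = [s & features for s in sets]
    n = len(sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = jaccard_distance(sets[i], sets[j])
    return matrix


def share_below(matrix, threshold):
    """Proportion of text pairs whose distance is below `threshold`."""
    pairs = [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]
    return sum(d < threshold for d in pairs) / len(pairs)


# Invented toy sentences, not JRC material.
toy_texts = [
    "i shall write again soon",
    "i shall write to you",
    "you will never catch me",
    "they will never catch me again soon",
]
matrix = distance_matrix(toy_texts)
for row in matrix:
    print([round(d, 2) for d in row])
print(share_below(matrix, 0.5))
```

With a corpus like the JRC, `drop_top=8` would mirror the exclusion of the top eight 2-grams, and `share_below` corresponds to statements such as the share of pairs with a distance lower than 0.95.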


Fig. 1 Relationship between rank and percentage of occurrence for each word 2-gram in the JRC occurring in at least two texts

Fig. 2 Histogram and boxplot showing the distribution of Jaccard distance values for all possible pairs of texts in the JRC

Three main branches stem from the centre of the graph in Fig. 3, corresponding to the three main clusters found. On the right, there are two main clusters, one of which includes only two texts. The remaining texts are all classified into another cluster whose branch points to the left and that further splits into two other clusters that roughly correspond to the two hemispheres of the graph. The most historically interesting texts, including the pre-publication texts, are all grouped in the cluster spanning over the top hemisphere of the graph and therefore the rest of the article will focus on this cluster. Although it would be interesting to explore the other clusters, this is beyond the scope and space of this study. The branch leading to the top hemisphere then splits even further into two more sub-branches, one developing to the right containing the ‘From Hell’ letter, and one to the left where all the other historically important letters, including the pre-publication texts, are grouped. The split at this level suggests that the ‘From Hell’ letter is rather linguistically dissimilar to the other famous letters, at least in terms of word 2-gram use. The left branch then splits into two more clusters, with the rightmost one splitting again into two large clusters. One of these two contains the ‘Dear Boss’ letter, the ‘Saucy Jacky’ postcard, and the ‘Moab and Midian’ letter, while the one next to it contains two of the pre-publication letters. Therefore, among the pre-publication JRC texts, ‘Dear Boss’ and ‘Saucy Jacky’ are the most similar pair, with the ‘Moab and Midian’ letter being the most similar to them among all the historically important texts.


Fig. 3 Radial dendrogram displaying the results of a hierarchical cluster analysis of the JRC corpus using the Ward
method based on Jaccard distances. The name of the texts is a code starting with two letters from the signature and
followed by the date in which it was received. The texts mentioned in the introduction, including the pre-publication
texts, contain their name in addition to the code

4.1 The pre-publication texts

Let us therefore examine the pre-publication texts using a network graph as in Fig. 4, in which each circle represents a text with a size proportional to the text’s length in total word tokens and each link represents an overlapping word 2-gram type. The graph also reports the Jaccard distance for each pair of texts.

As the cluster analysis already suggested, it is evident that the two pre-publication texts that are more similar to each other are the ‘Dear Boss’


letter and the ‘Saucy Jacky’ postcard. Additionally, these two texts have a Jaccard distance of 0.93, and a distance this low is found in less than 5% of the pairs of texts in the JRC. The amount of shared language is striking considering the fact that the ‘Saucy Jacky’ postcard is very short and does not share any linguistic link with either the 24 September text (Text 1) or the other 1 October text (Text 4). Although the ‘Dear Boss’ letter shares a number of 2-grams with both Text 1 and Text 4, the Jaccard score for both pairs is about average for the corpus.

Excluding the 3-gram ‘Jack the Ripper’, which refers to the signatures of the two texts, Table 1 below presents the concordances of their overlapping 2-grams, with an analysis of their syntactic structure.

A closer examination reveals that the two texts share 2-grams of varying distinctiveness. The phrase ‘a bit’, although with different syntactic function (1), the verbs ‘give’ (2) and ‘got’ (3), or the use of the infinitive verb ‘to get’ (4) are common structures that are frequently found both in the JRC and in reference corpora of 19th century English. The use of ‘till’ as a variant of ‘until’ (5) is also not very distinctive as it is the predominant variant in the JRC (80%), CLMET3 (75%), and EOBC (90%) but not in 19th century COHA (28%).

The two texts also share the use of infinitive clauses to post-modify the noun ‘time’ with a negation in the matrix clause (6), which occurs in only two other texts in the JRC. The structure is quite rare even at a more general level, as it is found about ten to eighteen times per million words across the reference corpora.

‘Dear Boss’ and ‘Saucy Jacky’ also share the use of the verb ‘work’ to euphemistically indicate the act of killing (7). This use of ‘work’ is found in some post-publication JRC texts (about 20% of the texts in the corpus). It is very difficult to estimate distinctiveness for (7) using larger reference corpora, however, as it would involve the manual analysis of thousands of instances.

Finally, the two texts share the use of a verb phrase headed by the phrasal verb ‘to keep back’

Table 1 Syntactic analysis of the concordances for the 2-grams in common between Dear Boss and Saucy Jacky
1 till I do [NP a bit more work] (Dear Boss)
number one squealed [ADVP a bit] (Saucy Jacky)
2 [NP I] [VP gave [NP the lady] [NP no time to squeal]] (Dear Boss)
[NP I] [VP gave [NP you] [NP the tip]] (Saucy Jacky)
3 [NP I] [VP got [NP all the red ink] [Part off]] (Dear Boss)
till [NP I] [VP got [INFCL to work again]] (Saucy Jacky)
4 I want [INFCL to get [INFCL to work]] (Dear Boss)
had not time [INFCL to get [NP ears]] (Saucy Jacky)
5 [SUB till] [CL [NP I] [VP do get buckled]] (Dear Boss)
[SUB till] [CL [NP I] [VP do a bit more work]] (Dear Boss)
[SUB till] [CL [NP I] [VP got to work again]] (Saucy Jacky)
6 [NP no time [INFCL to squeal]] (Dear Boss)
had not [NP time [INFCL to get ears]] (Saucy Jacky)
7 I want to get [INFCL to work] (Dear Boss)
till I got [INFCL to work again] (Saucy Jacky)
8 [VP keep [NP this letter] [PART back] [SUBCL till I do]] (Dear Boss)
thanks for [VP keeping [NP last letter] [PART back] [SUBCL till I got to work]] (Saucy Jacky)


with the direct object being a noun phrase with ‘letter’ as head followed by a subordinate clause introduced by the subordinator ‘till’ (8). Indeed, the 4-gram ‘letter back till I’ itself neither occurs in any other text in the JRC, nor in any other corpus of 19th century English listed above. Because of the rarity of this 4-gram and the absence of a more relevant large corpus of 19th century English letters, a search of this 4-gram was performed on the web, which returned a total of 2,640 hits, all from exact copies of either the ‘Dear Boss’ letter or of the ‘Saucy Jacky’ postcard.

A search was then performed using the 19th century English corpora listed above on the overall phrasal verb ‘keep back’ used with the meaning of withholding a letter as opposed to the use of the other synonyms ‘keep’ (without ‘back’), ‘hold back’, ‘hold up’, ‘hold out’, ‘withhold’, ‘delay’ plus any verb indicating ‘sending’, ‘refrain’ plus any verb indicating ‘sending’, and ‘detain’. The queries were carried out by searching for occurrences of the lemma of these variants accompanied by the lemma LETTER within a span of ± seven words. The concordances were then manually examined to count only instances of the meaning of ‘withholding a letter, delay a letter to be sent’.

This corpus search revealed that ‘keep back’ is found 22.5% of the times the meaning of ‘delay sending a letter’ is expressed across the reference corpora. In the 19th century section of COHA, the majority of the instances (59%) use the variant ‘withhold’. Out of seven instances of ‘keep back’, four are from one author, John Townsend Trowbridge. In CLMET3 the most common variant is again ‘withhold’. One judge also uses ‘keep back’ in the EOBC, where the most common variant is instead ‘detain’. Finally, in the JRC, only three instances of this meaning are found, two of which are instances of ‘keep (quiet)’ found in two letters dated, respectively, 20 October (‘keep this quiret

Fig. 4 Network graph visualizing the relationships between the pre-publication texts. The size of each node is
proportional to each text’s length. Each edge represents a shared word 2-gram. For each pair of texts the Jaccard
distance is also reported. Distances are rounded up


[sic] till I have done one’) and 09 November 1888 (‘keep this letter a bit quiert [sic] till you here of me again’). The third one is found in the ‘Moab and Midian’ letter and it is the only instance across all the corpora to exactly match the syntactic structure in (8), having the object in between the main verb and the particle as well as a subordinate clause introduced by the subordinator ‘till’ (‘keep this back till three are wiped out’).

In conclusion, among the four pre-publication texts, these results support the hypothesis that the ‘Dear Boss’ and ‘Saucy Jacky’ texts were not written independently from each other, since these two texts are more similar to each other in their use of word 2-grams than 95% of all the other possible pairs of texts in the JRC even though the texts received later could have been influenced by them, and since some of these similarities are also distinctive.

4.2 The post-publication texts
Having established a link between the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard, let us now explore the post-publication texts to determine whether further links between these two texts and other texts can be found.

As Fig. 5 indicates, only eight JRC texts have a Jaccard distance lower than 0.95 with ‘Dear Boss’, including ‘Saucy Jacky’ (dJ = 0.929) and ‘Moab and Midian’ (dJ = 0.934), which are both therefore more similar to ‘Dear Boss’ than 95% of the JRC. The most similar text to ‘Dear Boss’ is, however, JR_191188, with a Jaccard distance of 0.776. This is not reported in Fig. 5 to ease the visualization of the boxplots. However, this text can be discounted as its anomalous score is explained by the fact that most of it was copied verbatim from ‘Dear Boss’, as the presence of an overlapping 13-gram demonstrates:

I want to get to work right away if I get a chance and will do another one indoors. (JR_191188)

My knife’s so nice and sharp I want to get to work right away if I get a chance. (Dear Boss)

This is somewhat expected in the post-publication texts, as the ‘Dear Boss’ and ‘Saucy Jacky’ were in the public domain.

Fig. 5 Boxplots showing Jaccard scores for Dear Boss (left) and Saucy Jacky (right) and all the other texts in the JRC
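
The threshold comparisons read off the boxplots amount to counting how many texts fall below a given distance from a target text. A minimal sketch follows; the function names are hypothetical, and apart from the two distances quoted in the article (0.929 and 0.934) the values are invented placeholders, not JRC data.

```python
def texts_within(distances, threshold):
    """Names of texts whose distance to the target is strictly below threshold."""
    return sorted(name for name, d in distances.items() if d < threshold)


def share_more_distant(distances, name):
    """Proportion of the other texts that lie farther from the target than `name`."""
    others = [d for other, d in distances.items() if other != name]
    if not others:
        return 0.0
    return sum(d > distances[name] for d in others) / len(others)


# Distances to 'Dear Boss'. Only the first two values are quoted in the
# article; text_a to text_d are hypothetical placeholders.
to_dear_boss = {
    "saucy_jacky": 0.929,
    "moab_midian": 0.934,
    "text_a": 0.97,
    "text_b": 1.0,
    "text_c": 0.99,
    "text_d": 1.0,
}

print(texts_within(to_dear_boss, 0.95))
print(share_more_distant(to_dear_boss, "moab_midian"))
```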


For ‘Saucy Jacky’, Fig. 5 indicates that the median score is 1 and that 50% of the texts in the JRC therefore have almost no linguistic link with it. Only twelve JRC texts have a Jaccard distance lower than 0.96, and, among these, the ‘Moab and Midian’ letter is even more striking, as its Jaccard score with ‘Saucy Jacky’ is 0.90, which is 0.03 points smaller than the second most similar text, the ‘Dear Boss’ letter.

From this analysis, it is evident that the ‘Moab and Midian’ letter not only is the most similar to ‘Saucy Jacky’ of all the other texts in the JRC, but it is also almost as close as ‘Dear Boss’ is to ‘Saucy Jacky’ and, more importantly, it is the only text that is very close to both ‘Saucy Jacky’ and ‘Dear Boss’ (with the exclusion of the JR_191188 that contains a 13-gram copied word-by-word).

Table 2 presents the 2-grams and their underlying syntactic structures shared by ‘Moab and Midian’ and either ‘Dear Boss’ or ‘Saucy Jacky’. ‘Midian’ shares with the two pre-publication texts as well as

Table 2 Syntactic analysis of the concordances for the n-grams in common between Dear Boss and Saucy Jacky and the
Moab and Midian letters
1 till I do [NP a bit more work] (Dear Boss)
number one squealed [ADVP a bit] (Saucy Jacky)
will send you [NP a bit of face] by post (Midian)
2 I love [NP my work] (Dear Boss)
The police now reckon [NP my work] a practical joke (Midian)
3 you ll hear about [NP [NP saucy Jacky] [Gen s] [N work]] (Saucy Jacky)
well well [NP Jacky] [VP ’s [NP a very practical joker]] (Midian)
4 ripping them till [NP I] [VP do [VP get buckled.]] (Dear Boss)
The next job [NP I] [VP do] I shall clip (Dear Boss)
[NP I] [VP do] a bit more work (Dear Boss)
Do as [NP I] [VP do] and the light of glory (Midian)
5 I keep on hearing [NP the police] have caught me (Dear Boss)
and send to [NP the police officers] (Dear Boss)
[NP The police] now reckon (Midian)
6 [NP [ADJ Grand] [N work]] the last job was (Dear Boss)
helps me in my [NP [ADJ grand] [N work]] (Midian)
7 is fit enough I hope [INTJ ha. ha.] (Dear Boss)
They say I’m a doctor now [INTJ ha ha] (Dear Boss)
Jacky’s a very practical joker [INTJ ha ha ha] (Midian)
8 I wasnt codding [NP dear old Boss] (Saucy Jacky)
I promise this [NP dear old Boss] (Midian)
9 [VP keep [NP this letter] [PART back] [SUBCL till I do]] (Dear Boss)
thanks for [VP keeping [NP last letter] [PART back] [SUBCL till I got to work]] (Saucy Jacky)
[VP Keep [NP this] [PART back] [SUBCL till three are wiped out]] (Midian)
10 [CL . . . [NP saucy Jacky s work] [ADVP tomorrow]] [NP double event] [NP this time] (Saucy Jacky)
[CL I must get [INFCL to work] [ADVP tomorrow]] [NP treble event] [NP this time] (Midian)


with several other JRC texts the use of the phrase ‘a bit’ (1) and the verb ‘work’ to euphemistically mean ‘kill’ (2). ‘Midian’ and ‘Saucy Jacky’ also share the use of the pseudonym ‘Jacky’, although the 2-gram ‘Jacky s’ is only a surface similarity, as its underlying syntactic structure is very different (3). ‘Midian’ also presents the use of a pro-verb ‘do’ (4) and it mentions the police (5), similarly to ‘Dear Boss’. The adjective ‘grand’ to modify ‘work’ (6), the interjection ‘ha ha’ (7), and the vocative ‘dear old boss’ (8) are features that have been copied by other authors of the JRC texts, as they appear in, respectively, three, eight, and fifty-five other JRC texts.

The two most distinctive structures are the verb phrase headed by ‘keep back’ (9), already discussed above, and the use of a verbless clause, ADJ ‘–ble event this time’, elaborating the previous clause ending with the adverb ‘tomorrow’ (10). This last syntactic structure is underlying the 2-gram ‘work tomorrow’ and the 3-gram ‘event this time’, which do not appear in any other JRC text.

The 2-gram ‘work tomorrow’ is surprisingly infrequent in the reference corpora (0.03–0.05 per million words) while the 3-gram ‘event this time’ cannot be found at all. Although the 3-gram can be found on the web (617,000 hits), a search of the two n-grams together returns almost only instances of either ‘Saucy Jacky’ or ‘Moab and Midian’.

In conclusion, there is linguistic evidence in support of the hypothesis that the ‘Moab and Midian’ letter has an authorship link with the other two pre-publication texts, even accounting for the fact that ‘Dear Boss’ and ‘Saucy Jacky’ were publicly available at the time ‘Midian’ was received.

5 Discussion
The analysis of the n-gram types reported above suggests that the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard share distinctive linguistic similarities. Because authorship analysis studies demonstrated that common strings or rare collocations shared by documents are indicative of a common authorial source (Coulthard, 2004; Mollin, 2009; Johnson and Wright, 2014), given that the ‘Dear Boss’ letter was not made public before the ‘Saucy Jacky’ postcard was sent, the degree of their shared linguistic encoding is highly suggestive of the two documents not being produced independently. Although it is entirely possible that one author was responsible for all of the earlier texts, the linguistic evidence found so far can only suggest a link between the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard while no strong links can be found between these two texts and the other two pre-publication texts.

Among the evidence of a link between the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard, the strongest piece of evidence is the presence of a shared distinctive 4-gram, ‘letter back till I’. The syntactic structure underlying this 4-gram is a verb phrase headed by a phrasal verb that, used within that particular structure underlying that particular unit of meaning, is also rare and distinctive overall. The presence of this 4-gram and of this structure thus supports the hypothesis that the two texts were written by the same person. This conclusion is substantiated by the fact that despite the presence of about 200 texts trying to imitate the style of the ‘Dear Boss’ letter or ‘Saucy Jacky’ postcard, no other text has managed to reproduce this structure or 4-gram, which indeed this analysis has proved to be the real distinctive feature of these two texts.

The only exception is the ‘Moab and Midian’ letter, which does not use the 4-gram but contains an instance of ‘keep back’ meaning ‘to withhold’, including the co-selection of the position of the object and of the adverbial clause introduced by ‘till’. Furthermore, the ‘Moab and Midian’ letter also shares another distinctive lexicogrammatical structure with ‘Saucy Jacky’, the verbless clause ADJ ‘–ble event this time’ which elaborates the previous clause ending with the adverb ‘tomorrow’. It is not possible to discount that the author of this letter was simply more skilled in copying the style of ‘Dear Boss’ than others, as by the time the ‘Moab and Midian’ letter was received all the earliest texts were publicly available. However, the ‘Moab and Midian’ letter is striking in also being the most similar letter in terms of the number of shared word 2-grams, even despite the fact that probably hundreds of other authors tried to imitate the style of ‘Dear Boss’ and ‘Saucy Jacky’.
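
The claim that a given n-gram is reproduced by no other text rests on a simple check of which texts in a corpus contain it. A minimal sketch of such a check follows, assuming a simple regular-expression tokenizer and hypothetical function names; the toy corpus consists of fragments quoted in Tables 1 and 2, not of the full JRC.

```python
import re


def tokens(text):
    """Lowercase a text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9'’]+", text.lower())


def contains_ngram(text, ngram):
    """True if the word n-gram occurs as consecutive tokens in the text."""
    toks, target = tokens(text), tokens(ngram)
    n = len(target)
    if n == 0:
        return False
    return any(toks[i:i + n] == target for i in range(len(toks) - n + 1))


def texts_containing(corpus, ngram):
    """Sorted names of the texts in which the n-gram occurs."""
    return sorted(name for name, body in corpus.items()
                  if contains_ngram(body, ngram))


# Toy corpus: fragments quoted in Tables 1 and 2, not the full letters.
toy_corpus = {
    "dear_boss": "keep this letter back till I do a bit more work",
    "saucy_jacky": "thanks for keeping last letter back till I got to work again",
    "moab_midian": "Keep this back till three are wiped out",
}

print(texts_containing(toy_corpus, "letter back till I"))
print(texts_containing(toy_corpus, "back till"))
```

On these fragments, the 4-gram is found only in the ‘Dear Boss’ and ‘Saucy Jacky’ entries, whereas the shorter 2-gram ‘back till’ also occurs in the ‘Moab and Midian’ fragment.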


The analysis also points out that there is no link between the ‘From Hell’ letter and the other historically important texts in the case. Although this lack of link does not constitute evidence that they were not written by the same person, this finding does lend some support to the initial presuppositions of other scholars that ‘Dear Boss’ and ‘Saucy Jacky’ are independent from the ‘From Hell’ letter (Rumbelow, 1979). This and many other letters in the JRC texts can be analysed in more detail in the future.

Historically speaking, the comparison presented between the earliest letters ever received in the Whitechapel murders case provides linguistic evidence supporting the hypothesis that the two most iconic texts sent during the case were written by the same person. Although several scholars have already commented on the similarity of the handwriting of the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard, the common authorship of these two texts has never been established with certainty. The present analysis, however, found linguistic evidence that supports the common authorship of these two texts. Future analyses focused on their profiling or on the comparison with known writings of suspect authors can thus take as point of departure a link between these two texts.

Additionally, of great historical importance is also the link found between the two earlier iconic texts and the ‘Moab and Midian’ letter, since this text is one of the most controversial in the JRC. Besides being the third and last letter that was ever sent to the Central News Agency, after ‘Dear Boss’ and ‘Saucy Jacky’, Bulling’s decision of sending a copy of the ‘Moab and Midian’ letter instead of the original was never justified by the journalist and still remains suspiciously unexplained (Evans and Skinner, 2001). The linguistic link found between these three texts is therefore far from coincidental in the light of the other non-linguistic evidence and significantly contributes to the debate on the origin of the letter.

The present analysis is also successful in presenting serious implications for modern research in forensic linguistics and authorship analysis. The JRC is a corpus made up of texts the majority of which was fabricated by individuals that were imitating the style of the ‘Dear Boss’ letter and of the ‘Saucy Jacky’ postcard. However, it is evident that none of the authors of these texts successfully managed to individuate that the real linguistic distinctiveness consisted in a seemingly common string such as ‘letter back till I’, or in the phrasal verb ‘keep back’ and its underlying structure, or even in simply the presence of the meaning of ‘withhold this letter’, found in only two other Jack the Ripper texts but encoded differently. Instead, impostors imitated structures such as the salutation ‘Dear Boss’. Quantitatively speaking, despite the presence of these letters in full in the public domain, only a very limited percentage of them presents substantial linguistic similarities, implying that techniques such as the analysis of short texts using similarity measures such as the Jaccard coefficient are quite effective in filtering this type of noise.

Theoretically, the results presented in this article also contribute to the understanding of idiolect. A superficial reading of most of the JRC letters would only reveal their similarities in terms of meanings, themes, purposes, and some phraseology. However, this analysis has revealed that by investigating the way these meanings, themes, and purposes are encoded linguistically uniqueness emerges, as demonstrated by the relatively low average Jaccard distances between the letters. As shown by Wright (2017) for short emails, although meanings and speech acts can be shared, it is the way they are encoded in words and syntactic structures that tends to be idiosyncratic or unique.

6 Conclusions
In this article, an analysis of the texts sent during the Whitechapel murders case was presented. This analysis found linguistic evidence that supports the hypothesis that the two most iconic texts signed as ‘Jack the Ripper’, the ‘Dear Boss’ letter and the ‘Saucy Jacky’ postcard, have been written by the same person. Because of the number and the distinctiveness of the linguistic similarities, it is likely that an authorial link also exists between these two texts and a third letter sent to the same recipient, the


‘Moab and Midian’ letter. These results constitute new forensic evidence in the Jack the Ripper case after more than 100 years, even though they do not reveal information about the identity of the killer(s).

Besides the historical and forensic implications, the results presented in this article also have interesting consequences for modern research in authorship analysis, forensic linguistics, and research on idiolect. The results in this article present additional evidence that uniqueness in linguistic production can be found in the way meaning is encoded and that this encoding of meaning can be difficult to imitate.

Supplementary Data
Supplementary data are available at LLC online.

References
Barlow, M. (2013). Individual differences and usage-based grammar. International Journal of Corpus Linguistics, 18: 443–78.
Barlow, M. and Kemmer, S. (2000). Usage-Based Models of Language. Cambridge: Cambridge University Press.
Begg, P. (2004). Jack the Ripper: The Definitive History. Harlow: Longman.
Begg, P. and Bennett, J. G. (2013). The Complete and Essential Jack the Ripper. London: Penguin Books.
Biber, D. (1994). Register and social dialect variation: an integrated approach. In Biber, D. and Finegan, E. (eds), Sociolinguistic Perspectives on Register. Oxford: Oxford University Press, pp. 315–47.
Biber, D. (2012). Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8: 9–37.
Biber, D., Conrad, S., and Cortes, V. (2004). If you look at . . .: lexical bundles in university teaching and textbooks. Applied Linguistics, 25: 371–405.
Brocardo, M. L., Traore, I., Saad, S., and Woungang, I. (2013). Authorship verification for short messages using stylometry. In 2013 International Conference on Computer, Information and Telecommunication Systems (CITS), IEEE, pp. 1–6.
Cook, A. (2009). Jack the Ripper. Stroud: Amberley.
Coulthard, M. (2004). Author identification, idiolect, and linguistic uniqueness. Applied Linguistics, 25: 431–47.
Eighteen-Bisang, R. (2005). Dracula, Jack the Ripper and a Thirst for Blood. Ripperologist, 60: 3–12.
Ellis, N. C. and Simpson-Vlach, R. (2009). Formulaic language in native speakers: triangulating psycholinguistics, corpus linguistics, and education. Corpus Linguistics and Linguistic Theory, 5: 61–78.
Ellis, S. (1994). The Yorkshire Ripper enquiry: part I. Forensic Linguistics, 1: 197–206.
Evans, S. P. and Skinner, K. (2001). Jack the Ripper: Letters from Hell. Stroud: Sutton.
Gómez-Adorno, H., Aleman, Y., Vilariño, D., Sanchez-Perez, M. A., Pinto, D., and Sidorov, G. (2017). Author clustering using hierarchical clustering analysis – notebook for PAN at CLEF 2017. In Cappellato, L., Ferro, N., Goeuriot, L., and Mandl, T. (eds), CLEF 2017 Working Notes. CEUR Workshop Proceedings. Dublin, Ireland: CLEF and CEUR-WS.org.
Grant, T. (2010). Txt 4n6: idiolect free authorship analysis. In Coulthard, M. (ed.), Routledge Handbook of Forensic Linguistics. London: Routledge, pp. 508–23.
Grant, T. (2013). TXT 4N6: method, consistency, and distinctiveness in the analysis of SMS text messages. Journal of Law and Policy, 21: 467–94.
Grieve, J. (2007). Quantitative authorship attribution: an evaluation of techniques. Literary and Linguistic Computing, 22: 251–70.
Günther, F. (2016). Constructions in Cognitive Contexts: Why Individuals Matter in Linguistic Relativity Research. Berlin; Boston: Walter de Gruyter.
Haggard, R. F. (2007). Jack the Ripper as the threat of outcast London. In Warwick, A. and Willis, M. (eds), Jack the Ripper: Media, Culture, History. Manchester; New York, NY: Manchester University Press.
Hoey, M. (2005). Lexical Priming: A New Theory of Words and Language. London: Routledge.
Jaccard, P. (1912). The distribution of the Flora in the Alpine Zone. New Phytologist, 11: 37–50.
Johnson, A. and Wright, D. (2014). Identifying idiolect in forensic authorship attribution: an N-gram textbite approach. Language and Law/Linguagem E Direito, 1: 37–69.
Juola, P. (2013). Stylometry and immigration: a case study. Journal of Law and Policy, 21: 287–98.
Keppel, R., Weis, J., Brown, K., and Welch, K. (2005). The Jack the Ripper murders: a modus operandi and signature analysis of the 1888–1891 Whitechapel murders. Journal of Investigative Psychology and Offender Profiling, 2: 1–21.
Koppel, M. and Schler, J. (2004). Authorship verification as a one-class classification problem. In Proceedings of the 21st International Conference on Machine Learning. ACM, Banff, Alberta, Canada, pp. 62–7.
Koppel, M., Schler, J., and Argamon, S. (2013). Authorship attribution: what’s easy and what’s hard? Journal of Law and Policy, 21: 317–31.
Koppel, M., Schler, J., Argamon, S., and Winter, Y. (2012). The ‘fundamental problem’ of authorship attribution. English Studies, 93: 284–91.
Langacker, R. W. (1987). Foundations of Cognitive Grammar. Stanford, CA: Stanford University Press.
Larner, S. (2014). A preliminary investigation into the use of fixed formulaic sequences as a marker of authorship. International Journal of Speech, Language and the Law, 21: 1–22.
Lewis, J. W. (1994). The Yorkshire Ripper enquiry: part I. Forensic Linguistics, 1: 207–16.
Mollin, S. (2009). ‘I entirely understand’ is a blairism: the methodology of identifying idiolectal collocations. International Journal of Corpus Linguistics, 14: 367–92.
Oakes, M. P. (2014). Literary Detective Work on the Computer. Amsterdam: John Benjamins Publishing Company.
Perry Curtis, L. (2001). Jack the Ripper and the London Press. New Haven; London: Yale University Press.
Remington, T. (2004). Dear boss: hoax as popular communal narrative in the case of the Jack the Ripper letters. Journal of Criminal Justice and Popular Culture, 10: 199–222.
Rumbelow, D. (1979). The Complete Jack the Ripper. London: W. H. Allen.
Schmid, H.-J. (2016). A framework for understanding linguistic entrenchment and its psychological foundations. In Entrenchment and the Psychology of Language Learning: How We Reorganize and Adapt Linguistic Knowledge. Berlin: De Gruyter Mouton, pp. 9–36.
Schmid, H.-J. and Mantlik, A. (2015). Entrenchment in historical corpora? Reconstructing dead authors’ minds from their usage profiles. Anglia, 133: 583–623.
Schmitt, N. (2004). Formulaic Sequences: Acquisition, Processing, and Use. Amsterdam; Philadelphia: John Benjamins.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Stamatatos, E. (2009). A survey of modern authorship attribution methods. Journal of the American Society for Information Science and Technology, 60: 538–56.
Storey, N. (2012). The Dracula Secrets: Jack the Ripper and the Darkest Sources of Bram Stoker. Stroud: History Press.
Sugden, P. (2002). The Complete History of Jack the Ripper. London: Robinson.
Tremblay, A., Derwing, B., and Libben, G. (2009). Are lexical bundles stored and processed as single units? Working Papers of the Linguistics Circle. University of Victoria, vol. 19. pp. 258–79.
Tropp, M. (1999). Images of Fear: How Horror Stories Helped Shape Modern Culture (1818-1918). Jefferson, NC: McFarland & Co.
Turell, M. T. and Gavaldà, N. (2013). Towards an index of idiolectal similitude (or distance) in forensic authorship analysis. Journal of Law and Policy, 21: 495–514.
Walkowitz, J. (1982). Jack the Ripper and the myth of male violence. Feminist Studies, 8: 542–74.
Wray, A. (2005). Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Wright, D. (2017). Using word N-grams to identify authors and idiolects: a corpus approach to a forensic linguistic problem. International Journal of Corpus Linguistics, 22: 212–41.
Zipf, G. (1935). The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston: Houghton Mifflin.
