Quantitative Corpus Linguistics with R, Second Edition
This book is dedicated to the people who have been so kind as to be part of what I might
self-deprecatingly call my ‘support network’; they are in alphabetical order of last names:
PMC, SCD, MIF, BH, S[LW], MN, H[RW], and DS – I am very grateful for all you’ve done and for all your tolerance over the last year or so! I wish to thank the team at Routledge
for their interest in, and support of, a second edition of this textbook; also, I am grateful
to the members of my corpus linguistics and statistics newsgroups for their questions,
suggestions, and feedback on various issues and topics that have now made it into this
second edition. Finally, I am grateful to many students and participants of classes, summer
schools, and workshops/bootcamps where parts of this book were used.
1 Introduction
Unlike many other introductions to corpus linguistics, this book will teach you little about:
•• the history of corpus linguistics: Kaeding, Fries, early 1m-word corpora, up to the
contemporary giga corpora and the still lively web-as-corpus discussion;
•• how to compile corpora: size, sampling, balancedness, representativity;
•• how to create corpus markup and annotation: lemmatization, tagging, parsing;
•• kinds and examples of corpora: synchronic vs. diachronic, annotated vs. unannotated;
•• what kinds of corpus-linguistic research have been done.
That is to say, rather than telling you about the discipline of corpus linguistics – its history, its place in linguistics, its contributions to different fields, etc. (see McEnery and Hardie 2011 for an excellent recent introduction) – with this book, I will ‘only’ teach you how to do corpus-linguistic data processing with the programming language R. In other words,
this book presupposes that you know what you would like to explore but gives you tools
to do it that go beyond what most commonly used tools can offer and, thus, hopefully
also open up your minds about how to approach your corpus-linguistic questions. This
is important since, to me, corpus linguistics is a method of analysis, so talking about how
to do things should enjoy a high priority (see Gries 2010 and the rest of that special issue,
as well as Gries 2011 for my subjective takes on this matter). Therefore, I will mostly be
concerned with:
•• aspects of how exactly data are retrieved from corpora to be used in linguistically
informed analyses, specifically how to obtain from corpora frequency lists, dispersion
information, collocation displays, concordances, etc. (see Chapter 2 for explanation
and exemplification of these terms);
•• aspects of data manipulation and evaluation: how to process and convert corpus data;
how to save various kinds of results; how to import them into a spreadsheet pro-
gram for further annotation; how to analyze results statistically; how to represent the
results graphically; and how to report your results.
A second important characteristic of this book is that it only uses freely available software:
•• R, the corpus linguist’s all-purpose tool (cf. R Core Team 2016): software that is a calculator, a statistics program, a (statistical) graphics program, and a programming language at the same time. The versions used in this book are R (www.r-project.org) and the freely available Microsoft R Open 3.3.1 (https://mran.revolutionanalytics.com/open; the versions for Ubuntu 16.04 LTS (or Mint 18) and Microsoft Windows 10);
•• RStudio 0.99.1294 (www.rstudio.com);
•• LibreOffice 5.2.0.4 (www.libreoffice.org).
The choice of these software tools, especially the decision to use R, has a number of
important implications, which should be mentioned early on. As I just mentioned, R
is a full-fledged multi-purpose programming language and, thus, a very powerful tool.
However, this degree of power does come at a cost: In the beginning, it is undoubtedly
more difficult to do things with R than with ready-made (free or commercial) concord-
ancing software that has been written specifically for corpus-linguistic applications. For
example, if you want to generate a frequency list of a corpus or a concordance of a word
in a corpus with R, you must write a small script or a little bit of code in a programming
language, which is the technical way of saying you write lines of text that are instructions
to R. If you do not need pretty output, this script may consist of just a few lines, but it will
often also be longer than that. On the other hand, if you have a ready-made concordancer,
you click a few buttons (and enter a search term) to get the job done. One may therefore ask: why go through the trouble of learning R? There is a variety of very good reasons for
this, some of them related to corpus linguistics, some more general.
First, let me address this very argument, which is often made against using R (or other
programming languages): why use a lot of time and effort to learn a programming lan-
guage if you can get results from ready-made software within minutes? With regard to
the time that goes into learning R, yes, there is a learning curve. However, that time may
not be as long as you think: Many participants in my bootcamps and other workshops
develop a first good understanding of R that allows them to begin to proceed on their
own within just a few days. Plus, being able to program is an extremely useful skill for
academic purposes, but also for jobs outside of academia; I would go so far as to say that
learning to program is extremely useful in how it develops, or hones, a particular way of
analytical and rigorous thinking that is useful in general. With regard to the time that goes
into writing a script, much of that usually needs to be undertaken only once. As you will
see below, once you have written your first few scripts while going through this book, you
can usually reuse (parts of) them for many different tasks and corpora, and the amount
of time that is required to perform a particular task becomes very similar to that of using
a ready-made program. In fact, nearly all corpus-linguistic tasks in my own research are
done with (somewhat adjusted) scripts or small snippets of code from this book. In addi-
tion, once you explore how to write your own functions (see Section 3.10), you can easily
write your own versatile or specialized functions yourself; I will make several of those
available in subsequent chapters. This way, the actual effort of generating a frequency list,
a collocate display, a dispersion plot, etc. often reduces to about the time you need with
a concordance program. In fact, R may even be faster than competing applications: For
example, some concordance programs read in the corpus files once before they are pro-
cessed and then again for performing the actual task – R requires only one pass and may,
therefore, outperform some competitors in terms of processing time.
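To make this less abstract, here is a minimal sketch of the kind of script mentioned above – a hedged illustration rather than code from later chapters, and the file name corpus.txt is invented – which computes a crude frequency list:

corpus.file <- tolower(scan("corpus.txt", what=character(), sep="\n", quiet=TRUE)) # load the (hypothetical) corpus file and set it to lower case
words <- unlist(strsplit(corpus.file, "[^a-z]+")) # split the lines up at sequences of non-letter characters
words <- words[nchar(words)>0] # discard empty character strings
head(sort(table(words), decreasing=TRUE), 10) # tabulate the words and show the ten most frequent ones

Four lines of code instead of a few button clicks – but every methodological choice is explicit, reusable, and open to inspection.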
Another point related to the notion that programming knowledge is useful: The knowl-
edge you will acquire by working through this book is quite general, and I mean that in a
good way. This is because you will not be restricted to just one particular software appli-
cation (or even one version of one particular software application) and its restricted set
of features. Rather, you will acquire knowledge of a programming language and regular
expressions which will allow you to use many different utilities and to understand scripts
in other programming languages, such as Perl or Python. (At the same time, I think R is
simpler than Perl or Python, but can also interface with them via RSPerl and RSPython,
respectively; see www.omegahat.org.) For example, if you ever come across scripts by
other people or decide to turn to these languages yourself, you will benefit from know-
ing R in a way that no ready-made concordancing software would allow for. If you are
already a bit familiar with corpus-linguistic work, you may now think “but why turn to
R and not use Perl or Python (especially since you say Perl and Python are similar anyway
and many people already use one of these languages)?” This is a good question, and I
myself used Perl for corpus processing before I turned to R. However, I think I also have
a good answer as to why one should use R instead. First, the issue of speed is much less of a problem
than one may think. R is fast enough and stable enough for most applications (especially if
you heed some of the advice given in Sections 3.6.3 and 3.10). Thus, if a script takes a bit
of time, you can simply run it over lunch, while you are in class, or even overnight and col-
lect the results afterwards. Second, R has other advantages. The main one is probably that,
in addition to text-processing capabilities, R offers a large number of ready-made functions
for the statistical evaluation and graphical representation of data, which allows you to
perform just about all corpus-linguistic tasks within only one programming environment.
You can do your data processing, data retrieval, annotation, statistical evaluation, graphi-
cal representation . . . everything within just one environment, whereas if you wanted to
do all these things in Perl or Python, you would require a huge amount of separate pro-
gramming. Consider a very simple example: R has a function called table that generates
a frequency table. To perform the same task in Perl, you would either have to write a small loop that goes over the elements of an array and incrementally counts their frequencies in a hash or, later and more cleverly, program a subroutine which you would then always call
upon. While this is no problem with a one-dimensional frequency list, this is much harder
with multidimensional frequency tables: Perl’s arrays of arrays or hashes of arrays etc. are
not for the faint-hearted, whereas R’s table is easy to handle, and related functions (xtabs, ftable, etc.) allow you to handle such tables very easily. I believe learning
one environment can be sufficiently hard for beginners, and therefore recommend using
the more comprehensive environment with the greater number of simpler functions, which
to me clearly is R. And, once you have mastered the fundamentals of R and face situations
in which you need maximal computational power, switching to Perl or Python in a limited
number of cases will be easier for you anyway, especially since much of the programming
languages’ syntaxes is similar and the regular expressions used in this book are all Perl
compatible. (Let me tell you, though, that in all my years using R, there were a mere two
instances where I had to switch to Perl and that was only because I didn’t yet know how
to solve a particular problem in R.)
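Before moving on, let me briefly illustrate the point about table with a small invented example (the data are made up, not from any corpus):

x <- c("a", "b", "a", "c", "a", "b") # a vector of categorical data
table(x) # a one-dimensional frequency table of x
y <- c("s", "s", "t", "t", "s", "t") # a second vector, parallel to x
table(x, y) # a two-dimensional frequency table of x and y
ftable(table(x, y)) # a ‘flat’ representation of the same table

One short function call per table – no loops, no hashes, no subroutines.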
Second, by learning to do your analyses with a programming language, you usually
have more control over what you are actually doing: Different concordance programs
have different settings or different ways of handling searches that are not always obvious
to the (inexperienced) user. For instance, ready-made concordance tools often have slightly
different settings that specify what ‘a word’ is, which means you can get different results
if you have different programs perform the same search on the same corpus. Yes, those
settings can usually be tweaked, but that means that, actually, such a ready-made applica-
tion requires the same attention to detail as R, and with a programming language all of
your methodological choices are right there in the code for everyone to see and replicate.
Third, if you use a particular concordancing software, you are at the mercy of its
developer. If the developers change its behavior, its results output, or its default settings,
you can only hope that this is documented well and/or does not affect your results. There
have been cases where even silent over-the-internet updates have changed the output of
such software from one day to the next. Worse, developers might even discontinue the
development of a tool altogether – and let us not even consider how sorry the state of the
discipline of corpus linguistics would be if a majority of its practitioners was dependent
on not even a handful of ready-made corpus tools and websites that allow you to search
a corpus online. Somewhat polemically speaking, being able to enter a URL and type in a
search word shouldn’t make you a corpus linguist.
The fourth and maybe most important reason for learning a programming language
such as R is that a programming language is a much more versatile tool than any ready-
made software application. For instance, many ready-made corpus tools can only offer the
functionality they aim to provide for corpora with particular formats, and then can only
provide a small number of kinds of output. R, as a programming language, can handle
pretty much any input and can generate pretty much any output you want – in fact, in my
bootcamps, I tell participants on day 1 that I don’t want to hear any questions that begin
with “Can R . . . ?” because the answer is “Yes”. For instance, with R you can readily use
the CELEX database, CHAT files from language acquisition corpora, the very hierarchi-
cally layered annotation of XML corpora, previously generated frequency lists for corpora
you no longer have access to, literature files from Project Gutenberg or similar sites, tabular corpus files such as those from the Corpus of Contemporary American English (http://corpus.byu.edu/coca) or the Corpus of Historical American English (http://corpus.byu.edu/coha), and so on and so forth. You can use files of whatever encoding, meaning
that data from any language/writing system can be straightforwardly processed, and R’s
general data-processing capabilities are mostly only limited by your computer’s working memory and your own abilities (rather than, for instance, the number of rows your spreadsheet software
can handle). With very few exceptions, R works identically on all three major operating
systems: Linux/Unix, Windows, and Mac OS X. In a way, once you have mastered the
basic mechanisms, there is basically no limit to what you can do with it, both in terms of
linguistic processing and statistical evaluation.
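As a small taste of this versatility, here is a hedged sketch of loading a tabular corpus file of the kind just mentioned into a data frame – the file name and the column names are invented for illustration:

corpus.df <- read.delim("corpus_file.txt", header=FALSE, # load a (hypothetical) tab-separated corpus file
   col.names=c("TEXTID", "WORD", "LEMMA", "POS"), # invented column names
   quote="", comment.char="", fileEncoding="UTF-8") # treat quote/comment characters as literal text; declare the file's encoding
head(corpus.df) # inspect the first six rows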
But there are also additional important advantages in the fact that R is an open-source
tool/programming language. For instance, there is a large number of functions and pack-
ages that are contributed by users all over the world. These often allow effective shortcuts
that are not, or hardly, possible with ready-made applications, which you cannot tweak
as you wish. Also, unlike with commercial concordance software, bug-fixes are usually available very quickly. And a final, obvious, and very down-to-earth advantage of
using open-source software is of course that it comes free of charge. Any student or any
department’s computer lab can afford it without expensive licenses, temporally limited or
functionally restricted licenses, or irritating ads and nag screens. All this makes a strong
case for the choice of software made here.
This book is also accompanied by a newsgroup/mailing list (cf. the acknowledgments above):
•• you can send questions about corpus linguistics with R to the list and, hopefully, get
useful responses from some kind soul(s);
•• post suggestions for revisions of this book there;
•• inform me and the other readers of errors you find and, of course, be informed when
other people or I find errata.
Thus, while this is not an easy book, I hope these aids help you to become a good
corpus linguist. If you work through the whole book, you will be able to do a large num-
ber of things you could not even do with commercial concordancing software; many of the
scripts you find here are taken from actual research, and are in fact simplified versions of
scripts I have used myself for published papers. In addition, if you also take up the many
recommendations for further exploration that are scattered throughout the book, you will
probably keep finding ever new and more efficient ways of applying what you have learned.
References
Gries, Stefan Th. (2010). Corpus linguistics and theoretical linguistics: A love–hate relationship?
Not necessarily . . . International Journal of Corpus Linguistics 15(3), 327–343.
Gries, Stefan Th. (2011). Methodological and interdisciplinary stance in corpus linguistics. In
Geoffrey Barnbrook, Vander Viana, & Sonia Zyngier (Eds.), Perspectives on corpus linguistics:
Connections and controversies (pp. 81–98). Amsterdam: John Benjamins.
Gries, Stefan Th. (2013). Statistics for linguistics with R. 2nd rev. and ext. ed. Berlin: De Gruyter
Mouton.
McEnery, Tony, & Andrew Hardie. (2011). Corpus linguistics: Method, theory, and practice.
Cambridge: Cambridge University Press.
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. Retrieved from www.R-project.org.
2 The Four Central Corpus-Linguistic Methods
This last point leads me, with some slight trepidation, to make a comment on our field in
general, an informal observation based largely on a number of papers I have read as submis-
sions in recent months. In particular, we seem to be witnessing as well a shift in the way
some linguists find and utilize data – many papers now use corpora as their primary data,
and many use internet data.
(Joseph 2004: 382)
In this chapter you will learn what a corpus is (plural: corpora) and what the four methods
are to which nearly all aspects of corpus-linguistic work can be reduced in some way.
2.1 Corpora
Before we start to actually look at corpus linguistics, we have to clarify our terminology a little. While the actual programming tasks do not differ between them, in this book I will distinguish between a corpus, a text archive, and an example collection. By a corpus, I mean here a machine-readable collection of (spoken or written) texts that were produced in a natural communicative setting and that is intended to be representative of, and balanced with respect to, a particular language or variety; the parts of this working definition deserve some comment:
•• “Machine-readable” refers to the fact that nowadays virtually all corpora are stored
in the form of plain ASCII or Unicode text files that can be loaded, manipulated, and
processed platform-independently. This does not mean, however, that corpus linguists
only deal with raw text files – quite the contrary: some corpora are shipped with
sophisticated retrieval software that makes it possible to look for precisely defined
lexical, syntactic, or other patterns. It does mean, however, that you would have a
hard time finding corpora on paper, in the form of punch cards or digitally in HTML
or Microsoft Word document formats; the probably most widely used format consists
of text files with a Unicode UTF-8 encoding and XML annotation.
•• “Produced in a natural communicative setting” means that the texts were spoken
or written for some authentic communicative purpose, but not for the purpose of
putting them into a corpus. For example, many corpora consist to a large degree of
newspaper articles. These meet the criterion of having been produced in a natural
setting because journalists write the article to be published in newspapers and to
communicate something to their readers, not because they want to fill a linguist’s
corpus. Similarly, if I obtained permission to record all of a particular person’s con-
versations in one week, then – even though the person and his interlocutors are usually aware of their conversations being recorded – I will hopefully obtain authentic conversations
rather than conversations produced only for the sake of my corpus.
•• I use “representative [...] with respect to a particular language, variety . . .” here to
refer to the fact that the different parts of the linguistic variety I am interested in should
all be manifested in the corpus (at least if you want to generalize much beyond your
sample, e.g., to the language in general). For example, if I was interested in phonologi-
cal reduction patterns of speech of adolescent Californians and recorded only parts of
their conversations with several people from their peer group, my corpus would not be
representative in the above sense because it would not reflect the fact that some sizable
proportion of the speech of adolescent Californians may also consist of dialogs with a
parent, a teacher, etc., which would therefore also have to be included.
•• I use “balanced with respect to a particular language, variety . . .” to mean
that not only should all parts of which a variety consists be sampled into the corpus,
but also that the proportion with which a particular part is represented in a corpus
should reflect the proportion the part makes up in this variety and/or the importance
of the part in this variety (at least if you want to generalize much beyond your sam-
ple, e.g., to the language in general). For example, if I know that dialogs make up 65
percent of the speech of adolescent Californians, approximately 65 percent of my
corpus should consist of dialog recordings. This example already shows that this cri-
terion is more of a theoretical ideal: How would one even measure the proportion that
dialogs make up of the speech of adolescent Californians? We can only record a tiny
sample of all adolescent Californians, and how would we measure the proportion of
dialogs? In terms of time? In terms of sentences? In terms of words? And how would
we measure the importance of a particular linguistic variety? The implicit assumption
that conversational speech is somehow the primary object of interest in linguistics also
prevails in corpus linguistics, which is why corpora often aim at including as much
spoken language as possible; on the other hand, a single newspaper headline read
by millions of people may have a much larger influence on every reader’s linguistic
system than 20 hours of dialog. In sum, balanced corpora are a theoretical ideal cor-
pus compilers constantly bear in mind, but the ultimate and exact way of compiling a
balanced corpus has remained mysterious so far.
It is useful to point out, however, that the above definition of a corpus describes the prototype, which implies that there are many other corpora and other kinds of corpora that differ from this prototype along a variety of dimensions. For instance, the TIMIT
Acoustic-Phonetic Continuous Speech Corpus is made up of audio recordings of 630
speakers of eight major dialects of American English, where each speaker read phoneti-
cally rich sentences, a setting which is not exactly a natural communicative setting. Or
consider the DCIEM Map Task Corpus, which consists of unscripted dialogs in which
one interlocutor describes a route on a map to the other after both interlocutors were
subjected to 60 hours of sleep deprivation and one of three drug treatments – again,
hardly a normal situation. Even a genre as widely used as newspaper text – journalese –
is not necessarily close to being a prototypical corpus, given how newspaper writing
is created much more deliberately and consciously than many other texts – plus they
often come with linguistically arbitrary restrictions regarding their length, are often not
The Four Central Corpus-Linguistic Methods 9
written by a single person, and are heavily edited, etc. Thus, the notion of corpus is really
a rather diverse one.
Many people would prefer to consider newspaper data not corpora, but text archives.
Those would be databases of texts which
As the above discussion already indicated, however, the distinction between corpora and
text archives is often blurred. It is theoretically easy to make, but in practice often not
adhered to very strictly and, again, has very few implications for the kinds of (R) program-
ming such resources require. For example, if a publisher of a popular computing periodical makes all
the issues of the previous year available on their website, then the first criterion is met, but
not the last three. However, because of their availability and size, many corpus linguists
use them as resources, and as long as one bears their limitations in mind in terms of rep-
resentativity etc., there is little reason not to.
Finally, an example collection is just what the name says it is – a collection of examples
that, typically, the person who compiled the examples came across and noted down. For
example, much psycholinguistic research in the 1970s was based on collections of speech
errors compiled by the researchers themselves and/or their helpers. Occasionally, people
refer to such collections as error corpora, but we will not use the term corpus for these. It is
easy to see how such collections compare to corpora. On the one hand, for example, some
errors – while occurring frequently in authentic speech – are more difficult to perceive than
others and thus hardly ever make it into a collection. This would be an analog to the balanc-
edness problem outlined above. On the other hand, the perception of errors is contingent
on the acuity of the researcher while, with corpus research, the corpus compilation would
not be contingent on a particular person’s perceptual skills. Finally, because of the scarcity
of speech errors, usually all speech errors perceived (in a particular amount of time) are
included in the collection, whereas, at least usually and ideally, corpus compilers are more
picky and select the material to be included with an eye to the criteria of representativity and
balancedness outlined above.1 Be that as it may, if only for the sake of terminological clar-
ity, it is useful to distinguish the notions of corpora, text archives, and example collections.
Other annotation includes that with regard to semantic characteristics, stylistic aspects,
anaphoric relations (co-reference annotation), etc. Nowadays, most corpora come in the
form of XML files, and we will explore many examples involving XML annotation in the
chapters to come. As is probably obvious from the above, annotation can sometimes be done completely automatically (possibly with human error-checking), sometimes semi-automatically, and sometimes it must be done completely manually. POS tagging, probably the most frequent kind of
annotation, is usually done automatically, and for English taggers are claimed to achieve
accuracy rates of 97 percent – a number that I sometimes find hard to believe when I look
at corpora, but that is a different story.
Then, there is a difference between diachronic corpora and synchronic corpora. The
former aim at representing how a language/variety changes over time, while the latter
provide, so to speak, a snapshot of a language/variety at one particular point in time. Yet
another distinction is that between monolingual corpora and parallel corpora. As you
might already guess from the names, the former have been compiled to provide informa-
tion about one particular language/variety, whereas the latter ideally provide the same
text in several different languages. Examples include translations from EU Parliament
debates into the 23 languages of the European Union, or the Canadian Hansard corpus,
containing Canadian Parliament debates in English and French. Again, ideally, a parallel
corpus does not just have the translations in different languages, but has the transla-
tions sentence-aligned, such that for every sentence in language L1, you can automatically
retrieve its translation in the languages L2 to Ln.
The next distinction to be mentioned here is that of static corpora vs. dynamic/moni-
tor corpora. Static corpora have a fixed size (e.g., the Brown corpus, the LOB corpus, the
British National Corpus), whereas dynamic corpora do not since they may be constantly
extended with new material (e.g., the Bank of English).
The final distinction I would like to mention at least briefly involves the encoding of
the corpus files. Given especially the predominance of work on English in corpus linguis-
tics, until rather recently many corpora came in the so-called ASCII (American Standard
Code for Information Interchange) character encoding, an encoding scheme that encodes
2^7 = 128 characters as numbers and that is largely based on the Western alphabet. With
these characters, special characters that were not part of the ASCII character inventory
were often paraphrased, e.g., “é” was paraphrased as “&eacute;”. However, the number
of corpora for many more languages has been increasing steadily, and given the large
number of characters that writing systems such as Chinese have, this is not a practi-
cal approach. As such, language-specific character encodings were developed (e.g., ISO
8859-1 for Western European Languages vs. ISO 2022 for Chinese/Japanese/Korean lan-
guages). However, in the interest of overcoming compatibility problems that arose due to
how different languages used different character encodings, the field of corpus linguistics
has been moving towards using only one unified (i.e., not language-specific) multilingual
character encoding in the form of Unicode (most notably UTF-8). This development is
in tandem with the move toward XML corpus annotation and, more generally, UTF-8
becoming the most widely used character encoding on the internet.
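To anticipate later chapters just a little, here is a brief invented illustration of how such encodings are handled with base R functions (the file name in the last line is hypothetical):

x <- "caf\xe9" # a character string encoded in ISO 8859-1 (Latin-1)
Encoding(x) <- "latin1" # declare the string's encoding
iconv(x, from="latin1", to="UTF-8") # convert it to UTF-8, which returns "café"
readLines("some.file.txt", encoding="UTF-8") # declare the encoding of a (hypothetical) file when loading it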
Now that you know a bit about the kinds of corpora that exist, there is one other really
important point to be made. While we will see below that corpus linguistics has a lot to
offer to the analyst, it is worth pointing out that, strictly speaking at least, the only thing
corpora can provide is information on frequencies. Put differently, there is no meaning in
corpora, and no functions, only:
•• frequencies of occurrence, i.e., how often morphemes, words, grammatical patterns, etc. occur in (parts of) a corpus; and
•• frequencies of co-occurrence, i.e., how often such elements occur together with other elements.