Exploring and Comparing Tools: borrow with Antconc
Ylva Berglund Prytz & Martin Wynne
                                       February 2014
This exercise will use the Antconc freeware concordance program, and the BNC Sampler
corpus. Antconc is a free, easy-to-use application for exploring texts and corpora. The
BNC Sampler is a subset of the BNC. The BNC is a snapshot of the English language in
Britain in the late 20th century, with examples of speech and writing of a wide variety of
types. The BNC Sampler has just one million words of writing and one million words of
speech, making it a much smaller (2% of the whole) but in some ways more manageable
sample of the language.
If you don't already have the relevant files on your computer, you can download them all
for free from:
    • BNC Sampler from the Oxford Text Archive (requires authorization by email):
       http://www.ota.ox.ac.uk/desc/2551
    • Antconc: http://www.antlab.sci.waseda.ac.jp/software.html
This document contains some exercises to help you get started using Antconc to explore
the usage of words in English.
1.1 Getting started
Again, we're going to use corpus evidence to explore how the word 'borrow' is used. This
time we're going to use a different tool and a smaller corpus, and we're going to see whether
a different approach sheds light on new areas. These are the questions we are trying to
answer about the use of 'borrow':
What is borrowed and by whom?
What words do you expect to find together with borrow?
Can these words be grouped in some way, for example based on their word class, function,
or meaning?
Where would you expect to find these words (e.g. before or after borrow? Immediately
adjacent or not)?
Who do you think uses the word borrow? In what context or type of language would you
find borrow?
Are there any words that are NOT used with borrow?
First, if you don't already have it installed, you need to download, unzip and run Antconc.
This is very simple, and it works on Windows, Mac and Linux operating systems. The details
of the next stages will vary slightly, depending on the set-up of your computer.
   1. If you already have Antconc on your desktop go to step 2. Otherwise, in a web
      browser, go to http://www.antlab.sci.waseda.ac.jp/software.html and download the
      relevant version of the Antconc program for your operating system. The 'classic'
      version for Windows (3.2.4w) has been tested with these exercises. Note the location
      where the file is saved, or find it in your browser's list of downloaded files.
      Double-click on the file to unpack the archive. Extract it to the Desktop, the
      Downloads folder, or some other convenient location.
                                            Corpus Linguistics, IT Services 2014
2. Double-click on the Antconc icon to run it. An Antconc window should appear, as
   below:
3. Before we can load a corpus and get started, we need to change a few of the settings
   in Antconc, since the defaults are not going to work for us straight away.
4. Global settings → File settings → Select default file type ALL or XML
5. File → Open dir then navigate to where you have downloaded the corpus files, and
   click on the XML directory. A list of files should now appear in the pane on the left-
   hand side.
6. Now let's search! Type borrow in the search box. A list of concordance lines should
   now appear in the main pane. This might be a little slow, depending on your system.
7. The results might not be very satisfactory, since you can probably see lots of codes
   and tags, and not many words. Let's adjust the settings again.
8. Global settings → Tag settings → Hide tags (check box)
9. Search for borrow again. This time, you should get a more easily readable KWIC
   concordance, as below:
  10. Take another look through the 'Global settings'. You will see that there are options
      for wildcard searches, and for the display of results. You might want to change
      colours, fonts and font sizes now. You will also see that there are lots of options for
      fonts and for specifying the character set, which makes Antconc usable with lots of
      languages. You might also want to resize or maximize the Antconc window, and
      adjust the width of some of the panes.
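For readers who would like to see the mechanics, the tag hiding and KWIC display from
steps 8 and 9 can be approximated outside Antconc. The following is a minimal Python
sketch, not Antconc's own method. The sample string, the function names hide_tags and
kwic, and the 20-character context width are all assumptions made for illustration.

```python
import re

# Hypothetical snippet in a BNC-like XML style; it is not taken from the corpus.
SAMPLE = ('<s n="1"><w c5="PNP">I </w><w c5="VM0">can </w>'
          '<w c5="VVI">borrow </w><w c5="AT0">a </w>'
          '<w c5="NN1">book</w><c c5="PUN">.</c></s>')


def hide_tags(text):
    """Remove everything between angle brackets, roughly like 'Hide tags'."""
    return re.sub(r"<[^>]+>", "", text)


def kwic(text, term, width=20):
    """Return one (left, keyword, right) tuple per whole-word match of term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


plain = hide_tags(SAMPLE)
for left, key, right in kwic(plain, "borrow"):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} {key} {right}")
```

Searching the untagged string rather than the raw XML is what makes the concordance
lines readable, which is the effect of the 'Hide tags' setting in step 8.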
1.2 Exploring 'borrow' with Antconc
  11. Return to the Concordance tab and explore the concordance lines which are now
      displayed on your screen. Here are some tips on things that you can try. Can you
      reproduce the summary information which BNCweb gave you at the top of the
      browser page, which we used to answer the following questions?
           How many occurrences of ‘borrow’ are there in the corpus?
           What is the relative frequency (hits per million words)?
           What other statistical information can you find?
           Is the word used in particular types of texts or contexts? (Can you access the
           contextual information which BNCweb was able to show?)
  12. So far we have only searched for the base form of the verb borrow. What about
      borrows, borrowed, borrowing, etc.? Try using a wild card, and search for borrow*.
      How many results do you have now? Are we picking up all inflected forms of the
      verb? Are we getting results for other related (or unrelated) words?
   13. Try sorting the results. You can adjust the settings with the Kwic Sort settings at the
       bottom of the window, and then click on the yellow Sort button to re-sort the
       concordance lines. Sorting on 1R, 2R and 3R, for example, sorts the lines
       alphabetically by the first word to the right of the search term. Where several
       lines share the same word in this position, they are sorted by the second word to
       the right, and then by the third. Note that words used as sort keys are coloured
       in Antconc.
   14. Try sorting with Kwic Sort settings 1L, 2L and 3L to reveal patterns of words
       occurring before borrow.
   15. Try sorting with Kwic Sort settings 1L, 1R and 2R. Does this reveal different
       patterns?
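The ideas in steps 11 to 15 (counting hits, normalising to hits per million words,
wildcard matching, and sorting on the words to the right) can be sketched in a few
lines of Python. This is an illustration only and makes no claim about how Antconc works
internally. The toy text, the simple tokenizer, and the names find_hits, per_million
and sort_key are assumptions, and the counts say nothing about the BNC Sampler.

```python
import re

# Hypothetical toy text, used only to make the example runnable.
TEXT = ("You can borrow money from the bank. She borrowed a book. "
        "He borrows money often. Borrowing money is risky. "
        "They borrow a car and we borrow money too.")

# Very simple tokenizer: lower-case runs of letters and apostrophes.
tokens = re.findall(r"[a-z']+", TEXT.lower())


def find_hits(tokens, pattern):
    """Positions of tokens matching pattern, where * stands for any letters."""
    regex = re.compile(re.escape(pattern).replace(r"\*", r"[a-z']*"))
    return [i for i, tok in enumerate(tokens) if regex.fullmatch(tok)]


def per_million(hits, corpus_size):
    """Relative frequency: number of hits normalised to one million words."""
    return hits * 1_000_000 / corpus_size


def sort_key(tokens, i, offsets):
    """Words at the given offsets from hit i (1 is 1R, -1 is 1L)."""
    return tuple(tokens[i + o] if 0 <= i + o < len(tokens) else ""
                 for o in offsets)


exact = find_hits(tokens, "borrow")    # base form only
wild = find_hits(tokens, "borrow*")    # base form plus anything that follows
print(len(exact), len(wild), per_million(len(wild), len(tokens)))

# Sort on 1R, then 2R, then 3R, and print a small KWIC-style view.
for i in sorted(wild, key=lambda i: sort_key(tokens, i, (1, 2, 3))):
    left = " ".join(tokens[max(0, i - 3):i])
    right = " ".join(tokens[i + 1:i + 4])
    print(f"{left:>25} [{tokens[i]}] {right}")
```

Changing the offsets to (-1, -2, -3) gives the equivalent of sorting on 1L, 2L and 3L,
and (-1, 1, 2) corresponds to 1L, 1R and 2R. Note that this tokenizer ignores sentence
boundaries, so the context of a hit can run into the neighbouring sentence.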
1.3 Exploring further functions of Antconc
   16. The main functions can be found via tabs across the top of the main pane. Click on
       Concordance Plot. This is a visualization of how the occurrences of the search term
       are distributed across the texts. This might not be very interesting in this case!
   17. Go back to the Concordance tab and click on one of the lines. You will be taken to
       the File View, which allows you to explore the expanded context of the
       concordance.
   18. Click on the Clusters tab. Before hitting Start, let's look at the default settings. This
       function finds repeated clusters of words featuring the search term, so it can be
       useful for finding fixed expressions and other repeated patterns. Change the Cluster
       size minimum to 3, the maximum to 5, and the Min. Cluster Frequency to 3. Then
       click on the blue Start button below the search box. Wait for the files to be
       processed then look at the results. Try adjusting the parameters and searching
       again – are any more interesting results revealed?
   19. Click on the Word List tab then click on the blue Start button. A list of all the words
       in the corpus is now displayed, with the most frequent at the top of the list.
   20. Once you have made a word list, it is possible to explore Collocates. (If you click on
       the Collocates tab before making a word list, it will tell you that it needs to jump to
       Word List first.) We will be focussing on collocation next week, so we won't explore
       this in detail now. For the moment, you can view the output in this tab as a list of
       words, such as money, which have a tendency to occur close to the word borrow.
       Once again, the results will probably be more useful if you adjust the default
       settings. You could try a Window Span of 4L to 4R (four words to the left and four
       words to the right) and a Min. Collocate Frequency of 3.
   21. Note that if you look at the other tabs after running a new search, the old
       information will still be there until you hit Start again. Go to Clusters and hit Start
       again. You should now get results for borrow*, so there should be more results.
   22. In the Clusters tab, select N-Grams then hit Start. This will take a little longer to
       calculate, so be patient. What do you think that this result set represents?
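To make the Clusters, N-Grams and Collocates outputs less of a black box, here is a
rough Python sketch of the underlying counting. It is a simplification under stated
assumptions, not Antconc's implementation: the toy text and the function names are
hypothetical, the cluster function counts any n-gram that contains the search term
wherever it occurs in the n-gram, and the collocate function returns raw co-occurrence
counts without computing any association statistic.

```python
import re
from collections import Counter

# Hypothetical toy text, used only to make the example runnable.
TEXT = ("You can borrow money from the bank. She borrowed a book. "
        "He borrows money often. Borrowing money is risky. "
        "They borrow a car and we borrow money too.")

tokens = re.findall(r"[a-z']+", TEXT.lower())


def ngrams(tokens, n):
    """All runs of n consecutive tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clusters(tokens, term, min_size, max_size, min_freq):
    """Count n-grams of each size that contain term; keep the frequent ones."""
    counts = Counter()
    for n in range(min_size, max_size + 1):
        counts.update(g for g in ngrams(tokens, n) if term in g)
    return {g: c for g, c in counts.items() if c >= min_freq}


def collocates(tokens, term, left, right, min_freq):
    """Count words up to `left` words before and `right` words after each hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            counts.update(tokens[max(0, i - left):i])
            counts.update(tokens[i + 1:i + 1 + right])
    return {w: c for w, c in counts.items() if c >= min_freq}


# N-grams with no search term: every two-word sequence in the text.
print(Counter(ngrams(tokens, 2)).most_common(3))

# Clusters of two or three words containing 'borrow', seen at least twice.
print(clusters(tokens, "borrow", 2, 3, 2))

# Words within a 4L to 4R window of 'borrow', seen at least twice.
print(collocates(tokens, "borrow", 4, 4, 2))
```

The difference between the first two calls suggests an answer to the question in step
22: an n-gram list covers every word sequence in the corpus, whereas clusters are
restricted to sequences that include the search term.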
More information on using Antconc
Take a look at the 'Online Help' section on the Antconc software webpage (see above).
There are lots of videos, and a detailed, step-by-step written tutorial guide at:
http://www.antlab.sci.waseda.ac.jp/software/antconc_guide_by_warren_tang_20110305.pdf