Sueco I
Using GF
MALIN AHLBERG
University of Gothenburg
Chalmers University of Technology
Department of Computer Science and Engineering
Göteborg, Sweden, January 2012
The Author grants to Chalmers University of Technology and University of Gothenburg
the non-exclusive right to publish the Work electronically and in a non-commercial
purpose make it accessible on the Internet.
The Author warrants that he/she is the author to the Work, and warrants that the Work
does not contain text, pictures or other material that violates copyright law.
The Author shall, when transferring the rights of the Work to a third party (for example a
publisher or a company), acknowledge the third party about this agreement. If the Author
has signed a copyright agreement with a third party regarding the Work, the Author
warrants hereby that he/she has obtained any necessary permission from this third party to
let Chalmers University of Technology and University of Gothenburg store the Work
electronically and make it accessible on the Internet.
Malin Ahlberg
University of Gothenburg
Chalmers University of Technology
Department of Computer Science and Engineering
SE-412 96 Göteborg
Sweden
Telephone + 46 (0)31-772 1000
This thesis describes work towards a wide-coverage grammar for parsing and generating
Swedish text. We do this by using the dependently typed grammar formalism GF, a
functional programming language specialized for describing grammars. The idea is to combine
existing language resources with new techniques, with the aim of achieving a parser for
unrestricted Swedish. To reach this goal, problems of a computational as well as a linguistic
nature had to be solved. The work includes the development of the grammar – to identify and
formalize grammatical constructions frequent in Swedish – as well as methods for importing
a large-scale lexicon and for evaluating the parser. We present the methods and
technologies used and discuss the advantages and problems of using GF for modeling large-scale
grammars. We further discuss how our long-term goal can be reached by combining our
rule-based grammar with statistical methods.
Our contribution is a wide-coverage GF lexicon, a translation of a Swedish treebank into the
GF notation and an extended Swedish grammar implementation. The grammar is based
on the multilingual abstract syntax given in the GF resource library, and now also covers
constructions specific to Swedish. We further give an example of the advantage of using
dependent types when describing grammar and syntax, in this case for dealing with reflexive
pronouns.
Acknowledgments
Many people have helped me during this project and made this work possible. I would first
of all like to thank the Center of Language Technology, which has funded the project.
Further, thanks to my excellent supervisor Ramona Enache for all her help and guidance
in every phase and all aspects of the work. Thanks to Elisabet Engdahl for sharing her
seemingly unlimited knowledge of Swedish grammar. She has also acted as a second
supervisor, and given me very helpful comments and suggestions. Thanks to Aarne Ranta
for all his great ideas and for letting me do this project.
I am also grateful to Krasimir Angelov, Markus Forsberg, Peter Ljunglöf, Lars Borin
and many others who have contributed with ideas and inspiration and shown interest in
this work.
Finally, I would like to thank my friends and family. Special thanks to Dan for all his
support, advice and patience and – most importantly – for being such a good friend.
Contents
1 Introduction
  1.1 Aims
  1.2 Outline
2 Background
  2.1 Grammatical Framework
    2.1.1 Writing a GF grammar
    2.1.2 The resource library
    2.1.3 Frontiers of Grammatical Framework
  2.2 Talbanken
  2.3 Saldo
  2.4 Swedish
    2.4.1 Post-nominal articles
    2.4.2 Verb second
    2.4.3 Passive voice
    2.4.4 Impersonal constructions
    2.4.5 Reflexive pronouns
  2.5 Related work
3 Importing Saldo
  3.1 Implementation
  3.2 Results
4 The grammar
  4.1 The Swedish resource grammar
    4.1.1 Noun phrases
    4.1.2 Verb phrases
    4.1.3 Clauses
    4.1.4 Overview
  4.2 Development of the grammar
    4.2.1 The s-passive
    4.2.2 Impersonal constructions
    4.2.3 Formalizing the rules for reflexive pronouns by using dependent types
    4.2.4 A second future tense: “Kommer att”
    4.2.5 Modifying verb phrases
    4.2.6 Miscellaneous
  4.3 Testing
5 Extracting a GF treebank
  5.1 The Talbanken annotation
  5.2 The translation process
    5.2.1 Differences in the notation
  5.3 Results and evaluation
6 Discussion
  6.1 Evaluation
  6.2 Future Work
    6.2.1 Enhancements and improvements
    6.2.2 Making the parser robust
  6.3 Conclusion
Chapter 1
Introduction
To build the grammar from scratch would be not only time consuming but would also mean
that already existing systems would have to be reimplemented. In order to not reinvent the
wheel we proceed from a combination of well-tested sources. We start from a GF resource
grammar, consisting of an abstract and a concrete syntax file defining a set of rules for
morphology and syntax. This is what is meant by grammar in this thesis, as opposed to
grammar in the traditional linguistic sense. From GF, we get a well-defined system for
describing language, as well as a strong connection to and possibility of translation between
the more than 20 other languages implemented in the framework. Further, we use the
extensive lexicon Saldo and the treebank Talbanken.
1.1 Aims
The purpose of this project has been to prepare the GF grammar for parsing of unrestricted
Swedish. This has meant developing earlier techniques to fit Swedish, creating methods for
acquiring and maintaining a large-scale lexicon, and adapting the existing resource grammar to
model more language-specific constructions. The project was divided into three subsections,
aiming at
Chapter 2
Background
The work described in this thesis is part of a bigger project which aims at using GF for
parsing unrestricted Swedish. In previous work¹, a start was made to extend the Swedish
GF grammar and a tool for lexical acquisition was developed. We now construct a bigger
and more expressive grammar as well as a large-scale GF lexicon. Like all GF grammars, this
one defines a parser, and we develop it by getting examples, ideas and test material from
the treebank Talbanken. The project hence depends heavily on three resources, which
will be described in this section.
GF editor². Figure 2.1 shows commands for parsing, where the results are shown as in
figure 2.2. GF uses Graphviz³ to visualize the trees and the given commands (fig. 2.1)
specify that the output format should be PDF and that these files should be opened by
the program Evince. The user can choose to see the parse tree (fig. 2.2a) or the abstract
tree (fig. 2.2b). Abstract trees are more verbose and show all functions and types used for
parsing the sentence.
Figure 2.2: Abstract tree and parse tree for the sentence “Jag ser katten”.
Parse trees on the other hand show only the types assigned to the words and phrases.
Information about tense, polarity etc., which is explicitly given in the abstract tree, is
not reproduced in the parse tree. Hence, parse trees do not give complete representations
but model the parse results in a transparent manner. For our example, the definiteness and
number of the noun ‘katten’ are shown as DetQuant DefArt NumSg in the abstract tree while
the parse tree only shows that the noun phrase consists of one noun. In the corresponding
English parse tree, figure 2.3, the noun is explicitly quantified by the article ‘the’, and
the determiner, the first argument to the function DetCN, is therefore shown in the parse tree.
2 http://www.grammaticalframework.org/demos/gfse/
3 http://www.graphviz.org/
abstract TestGrammar = {
cat N ; V ; S ;
fun
Pred : N -> V -> S ;
cat_N : N ;
sleep_V : V ;
}
4 http://www.molto-project.eu/
The example in figure 2.4 shows an abstract grammar defining three categories, one for
nouns, one for verbs and one for sentences. The abstract grammar also gives the function
types. In this case we have Pred, which tells us that by taking a noun and a verb we can
form a sentence. No information about how this is done is given at this stage. The grammar
also defines two words, the noun cat_N and the verb sleep_V.
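concrete TestGrammarSwe of TestGrammar = {  -- module name assumed, by analogy with TestGrammarEng
lincat N, V, S = Str ;                      -- nouns, verbs and sentences are plain strings
lin Pred n v = n ++ v ;
cat_N = "katten" ;
sleep_V = "sover" ;
}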
Figure 2.5 shows how the abstract grammar can be implemented for Swedish. Nouns, verbs
and sentences are all defined as strings, Str. The function Pred simply glues the two strings
‘katten’ and ‘sover’ together:
Pred cat sleep = "katten sover".
We get a more complicated example if we allow the nouns to be used in both plural and
singular. We add a category N’ to the abstract, which represents a noun with a fixed number,
and we introduce two functions for setting the number: NSg : N -> N’ and NPl : N -> N’.
Figure 2.6 introduces some new concepts: records, tables and parameters. In the concrete
syntax, N is defined to be a record consisting of the field s. The type of s, Num => Str shows
that it is a table, which given a parameter of type Num returns a string. Num is defined to
either have value Sg or Pl. The dot (.) is used for projection and the bang (!) as a selection
operator. n.s ! Sg thus means that we use the branch for Sg in field s of n.
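The projection and selection operators can be mimicked in an ordinary programming language. The following is a minimal Python sketch, not GF code: a record is modelled as a dictionary, a table as a dictionary keyed by the parameter, and the helper names n_sg and n_pl as well as the plural word form are illustrative assumptions rather than content of figure 2.6.

```python
from enum import Enum


class Num(Enum):
    """Counterpart of the GF declaration: param Num = Sg | Pl."""
    Sg = "Sg"
    Pl = "Pl"


# A record with the single field s, whose value is a table Num => Str.
# The word forms are illustrative lexicon data chosen for this sketch.
cat_N = {"s": {Num.Sg: "katten", Num.Pl: "katterna"}}


def n_sg(n):
    """Mirror of NSg: project field s, then select the Sg branch (n.s ! Sg)."""
    return n["s"][Num.Sg]


def n_pl(n):
    """Mirror of NPl: project field s, then select the Pl branch (n.s ! Pl)."""
    return n["s"][Num.Pl]


print(n_sg(cat_N))
print(n_pl(cat_N))
```

The dictionary lookup n["s"] plays the role of the dot, and the second lookup plays the role of the bang.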
When implementing an English version of the grammar, we encounter another problem:
the verb form depends on the number of the noun. We solve this by letting N’ carry
information about its number and letting Pred pass this on to the verb. Finally, the type
of V is put into a table, showing the verb forms for each number.
concrete TestGrammarEng of TestGrammar = {
  lincat
    S = Str ;
    V = {s : Num => Str} ;
    N = {s : Num => Str} ;
    N' = {s : Str ; num : Num} ;
  lin
    Pred n v = n.s ++ v.s ! n.num ;
    NPl n = {s = n.s ! Pl ; num = Pl} ;
    NSg n = {s = n.s ! Sg ; num = Sg} ;
    cat_N = {s = table {Sg => "the cat" ;
                        Pl => "the cats"}} ;
    sleep_V = {s = table {Sg => "sleeps" ;
                          Pl => "sleep"}} ;
  param Num = Sg | Pl ;
}
We now have two implementations of the abstract, one for Swedish and one for English.
The resulting GF grammar is able both to parse a string to an abstract tree and to go in the
other direction; to produce a string of natural language given an abstract tree. This step is
called linearization. Translation is a consequence of this, we can parse a Swedish string and
then linearize the resulting abstract tree to English.
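The parse-then-linearize view of translation can be illustrated outside GF. The following Python sketch is only a model under simplifying assumptions: it uses the singular forms from the two concrete grammars above, represents an abstract tree as a tuple, and parses by brute-force search over all possible trees. GF itself compiles grammars to a far more efficient parsing representation; the function names here are hypothetical.

```python
# Lexicons for the two concrete syntaxes (singular forms only).
SWE = {"cat_N": "katten", "sleep_V": "sover"}
ENG = {"cat_N": "the cat", "sleep_V": "sleeps"}


def linearize(tree, lexicon):
    """Turn the abstract tree ("Pred", noun, verb) into a string."""
    _, noun, verb = tree
    return f"{lexicon[noun]} {lexicon[verb]}"


def parse(sentence, lexicon):
    """Find the abstract tree whose linearization equals the sentence."""
    nouns = [f for f in lexicon if f.endswith("_N")]
    verbs = [f for f in lexicon if f.endswith("_V")]
    for noun in nouns:
        for verb in verbs:
            tree = ("Pred", noun, verb)
            if linearize(tree, lexicon) == sentence:
                return tree
    return None  # the sentence is not covered by the grammar


def translate(sentence, source, target):
    """Parse with the source grammar, then linearize with the target one."""
    tree = parse(sentence, source)
    return None if tree is None else linearize(tree, target)


print(parse("katten sover", SWE))
print(translate("katten sover", SWE, ENG))
```

Because both lexicons share the abstract function names, the same tree serves as the interlingua in both directions.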
This is the case for ‘flicka’, a noun belonging to the first declension. For others, like
‘hjärta’, the plural form ‘hjärtan’ is also needed. The worst case for nouns is four needed
forms, both singular and plural in definite and indefinite form. Section 4.1 will give a more
thorough description of the Swedish resource grammar.

                1st declension   5th declension
Sg indefinite   flicka           hjärta
Sg definite     flickan          hjärtat
Pl indefinite   flickor          hjärtan
Pl definite     flickorna        hjärtana
2.1.3 Frontiers of Grammatical Framework
As an open-source project, GF is constantly being developed and improved. New languages
are added, the compiler is being improved, more efficient and convenient ways of using it
are added, and the possibilities of using GF in different environments are increased. There
is research on how to make more use of the dependent types, for reasoning by using
ontologies [Enache and Angelov, 2011] or generating natural language via Montague semantics
[Ranta, 2004].
2.2 Talbanken
For testing and evaluation of the grammar and lexicon, we needed to be able to compare
them against a reliable source. Talbanken [Einarsson, 1976] was perfect for our purpose,
being a freely available, manually annotated, large-scale treebank. It is analyzed with the
MAMBA annotation scheme [Teleman, 1974] and consists of four parts. Two of them are
transcriptions of spoken language, one a collection of text written by high school students,
and one, section P, consists of professionally written Swedish gathered from newspapers,
brochures and textbooks.
Talbanken was also used to train the Swedish version of the Malt parser [Hall, 2007] and was
then redistributed in an updated version, Talbanken05 [Nivre et al., 2006]. It is released in
Malt⁵ and Tiger⁶ XML formats where the trees have been made deeper and more detailed
while still containing the lexical MAMBA layer. The Malt parser was trained on section
P of Talbanken, and these more than 6000 sentences have been used in our project. The
treebank has served as an inspiration and an evaluation source throughout the project. An
automatic mapping between its trees and the abstract trees from GF has been done, which
will be explained in section 5.
2.3 Saldo
A good parser needs a good lexicon. We have used Saldo [Borin et al., 2008], a large
electronic lexicon developed and maintained at Gothenburg University. It is built on Svenskt
Associationslexikon and contains information about more than 120 000 modern Swedish
words. For each word there is semantic, syntactic and morphological information. The
user can find examples of usage in corpora, graphs of semantically connected words and
some suggestions for how to analyse compounds.
The semantic aspect of Saldo requires that words with multiple meanings are separated into
different entries. The word ‘uppskatta’ for example, has two entries, with slightly different
semantics: enjoy
(1) Jag uppskattar teater.
“I enjoy theater.”
5 http://w3.msi.vxu.se/~nivre/research/MaltXML.html
6 http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/doc/html/TigerXML.html
(a) Morphological info for ‘katt’
(b) Graph showing the hyponyms of ‘katt’
or estimate:
2.4 Swedish
§
Swedish [Teleman et al., 1999, Inl. 3] is a North-Germanic language, closely related to
Norwegian and Danish. The languages share most of their grammatical structures and are
mutually intelligible. Swedish is also one of the official languages in Finland and altogether
spoken by approximately 9 million people. Swedish syntax is often similar to English, but
the morphology is richer and the word order slightly more intricate.
However, for some determiners, e.g. ‘min’ (‘my’), the noun should be in indefinite form.
§
declarative main clause must consist of a verb. The normal word order is subject-verb-
object, but any syntactic category can be fronted [Holmes and Hinchcliff, 1994, 1027]. This
is called topicalisation and is very common, especially for temporal and locative adverbial
phrases. Examples (5) to (7) all have the same propositional meaning, but vary in how the
content is presented.
(5) Du ser inte mig.
you see not me
“You don’t see me.”
§
The s-passive is more commonly used than the periphrastic passive, for both written and spoken
Swedish [Teleman et al., 1999, Pass. 1] and dominates especially when the subject is inanimate.
§
Constructions with ‘det är/var’ (‘it is/was’ ) are very common in Swedish
[Holmes and Hinchcliff, 1994, 309d]:
(13) Det var roligt att höra.
it was nice to hear.
“I’m glad to hear that.”
‘Det’ is also used as formal subject in presentational constructions where the real subject is
put in the position of an object.
(14) Det står en älg på fältet.
it stands a moose on the field
“There is a moose in the field.”
2.4.5 Reflexive pronouns
§
The Scandinavian languages have special reflexive pronouns and reflexive possessive pronouns
for the 3rd person [Holmes and Hinchcliff, 1994, 310 & 319], distinct from the normal 3rd
person forms.
(15) a. Han slog sig.
        He hit himself.
     b. Han såg sitt barn.
        He saw his (own) child.
The LinGO Grammar Matrix [Bender et al., 2002] is a starter-kit for building Head-Driven Phrase
Structure Grammars [Pollard and Sag, 1994] (HPSG) providing compatibility with tools for parsing,
evaluation, semantic representations etc. Translation is supported by using Minimal Recursion
Semantics [Copestake et al., 1999] as an interlingua.
There is a collection of grammars implemented in this framework, giving broad-coverage descriptions
of English, Japanese and German. The Scandinavian Grammar Matrix [Søgaard and Haugereid, 2005]
covers common parts of Scandinavian, while Norsource [Hellan, 2003] describes Norwegian. A Swedish
version was based upon this (SweCore, Ahrenberg) covering the morphology and some differences
between Swedish and Norwegian. Further, there is the BiTSE grammar [Stymne, 2006], also implemented
using the LinGO Matrix, which focuses on describing and translating verb frames.
The Swedish version of the Core Language Engine (CLE) [Gambäck, 1997] gives a full syntactic analysis
as well as semantics represented in ‘Quasi logical form’. A translation to English was implemented and
the work was further developed in the spoken language translator [Rayner et al., 2000]. Unfortunately, it
is no longer available. The coverage of the Swedish CLE is also reported to be very limited [Nivre, 2002,
p. 134].
In the TAG formalism [Joshi, 1975], there are projects on getting open source, wide-coverage grammars
for English and Korean, but, to our knowledge, not for Swedish.
The ParGram [Butt et al., 2002] project aims at making wide coverage grammars using the Lexical
Functional Grammar approach [Bresnan, 1982]. The grammars are implemented in parallel in order to
coordinate the analyses of different languages and there are now grammars for English, German, Japanese
and Norwegian.
Chapter 3
Importing Saldo
The lexicon provided with the GF resources is far too small for open-domain parsing.
Experiments have been made to use an interactive tool for lexical acquisition, but this should
be used for complementing rather than creating the lexicon. This section describes the
process of importing Saldo, which is compatible with GF and easily translated to GF format.
As Saldo is continuously updated, the importing process has been designed to be fast and
stable enough to be redone at any time.
3.1 Implementation
The basic algorithm for importing Saldo was implemented by Angelov (2008) and produces
code for a GF lexicon. For each word in Saldo, it decides which forms should be used as
input to the GF smart paradigms. For a verb, this will in most cases mean giving the
present tense form, see figure 3.1.
mkV "knyter" ;
Figure 3.1: First code produced for the verb ‘knyta’ (‘tie’ )
All assumed paradigms are printed to a temporary lexicon, which will produce an
inflection table for every entry when compiled. The tables are compared to the information
given in Saldo and if the tables are equal the code for the word is saved. If the table is
erroneous, another try is made by giving more forms to the smart paradigm. For the example
in figure 3.1, the smart paradigm will fail to calculate the correct inflection table. In the
next try both the present and the past tense are given:
The program is run iteratively until the GF table matches the one given in Saldo, or
until there are no more ways of using the smart paradigm. The verb ‘knyta’ will need three
forms:
the compounding form ‘väx-’) while the one generated in GF contains some that are not
in Saldo (e.g. ‘vuxits’). As GF is concerned with syntax only, and not semantics, and as the
GF tables are automatically generated, they always contain all word forms, although some
forms may never be used in the natural language. Saldo may also contain variants: ‘växt’
and ‘vuxit’ are both supine forms. As far as possible, the program makes up for this by
only comparing the overlapping forms and only requiring that GF generates one variant
whenever alternatives are given.
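The iterative import procedure described above can be sketched as follows. This is a Python illustration under explicit assumptions, not the actual import program: toy_verb_paradigm is a hypothetical stand-in for a GF smart paradigm, the form labels are invented for the sketch, and the word forms are entered by hand rather than read from Saldo.

```python
def toy_verb_paradigm(present, past=None, supine=None):
    """Hypothetical stand-in for a smart paradigm: guesses missing forms."""
    stem = present[:-2]                 # strip the present-tense ending "er"
    return {
        "inf": stem + "a",
        "pres": present,
        "past": past or stem + "te",    # naive guess, wrong for strong verbs
        "sup": supine or stem + "t",
    }


def tables_agree(gf_table, saldo_table):
    """Compare only overlapping forms; any listed Saldo variant is accepted."""
    return all(gf_table[form] in variants
               for form, variants in saldo_table.items()
               if form in gf_table)


def import_word(saldo_table, order=("pres", "past", "sup")):
    """Give the paradigm one more form per attempt until the tables agree."""
    for n in range(1, len(order) + 1):
        args = [sorted(saldo_table[form])[0] for form in order[:n]]
        if tables_agree(toy_verb_paradigm(*args), saldo_table):
            return args
    return None  # would be listed in the log as a word that was not imported


# Hand-entered forms of 'knyta'; each form maps to a set of variants.
knyta = {"inf": {"knyta"}, "pres": {"knyter"},
         "past": {"knöt"}, "sup": {"knutit"}}
print(import_word(knyta))
```

With one or two forms the guessed table disagrees with the reference table, so the loop only succeeds on the third attempt, matching the three forms needed for ‘knyta’.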
During this project, the program has been made more robust than the previous version. It
also prints log files providing information about the process of importing each word: which
paradigms have been tried, the results from the comparisons of the inflection tables
and finally listings of the words that could not be imported.
Each entry in Saldo has an identifier, e.g. äta..vb, which is used as the constant name
in the GF lexicon. However, the Saldo identifiers contain special characters that should be
avoided in GF function names, so the import renames them. The Swedish letters å, ä, ö are
translated into aa, ae, oe respectively.
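A minimal Python sketch of the letter translation, assuming a hypothetical helper name gf_safe. Only the mapping stated in the text is implemented; how the import treats other special characters, such as the dots in the identifier, is not described here, so they are left untouched.

```python
# Mapping stated in the text: å -> aa, ä -> ae, ö -> oe.
_SWEDISH_LETTERS = str.maketrans({"å": "aa", "ä": "ae", "ö": "oe"})


def gf_safe(identifier):
    """Replace the (lower-case) Swedish letters in a Saldo identifier."""
    return identifier.translate(_SWEDISH_LETTERS)


print(gf_safe("äta..vb"))
```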
3.2 Results
The resulting dictionary contains more than 100 000 entries, approximately 80 % of the
total size of Saldo. There are a number of reasons why some words were not imported;
the most obvious one is that we do not want all categories from Saldo in the GF lexicon.
Prepositions, numerals, personal pronouns etc. are assumed to be present in the resource
grammars and should not be added again. Saldo contains many pronouns which are not
analysed the same way in GF (see section 4.1.1). Before adding them to our lexicon, we
need to do more analysing to find their correct GF-category. Some experiments on finding
the category have been done using Talbanken, see section 68.
Categories involving multiple words are usually handled as idioms and should be given
in a separate lexicon. In total six types of words were considered for the extraction:
Figure 3.5
Most but not all words of these categories have been imported. One reason why the
importing phase would fail is that Saldo, unlike GF, only contains the actually used word
forms. For technical reasons, the smart paradigm might need forms never used. Consider
for example the plurale tantum noun ‘glasögon’ (‘glasses’). The smart paradigm requires
a singular form, and since the program could not find this in Saldo, there was no way of
adding the lemma to the lexicon. When the program failed to import a noun, this was often
the explanation. Words of this type may be added manually, for ‘glasögon’ we could use the
ostensibly correct singular form ‘glasöga’, although this has another meaning (‘glass-eye’).
The same problem occurred for the irregular s-verbs (‘synas’ (‘show’) or ‘umgås’ (‘socialize’)),
which made up 61.5 % of the failing verbs of type vb.
In a few cases the smart paradigms could not generate the correct declension.
When testing the coverage of Talbanken, we found that there are around 2500 word forms
still missing, excluding the ones tagged as names and numbers. This number may seem very
high, but 4/5 of the word forms are compounds and when performing the intended parsing,
an additional analysis identifying compounds should be performed before looking up the
words in the lexicon. We should also take into consideration that we cannot automatically
find out how many actually stem from the same word, or how many abbreviations
are present. Talbanken also contains a small number of spelling errors, which probably are
counted among our missing words. The majority of the missing words are only used
once.
Figure 3.6
A list of words that were given different labels in GF than in Talbanken has been compiled,
consisting of about 1600 entries. Many of those are acceptable and reflect the difference
made in the analyses, such as the examples in table 3.7. Others are examples of words that
are still missing from the lexicon.
Figure 3.7
Valency information, which is crucial for GF, is not given in Saldo and hence not in the
imported lexicon. It remains as future work to find methods to extract this information
from Talbanken and to automatically build it into the lexicon.
Chapter 4
The grammar
An important part of this project has been to develop the Swedish GF grammar and to
adapt it to cover constructions used in Talbanken. As a grammar implementation can never
be expected to give full coverage of a language, we aim for a grammar fragment which gives
a deep analysis of the most important Swedish constructions. The starting point has been
the GF resource grammar and the new implementation is still compatible with this. Before
describing the actual implementation in section 4.2, we will give an introduction to the
resource grammars in general and to the Swedish implementation in particular.
Figure 4.3 (node labels: Text, Punct, Phr, Imp, S, QS, Predet, Pron, PN, Det, CN, ListNP, AdV,
V, V2, V3, V*, V2*, AP, Subj, ListAdj, IDet, VPSlash, Card, RCl, Numeral, Digits, AdN, RP, CAdv)
Some rules for determination were described in section 2.4.1; if we have a common noun
phrase consisting of the parts ‘liten’ (‘small’ ) and ‘katt’ (‘cat’ ), there are three ways they
can be combined, as shown in figure 4.4. The noun may be used in definite or indefinite
form and the adjective in its weak or strong form [Josefsson, 2001, p. 31].
Figure 4.4
Hence, all determiners in our grammar must keep information about which definiteness
they require; the DetSpecies parameter is stored as an inherent feature of the determiner.
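The idea of storing the required definiteness as an inherent feature of the determiner can be sketched in Python. This is an illustration only: the class and field names are hypothetical and do not reproduce the DetSpecies parameter of the resource grammar, and the three determiners are chosen to give the three combinations of ‘liten’ and ‘katt’ mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Det:
    s: str                # the determiner itself
    noun_definite: bool   # inherent feature: noun form the determiner requires
    adj_weak: bool        # inherent feature: adjective form it requires


# Forms of the noun 'katt' and the adjective 'liten', keyed by the features.
NOUN = {False: "katt", True: "katten"}
ADJ = {False: "liten", True: "lilla"}


def det_cn(det):
    """Combine a determiner with the adjective and noun forms it selects."""
    return " ".join([det.s, ADJ[det.adj_weak], NOUN[det.noun_definite]])


en = Det("en", noun_definite=False, adj_weak=False)
den = Det("den", noun_definite=True, adj_weak=True)
min_ = Det("min", noun_definite=False, adj_weak=True)

for det in (en, den, min_):
    print(det_cn(det))
```

Since the features travel with the determiner, the combining function needs no case analysis on individual words.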
The resource grammar distinguishes between quantifiers (Quant), determiners (Det) and
predeterminers (Predet). Predeterminers modify NPs while the other two modify CNs. The
differences are further shown in table 4.5.
The definite article is considered to be a quantifier, which has the forms ‘den’ and ‘det’ for
singular. In plural it is either ‘de’ or nothing, cf. sentences (17a) and (17b).
Figure 4.5
Figure 4.6
Due to the syntax-oriented analysis in GF, the GF category for pronouns, Pron, only contains
personal pronouns. Many words, like ‘somliga’ (‘some’), are considered to be pronouns
in other analyses, such as The Swedish Academy Grammar [Teleman et al., 1999], but are
classified differently in GF, usually as determiners or quantifiers as they may determine
noun phrases (18).
(18) Somliga studenter jobbar bara på nätterna
“Some students only work at night”
VP field   finit   neg    adV      inf     comp   obj     adv
(han)      har     inte   alltid   tänkt   på     henne   så
The verb phrase fields are not put in their correct order until the tense and type of clause
are determined, i.e. when a sentence is created.
Swedish verbs may take up to five arguments [Stymne, 2006, p. 53]. These may be
prepositions, particles, reflexive objects, indirect objects and direct objects.
(19) Jag tar med mig den till honom
I take with me it to him
“I bring it to him”
The verb ‘ta’ in sentence (19) takes one particle (‘med’ ), one preposition (‘till’ ) and two
objects: ‘den’ and ‘honom’. In a GF lexicon this verb is given the category V3, a three-
place verb, taking two objects. The notion V3 is motivated by the formal translation:
bring(I,it,him).
Particles are given in the lexicon, as well as the prepositions that are chosen by the verb,
since these may not vary. The entry for ‘ta med ’, as used in sentence (19), is described as
follows:
ta_med_V3 = dirV3 (reflV (partV (take_V "med"))) (mkPrep "till") ;
The function dirV3 creates a three-place verb, where the first object is direct and the second
is to be used with the preposition given as the last argument: ‘till’. reflV shows that the
verb always is used with a reflexive pronoun and partV gives the particle ‘med’. The fact
that the chosen preposition is attached to the verb in the lexicon causes the parse tree
visualization algorithm to group them together. This is also the case for particles, cf. parse
trees 4.7b and 4.7a.
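How the lexicon entry stacks information onto the verb can be modelled in Python. The sketch below is an assumption-laden illustration, not the resource grammar's implementation: the Verb record, its field names and the clause function are hypothetical, only the present-tense form is stored, and the reflexive pronoun table covers just the subject used in sentence (19).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Verb:
    s: str                    # finite form used in this sketch
    particle: str = ""        # filled in by part_v
    reflexive: bool = False   # set by refl_v
    prep2: str = ""           # preposition for the second object, set by dir_v3


def part_v(verb, particle):
    """Counterpart of partV: attach a particle to the verb."""
    return replace(verb, particle=particle)


def refl_v(verb):
    """Counterpart of reflV: the verb is always used with a reflexive pronoun."""
    return replace(verb, reflexive=True)


def dir_v3(verb, prep):
    """Counterpart of dirV3: direct first object, second object after prep."""
    return replace(verb, prep2=prep)


REFLEXIVE = {"jag": "mig"}  # only the pronoun needed for sentence (19)


def clause(subj, verb, obj1, obj2):
    """Linearize subject, verb and both objects in main-clause order."""
    refl = REFLEXIVE[subj.lower()] if verb.reflexive else ""
    parts = [subj, verb.s, verb.particle, refl, obj1, verb.prep2, obj2]
    return " ".join(part for part in parts if part)


take_V = Verb("tar")
ta_med_V3 = dir_v3(refl_v(part_v(take_V, "med")), "till")
print(clause("Jag", ta_med_V3, "den", "honom"))
```

Each constructor returns a new record, so the nesting order mirrors the lexicon entry for ta_med_V3.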
Figure 4.7: The visualized parse trees do not show the internal difference of chosen
prepositions and particles
As already stated, the visualized parse trees are not a complete representation. Even if the
verb phrases in the visualizations look the same, the two cases are treated and represented
differently internally. The fronting of the preposition, as in sentence (20a), is accepted
but fronting of particles, as in (20b), is not.
(20) a På pojken tittar du.
on the boy look you
“You look at the boy.”
b *På när du springer tittar jag
on when you run watch I
4.1.3 Clauses
The category clause, Cl, represents a pre-sentence, that does not yet have any tense, polarity
or word order set, see figure 4.8.
Figure 4.8
Like verb phrases, a clause may also be missing an object and then has the type ClSlash.
The ClSlash is formed by a VPSlash which is given a subject. This is a convenient way
to form questions, relative clauses and topicalized clauses (see figure 4.9), as introduced in
[Gazdar, 1981].
Figure 4.9
4.1.4 Overview
The original resource grammar could express complex sentences such as the one in figure 4.10.
Even though the verb phrase “har inte ätit de gula äpplena idag” is discontinuous, the whole
phrase is still treated as one constituent in GF. The parts are connected in the tree, and the
subject ‘han’ is put between the finite verb and the rest of the phrase. At the code level, this
is done using the record type for VP, which consists of fields that can be put in different order.
table { Inv => verb.fin ++ subj ++ verb.neg ++ verb.inf ++ verb.compl ; ...
Figure 4.10: Parse tree for “Har han inte ätit de gula äpplena idag?”
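The way a discontinuous verb phrase is assembled from record fields can be illustrated in Python. In this sketch the field names follow the table line above, and only the Inv order is taken from it; the Main and Sub orders are assumptions added for comparison, based on ordinary Swedish main-clause and subordinate-clause word order rather than on the grammar's source.

```python
# The verb phrase of figure 4.10 as a record of fields.
VP = {"fin": "har", "neg": "inte", "inf": "ätit",
      "compl": "de gula äpplena idag"}

ORDERS = {
    "Inv":  ["fin", "subj", "neg", "inf", "compl"],  # from the table line above
    "Main": ["subj", "fin", "neg", "inf", "compl"],  # assumption of this sketch
    "Sub":  ["subj", "neg", "fin", "inf", "compl"],  # assumption of this sketch
}


def mk_clause(subj, vp, order):
    """Insert the subject among the VP fields according to the word order."""
    fields = dict(vp, subj=subj)
    return " ".join(fields[name] for name in ORDERS[order])


for order in ("Inv", "Main", "Sub"):
    print(order, "->", mk_clause("han", VP, order))
```

The verb phrase stays a single value throughout; only the final concatenation decides where the subject and the negation end up.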
Some studies suggest that the s-passive is used in more than 80 % of cases
[Laanemets, 2009]. It is however not as common in the other Scandinavian languages, where
not all words have passive forms for all tenses. The Norwegian translation of sentence (24)
is:
(25) Oppgaven ble skrevet av en student [NO]
uppsatsen blev skriven av en student [SE]
The corresponding Swedish sentence is acceptable, but not as natural sounding as sentence
(24). The resource grammar for Scandinavian therefore implemented the function for
passive, PassV2, by using an auxiliary verb.
PassV2 : V2 -> VP ;
ta -> blev tagen
The function allows two-place verbs to be used in passive by using bli (become), and thereby
turned into complete verb phrases; they no longer need an object.
During this project, the s-passive was added, although the periphrastic passive is still allowed.
The grammar further allows not only V2, but all verb phrases that miss an object, to form
passives:
PassVP : VPSlash -> VP ;
ta -> togs
erbjöd -> erbjöds
A V3 like ‘give’ in sentence (26) hence gives rise to two passives, (28) and (27).
(26) Active use of two-place verb
Vi erbjöd henne jobbet
we offered her the job
“We offered her the job”
(27) First place in two-place verbs
Hon erbjöds jobbet
she offered+s the job
“She was offered the job”
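The two passive strategies can be contrasted in a small Python sketch. This is only an illustration under simplifying assumptions: the dictionaries PAST and PARTICIPLE, the lemma ‘erbjuda’ and the function names are introduced for the example, and only past tense forms are handled.

```python
# Toy lexicon; the forms are those used in the examples of this section.
PAST = {"ta": "tog", "erbjuda": "erbjöd"}
PARTICIPLE = {"ta": "tagen", "skriva": "skriven"}


def periphrastic_passive(lemma: str) -> str:
    """Passive with the auxiliary 'bli', as in PassV2: ta -> blev tagen."""
    return "blev " + PARTICIPLE[lemma]


def s_passive(lemma: str) -> str:
    """Past tense s-passive, as in PassVP: ta -> togs."""
    return PAST[lemma] + "s"


forms = [periphrastic_passive("ta"), s_passive("ta"), s_passive("erbjuda")]
```

A full treatment would need all tenses; the present tense, for instance, is not formed by simply appending -s to the active form.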
The function combines a verb phrase, a determiner and a noun phrase into a clause. If the
determiner does not fulfill the requirements stated above, the clause is set to NONEXIST.
This works well for parsing, but leads to problems if the grammar is used for random
generation. The solution is thus not ideal, but since the definiteness of noun phrases or
determiners cannot be seen on the type level, it is not known until runtime whether the
determiner is accepted in the subject.
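The runtime check can be pictured with the following Python sketch, where None stands in for NONEXIST. The names Det, make_clause and is_indefinite are hypothetical, and the requirement used in the example (an indefinite determiner) is only an assumption for illustration; the actual condition is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Det:
    text: str
    definite: bool  # a runtime value, not visible in the type Det


def make_clause(vp: str, det: Det, noun: str,
                accepts: Callable[[Det], bool]) -> Optional[str]:
    """Build a clause, or return None (standing in for NONEXIST)."""
    if not accepts(det):
        return None
    return f"{vp} {det.text} {noun}"


def is_indefinite(det: Det) -> bool:
    """Hypothetical requirement, used only for this example."""
    return not det.definite


candidates = [Det("en", False), Det("den", True)]
results = [make_clause("sitter", d, "katt", is_indefinite) for d in candidates]
```

When parsing, the rejected combination simply never matches the input. A random generator, however, picks determiners without knowing which ones will be rejected, so some of its results are empty.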
As a future direction it would be interesting to examine the consequences of letting the
noun phrases have more information on the type level. In the implementation for reflexive
objects (see section 4.2.3), dependent types are used for showing if a noun phrase needs an
antecedent. We would also like to differentiate between the NPs in sentence (36a,b), where
‘av ’ should be used only when the noun has an explicit article or determiner.
(41) a. Han är längre än sin kompis. b. Han är här oftare än sin kompis.
He is taller than self’s friend. He is here more often than self’s friend.
(42) a. Hon tyckte om skolan och alla sina elever.
She liked the school and all self’s students.
b. Han såg sina få böcker och sin penna.
He saw self’s few books and self’s pencil.
Reflexive pronouns cannot be used in subject noun phrases of finite sentences, as shown
by the ungrammatical examples in sentences (46) and (43). The third person reflexives
(‘sig’, ‘sin’) require a third person antecedent (see 45). Furthermore, the antecedent must
be within the same finite sentence as the reflexive pronoun, see (46). The grammar should
not accept any of these sentences:
(43) *Sina vantar var kvar på tåget.
self’s gloves were left on the train.
Apart from these restrictions, noun phrases containing reflexive pronouns may be used
as any other NP. They may be conjoined (42 a,b) and used with other determiners (39).
In the standard GF analysis, which is performed bottom-up starting from the POS-tags,
information about semantic roles is given by the functions, not by the categories. That
is, we know that the first argument of the function PredVP acts as the subject, but the
noun phrase itself does not carry information about its semantic role. Until it is given as
an argument to a clause level function, no difference is made between subject and object
noun phrases. For this reason, the formalization of reflexive pronouns required the use of a
different analysis. In short, what is wanted can be summarized as follows:
The dependency spreads through the grammar and in order to avoid all code duplication
we use another approach. When looking at the type of the functions, we notice that they
can be generalized:
PredetNP : Det -> NP x -> NP x;
The solution chosen in this project is to make use of this generalization and introduce the
use of dependent types in a resource grammar. Following the idea given above, we make a
difference between subjects and objects, not by giving them entirely different types, but
by letting the type NP depend on an argument, which may be either Subject or Object.
cat NP NPType ;
PredVP : NP Subj -> VP -> Cl ;
ComplSlash : VPSlash -> NP Obj -> VP ;
PredetNP : (a : NPType) -> Det -> NP a -> NP a ;
The types for adverbial and adjectival phrases are also turned into dependent types.
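The effect of indexing NP by its role can be imitated in Python with a runtime tag. This is only a sketch: Python has no dependent types, so the check that GF does at the type level is done here with an explicit test, and the way lexical noun phrases receive their tag is an assumption. Following the description above, the first argument of PredVP is taken to be a subject.

```python
from dataclasses import dataclass
from enum import Enum


class NPType(Enum):
    SUBJ = "Subj"
    OBJ = "Obj"


@dataclass(frozen=True)
class NP:
    text: str
    np_type: NPType  # plays the role of the type argument of NP


def predet_np(det: str, np: NP) -> NP:
    """NP a -> NP a: the result keeps the role of its argument."""
    return NP(f"{det} {np.text}", np.np_type)


def pred_vp(subj: NP, vp: str) -> str:
    """Only subject noun phrases may be combined with a VP into a clause."""
    if subj.np_type is not NPType.SUBJ:
        raise TypeError("PredVP requires a subject noun phrase")
    return f"{subj.text} {vp}"


def compl_slash(vpslash: str, obj: NP) -> str:
    """Only object noun phrases may fill the missing object."""
    if obj.np_type is not NPType.OBJ:
        raise TypeError("ComplSlash requires an object noun phrase")
    return f"{vpslash} {obj.text}"


# A reflexive noun phrase is tagged as an object and keeps that tag.
reflexive = predet_np("alla", NP("sina elever", NPType.OBJ))
vp = compl_slash("tyckte om", reflexive)
clause = pred_vp(NP("hon", NPType.SUBJ), vp)
```

Because the generalized function passes the tag through unchanged, one definition serves both kinds of noun phrases, and a reflexive noun phrase can never reach the subject position.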
The modal verb ‘ska’ signals an intention to carry out the action, either on the part of the
subject or of the speaker. Cf.
The verb ‘kommer’ (‘come’), normally used with the infinitive marker att, does not signal
any intention, but rather the speaker's belief that the event will actually take place.
The resource grammar included ‘ska’, which was implemented as the standard way of form-
ing future tense, and hence represents the translation of ‘will’. The new grammar also
supports “kommer att”, expressed as an alternative future tense.
Figure 4.11: Result of parsing “Det kommer att bli mörkt snart”. The future tense is
marked by the constant TFutKommer
Since two out of the three Scandinavian languages share this tense, it has been added to
the Scandinavian Extra module. Table 4.12 shows how the grammar expresses the new tense
in different types of sentences.
Figure 4.12: The covered and accepted usage of the future tense with ‘komma’
4.2.5 Modifying verb phrases
Focusing adverbs
The GF analysis distinguishes between two categories of adverbs: Adv and AdV. The AdV –
e.g. ‘aldrig’ (‘never’ ) and ‘inte’ (‘not’ ) – attaches directly to the verb.
(55) Cf.
a. Jag äter aldrig fisk b. Jag äter fisk nu
I eat never fish I eat fish now
“ I never eat fish” “I eat fish now”
The difference is implemented by having separate fields in the VP table for the two categories.
Main clause : subj ++ verb.fin ++ verb.adV ++ verb.inf ++ verb.adv
han har aldrig varit här
The adverb ‘bara’ may be used as an AdV but also before the finite verb, when emphasizing
the verb itself. This is an example of a focusing adverb; other examples are ‘inte ens’ (‘not
even’) and ‘till och med’ (‘even’).
(56) a. Han bara log b. Hon till och med visslar
he only smiled she even whistles
“ He just smiled” “She even whistles”
Focusing adverbs are accepted by the new grammar implementation, where they have their
own field in the VP table.
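How separate fields keep the three kinds of adverbs apart can be sketched in Python as follows. The class and field names (VerbPhrase, focus) are assumptions for the example; the order of the fields follows the main clause pattern given above, with the focusing adverb placed before the finite verb as described.

```python
from dataclasses import dataclass


@dataclass
class VerbPhrase:
    fin: str         # finite verb
    inf: str = ""    # non-finite verb part
    adV: str = ""    # AdV such as 'aldrig' or 'inte'
    adv: str = ""    # Adv such as 'nu' or 'här'
    focus: str = ""  # focusing adverb such as 'bara' (assumed field name)


def main_clause(subj: str, vp: VerbPhrase) -> str:
    """subj ++ focus ++ fin ++ adV ++ inf ++ adv, skipping empty fields."""
    parts = [subj, vp.focus, vp.fin, vp.adV, vp.inf, vp.adv]
    return " ".join(p for p in parts if p)


s1 = main_clause("han", VerbPhrase(fin="har", inf="varit", adV="aldrig", adv="här"))
s2 = main_clause("han", VerbPhrase(fin="log", focus="bara"))
s3 = main_clause("hon", VerbPhrase(fin="visslar", focus="till och med"))
```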
Relative clauses
The resource grammar already gave a good coverage of relative clauses and embedded sen-
tences. All constructions used in examples 60a-c were accepted.
(60) a. Pojken, som är blyg, tystnar
“The boy, who is shy, falls silent”
definite form, whenever the noun phrase is followed by a restrictive relative clause
[Holmes and Hinchcliff, 1994, 329]. Talbanken contained several examples where the mod-
ified noun is in the indefinite form as in (61).
(61) de uppfattningar som förs fram ...
the opinions that put+passive forward ...
“the opinions, that are presented ...”
When no relative clause is present, the definite form with the postnominal definite article
must be used, cf. sentence (62).
(62) a. de uppfattningarna förs fram
“the opinions are presented”
b. *de uppfattningar förs fram
Apart from this, only corrections have been made, as exemplified in (63).
(63) Hon sover som är bra -> Hon sover, vilket är bra
“She sleeps, which is good”
As a side note, some complications regarding the function RelCl can be pointed out.
The English implementation of this construction is ‘such that’, and the Swedish version
‘sådan att’ sounds awkward, except when used in logic and mathematics books.
By starting from a list containing all pronouns in Saldo and all forms they may occur
in, Talbanken was harvested for the ones that both occurred with the determiner tag DT
and with the present participle tag SP. This way we are left with a limited number of words
that could easily be analysed and added to the lexicon.
This method is both faster and more reliable than going through the list by hand and
categorising them manually by looking up each word or by using introspection.
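The harvesting step amounts to intersecting the tags observed for each candidate word. A minimal Python sketch, assuming the corpus is available as (word, tag) pairs and the lexicon as a set of word forms; the function name harvest and the toy data are invented for the example:

```python
from collections import defaultdict


def harvest(candidate_forms, tagged_tokens, required_tags=("DT", "SP")):
    """Return the candidate forms that occur with every required tag."""
    seen = defaultdict(set)
    for word, tag in tagged_tokens:
        form = word.lower()
        if form in candidate_forms:
            seen[form].add(tag)
    wanted = set(required_tags)
    return sorted(form for form, tags in seen.items() if wanted <= tags)


# Toy input, not taken from Saldo or Talbanken.
forms = {"samma", "denna", "han"}
tokens = [("Samma", "DT"), ("samma", "SP"), ("denna", "DT"), ("hus", "NN")]
found = harvest(forms, tokens)
```

Only the words that survive this filter need to be inspected by hand.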
4.3 Testing
Throughout the project regression testing has been used. Every time the grammar was
updated it was tested against a treebank consisting of 155 sentences, and the number of
parse trees was compared to a standard. The purpose was first of all to make sure that the
coverage did not decrease, but also to make sure that we would notice if new ambiguities
or overgenerating functions were introduced. If so, the ambiguities could be removed when
possible, and when not, we were at least made aware of them and could decide whether the
increase in coverage was worth the increase in ambiguities.
New functions were also tested by random generation and by creating tables showing all
forms, such as table 4.12.
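The comparison against the standard can be sketched as below. This is an illustration only: it assumes that the standard and the current result are both available as mappings from sentence to number of parse trees, and the function name regression_report is hypothetical.

```python
def regression_report(standard, current):
    """Compare parse tree counts per sentence against the standard."""
    lost = sorted(s for s, n in standard.items()
                  if n > 0 and current.get(s, 0) == 0)
    more_ambiguous = sorted(s for s, n in standard.items()
                            if current.get(s, 0) > n)
    return {"lost": lost, "more_ambiguous": more_ambiguous}


# Toy counts, invented for the example.
standard = {"jag äter fisk": 1, "han bara log": 1, "hon sover": 2}
current = {"jag äter fisk": 3, "han bara log": 0, "hon sover": 2}
report = regression_report(standard, current)
```

Sentences listed under "lost" indicate decreased coverage, while those under "more_ambiguous" point to new ambiguities or overgenerating functions.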
Chapter 5
Extracting a GF treebank
Talbanken contains much valuable information about phrase structures and word usage.
It is analyzed with the MAMBA annotation scheme [Teleman, 1974]. One part of this
project has focused on translating trees from Talbanken to GF by constructing a mapping,
which automatically transforms trees in the Talbanken format to GF abstract syntax trees.
We hence get a comparison between the two annotations and at the same time we get
methods for extracting and translating to GF notation. Figure 5.1 shows an example of a
visualized Talbanken05 tree of the sentence “Katten på bilen blir större” (“The cat on the
car gets bigger”) and the result of translating it to GF. Both the POS tags as well as the
syntactic information are needed during the translation, so that all the information shown
in the abstract tree (5.1b) can be extracted. We need to know that katten is a noun used
in definite form (DefArt), singular (NumSg). Even though the notation is translated, the
original analysis from Talbanken is preserved: på bilen should still form a subtree attached
to katten.
The mapping is based on a translation of the English Penn Treebank [Angelov, 2011a].
By modifying this program, we now have a translation that works for the Tiger XML
format of Talbanken. Adaptation was required for the differences in annotation as well as for
the syntactic differences between Swedish and English.
The translation gives us means to evaluate our parser. By both parsing a Talbanken
sentence and transforming its annotated tree, we can easily inspect if the results are equal.
Additionally, the mapping shows which grammatical constructions are still missing
from the GF grammar and shows how the GF analysis differs from the one made in
Talbanken. If there are words missing from our dictionary, the rich POS-tags may help us to
automatically find the correct declension and add it to the lexicon. Further, our parser will
need probabilities (see section 6.2) of how often a function is used. The GF treebank we
achieve from the translation is a good source for this information.
Figure 5.1: Trees for the sentence “Katten på bilen blir större”: (a) Talbanken tree, (b) GF abstract tree, (c) the corresponding GF parse tree.
due to the level of detail, there are for example almost 50 tags for nouns, excluding proper
names. The tags show definiteness, case and whether the word is a compound. Some words
also have their own tags.
The translation covers complex noun phrases which may consist of pronouns, reflexive ob-
jects or common nouns and contain predeterminers, determiners, quantifiers and adjectival
modifiers. Special rules were needed for each verb category in order to find the right num-
ber of arguments. The mapping also covers different word order, such as questions and
topicalisation.
simply glue the subject to the verb with PredVP. For each similar GF construction, an extra
rule would be needed to get full coverage.
for a conjunction. The focus of the mapping has been simple sentences, in order to cover
the most fundamental structures. If the program is extended and adapted to cover complex
conjunction, a deeper evaluation should be performed to see which version suits our
needs best. The deeper implementation also gives some more information about verb
phrases, by the tag VG (verb group), which groups an infinitival verb and its object together
into a VP. This information can however be extracted from the flat implementation, and the
results get slightly better when using this version.
When evaluating the mapping, the results strongly depend on which restrictions we put
on the input. One of the reasons why a node cannot be translated is the use of the tags
shown in figure 5.5. The PU tag is used for graphic listings, and not for fluent text. In our
grammar there is naturally no corresponding function; the listings are meant for making the
text look nice in folders etc. and are outside the scope of the grammar itself. The tags XX and
NAC are often used since Talbanken makes a difference between subject and object noun
phrases. The analysis of the elliptical expression in (72)
(72) För stora krav.
“Too high demands.”
contains the tags XX and NAC, since it is not obvious whether the noun phrase is used
as subject or an object. The tags shown in figure 5.5 occur quite frequently in the treebank
and are always translated to metas, which lowers our result.
Figure 5.5
The main goal has been to be able to translate shorter sentences, with no idioms or
conjunctions. If we ensure that the lexicon contains the correct word class for all lemmas
involved, we can restore more than 85 % of the nodes in the original tree. If we lift all the
restrictions except the exclusion of PU, we get 65 % coverage. If we test randomly collected
sentences that do not contain any of the tags listed in figure 5.5, 72 % can be restored (see
figure 5.6).
Restriction on input                     Restored nodes
No list items                            65 %
No special punctuation or bad tags       72 %
Short sentences with known words         85 %
Figure 5.6
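One way to read these percentages is as the share of nodes in a translated tree that are not metas. The following Python sketch computes that share for a single tree; the tuple representation, the "?" label for metas and the function names are assumptions made for the example, not the evaluation program used in the project.

```python
def count_nodes(tree):
    """Return (total nodes, meta nodes) for a tree given as nested tuples."""
    label, *children = tree
    total = 1
    metas = 1 if label == "?" else 0
    for child in children:
        child_total, child_metas = count_nodes(child)
        total += child_total
        metas += child_metas
    return total, metas


def restored_fraction(tree):
    """Fraction of nodes that were translated to something other than a meta."""
    total, metas = count_nodes(tree)
    return (total - metas) / total


# Toy tree: the verb phrase could not be translated and became a meta.
tree = ("PredVP", ("DetCN", ("DefArt",), ("katt_N",)), ("?",))
fraction = restored_fraction(tree)
```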
A mapping between GF and the Wall Street Journal part of the Penn Treebank has earlier been
conducted [Angelov, 2011b]. The percentage of restored nodes from the Penn Treebank is higher
than our results. One reason for this may be that English is syntactically less
complicated than Swedish. Furthermore, the texts in Talbanken are from various brochures,
newspapers and text books, where idiomatic expressions are more likely to appear and
the language is presumably less strict than in the Wall Street Journal1. Also, the Penn Treebank
contains a lower number of tags, 82 compared to more than 300 in Talbanken. Even if the
tags describing the same word class as another tag are excluded, Talbanken still leaves us
with more than 130 tags. With more tags, we get more information, but as the number
increases, so does the amount of work of finding the correct translation for each combination
of tags and writing rules that cover all constructions.
We believe that the results could be enhanced by simply adding more rules and in this
way get a wider coverage. There are many special cases that require special rules. Since we
are not aware of any formal specification of how the tags may be combined in Talbanken,
the only way of finding all possibilities is to manually look for different patterns. Another
option would be to make the mapping more robust, but the robustness must not interfere
with the correctness.
1 www.wsj.com
Chapter 6
Discussion
6.1 Evaluation
The project has resulted in
• a large-scale GF lexicon and a program to redo the importation when needed
• an extended grammar covering an important part of Swedish
• a comparison between GF and another annotation
• a deeper testing of the Swedish resource grammar and an estimation of how well GF
  can be used to describe larger parts of a language
• a study of how dependent types can be used in the resource grammars
Besides being capable of reimporting Saldo, the lexicon extraction program could also
be modified for importing other lexical resources. The only requirement is that the resource
provides inflection tables.
The grammar has been extended and enhanced, and its current status is a specialized
extension of the resource grammar. Besides parsing, the grammar may well be used for
language generation, which works fast even when using an extensive lexicon. Although it has
not been formally verified, we believe that the majority of the sentences generated are
grammatically correct from a syntactic point of view. Without any semantic knowledge, nonsense
phrases cannot be avoided in random generation. However, given that the abstract tree
has a meaningful interpretation, the linearization should be correct. There are some cases
where the correctness of the output has been put aside in order to increase the expressivity,
such as ComplBareVV, which uses a verb-complement verb without the infinitive marker, as in
sentence (73b).
(73) a. Jag börjar att bli hungrig
I begin to become hungry
“I’m getting hungry”
b. Jag börjar bli hungrig
I begin become hungry
“I’m getting hungry”
Since there is no information in the grammar or lexicon about which verbs allow the
infinitive marker to be left out, it will also allow the more questionable sentence (74b).
(74) a. Jag låter bli att titta
I let be to watch
“I refrain from watching”
b. *Jag låter bli titta
I let be watch
As these functions serve to provide stylistic flexibility when parsing, they can be left out
when generating language, and the grammar then generates text of good grammatical qual-
ity.
When it comes to parsing, we do not get far without robustness. The grammar in itself is
by no means robust, and just one unexpected punctuation mark, unknown word or ellipsis
will cause the parsing of the whole sentence to fail.
Parsing bigger parts of Talbanken would hence give very low results at this stage, and a
comparison of the results would not be of much value as there would not be enough material
to be able to do any interesting analysis. An estimation of the improvement can be given
by looking at the results from running the test suite used for the grammar development. The
sentences given in the test suite are short, four to ten words, and the words are in most cases
included in the lexicon, but there are also constructions that have not been implemented
during this project. The first grammar could parse less than half of the sentences; the result
for the final grammar was 66 %. It is thus not yet interesting to talk about coverage, but
rather about the quality and the ability to scale up, which has so far proved to be good. I further
believe that the presence of an expert in Swedish, professor Elisabet Engdahl1, has increased
the standard substantially.
By the renewed import of Saldo, we have doubled the size of the lexicon and thereby added
many of the commonly used words that were missing from the older version. This is of
course a big improvement. However, the lexical part still requires much work before it can
be made use of. We need valency information to make a good analysis. The lexicon is also
too big to use with the current techniques. Its size causes the incremental parsing algorithm
to use more heap memory than normal computers allow. To solve this, we need to use the
lexicon data more cleverly.
1 http://svenska.gu.se/om-oss/personal/elisabet-engdahl
6.2 Future Work
At the end of the current project, we are left with many interesting future directions. The
future work described in this section is divided into two categories: the ones aiming at
making the parser robust and the ones that can be seen as extensions of the work done so
far.
Pronominal object shift is common in Swedish and obligatory in Danish and Norwe-
gian.
(75) a. Jag ser honom inte b. Jag ser inte honom
I see him not I see not him
Personal pronouns are allowed to precede negation and some other sentence adverbials
in main clauses without auxiliary verbs.
(76) a. *Vi har honom inte sett b. *Jag ser huset inte
we have him not seen I see the house not
Although object shifts are frequently used, they are hardly found in Talbanken’s part
P, which has been the inspiration for this project. Therefore, this implementation has
so far not been prioritized.
This however is not the correct analysis for ‘hund’ (‘dog’ ) in sentence (78).
(78) Vi ska köpa hund
we will buy dog
“We are going to buy a dog”
The implementation of reflexive pronouns can be improved. It should for example be
possible to differentiate between object and subject control verbs.
Lexicon
Our lexicon still lacks some parts that can be imported from Saldo. The multiple-word
verbs (vbm) and multiple-word nouns (nm) should be imported to an idiom lexicon, which
can be used as an extension to the main lexicon. For the words that we tried but failed
to import, another tool for lexicon acquisition could be used. The tool developed in the
previous part of this project2 would be suitable. All in all, it should be ensured that we use
as much information as possible from our resources.
Lexicon
No matter how much we increase the size of our lexicon, we will never cover all compounds.
We therefore need to be able to do compounding analysis. Saldo has tools for this, which
might be usable in this project as well [Forsberg, 2007]. As noted, we additionally need to
come up with better methods for using the lexicon when parsing, since its time and memory
consumption is very high. One possibility is of course to do deeper refinements of the
parsing source code, an extensive task which is far outside the scope of this project. Other
solutions are to either adapt the lexicon automatically for its domain, and hence make it
smaller and faster, or to adapt the input sentences to a smaller lexicon by preprocessing.
Further, the lexicon needs valencies since this information lays the foundation for the GF
analysis. During the mapping, described in section 5, we analysed how Talbanken annotates
valency information. It should be possible to extract data from Talbanken, showing how
words are normally used. Lexin3 provides lists of valency information, from which lexical
information could be extracted, given that the data is freely available. This is also a source
of information that could help us distinguish verbs that can be used without an infinitive
marker from those that cannot, cf. (74a-b). However, Lexin is relatively small and does
not contain more than 10 000 entries.
2 web.student.chalmers.se/~mahlberg/SwedishGrammar.pdf
Many Swedish verbs can be used with different number of arguments. Having one lexical
entry for every possible usage of a verb does not seem to be a good idea considering the
ambiguities it will lead to and the already high time usage. The task should instead be left
to the robust layer of the parser, possibly implemented by using external resources.
Probabilities
The grammar will always contain ambiguities and before returning the result to the user,
some analysis should be done to find the most probable tree. When the size of the lexicon
increases, so will the ambiguities as the number of word forms overlapping each other gets
higher. Our new grammar also has many more rules, which contribute to increasing the
number of interpretations of each sentence.
GF already allows probabilities, and by using a large GF treebank, we can get more
reliable statistics for our functions. By implementing dependency probabilities, our results
would be even better.
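Such statistics can be estimated by counting how often each function occurs in the treebank. The Python sketch below is an illustration under stated assumptions: trees are given as nested tuples, the mapping category_of from function to result category is supplied by hand, and probabilities are normalized among functions producing the same category. The toy trees use resource grammar function names, but the counts are invented.

```python
from collections import Counter


def function_probabilities(trees, category_of):
    """Relative frequency of each function within its result category."""
    counts = Counter()

    def visit(tree):
        fun, *args = tree
        counts[fun] += 1
        for arg in args:
            visit(arg)

    for tree in trees:
        visit(tree)

    totals = Counter()
    for fun, n in counts.items():
        totals[category_of[fun]] += n
    return {fun: n / totals[category_of[fun]] for fun, n in counts.items()}


# Toy treebank with two trees.
trees = [
    ("PredVP", ("UsePron", ("he_Pron",)), ("UseV", ("sleep_V",))),
    ("PredVP", ("UsePN", ("john_PN",)), ("UseV", ("sleep_V",))),
]
category_of = {
    "PredVP": "Cl", "UsePron": "NP", "UsePN": "NP", "UseV": "VP",
    "he_Pron": "Pron", "john_PN": "PN", "sleep_V": "V",
}
probs = function_probabilities(trees, category_of)
```

With a larger treebank, the same counts could be extended to pairs of parent and child functions, which is one way to approach the dependency probabilities mentioned above.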
6.3 Conclusion
We have developed the main components for a deep Swedish parser: an extended grammar
and lexicon, and material for evaluation and disambiguation. By starting from the GF
resource grammar, we got a well-defined system for describing language.
Our final goal is to be able to parse unrestricted text, but considering that no syntactic
theory is yet capable of wholly covering a natural language, we are satisfied with the gram-
mar implementation of an important fragment of Swedish and have focused on constructions
that are frequent in Talbanken. Being a natural language processing project, it will probably
never be entirely complete, but by combining the rule-based parser with statistical
techniques, as described in section 6.2, we still believe that it is possible to achieve our goal.
All parts of the project are open-source and may thus be used in other applications. The
grammar and the lexicon may be beneficial also when working with controlled languages,
as it increases the coverage of the Swedish resource grammar.
The constructions that we have focused on have all been possible to implement, with
varying amounts of work. Many of them could be done by utilizing and extending the
resource library, but in some cases we needed to depart from the multilingual abstract syntax
and use other grammatical theories in order to arrive at a good analysis.
3 http://spraakbanken.gu.se/lexin/
Bibliography
[Enache and Angelov, 2011] Enache, R. and Angelov, K. (2011). Typeful Ontolo-
gies with Direct Multilingual Verbalization. LNCS Post-Proceedings of the Con-
trolled Natural Languages Workshop (CNL 2010) , Marettimo, Italy. Available from:
http://publications.lib.chalmers.se/cpl/record/index.xsql?pubid=150487.
[España-Bonet et al., 2011] España-Bonet, C., Enache, R., Slaski, A., Ranta, A., Marquez,
L., and Gonzalez, M. (2011). Patent translation within the MOLTO project. Available
from: http://www.molto-project.eu/sites/default/files/patentsMOLTO4.pdf.
[Forsberg, 2007] Forsberg, M. (2007). The Functional Morphology Library. Available from:
http://www.cs.chalmers.se/˜markus/FM Tech Report.pdf.
[Gambäck, 1997] Gambäck, B. (1997). Processing Swedish Sentences: A Unification-Based
Grammar and Some Applications.
[Gazdar, 1981] Gazdar, G. (1981). Unbounded Dependencies and Coordinate Structure.
Linguistic Inquiry, 12:155–184.
[Hall, 2007] Hall, J. (2007). A Hybrid Constituency-Dependency Parser for Swedish.
[Holmes and Hinchcliff, 1994] Holmes, P. and Hinchcliff, I. (1994). Swedish - A Comprehen-
sive Grammar. Routledge, London, 2nd edition.
[Josefsson, 2001] Josefsson, G. (2001). Svensk universitetsgrammatik för nybörjare.
Studentlitteratur.
[Joshi, 1975] Joshi, A. K. (1975). Tree adjunct grammars. Journal of Computer and System
Sciences archive.
[Kokkinakis and Kokkinakis, 1999] Kokkinakis, D. and Kokkinakis, S. J. (1999). A Cascaded
Finite-State Parser for Syntactic Analysis of Swedish. In Proceedings of the 9th EACL,
pages 245–248.
[Laanemets, 2009] Laanemets, A. (2009). The passive voice in written and spo-
ken Scandinavian. Available from: http://gagl.eldoc.ub.rug.nl/root/2009-49/2009-49-
07/?pLanguage=en&pFullItemRecord=ON.
[Ljunglöf, 2004] Ljunglöf, P. (2004). Expressivity and Complexity of the Grammatical Frame-
work. PhD thesis. Available from: http://www.ling.gu.se/˜peb/pubs/Ljunglof-2004a.pdf.
[Ljunglöf et al., 2005] Ljunglöf, P., Bringert, B., Cooper, R., Forslund, A.-C.,
Hjelm, D., Jonson, R., and Ranta, A. (2005). The TALK Grammar Li-
brary: An Integration of GF with TrindiKit. Available from: http://www.talk-
project.org/fileadmin/talk/publications public/deliverables public/TK D1-1.pdf.
[Martin-Löf, 1984] Martin-Löf, P. (1984). Intuitionistic type theory. Notes by Giovanni
Sambin of a series of lectures given in Padua, June 1980.
[Milner et al., 1997] Milner, R., Tofte, M., and Macqueen, D. (1997). The Definition of
Standard ML. MIT Press, Cambridge, MA, USA.
[Nivre, 2002] Nivre, J. (2002). What Kinds of Trees Grow in Swedish Soil? A comparison
of four annotation schemes for Swedish.
[Nivre et al., 2006] Nivre, J., Nilsson, J., and Hall, J. (2006). Talbanken05: A Swedish
treebank with phrase structure and dependency annotation. In Proceedings of the
fifth international conference on Language Resources and Evaluation (LREC2006), pages
24–26.
[Norell, 2008] Norell, U. (2008). Dependently typed programming in Agda. In Lecture
Notes from the Summer School in Advanced Functional Programming.
[Pollard, 1984] Pollard, C. (1984). Generalized phrase structure grammars, head grammars
and natural language. PhD thesis.
[Pollard and Sag, 1994] Pollard, C. and Sag, I. (1994). Head-Driven Phrase Structure Gram-
mar. University of Chicago Press.
[Ranta, 2004] Ranta, A. (2004). Computational Semantics in Type Theory. Mathematics
and Social Sciences, 165:31–57.
[Ranta, 2009] Ranta, A. (2009). The GF Resource Grammar Library. Linguistic Issues in
Language Technology.
[Ranta, 2011] Ranta, A. (2011). Grammatical Framework: Programming with Multilingual
Grammars. CSLI Publications, Stanford.
[Rayner et al., 2000] Rayner, M., Carter, D., Bouillon, P., Digalakis, V., and Wirén, M.
(2000). The spoken language translator. Cambridge University Press.
[Seki et al., 1991] Seki, H., Matsumura, T., Fujii, M., and Kasami, T. (1991). On
multiple context-free grammars. Theor. Comput. Sci., 88:191–229. Available from:
http://dx.doi.org/10.1016/0304-3975(91)90374-B.
[Simon Thompson, 1999] Simon Thompson (1999). Haskell: The Craft of Functional Pro-
gramming. Addison-Wesley, 2nd edition.
[Stymne, 2006] Stymne, S. (2006). Swedish-English Verb Frame Divergences in a Bilin-
gual Head-driven Phrase Structure Grammar for Machine Translation. Master’s thesis,
Linköping University.
[Søgaard and Haugereid, 2005] Søgaard, A. and Haugereid, P. (2005). A brief documentation
of a computational HPSG grammar specifying (most of) the common subset of linguistic
types for Danish, Norwegian and Swedish. Nordisk Sprogteknologi 2004, pages 247–56.
[Tapanainen and Järvinen, 1997] Tapanainen, P. and Järvinen, T. (1997). A non-projective
dependency parser. In Proceedings of the 5th Conference on Applied Natural Language
Processing, pages 64–71.
[Teleman et al., 1999] Teleman, U., Hellberg, S., and Andersson, E. (1999). Svenska
Akademiens grammatik. Svenska Akademien, Norstedts.
[Xiaochu Qi, 2009] Xiaochu Qi (2009). An Implementation of the Language Lambda Prolog
Organized around Higher-Order Pattern Unification. CoRR.