The Soundness of Internalized Polarity Marking

Lawrence S. Moss
Department of Mathematics
Indiana University
Bloomington, IN 47405 USA

Comments/Corrections/Criticism Welcome!

Abstract

This paper provides a foundation for the polarity marking technique introduced by David Dowty [3] in connection with monotonicity reasoning in natural language and in linguistic analyses of negative polarity items based on categorial grammar. Dowty's work is an alternative to the better-known algorithmic approach first proposed by Johan van Benthem [11], and elaborated by Víctor Sánchez Valencia [10]. Dowty's system internalized the monotonicity/polarity markings by generating strings using a categorial grammar whose categories already contain the markings that the earlier system would obtain by separate steps working on an already-derived string. Despite the linguistic advantages of the internalized system, no soundness proof has yet been given for it. This paper offers an account. The leading mathematical idea is to interpret categorial types as preorders (in order to talk about monotonicity in the first place), and then to add a primitive operation to the type hierarchy of taking the opposite of a preorder (in order to capture monotone decreasing functions). At the same time, the use of internalized categories also raises issues. Although these will not be addressed in full, the paper points out possible approaches to them.

1 Introduction

One enterprise at the border of logic and linguistics is the crafting of logical systems which capture aspects of inference in natural language. The ultimate goal is a logical system which is strong enough to represent the most common inferential phenomena in language and yet is weak enough to be decidable. Another goal would be to use representations which are as close as possible to the surface forms. This second goal is likely to be unattainable, and when one backs off to consider structured representations, it is natural to turn to those suggested by linguistic formalisms of one form or another.

The particular strand of this enterprise that concerns us in this paper is the natural logic program initiated by Johan van Benthem [11, 12], and elaborated in Víctor Sánchez Valencia [10]. As mentioned in van Benthem [13], the "proposed ingredients" of a logical system to satisfy the goals would consist of several "modules": (a) Monotonicity Reasoning, i.e., Predicate Replacement, (b) Conservativity, i.e., Predicate Restriction, and also (c) Algebraic Laws for inferential features of specific lexical items. We are only concerned with (a) in this paper, and we are mainly concerned with an alternative to van Benthem's seminal proposal. There are several reasons why monotonicity is worthy of study on its own: it is arguably tied up with linguistic phenomena, it is a pervasive phenomenon in actual reasoning, and it finds its way into computational systems for linguistic processing (cf. Nairn, Condoravdi and Karttunen [8] and MacCartney and Manning [6]).

The leading idea in van Benthem's work is to develop a formal approach to monotonicity on top of categorial grammar (CG). This makes sense because the semantics of CG is given in terms of functions, and monotonicity in the semantic sense is all about functions. The success story of the natural logic program is given by the monotonicity calculus, a way of determining the polarity of words in a given sentence. Before turning to it, we should explain the difference between monotonicity and polarity.
We follow the usage of Bernardi [1], Section 4.1:

    The differences between monotonicity and polarity could be summarized in a few words by saying that monotonicity is a property of functions . . .. On the other hand, polarity is a static syntactic notion which can be computed for all positions in a given formula. This connection between the semantic notion of monotonicity and the syntactic one of polarity is what one needs to reach a proof theoretical account of natural reasoning and build a natural logic.

This point might be clarified with an example of the monotonicity calculus. Consider the sentence No dog chased every cat. The sentence is to be derived in some categorial grammar, and to be brief we suppress all the details on this except to say that a derivation would look like the trees below, except without the + and − marks. Given the derivation, one would first add some notations to indicate that the interpretation of every, as a function from noun meanings, is monotone decreasing but results in a function that is monotone increasing. Also, the interpretation of no, as a function from noun meanings, is monotone decreasing and again results in a function that is monotone decreasing.

Monotonicity Marking:

    every+   cat−  ⇒  every(cat)+
    chase,  every(cat)+  ⇒  chase(every(cat))−
    no,  dog  ⇒  no(dog)+
    no(dog)+,  chase(every(cat))−  ⇒  no(dog)(chase(every(cat)))+

Polarity Determination:

    every−   cat+  ⇒  every(cat)−
    chase,  every(cat)−  ⇒  chase(every(cat))−
    no,  dog  ⇒  no(dog)+
    no(dog)+,  chase(every(cat))−  ⇒  no(dog)(chase(every(cat)))+

In the first tree, we have monotonicity markings in the derivation tree itself. These markings were propagated from the information about every and no according to a rule which need not concern us. One then turns the Monotonicity Marking tree into the Polarity Determination tree. The rule is: for each node, count the number of − signs from it to the root of the tree (the final line above). This changes the signs for every, cat, and every cat. The resulting signs are the polarities. For example, the + marking on cat indicates that this occurrence of cat might be replaced by a word with a "larger" value. For example, replacing cat with animal gives a valid inference: No dog chased every cat implies No dog chased every animal.¹ On the other hand, dog is marked −, corresponding to the fact that replacement inferences with it go in the opposite direction. So replacing dog with animal would not give a validity; instead, replace with the "smaller" phrase old dog.

¹ Incidentally, we are taking this sentence and others like it in the subject wide-scope reading: there is no dog with the property that it chases every cat.

As Dowty [3] mentions, "The goal [in his paper] is to 'collapse' the independent steps of Monotonicity Marking and Polarity Determination into the syntactic derivation itself, so that words and constituents are generated with the markings already in place that they would receive in Sánchez' polarity summaries." His motivation is mainly linguistic rather than logical: "the format of Sánchez' system . . . is no doubt highly appropriate for the logical studies he developed it for, it is unsuitable for the linguistics applications I am interested in." For this reason, he gave an "alternative formulation." This system is "almost certainly equivalent to Sánchez'," but he lacked a proof. Bernardi [1] also notes that while her system and Sánchez' "are proven to be sound, no analogous result is known for" the system proposed by Dowty. However, her presentation makes it clear that there are advantages to the internalization of polarity markers. Others who use a system inspired by [3] include Christodoulopoulos [2].
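To make the counting rule concrete, here is a minimal sketch of the Polarity Determination step, under an illustrative tree representation of our own (none of the names below are from Dowty or Sánchez). We read "the number of − signs from a node to the root" as counting the marks strictly above the node, which matches the sign changes in the example.

    -- Polarity Determination as a tree pass: flip a node's monotonicity
    -- mark iff an odd number of Minus marks occur strictly above it.
    data Mark = Plus | Minus deriving (Eq, Show)

    -- Derivation trees, written here with the root as the outermost Node.
    data Tree = Node String Mark [Tree] deriving Show

    flipMark :: Mark -> Mark
    flipMark Plus  = Minus
    flipMark Minus = Plus

    -- The Boolean records whether an odd number of Minus marks lie above.
    polarity :: Tree -> Tree
    polarity = go False
      where
        go oddAbove (Node lbl m ts) =
          let m'   = if oddAbove then flipMark m else m
              odd' = oddAbove /= (m == Minus)  -- parity now includes this mark
          in Node lbl m' (map (go odd') ts)

In the example above, cat sits below the single Minus on chase(every(cat)), so its mark flips from − to +; similarly every flips to −, while nodes under an even number of − marks are unchanged.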
The purpose of this short paper is to offer a soundness proof for internalized monotonicity markings, and to discuss related issues. The first of these issues has to do with a choice in the overall categorial architecture. We wish to take seriously the idea that syntactic categories might be interpreted in ordered settings. Given two types σ and τ, we want to consider two resulting types: the upward monotone functions and the downward monotone functions. The first alternative is to generate types as follows: begin with a set T0 of basic types, and then build the smallest set T1 ⊇ T0 closed in the following way: If σ, τ ∈ T1, then also (σ, τ)+ ∈ T1 and (σ, τ)− ∈ T1. This would seem to be the most straightforward way to proceed, and it actually seems to be close to the way that is presented in the literature.² Indeed, we worked out the theory this way before deciding to rewrite this paper based on a second alternative. This second way is based on the observation that a downward monotone function from a preorder P to a preorder Q is the same thing as an upward monotone function from P to the opposite order −Q. This order −Q has the same points as Q, but the order is "upside down." The suggestion is that we generate types a little differently. We take the smallest set T1 ⊇ T0 closed in the following way: If σ, τ ∈ T1, then also (σ, τ) ∈ T1. If σ ∈ T1, then also −σ ∈ T1. We shall call this the fully internalized scheme. This includes copies of the types done the first way, by viewing (σ, τ)+ as (σ, τ) and (σ, τ)− as (σ, −τ). It is a strictly bigger set, since −σ for the basic types enters into the fully internalized hierarchy but not the lesser one.

² For example, Dowty [3] states his category formation rules in this way, adding directional versions. But shortly thereafter he mentions that lexical items will "be entered in two categories." Adding downward-marked versions of the base categories is tantamount to what we call the fully internalized scheme.

2 Background: Preorders and their Opposites

For the work in this paper we need to use preorders. Our goal in this section is to state the facts which are needed in as compact a manner as possible, and also to mention a few examples that will be used in the sequel. A preorder is a pair P = (P, ≤) consisting of a set P together with a relation ≤ which is reflexive and transitive. This means that the following hold:

1. p ≤ p for all p ∈ P.
2. If p ≤ q and q ≤ r, then p ≤ r.

Frequently one assumes that the order has additional properties: it may be also anti-symmetric, or a lattice order, etc. For most of what we are doing, no extra assumptions are needed. Indeed, the specific results do not use the preorder properties very much, and some of the work in this section goes through for absolutely arbitrary relations. However, since the intended application is to a setting in which the relation represents some kind of inference, it makes sense to state the results for the weakest class of mathematical objects that has something that looks like "inference," and this seems to be preorders.

Example 2.1 For any set X, we have a preorder X = (X, ≤), where x ≤ y iff x = y. This is called the flat preorder on X. More interestingly, for any set X we have a preorder P(X) whose set part is the set of subsets of X, and where p ≤ q iff p is a subset of q. Another preorder is 2 = {F, T} with F ≤ T.

The natural class of maps between preorders P and Q is the set of monotone functions: the functions f : P → Q with the property that if p ≤ q in P, then f(p) ≤ f(q) in Q. When we write f : P → Q in this paper, we mean that f is monotone. We write [P, Q] for the set of monotone functions from P to Q.
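To fix ideas computationally, here is a small sketch of these definitions for finite carriers; the representation of a preorder as a carrier list paired with its relation is an illustrative choice of ours, not part of the theory.

    import Data.List (subsequences)

    -- A (finite) preorder: a carrier together with its relation.
    type Preorder a = ([a], a -> a -> Bool)

    -- The flat preorder on a set: x <= y iff x == y (Example 2.1).
    flat :: Eq a => [a] -> Preorder a
    flat xs = (xs, (==))

    -- The power set preorder P(X), with subsets as lists, <= as inclusion.
    power :: Eq a => [a] -> Preorder [a]
    power xs = (subsequences xs, \p q -> all (`elem` q) p)

    -- The preorder 2 = {F, T} with F <= T.
    two :: Preorder Bool
    two = ([False, True], (<=))

    -- f : P -> Q is monotone iff p <= q implies f p <= f q.
    monotone :: Preorder a -> Preorder b -> (a -> b) -> Bool
    monotone (xs, le) (_, le') f =
      and [ le' (f p) (f q) | p <- xs, q <- xs, le p q ]

For example, monotone two two id is True, while monotone two two not is False: negation is antitone, the situation the opposite-preorder construction below is designed to capture.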
[P, Q] is itself a preorder, with the pointwise order: f ≤ g in [P, Q] iff for all p ∈ P, f(p) ≤ g(p) in Q.

Let P and Q be preorders. The product preorder P × Q is the preorder whose universe is the cartesian product P × Q, and with (p, q) ≤ (p′, q′) iff p ≤ p′ in P and q ≤ q′ in Q. Preorders P and Q are isomorphic if there is a function f : P → Q which is bijective (one-to-one and maps P onto Q) and also has the property that p ≤ p′ iff f(p) ≤ f(p′). We write P ≅ Q. We also identify isomorphic preorders, and so we even write P = Q to make this point.

Proposition 2.2 For all preorders P, Q, and R, and all sets X:

1. For each p ∈ P, the function app_p : [P, Q] → Q given by app_p(f) = f(p) is an element of [[P, Q], Q].
2. [P × Q, R] ≅ [P, [Q, R]].
3. [X, 2] ≅ P(X).

Proof For part (1), let f ≤ g in [P, Q]. Then app_p(f) = f(p) ≤ g(p) = app_p(g). In the second part, let f : [P × Q, R] → [P, [Q, R]] be f(a)(p)(q) = a(p, q). It is routine to check that f is an isomorphism. In part (3), the isomorphism is ϕ : [X, 2] → P(X) given by ϕ(f) = {x ∈ X : f(x) = T}. □

Antitone functions and opposites of preorders We are also going to be interested in antitone functions from a preorder P to a preorder Q. These are the functions f : P → Q with the property that if p ≤ q in P, then f(q) ≤ f(p) in Q. We can express things in a more elegant way using the concept of the opposite preorder −P of a given preorder P.³ This is the preorder with the same set part, but with the opposite order. Formally,

    p ≤ q in −P   iff   q ≤ p in P

³ I have wavered in my presentation between writing −P and P^op. The latter notation would be more familiar to people who had seen duality in category theory, and might also make sense because the superscript notation matches the way that polarity marking works in the literature. On the other hand, the − notation makes the laws in Proposition 2.4 look more natural. In any case, I welcome advice from readers.

Example 2.3 One example is the negation operation on truth values: ¬ : 2 → −2. For another, let P be any preorder, and let Q be P(P), the power set preorder on the underlying set P of P. Let ↑ : P → −Q be defined by ↑(p) = {p′ ∈ P : p ≤ p′ in P}. Since p ≤ p′ in P implies ↑(p) ⊇ ↑(p′), we indeed see that ↑ : P → −Q.

We collect the most important facts on this operation below.

Proposition 2.4 For all preorders P, Q, and R, and all sets X:

1. −(−P) = P.
2. [P, −Q] = −[−P, Q].
3. [−P, −Q] = −[P, Q].
4. If f : P → Q and g : Q → R, then g · f : P → R.
5. If f : P → −Q and g : Q → −R, then g · f : P → R.
6. −(P × Q) ≅ −P × −Q.
7. X ≅ −X, where X denotes the flat preorder on the set X.
8. −[X, P] ≅ [X, −P].

Proof For part (1), recall first that P and −P have the same underlying set. It follows that P and −(−P) also have the same underlying set. Concerning the order: x ≤ y in P iff y ≤ x in −P iff x ≤ y in −(−P). For (2), suppose that f : P → −Q. Then f is a set function from P to Q. To see that f is a monotone function from −P to Q, let x ≤ y in −P. Then y ≤ x in P. So f(y) ≤ f(x) in −Q. And thus f(x) ≤ f(y) in Q, as desired. This verifies that as sets, [P, −Q] = [−P, Q].
To show that as preorders [P, −Q] = −[−P, Q], we must check that if f ≤ g in the order on [P, −Q], then g ≤ f in the order on [−P, Q] as well. For this, let p ∈ −P; so p ∈ P. Then f(p) ≤ g(p) in −Q. Thus g(p) ≤ f(p) in Q. Since this holds for all p ∈ P, we have g ≤ f in [−P, Q]. Part (3) follows from parts (1) and (2). For part (4), assume x ≤ y. Then f(x) ≤ f(y), and so g(f(x)) ≤ g(f(y)). For part (5), note that by part (2), g : −Q → R. So by part (4), g · f : P → R. Continuing with part (6), note first that −(P × Q) and −P × −Q have the same elements. To check that the orders are appropriately related, note that the following are equivalent: (p, q) ≤ (p′, q′) in −(P × Q); (p′, q′) ≤ (p, q) in P × Q; p′ ≤ p in P and q′ ≤ q in Q; p ≤ p′ in −P and q ≤ q′ in −Q; (p, q) ≤ (p′, q′) in −P × −Q. For part (7), we verify that the identity map on X is an isomorphism from X to −X. If p ≤ q in X, then p = q, so clearly p ≤ q in −X. And if p ≤ q in −X, then q ≤ p in X; since X is flat, p = q, and thus p ≤ q in X. The last part comes from parts (3) and (7). □

Example 2.5 Part (3) in Proposition 2.4 is illustrated by considering [2, 2] and [−2, −2]. Let c be the constant function T, let d be the constant function F, and let i be the identity. Then [2, 2] is d < i < c, and [−2, −2] is c < i < d.

Example 2.6 The classical conditional → gives a monotone function from −2 × 2 to 2. Since [−2 × 2, 2] ≅ [−2, [2, 2]], this gives if ∈ [−2, [2, 2]]. This same function belongs to the opposite preorder, and so we could just as well write if ∈ −[−2, [2, 2]] ≅ [2, −[2, 2]] ≅ [2, [−2, −2]].

Example 2.7 Let X be any set. We use letters like p and q to denote elements of [X, 2]. Please keep in mind that the set of (monotone) functions from X to 2 is in one-to-one correspondence with the set of subsets of X (see part (3) of Proposition 2.2). Define

    every ∈ [−[X, 2], [[X, 2], 2]]
    some  ∈ [[X, 2], [[X, 2], 2]]
    no    ∈ [−[X, 2], [−[X, 2], 2]]

in the standard way:

    every(p)(q) = T if p ≤ q, and F otherwise
    some(p)(q)  = ¬every(p)(¬ · q)
    no(p)(q)    = ¬some(p)(q)

It is routine to verify that these functions really belong to the sets mentioned above. And as in Example 2.6, each of these functions belongs to the opposite preorder as well, and we therefore have

    every ∈ [[X, 2], [−[X, 2], −2]]
    some  ∈ [−[X, 2], [−[X, 2], −2]]
    no    ∈ [[X, 2], [[X, 2], −2]]

Summary The main point of this section is the fact mentioned in Proposition 2.4, part (3). In words, the monotone functions from −P to −Q are exactly the same as the monotone functions from P to Q, but as preordered sets themselves, [−P, −Q] and [P, Q] are opposite orders. A simple illustration of the opposite orders was presented in Example 2.5. This fact, and also part (2) of Proposition 2.4, is important in the linguistic examples because it justifies why lexical items are typically assigned two categories. For example, if we interpret a common noun by an element f of a preorder of the form [X, 2], then the same f is an element of −[X, 2], so we might as well declare our noun to also have some typing which calls for an element of −[X, 2]. And in general, if we type a lexical item w with a type (σ, τ) corresponding to an element of some preorder [P, Q], then we might as well endow w with the type (−σ, −τ) with the very same interpretation function.
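The opposite operation and the laws above can be checked directly on small cases. The following sketch (self-contained, continuing the illustrative finite-carrier representation introduced earlier) verifies the Example 2.5 picture that [2, 2] and [−2, −2] carry opposite pointwise orders.

    type Preorder a = ([a], a -> a -> Bool)

    two :: Preorder Bool
    two = ([False, True], (<=))

    -- The opposite preorder -P: same carrier, reversed order.
    op :: Preorder a -> Preorder a
    op (xs, le) = (xs, flip le)

    -- The pointwise order on a function space [P, Q], compared on P's carrier.
    ptwise :: Preorder a -> Preorder b -> (a -> b) -> (a -> b) -> Bool
    ptwise (xs, _) (_, le') f g = and [ le' (f x) (g x) | x <- xs ]

    -- Example 2.5: d < i < c in [2, 2], and the order reverses in [-2, -2].
    d, i, c :: Bool -> Bool
    d = const False
    i = id
    c = const True

    checks :: Bool
    checks = and
      [ ptwise two two d i              -- d <= i in [ 2,  2]
      , ptwise two two i c              -- i <= c in [ 2,  2]
      , ptwise (op two) (op two) c i    -- c <= i in [-2, -2]
      , ptwise (op two) (op two) i d    -- i <= d in [-2, -2]
      ]

All four checks come out True, which is exactly Proposition 2.4, part (3), on this small case.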
3 Higher-order Terms over Preorders

The centerpiece of this note is a presentation of the fully internalized type scheme. This section presents the details, aimed at readers familiar with categorial type logics (see Moortgat [7] for background). We should mention at the outset that what we are generalizing are the applicative (Ajdukiewicz/Bar-Hillel) categorial grammars. However, everything we do extends to the more useful class of categorial grammars, as we shall see.

Fix a set T0 of basic types. Let T1 be the smallest superset of T0 closed in the following way: If σ, τ ∈ T1, then also (σ, τ) ∈ T1. If σ ∈ T1, then also −σ ∈ T1. Let ≡ be the smallest equivalence relation on T1 such that the following hold:

1. −(−σ) ≡ σ.
2. −(σ, τ) ≡ (−σ, −τ).
3. If σ ≡ σ′, then also −σ ≡ −σ′.
4. If σ ≡ σ′ and τ ≡ τ′, then (σ, τ) ≡ (σ′, τ′).

Definition T = T1/≡. This is the set of types over T0. The operations σ ↦ −σ and σ, τ ↦ (σ, τ) are well-defined on T. We always use letters like σ and τ to denote elements of T, as opposed to writing [σ] and [τ]. That is, we simply work with the elements of T1, but identify equivalent types.

Definition Let T0 be a set of basic types. A typed language over T0 is a collection of typed variables v : σ and typed constants c : σ, where σ in each of these is an element of T. We generally assume that the set of typed variables includes infinitely many of each type. But there might be no constants whatsoever. We use L to denote a typed language in this sense.

Let L be a typed language. We form typed terms t : σ as follows:

1. If v : σ (as a typed variable), then v : σ (as a typed term).
2. If c : σ (as a typed constant), then c : σ (as a typed term).
3. If t : (σ, τ) and u : σ, then t(u) : τ.

Frequently we do not display the types of our terms.

3.1 Semantics

For the semantics of our higher-order language L we use models M of the following form. M consists of an assignment of preorders σ ↦ Pσ on T0, together with some data which we shall mention shortly. Before this, extend the assignment σ ↦ Pσ to T1 by

    P(σ,τ) = [Pσ, Pτ]
    P−σ = −Pσ

An easy induction shows that if σ ≡ τ, then Pσ = Pτ. So we have Pσ for σ ∈ T. We use Pσ to denote the set underlying the preorder Pσ. The rest of the structure of a model M consists of an assignment [[c]] ∈ Pσ for each constant c : σ, and also a typed map f; this is just a map which to a typed variable v : σ gives some f(v) ∈ Pσ.

3.2 Ground terms and contexts

A ground term is a term with no free variables. Each ground term t : σ has a denotation [[t]] ∈ Pσ defined in the obvious way:

    [[c]] is given at the outset for constants c : σ
    [[t(u)]] = [[t]]([[u]])

A context is a typed term with exactly one variable, x. (This variable may be of any type.) We write t for a context, or sometimes t(x) to emphasize the variable. We'll be interested in contexts of the form t(u). Note that if t(u) is a context and if x appears in u, then t is a ground term; and vice-versa. In the definition below, we remind you that subterms are not necessarily proper. That is, a variable x is a subterm of itself.

Definition Fix a model M for L. Let x : ρ, and let t : σ be a context. We associate to t a set function ft : Pρ → Pσ in the following way:

1. If t = x, so that σ = ρ, then fx : Pσ → Pσ is the identity.
2. If t is u(v) with u : (τ, σ) and v : τ, and if x is a subterm of u, then ft is app_[[v]] · fu. That is, ft is a ∈ Pρ ↦ fu(a)([[v]]).
3. If t is u(v) with u : (τ, σ) and v : τ, and if x is a subterm of v, then ft is [[u]] · fv. That is, ft is a ∈ Pρ ↦ [[u]](fv(a)).

The idea of ft is that as a ranges over its interpretation space Pρ, ft(a) would be the result of substituting various values of this space in for the variable, and then evaluating the result.
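Here is a small sketch of the definition of ft, flattened to an untyped setting for brevity: denotations live in one universal domain, and ft substitutes a value for the hole and evaluates. The representation is ours, and the typed discipline of the text is not enforced.

    -- Terms over constants c, with Var as the unique hole x.
    data Term c = Var | Con c | App (Term c) (Term c)

    -- A universal domain of denotations: truth values and functions.
    data Val = B Bool | F (Val -> Val)

    app :: Val -> Val -> Val
    app (F f) v = f v
    app _     _ = error "ill-typed application"

    -- f_t: given denotations for the constants, map a value for the hole
    -- to the value of the whole term.  On App we evaluate both sides with
    -- the same argument; when the hole occurs on only one side (as in a
    -- context), the other side is constant in that argument, so this
    -- agrees with clauses (2) and (3) of the Definition.
    fT :: (c -> Val) -> Term c -> (Val -> Val)
    fT _   Var       = \a -> a
    fT sem (Con k)   = \_ -> sem k
    fT sem (App u v) = \a -> app (fT sem u a) (fT sem v a)

Proposition 3.1 below then holds by construction for this sketch, since substitute-then-evaluate is literally how fT computes.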
Proposition 3.1 For all contexts t(x) with t : σ and x : ρ, and all ground terms s : ρ,

    ft([[s]]) = [[t(x ← s)]],

where t(x ← s) is the result of substituting s for x in t.

Proof By induction on t. If t = x, the result is trivial. Suppose that t = u(v) with x appearing in u. Then t(x ← s) is u(x ← s)(v), and

    ft([[s]]) = fu([[s]])([[v]]) = [[u(x ← s)]]([[v]]) = [[u(x ← s)(v)]] = [[t(x ← s)]]

Finally, if t = u(v) with x appearing in v, then t(x ← s) is u(v(x ← s)), and

    ft([[s]]) = [[u]](fv([[s]])) = [[u]]([[v(x ← s)]]) = [[u(v(x ← s))]] = [[t(x ← s)]]

In both calculations, we used the induction hypothesis. □

Notice that we defined ft as a set function from Pρ to Pσ, rather than asserting that it is a monotone function between the preorders Pρ and Pσ. The reason why we did this is that it is not immediately clear that ft is monotone. This is the content of the following result.

Lemma 3.2 (Context Lemma) Let t be a context, where t : σ and x : ρ. Then ft is an element of [Pρ, Pσ].

Proof By induction on t. In the case that t is the typed variable x : σ, fx is the identity on Pσ, and indeed the identity is a monotone function on any preorder. Next, suppose t is u(v), and assume that x occurs in u. By induction hypothesis, fu belongs to [Pρ, P(τ,σ)] = [Pρ, [Pτ, Pσ]]. The function app_[[v]] belongs to [[Pτ, Pσ], Pσ] by Proposition 2.2, part (1), and so ft belongs to [Pρ, Pσ] by Proposition 2.4, part (4). Finally, assume that x occurs in v. By induction hypothesis, fv : Pρ → Pτ. And [[u]] ∈ P(τ,σ), so that [[u]] : Pτ → Pσ. By Proposition 2.4, part (4), [[u]] · fv is an element of [Pρ, Pσ]. □

This Context Lemma is exactly the soundness fact underlying internalized polarity markers. We'll see some examples in the next section. Notice also that it is a very general result: the only facts about preorders that we used were the very general results in Section 2.

The Context Lemma also allows one to generalize our work from the applicative (AB) grammars to the setting of CG using the Lambek Calculus. In detail, one generalizes the notion of a context to one which allows more than one free variable but requires that all variables which occur free have only one free occurrence. Suppose that x1 : ρ1, . . ., xn : ρn are the free variables in such a generalized context t(x1, . . ., xn) : σ. Then t defines a set function ft : ∏i Pρi → Pσ. A generalization of the Context Lemma then shows that ft is monotone as a function from the product preorder ∏i Pρi. So functions given by lambda abstraction on each of the variables are also monotone, and this amounts to the soundness of the introduction rules of the Lambek Calculus in the internalized setting. This point is an advantage of the internalized system in comparison to the monotonicity calculus: in the latter approach, the results of polarity determination become side conditions on the introduction rules for the lambda operator.

3.3 Logic

Figure 1 presents several sound logical principles. These are implicit in Fyodorov, Winter, and Francez [4]; see also Zamansky, Francez, and Winter [14] for a more developed proposal. The rules reflect the reflexive and transitive properties of the preorder, the fact that all function types are interpreted by sets of monotone functions, and the pointwise definition of the order on function sets. The rules in Figure 1 define a logical system whose assertions are order statements such as u : σ ≤ v : σ; the typing statements such as u : σ become side conditions on the rules. (For simplicity, we are only dealing with ground terms u and v, but it is not hard to generalize the treatment to allow variables.)
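Anticipating Figure 1, the rules can be read as a saturation procedure on order statements between ground terms. The sketch below is ours and elides types entirely, so it assumes that the initial statements and the ambient set of terms are well-typed; that assumption stands in for the side conditions of the figure.

    import Data.List (nub)

    -- Ground terms: constants and applications.
    data Tm = C String | Tm :$ Tm deriving (Eq, Show)
    infixl 9 :$

    -- One round of the rules: reflexivity, transitivity, and the two
    -- application rules, restricted to a fixed finite set of terms tms.
    step :: [Tm] -> [(Tm, Tm)] -> [(Tm, Tm)]
    step tms facts = nub $
         facts
      ++ [ (t, t) | t <- tms ]
      ++ [ (t, v) | (t, u) <- facts, (u', v) <- facts, u == u' ]
      ++ [ (t :$ u, t :$ v) | (u, v) <- facts, t <- tms
         , (t :$ u) `elem` tms, (t :$ v) `elem` tms ]
      ++ [ (u :$ t, v :$ t) | (u, v) <- facts, t <- tms
         , (u :$ t) `elem` tms, (v :$ t) `elem` tms ]

    -- Iterate to a fixed point; this terminates because every derived
    -- statement stays inside a finite set of pairs.
    close :: [Tm] -> [(Tm, Tm)] -> [(Tm, Tm)]
    close tms facts
      | length facts' == length facts = facts
      | otherwise                     = close tms facts'
      where facts' = step tms facts

For instance, from the axiom (C "cat", C "animal") and a term set containing C "some" :$ C "cat" and C "some" :$ C "animal", the closure contains the statement that some(cat) ≤ some(animal). With internalized types, the corresponding axiom for every would instead live at type −pr, as animal ≤ cat there, which is how the system stays sound without polarity side conditions.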
The logic thus defines a relation Γ ⊢ ϕ on order statements. Given a model M and an order statement ϕ of the form u : σ ≤ v : σ, we say that M satisfies ϕ iff [[u]] ≤ [[v]] in Pσ. The soundness of the logic is the statement that whenever Γ ⊢ ϕ, every model M satisfying all of the sentences in Γ also satisfies ϕ.

    -------------
    t : σ ≤ t : σ

    t : σ ≤ u : σ    u : σ ≤ v : σ
    ------------------------------
             t : σ ≤ v : σ

        u : σ ≤ v : σ
    ---------------------  [t : (σ, τ)]
    t(u) : τ ≤ t(v) : τ

    u : (σ, τ) ≤ v : (σ, τ)
    -----------------------  [t : σ]
     u(t) : τ ≤ v(t) : τ

    Figure 1: Monotonicity Principles

The Context Lemma then gives another sound principle:

        u : σ ≤ v : σ
    ---------------------  [x : σ in the context t]          (1)
    t(u) : τ ≤ t(v) : τ

However, all instances of (1) are already provable in the logic as we have defined it. This is an easy inductive argument on the context t.

4 Examples and Discussion

We present a small example to illustrate the ideas. It is a version of the language R∗ from Pratt-Hartmann and Moss [9]. We also take the opportunity to discuss internalized marking in a general way. We have a few comments on the challenges of building a logic based on internalized markings. And we also have a discussion of the word any and how it might be treated.

First, we describe a language L corresponding to this vocabulary. Let's take our set T0 of basic types to be {t, pr}. (These stand for truth value and property. In more traditional presentations, the type pr might be (e, t), where e is a type of entities.) Here are the constants of the language L and their types:

1. We have typed constants

    every+ : (−pr, (pr, t))        every− : (pr, (−pr, −t))
    some+  : (pr, (pr, t))         some−  : (−pr, (−pr, −t))
    no+    : (−pr, (−pr, t))       no−    : (pr, (pr, −t))
    any+   : (−pr, (pr, t))        any−   : (−pr, (−pr, −t))

2. We fix a set of unary atoms corresponding to some plural nouns and lexical verb phrases in English. For definiteness, we take cat, dog, animal, runs, sleeps. Each unary atom p gives two typed constants: p+ : pr and p− : −pr.

3. We also fix a set of binary atoms corresponding to some transitive verbs in English. To be definite, we take chase, see. Every binary atom r gives four typed constants:

    r1+ : ((pr, t), pr)            r1− : ((−pr, −t), −pr)
    r2+ : ((−pr, t), pr)           r2− : ((pr, −t), −pr)

This completes the definition of our typed language L. We might mention that the superscripts + and − on the constants c : σ of L are exactly the polarities pol(σ) of the associated types. (We shall define the polarity function later in this section.) These notations + and − are mnemonic; we could do without them.

As always with categorial grammars, a lexicon is a set of pairs consisting of words in a natural language together with terms. We have been writing the words in the target language in italics, and the terms for them are written in sans serif. It is very important that the lexicon allows a given word to appear with many terms. As we have seen, we need every to appear with every+ and every−, for example. We still are only concerned with the syntax at this point, and the semantics will enter once we have seen some examples.

Examples of typed terms and contexts Here are a few examples of typed terms along with their derivations. First, here is a derivation of the term corresponding to the English sentence No dog chases every cat:

    every− : (pr, (−pr, −t))    cat+ : pr
      ⇒ every−(cat+) : (−pr, −t)
    chase1− : ((−pr, −t), −pr)
      ⇒ chase1−(every−(cat+)) : −pr
    no+ : (−pr, (−pr, t))    dog− : −pr
      ⇒ no+(dog−) : (−pr, t)
      ⇒ no+(dog−)(chase1−(every−(cat+))) : t

One should compare this treatment with what we saw of the same sentence in the Introduction.
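The derivation can be replayed mechanically. Here is a sketch of an applicative checker over the internalized types; the encoding is ours, and since the lexical types above are already in normal form for ≡, plain equality of types suffices in the application rule.

    -- Internalized types over T0 = {t, pr}.
    data Ty = Pr | T | Neg Ty | Fn Ty Ty deriving (Eq, Show)

    -- Applicative terms: typed constants and application.
    data Tm = Con String Ty | Tm :$ Tm deriving Show
    infixl 9 :$

    tyOf :: Tm -> Maybe Ty
    tyOf (Con _ ty) = Just ty
    tyOf (f :$ a)   = case (tyOf f, tyOf a) of
      (Just (Fn s t), Just s') | s == s' -> Just t
      _                                  -> Nothing

    -- The lexicon entries used in the derivation above:
    noP, dogM, everyM, catP, chase1M :: Tm
    noP     = Con "no+"     (Fn (Neg Pr) (Fn (Neg Pr) T))
    dogM    = Con "dog-"    (Neg Pr)
    everyM  = Con "every-"  (Fn Pr (Fn (Neg Pr) (Neg T)))
    catP    = Con "cat+"    Pr
    chase1M = Con "chase1-" (Fn (Fn (Neg Pr) (Neg T)) (Neg Pr))

    -- tyOf (noP :$ dogM :$ (chase1M :$ (everyM :$ catP)))  ==  Just T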
The internalized setting does not have steps of Monotonicity Marking and Polarity Determination. It only involves a derivation in the applicative CG at hand. Please also keep in mind that the superscripts + and − above are from the lexicon, not from any additional process, and that the lexicon could have been formulated without those superscripts. They are in fact the polarities of the attendant categories. We similarly have the following terms:

    some+(dog+)(chase1+(every+(cat−))) : t
    some+(dog+)(chase2+(no+(cat−))) : t
    no+(dog−)(chase2−(no−(cat+))) : t

The point here is that all four different typings of the transitive verbs are needed to represent sentences of English. I do not believe that this point has been noticed about the internalized scheme before. It raises an issue that we shall take up in the next subsection: how would a grammar writer come up with all of the needed categories?

Here is an example of a context: no+(x : −pr)(chase1−(every−(cat+))) : t. So x is a variable of type −pr. In any model, this context gives a function from interpretations of type −pr to those of type t. The Context Lemma would tell us that this function is a monotone function. Turning things around, it would be an antitone function from interpretations of type pr to those of type t.

Any The internalized approach enables a treatment of any that has any mean the same thing as every when it has positive polarity, and the same thing as some when it has negative polarity. For example, here is a sentence intended to mean everything which sees any cat runs:

    any− : (−pr, (−pr, −t))    cat− : −pr
      ⇒ any−(cat−) : (−pr, −t)
    see1− : ((−pr, −t), −pr)
      ⇒ see1−(any−(cat−)) : −pr
    every+ : (−pr, (pr, t))
      ⇒ every+(see1−(any−(cat−))) : (pr, t)
    runs+ : pr
      ⇒ every+(see1−(any−(cat−)))(runs+) : t

The natural reading here is for any to have an existential reading. Another context for existential readings of any is in the antecedent of a conditional. In the other direction, consider

    any+(cat−)(see1+(any+(dog−))) : t.

The natural reading of both occurrences is universal.
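For a quick sanity check that the two readings genuinely differ, here is a toy computation of ours, with meanings flattened to plain Booleans rather than the ordered spaces of the text, and with an invented four-element model.

    -- Quantifier meanings on a finite domain, as in Example 2.7.
    everyQ, someQ :: [a] -> (a -> Bool) -> (a -> Bool) -> Bool
    everyQ dom p q = and [ q x | x <- dom, p x ]
    someQ  dom p q = or  [ q x | x <- dom, p x ]

    domain :: [Int]
    domain = [1 .. 4]

    cat, runs :: Int -> Bool
    cat  x = x == 1 || x == 3
    runs x = odd x

    sees :: Int -> Int -> Bool
    sees m n = m == 2 && n == 1      -- only individual 2 sees anything

    -- "everything which sees any cat runs", embedded any read existentially:
    existentialReading :: Bool
    existentialReading =
      everyQ domain (\m -> someQ domain cat (sees m)) runs   -- False: 2 fails

    -- the same string with the embedded any read universally:
    universalReading :: Bool
    universalReading =
      everyQ domain (\m -> everyQ domain cat (sees m)) runs  -- True, vacuously

Only the existential reading notices that individual 2 sees a cat and does not run, which is why the lexicon above pairs the downward-polarity typing any− with the semantics of some.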
4.1 Standard models

Up until now, we have only given one example of a typed language L. Now we describe a family of models for this language. The family is based on sets with interpretations of the unary and binary atoms. To make this precise, let us call a pre-model a structure M0 consisting of a set M together with subsets [[p]] ⊆ M for unary atoms p and relations [[r]] ⊆ M × M for binary atoms r.

Every pre-model M0 now gives a bona fide model M of L in the following way. The underlying universe M gives a flat preorder M. We take Ppr = [M, 2] ≅ P(M). We also take Pt = 2. We interpret the typed constants p+ : pr corresponding to unary atoms by [[p+]](m) = T iff m ∈ [[p]]. (On the right we use the interpretation of p in the pre-model M0.) Usually we write p+ instead of [[p+]]. The constants p− : −pr are interpreted by the same functions. The interpretations of every+, some+, and no+ are given in Example 2.7 (taking the set X to be the universe M of M). And the interpretations of every−, some−, and no− are the same functions. We interpret any+ the same way as every+, and any− the same way as some−. Recall that the binary atom r gives four typed constants r1+, r1−, r2+, and r2−. These are all interpreted in the model in the same way, by

    r∗(q)(m) = q({m′ ∈ M : [[r]](m, m′)})

It is clear that for all q ∈ P(pr,t) = [[M, 2], 2], r∗(q) is a function from M to {T, F}. The monotonicity of this function is trivial, since M is flat. We do need to check that all of the interpretations of the constants are correctly typed. This is important, and it raises a general issue.

As this short paper draws to a close, we wish to identify some issues with the internalized scheme of polarity marking. Here is the first:

Issue 1: What are the sources of the multiple typings of the lexical items?

The scheme requires many entries for each lexical item, usually with the same semantics in every model. These amount to multiple typings of the same function. What is the source of these multiple typings, and can they be automatically inferred? The principal source for multiple typings is the fact noted in Proposition 2.4, part (3): [−P, −Q] = −[P, Q]. That is, we have two (opposite) order relations on the same set, and so it makes sense that we should type an element of that set in two ways. This accounts for the multiple typings of nouns and determiners in our example, but not for transitive verbs. For them, and for all categories in more complicated languages, we suspect that there is no simple answer. Perhaps the best one could hope for would be a formal system in which to derive the various typings. In the same way that one could propose logical systems which process inferences in parallel with syntactic derivations, one could hope to do the same thing for order statements. Here is our vision of what a derivation might look like in such a system. Let us write φ(m, r) for {m′ ∈ M : [[r]](m, m′)}. A justification for the typing r2− : [[[M, 2], −2], [M, −2]] could be the derivation below:

    q ≤ q′ in [[M, 2], −2]        φ(m, r) ∈ [M, 2]
    ------------------------------------------------------
    r2−(q)(m) = q(φ(m, r)) ≤ q′(φ(m, r)) = r2−(q′)(m) in −2
    ------------------------------------------------------
              r2−(q) ≤ r2−(q′) in [M, −2]
    ------------------------------------------------------
            r2− : [[[M, 2], −2], [M, −2]]

The challenge would be to propose a system which is small enough to be decidable, and yet big enough to allow us to use the notation φ(m, r) as we have done it.

Here is a related point. As Dowty [3] notes, the parsing problem for internalized grammars is no harder than that of CG in the first place. In fact, the work for internalized grammars is a special case, since they are CGs with nothing extra. However, this could be a misleading comparison: if one is given a CG G and some additional information (for example, the intended meanings of the constants corresponding to determiners), one would need to convert this to an internalized grammar G∗. It may be that the size of the lexicon of G∗ is exponential in the size of the lexicon of G. The fact that in our small grammar we need four typings for the verbs makes one worry. And in the worst case, the internalized scheme would be less attractive for computational purposes.
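The easy half of Issue 1 is mechanical: by Proposition 2.4, part (3), the second typing of an item is obtained by negating its type and normalizing. Here is a sketch over the type representation used earlier (the function names are ours):

    data Ty = Pr | T | Neg Ty | Fn Ty Ty deriving (Eq, Show)

    -- Normal form for the congruence ≡: cancel double negations and push
    -- Neg through function types, so Neg ends up only on basic types.
    norm :: Ty -> Ty
    norm (Neg (Neg s))  = norm s
    norm (Neg (Fn s t)) = norm (Fn (Neg s) (Neg t))
    norm (Neg s)        = Neg (norm s)        -- s is basic here
    norm (Fn s t)       = Fn (norm s) (norm t)
    norm t              = t

    -- The dual typing of a lexical entry: the same function, typed in
    -- the opposite preorder.
    dual :: Ty -> Ty
    dual = norm . Neg

    -- dual (Fn (Neg Pr) (Fn Pr T)) == Fn Pr (Fn (Neg Pr) (Neg T)),
    -- i.e. every+ : (-pr, (pr, t)) yields every- : (pr, (-pr, -t)).

The harder half does not arise this way: dual relates r1+ to r1− and r2+ to r2−, but r1 and r2 are not duals of each other, which is why the text suspects there is no simple answer for the verbs.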
Issue 2: What is the relation of polarity and monotonicity?

It is "well-known" in this area that negative polarity items are those which must occur, or usually occur, in "downward monotone positions" in a given sentence. But without saying what the orders on the semantic spaces are, this point could be confusing. For example, suppose one has a reason to assume that the order on truth values is T < F rather than F < T. In this case, the notions of upward monotone positions and downward monotone ones would switch. More seriously, given a complicated function type (σ, −τ), it is not so clear whether this should be recast as −(−σ, τ). In this case, it would not be so clear what the upward and downward monotone positions are in a sentence.

This matter might be clarified by looking at no vs. at most one in a sentence such as No dog chases every cat. As we have seen, our treatment suggests that the occurrence of no is in a positive position, and so we expect it to be monotone increasing. We would like to say that no ≤ at most one, since this is the verdict from generalized quantifier theory. Indeed, the interpretation functions no and at most one are taken to be elements of [[M, 2], [[M, 2], 2]], and then the order on this space is taken to be ultimately determined by the last 2. However, we do not have a single function no but rather two functions, no+ and no−. These are not quite restrictions of no and at most one. For that matter, why exactly do we say that no+ is monotone increasing? Its type was (−pr, (−pr, t)), after all. What seems to be going on here is that in an uncurried form, we have (−pr × −pr, t). When uncurried like this, no+ and no− are restrictions of no and at most one. Moreover, the pointwise order on a function space over a product is determined by the codomain order, forgetting the domain order. Also, the "meaning spaces" in a model for the types pr and −pr have the same underlying set. Taken together, this means that no+ indeed corresponds to a monotone function and no− to an antitone function.

Here is a general definition of the polarity of a type. Define pol : T1 → {+, −} as follows:

1. If σ ∈ T0, then pol(σ) = +.
2. pol((σ, τ)) = pol(τ).
3. pol(−σ) = −pol(σ).

It is easy to check that if σ ≡ τ, then pol(σ) = pol(τ). (It is enough to show that pol(−(−σ)) = pol(σ), and also that pol(−(σ, τ)) = pol((−σ, −τ)).) This allows us to define pol on the quotient set T1/≡, our set T of types.

The definition of the pol function gives an algorithm to calculate pol(σ). Another way to do this would be to: (1) put σ in "negation normal form", by driving the − signs through function spaces so that there are no function types inside the scope of any − sign; (2) remove an even number of − signs from basic types, so that the remaining basic types have either zero or one − sign in front; (3) finally, starting at the top, proceed "to the right" through the term until one reaches either a basic type or its negation. For example, every− is of type (pr, (−pr, −t)), and the polarity of its occurrence in our example sentence No dog chases every cat (or in any term derived in our grammar) is −.

Let u be a subterm occurrence in the term t. Then there is a unique context t′(x) such that t = t′(x ← u). Let σ be the type of x in t′. We say that u has positive polarity if pol(σ) = +, and that u has negative polarity if pol(σ) = −. We naturally would like to say that if an occurrence u has positive polarity in t, then that occurrence is in a monotone increasing position, and if the occurrence has negative polarity, then it is in a monotone decreasing position.
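The pol function is directly computable; here is a sketch over the same type representation as before (ours):

    data Ty = Pr | T | Neg Ty | Fn Ty Ty deriving (Eq, Show)

    data Pol = Plus | Minus deriving (Eq, Show)

    opp :: Pol -> Pol
    opp Plus  = Minus
    opp Minus = Plus

    -- pol as defined above: basic types are +, a function type takes the
    -- polarity of its codomain, and Neg flips.
    pol :: Ty -> Pol
    pol Pr       = Plus
    pol T        = Plus
    pol (Fn _ t) = pol t
    pol (Neg s)  = opp (pol s)

    -- every- : (pr, (-pr, -t)) has polarity Minus, as in the text:
    -- pol (Fn Pr (Fn (Neg Pr) (Neg T))) == Minus

Since pol is invariant under the clauses generating ≡, it agrees on equivalent types, as the text notes.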
Issue 3: How can we build a logic to reason about classes of standard models?

Our last issue concerns the contribution of the types in an internalized CG to a natural logic. The matter is illustrated by a discussion concerning the example language L presented at the beginning of Section 4. Suppose we are given entailment relations concerning the unary and binary atoms of L. For unary atoms, these take the form p ⇒ q, where p and q are unary atoms. For binary atoms, entailment relations look similar: r ⇒ s. (Incidentally, one is also interested in exclusion relations: see MacCartney and Manning [6] for applications, and Icard [5] for theory behind this.) These entailment relations correspond to semantic restrictions on standard models. The first corresponds to the requirement that [[p]] ⊆ [[q]], and the second to the parallel requirement that [[r]] ⊆ [[s]]. Working with entailment relations in this way means restricting attention to standard models which conform to the restrictions. It is natural to then look for additional axioms and/or rules of inference in the logical system that would give a sound and complete logic for these models.

The entailment relations correspond to axioms of two kinds: sentential axioms, and axioms on the orders. The sentential axioms can be stated only for the unary atoms, and they would be sentences like every+(p−)(q+). For the binary atoms, the statements are more involved and we leave them as exercises. Turning to axioms on the orders, our entailment relations give us the following:

    p+ ≤ q+ in pr
    q− ≤ p− in −pr
    r1+ ≤ s1+ in ((pr, t), pr)
    r1− ≤ s1− in ((−pr, −t), −pr)
    r2+ ≤ s2+ in ((−pr, t), pr)
    r2− ≤ s2− in ((pr, −t), −pr)

However, even adopting both kinds of axioms on top of the logical principles we saw in Section 3.3 would not result in a complete logical system, so it would fall short of what we want in a natural logic. One source of problems was noted in Zamansky, Francez, and Winter [14]: the system would not have any way to derive principles corresponding to de Morgan's laws. Another problem for us would be even more basic. The resulting logic does not appear to be strong enough to derive the sound sentences of the form every X is an X. To be sure, there are unsound sentences of this form in the logic, due to the typing. For example, the way we are doing things, every dog who chases any cat chases any cat is not a logical validity at all, since the first any is to be interpreted existentially and the second universally. (I first heard an example of this type in a lecture of Barbara Partee.) But without any, we would like to derive, for example,

    every+(see1−(every−(cat+)))(see1+(every+(cat−))) : t

At this point, I can merely speculate and offer a suggestion. My feeling is that one should look to logics with individual variables, and then the properties of the individual words become rules of inference in the logic. Specifically, it should be the case that if x is an individual or NP-level variable, then one should be able to infer cat+(x) from cat−(x), and vice-versa. Logics of this type have been proposed in the literature, but they do not build on the framework of categorial grammar and therefore have nothing to say about polarity and monotonicity. Integrating the work there with the proposal in this paper is the most important open issue in work on this topic.

Conclusion

This paper explores what happens to CG if we adopt the idea that interpretation needs to happen in ordered domains, especially if we add to the type formation rules a construction for the opposite order. Doing this gives a Context Lemma in a straightforward way, and the Context Lemma is tantamount to the soundness of the internalized polarity marking scheme. The types in the internalized scheme are more plentiful than in ordinary CG, and this is a source of issues as well as possibilities. The issues concern how the new typings are to be generated, and how they are to interact with each other in proof-theoretic settings. There certainly is more to do on this matter, and this paper has only provided the groundwork. With some additional work, one could easily imagine that the internalized scheme will be an important contribution to the research agenda of natural logic.
References

[1] Raffaella Bernardi. Reasoning with Polarity in Categorial Type Logic. PhD thesis, University of Utrecht, 2002.
[2] Christos Christodoulopoulos. Creating a natural logic inference system with combinatory categorial grammar. Master's thesis, University of Edinburgh, 2008.
[3] David Dowty. The role of negative polarity and concord marking in natural language reasoning. In Proceedings of SALT IV, pages 114-144. Cornell University, Ithaca, NY, 1994.
[4] Yaroslav Fyodorov, Yoad Winter, and Nissim Francez. Order-based inference in natural logic. Log. J. IGPL, 11(4):385-417, 2003. Inference in computational semantics: the 2000 Dagstuhl Workshop.
[5] Thomas Icard. Exclusion and containment in natural language. ms., Stanford University, 2010.
[6] Bill MacCartney and Christopher D. Manning. An extended model of natural logic. In Proceedings of the Eighth International Conference on Computational Semantics (IWCS-8), Tilburg, Netherlands, 2009.
[7] Michael Moortgat. Categorial type logics. In Handbook of Logic and Language, pages 93-178. MIT Press, Cambridge, MA, 1997.
[8] Rowan Nairn, Cleo Condoravdi, and Lauri Karttunen. Computing relative polarity for textual inference. In Proceedings of ICoS-5 (Inference in Computational Semantics), Buxton, UK, 2006.
[9] Ian Pratt-Hartmann and Lawrence S. Moss. Logics for the relational syllogistic. Review of Symbolic Logic, 2(4):647-683, 2009.
[10] Víctor M. Sánchez Valencia. Studies on natural logic and categorial grammar. PhD thesis, University of Amsterdam, 1991.
[11] Johan van Benthem. Essays in Logical Semantics, volume 29 of Studies in Linguistics and Philosophy. D. Reidel Publishing Co., Dordrecht, 1986.
[12] Johan van Benthem. Language in Action, volume 130 of Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, 1991.
[13] Johan van Benthem. A brief history of natural logic. In M. Chakraborty, B. Löwe, M. Nath Mitra, and S. Sarukkai, editors, Logic, Navya-Nyaya and Applications: Homage to Bimal Krishna Matilal. College Publications, London, 2008.
[14] Anna Zamansky, Nissim Francez, and Yoad Winter. A 'natural logic' inference system using the Lambek calculus. J. Log. Lang. Inf., 15(3):273-295, 2006.