        Copyright © 2011 by The Sanskrit Library. All rights reserved.
        Reproduction in any medium is restricted.
        Foreword
        by George Cardona
        that includes such elements is termed padapāṭha. At least one such
        analyzed text predates the grammarian Pāṇini, the padapāṭha to the
        R̥gveda by Śākalya. The padapāṭha related to any saṁhitāpāṭha obviously
        derives from the latter, its source. On the other hand, the separate
        padas of the padapāṭha can be viewed theoretically as the source of the
        continuously recited text, gotten by removing pauses at boundaries and
        thereby applying phonological rules that take effect between contiguous
        units. This is, in fact, the theoretical stance taken by authors of
        texts called prātiśākhya, which formulate phonological rules modifying
        padas in contiguity with other padas. Thus, phonological alternations
        within Vedic texts were objects of concern by at least the early sixth
        century B.C. Pāṇini himself, who can hardly be dated later than around
        500 B.C., composed a generalized grammatical work, his śabdānuśāsana,
        which includes a set of rules, called Aṣṭādhyāyī, serving to account
        through a derivational system for the accepted usage of his time and
        place as well as certain dialectal differences and features particular
        to earlier Vedic. One of the appendices to the Aṣṭādhyāyī is an
        inventory of sounds (referred to as the akṣarasamāmnāya by early
        students of Pāṇini’s work) that is divided into fourteen sets, each set
        off from the others by a final consonantal marker (it), which serves to
        form abbreviatory terms (pratyāhāra) referring to groups of sounds with
        respect to phonological rules as formulated in the Aṣṭādhyāyī.
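        The way a marker forms an abbreviatory term can be sketched in a few
        lines of Python. This is only an illustration under stated
        assumptions: the listing of the fourteen sets below follows the
        inventory as it is commonly transmitted and is not quoted in the text
        above, the romanization and the upper-case spelling of the markers are
        conventions of this sketch, and the names SETS and pratyahara are
        hypothetical. A term is read as "the sounds from a given starting
        sound up to the set closed by a given marker".

```python
# Assumed listing of the fourteen sets: (sounds, final marker).
# Markers are written in upper case purely as a convention of this sketch.
SETS = [
    (["a", "i", "u"], "Ṇ"),
    (["r̥", "l̥"], "K"),
    (["e", "o"], "Ṅ"),
    (["ai", "au"], "C"),
    (["h", "y", "v", "r"], "Ṭ"),
    (["l"], "Ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "M"),
    (["jh", "bh"], "Ñ"),
    (["gh", "ḍh", "dh"], "Ṣ"),
    (["j", "b", "g", "ḍ", "d"], "Ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "V"),
    (["k", "p"], "Y"),
    (["ś", "ṣ", "s"], "R"),
    (["h"], "L"),
]


def pratyahara(first, marker):
    """Collect the sounds from `first` through the set closed by `marker`.

    The search starts at the first occurrence of `first` and stops at the
    first set, from there on, whose marker equals `marker`. Markers
    themselves are never part of the result.
    """
    collected = []
    started = False
    for sounds, it in SETS:
        for sound in sounds:
            if not started and sound == first:
                started = True
            if started:
                collected.append(sound)
        if started and it == marker:
            return collected
    raise ValueError(f"no term runs from {first!r} to marker {marker!r}")


if __name__ == "__main__":
    print("aC:", pratyahara("a", "C"))   # the vowels and diphthongs
    print("iK:", pratyahara("i", "K"))
    print("haL:", pratyahara("h", "L"))  # the consonants
```

        Because one marker closes two different sets in this listing, the
        sketch simply stops at the first matching set after the starting
        sound; how such cases are resolved in the grammatical tradition is
        outside its scope.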
             The order of sounds in Pāṇini’s akṣarasamāmnāya shows properties
        best explained as due to its being a reworking of an earlier source. The
        five sets of stops in such earlier inventories, moreover, show an
        obvious phonetic ordering, from velar to labial, that is, an order based
        on the production of sounds, from the back of the oral cavity to the
        front. Moreover, prātiśākhyas not only state rules of phonological
        replacement but also describe the production of sounds, a topic which is
        dealt with in works on phonetics (śikṣā) such as the Āpiśaliśikṣā of
        Āpiśali. Accordingly, scholars are justified in maintaining that early
        Indian texts reflect a sophisticated investigation of Sanskrit phonology
        and phonetics.
             Scholars have also frequently debated whether or not writing played
        a role in the composition and transmission of such early works as the
        prātiśākhyas and the Aṣṭādhyāyī. There can be no doubt whatever that
        the latter was later transmitted orally. It is also most plausible that
        Pāṇini himself composed and transmitted his work orally. Thus, Pāṇini
        formu-
                                                                 George Cardona
                                                               February 19, 2011
Preface
             Sanskrit text has been moving into the digital medium. Recent
        decades have witnessed the growth of machine-readable Sanskrit texts in
        archives such as the Thesaurus Indogermanischer Text- und
        Sprachmaterialien (TITUS), Kyoto University, Indology, and the Göttingen
        Register of Electronic Texts in Indian Languages (GRETIL). The last few
        years have witnessed a burgeoning of digital images of Sanskrit
        manuscripts and books hosted online. For example, the University of
        Pennsylvania Library, which houses the largest collection of Sanskrit
        manuscripts in the Western Hemisphere, has made digital images of two
        hundred ninety-seven of them available on the web. The Universal Digital
        Library and Google Books have made digital images of large numbers of
        Sanskrit texts accessible as part of their enormous library digitization
        projects. Digitized Sanskrit documents include machine-readable text and
        images of lexical resources such as those of the Cologne Digital
        Sanskrit Lexicon project (CDSL) and the University of Chicago’s Digital
        Dictionaries of South Asia project (DDSA).
             As oral, manuscript, and print media that have conveyed the
        knowledge embodied in the ancient Sanskrit language make their
        transition into digital media, a number of scholars have begun
        collaborating in the Sanskrit Computational Linguistics Consortium,
        which has organized several symposia since 2007. Members include
        linguists finding new challenges in formalizing the syntax of a
        free-word-order language, computer scientists drawn to model techniques
        of generative grammar used by the ancient Indian grammarian Pāṇini,
        philologists using digital methods to assist in critical editing, and
        scholars collaborating to build corpora, databases, and tools for the
        use of academic researchers and commercial enterprises. The authors of
        the present volume have actively participated in and fostered this
        growing collaboration.
             Since 1999, we have worked together to facilitate the entry,
        linguistic processing, and display of Sanskrit texts both in print and
        on the Web. Our collaboration began with the preparation of the web and
        print publication of Scharf’s (2002) Rāmopākhyāna and the launch of The
        Sanskrit Library website¹ in 2002, and continued with the International
        Digital Sanskrit Library Integration project at Brown University under
        grants from the National Science Foundation (NSF), 2006–2009. In July
        2009 we began the project Enhancing Access to Primary Cultural Heritage
        Materials of India under a grant from the National Endowment for the
        Humanities, and in July 2010 we began the project Sanskrit Lexical
        Sources: Digital Synthesis and Revision. Struggling to overcome the
        lack of adequate encoding for Sanskrit led us to tackle the issue both
        practically and theoretically. With colleagues worldwide, we prepared
        a proposal to extend the Unicode Standard to allow adequate encoding
        of Vedic Sanskrit. Simultaneously, we engaged in a thorough review of
        the fundamental principles of encoding. We reviewed encoding principles
        not just for Sanskrit and not just in digital character encoding, but
        considered the question broadly in terms of the means by which humans
        communicate knowledge through speech, writing, print, and electronic
        media. The present volume is a result of these investigations. While
        the linguistic material discussed is drawn primarily from Sanskrit, the
        questions addressed are relevant to linguistic encoding in general and
        should be of interest to scholars of linguistics.
              1 <http://sanskritlibrary.org/>.
            On the fifth of September 2009, I received a call from the wife of
        my colleague and co-author Malcolm Hyman informing me that he had passed
        away suddenly the night before. It is regrettable that he did not get to
        see the publication of this book, which has been nearly complete for two
        years and which he himself was primarily responsible for typesetting. It
        is far more regrettable that the fruitful collaboration that we have
        undertaken in the past decade has come to an end, and that the potential
        contributions he had to make will not materialize. Malcolm had a
        comprehensive view of digital humanities and a prescient vision of
        productive directions for research. I am grateful for what I have
        learned from him in the course of our work together – even in being
        forced to learn TeX to bring this book to completion. In tribute to him
        and in the hope that others may find his work instructive and inspiring,
        his complete curriculum vitae is included in Appendix E of this volume.
            Part of this work was supported by the NSF under grant no. 0535207.
        Any opinions, findings, and conclusions or recommendations expressed
        are those of the authors and do not necessarily reflect the views of the
        NSF.
Contents
Preface
Illustrations
Abbreviations
        1   Introduction
            1.1 Technologies for representing spoken language
            1.2 The Sanskrit language
            1.3 The Devanāgarī script
            1.4 Roman transliteration
            1.5 The All-India Alphabet
        5   Sanskrit phonology
            5.1 Description of Sanskrit sounds
            5.2 Phonetic and phonological differences
                5.2.1 Phonetic differences
                5.2.2 Sounds of problematic characterization
                5.2.3 Differences in phonological classification of segments
                5.2.4 Differences in the system of feature classification
                5.2.5 Indian treatises on phonological features
                5.2.6 Modern feature analysis
        6   Sound-based encoding
            6.1 Criteria for selecting distinctive elements to encode
                6.1.1 Phoneme
                6.1.2 Generative grammar
                6.1.3 Historical linguistics
                6.1.4 Paralinguistic semantics
                6.1.5 Contrastive segments
                6.1.6 Phoneme in the broader sense
                6.1.7 Contrastive phonologies
        8   Conclusions
            8.1 Dynamic transcoding
            8.2 Text-to-speech and speech-recognition
            8.3 Higher-level encoding
Appendices
        A   Tables
            A.1 Phonetic features
            A.2 Sounds categorized by Āpiśali
            A.3 Sounds categorized by Śaunaka
            A.4 Sounds categorized after Halle et al.
            A.5 Sanskrit phonetics
            A.6 Sanskrit phonetics according to Āpiśali
            A.7 Sanskrit phonetics according to Śaunaka
            A.8 Sanskrit phonemics
            A.9 Sanskrit sounds derived from PIE by Burrow
            A.10 PIE phonemics according to Burrow
            A.11 PIE phonemics according to Szemerényi
            A.12 Feature tree after Halle
            A.13 Graphic features of Devanāgarī according to Ivanov and Toporov
Bibliography
Index
Illustrations
Abbreviations
         A.          Pāṇini’s Aṣṭādhyāyī
         ĀŚ.         Āpiśaliśikṣā
         APr.        Atharvaprātiśākhya
         ASCII       American Standard Code for Information Interchange
         BCDIC       Extended Binary Coded Decimal Interchange Code
         CA.         Caturādhyāyikā
         CCITT       Comité Consultatif International Téléphonique et
                     Télégraphique
         DhP.        The Pāṇinian Dhātupāṭha
         MBhK.       Kielhorn’s edition of Patañjali’s Mahābhāṣya
         PIE         Proto-Indo-European
         R̥Pr.        R̥kprātiśākhya
         R̥V.         R̥gveda
         TPr.        Taittirīyaprātiśākhya
         VPr.        Vājasaneyiprātiśākhya
         Vyā. Pa.    Vyāḍi Paribhāṣāvr̥tti
Chapter 1
Introduction
        which it simply imitates the old system. [. . . ] For example, early printed books imitated
        manuscripts, and early cinema used fixed cameras in imitation of the fixed viewpoint of the
        theatre-goer” (Waller, 1986, 74).
           2 The earliest “proto-writing”, attested in the ancient Near East, is associated with
        Figure 1.2: Printed text with paradigm of the Latin verb lego ‘read’,
            ca. 1445
        adopted the existing written letter forms and did not question their entire suitability as
        shapes for reproduction into metal types. Nor did either printer or founder, for many years
        until printing had been recognized for its own sake, make any attempt to seek or create
        letter forms better adapted to type reproduction than the written characters” (Ghosh, 1983,
        12).
        Aldus Manutius, who published the first volume of an edition of Aristotle in Greek in
        1495, closely imitated calligraphic style in his type, and made use of numerous ligatures
        and abbreviations. Ingram (1966), who provides an extensive guide to ligatures and ab-
        breviations in early Greek typography, remarks that when he first encountered Renaissance
        Greek printing, “I saw little resemblance between the Greek I had learned in school and
        this peculiar, cramped typeface which I could not read and which often contained only an
        occasional letter I could recognize” (Ingram, 1966, 371).
            6 On the early history of Arabic typography in Europe, see Roper (2002).
            7 Automation began to be introduced into type composition and casting considerably
        earlier in the nineteenth century. Notable early systems were devised by William Church
        (1822) and by James Young and Adrian Delambre (1840–1841) (Schlesinger 1989; Kahan
        2000, 1–2).
        however, resembled at first the older typecases; with time, they became
        simplified and more ergonomic (AbiFarès, 2001). Another late
        nineteenth-century technology, the typewriter, was first commercially
        manufactured in the United States in the 1870s.⁸ The typewriter greatly
        expanded the mechanical production of texts and allowed mechanical
        technology to be used for the creation of even ephemeral documents.
        Typewriters reproduced many aspects of printing technology, but with
        several accommodations: a greatly reduced inventory of characters,
        monospacing, and the elimination of many possibilities for aesthetic
        refinement.
            Teletype machines, which originated around 1907, allowed for the
        remote transmission and printing of text; they led eventually to
        standards for information encoding, most notably ASCII (American
        Standard Code for Information Interchange) in the 1960s (Bemer, 1963;
        Smith, 1964; Mackenzie, 1980; Gaylord, 1995).⁹ Current digital computer
        keyboards evolved from teletype keyboards, and the first documents
        created using computers resembled typewritten documents. Digital
        typesetting emerged in the 1970s and made possible the creation of
        high-quality documents that incorporated aspects of traditional
        typography (Syropoulos, Tsolomitis & Sofroniou, 2003). The desktop
        publishing revolution of the 1980s and 1990s brought these capabilities
        to an international public that continues to expand today.
            8 Manufacture by Remington of the typewriter designed by Christopher Latham Sholes
        and Carlos Glidden began in 1873 (Beeching, 1990; Bukatman, 1993; Kahan, 2000).
            9 We may look even earlier, to the five-bit code for telegraphy patented in 1874 by Bau-
        dot (Gillam, 2002, 43). A later rearrangement of the code was standardized in 1931 as
        CCITT #2 by the Comité Consultatif International Téléphonique et Télégraphique (now
        renamed ITU-T) and extensively used by teletype machines (Mackenzie, 1980, 6, 62–64).
        As a matter of historical curiosity, we may note that the ultimate antecedent of the Bau-
        dot code was Francis Bacon’s so-called “bi-literal” cipher, first published in 1623 (Strasser
        1988, 88–9; Kahn 1996, 882–3).
           ASCII became an American (ASA) standard on June 17, 1963. Although ASCII is gen-
        erally thought of as a seven-bit code, it was actually designed as an eight-bit code with the
        eighth bit unassigned (Bemer, 1963, 35). When ASA (American Standards Association)
        became ANSI (American National Standards Institute), ASCII was officially designated
        ANSI X3.4-1968 (Mackenzie, 1980, 8). On the relation of ASCII to ISO 646 see Gaylord
        (1995).
           An interesting predecessor of character encoding is the Linotype, which redistributed
        its matrices in accordance with a seven-digit binary code assigned to each type, “although
        [Mergenthaler] probably did not realize the mathematical significance” (Kahan, 2000, 206).
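        The code widths mentioned in these notes (five bits for the
        telegraph code, seven for ASCII and for the Linotype matrices, eight
        for the unit in which ASCII was designed) each fix the number of
        distinct signals available. The arithmetic can be checked with a
        minimal Python sketch; the function names code_space and
        fits_in_seven_bits are hypothetical and chosen only for illustration.

```python
def code_space(bits):
    """Number of distinct codes available in a fixed-width binary code."""
    if bits < 0:
        raise ValueError("bit width must be non-negative")
    return 2 ** bits


def fits_in_seven_bits(text):
    """True if every character's code point is below 128,
    i.e. representable with the eighth bit left at zero."""
    return all(ord(ch) < code_space(7) for ch in text)


if __name__ == "__main__":
    for label, bits in [("five-bit code", 5),
                        ("seven-bit code", 7),
                        ("eight-bit code", 8)]:
        print(f"{label}: {code_space(bits)} possible codes")
    for sample in ["lego", "Pāṇini"]:
        print(sample, fits_in_seven_bits(sample))
```

        The second function makes the practical limit visible: a romanized
        Sanskrit word with diacritics cannot be represented in a seven-bit
        repertoire without some further convention.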
        text and language processing followed at first only slowly, although we find already in
        1949 the first electronic text project in the humanities, namely, Roberto Busa’s computer-
        generated concordance Index Thomisticus (Hockey, 2000, 5). Today the Index Thomisticus
        lives on as the Index Thomisticus Treebank, a morphologically and syntactically annotated
        remarked upon. In the late seventh century the Chinese Buddhist pilgrim Yijing noted that
        “The Vedas have been handed down from mouth to mouth, not transcribed on paper or
        leaves” (Takakusu, 1896, 182).
           12 Kharoṣṭhī is now encoded in plane 1 of Unicode (U+10A00–U+10A5F).
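        The plane of a code point can be read directly off its value: each
        plane holds 65,536 code points, so the plane number is the value
        shifted right by sixteen bits. The following is a minimal Python
        sketch; the names unicode_plane and KHAROSHTHI_BLOCK are hypothetical,
        and the range is the one cited in the note above.

```python
def unicode_plane(code_point):
    """Return the number of the Unicode plane (0-16) containing a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16  # each plane spans 0x10000 code points


# Range cited in the note above: U+10A00 through U+10A5F inclusive.
KHAROSHTHI_BLOCK = range(0x10A00, 0x10A60)

if __name__ == "__main__":
    planes = {unicode_plane(cp) for cp in KHAROSHTHI_BLOCK}
    print("Planes covered by the cited range:", planes)
    # For comparison, a Devanagari letter (U+0915) lies in plane 0.
    print("Plane of U+0915:", unicode_plane(0x0915))
```

        Text in plane 1 needs more than sixteen bits per character, which is
        why software limited to plane 0 cannot store it directly.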
        equivalent to e0−1 .
          Psycholinguistic research suggests “that orthographic representations are organized into
        syllable-like units independently from phonological influences” (Ward & Romani, 2000,
        654). Cf. Caramazza & Miceli (1990); Badecker (1996, 60 n. 5, 67). For further dis-
        cussion with reference to Indic scripts see Sproat (2006); Kompalli (2007). The regular
        expression given above formalizes one of the two criteria of orthographic legality: “how
        many consonant letters you may have in a row before you must have a vowel” (Ward &
        Romani, 2000, 654). Knowledge of orthographic legality also involves knowledge of or-
        thotactic constraints on sequences of consonant characters (i. e., is a particular sequence of
        characters legal or illegal?).
        Some ligatures (e. g., क्ष ⟨kṣa⟩ = क् + ष) have idiosyncratic forms that are
        opaque in terms of their constituent analysis, and may thus be considered
        “graphic idioms” (Ivanov & Toporov, 1968, 35).¹⁴ Traditional Sanskrit
        orthography requires glyphs for representing more than a thousand
        consonant clusters, and it is not uncommon for there to exist four or
        more distinct styles for representing a single cluster (Wikner, 2002).
        Agenbroad (n.d.) illustrates difficulties in unifying consonantal
        characters in single ligatures. Shaw (1980, 28) reports that
        traditionally Devanāgarī fonts required 500–800 types for conjunct
        consonants.
             An examination of the visual characteristics of Devanāgarī script
        helps to explain its graphotactic properties. Hamp (1959, 2) uses the
        term ‘graphotactic’ for the combination of graphic units by analogy with
        the term ‘phonotactic’. The two most obvious visual features of
        Devanāgarī are the headstroke (śirorekhā) that runs horizontally across
        the top of a sequence of Devanāgarī consonant graphs,¹⁵ and the vertical
        bar that appears at the right of many characters. The portion of the
        character that is densest in information (in information-theoretic
        terms) is below the
           14 Voigt (2005, 34) argues that ⟨क्ष⟩ originally was not a ligature, but rather was derived
        directly from Aramaic ⟨ṣ⟩ and was used to represent [ts] (possibly with the final component
        glottalized: [ts’]).
           15 This feature arose from the technology of calligraphy (Ghosh, 1983, 16). The head-
        stroke developed from an earlier head mark, which evolved in turn from the triangle of
        ink formed by the first placement of the pen at the start of drawing a character (Salomon
        1998, 31–8l; Shaw 1980, 28). In typographic terms, the headstroke in Devanāgarī is the
        equivalent of the baseline in scripts such as Latin and Greek (cf. Katsoulidis 1996).
           Ivanov & Toporov (1968, 35) offer a doubtful functional explanation of the śirorekhā.
        They write:
                The continuity of the phonetic stream is reflected in the continuity of the
                graphic chain: separate syllabic symbols in a word and separate words
                themselves are connected by an uninterrupted horizontal line. This feature
                of the Indian writing can be explained not only by its phonetic character
                but also by the specific character of the word in Sanskrit where a significant
                role is played by long compound words which are sometimes functionally
                analogous to entire syntagms.
        Such an explanation cannot be accepted because there is no correlation between the pho-
        netic unity and the graphic unity of strings united by a headbar or separated by a gap in
        the headbar. There is no greater phonetic unity in tasmātkaroti than in anyo ’gacchat even
        though the latter breaks the headbar between words, and the former forms a conjunct conso-
        nant running the headbar across two words. Moreover, manuscripts write entire sentences
        uninterrupted regardless of word boundaries.
        headstroke and to the left of the vertical bar.16 In only a few signs (थ
        ⟨tha⟩, ध ⟨dha⟩, and भ ⟨bha⟩) is the headstroke broken. Visually, we can
        establish three major classes of (consonant) characters:17
           1. Characters with a vertical bar at the far right (ख ग घ च ज ञ ण त
              थ ध न प ब भ म य व श ष स)
           2. Characters with a vertical bar at the center (क फ)
           3. Characters that hang from a small stem attached to the headstroke
              (छ ट ठ ड ढ द ह); most of these characters have round bottoms.
        The character ⟨jha⟩ may belong either to group 1 (if it takes the shape
        झ) or to group 2 (if it takes the shape ½). The character ⟨la⟩ may be-
        long either to group 1 (if it takes the shape a) or group 3 (if it takes the
        shape l). The character र does not readily fit into this typology. The fol-
        lowing basic script behaviors are explicable with reference to the above
        categories:
14 CHAPTER 1. INTRODUCTION
                 (b) A consonant that follows /H/ is drawn within the open circle
                     that comprises the lower half of the h, utilizing the roof and
                     right of this circle as its upper horizontal or right vertical bar.
                     (e. g. Ì = h, + l)
        the glottal stop [P] (Kaufman 1984, 93 n. 40; Hoberman 1985, 224).
           20 In Kharoṣṭhī initial vowels are formed by attaching the dependent vowel signs
        to a character derived from aleph (Scharfe, 2002, 393).
          21 Owing    to sandhi, initial independent vowel signs will be written only (1) in hiatus,
        i. e. the environment V##V; or (2) in pausa (initially in a major phonological phrase). Al-
        though the glottal stop is not a phoneme of English, it commonly occurs in inter-word
        hiatus, e. g. heavy oak; steady awning — N. B. that the glottal stop is not ordinarily real-
        ized as full glottal closure (Hillenbrand & Houde, 1996); cf. Hadj-Salah (1971, 73 n. 63).
        Similar phonetics is likely to obtain in Sanskrit. Note that inter-word hiatus is often consid-
        ered exceptional — careful authors of ancient Greek prose, for example, avoided it entirely
        (Benseler, 1841). Many languages typically eliminate within-word hiatus (Clements, 1990,
        301) or disallow it entirely (Romani & Calabrese, 1998, 102).
            22 Cardona 1997, li–lxiv; Witzel 1974.
        (website: <http://www.iso.ch/>).
          24 <http://www.unicode.org/cldr/transliteration_guidelines.html>.
        letters (Jones, 1942, 2–3). Each of these solutions has its weaknesses.
        Bartholomew Ziegenbalg in his Tamil grammar of 1716 spells the pre-
        palatal affricate (a unitary phoneme) of Tamil as ⟨ytsch⟩ (Firth, 1936,
        34). Even today, it is customary in Germany to render with the hepta-
        graph ⟨schtsch⟩ the phoneme written in Cyrillic as ⟨щ⟩. Clearly the use
        of “polygraphs” can be uneconomical. The use of diacritics can present
        extraordinary challenges to the typesetter, as when one wishes to indicate
        in a Romanized text that a Sanskrit vowel is long (macron), nasalized
        (tilde), and accented; in this case three diacritics must be stacked. The
        creation of new characters is always an option; but after one has added
        enough new characters, one has a new script — no longer Roman.25
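
        The stacking problem described above can be illustrated with modern
        Unicode combining characters. The following is a minimal sketch, not part
        of any scheme discussed here: it assumes the base letter a with a macron,
        a tilde, and an acute accent, and uses only Python's standard unicodedata
        module. The variable names are ours.

```python
import unicodedata

# Combining marks for length (macron), nasalization (tilde), and accent (acute).
MACRON, TILDE, ACUTE = "\u0304", "\u0303", "\u0301"

# One base letter followed by three stacked diacritics: four codepoints.
stacked = "a" + MACRON + TILDE + ACUTE

# NFC composes what it can: a + macron has a precomposed form (U+0101),
# but the tilde and the acute remain separate combining marks.
composed = unicodedata.normalize("NFC", stacked)

for ch in composed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

        Even after normalization the single written symbol is stored as three
        codepoints, so the burden of stacking is shifted to the rendering
        software rather than removed.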
        phabet (Ryan 1993; Desbordes 1990). On more recent created characters for the Latin
        alphabet, see Abercrombie (1981).
        Trigger 1998, 41). They are clearly vitiated by the high levels of literacy current in these
        countries as well, of course, as the tremendous economic growth.
          27 Text in the All-India Alphabet is conventionally printed in boldface.
Chapter 2
        was used not only for setting Sanskrit, but also for vernacular languages
        such as Marathi, Hindi, Nepali, and Gujarati (Shaw, 1980, 30).
            Hot-metal typesetting came to India in the 1920s when the Mergen-
        thaler Linotype Company started shipping Indic fonts for its linecasters
        (Ross, 2002). The Monotype Corporation cut a 12 point Devanāgarı̄ font
        for hot-metal typesetting as early as 1923 (Shaw, 1980, 28). Hot-metal
        technology, however, necessitated “severely restricted character sets, the
        lack of kerning, and the inability to position the subscribed or super-
        scribed vowel signs” (Ross, 2002).3 The Indologist W. Norman Brown
        (1892–1975), founder of the first South Asia area studies program in
        the United States (at the University of Pennsylvania), served as consul-
        tant to the Mergenthaler Linotype Company in the 1930s and subsequent
        decades. Brown considered script reform measures that would ease the
        transition to modern technologies such as hot-metal typesetting.4 The
        Devanāgarı̄ script reform committee of Uttar Pradesh made several rec-
        ommendations (1940), including:
        The aim of these reforms was to reduce the number of pieces of type
        needed to set Devanāgarı̄. (Traditionally, Devanāgarı̄ type required four
            3 “The Linotype mechanism put constraints on type face design because the machine
        could not emulate all the features of manuscript; in particular, where adjacent elements
        overlap vertically” (Kahan, 2000, 190). See also Ghosh (1983, 10).
            4 Politicians of course had their say in the matter. Jawaharlal Nehru for some time con-
        sidered the benefits that might follow from adopting the Roman alphabet. Gandhi sought
        to replace the independent vowel signs of Devanāgarī with the sign अ, together with the
        dependent vowel signs.
        typecases, compared to the two needed for Roman.) The proposal of the
        committee required only 110 types (Brown, 1953, 5):
                 full consonant forms and independent vowel forms                42
                 half forms of consonants                                        26
                 special conjunct forms                                           1
                 dependent vowel forms                                           14
                 punctuation                                                      8
                 numerals                                                        10
                 miscellaneous signs                                              9
        Several Hindi newspapers adopted certain of the committee’s sugges-
        tions, although none adopted all (Brown, 1953, 5).5
        (1979).
           6 See for instance the German code DIN 66003-1967, Informationsverarbeitung 7-Bit
        Code.
           7 For some other Hindi typewriter layouts, see Beeching (1990, 58). See also Bhatia
(1974).
        full forms of consonants, while the same keys struck with shift depressed
        generate the half-forms used in the construction of ligatures. Certain in-
        dividual graphs can only be typed with a combination of keystrokes. For
        instance the aspirate फ ⟨pha⟩ must be typed as: (1) प ⟨pa⟩ and (2) the
        loop that appears to the right of the vertical bar. Thus the Devanāga-
        rı̄ typewriter decomposes characters into their visual constituents.8 Of
        course, many of the conjunct forms and diacritics traditionally used in
        high-quality Sanskrit typography simply cannot be reproduced with such
        a typewriter.
              Text processing software on the digital computer brought the possi-
        bility of an expanded character repertoire and the possibility of shifting
        the burden of tedious composition processes from human to machine. Yet
        in the absence of standardized encodings and text layout software ade-
        quate to meeting the challenges of complex scripts, the first generation
        of Devanāgarı̄ fonts made use of completely proprietary, non-standard
        encodings, were not able to unify non-distinctive glyph variants under a
        single grapheme,9 and required that text be stored (and, often, typed) in
            8 Similarly, the typewriter keyboard designed by the Arabic script reformer Ahmed
        Lakhdar-Ghazal uses three symbols as the appendices of word-final Arabic letters; tra-
        ditionally, the combination of letterform and appendix has been considered a variant form
        of a single grapheme (Mahmoud, 1979, 111).
            9 The term grapheme denotes a minimal distinctive unit of visual language; cf. Pulgram
1951; Hamp 1959. The term has been used in various ways by (psycho-)linguists. This
        is written to the left of the onset consonant(s) in its orthographic syllable; thus -nti is
        written न्ति. Moreover, in a sequence of /r/ + consonant(s), the /r/ is written above the final
        constituent of the orthographic syllable: धर्म्य dharmya ‘suitable, legitimate, virtuous’, कर्त्री
        kartrī ‘female agent’.
           Primary users of Devanāgarı̄, however, sometimes find the visual order of graphs “nat-
        ural” (in as much as it is the order that they follow when writing by hand) and become
        confused if they are required to input the /i/ (for example) in its phonetic position (Joshi,
        Ganu, Chand, Parmar & Mathur, 2004).
           11 <http://www.internetworldstats.com/asia/in.htm>.
        2.3       UPACCII
        In the early part of 1983 Pijush K. Ghosh was a guest of the digital typog-
        raphy project at Stanford University. Ghosh worked to create fonts that
        would allow Indic languages to be set using Donald Knuth’s TeX sys-
        tem. Ghosh (1983, 23) recognized the need for “[t]he design of efficient
        internal codes for the characters of a script for information processing,
        storage and transmission.” The solution was a Universal Phonetic Atom
        Code Chart for Information Interchange (UPACCII), based on ASCII
        (Ghosh, 1983, 26). Ghosh includes the control characters at their normal
        ASCII positions (000–037).12 He largely maintains the ASCII characters
        at 040–077 in their normal positions, substituting only the candrabindu
        at 044 for ⟨$⟩, the anusvāra at 046 for ⟨&⟩, and the danda at 056 for ⟨.⟩.
        From 0100–0107 he places the visarga, accent marks, punctuation, the
        avagraha, and the short vowel ⟨अ⟩. The consonants ⟨क⟩ through ⟨ध⟩ are
        positioned at 0110–0132. The ASCII sequence is preserved from 0133–
        0140. The consonants ⟨न⟩ through ⟨ह⟩ are at 0141–0156. The vowels
        (save ⟨अ⟩) follow at 0157–0170: ⟨आ⟩, ⟨इ⟩, ⟨ई⟩, ⟨उ⟩, ⟨ऊ⟩, ⟨ऋ⟩, ⟨ए⟩, ⟨ऐ⟩,
        ⟨ओ⟩, ⟨औ⟩. At 0171 Ghosh places the virāma, at 0172 a BREAK character
        to prevent ligation (parallel to ZWNJ U+200C in Unicode). Normal
        ASCII values continue from 0173–0177.
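
        The layout just described can be summarized as a lookup over octal
        ranges. The following is a minimal sketch that encodes only the region
        boundaries stated in the paragraph above; the function name and the region
        labels are ours, and the assignment of individual characters within each
        region is not reproduced.

```python
def upaccii_region(code: int) -> str:
    """Name the UPACCII region for a 7-bit code, per the ranges given in the text."""
    if not 0 <= code <= 0o177:
        raise ValueError("UPACCII is described here as a 7-bit code")
    # Three ASCII punctuation positions are reassigned to Indic signs.
    substitutions = {0o44: "candrabindu", 0o46: "anusvara", 0o56: "danda"}
    if code in substitutions:
        return substitutions[code]
    if code <= 0o37:
        return "control character"
    if code <= 0o77:
        return "ASCII"
    if code <= 0o107:
        return "visarga, accents, punctuation, avagraha, short a"
    if code <= 0o132:
        return "consonant (ka to dha)"
    if code <= 0o140:
        return "ASCII"
    if code <= 0o156:
        return "consonant (na to ha)"
    if code <= 0o170:
        return "vowel other than short a"
    if code == 0o171:
        return "virama"
    if code == 0o172:
        return "BREAK"
    return "ASCII"


def count(region: str) -> int:
    """Count how many of the 128 codes fall in a region."""
    return sum(1 for c in range(0o200) if upaccii_region(c) == region)


print(count("consonant (ka to dha)"), count("consonant (na to ha)"),
      count("vowel other than short a"))
```

        The counts agree with the inventory: nineteen consonants from ka to dha,
        fourteen from na to ha, and ten vowels other than short a.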
             Ghosh rightly makes his code independent from input (keyboarding)
        and output (printing). For the former, he proposes an ergonomic key-
        board layout inspired by the Dvorak layout; for the latter he proposes a
        print code chart (Ghosh, 1983, 28, 31). UPACCII is basically phonetic
        in nature, so that there are not (as in ISCII and Unicode) separate char-
        acters for independent vowels and for dependent vowel mātras. Ghosh’s
        encoding is intelligent and possesses some strengths in comparison with
        contemporary encodings that are widespread, but it was never adopted as
          12 Character codes are indicated here in octal notation, like that used for constants in the
C programming language.
        2.4     ISCII
        The Indian Script Code for Information Interchange (ISCII) is an Indian
        national standard; the first version was published by the Indian Depart-
        ment of Electronics (DOE) in 1983 (Bhatt, n.d.). More recent versions
        have been published in 1986, 1988, 1991, and 1998. ISCII is designed
        to support Devanāgarı̄ as well as nine other Brāhmı̄-derived scripts: Gu-
        jarati, Panjabi, Assamese, Bengali, Oriya, Telugu, Tamil, Malayalam and
        Kannada. These scripts are the primary means of writing for the twenty-
        two nationally recognized languages of India, with the exception of those
        that are primarily written in Perso-Arabic script, viz. Urdu, Kashmiri,
        Sindhi (Singh, 1997).
             ISCII employs a single set of codepoints for ten distinct scripts. Thus
        the syllable ⟨ka⟩ is encoded identically whether it is written in Devanā-
        garī, Gujarati, or Malayalam. The general structural principles of ISCII
        are based on those of the Brāhmı̄-derived scripts. In general:
           • Consonants imply /a/, unless overridden by either an explicit vowel
             or the HALANT character (= virāma, i.e. the ∅ vowel).
           • Separate codepoints exist for independent and dependent vowel
             signs.
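
        The first principle above (an inherent /a/ that is overridden by an
        explicit vowel sign or by HALANT) can be sketched in a few lines. This is
        an illustration only: it uses Unicode Devanāgarī codepoints as a stand-in
        for ISCII byte values, covers a handful of characters, and the function
        and table names are ours.

```python
# A few Devanagari consonants and dependent vowel signs (Unicode codepoints,
# used here as a stand-in for ISCII values).
CONSONANTS = {"\u0915": "k", "\u0924": "t", "\u0930": "r", "\u092E": "m"}
VOWEL_SIGNS = {"\u093E": "\u0101", "\u093F": "i", "\u0940": "\u012B"}  # aa, i, ii
HALANT = "\u094D"  # virama: suppresses the inherent vowel


def romanize(text: str) -> str:
    """Romanize a string, supplying the inherent /a/ where nothing overrides it."""
    out = []
    pending = False  # True while the last consonant still awaits its vowel
    for ch in text:
        if ch in CONSONANTS:
            if pending:
                out.append("a")  # previous consonant kept its inherent vowel
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWEL_SIGNS and pending:
            out.append(VOWEL_SIGNS[ch])  # explicit vowel overrides /a/
            pending = False
        elif ch == HALANT and pending:
            pending = False  # zero vowel
        else:
            if pending:
                out.append("a")
                pending = False
            out.append(ch)
    if pending:
        out.append("a")
    return "".join(out)


print(romanize("\u0915"))              # bare consonant sign
print(romanize("\u0915\u093F"))        # consonant + vowel sign i
print(romanize("\u0915\u094D\u0924"))  # consonant + HALANT + consonant
```

        A bare consonant yields ka, a consonant with a dependent vowel sign
        yields ki, and a consonant followed by HALANT contributes no vowel, giving
        kta.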
                  ن     isolated
                  نـ    word-initial  نبت nabata ‘to sprout’
                  ـنـ   word-medial   بنت bint ‘girl’
                  ـن    word-final    تبن tibn ‘straw’.
            •   The Devanāgarī sequence of क ⟨ka⟩ + HALANT + क ⟨ka⟩ may be re-
                alized as (1) a single glyph with two components stacked vertically
           13 A thorough discussion is to be found in ISO/IEC TR 15285:1998(E) “Information
        between inscriptions and characters, where the latter are categorical in nature and presup-
        pose an equivalence class (Tolchinsky, 2003, 17).
        transliteration of Vedic and Tamil. The CS and CSX standards are based
        on IBM CP 437 (an 8-bit codepage with the lower half corresponding to
        ASCII, and an upper half containing accented characters for European
        languages and additional symbols). The CS standard replaced 32 code-
        points in CP 437 with upper- and lower-case characters used in Sanskrit
        transliteration (but not used for modern Western European languages).
        CSX replaced an additional 22 codepoints. A fundamental design prin-
        ciple of CS and CSX was to depart as little as possible from CP 437. A
        superset of CSX, CSX+, also exists, which adds an additional 28 charac-
        ters used in Indic transliteration and specified in ISO 15919; four other
        characters for general-purpose typography are also added. One character
        (á) has been moved, since its codepoint is reserved in Windows character
        sets for a non-breaking space.
            Although a number of fonts supporting the CS family of standards
        exist (including fonts released under free licenses such as the GPL16 ),
        CS/CSX/CSX+ are not registered with any international standards au-
        thority and lack any general OS- or application-level support. Packages
        providing support for CS in TeX are available, however (Pandey, 1998).
        role. CD-ROMs distributed by TITUS still contain the texts in the TITUS
        Indological 8-bit Encoding.
            The TITUS Indological encoding departs significantly from CP 437;
        with the exception of the basic alphanumeric characters and basic punc-
        tuation, all symbols have been redefined. (Even CP 437 is not a superset
        of ASCII, as it redefines the ASCII control characters (0x00–0x1F) as
        dingbats and other symbols.) The TITUS encoding in addition overrides
        other characters in the ASCII range: 0x23 = # → h (indicates aspiration
        of the preceding segment); 0x24 = $ → r̥̄ (syllabic long r); 0x25 =
        % → ʿ (Semitic ʿayin); 0x26 = & → ʾ (Semitic hamza); 0x7f =
        DEL → ṁ (anusvāra). The upper half of the TITUS encoding contains
        modified Roman characters used in the transcription of Sanskrit, as well
        as other Indic and Dravidian languages, and such related languages as
        Avestan. Some characters frequently used in the orthography of Western
        European languages are retained as well.17
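
        The overrides in the ASCII range listed above can be expressed as a small
        decoding table. The following is a sketch under stated assumptions: only
        the five overrides named in the text are included, every other byte below
        0x80 is passed through as ASCII, the upper half is not covered, and the
        Unicode characters chosen as targets are our own rendering rather than an
        official TITUS mapping. The names TITUS_ASCII_OVERRIDES and
        decode_titus_lower_half are hypothetical.

```python
# Overrides named in the text; the Unicode targets are assumptions.
TITUS_ASCII_OVERRIDES = {
    0x23: "h",               # '#': aspiration of the preceding segment
    0x24: "r\u0325\u0304",   # '$': syllabic long r (r + ring below + macron)
    0x25: "\u02BF",          # '%': Semitic ayin
    0x26: "\u02BE",          # '&': Semitic hamza
    0x7F: "m\u0307",         # DEL: anusvara (m + dot above)
}


def decode_titus_lower_half(data: bytes) -> str:
    """Decode bytes below 0x80, applying only the overrides listed above."""
    out = []
    for b in data:
        if b >= 0x80:
            raise ValueError("upper half of the encoding is not covered by this sketch")
        out.append(TITUS_ASCII_OVERRIDES.get(b, chr(b)))
    return "".join(out)


print(decode_titus_lower_half(b"k#a"))
print(decode_titus_lower_half(b"sa\x7f"))
```

        Because such a table reassigns positions that plain ASCII software treats
        as punctuation or control characters, text in this encoding cannot safely
        be processed by tools that assume ASCII semantics for those bytes.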
             • Devanagari (U+0900–U+097F).
        Unicode lacks codepoints for characters with under-rings and for charac-
        ters with the combination of an accent and another diacritic; these may
           17 Details of the encoding were kindly supplied by Jost Gippert (personal communica-
        tion). A TrueType font for displaying texts in the TITUS Indological encoding is available
        from TITUS (<http://titus.uni-frankfurt.de/>).
        circulated, under the name “The Vāmana Project”, to add to Unicode all characters needed
        for implementing ISO 15919 in precomposed format. It is, however, the policy of the Uni-
        code consortium to add no new precomposed characters, where characters can be composed
        from presently-encoded characters.
           19 Such input schemes are used, for instance, in Lagally’s excellent ArabTeX package
        (Lagally, 2004).
        2.11         wx
         The authors of the textbook Natural Language Processing: A Paninian
         Perspective present a scheme for “[i]nternal representation in the com-
         puter” that shares many design principles with our SLP1 (Bharati, Chai-
         tanya & Sangal, 1996, 193). In the wx scheme (so dubbed after the
         characters used to encode the dental stops t and d), a single charac-
         ter represents a single speech sound. Equivalences are more or less
         straight-forward. Lower-case ASCII letters represent short vowels or
         close diphthongs, while upper-case letters represent long vowels and
         open diphthongs. The symbol q represents r̥, and L, l̥ (Huet, 2009,
         196), while no provision is made at all for r̥̄ or l̥̄. The graphic oppo-
         sition lowercase–uppercase consistently represents the phonological op-
         position unaspirated–aspirated. Some characters have a peculiar repre-
         sentation: e. g. the velar nasal ṅ (f) and the palatal nasal ñ (F). The den-
         tal oral stops t, th, d, dh are represented as w, W, x, X, whereas the
         retroflex ṭ, ṭh, ḍ, ḍh are represented as t, T, d, D. This convention
         is no doubt motivated by the fact that speakers of Modern Indic and Dra-
         vidian languages regularly perceive English alveolar stops as retroflex.22
         The retroflex sibilant ṣ is represented as R. This scheme, despite its con-
         siderable virtues, seems not to be widely used, although Indian students
         in NLP study it, and it plays a role in the Anusaaraka suite of NLP soft-
         ware,23 including the Sanskrit morphological analyzer of Amba Kulkarni
         and V. Sheeba. The scheme is, however, fundamentally limited, since it
         does not allow for the full set of vocalic liquids described by the Sanskrit
         grammarians, the unaspirated and aspirated retroflex lateral flaps ḷ and
         ḷh, any system of accents, or other sounds peculiar to Vedic traditions.
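
         The correspondences described in this section can be collected into a
         character-for-character table. The following is a partial sketch: it lists
         the special symbols named above, fills in the vowel pairs and the
         unaspirated–aspirated stop pairs from the lowercase–uppercase rule (the
         individual pairs are our inference from that rule, not quoted from the
         source), and passes every other character through unchanged. The names
         WX_TO_ROMAN and wx_to_roman are ours.

```python
WX_TO_ROMAN = {
    # Vowels: lowercase short, uppercase long; E and O are the open diphthongs.
    "a": "a", "A": "\u0101", "i": "i", "I": "\u012B", "u": "u", "U": "\u016B",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    # Vocalic liquids (ring below, as in the romanization used in this book).
    "q": "r\u0325", "L": "l\u0325",
    # Velar and palatal nasals.
    "f": "\u1E45", "F": "\u00F1",
    # Dental stops are written w, W, x, X ...
    "w": "t", "W": "th", "x": "d", "X": "dh",
    # ... which frees t, T, d, D for the retroflex stops.
    "t": "\u1E6D", "T": "\u1E6Dh", "d": "\u1E0D", "D": "\u1E0Dh",
    # Retroflex sibilant.
    "R": "\u1E63",
}

# Remaining stops: uppercase marks aspiration (pairs inferred from the stated rule).
for stop in "kgcjpb":
    WX_TO_ROMAN[stop] = stop
    WX_TO_ROMAN[stop.upper()] = stop + "h"


def wx_to_roman(text: str) -> str:
    """Convert wx to romanization; characters not in the table pass through."""
    return "".join(WX_TO_ROMAN.get(ch, ch) for ch in text)


for word in ("Xarma", "kqwa", "BARA"):
    print(word, "->", wx_to_roman(word))
```

         Since each wx symbol stands for exactly one speech sound, the conversion
         needs no lookahead; the reverse direction does, because romanized
         digraphs such as dh must be recognized before single letters.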
        2.12         Kyoto-Harvard
        The Kyoto-Harvard transliteration is not a meta-transliteration as defined
        above. It instead chooses one or two symbols for each Sanskrit speech
        sound, with the addition of some special-use symbols (Wujastyk, 1996).
           22 Thus in Hindi, for example, both instances of alveolar [t] in tractor become [ʈ]: ट्रैक्टर.
        The retroflex series of stops in Hindi contrasts (as in Sanskrit) with pure dentals: [t̪], [t̪ʰ],
        [d̪], [d̪ʰ] (not alveolars). Cf. Harley (1955, xix).
           23 <http://ltrc.iiit.net/~anusaaraka/>
        2.13       Varṇamālā
        Joshi, Dharmadhikari & Bedekar (2007) have proposed a scheme for
        Sanskrit text encoding which they term varṇamālā ‘garland of speech
        sounds’. Whereas ISCII and Unicode take as their starting point for
        the encoding of Indian-language texts the orthographic syllable (akṣara),
        Joshi et al. propose a phonemic approach in which the fundamental unit
        is the individual speech sound (varṇa). The proposed varṇamālā in-
        cludes the fourteen vowels of Sanskrit; six additional vowels (short e,
        candra e, long candra e, short o, candra o, long candra o); anusvāra,
        nasalization (candrabindu), and visarga; and thirty-four consonants (in-
        cluding the retroflex lateral flap ḷ).
            The varṇamālā scheme has been implemented in the context of the
        IndiX project developed by C-DAC Mumbai. IndiX is a set of libraries
        and applications based on the GNU/Linux operating system that provide
        support for Indic scripts.24
            The varṇamālā scheme is indeed based on phonetic principles, many
        of which are in accord with principles that we develop below. The status
        of this encoding, however, remains unclear. Joshi et al. (2007) do not
        assign codepoints or provide an ordering of the sounds in the repertoire.
        Earlier work by Joshi (2006) presents the varṇamālā as a “Vedic San-
          24 <http://www.cdacmumbai.in/projects/indix/>.
Chapter 3
        Critique of encoding
        systems seen so far
        Most of the encoding systems surveyed above are based primarily either
        upon Devanāgarı̄ script or upon the standard Romanization of Sanskrit.
        The difficulties with these systems are due in part to problems in the
        modes of graphic representation of Sanskrit sounds adopted in Devanā-
        garı̄ and the standard Romanization themselves. Current encoding per-
        sists in being script-based; it allows display conventions to govern uses of
        encoding that transcend appearance. While free-hand drawing and type-
        face, upon which contemporary encoding systems are based, historically
        served only display purposes, contemporary character encoding serves
        linguistic and archiving purposes that transcend mere display. Hence,
        while it is understandable that initially character encoding was motivated
        by display issues in imitation of typeface or manuscript hand, recent ex-
        igencies require an explicit system for encoding complete linguistic in-
        formation. It is therefore timely to consider the principles governing the
        design of character-encoding systems.
             The difficulties with the Devanāgarı̄ standards and the Roman stan-
        dards surveyed above become evident by observing the discrepancies be-
        tween the encoding of Sanskrit embodied in the Devanāgarı̄ script and in
        standard Romanization. Consider especially the following three points:
2 Cf. Macdonell 1910, 77-78; Renou 1952, 68–69 (both cited critically by Cardona 1997,
lvi–lxi).
Chapter 4
print in Europe.
        lowing the phonetic–graphic route in writing (and thus is entirely incapable of writing
        nonsense words) but has well-preserved ability to write known words via the whole-word
        route (Shallice, 1981).
            4 There are two kinds of grapheme-color synesthetes: for projectors, the color percept
        is bound to the visually-presented grapheme, whereas for associators the color percept
        is “seen” before the “mind’s eye” (Smilek et al., 2005). The most compelling current
        explanation of grapheme-color synesthesia is based on proximity of the V4 or V8 areas
        implicated in color vision to the so-called “visual number grapheme” area. These areas are
        all located within the fusiform gyrus. Additional connections between these areas could
        explain the synesthetic percepts (Ramachandran & Hubbard 2001; Ramachandran et al.
        2004). Subsequent research has identified the “visual number grapheme” area as belong-
        ing to the visual word form area (VWFA), with the approximate location (−43, −54, −12)
        in Talairach space (Cohen & Dehaene, 2004). Although it is unlikely that the VWFA is
        entirely devoted to reading, it is hypothesized that the VWFA contains detectors tuned to
        recognize graphemes, as opposed to pseudo-graphemes. It is further hypothesized that
        “neurons in the fusiform region are tuned to progressively larger and more invariant units
        of words, from visual features in extrastriate cortex to broader units such as graphemes,
        syllables, morphemes, or even entire words as one moves anteriorily [sic: anteriorly] in the
        fusiform gyrus” (Cohen & Dehaene, 2004, 471).
        ter than the orthography associated with that script. The complexity of
        the mapping between the orthographic and the phonetic levels is known
        as orthographic depth and can be precisely quantified (Frost 1992; van
        den Bosch, Content, Daelemans & de Gelder 1994; Treiman 2006, 595).
        To take two contrasting cases, Finnish orthography is very shallow (or
        transparent), while English is quite deep (Lyytinen, Aro, Holopainen,
        Leiwo, Lyytinen & Tolvanen, 2006, 40). Since orthographies are never
        entirely shallow or transparent (Weir, 1967), character encoding by its
        very nature represents knowledge that has already passed through sev-
        eral stages at which information loss is possible. The goal of encoding
        should be to minimize the loss of information. Since degradation can
        occur at each stage of expression and transition, one ought to capture the
        informational content at the earliest stage possible. Given that script is
        inherently a secondary phenomenon vis-à-vis spoken language, encoding
        should be based directly on spoken language.
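             One simple way to put a number on the complexity of a graph-to-sound
        mapping is the average uncertainty about the sound once the graph is known.
        The sketch below is illustrative only: it is not the measure used in the
        studies cited above, the name mapping_entropy is hypothetical, and the
        aligned pairs are toy data rather than a real lexicon. Under these
        assumptions a perfectly transparent mapping scores 0 bits, and the score
        rises as graphs become ambiguous.

```python
from collections import Counter, defaultdict
from math import log2


def mapping_entropy(pairs):
    """Conditional entropy H(sound | graph), in bits, over aligned pairs.

    `pairs` is an iterable of (graph, sound) tuples. This is a toy proxy
    for orthographic depth, not a published metric.
    """
    by_graph = defaultdict(Counter)
    for graph, sound in pairs:
        by_graph[graph][sound] += 1

    total = sum(sum(counts.values()) for counts in by_graph.values())
    entropy = 0.0
    for counts in by_graph.values():
        n = sum(counts.values())
        # Entropy of the sounds that this one graph can stand for.
        h = -sum((k / n) * log2(k / n) for k in counts.values())
        # Weight by how often the graph occurs.
        entropy += (n / total) * h
    return entropy


# Hypothetical toy alignments: every graph has one sound in `shallow`,
# while the graph "c" is ambiguous between two sounds in `deep`.
shallow = [("a", "a"), ("k", "k"), ("a", "a"), ("t", "t")]
deep = [("c", "k"), ("c", "s"), ("a", "a"), ("a", "a")]

print(mapping_entropy(shallow))
print(mapping_entropy(deep))
```

        The weighting by graph frequency is a design choice of the sketch; an
        unweighted average over graphs would rank rare ambiguous graphs as
        heavily as common ones.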
             As noted above (§1.3), Devanāgarī script itself was not specifically
        designed to represent Sanskrit phonology but rather was adapted to this
        use subsequently. Devanāgarī derives from Brāhmī script, which was in
        turn influenced by Kharoṣṭhī, which was itself adapted from Aramaic.
        Brāhmī was placed in service in India originally to represent the
        phonology of Prākrit, rather than Sanskrit; the former lacks a number of the
        latter’s phonemes, including vocalic r̥, r̥̄, and l̥, and the open diphthongs
        ai and au (Oberlies, 2003, 168). Moreover, some phonological features
        of Sanskrit for which Devanāgarī incorporates an encoding mechanism,
        such as the glottal stop, are not explicitly recognized in the phonologies
        of Indian linguists. Since Devanāgarī was never systematically designed
        to represent the phonological systems of Indian linguists in the first place,
        it would be surprising indeed if it should serve as a more appropriate ba-
        sis for encoding Sanskrit than Sanskrit phonology. In fact, very few of the
        world’s writing systems were designed for the languages that they repre-
        sent in extant texts and manuscripts. Borrowing is the norm in the history
        of writing, and adaptations almost always fail to capture the structure of
        the spoken language adequately.
             Therefore, where one has access to the phonology of the language,
        where the orthography is fairly shallow, and where the orthography de-
        parts from an ideal coding of spoken language structure, the basis for text
        encoding should be phonetic rather than graphic. Sanskrit meets these
        of a papyrus roll fixed a limit on the extent of a book (or single section
        of a complete work).7 In India, binding techniques and materials con-
        strained the number of pages in a manuscript, and the size of palm leaves
        constrained the size of a page. Writing implements, the resolution of
        human vision, and manual motor limitations governed the size of char-
        acters. The minimal independent unit in script is the graphic segment or
        graph.
        4.2.2      Features
        Speech is not one-dimensional. Phonetic units may be decomposed into
        a number of acoustic or articulatory features that are realized simultane-
        ously. Early work on phonetic features conceived features as constituents
        of phonetic segments; or, from a different perspective, segments were
        bundles of features (Jakobson, Fant & Halle, 1963, 3).8 Yet more re-
        cently, linguists have come to see that features overlap segments; for
        instance, the feature of voice [+ voice] is realized across all six segments
        of the Sanskrit word form babhūva ‘he was’.9 Ancient Indian phonetic
        treatises recognized that pitch either spread from a vowel to neighboring
        consonants in its syllable (TPr. 1.43), or properly belonged to the syllable
        itself (cf. R̥Pr. 3.9; VPr. 3.130 (Rastogi); APr. 3.67) (Whitney, 1868,
        314). In Sanskrit, the prosody of retroflexion extends rightward from r̥, r̥̄,
        r, or ṣ, causing non-final n to be realized as ṇ despite intervening vowels,
        semivowels, gutturals, labials, or anusvāra (A. 8.4.1–2; cf. Allen 1951,
        940; Zwicky 1965, 61–63; Hock 1979, 52–53; Anderson 1985, 191–
        192; Dixon & Aikhenvald 2002, 17; Hamann 2003, 122). Thus a feature
        may be associated with a string of one or more segments; and a segment
        is associated with a set of features. It was J. R. Firth’s insight that “some
        phonological properties are not uniquely ‘placed’ with respect to partic-
        ular segments within a larger unit” (Anderson, 1985, 185); Firth refers to
        such properties as prosodies.10
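             The rightward spread of retroflexion described above can be sketched as
        a scan over a list of segments. This is a minimal illustration under stated
        assumptions, not a full implementation of A. 8.4.1–2: the input is already
        segmented into romanized sounds, the sets of transparent sounds simply
        follow the classes named in the paragraph (with only y and v taken as
        semivowels), the spread is assumed to stop once it has affected one n, and
        the function name apply_retroflexion is hypothetical.

```python
# Sound classes are simplifying assumptions of this sketch.
TRIGGERS = {"r", "r̥", "r̥̄", "ṣ"}
VOWELS = {"a", "ā", "i", "ī", "u", "ū", "r̥", "r̥̄", "l̥", "e", "o", "ai", "au"}
SEMIVOWELS = {"y", "v"}
GUTTURALS = {"k", "kh", "g", "gh", "ṅ"}
LABIALS = {"p", "ph", "b", "bh", "m"}
ANUSVARA = {"ṁ"}
TRANSPARENT = VOWELS | SEMIVOWELS | GUTTURALS | LABIALS | ANUSVARA
RETROFLEX_N = "ṇ"


def apply_retroflexion(segments):
    """Return a copy of `segments` with non-final n realized as ṇ
    wherever a retroflexion prosody reaches it."""
    out = []
    active = False            # is a retroflexion prosody currently spreading?
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        if seg == "n" and active and i != last:
            out.append(RETROFLEX_N)
            active = False    # assumption: the spread ends at the affected n
            continue
        out.append(seg)
        if seg in TRIGGERS:
            active = True     # a new prosody starts here
        elif seg not in TRANSPARENT:
            active = False    # any other sound blocks the spread
    return out


print("".join(apply_retroflexion(["r", "ā", "m", "e", "n", "a"])))   # rāmeṇa
print("".join(apply_retroflexion(["r", "a", "th", "e", "n", "a"])))  # rathena
print("".join(apply_retroflexion(["k", "a", "r", "m", "a", "n"])))   # karman
```

        Representing the prosody as a state that persists across segments, rather
        than as a property of any one segment, mirrors the point that a feature
        may be associated with a string of segments.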
           7 For details, see Kenyon (1951).
           8 We bypass here the question of whether the mental lexicon contains featural
        specifications (Feature-Segment Hypothesis) or just segments (Indivisible-Segment Hypothesis)
        (Stemberger, 1982).
           9 Cf. Zellig Harris’ notion of long components (Anderson, 1985, 191).
          10 It is the norm that features overlap segments. Contemporary research on articulatory
        Mathematically speaking, they correspond to equations of the form y = a sin b(x+c), where
        a determines the amplitude, b determines the frequency, and c determines the phase. In a
        linear oscillating system the output (a periodic time waveform) corresponds to the sum
        of a set of sinusoids. Fourier analysis allows us to find the complex coefficients C_n that
        represent the phases and amplitudes of the harmonic sinusoid components of the periodic
        time waveform v(t) using the following equation:
        \[
          C_n = \frac{1}{T} \int_{-T/2}^{T/2} v(t)\, e^{j 2\pi n t/T}\, dt,
        \]
        where T is the period of the waveform (Pierce, 1999). In speech, vowels are approximately
        harmonic, whereas consonants correspond generally to noise.
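        The coefficient integral can be approximated numerically. The following is a
        minimal sketch, assuming a midpoint Riemann sum over one period; the function
        name fourier_coefficient and the test waveform are hypothetical. It follows the
        exponent as printed in the equation; many texts use a negative exponent in this
        integral, which for a real-valued waveform changes only the sign of the phase,
        not the amplitude.

```python
import cmath
import math


def fourier_coefficient(v, n, T, samples=1024):
    """Approximate C_n = (1/T) * integral over [-T/2, T/2] of
    v(t) * exp(j*2*pi*n*t/T) dt with a midpoint Riemann sum."""
    dt = T / samples
    total = 0j
    for k in range(samples):
        t = -T / 2 + (k + 0.5) * dt          # midpoint of the k-th slice
        total += v(t) * cmath.exp(1j * 2 * math.pi * n * t / T)
    return total * dt / T


# Hypothetical waveform: a single sinusoid at the second harmonic,
# amplitude 3, period T (in the notation y = a sin b(x + c): a = 3, c = 0).
T = 0.01


def wave(t):
    return 3 * math.sin(2 * math.pi * 2 * t / T)


for n in range(4):
    print(n, abs(fourier_coefficient(wave, n, T)))
```

        Only the n = 2 coefficient has appreciable magnitude here, namely half the
        amplitude of the sinusoid, because the other half is carried by the
        coefficient for n = −2.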
           12 Analysis of characters as mathematical graphs is not merely a recent development.
        Already Bondy (1972) presents a graph theoretical analysis of the Greek alphabet, together
        with some interesting remarks on palaeographic developments, some hints of prospective
        algorithms, and a good measure of humor.
        natural character for all articulate sounds might easily be agreed on, if nations would
        agree on anything generally beneficial, by delineating the several organs of speech in the
        act of articulation, selecting from each a distinct and elegant outline” (qtd. by Firth 1946,
        122).
           14 Bell’s system is indeed remarkable, and it is to be lamented that such a system was
        never developed further. On the other hand, from our perspective as twenty-first century
        linguists, there are many insufficiencies in visible speech: how, for example, to represent
        the retroflex series of Sanskrit, or the emphatic (pharyngealized) series of Arabic, or the in-
        gressive sounds of certain African languages? Bell’s system allows for 120 unique sounds,
        which is approximately equal to the maximum phonemic inventory of any known language;
        yet it is by no means adapted for transcribing a language such as !Xũ/!Kung, which has,
        on one count, 141 phonemes (Maddieson 1984, 421–422; cf. Ladefoged 2005, 9). (48 of
        these are “click” consonants. The phonemics of this language are somewhat controversial.
        The classic study is Snyman (1970).) On languages with a large phonemic inventory cf.
        Szemerényi (1967, 86), who characterizes the 84 phonemes of Ubykh as “a world record”.
           15 “Yet Sweet differs from Bell by relating place to the passive, not the active, articulator”
        the same way that phonetic features may spread over multiple phonetic
        segments, graphic features may spread over multiple graphic segments.
        An example is the horizontal headstroke that runs across a sequence of
        Devanāgarī characters.
        message. Similarly, the stains, smudges, spills, and creases on the page
        of a manuscript, as well as the wiggles, trailers, and absolute (as opposed to
        relative) stroke dimensions, are irrelevant to the scholar who is deciphering
        the linguistic content of a manuscript (cf. Kropač 1991, 118). Yet to a
        palaeographer or codicologist, these same idiosyncratic marks may pro-
        vide valuable information about the manuscript’s date, its copyist, and
        the conditions under which it has been stored.
             Speakers of a particular language normally do not distinguish phones
        that occur only in complementary distribution in their language. Thus
        Arabic does not distinguish between [b] and [p], which constitute distinct
        phonemes in English. English does not distinguish between [p] and [pʰ],
        which are distinct phonemes in Sanskrit. For an English speaker [p] and
        [pʰ] are the same phoneme, just as for the Arabic speaker [b] and [p] are
        the same (cf. Jakobson et al. 1963, 9).
             The structure of an encoding should closely follow the structure of
        the linguistic units themselves. A character-encoding scheme ideally en-
        codes only the minimally distinctive graphs of the language. A sound en-
        coding scheme ideally encodes only the minimally distinctive phones of
        the language. In either case, codepoints are assigned only to contrastive
        units. The ancient Indian linguists understood a similar principle regard-
        ing the relationship between speech and meaning in the first stage of
        the expression of knowledge. They desired a one-to-one correspondence
        between the speech form and the object to be conveyed. The principle
        of the avoidance of redundancy is embodied in Patañjali’s oft-repeated
        phrase, “one does not employ a speech form for what has already been
        stated” (uktārthānām aprayogaḥ).18 Similarly, in Mīmāṁsāsūtra 1.3.26
        anyāyaś cānekaśabdatvam, Jaimini states that it is improper for a single
        meaning to be denoted by multiple speech forms.
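             The requirement that codepoints be assigned only to contrastive units,
        one codepoint per unit, amounts to building a one-to-one table. The sketch
        below is a hypothetical illustration, not an encoding scheme proposed in
        this book: the function names, the starting codepoint, and the short
        inventory are all assumptions chosen for the example.

```python
def build_encoding(units, base=0x41):
    """Assign consecutive codepoints, starting at `base`, to contrastive units.

    A unit listed twice would end up with two codes, i.e. redundancy,
    so it is rejected.
    """
    table = {}
    for unit in units:
        if unit in table:
            raise ValueError(f"unit listed more than once: {unit!r}")
        table[unit] = base + len(table)
    return table


def encode(segments, table):
    """Map a sequence of units to their codepoints."""
    return [table[s] for s in segments]


def decode(codes, table):
    """Invert the table; this is possible only because it is one-to-one."""
    inverse = {code: unit for unit, code in table.items()}
    return [inverse[c] for c in codes]


# Hypothetical inventory: [p] and [ph] each receive a codepoint because
# they contrast in the language being encoded.
inventory = ["a", "p", "ph", "b", "bh", "m"]
table = build_encoding(inventory)
codes = encode(["ph", "a", "b", "a"], table)
print(codes, decode(codes, table))
```

        For a language in which [p] and [ph] do not contrast, the same procedure
        would be run on an inventory that lists only one of them.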
        encode directly the sounds of the spoken language, rather than the char-
        acters that symbolize them. In making such decisions, one must consider
        whether the cultural heritage is received primarily in written or in oral
        form and, if written, how closely the written form represents the phonol-
        ogy of the language.
            For English, the Roman script, rather than the oral language, is the
        predominant vehicle of the received cultural heritage. Scholarship is
        primarily written. Although regional pronunciation varies, spelling is
        highly standardized and needs to be taught even to native speakers up
        through secondary school. The Roman script, designed to model the
        Latin sound system, was never systematically remodeled to accord with
        English phonology.19 Moreover, the phonology of English has changed
        significantly since the adoption of the Roman alphabet, widening further
        the gap between script and sound. Character encoding evolved first to
        capture the system of contemporary written English (Birnbaum, 1989),
        and ASCII (as well as supersets such as ISO 8859-1²⁰ and Unicode) provides
        a reasonable basis for archiving and processing English language
        text. A phonetic encoding of English would not be desirable for many ap-
        plications, since it would necessarily impose arbitrary dialectal features
        on written texts. Furthermore, writers and readers of English are used to
        an orthography that often privileges morphological representation over
        phonological representation (consider for instance the different vowels
        in potent [ˈpowtn̩t] and impotent [ˈɪmpətn̩t]) (Weir 1967; Klima 1972;
        French 1976, 124; Sampson 1985, 204–205; Tolchinsky 2003, 92, 193–194;
        Snowling 2005; Lyytinen et al. 2006, 49). English spelling also
        possesses a lexical-semantic aspect, as shown by such homophonous but
        heterographic and heterosemantic sets as {knew, new, gnu} (Weir 1967,
           19 The earliest inscriptions in Old English are in the Runic futhorc alphabet, which de-
        rives from the Germanic futhark and is first attested (in the Caistor-by-Norwich runes) for
        the fourth or early fifth century (Page, 1999, 21). The adoption of the Roman alphabet
        was a response to the spread of Christianity. Originally, several added letters represented
        phonemes specific to English: ⟨æ, ð, þ, ƿ⟩ (the latter two directly borrowed from futhorc: þ
        = þorn ‘thorn’; ƿ = ƿynn ‘joy’) (Page, 1999, 186, 212–213). With the rise of printing in the
        15th century, the added characters fell into disuse, since they did not exist in the fonts of
        continental printers (McArthur, 1992, 31–32). Once again we see the limitations imposed
        by a shift in technology.
           20 See Gaylord 1995.
21 Similarly, French orthography involves a number of elements that are not phono-
        lier systems allowed for phonetic transcription using only ASCII symbols. CHILDES
        (Child Language Data Exchange System) (<http://childes.psy.cmu.edu/>) uses the
        PHONASCII transliteration format (based on IPA) in its CHAT database (Allen 1988;
        MacWhinney 1991, 71–82). ARPAbet (Shoup, 1980) is a widely-used pure ASCII sys-
        tem for the phonetic transcription of American English.
Chapter 5
Sanskrit phonology
prātiśākhya to 250 B.C.E.
        median absolute duration of a stressed vowel is 130 msec; that of a consonant or unstressed
        vowel is about 70 msec (Klatt, 1976).
        clear whether the Vedic ळ् ḷ and ळ्ह् ḷh were flaps, taps, or approximants. In Modern Indic
        (Gujarati, Marathi, Oriya, and the four Dravidian languages), ळ् ḷ is a retroflex lateral
        approximant, not a flap (Aklujkar, Cardona, Deshpande, Bhaskararao), and it is reasonable to
        assume that retroflex lateral approximants developed from the intervocalic voiced retroflex
        stops ड् ḍ and ढ् ḍh (Cardona). In Tamil the retroflex lateral approximant ள் ḷ is not
        exclusively intervocalic but occurs in clusters, including geminates (Steever), and contrasts with
        a central retroflex approximant with lateral contact between the sides of the mid-tongue
        and the palate, ழ் ḻ, as well as with a non-lateral post-alveolar ற் ṟ (which may be in the
        process of merging with alveolar ர் r) (Keane, 2004, 113) (with thanks also to Chevillard).
        Likewise, the Vedic retroflex laterals are distinguished from the modern Hindi retroflex
        flaps ड़् and ढ़्. The development of weaker allophones in intervocalic position in Vedic is
        paralleled in Middle Indo-Aryan: nn > ṇ, and ll > ḷ (Hock).
śākhya.
        ing those that correspond to the views of Pāṇini and the Vājasaneyiprātiśākhya, which are
        not always clearly indicated as the views of others.
        dāttamayam, but commentators and Rastogi’s edition 4.139 read udāttamayam instead of
        anudāttamayam. If Sharma’s edition is simply mistaken, the accentual system prescribed
        is more complex than here described, but it is possible that commentators and Rastogi have
        revised the text to conform to the R̥kprātiśākhya description without recognizing that the
        Vājasaneyiprātiśākhya described a different accentual system.
        lar nasal by arguing that if anusvāra is phonetically a pure nasal, as some ancient treatises
        describe it, its production would require dorsovelar closure. This identity cannot be ac-
        cepted, however, because Indian phonetic treatises consistently distinguish anusvāra from
        the nasal stops, including the velar nasal. A uvular place of articulation would account for
        the distinction of the anusvāra from the velar nasal stop (Laver, 1994, 209–14). It might also
        account for the diverse descriptions of its place of articulation: the uniqueness of the uvula
        as a place of articulation would account for its escaping the notice of the ancients, or if it
        were recognized, the systematic inelegance of creating a sixth buccal place of articulation
        solely for this sound would have discouraged ancient phoneticians from so categorizing it.
        Yet a voiced uvular nasal stop is rare in the phonetic inventories of the world’s languages
        and is not recognized by any Indian phonetic treatises.
            18 Busetto (2003, 193 n. 3, 205 n. 18) combines the voicing of the Pāṇinīyaśikṣā and
        related traditions with the aspiration of the R̥kprātiśākhya tradition. He characterizes
        anusvāra as originally being a voiced fricative homorganic with the subsequent segment.
        While ancient phonetic treatises generally characterize anusvāra as voiced, the
        R̥kprātiśākhya, which characterizes it as unvoiced, represents the earliest phonetic description in
        the Indian tradition.
90.
        (1997a).
          27 Bhaṭṭojidīkṣita preserves for etymological reasons the base of the tongue as a separate
        place of pronunciation only for the jihvāmūlīya: Siddhāntakaumudī 10 (Cardona, 1965,
        227).
          28 Deshpande (1997a, 84). Kāśikā on A. 1.1.8.
        node and with higher nodes branching on the basis of features with decreasing indices in
        their (inversely) ranked list (see n. 33 above).
           35 See e. g. Halle (1995); Calabrese (1998, 9).
        ulator feature (Halle et al. 2000, 388–389; cf. TABLE 12). The fact that
        articulators are controlled by paired sets of agonistic and antagonistic
        muscles is directly reflected in the binary character of their subordinate
        features. Further, he requires that features constitute only terminal nodes
        and that only these spread; he thereby abandons Clements’ (1985) provi-
        sion that higher nodes spread. If this proposal proves correct, then a net-
        work model might better represent feature organization than a tree model
        does. Halle’s recent research validates the articulatory feature analysis
        employed by the ancient Indian phoneticians, especially that of Āpiśali,
        who gives prominence to articulators (see above §5.2.5).
            Since the feature organization of Halle et al. represents the most
        advanced feature analysis in the field of phonology and since it shares
        the articulatory approach to feature analysis of ancient Indian treatises,
        it may be a fruitful basis for analyzing the feature systems and sound
        catalogs of the Indian treatises. Certain features and articulators Halle
        employs are not distinctive in Sanskrit, such as the articulator tongue
        root and its subordinate features, and the articulator-free feature suction.
        Halle reduces the number of articulators considered separate by Āpiśali
        (TABLE 2, II); he accounts for the required distinctions instead by in-
        troducing disposition features subordinate to the remaining articulators
        (back, low, high, anterior, distributed) and the articulator-free feature lat-
        eral. His laryngeal features capture well the observations of Āpiśali and
        Śaunaka concerning the effect of the larynx on pitch (TABLE 2; TABLE
        3 IV, VI[E]) and revise the effect Śaunaka describes of glottal aperture
        on voicing (TABLE 3 III, V). Halle converts the feature nasal from a
        place-of-articulation (TABLE 2, [II]D; TABLE 3, [I]G) or an extra-buccal
        feature (TABLE 2 [III]B5; TABLE 3 [VI]C) to an articulator. He cap-
        tures stricture features, used conservatively by Śaunaka (TABLE 3, II)
        and liberally by Āpiśali (TABLE 2, III) by the articulator-free features
        continuant, consonantal, and sonorant. The direction of Halle’s research
        would seem to lead to an articulatory account of the latter two. TABLE 4
        summarizes the articulatory features of Sanskrit sounds per Halle et al.
        (2000).
Chapter 6
Sound-based encoding
        6.1.1    Phoneme
        Kemp (1994) summarizes the major elements and history of the con-
        cept of a phoneme. Early definitions of the phoneme limited features
        that could distinguish phonemes to those qualifying timbre, but since
        the 1950s the concept has been extended to include duration, stress, and
        pitch.
            Phonemes are the minimally contrastive segments of sound in a lan-
        guage, on the basis of the contrast between which lexical and gram-
        matical distinctions can be made. Sounds that are lexically or gram-
        matically contrastive in parallel distribution are independent phonemes.
        Conversely, where phonetically similar sounds differ only post-lexically,
        they are not independent phonemes; rather they are either allophones or
        free phonetic variants. Phonetically similar sounds that occur in com-
        plementary distribution are allophones; phonetically similar sounds that
        are non-contrastive in parallel distribution are free phonetic variants. A
        middle category concerns sounds that are barely contrastive (Goldsmith,
        1995a, 10–12). Two sounds, both of which are common, may be con-
        trastive in just a small set of environments; one of two contrastive sounds
        may occur only in limited contexts; or there may be some other asym-
        metry between contrastive sounds. The contrast here possesses a low
        functional yield.
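             The distributional tests in this definition can be sketched as set
        operations over a transcribed word list. The example is a simplification
        under stated assumptions: an environment is taken to be just the
        immediately preceding and following segment, phonetic similarity is not
        checked, the three-word English list is a toy sample in an ad hoc ASCII
        transcription ("ph" for aspirated p, "I" for the vowel of pin), and all
        function names are hypothetical.

```python
def environments(words, sound):
    """Collect (previous, next) contexts of `sound`; '#' marks a word edge."""
    envs = set()
    for word in words:
        padded = ["#"] + list(word) + ["#"]
        for i in range(1, len(padded) - 1):
            if padded[i] == sound:
                envs.add((padded[i - 1], padded[i + 1]))
    return envs


def distribution(words, a, b):
    """'parallel' if a and b share an environment, else 'complementary'."""
    shared = environments(words, a) & environments(words, b)
    return "parallel" if shared else "complementary"


def minimal_pairs(words, a, b):
    """Word pairs that differ only in having a where the other has b."""
    word_set = {tuple(w) for w in words}
    pairs = []
    for w in sorted(word_set):
        for i, seg in enumerate(w):
            if seg == a:
                other = w[:i] + (b,) + w[i + 1:]
                if other in word_set:
                    pairs.append((w, other))
    return pairs


# Toy English sample: pin, spin, bin.
words = [("ph", "I", "n"), ("s", "p", "I", "n"), ("b", "I", "n")]

print(distribution(words, "ph", "p"))   # complementary
print(distribution(words, "ph", "b"))   # parallel
print(minimal_pairs(words, "ph", "b"))
```

        In this sample the sounds written "ph" and "p" never share an
        environment, which is the pattern expected of allophones, whereas "ph"
        and "b" occur in the same environment and distinguish two words, which is
        the pattern expected of independent phonemes.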
               The concept of a phoneme is yoked with two parameters that limit its
        utility as the sole basis for encoding. The first is that the sounds belong
        to the same language in the strictest sense, namely, “the speech of one
        individual pronouncing in a definite and consistent style” (Jones, 1962,
        9). Differences in style, rate, or dialect are not included in the same
        phonemic system. The second limiting parameter of the concept of a
        phoneme is that for sounds to be considered contrastive they are required
        to differentiate semantic content in a narrow sense.
               A number of the phonetic segments described in section 5.1 are not
        phonemes. These include inseparable phonetic segments described as
        subsegments. The status of subsegments within the vowels r̥ and l̥ and
        within e, o, ai, and au cannot be considered independently of those vowels.
        Although Old Indo-Aryan e, o, ai, and au are historically derived
        from Proto-Indo-Iranian sequences of separate vowels *aï, *aü, *āï, and
        *āü, they cannot be eliminated as independent phonemes in a synchronic
        description of Sanskrit. The rest of the subsegments described in section
        5.1 are overlapping phases, that is, they are simultaneously the offset
        phase of the first of two segments and the onset phase of the second.
        As such, they form parts of allophones. These include the nasals yama
        and nāsikya in the phonological description of the R̥kprātiśākhya, where
        they are the overlapping phases of a stop or h and the following nasal
        stop. While Indian phoneticians make a great contribution to the science
        of phonetics by providing descriptions of these sounds, the subsegments
        are not phonemic. They occur in very limited environments as parts of
        sounds that occur in complementary distribution with other allophones
        of their respective phonemes.
               Several other marginal phonetic segments are not phonemes in the
        strict and narrow sense. They occur only in complementary distribution
        with other sounds in parallel contexts and hence are allophones. The
        short vowels ĕ and ŏ occur word-initially in hiatus after e and o in com-
        plementary distribution with a in certain Vedic dialects. They also occur
        in Sāmaveda as free phonetic variants in a specific recitational repetition
        called nyuṅkha. Slightly lengthened short vowels in Vājasaneyisaṁhitā
        occur in complementary distribution with short vowels.1 The retroflex
        ळ, ḷ and ळ्ह, ḷh occur intervocalically in complementary distribution with ḍ
           1 Long vowels shortened in specific contexts and termed kṣipra likewise would not be
        phonemes, even if they did differ in length from short vowels.
        feature gravity. Historically, the voiceless velar fricative symbolized by ⟨gh⟩ in English
        words like cough is in Present Day English a voiceless labio-dental fricative [f] (Ladefoged,
        1971, 44).
            3 We use word-final as a translation of padānta, that is, occurring at the end of a pada
in detail.
        trastive feature can the default be construed as indicating the lack of that
        feature.
            In the case of anusvāra, it is necessary to encode a unit to repre-
        sent a default anusvāra unspecified as regards length — in addition to
        a short, long, heavy, and two-mora anusvāra. Although the
        Vājasaneyiprātiśākhya (4.149) assigns a length of 1/2 mora to the short
        anusvāra it describes as contrasting with a long anusvāra, and the
        R̥kprātiśākhya (1.34) assigns the same weight of 1/2 mora to the only
        anusvāra it approves, it would not be suitable to use the short anusvāra
        as a default anusvāra, since the R̥kprātiśākhya (13.32–33) reports that
        other authorities specify the short and long anusvāra as measuring
        1/4 mora and 3/4 mora respectively. The R̥kprātiśākhya anusvāra is thereby
        distinguished from both short and long anusvāra. Similarly, it is
        necessary to encode a system
        of three accents in addition to the system of four tones and monotone,
        because many texts indicate three accents without providing any infor-
        mation about which of the four tones are represented. Ancient Indian
        linguists provide rules of accent sandhi that transform isolated accent
        into contextual tone. To allow encoding of accent both before and after
        the application of these rules, it is necessary to adopt both a system of
        three underlying accents (to capture the pitch distribution before accent
        sandhi) as well as four surface tones (to capture the pitch distribution
        after accent sandhi). While the surface tone scheme is required to cap-
        ture the contrasts of different traditions of pitch distribution after accent
        sandhi, European scholars have established a tradition of representing
        Vedic accent utilizing the system of three contrasting pitches belonging
        to the derivational level at which accent sandhi has not yet applied.
        it. For example, the unaspirated and aspirated retroflex lateral flaps / /
        and / h / do not occur in Classical Sanskrit; the phonemes /ɖ/ and /ɖʰ/
        do. In certain R̥gvedic dialects, unaspirated and aspirated retroflex
        lateral flaps occur in complementary distribution with [ɖ], [ɖʰ] and hence
        are allophones of [ɖ], [ɖʰ]. In separate encoding schemes for Classical
        Sanskrit and for the R̥gvedic dialects, encodings are necessary only for
        the two phonemes /ɖ/, /ɖʰ/.
             In a database that includes passages both in the R̥gvedic dialect and in
        the Classical Sanskrit dialect, one could tag the passages in one or both of
        the dialects and apply phonetic rules to produce the contextually
        appropriate allophones proper to each dialect. For instance, Yāska’s Nirukta,
        which is predominantly in the Classical Sanskrit dialect, cites passages
        in the R̥gvedic dialect. Nirukta 3.11 cites R̥V. 2.23.9, which contains
        the word taḷito with an intervocalic retroflex lateral flap, Romanized ḷ.
        Nirukta 3.11 then cites the linguist Śākapūṇi explaining that taḷit refers
        to lightning in the passage vidyut taḷid bhavatīti śākapūṇiḥ. Rather than
        including both the retroflex lateral flap [ ] and [ɖ] in the character
        encoding, one might tag the text in R̥gvedic dialect and allow special rules to
        realize /ɖ/ as the retroflex lateral flap [ ] in text so tagged. Such a tagging
        in the latter passage could be achieved as follows:
        <embed dialect="rv">vidyut taqid bhavati</embed>
        iti SAkapURiH
        Within the tagged dialect portion, intervocalic /ɖ/ will always be
        realized as the retroflex lateral flap [ ]; outside such tags, it will always be
        realized as [ɖ].
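The rule just stated can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation proposed by the authors: it assumes the text is in SLP1, that the SLP1 letters q and Q stand for the retroflex stops, that L and | are acceptable output symbols for the unaspirated and aspirated flaps, and that the <embed dialect="rv"> tag is used exactly as in the example above. The function name realize and the vowel inventory are hypothetical.

```python
import re

# SLP1 vowel letters (an assumed inventory for this sketch).
VOWELS = "aAiIuUfFxXeEoO"

# Assumed output symbols for the retroflex lateral flaps:
# 'L' for the unaspirated flap and '|' for the aspirated one.
FLAP = {"q": "L", "Q": "|"}

# A retroflex stop with a vowel immediately before and after it.
INTERVOCALIC = re.compile(rf"(?<=[{VOWELS}])([qQ])(?=[{VOWELS}])")

# A span tagged as Rgvedic dialect, as in the example above.
EMBED = re.compile(r'<embed dialect="rv">(.*?)</embed>', re.DOTALL)


def realize(text: str) -> str:
    """Apply the flap rule inside tagged spans only, then drop the tags."""

    def in_dialect(match: re.Match) -> str:
        # Replace each intervocalic stop by the corresponding flap symbol.
        return INTERVOCALIC.sub(lambda c: FLAP[c.group(1)], match.group(1))

    return EMBED.sub(in_dialect, text)


if __name__ == "__main__":
    sample = '<embed dialect="rv">vidyut taqid bhavati</embed>\niti SAkapURiH'
    print(realize(sample))
```

Text outside the tag passes through unchanged, which is exactly why the approach depends on the tagging being accurate.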
            It is not likely that such a system would be practical at the present
        stage, however, since this encoding would have to be fairly fine-grained,
        and since we possess insufficient information about dialectal differences
        and loanwords. We are uncertain, for instance, whether Śākapūṇi writes
        consistently in R̥gvedic dialect or just cites the single word taḷit in
        R̥gvedic dialect. If the latter, the above demonstration includes too much of
        the passage within the <embed> tag. Moreover, the Nirukta itself uses
        the retroflex ḷ, ḷh even when not directly citing. Immediately after
        referring to Śākapūṇi, the Nirukta continues, sā hy avatāḷayati, using ḷ outside
        R̥gvedic dialect. The text as received makes no mention/use distinction.
        With the lack of reliable information about the author’s dialect, one is
        guishes between extra-high, high, and low, while the system of Pāṇini
        and the Vājasaneyiprātiśākhya distinguishes between high, low, and
        extra-low. Although each of the systems of surface accentuation
        distinguishes only three pitches, a system of surface accentuation that will
        accommodate contrasts across these systems must distinguish four pitches.
        A system that captures distinction in pitch across Vedic dialects must
        therefore distinguish between extra-high, high, low, and extra-low. Con-
        sequently, it is necessary to devise a character-encoding scheme adequate
        to capture phonemic distinctions in the broad sense across all Sanskrit di-
        alects.
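The argument that two three-pitch systems jointly require four pitches can be checked with a small Python sketch. The labels system_a and system_b are placeholders introduced here for illustration: system_a stands for the tradition that distinguishes extra-high, high, and low, and system_b for the system of Pāṇini and the Vājasaneyiprātiśākhya. The function name is likewise hypothetical.

```python
# Pitch levels on one shared scale, ordered from highest to lowest.
SHARED_SCALE = ["extra-high", "high", "low", "extra-low"]

# Three-pitch inventories as described in the text. The keys are
# placeholder labels, not names used by the source.
SYSTEMS = {
    "system_a": ["extra-high", "high", "low"],
    "system_b": ["high", "low", "extra-low"],
}


def combined_inventory(systems: dict) -> list:
    """Return every pitch used by at least one system, in scale order."""
    used = {pitch for pitches in systems.values() for pitch in pitches}
    return [pitch for pitch in SHARED_SCALE if pitch in used]


if __name__ == "__main__":
    print(combined_inventory(SYSTEMS))
```

Each system on its own needs only three symbols, but the combined inventory has four members, so a single encoding that covers both traditions needs four.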
            Higher-level bracketing does not seem suitable to capture distinctions
        in various Sanskrit dialects and loan words since there may be insuffi-
        cient evidence to identify the various dialects and source languages for
        loan words. There exist, however, phonetic distinctions that are more
        suitably captured by using higher-level bracketing than by incorporating
        them in a character encoding based on the phoneme in the broad sense.
        Higher-level bracketing is appropriate where the phonetic distinction is
        made only with explicit reference to units at a higher level than the
        phoneme. For instance, higher-level bracketing is appropriate where the
        phonetic distinction is made only with reference to lexical items. For
        example, Mallaśarmakr̥taśikṣā 45–46 describes nasalized vowels prolonged
        to five and six morae. While there is no reason to doubt the phonetic
        accuracy of the description, there is no need to include the distinction of
        five- and six-mora lengths in the featural scheme of Sanskrit nor to in-
        clude nasalized vowels having a length of five or six morae in the broadly
        phonemic character inventory. Such lengths need not be included be-
        cause their only occurrence is in the final vowels of particular lexical
        items. The length of five morae occurs only in the word mahā, and the
        length of six morae occurs only in the word ati (Mallaśarmakr̥taśikṣā
        46). Because the occurrence is lexically specific, the phenomenon is best
        described lexically, as it was described by the Śikṣā itself. The Śikṣā
        calls the occurrence of these extra-long nasalized vowels mahāraṅga and
        atiraṅga, i.e. the raṅga of mahā and the raṅga of ati. The defining char-
        acter of the distinguishing feature seems to be the lexical item rather than
        the length of the sound. Therefore, we consider it a lexical feature rather
        than a phonetic one.
            It is difficult to capture suprasegmental features such as accent in
        not a distinct accent of the anusvāra. Since the svarita accent of the syl-
        lable is encoded by the vowel accent, it would be redundant to encode it
        again for the anusvāra.
        als. For example, the vowel a with a dependent circumflex which falls
        from extra-high to high according to the description in the R̥kprātiśākhya
        is coded a^98; with the independent circumflex before a high-pitched
        or circumflexed syllable, a^97. As described in the Vājasaneyiprāti-
        śākhya, these are coded a^87 and a^86 respectively. Contours that
        include three pitches within a single vowel can be similarly accommo-
        dated.
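As a rough illustration of how such contour codes might be read by software, the following Python sketch splits a code such as a^98 into its vowel and a sequence of pitch levels. It assumes that each digit after the caret is one pitch level and that a higher digit means a higher pitch, which is consistent with a^98 being described as a fall from extra-high to high. The function names are hypothetical.

```python
import re

# A vowel, a caret, then one digit per pitch level in the contour.
CONTOUR = re.compile(r"^(?P<vowel>[^\^]+)\^(?P<pitches>[0-9]+)$")


def parse_contour(code: str):
    """Split a code like 'a^98' into the vowel and a list of pitch levels."""
    match = CONTOUR.match(code)
    if match is None:
        raise ValueError(f"not a contour code: {code!r}")
    return match.group("vowel"), [int(d) for d in match.group("pitches")]


def is_falling(pitches) -> bool:
    """True if every pitch is lower than the one before it (assumed scale)."""
    return all(a > b for a, b in zip(pitches, pitches[1:]))


if __name__ == "__main__":
    for code in ("a^98", "a^97", "a^87", "a^86"):
        vowel, pitches = parse_contour(code)
        print(vowel, pitches, is_falling(pitches))
```

Because the pitch sequence is a list of any length, a contour with three pitches in one vowel is handled the same way as a two-pitch contour.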
Chapter 7
Script-based encoding
        that human psychology shapes writing, and writing in turn shapes hu-
        man psychology, he argued, produces a feedback loop that allowed for
        rapid evolution. Writing is associated with the rise of complex forms of
        social and cultural organization, the accrual of specific historical knowl-
        edge, and the abstract thought that led to the sciences and technologies.
        The written document allows for increasing distance from the primary
        act of communication; it “speaks” to an imagined reader, or a generally
        literate audience. Writing abstracts distinctive features from the chaîne
        de la parole, levels the differences between spoken language dialects
        (and a fortiori idiolects) (Weir, 1967, 172), and removes many of the
        context-dependent features of face-to-face interaction.
             There is no question that writing contributes to certain elements of
        cognitive development, and that it has historically been associated with
        the development of complex social organization and the accrual of spe-
        cific knowledge. The accrual of specific knowledge itself allows for
        progress in science and technology. However, the contention that writing
        is directly responsible for the development of abstract thought in human
        evolution is speculative at best. Indeed, the opposite may be the case.
        The reliance on writing may contribute to the deterioration of cognitive
        ability. In the Phaedrus, Socrates denigrates writing by relating the words
        of king Thamus of the Egyptian Thebes to the god Theuth when Theuth
        revealed the art of writing to him. When Theuth promised that it would
        make the people wiser and improve their memories, king Thamus retorted
        that it would have the very opposite effect. He said, “it will implant
        forgetfulness in their souls; they will cease to exercise memory because
        they rely on that which is written” (Phaedrus 275a). Specific
        knowledge inherited through the oral tradition could produce the development
        of abstract thought just as well as specific knowledge inherited through
        written means. The development of linguistic sciences in India is
        evidence against the claim that writing is responsible for the development
        of the abstract thought that led to the sciences. These sciences developed
        in an oral medium. Pāṇini composed the Aṣṭādhyāyī using phonetic, not
        visual, markers. The oral transmission of Vedic texts spawned the
        development of mnemonic techniques that led to prodigious feats of memory.
        To this day students trained in traditional methods know thousands of
        verses or sūtras by heart. The composition of poetry in early cultures
        attests to the ability to speak to an imagined reader in the abstract, in the
        absence of writing.
            Although writing cannot claim sole responsibility for producing ab-
        stract thought in the history of human cognitive development, neverthe-
        less writing has been the dominant medium for knowledge transmission
        in the past couple of millennia. Attention to the structure of written
        language is important to historians, psychologists (Ellis, 1979), and
        educators. Moreover, the study of written language has technological
        application in optical character recognition (OCR), handwriting recognition,
        and the design of new media (Rosenberger, 1998). The investigation of
        written-language structure begins with segmentation. Writing systems
        give different cues to segmentation at different levels, for instance by
        punctuation and regularity of spacing, which may be present or absent to
        varying degrees. Words are delimited in most present-day Western writ-
        ing, whereas in East Asian writing they are not (nor are they in many pre-
        modern Western manuscripts) (Saenger, 1991); printed Sanskrit texts in
        Devanāgarī lie somewhere in between, with word separation only where
        sandhi and the graphotactic structure of the script permit it. Analysis of
        written language into abstract units called characters that are repeated
        with variable visual features is the basis for most computer processing of
        language as it is known today. Characters may be discretely realized, as
        in contemporary English texts, or unsegmented, as in a printed or hand-
        written Arabic text (Abu-Rabia & Taha, 2006). Handwriting (as well as
        printing, insofar as its glyphs are imitative of handwritten ones) can be
        analyzed into units smaller than the character, in particular, “a set of up-
        strokes and downstrokes ordered in time” (Mermelstein & Eden, 1964,
        257). Such a level of analysis takes into account the physical mechanism
        involved in writing and is capable of identifying units that are invariant,
        while characters may be realized with infinite variation.
        Gibson (1969, 88) proposes a set of distinctive features, divided into five
        classes, for capital letters of the Roman alphabet:
           1. straight: [± horizontal], [± vertical], [± diagonal /], [± diagonal \]
        150–151). A set of attributes used in a visual grammar of the upper-case Roman letters in-
        cludes {shaft, leg, arm, bay, closure, weld, inlet, notch, hook, crossing, symmetry, marker}
        (Reed, 1978, 146). Cf. Narasimhan & Reddy (1967).
           Featural analysis has also been applied to graphic systems other than written language,
        e.g. children’s drawings (Krampen, 1986, 87–88).
            6 See Gibson (1969, 88–89).
(2007).
        respectively, which differ only in [± aspirated]; ⟨ब⟩ and ⟨व⟩ represent /b/
        and /w/, which are both non-syllabic voiced segments with labial
        articulation;10 and ⟨भ⟩ and ⟨म⟩ represent /bʰ/ and /m/, which differ only in the
        values of [± aspirated, ± nasal]. Moreover, the four retroflex non-nasal
        stop characters ⟨ट⟩, ⟨ठ⟩, ⟨ड⟩, and ⟨ढ⟩ all share the graphic feature of a
        round bottom. Graphic similarity, however, is by no means always
        correlated with phonetic similarity, and Devanāgarī has several close graphical
        pairs corresponding to sounds that are not especially similar, such as
        ⟨ya⟩ and ⟨tha⟩, ⟨pa⟩ and ⟨śa⟩, ⟨bha⟩ and ⟨jha⟩.
        application. Character components fall into four groups: (a) straight lines (5), (b) circles
        and curlicues (7), (c) dots (2), (d) other shapes (24). Several components are given in
        variant forms. We thank Hans Hock for sharing these materials with us.
Chapter 8
Conclusions
        tures, and what criteria to use to contrast items. Since information degra-
        dation arises at each stage in representation of knowledge, it is felicitous
        to encode the primary medium of knowledge transmission. Given that
        script is inherently a secondary phenomenon vis-à-vis spoken language,
        encoding should be based directly on spoken language. Devanāgarī script
        itself was not specifically designed to represent Sanskrit phonology, but
        rather was adapted to this use subsequently; hence it is not surprising
        that it proves to be a less appropriate basis for encoding Sanskrit than
        Sanskrit phonology itself.
            Few of the world’s writing systems were designed for the languages
        that they represent in extant texts. Most were adapted, and adaptations
        almost always fail to capture the structure of the spoken language ade-
        quately. Therefore, in general, where one has access to the phonology of
        the language, where the orthography is fairly shallow, and where the stan-
        dard orthography departs from an ideal coding of spoken language struc-
        ture, the basis for text encoding should be phonetic rather than graphic.
        Sanskrit meets these conditions, and so it is better to encode Sanskrit
        speech sounds directly than to encode the secondary representations of
        those sounds in Devanāgarī, Roman, or any other script. Directly coding
        Sanskrit speech sounds will solve the problems of ambiguity and redun-
        dancy that we have noted in our survey of current encoding schemes.
            Spoken language has a temporal dimension, and scripts that repre-
        sent spoken language have a linear dimension that corresponds to the
        temporal dimension of spoken language. The minimal independent unit
        in the chain of speech is the phonetic segment or phone. The minimal
        independent unit in script is the graphic segment or graph. A segmental
        linguistic encoding is based upon minimal phonetic or graphic segments.
        Yet both phonetic and graphic units may be decomposed into systems
        of features orthogonal to this dimension of segmentation and not nec-
        essarily coterminous with the minimal units of segmentation. Phonetic
        units may be decomposed into a set of acoustic or articulatory features
        that are realized simultaneously. Similarly, writing may be analyzed into
        graphic features. Although the boundaries between phonetic and graphic
        segments are sites of marked alterations in phonetic and graphic fea-
        tures, each feature may independently be associated with a string of one
        or more phonetic or graphic segments. Encodings may be entirely seg-
        mental, at one pole of the synthetic–analytic axis, or entirely featural at
        users who are not familiar with English and English keyboard layouts. New hardware
        addresses these challenges (Joshi et al., 2004).
           2 As of March 2010, India had about 545 million mobile phone users. Source:
        <https://www.cia.gov/library/publications/the-world-factbook/geos/in.html>.
        netics, morphology, syntax, and semantics. The current book drew
        heavily from the first. The tradition of grammar (vyākaraṇa) offers
        interesting modes of morphological and syntactic analysis that may prove to be
        more suitable to highly-inflected free-word-order languages than
        methods employed in contemporary computational frameworks. The
        traditions of grammar, logic (nyāya), ritual exegesis (karmamīmāṁsā), and
        literary theory (alaṅkāraśāstra) offer various competing intricate
        theories of verbal comprehension. These Indian linguistic traditions might
        contribute useful insights to contemporary formal linguistics.
              Indian linguistic theories can be formalized and implemented compu-
        tationally. Research to work out the details of Indian semantic and syn-
        tactic theory could contribute to contemporary research at the semantics-
        syntax interface where computational linguistic work is flourishing. The
        authors’ current work draws upon major semantic and syntactic trea-
        tises in the Indian grammatical tradition and contemporary techniques
        of formalization and computational implementation to bring ancient In-
        dian theories face to face with contemporary computational linguistic
        work. On the one hand, we articulate Indian theories in contemporary
        terms and offer a critique and insights useful to contemporary linguists.
        On the other hand, we suggest ways of modeling ancient Indian theo-
        ries computationally. The latter will allow computational modeling to
        clarify those ancient theories and assist in answering difficult questions
        regarding their principles and historicity. Implementing Indian theories
        of morphology, syntax, and semantics computationally requires working
        out methods to encode the categories and distinctions articulated in these
        theories. Research that compares the Indian theories with contemporary
        theories requires correlating the encodings of Indian linguistic categories
        with traditional European categories. We hope to develop these higher-
        level linguistic encoding schemes and to utilize them in the creation of
        tagged corpora for linguistic research.
              XML has emerged as the standard method of implementing
        higher-level encoding in digital texts. The Text Encoding Initiative (TEI) has
        developed standards for encoding metadata of digital texts in XML, and for
        encoding various literary aspects of texts. Investigation of the categories
        and distinctions of sentiments (rasa) and literary figures (alaṅkāra) in the
        Indian traditions of literary criticism and artistic appreciation
        (alaṅkāraśāstra, nāṭyaśāstra) remains a fruitful field for future research.
Appendices
Appendix A
Tables
                    CONSONANTS                                                  VOWELS¹,²
                    stops                              semivowels  spirants³
        GUTTURAL                                       ह h⁶        ः ḥ                       अ a    आ ā
        VELAR       क k    ख kh   ग g    घ gh   ङ ṅ    ं ṁ⁷        ẖ
        PALATAL     च c    छ ch   ज j    झ jh   ञ ñ    य y         श ś         इ i    ई ī    ए e    ऐ ai
        RETROFLEX⁸  ट ṭ    ठ ṭh   ड ḍ    ढ ḍh   ण ṇ    र r         ष ṣ         ऋ r̥    ॠ r̥̄
        DENTAL      त t    थ th   द d    ध dh   न n    ल l         स s         ऌ l̥    ॡ l̥̄
        LABIAL      प p    फ ph   ब b    भ bh   म m    व v         ḫ           उ u    ऊ ū    ओ o    औ au
                                                                                                                                                                      “LIES” — 2011/6/21 — 15:43 — page 133 — #153
                    CONSONANTS                                                              VOWELS¹
                    stops                              semivowels      spirants²            simple  diphthongs   simple
                    incontinuously contacted           slightly cont.  slightly open        open³   >      >>    most open  close
                    UNVOICED      VOICED               VOICED          UNVD.     VD.        VOICED
                    unasp. asp.   unasp. asp.   nasal⁴
                    CONSONANTS                                                              VOWELS¹
                    stops                              semivowels      spirants²            simple            complex
                    incontinuously contacted           slightly cont.  continuously open    continuously open continuously open
                    UNVOICED      VOICED               VOICED          UNVD.     VD.        VOICED            VOICED
                    unasp. asp.   unasp. asp.   nasal³                                      short    long     long
                                                                                                              fused    diphthong
        GUTTURAL                                                       ः ḥ       ह h        अ a      आ ā
        VELAR       क k    ख kh   ग g    घ gh   ङ ṅ                    ẖ
        PALATAL     च c    छ ch   ज j    झ jh   ञ ñ    य y             श ś                  इ i      ई ī      ए e      ऐ ai
        RETROFLEX⁴  ट ṭ    ठ ṭh   ड ḍ    ढ ḍh   ण ṇ    र r             ष ṣ                  ऋ r̥      ॠ r̥̄
        DENTAL      त t    थ th   द d    ध dh   न n    ल l             स s                  ऌ l̥      ॡ l̥̄
        LABIAL      प p    फ ph   ब b    भ bh   म m    व v             ḫ                    उ u      ऊ ū      ओ o      औ au
        NASAL                                                       ं ṁ⁵
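
        The grid of stops in the table above lends itself to a simple lookup
        structure. The following is a minimal sketch, not part of the encoding
        schemes described in this book: the names STOPS, COLUMNS, and
        describe_stop are hypothetical, and only the romanized forms of the 25
        stops are used.

```python
# Stops as laid out in the table: one row per place of articulation.
# Columns run: unvoiced unaspirated, unvoiced aspirated, voiced unaspirated,
# voiced aspirated, nasal (the nasal stands under the VOICED heading).
# All names here are illustrative assumptions.
STOPS = {
    "VELAR":     ["k", "kh", "g", "gh", "ṅ"],
    "PALATAL":   ["c", "ch", "j", "jh", "ñ"],
    "RETROFLEX": ["ṭ", "ṭh", "ḍ", "ḍh", "ṇ"],
    "DENTAL":    ["t", "th", "d", "dh", "n"],
    "LABIAL":    ["p", "ph", "b", "bh", "m"],
}

COLUMNS = [
    ("unvoiced", "unaspirated"),
    ("unvoiced", "aspirated"),
    ("voiced", "unaspirated"),
    ("voiced", "aspirated"),
    ("voiced", "nasal"),
]


def describe_stop(sound):
    """Return (place, voicing, manner) for a romanized stop."""
    for place, row in STOPS.items():
        if sound in row:
            voicing, manner = COLUMNS[row.index(sound)]
            return place, voicing, manner
    raise KeyError(sound)
```

        For example, describe_stop("dh") yields the dental row and the voiced
        aspirated column. Semivowels, spirants, and vowels are left out of the
        sketch because their column assignments differ between the tables.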
                    CONSONANTS                                                                      VOWELS
                    stops                                              semivowels spirants
                    CONSONANTS                                                     VOWELS
                    stops                                      semivowels  spirants
        GUTTURAL                                               Xh²         s           He     eH
        VELAR³      kʷ       kʷH      gʷ       gʷh      n      n,m         s
        PALATAL³    kʷ,ḱ     kʷH,ḱH   gʷ,ǵ     gʷh,ǵh   n      y           ḱ           y      yH     H ey   eHy
        RETROFLEX⁴  t        tH       d        dh       n      r,l         ḱ,s⁵        r      rH
        DENTAL      t        tH       d        dh       n      r,l         ḱ,s         l      lH
        LABIAL      p        pH       b,p,pH⁶  bh       m      w           s           w      wH     H ew   eHw
                    CONSONANTS¹,²,³                                            VOWELS⁴
                    stops                           liquids⁵  glides⁵  spirants
                    UNVD.   VOICED                  VOICED             UNVD.      VOICED
                    unasp.  unasp.  asp.    nasal⁵
        LABIOVELAR  kʷ      gʷ      gʷh
        VELAR⁶      k       g       gh                        w
        PALATAL     ḱ       ǵ       ǵh              r         y                   e
        DENTAL      t       d       dh      n       l                  s
        LABIAL      p       b       bh      m
                    CONSONANTS                                                           VOWELS¹
                    stops                                   liquids  glides  spirants    low mid      high
                    UNVOICED        VOICED                  VOICED           UNVD.       VOICED
                    unasp.  asp.    unasp.  asp.    nasal
        GLOTTAL                                                              h
        LABIOVELAR  kʷ      kʷh     gʷ      gʷh
        VELAR       k       kh      g       gh                                            a
        PALATAL     ḱ       ḱh      ǵ       ǵh              r        y                ə        e    i
        DENTAL      t       th      d       dh      n       l                s
        LABIAL      p       ph      b       bh      m                w                             o    u
          [suction]
          [continuant]
          [strident]
          [lateral]
          [nasal]                     Soft Palate               [consonantal]
                                                                [sonorant]
          [retracted tongue root]     Tongue Root
          [advanced tongue root]                     Guttural
          [stiff vocal folds]
          [slack vocal folds]         Larynx
          [constricted glottis]
          [spread glottis]
          [anterior]                  Coronal
          [distributed]
          [round]                     Labial         Place
          [back]
          [high]                      Dorsal
          [low]
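
        The hierarchy in the figure above can be written down as a small data
        structure. The sketch below is an illustration only. The grouping of
        features under articulators, and of articulators under the Guttural and
        Place nodes, is an assumption read off the layout of the figure; the
        label "Root" and the names used in the code are hypothetical.

```python
# Terminal features under each articulator (assumed from the figure layout).
ARTICULATOR_FEATURES = {
    "Soft Palate": ["nasal"],
    "Tongue Root": ["retracted tongue root", "advanced tongue root"],
    "Larynx": ["stiff vocal folds", "slack vocal folds",
               "constricted glottis", "spread glottis"],
    "Coronal": ["anterior", "distributed"],
    "Labial": ["round"],
    "Dorsal": ["back", "high", "low"],
}

# Intermediate class nodes that group articulators (assumed).
CLASS_NODES = {
    "Guttural": ["Tongue Root", "Larynx"],
    "Place": ["Coronal", "Labial", "Dorsal"],
}

# Features shown without an articulator are attached to the root (assumed).
ROOT_FEATURES = ["suction", "continuant", "strident", "lateral",
                 "consonantal", "sonorant"]


def path_to_root(feature):
    """Return the chain of nodes from a feature up to the root."""
    if feature in ROOT_FEATURES:
        return [feature, "Root"]
    for articulator, features in ARTICULATOR_FEATURES.items():
        if feature in features:
            path = [feature, articulator]
            for node, members in CLASS_NODES.items():
                if articulator in members:
                    path.append(node)
            path.append("Root")
            return path
    raise KeyError(feature)
```

        Under these assumptions path_to_root("round") passes through Labial and
        Place, while path_to_root("nasal") reaches the root directly from the
        Soft Palate.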
             अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ए ऐ ओ औ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल व श ष स ह ं ः
         1   +   +   + + +   +    +   +   +   +   +   +   +   +   +   +   +   +   +   +   +   +   +   +   + + + +   +   + + +   −   +   +   +   +   −   +   + + + + +   +   +   + − −
2 + + − − − − + + + + − − + + + + + + − + − + + + − − − − + + + − + + + + + + + + − + + + + + − − −
3 − − + + + + − − − − − − − − − − − − + − − − + − + + + + − − + + − − − − − − − + − − − − − − + − −
4 − − − − − − − − − − − − − − − + + − − − − − − − − − − − + − − − − − − − − + + − + − − − − + − − −
         5   −   −   + + −   −    −   −   −   −   +   +   −   −   −   +   −   −   −   −   −   −   +   −   − − − −   +   − − +   −   −   −   −   −   −   −   − + − − +   −   +   − − −
6 − − − − − − + + − − − − − − − − − − − + − − + − − − − − − − − − − − − − − + + − − − − − − + − − −
7 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − + − − − + + − − − + − − − − + − − − −
8 − − − − − − − − − − + − − − + + − − − − + − − − − − − − − − − − − − − − + − − − − − + − − − − − −
9 − + − − − − + + − − − − + + − − − − − − − − − − − − − − + − − − − − − − − − − − − − − − − − − − −
10 − − − − − + − − − − − − − − + − − − − − − − − − − − − − − − − − − − − + − − − − − − − − − − − − −
11 − − − − − − − − − − − − − − − − − − − − − − − − − − − − − + − − − + − − + − − − − − − − + − − − −
12 − − − + − − + + + + − − − − − − − − − + − + − + − − − − − − − − − − − − − − − − − − − − − − − − −
13 − − − − − − − − − − − − − − − − − − − − − − − − + + − + − − − − − − − − − − − − − − − − − − + − −
14 − − − − − − − − − − − − − − − − − − − − − − − − − − − + − − + − − − − − − − − − − − − + − − − − −
15 − − + + − − − − − − − − − − − − − − + − − − + − − − + − − − − − − − − − − − − − − − − − − − − − −
16 − − − − − − − − + + − − − − − − − + − − + − − − − − − − − − − − + − − − − − − − − − − − − − − − −
17 + + + − − − + + − − − − + + − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − −
                     − − −   −    −   −   −   −   −   −   −   −   −   −   −   −   −   −       −   −   −   − − − −   −   − − −   −   −   −   −   −   −   −   − − − − −   −   −   − − −
18 + + +
19 − − − − + + − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − + − − −
20 − − − − − − − − − − + + + − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − −
        21   −   −   − − −   −    −   −   −   −   −   −   −   −   −   −   −   −   +   −   −   −   −   −   − − − −   −   − − −   −   −   −   −   −   −   −   − − − − −   −   −   − + +
Appendix B
        B.2      Punctuation
        Although punctuation does not properly belong to a phonetic encoding,
        a limited number of punctuation tokens are supported in this encoding,
        since they can be used to provide basic segmentation information. The
        question mark is used to indicate inaudible or illegible characters in tran-
        scription.
                         ऽ          ।         ?         -
                         ’          .         ?         -
                      avagraha    danda    question   hyphen    space
        B.3      Modifiers
        Modifiers are added after a character to indicate variations in segment
        stricture, length, accent, and nasalization, in the order stated. Prolonged
        length, accent, and nasalization occur in classical Sanskrit as well as Ve-
        dic. Modifiers are used in combination to indicate special features of
        stricture, length, accent, and nasalization in Vedic.
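
        The ordering requirement stated above (stricture, then length, then
        accent, then nasalization) can be checked mechanically. The following
        is a minimal sketch under stated assumptions. The accent and
        nasalization characters are those listed in B.3.3 and B.3.4; the
        stricture characters (_ = !) and length characters (* 1 #) are assumed
        from examples elsewhere in this appendix (y_, y=, k!, a*, e1, a1#) and
        may not be the complete inventory. The function names are hypothetical.

```python
# Modifier classes in the order the text prescribes. Membership of the
# stricture and length classes is an assumption drawn from the examples in
# this appendix, not a complete inventory.
MODIFIER_CLASSES = [
    ("stricture", "_=!"),
    ("length", "*1#"),
    ("accent", "/\\^6789+"),
    ("nasalization", "~"),
]


def modifier_rank(mod):
    """Return the position of a modifier's class in the prescribed order."""
    for rank, (_name, chars) in enumerate(MODIFIER_CLASSES):
        if mod in chars:
            return rank
    raise ValueError(f"unknown modifier: {mod!r}")


def in_stated_order(modifiers):
    """True if a string of modifier characters never steps back a class."""
    ranks = [modifier_rank(m) for m in modifiers]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

        On this sketch the modifier string "/~" (accent, then nasalization) is
        accepted and "~/" is rejected. Two modifiers of the same class, as in
        a1#, are treated as being in order.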
B.3.1 Stricture
B.3.2 Length
B.3.3 Accent
                /    high pitch
                \    low pitch
                ^    circumflex
                6    extra low tone
                7    low tone
                8    high tone
                9    extra high tone
                +    sharpness
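
        The accent modifiers listed above can be held in a lookup table. The
        sketch below is illustrative only: it assumes that the base of a token
        is a single character and that everything after it is a modifier, and
        the names ACCENT_MODIFIERS and accents_in are hypothetical.

```python
# Accent modifiers and their descriptions, as listed in the table above.
ACCENT_MODIFIERS = {
    "/": "high pitch",
    "\\": "low pitch",
    "^": "circumflex",
    "6": "extra low tone",
    "7": "low tone",
    "8": "high tone",
    "9": "extra high tone",
    "+": "sharpness",
}


def accents_in(token):
    """List the accent modifiers that follow the first character of a token.

    Assumes the base is exactly one character; other modifiers are skipped.
    """
    return [ACCENT_MODIFIERS[ch] for ch in token[1:] if ch in ACCENT_MODIFIERS]
```

        For example, accents_in("a/") returns a list containing only
        "high pitch", and a token with no accent modifier returns an empty list.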
B.3.4 Nasalization
~ nasalization
                y_ heavy y
                v_ heavy v
                y= light y
                v= light v
                k! unreleased (abhinidhāna) k
                g! unreleased (abhinidhāna) g
                . . . similarly for other unreleased stops
                y! unreleased (abhinidhāna) y
                v! unreleased (abhinidhāna) v
                l! unreleased (abhinidhāna) l
B.4.2 Length
                 a* epenthetic a
                 i* epenthetic i
                 u* epenthetic u
                 f* epenthetic r̥
                 x* epenthetic l̥
                 e* epenthetic e
                 e1 short e
                 o1 short o
                a1# slightly lengthened short a
                . . . similarly for other slightly lengthened short vowels
                H/     high-pitched visarga
                H\     low-pitched visarga
                H^     svarita visarga
                M\     low-pitched anusvāra
        B.4.5    Nasals
        Nasalization
        Both SLP1 and SLP2 include means to encode 20 yamas (k~, kh~, . . . ,
        b~, bh~) considered, on phonetic grounds, to be epenthetic nasalized
        segments that adopt features both of the preceding stop and of the
        following nasal. Yet the preferred method of encoding yamas, in accordance
        with the phonological analysis of most ancient Indian phonetic treatises,
        is to employ characters for just four epenthetic nasals (k~, kh~, g~,
        gh~), or, on the minority view of the R̥kprātiśākhya, to employ yamas
        (k~, kh~, . . . , b~, bh~) in place of the non-nasal stop that precedes the
        nasal. (See p. 63 and p. 72 for discussion.)
                l~    nasalized l
                y~    nasalized y
                v~    nasalized v
                k~    nasalized offset (yama), after unvoiced unaspi-
                      rated non-nasal stop when followed by a nasal
                      stop
                K~    nasalized offset (yama), after unvoiced aspirated
                      non-nasal stop when followed by a nasal stop
                g~    nasalized offset (yama), after voiced unaspirated
                      non-nasal stop when followed by a nasal stop
                G~    nasalized offset (yama), after voiced aspirated
                      non-nasal stop when followed by a nasal stop
                h~    nasalized offset (nāsikya), after h when followed
                      by a nasal stop
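
        The conditions in the table above amount to a small rewriting rule:
        after a non-nasal stop (or h) that is followed by a nasal stop, the
        corresponding nasalized offset is inserted. The following is a minimal
        sketch of that rule on the four-yama analysis. It assumes the standard
        SLP1 letters for the 25 stops, which are not listed in this section,
        and it assumes input without other modifiers. The function names are
        hypothetical.

```python
# SLP1 stop classes (assumed standard SLP1 inventory).
UNVOICED_UNASPIRATED = set("kcwtp")
UNVOICED_ASPIRATED = set("KCWTP")
VOICED_UNASPIRATED = set("gjqdb")
VOICED_ASPIRATED = set("GJQDB")
NASAL_STOPS = set("NYRnm")


def offset_for(ch):
    """Return the nasalized offset that follows ch before a nasal stop."""
    if ch in UNVOICED_UNASPIRATED:
        return "k~"
    if ch in UNVOICED_ASPIRATED:
        return "K~"
    if ch in VOICED_UNASPIRATED:
        return "g~"
    if ch in VOICED_ASPIRATED:
        return "G~"
    if ch == "h":
        return "h~"  # nāsikya
    return None


def insert_yamas(text):
    """Insert a yama or nāsikya wherever the table's conditions are met."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        following = text[i + 1] if i + 1 < len(text) else ""
        if following in NASAL_STOPS:
            offset = offset_for(ch)
            if offset is not None:
                out.append(offset)
    return "".join(out)
```

        On this sketch insert_yamas("rukma") gives rukk~ma and
        insert_yamas("brahma") gives brahh~ma, while a word with no stop
        before a nasal, such as rAma, is returned unchanged.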
Anusvāra
Raṅga
Appendix C
        Devanāgarı̄
        Often several options are given for the marking of Vedic accentuation in
        Devanāgarı̄, including those used in the following traditions:
        IPA transcription
        The IPA transcription given in column 5 is for reference; it does not
        necessarily represent the only historically correct reconstruction for the
        sound in question. Surface tones are indicated using the tone-letter sys-
        tem of Chao (1930). Underlying tones are indicated using the grave ac-
        cent, acute accent, and circumflex accent with their standard IPA (1949–
        1996) meanings.
            The short a [ɐ] in Sanskrit, although described as close in comparison
        with ā [ɑː], is yet more open than schwa [ə].
            The diphthongs ai and au in the modern pronunciation of Sanskrit
        use the close a [ɐ] at the onset but preserve the same sounds as the
        corresponding vowels i [i] and u [u] at the offset. The vowels represented by
        i and u are the most front and most back vowels shown in the IPA chart;
        they never represent [ɪ] as in ‘pin’ or [ʊ] as in ‘book’.
            We use the true palatal symbols for the palatal series of stops c ch j
        jh [c cʰ ɟ ɟʰ] and a palatal spirant ś [ç] rather than palato-alveolar
        affricates [ʧ ʧʰ ʤ ʤʰ].
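
        The values discussed in this note can be collected in a lookup table,
        written here with Unicode IPA characters. This is a sketch for
        illustration, limited to the sounds named in the paragraphs above; the
        names IPA and transcribe are hypothetical, and the table is not the
        full column 5 of the appendix.

```python
# IPA values for the sounds discussed in the note above (Unicode IPA).
IPA = {
    "a": "ɐ",    # short a, closer than ā but more open than schwa
    "ā": "ɑː",
    "i": "i",
    "u": "u",
    "c": "c",    # true palatal stops, not affricates
    "ch": "cʰ",
    "j": "ɟ",
    "jh": "ɟʰ",
    "ś": "ç",    # palatal spirant
}

# Diphthongs: onset is the close a, offset is the plain i or u.
IPA["ai"] = IPA["a"] + IPA["i"]
IPA["au"] = IPA["a"] + IPA["u"]


def transcribe(sounds):
    """Join the IPA values of a sequence of romanized sounds."""
    return "".join(IPA[s] for s in sounds)
```

        A caller passes a list of romanized sounds, for example
        transcribe(["j", "ai"]); sounds outside the table raise a KeyError
        rather than being guessed.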
i i
    i                                                                                        i
    i                                                                       i
162 APPENDICES
i i
    i                                                                       i
    i                                                                      i
i i
    i                                                                      i
    i                                                                               i
164 APPENDICES
i i
    i                                                                               i
    i                                                                            i
i i
    i                                                                            i
    i                                                                               i
166 APPENDICES
                                                                                  Aı̃    Ć
         338    E4^97~                                                            < ::: £
         339    E4^87~              Oe <;4 5 / OeÍ <;4 6,7                       Aı̃  Ą
                                                                                  < ::: £
         33A    E4^87+~             Oe<;4 5,6 / OeÉï<;4 7                        Aı̃  Ą
                                                                                  < ::: £
         33B    E4^86~              3:O<!e;4 5 / Oe<;4 6                         Aı̃  Ć
                                                                                  < ::: £
         4CB    y!                    y,a          y       j^
         4CC    y~                    y< ,a        ỹ      j̃
         4CD    r                       .=,        r       õ
         4CE    l                    ,a           l       ”l
         4CF    l!                   ,a           l      ”l^
         4D0    l~                   < ,a         l̃      ”l̃
         4D1    v                      v,a         v      w
         4D2    v_                    ë+;aë ,Á     v       B
         4D3    v=                     v,a         v
         4D4    v!                     v,a         v      w^
         4D5    v~                     v<.,a       ṽ     w̃
         4D6    S                     Z,a          ś      ç
         4D7    z                     :S,a         ṣ       ù
         4D8    s                    .s,a          s       ”s
         4D9    h                      h,          h       H
         4E9    M1/8                  ><
         4EA    M1\7
         4EB    M1\6
         [Overprinted annotation text, only partly recoverable. The legible fragments name DEVANAGARI SIGN CANDRABINDU TWO, CANDRABINDU THREE, and CANDRABINDU AVAGRAHA, and VEDIC SIGN ANUSVARA ANTARGOMUKHA, BAHIRGOMUKHA, and SAJIHVA. They describe marking nasalized vowels prolonged to two or three mora and marking anusvāra after short and long vowels, with references to Figures 8E, 8F, 8G, 8H, 8I, and 8K. A NOTE considers whether several of these characters could instead be composed from a base character and COMBINING CANDRABINDU, and argues against this from the typical glyph representation of such sequences.]
                                                                                                                                  aCANDRABINDU
                                                                                                                                   aasbindu
                                                                                                                                       bindu
                                                                                                                                        this    or
                                                                                                                                               would
                                                                                                                                         spacing    acandrabindu
                                                                                                                                                  addedbindu
                                                                                                                                                         be
                                                                                                                                                      characters,ortypical
                                                                                                                                                           onthetop,
                                                                                                                                                          AVAGRAHA    candrabindu
                                                                                                                                                                           added
                                                                                                                                                                          to
                                                                                                                                                                          but  mark
                                                                                                                                                                              does glyph
                                                                                                                                                                                  one   onshort
                                                                                                                                                                                        not.
                                                                                                                                                                                          mighttop,
                                                                                                                                                                                                 Weadded  to    onthat
                                                                                                                                                                                                              mark
                                                                                                                                                                                                        anusvāra
                                                                                                                                                                                                           prefer
                                                                                                                                                                                                representation
                                                                                                                                                                                                         argue        top,
                                                                                                                                                                                                                     the      to
                                                                                                                                                                                                                        anusvāra
                                                                                                                                                                                                                         after   amark
                                                                                                                                                                                                                            unique
                                                                                                                                                                                                                           of
                                                                                                                                                                                                                           the such long
                                                                                                                                                                                                                                 last   or
                                                anusvāra
                                                nasalization.
                                                vowel.
                                               encodings.
                                               sequences   (Figure or
                                                                   (all be nasalization.
                                                                            (Figure
                                                                                8H) 8I)
                                                                              shown          here  (Figure
                                                                                                     at a14base    8J)character
                                                                                                                 points):                                                ँकËँकÈँकΩँक               . Note        how the        true
                                             2 four  could                   composed           with                                      and COMBINING CANDRABINDU
                                                                                                                                   क¬क√कƒक≈क∆क«कƒ                                              . An        argument         (which
         4EC    M1#                      घ AVAGRAHA
                                         ग
                                         ख     seems
                                                VEDICto  SIGNusconnects
                                                          SIGN           LONG
                                                                         to             with
                                                                                    ANUSVARA
                                                                             be particularly
                                                                           SAJIHVA
                                                                          BAHIRGOMUKHA           the is
                                                                                           BAHIRGOMUKHAKA    used
                                                                                                             used,
                                                                                                               while
                                                                                                          strong)      to  is
                                                                                                                        with mark
                                                                                                                           the
                                                                                                                         againstused,  athis
                                                                                                                                          long
                                                                                                                                           with
                                                                                                                                  aCANDRABINDU
                                                                                                                                      bindu     oranusvāra
                                                                                                                                               wouldacandrabindu
                                                                                                                                                       bindu
                                                                                                                                                         be  theor
                                                                                                                                                          AVAGRAHAafter
                                                                                                                                                                    typical adoes
                                                                                                                                                                                short
                                                                                                                                                                      candrabindu
                                                                                                                                                                           added   glyphon vowel.
                                                                                                                                                                                        not.   top,
                                                                                                                                                                                                 Weadded   (Figure
                                                                                                                                                                                                          to
                                                                                                                                                                                                representation  on the
                                                                                                                                                                                                              mark
                                                                                                                                                                                                           prefer        8K)
                                                                                                                                                                                                                      top,    to
                                                                                                                                                                                                                        anusvāra
                                                                                                                                                                                                                            unique
                                                                                                                                                                                                                           of  suchmark or
                                      9. Additions
                                                anusvāra  foror
                                                nasalization.               nasalization.
                                                                         Devanāgarī.
                                                                            (Figure
                                                                              shown 8I)            (Figure
                                                                                             hereThe               8J) five
                                                                                                            following                    characters are proposed         ँकËँकas    Èँकadditions
                                                                                                                                                                                            Ωँक. Notetohow         the existing
                                             2 encodings.
                                               sequences           (all                              at 14       points):          क¬क√कƒक≈क∆क«कƒ                                                                         the true
         4ED    M1#/8                    घ AVAGRAHA
                                         ग
                                      DevanāgarīNOTE:
                                                VEDIC    SIGNSeveral
                                                        block.
                                                          SIGN     connects
                                                                         LONG
                                                                           SAJIHVA  of   the
                                                                                        with
                                                                                    ANUSVARA   characters
                                                                                                 the is
                                                                                           BAHIRGOMUKHAKAused  while above
                                                                                                                       tois  mark
                                                                                                                           the     could
                                                                                                                                used,  a long be anusvāra
                                                                                                                                           with
                                                                                                                                   CANDRABINDU      considered
                                                                                                                                                    a bindu      or
                                                                                                                                                          AVAGRAHA    to
                                                                                                                                                                  after     abe
                                                                                                                                                                      candrabindu
                                                                                                                                                                              does sequences
                                                                                                                                                                                short      vowel.
                                                                                                                                                                                        not.     Weadded   of  other
                                                                                                                                                                                                           (Figure
                                                                                                                                                                                                                on the
                                                                                                                                                                                                           prefer     top,characters.
                                                                                                                                                                                                                         8K)  to mark
                                                                                                                                                                                                                            unique
                                                The glyphs
                                      9. Additions
                                                anusvāra
                                               encodings.  fororDevanāgarī.for     the CANDRABINDU
                                                                            nasalization.          (Figure
                                                                                                  The              8J)
                                                                                                            following characters           are all aligned
                                                                                                                                 five characters                    equally,asat additions
                                                                                                                                                          are proposed                     the heightto of         thethe    headline
                                                                                                                                                                                                                          existing
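The contrast drawn in the note on the CANDRABINDU characters, between a uniquely encoded character and a sequence of a base character plus a combining candrabindu, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the character names are those of the published Unicode character database (DEVANAGARI SIGN CANDRABINDU AVAGRAHA, DEVANAGARI SIGN AVAGRAHA, DEVANAGARI SIGN CANDRABINDU), which are not taken from the text above and may differ from the names used in this proposal, and the helper functions `describe` and `canonically_equivalent` are hypothetical names introduced here.

```python
import unicodedata

# ASSUMPTION: these names come from the published Unicode character database,
# not from the proposal text; lookup() raises KeyError if a name is unknown.
atomic = unicodedata.lookup("DEVANAGARI SIGN CANDRABINDU AVAGRAHA")
avagraha = unicodedata.lookup("DEVANAGARI SIGN AVAGRAHA")
candrabindu = unicodedata.lookup("DEVANAGARI SIGN CANDRABINDU")

# The alternative discussed in the note: a base character followed by
# a combining candrabindu.
sequence = avagraha + candrabindu


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def canonically_equivalent(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(describe(atomic))
    print(describe(sequence))
    # The two spellings are distinct text: normalization does not map
    # one onto the other, so software compares them as different strings.
    print(canonically_equivalent(atomic, sequence))
```

Because the uniquely encoded character and the two-character sequence are not canonically equivalent, a font can give each its own glyph, which is the rendering distinction the note relies on.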
9. Additions for Devanāgarī. The following five characters are proposed as additions to the existing Devanāgarī block.

DEVANAGARI VOWEL SIGN CANDRA LONG E is used in Devanagari transcriptions of Avestan to mark the long schwa ə̄. (DEVANAGARI VOWEL SIGN CANDRA E is used to mark the regular schwa ə.) (Figure 9B)

DEVANAGARI SIGN PUSHPIKA is used as a placeholder or “filler”, often flanked
                                                                                                                                                                             ∆ «sequences
                                                                                                                                                                                       There             by
                                                                                                                                                                                               of .Avestan
                                                                                                                                                                                                       isAnof double
                                                                                                                                                                                                           no  argument
                                                                                                                                                                                                                question
                                                                                                                                                                                                               otherto mark dandas
                                                                                                                                                                                                                                (which
                                                                                                                                                                                                                               that
                                                                                                                                                                                                                                  thethe
                                                                                                                                                                                                                          characters.
                                               (Figure
                                                seems
                                                first
                                               long
                                                The    two9C)of
                                                          to
                                                      schwa
                                                       glyphs      usb̄.these
                                                                           to   bethe particularly
                                                                                      here
                                                                            (DEVANAGARI
                                                                           for                 need VOWEL
                                                                                          CANDRABINDU     tostrong)
                                                                                                               beSIGNencoded against
                                                                                                                      characters
                                                                                                                             CANDRA    asthis
                                                                                                                                            spacing
                                                                                                                                            E isall
                                                                                                                                           are     would
                                                                                                                                                   used   tobemark
                                                                                                                                                       aligned   the
                                                                                                                                                         characters,    typical
                                                                                                                                                                        thebut
                                                                                                                                                                    equally,            glyph
                                                                                                                                                                                      one
                                                                                                                                                                                regular
                                                                                                                                                                                      at        might
                                                                                                                                                                                           theschwa    representation
                                                                                                                                                                                                     height  argue
                                                                                                                                                                                                             b.)of             of
                                                                                                                                                                                                                                9B)such
                                                                                                                                                                                                                        thatheadline
                                                                                                                                                                                                                  (Figure
                                                                                                                                                                                                                       the     the    last
         4F0    M2                    >
                                      @’᪓᪔ 3 DEVANAGARI
                                                sequences
                                                four
                                                (and   could
                                                       not above
                                               DEVANAGARI                (all
                                                                         be
                                                                          CARET
                                                                           SIGN
                                                                          VOWEL  shown
                                                                              composed is   used
                                                                                  it)PUSHPIKA
                                                                                       as
                                                                                       SIGN    here
                                                                                             shown  toisat
                                                                                                  with
                                                                                                CANDRA     mark
                                                                                                             a14
                                                                                                             used
                                                                                                          next   base
                                                                                                                LONG  the
                                                                                                                   topoints):
                                                                                                                       asthe ainsertion
                                                                                                                                 placeholder
                                                                                                                            character
                                                                                                                            E  iska   as
                                                                                                                                   used        point
                                                                                                                                            inand
                                                                                                                                           follows:   orof
                                                                                                                                                Devanagari
                                                                                                                                      क¬क√कƒक≈क∆क«कƒ        omitted
                                                                                                                                                         क“filler”,        text
                                                                                                                                                                             ∆ँक«Ëand
                                                                                                                                                                        ≈often
                                                                                                                                                             ¬ √ transcriptions
                                                                                                                                                     COMBINING      ƒ CANDRABINDU   .ँक  Èto
                                                                                                                                                                                      flanked
                                                                                                                                                                                       There ँकofΩmark   . no
                                                                                                                                                                                                         by
                                                                                                                                                                                                      .Avestan
                                                                                                                                                                                                     ँकisAn    word
                                                                                                                                                                                                            Note
                                                                                                                                                                                                              double how
                                                                                                                                                                                                               argument
                                                                                                                                                                                                                question
                                                                                                                                                                                                                    to   division.
                                                                                                                                                                                                                         mark the
                                                                                                                                                                                                                            dandasthetrue
                                                                                                                                                                                                                                (which
                                                                                                                                                                                                                               that    the
                                               The  divider
                                               (Figure
                                                seems
                                                first
                                               long    two9C)of
                                                          to
                                                      schwa
                                                AVAGRAHA                 sign
                                                                   usb̄.connects
                                                                           to
                                                                           thesebehas here
                                                                            (DEVANAGARI   awith
                                                                                              distinctive
                                                                                      particularly
                                                                                               needthe
                                                                                                    VOWEL toKA    shape
                                                                                                               bewhile
                                                                                                             strong) encoded
                                                                                                                  SIGN         withCANDRABINDU
                                                                                                                               the
                                                                                                                             against
                                                                                                                             CANDRA    asathis
                                                                                                                                            thin
                                                                                                                                            spacing descending
                                                                                                                                            E iswould
                                                                                                                                                   used   tobe   thediagonal
                                                                                                                                                         characters,
                                                                                                                                                              mark
                                                                                                                                                             AVAGRAHA   typical
                                                                                                                                                                        thebut    does
                                                                                                                                                                                regular  and
                                                                                                                                                                                        glyph
                                                                                                                                                                                      one    not. thick
                                                                                                                                                                                                might
                                                                                                                                                                                                schwa         rising
                                                                                                                                                                                                        Weargueprefer
                                                                                                                                                                                                       representation
                                                                                                                                                                                                             b.)         diagonal
                                                                                                                                                                                                                          the of
                                                                                                                                                                                                                        that
                                                                                                                                                                                                                  (Figure       unique
                                                                                                                                                                                                                               the
                                                                                                                                                                                                                                9B) such
                                                                                                                                                                                                                                      last
         4F1    M2/8                  >      3 that
                                      ᪓᪔ DEVANAGARI
                                                four
                                               the
                                                    distinguish
                                                encodings.
                                                sequences
                                                       could (all
                                                   point:     Á᪔ uswhich
                                                                         be     itshown
                                                                                    from
                                                                              composed
                                                                          CARET
                                                                           SIGN        is used
                                                                                     PUSHPIKA the
                                                                                               here generic
                                                                                                    to
                                                                                                  withisatmark
                                                                                                             a14base
                                                                                                             used    caret
                                                                                                                      the
                                                                                                                     points):   U+2038.
                                                                                                                             ainsertion
                                                                                                                       as character
                                                                                                                                 placeholder  andIt is
                                                                                                                                               point
                                                                                                                                      क¬क√कƒक≈क∆क«कƒ    a “filler”,
                                                                                                                                                      orofzero-width
                                                                                                                                                            omittedCANDRABINDU
                                                                                                                                                     COMBINING                ँकspacing
                                                                                                                                                                           text
                                                                                                                                                                           often  Ëand   Èto
                                                                                                                                                                                      flanked
                                                                                                                                                                                     ँक      ँकcharacter
                                                                                                                                                                                                  Ωmark  . Note
                                                                                                                                                                                                         by
                                                                                                                                                                                                      . An
                                                                                                                                                                                                     ँक        wordcentered
                                                                                                                                                                                                              double how
                                                                                                                                                                                                               argument  division.
                                                                                                                                                                                                                              the
                                                                                                                                                                                                                            dandas ontrue
                                                                                                                                                                                                                                (which
                                               The  divider
                                               (Figure
                                                seems
                                                AVAGRAHA  9C)
                                                          to               to beis
                                                                         sign
                                                                         connects  hasusedawithbetween
                                                                                              distinctive
                                                                                      particularly the KA       orthographic
                                                                                                                  shapeagainst
                                                                                                                 while
                                                                                                             strong)           the       asyllables:
                                                                                                                               withCANDRABINDU
        [...] (all shown here following KA at 14 points) [...] Note how the
        true [...] connects with the KA while the [...] CANDRABINDU [...]
        AVAGRAHA [...] does not. We prefer the unique representation of such
        sequences [...] encodings.

        9. Additions for Devanāgarī. The following five characters are proposed
        as additions to the existing Devanāgarī block.

            DEVANAGARI VOWEL SIGN CANDRA LONG E is used in Devanagari
            transcriptions of Avestan to mark the long schwa ə̄. (DEVANAGARI
            VOWEL SIGN CANDRA E is used to mark the regular schwa ə.)
            (Figure 9B)

            DEVANAGARI SIGN PUSHPIKA is used as a placeholder or “filler”,
            often flanked by double dandas. (Figure 9C)

            DEVANAGARI CARET is used to mark the insertion point of omitted
            text and to mark word division. The divider sign has a distinctive
            shape with a thin descending diagonal and thick rising diagonal
            that distinguish it from the generic caret U+2038. It is a
            zero-width spacing character centered on the point: [glyph], which
            is used between orthographic syllables: [glyph] koko. (Figure 9D)

            DEVANAGARI LETTER ZHA is used in Devanagari transcriptions of
            Avestan to mark the voiced palatal fricative [ʒ]. (Figure 9E)
Appendix D
        one through three of the header show higher nodes in Halle’s feature tree
        as shown in Table 12. ‘GUTTRL’ stands for GUTTURAL, ‘SPal’ and
        ‘spal’ for soft palate, and ‘tblade’ for tongue blade. The abbreviations
        shown in columns 3-21 in row four of the table header are given in the
        following table:
                G     glottal
                Sp    spread glottis
                St    stiff vocal folds
                Sl    slack vocal folds
                R     rhinal
                N     nasal
                Dr    dorsal
                B     back
                H     high
                L     low
                Cr    coronal
                A     anterior
                Dt    distributed
                Lb    labial
                Rd    round
                Cn    consonantal
                Sn    sonorant
                Ct    continuant
                Lt    lateral
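
        The rows of the feature table that follows suggest a regular pattern
        for the SLP1 modifiers and the SLP2 codes of vowels. The sketch below
        illustrates that pattern only. It assumes, from the rows shown, that
        the second, third and fourth values in each row fall under St, Sl and
        N, and that the low three bits of the hexadecimal SLP2 code carry the
        same information (1 for '~', 2 for '/', 4 for '\', 6 for '^'). The
        function names are hypothetical and are not part of any Sanskrit
        Library software.

```python
# Illustrative sketch only. The column assignment (St, Sl, N) and the
# bit layout of SLP2 codes are inferred from the table rows shown here,
# not taken from a stated specification.

# Features assumed to be set to '+' by each trailing SLP1 modifier.
MODIFIER_FEATURES = {
    "/": {"St"},         # assumed: stiff vocal folds
    "\\": {"Sl"},        # assumed: slack vocal folds
    "^": {"St", "Sl"},   # assumed: both stiff and slack
    "~": {"N"},          # assumed: nasal
}


def split_slp1(token):
    """Split an SLP1 vowel token into (base, trailing modifiers)."""
    base = token.rstrip("/\\^~")
    return base, token[len(base):]


def modifier_features(token):
    """Return the St, Sl and N values implied by the SLP1 modifiers."""
    _, modifiers = split_slp1(token)
    plus = set()
    for ch in modifiers:
        plus |= MODIFIER_FEATURES[ch]
    return {name: name in plus for name in ("St", "Sl", "N")}


def features_from_slp2(code):
    """Read the same three values from the low bits of a hex SLP2 code."""
    low = int(code, 16) & 0b111
    return {"St": bool(low & 2), "Sl": bool(low & 4), "N": bool(low & 1)}
```

        Under these assumptions, modifier_features("a3\\~") and
        features_from_slp2("04D") give the same result, matching row 22 of
        the table, where only the third and fourth of the first four values
        are marked +.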
                            GUTTRL                    PLACE
                             Larynx SPal       Dorsal    Coronal Labial
                              glottis spal tongue body    tblade lips
        SLP3 SLP2 SLP1      G Sp St Sl R N Dl B H L Cr A Dt Lb Rd Cn Sn Ct Lt
           1   000   a        − − − − + + − +                           −
           2   001   a~       − − − + + + − +                           −
           3   002   a/       − + − − + + − +                           −
           4   003   a/~      − + − + + + − +                           −
           5   004   a\       − − + − + + − +                           −
           6   005   a\~      − − + + + + − +                           −
           7   006   a^       − + + − + + − +                           −
           8   007   a^~      − + + + + + − +                           −
           9   030   A        − − − − + + − +                           −
          10   031   A~       − − − + + + − +                           −
          11   032   A/       − + − − + + − +                           −
          12   033   A/~      − + − + + + − +                           −
          13   034   A\       − − + − + + − +                           −
          14   035   A\~      − − + + + + − +                           −
          15   036   A^       − + + − + + − +                           −
          16   037   A^~      − + + + + + − +                           −
          17   048   a3       − − − − + + − +                           −
          18   049   a3~      − − − + + + − +                           −
          19   04A   a3/      − + − − + + − +                           −
          20   04B   a3/~     − + − + + + − +                           −
          21   04C   a3\      − − + − + + − +                           −
          22   04D   a3\~     − − + + + + − +                           −
          23   04E   a3^      − + + − + + − +                           −
          24   04F   a3^~     − + + + + + − +                           −
          25   080   i        − − − − + − + −                           −
          26   081   i~       − − − + + − + −                           −
          27   082   i/       − + − − + − + −                           −
          28   083   i/~      − + − + + − + −                           −
          29   084   i\       − − + − + − + −                           −
          30   085   i\~      − − + + + − + −                           −
          31   086   i^       − + + − + − + −                           −
          32   087   i^~      − + + + + − + −                           −
          33   0B0   I        − − − − + − + −                           −
                       GUTTRL                    PLACE
                        Larynx SPal       Dorsal    Coronal   Labial
                         glottis spal tongue body    tblade    lips
        SLP3 SLP2 SLP1 G Sp St Sl R N Dl B H L Cr A Dt        Lb Rd Cn Sn Ct Lt
          34 0B1 I~      − − − + + − + −                             −
          35 0B2 I/      − + − − + − + −                             −
          36 0B3 I/~     − + − + + − + −                             −
          37 0B4 I\      − − + − + − + −                             −
          38 0B5 I\~     − − + + + − + −                             −
          39 0B6 I^      − + + − + − + −                             −
          40 0B7 I^~     − + + + + − + −                             −
          41 0C8 i3      − − − − + − + −                             −
          42 0C9 i3~     − − − + + − + −                             −
          43 0CA i3/     − + − − + − + −                             −
          44 0CB i3/~    − + − + + − + −                             −
          45 0CC i3\     − − + − + − + −                             −
          46 0CD i3\~    − − + + + − + −                             −
          47 0CE i3^     − + + − + − + −                             −
          48 0CF i3^~    − + + + + − + −                             −
          49 100 u       − − − − + + + −                      + + −
          50 101 u~      − − − + + + + −                      + + −
          51 102 u/      − + − − + + + −                      + + −
          52 103 u/~     − + − + + + + −                      + + −
          53 104 u\      − − + − + + + −                      + + −
          54 105 u\~     − − + + + + + −                      + + −
          55 106 u^      − + + − + + + −                      + + −
          56 107 u^~     − + + + + + + −                      + + −
          57 130 U       − − − − + + + −                      + + −
          58 131 U~      − − − + + + + −                      + + −
          59 132 U/      − + − − + + + −                      + + −
          60 133 U/~     − + − + + + + −                      + + −
          61 134 U\      − − + − + + + −                      + + −
          62 135 U\~     − − + + + + + −                      + + −
          63 136 U^      − + + − + + + −                      + + −
          64 137 U^~     − + + + + + + −                      + + −
          65 148 u3      − − − − + + + −                      + + −
          66 149 u3~     − − − + + + + −                      + + −
                            GUTTRL                    PLACE
                             Larynx SPal       Dorsal    Coronal   Labial
                              glottis spal tongue body    tblade    lips
        SLP3 SLP2 SLP1      G Sp St Sl R N Dl B H L Cr A Dt        Lb Rd    Cn Sn Ct Lt
          67   14A   u3/      − + − − + + + −                      + +      −
          68   14B   u3/~     − + − + + + + −                      + +      −
          69   14C   u3\      − − + − + + + −                      + +      −
          70   14D   u3\~     − − + + + + + −                      + +      −
          71   14E   u3^      − + + − + + + −                      + +      −
          72   14F   u3^~     − + + + + + + −                      + +      −
          73   180   f        − − − −                                       −
          74   181   f~       − − − +                                       −
          75   182   f/       − + − −                                       −
          76   183   f/~      − + − +                                       −
          77   184   f\       − − + −                                       −
          78   185   f\~      − − + +                                       −
          79   186   f^       − + + −                                       −
          80   187   f^~      − + + +                                       −
          81   1B0   F        − − − −                                       −
          82   1B1   F~       − − − +                                       −
          83   1B2   F/       − + − −                                       −
          84   1B3   F/~      − + − +                                       −
          85   1B4   F\       − − + −                                       −
          86   1B5   F\~      − − + +                                       −
          87   1B6   F^       − + + −                                       −
          88   1B7   F^~      − + + +                                       −
          89   1C8   f3       − − − −                                       −
          90   1C9   f3~      − − − +                                       −
          91   1CA   f3/      − + − −                                       −
          92   1CB   f3/~     − + − +                                       −
          93   1CC   f3\      − − + −                                       −
          94   1CD   f3\~     − − + +                                       −
          95   1CE   f3^      − + + −                                       −
          96   1CF   f3^~     − + + +                                       −
          97   200   x        − − − −                                       −
          98   201   x~       − − − +                                       −
          99   202   x/       − + − −                                       −
                            GUTTRL                    PLACE
                             Larynx SPal       Dorsal    Coronal Labial
                              glottis spal tongue body    tblade lips
        SLP3 SLP2 SLP1      G Sp St Sl R N Dl B H L Cr A Dt Lb Rd Cn Sn Ct Lt
         100   203   x/~      − + − +                                   −
         101   204   x\       − − + −                                   −
         102   205   x\~      − − + +                                   −
         103   206   x^       − + + −                                   −
         104   207   x^~      − + + +                                   −
         105   230   X        − − − −                                   −
         106   231   X~       − − − +                                   −
         107   232   X/       − + − −                                   −
         108   233   X/~      − + − +                                   −
         109   234   X\       − − + −                                   −
         110   235   X\~      − − + +                                   −
         111   236   X^       − + + −                                   −
         112   237   X^~      − + + +                                   −
         113   248   x3       − − − −                                   −
         114   249   x3~      − − − +                                   −
         115   24A   x3/      − + − −                                   −
         116   24B   x3/~     − + − +                                   −
         117   24C   x3\      − − + −                                   −
         118   24D   x3\~     − − + +                                   −
         119   24E   x3^      − + + −                                   −
         120   24F   x3^~     − + + +                                   −
         121   280   e1       − − − − + − − −                           −
         122   281   e1~      − − − + + − − −                           −
         123   282   e1/      − + − − + − − −                           −
         124   283   e1/~     − + − + + − − −                           −
         125   284   e1\      − − + − + − − −                           −
         126   285   e1\~     − − + + + − − −                           −
         127   286   e1^      − + + − + − − −                           −
         128   287   e1^~     − + + + + − − −                           −
         129   298   e        − − − − + − − −                           −
         130   299   e~       − − − + + − − −                           −
         131   29A   e/       − + − − + − − −                           −
         132   29B   e/~      − + − + + − − −                           −
                            GUTTRL                    PLACE
                             Larynx SPal       Dorsal    Coronal   Labial
                              glottis spal tongue body    tblade    lips
        SLP3 SLP2 SLP1      G Sp St Sl R N Dl B H L Cr A Dt        Lb Rd Cn Sn Ct Lt
         133   29C   e\       − − + − + − − −                             −
         134   29D   e\~      − − + + + − − −                             −
         135   29E   e^       − + + − + − − −                             −
         136   29F   e^~      − + + + + − − −                             −
         137   2B0   e3       − − − − + − − −                             −
         138   2B1   e3~      − − − + + − − −                             −
         139   2B2   e3/      − + − − + − − −                             −
         140   2B3   e3/~     − + − + + − − −                             −
         141   2B4   e3\      − − + − + − − −                             −
         142   2B5   e3\~     − − + + + − − −                             −
         143   2B6   e3^      − + + − + − − −                             −
         144   2B7   e3^~     − + + + + − − −                             −
         145   300   E        − − − − + +/− −/+ +/−                       −
         146   301   E~       − − − + + +/− −/+ +/−                       −
         147   302   E/       − + − − + +/− −/+ +/−                       −
         148   303   E/~      − + − + + +/− −/+ +/−                       −
         149   304   E\       − − + − + +/− −/+ +/−                       −
         150   305   E\~      − − + + + +/− −/+ +/−                       −
         151   306   E^       − + + − + +/− −/+ +/−                       −
         152   307   E^~      − + + + + +/− −/+ +/−                       −
         153   318   E3       − − − − + +/− −/+ +/−                       −
         154   319   E3~      − − − + + +/− −/+ +/−                       −
         155   31A   E3/      − + − − + +/− −/+ +/−                       −
         156   31B   E3/~     − + − + + +/− −/+ +/−                       −
         157   31C   E3\      − − + − + +/− −/+ +/−                       −
         158   31D   E3\~     − − + + + +/− −/+ +/−                       −
         159   31E   E3^      − + + − + +/− −/+ +/−                       −
         160   31F   E3^~     − + + + + +/− −/+ +/−                       −
         161   380   o1       − − − − + + − −                      + + −
         162   381   o1~      − − − + + + − −                      + + −
         163   382   o1/      − + − − + + − −                      + + −
         164   383   o1/~     − + − + + + − −                      + + −
         165   384   o1\      − − + − + + − −                      + + −
                            GUTTRL                    PLACE
                             Larynx SPal       Dorsal    Coronal   Labial
                              glottis spal tongue body    tblade    lips
        SLP3 SLP2 SLP1      G Sp St Sl R N Dl B H L Cr A Dt        Lb Rd    Cn Sn Ct Lt
         166   385   o1\~     − − + + + + − −                      + +      −
         167   386   o1^      − + + − + + − −                      + +      −
         168   387   o1^~     − + + + + + − −                      + +      −
         169   398   o        − − − − + + − −                      + +      −
         170   399   o~       − − − + + + − −                      + +      −
         171   39A   o/       − + − − + + − −                      + +      −
         172   39B   o/~      − + − + + + − −                      + +      −
         173   39C   o\       − − + − + + − −                      + +      −
         174   39D   o\~      − − + + + + − −                      + +      −
         175   39E   o^       − + + − + + − −                      + +      −
         176   39F   o^~      − + + + + + − −                      + +      −
         177   3B0   o3       − − − − + + − −                      + +      −
         178   3B1   o3~      − − − + + + − −                      + +      −
         179   3B2   o3/      − + − − + + − −                      + +      −
         180   3B3   o3/~     − + − + + + − −                      + +      −
         181   3B4   o3\      − − + − + + − −                      + +      −
         182   3B5   o3\~     − − + + + + − −                      + +      −
         183   3B6   o3^      − + + − + + − −                      + +      −
         184   3B7   o3^~     − + + + + + − −                      + +      −
         185   400   O        − − − − + +/+ −/+ +/−                + +      −
         186   401   O~       − − − + + +/+ −/+ +/−                + +      −
         187   402   O/       − + − − + +/+ −/+ +/−                + +      −
         188   403   O/~      − + − + + +/+ −/+ +/−                + +      −
         189   404   O\       − − + − + +/+ −/+ +/−                + +      −
         190   405   O\~      − − + + + +/+ −/+ +/−                + +      −
         191   406   O^       − + + − + +/+ −/+ +/−                + +      −
         192   407   O^~      − + + + + +/+ −/+ +/−                + +      −
         193   418   O3       − − − − + +/+ −/+ +/−                + +      −
         194   419   O3~      − − − + + +/+ −/+ +/−                + +      −
         195   41A   O3/      − + − − + +/+ −/+ +/−                + +      −
         196   41B   O3/~     − + − + + +/+ −/+ +/−                + +      −
         197   41C   O3\      − − + − + +/+ −/+ +/−                + +      −
         198   41D   O3\~     − − + + + +/+ −/+ +/−                + +      −
                       GUTTRL                    PLACE
                        Larynx SPal       Dorsal    Coronal   Labial
                         glottis spal tongue body    tblade    lips
        SLP3 SLP2 SLP1 G Sp St Sl R N Dl B H L Cr A Dt        Lb Rd Cn   Sn Ct Lt
         199 41E O3^     − + + − + +/+ −/+ +/−                + + −
         200 41F O3^~ − + + + + +/+ −/+ +/−                   + + −
         201 480 k       − + − − +                                   +   −   −
         202 483 K       + + − − +                                   +   −   −
         203 486 g       − − + − +                                   +   −   −
         204 489 G       + − + − +                                   +   −   −
         205 48C N       − − + + +                                   +   +       −
         206 48E c       − + − −                    + −+             +   −   −
         207 491 C       + + − −                    + −+             +   −   −
         208 494 j       − − + −                    + −+             +   −   −
         209 497 J       + − + −                    + −+             +   −   −
         210 49A Y       − − + +                    + −+             +   +       −
         211 49C w       − + − −                    + −−             +   −   −
         212 49F W       + + − −                    + −−             +   −   −
         213 4A2 q       − − + −                    + −−             +   −   −
         214 4A5 L       − − + −                    + −−             +   +       +
         215 4A6 Q       + − + −                    + −−             +   −   −
         216 4A9 |       + − + −                    + −−             +   +       +
         217 4AA R       − − + +                    + −−             +   +       −
         218 4AC t       − + − −                    + +              +   −   −
         219 4AF T       + + − −                    + +              +   −   −
         220 4B2 d       − − + −                    + +              +   −   −
         221 4B5 D       + − + −                    + +              +   −   −
         222 4B8 n       − − + +                    + +              +   +       −
         223 4BA p       − + − −                              + − +      −   −
         224 4BD P       + + − −                              + − +      −   −
         225 4C0 b       − − + −                              + − +      −   −
         226 4C3 B       + − + −                              + − +      −   −
         227 4C6 m       − − + +                              + − +      +       −
         228 4C8 y       − − + −                       −+            −
         229 4CC y~      − − + +                       −+            −
         230 4CD r       − − + −                    + −−             +   +       −
         231 4CE l       − − + −                    + +              +   +       +
                       GUTTRL                      PLACE
                        Larynx      SPal    Dorsal    Coronal   Labial
                         glottis    spal tongue body   tblade    lips
        SLP3 SLP2 SLP1 G Sp St Sl   R N Dl B H L Cr A Dt        Lb Rd Cn   Sn Ct Lt
         232 4D0 l~      − − +        +               + +              +   +     +
         233 4D1 v       − − +        −                         + + −
         234 4D5 v~      − − +        +                         + + −
         235 4D6 S       + + −        −               + −+             +   − +
         236 4D7 z       + + −        −               + −−             +   − +
         237 4D8 s       + + −        −               + +              +   − +
         238 4D9 h     ++ − +         −                                −
         239 4DB H     ++ + −         −                                −
         240 4E1 Z       + + −        − +                              +   − +
         241 4E2 V       + + −        −                         + − +      − +
         242 4E3 M       + − +      ++                                 −
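
        The vowel rows of the table follow a regular pattern: within each
        vowel, nasalization (~) raises the SLP2 code by 1, and the accent
        marks /, \ and ^ raise it by 2, 4 and 6 respectively. The sketch
        below is only an illustration of that observed pattern, not a
        procedure given in the text. The names VOWEL_BASE, ACCENT_OFFSET,
        vowel_slp2 and format_slp2 are hypothetical, and the base codes are
        copied from the unmarked vowel rows above.

```python
# Illustrative sketch only: the helper names are hypothetical, and the offset
# rule is inferred from the vowel rows of the table, not stated by the authors.

# SLP2 codes of the unmarked (unaccented, non-nasalized) vowels, from the table.
VOWEL_BASE = {
    "a": 0x000, "A": 0x030, "a3": 0x048,
    "i": 0x080, "I": 0x0B0, "i3": 0x0C8,
    "u": 0x100, "U": 0x130, "u3": 0x148,
    "f": 0x180, "F": 0x1B0, "f3": 0x1C8,
    "x": 0x200, "X": 0x230, "x3": 0x248,
    "e1": 0x280, "e": 0x298, "e3": 0x2B0,
    "E": 0x300, "E3": 0x318,
    "o1": 0x380, "o": 0x398, "o3": 0x3B0,
    "O": 0x400, "O3": 0x418,
}

# Offsets observed in the table for the SLP1 accent marks.
ACCENT_OFFSET = {"": 0, "/": 2, "\\": 4, "^": 6}
NASAL_OFFSET = 1  # offset observed for the trailing "~"


def vowel_slp2(slp1: str) -> int:
    """Return the SLP2 code of an SLP1 vowel such as 'a', 'i3/' or 'O3\\~'."""
    body = slp1
    nasal = body.endswith("~")
    if nasal:
        body = body[:-1]
    accent = body[-1] if body and body[-1] in "/\\^" else ""
    if accent:
        body = body[:-1]
    if body not in VOWEL_BASE:
        raise ValueError(f"not a vowel listed in the table: {slp1!r}")
    return VOWEL_BASE[body] + ACCENT_OFFSET[accent] + (NASAL_OFFSET if nasal else 0)


def format_slp2(code: int) -> str:
    """Format a code as the three-digit uppercase hexadecimal used in the table."""
    return f"{code:03X}"


if __name__ == "__main__":
    for symbol in ["a", "a^~", "i3/", "e\\", "O3\\~"]:
        print(symbol, format_slp2(vowel_slp2(symbol)))
```

        The rule is deliberately limited to vowels: the consonant rows
        (for example y and y~) do not follow the same offsets, so the
        sketch rejects them rather than guessing.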
Appendix E
        Malcolm D. Hyman
        12 November 1970 – 2 September 2009
        where she had been living permanently for several years. The atmosphere
        seemed congenial for the raising of a fatherless child, and we cooperated
        in his upbringing as long as he lived at home. Malcolm was rather shy
        as a child, though in college he became much more extroverted. From
        the beginning, he was deeply compassionate, with a respect for all living
        things. One could not so much as swat a mosquito in his presence. He
        was also a natural leader, one whose influence sprang from his generos-
        ity toward others and his having thought through what he had to say. All
        his life, Malcolm was his ideas, which, in turn, were imbued with his
        passionate convictions about the moral worth of creation.
             Malcolm began talking late – he spoke a private language until kinder-
        garten (this tendency ran in the male line in my family) which, in his case,
        may have influenced his fascination with grammar, syntax and language
        in general. Within a few months of entering first grade, he was read-
        ing on an adult level. One day, he mentioned the schwa to me. When
        I expressed ignorance, he asked with genuine surprise, “Don’t you pay
        attention to the diacritical marks in the dictionary?” In second grade, he
        brought home a book from the music library and, in a weekend, taught
        himself to read both treble and bass clef, not to mention alto and tenor.
        Shortly thereafter, he asked to take piano lessons, which he continued
        throughout high school. His musical gifts were significant enough to
        contemplate a career as a pianist or composer. By this time, he was
        spending every weekend in a city eighty miles from our home, working
        with a piano coach and studying harmony and composition.
             He also brought home grammar textbooks throughout school, poring
        over them and sometimes pointing out mistakes in the authors’ reasoning.
        At eleven, having never laid hands on a computer, he bought two
        books on programming. The following Monday, he walked into the high
        school computer lab and asked the teacher if he might try something. He
        then wrote a program that surpassed the teacher’s skills. That summer,
        he enrolled in a graduate-level computer course at the University of
        Wisconsin at La Crosse. A year later, he began to attend the summer sessions
        in computing at Michigan Tech, where professors often asked him to as-
        sist in teaching the other students. At twelve, he was running his own
        software company, creating programs for local businesses.
             Wisconsin devotes many of its resources to education from the pri-
        mary grades through its excellent state university system, and the Three
        and maintained the highly sophisticated web site for the Brown Classics
        Department, and picked up design tips from the Rhode Island School of
        Design, next door. Some intellects keep their various interests in sep-
        arate compartments. Malcolm, however, saw the relationships between
        different disciplines, and his vast knowledge in one area illuminated his
        understanding of others. His thirst for knowledge continued to expand as
        long as he lived. He read voraciously in literature, philosophy, psychol-
        ogy, history and the sciences. Popular culture was a longtime fascination
        for him, and music remained close to his heart.
             Malcolm’s dissertation concerned the way Latin grammarians treated
        barbarisms and solecisms. His first significant published paper, “Bad
        Grammar in Context,” defined his philosophy in this regard:
             Let me conclude by sketching the “big picture,” as I see it.
             Language is constitutive of institutions – such as religion and
             law – that serve to produce social cohesion. Given that spoken
             and written language are the media par excellence for
             communication, the importance of linguistic norms in maintaining
             group identity should be evident. Conservatism in language
             preserves the social status quo. But society is not a static
             entity; it must adjust continually to changing situations and
             modes of living. Revolutionary movements (such as Stoicism and
             early Christianity) aim toward an upheaval of traditional
             institutions; and so it is not surprising to see their
             depreciation of the prescriptive stance of the grammarians.
             These two linguistic attitudes – the prescriptive and the
             anti-prescriptive – exist in a dynamic that shapes, at any
             historical moment, the form of cultural life.
        Malcolm’s own sympathies were generally against the prescriptivists,
        especially when their purpose was to define class barriers.
        Having grown up among people who worked with their hands, he fought
        the kind of genteel grammatical notions that look down on what they
        perceive as uneducated speech.
             At the same time, he was intrigued with the ways in which culture
        is transmitted in writing, even when the writing does not communicate
        in conventional ways. His paper, “Of Glyphs and Glottography,” written
        during his productive years at the Max Planck Institute for the History of
        Science in Berlin, examines proto-writing and the connection between
        written and spoken language. It begins with a typically whimsical epi-
        graph, from Popeye the Sailor Man: “This writin’ is wroten rotten, if
        you happen to ask me,” and observes, It is ... evident that writing that
        Information Science.
            It would seem that already Malcolm had set the course of a produc-
        tive and happy life. He rejoiced in his many friends and colleagues who
        helped sustain his own productivity and with whom he was invariably
        and selflessly generous. In 2006, he married Dr. Ludmila Selemeneva,
        a Russian specialist in rhetoric, and on December 2, 2008, their son,
        Stanley William Hyman, was born in Berlin. Malcolm was looking forward
        to dividing his time between there and Providence, where his work
        at Brown had expanded. Yet he had always lived with such intensity
        and drive that periodically he succumbed to the temptation to push him-
        self too far. Unfortunately, for some years he had also suffered from
        a complex of physical diseases which may well have had one underly-
        ing though undiagnosed root. Because he was such a selfless and warm
        person, convivial and happy to immerse himself in the fellowship of oth-
        ers, only those closest to him were aware that he suffered from several
        escalating and life-threatening conditions. His sudden death on Septem-
        ber 4, 2009, left all who knew him bereft. It also deprived scholarship
        of the significant contributions he would surely have made had he lived
        longer. Fortunately, that aspect of him lives on in the continuing work of
        all whom he influenced, and not least in this book.
        EDUCATION
        Ph. D. in Classics, May 2002
        Brown University, Providence, RI
        Thesis: “Barbarism and Solecism in Ancient Grammatical Thought”
        Advisor: William F. Wyatt
        RESEARCH INTERESTS
        cognitive aspects of writing; ancient literacy
        linguistic and scholarly computing
        technical terminology and scientific concepts
        Graeco-Roman language science
        POSITIONS
        Visiting Scholar, Department of Classics, Brown University, Providence,
               RI (2006–2009)
        Wissenschaftlicher Mitarbeiter, Max-Planck-Institut für Wissenschafts-
             geschichte, Berlin, Germany (2004–2009)
        Research Fellow, Department of Classics, Harvard University (2001–
             2004)
        GRANTS
        Co-Principal Investigator, “Enhancing Access to Primary Cultural Her-
             itage Materials of India,” National Endowment for the Humanities
             ($301,540, 12 months, starting July 2009) (with PI: P. Scharf)
        PUBLICATIONS
        “On the Tip of the Ancient Tongue: Failures of Lexical Access in Greek
              and Latin” (co-author: P. Thibodeau), in progress (2009)
        “Studies in Cacemphaton” (co-author: P. Thibodeau), in progress (2009)
        “Chomsky between Revolutions,” in Chomsky’s Revolutions, ed. D. Kib-
             bee (forthcoming, 2009)
        Linguistic Issues in Encoding Sanskrit (co-author: P. Scharf), Motilal
              Banarsidass (forthcoming, 2009)
        “Toward an Epistemic Web” (co-author: J. Renn), in Globalization of
             Knowledge and its Consequences, ed. J. Renn (forthcoming, 2009)
        “Euclid and Beyond: Towards a Long-term History of Deductivity” (co-
              author: M. Schiefsky), Künstliche Intelligenz 4/09 (2009)
        CONFERENCE ORGANIZATION
        “Multilingualism, Linguae Francae, and the Global History of Religious
             and Scientific Concepts” (with J. Braarvig), The Norwegian Insti-
             tute at Athens, Greece, April 3–5, 2009
        “Viva Voce: Echoes of Performance in the Ancient Text”
             (with V. Panoussi, J. Rowley, P. Thibodeau, M. Sundahl), Brown
             University, February 7–8, 1997
        TEACHING
        Teaching Fellow, Department of Classics, Brown University (1995–1997)
        PROFESSIONAL AFFILIATIONS
        North American Association for the History of the Language Sciences
        Linguistic Society of America
        Henry Sweet Society for the History of Linguistic Ideas
        Association for Literary and Linguistic Computing
        Association for Computing in the Humanities
        PROFESSIONAL ACTIVITIES
        Leader, Cross-Sectional Group III: The Spread of Knowledge through
        Cultures, TOPOI: The Formation and Transformation of Space in
        COMPUTER SKILLS
        Programming Languages: Java, Perl, Python
        Other: XML, XSL, RDF, Relax NG, TEI, HTML, CGI, JavaScript, LaTeX,
        PostgreSQL, Zope, R, xfst
        Linux system administration
        LANGUAGES READ
        Latin, Ancient Greek, Sanskrit, Italian, French, Spanish, German
        some university study also of Akkadian
        OTHER SKILLS
        Copy-editing and indexing experience
Bibliography
        Haas, W., ed. (1976), Writing without Letters, Vol. 4 of Mont Follick
             Series, Manchester University Press, Manchester.
        Hadj-Salah, A. (1971), ‘La notion de syllabe et la theorie cinetico-
             impulsionnelle des phoneticiens arabes’, Al-Lisāniyyāt 1, 63–83.
        Jones, D. (1962), The Phoneme: Its Nature and Use, 2d edn, W. Heffer &
             Sons, Cambridge.
        Joseph, J. E. (2000), Limiting the Arbitrary: Linguistic Naturalism and
             its Opposites in Plato’s Cratylus and Modern Theories of Lan-
             guage, Vol. 96 of Studies in the History of the Language Sciences,
             John Benjamins, Amsterdam.
        Joshi, A., Ganu, A., Chand, A., Parmar, V. & Mathur, G. (2004),
             ‘Keylekh: A keyboard for text entry in Indic scripts’, CHI 2004,
             April 24–29, Vienna.
        Joshi, R. K. (2006), ‘The phonemic model from India for bi-modal appli-
             cations’, Paper delivered at the Second Workshop on International-
             izing SSML, Heraklion, Crete, May 2006. <http://www.w3.org/
             2006/02/SSML/agenda.html>.
        Joshi, R. K., Dharmadhikari, T. N. & Bedekar, V. V. (2007), ‘The
             phonemic approach for Sanskrit text’, <http://sanskrit.inria.fr/
             Symposium/Phonemics_CDAC.pdf>.
        Joshi, R. M. & Aaron, P. G., eds (2006), Handbook of Orthography and
             Literacy, Erlbaum, Mahwah NJ.
        Kahan, B. (2000), Ottmar Mergenthaler: The Man and his Machine; A
            Biographical Appreciation of the Inventor on his Centennial, Oak
            Knoll Press, New Castle DE.
        Kahn, D. (1996), The Codebreakers: The Story of Secret Writing, 2d edn,
            Scribner, New York.
        Smith, F., Lott, D. & Cronnell, B. (1969), ‘The effect of type size and
             case alternation on word identification’, American Journal of Psy-
             chology 82(2), 248–253.
        Smith, F. W. (1964), ‘New American Standard Code for Information In-
             terchange’, Western Union Technical Review 18(2), 50–61.
        Smith, G. (1885), The Life of William Carey, D. D.: Shoemaker and Mis-
             sionary, Professor of Sanskrit, Bengali, and Marathi in the College
             of Fort William, Calcutta, J. Murray, London.
        Snowling, M. J. (2005), Dyslexia, in B. Hopkins, ed., ‘The Cambridge
            Encyclopedia of Child Development’, Cambridge University Press,
            Cambridge, pp. 433–436.
        Snyman, J. W. (1970), An Introduction to the !Xũ (!Kung) Language,
            A. A. Balkema, Cape Town.
        Suen, C. Y., Mori, S., Kim, S. H. & Leung, C. H. (2003), Analysis and
             recognition of Asian scripts—the state of the art, in ‘Proceedings of
             the 7th International Conference on Document Analysis and Recog-
             nition’, pp. 866–878.
        Sweet, H. (1892), A Manual of Current Shorthand, Orthographic and
            Phonetic, Clarendon, Oxford.
        Syropoulos, A., Tsolomitis, A. & Sofroniou, N. (2003), Digital Typog-
             raphy Using LaTeX, Springer, New York.
        Szemerényi, O. (1967), ‘The new look of Indo-European: Reconstruc-
            tion and typology’, Phonetica 17(2), 65–99.
        Takakusu, J. (1896), Record of the Buddhist Religion as Practised in
            India and the Malay Archipelago, Clarendon, Oxford.
        Tolchinsky, L. (2003), The Cradle of Culture and What Children Know
             About Writing and Numbers Before Being Taught, Erlbaum,
             Mahwah NJ.
        Tomasello, M. (1999), The Cultural Origins of Human Cognition, Har-
            vard University Press, Cambridge MA.
        Treiman, R. (2006), Knowledge about letters as a foundation for reading
             and spelling, in Joshi & Aaron (2006), pp. 581–599.
        Trigger, B. G. (1998), ‘Writing systems: A case study in cultural evolu-
             tion’, Norwegian Archaeological Review 31(1), 39–62.
        Trigo Ferre, R. L. (1988), The Phonological Derivation and Behavior of
             Nasal Glides, PhD thesis, MIT, Cambridge MA. MIT Dissertations
             in Linguistics TRIG01.
        Tversky, A. (1977), ‘Features of similarity’, Psychological Review
             84(4), 327–352.
        Unicode Consortium (2006), The Unicode Standard, Version 5.0,
            Addison-Wesley, Boston.
        Vacek, J. (1976), ‘The Sanskrit sibilants’, Wissenschaftliche Zeitschrift
            der Humboldt-Universität zu Berlin, Gesellschafts und sprachwis-
            senschaftliche Reihe 25(3), 407–412.
Index