SCRIPT GRAMMAR FOR GUJARATI LANGUAGE 
 Prepared by   
Technology Development for Indian Languages (TDIL) Programme   
Department of Information Technology, Government of India   
in association with    
Centre for Development of Advanced Computing (C-DAC)      
Table of Contents
0.  INTRODUCTION
1.  OBJECTIVES OF SCRIPT GRAMMAR
2.  END USERS FOR SCRIPT GRAMMAR
3.  SCOPE
4.  TERMINOLOGY
5.  PHILOSOPHY AND UNDERLYING PRINCIPLES
6.  SCRIPT GRAMMAR STRUCTURE
6.1. PERIPHERAL ELEMENTS OF THE SCRIPT GRAMMAR
6.2. CONFORMITY TO THE SYLLABLE STRUCTURE
6.3. SCRIPT GRAMMAR PROPER
6.3.1. The Character Set of Gujarati
6.3.2. Consonant Mātrā Combinations
6.3.3. The Ligature Set of Gujarati
6.3.4. The Collation Order of Gujarati
7.  REFERENCES
8.  ANNEXURES
Annexure 1: Names of experts who have contributed to the script grammar
Annexure 2: Unicode Table of Gujarati
0.  INTRODUCTION
The term script grammar refers to the behaviour pattern of the writing system of a given
language. Languages that have written representations do not store information in their
writing systems haphazardly. Instead, they follow a coherent pattern that resembles the
linguistic grammar of the language. A script grammar is prepared with the help of specialists
(not necessarily linguists) who work on the written representation of the language. It
describes the shapes of the language's characters and the way its conjunct forms are
represented. In other words, the Script Grammar deals with the surface structure of the
language and tries to provide the best possible fit for shapes and their representation. This
is a highly subjective issue, so the shapes provided here are recommendations at best. They
conform to the perception of the mandating body/evaluators, who reach a consensus on the
best possible fit acceptable to a majority of users. An example from the Devanāgarī script
will make this clear. Marathi and Nepali share the same script, Devanāgarī, but they do not
share the same character inventory. In addition, certain characters are represented
differently. For instance, the Nepali /la/ and the Marathi /la/ differ in the placement of the
stem. Specifying such differences ensures that the Script Grammar conforms to the language
in question and provides the character shapes acceptable to a given user community. This
does not mean monotony: the Marathi and the Nepali /la/ can each have a variety of forms
once the intrinsic structure of the character is determined.
Script Grammar is the term used to define: 
  the writing system used to inscribe a given language 
  the history of the script and language (wherever available) 
  the syllabic structure of the writing system of the language 
  the rule ordering of the characters within the syllable (akshar) 
  description of the syllabic clusters 
  collation order of the characters: lexical / dictionary sorting order      
1.  OBJECTIVES OF SCRIPT GRAMMAR
The objectives of the script grammar for each language fall into two major parts:
Societal:
• Provide a visual representation of the shapes that the given community perceives as
  correct.
• Thereby ensure that this perception is safeguarded.
• Through wide dissemination and the creation of appropriate tools, ensure that all media
  within the given linguistic community try to adopt the given shapes.
Technical:
• Identify the language by its ISO 639 code and classify its writing system as an Abjad or
  an Akshar (alphasyllabary).
• Provide an inventory of the characters pertinent to the language and classify them in
  terms of their taxonomy.
• As a corollary, determine whether the inventory conforms to the syllable formalism
  stipulated in ISCII-91 and subsequently adopted by Unicode.
• Provide an inventory of displaced catenators, i.e. characters such as Mātrās that
  concatenate to the consonant. Brahmi-derived scripts are written from left to right, but
  these characters do not follow that linear left-to-right order.
• Propose the best shape representation of the individual characters as well as of the
  ligatures used within a given script. As a corollary, request the expert(s) to identify
  the longest possible strings of such ligatures.
• Finally, provide the collation order pertinent to that script/language. This is of great
  utility to high-end NLP as well as to the CLDR for the language. For example, the
  collation order for Marathi differs from that of Hindi, although both languages share the
  same script. In Marathi, क्ष and ज्ञ are placed at the end of the consonant inventory,
  i.e. after ळ, in the sort order. In Hindi, क्ष is sorted along with क, and ज्ञ with ज.
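The Marathi/Hindi contrast above comes down to which strings count as single collation elements. The following is a minimal sketch, assuming a single-level ordering. CLDR and the Unicode Collation Algorithm actually use multi-level weights, so this is not their method. The sketch builds a sort key from a language-specific ordered list of collation elements. Elements are matched longest-first, so a conjunct such as क्ष can be ranked as one unit. The function name `make_sort_key` and the toy order in the example are hypothetical.

```python
from typing import Callable, Sequence, Tuple

def make_sort_key(order: Sequence[str]) -> Callable[[str], Tuple[int, ...]]:
    """Build a sort key from an ordered list of collation elements.

    Elements may be multi-character (e.g. a conjunct); at each position the
    longest listed element that matches wins. Characters not in `order` sort
    after all listed elements, by code point. Single-level sketch only.
    """
    rank = {element: i for i, element in enumerate(order)}
    longest = max((len(e) for e in order), default=1)

    def key(word: str) -> Tuple[int, ...]:
        ranks = []
        i = 0
        while i < len(word):
            # Try the longest candidate first, then shorter ones.
            for size in range(min(longest, len(word) - i), 0, -1):
                piece = word[i:i + size]
                if piece in rank:
                    ranks.append(rank[piece])
                    i += size
                    break
            else:
                # Unknown character: place it after every listed element.
                ranks.append(len(order) + ord(word[i]))
                i += 1
        return tuple(ranks)

    return key

# Toy order in which the two-letter element "ch" is ranked after "c".
key = make_sort_key(["a", "b", "c", "ch"])
print(sorted(["ch", "ca", "b"], key=key))  # ['b', 'ca', 'ch']
```

A Marathi-style order would list "क्ष" and "ज्ञ" as elements after "ळ". A Hindi-style order would simply omit them. Each conjunct would then be split into its component characters and would therefore sort within the क or ज group.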
2.  END USERS FOR SCRIPT GRAMMAR
The script grammar specific to a given language can serve many different users.
• Most importantly, font developers can use it to build a font that matches how the
  language's user community perceives its characters and ligatures.
• Certain features of the script grammar, such as the shapes, can also be used for testing
  OCR and OHWR (online handwriting recognition). Similarly, information on ligatures and
  collation order can help in high-end NLP work. Examples include detecting invalid
  combinations, correctly implementing the syllable structure and writing prediction
  routines. Information on collation and character sets can also be used for the CLDR.
• Script grammars allow the font designer to design a font that complies with the norms
  and standards of that particular script. A major problem dealt with in the template is
  that of ligatures: the final list of ligatures defined by the script grammar allows the
  font designer to write specific rules for such glyphs.
• They permit the software developer to design and implement the keyboard and the input
  mechanism that will meet the requirements of the particular linguistic community.
• The collation or sort order described in a script grammar permits the software developer
  to write functions/routines for sorting data in all applications.
• Script grammars are equally important for keyboard design, especially when supplemented
  by frequency data from a corpus.
As can be seen, the script grammar has a wide range of uses. It can be of utility to font
developers, developers of Indian-language software and computational linguists.
3.  SCOPE
This script grammar document contains the following information about the language and
the script used for writing it.
1.  Name of the language and its representation in the 3-letter mnemonic as per the
    ISO 639-3 standard.
2.  Script used to inscribe the given language.
3.  The structure of the script used for writing the language:
    • Rule ordering of the characters within the syllable in the language
    • Description of the syllabic clusters of the script
    • Collation order of the characters: lexical/dictionary sorting order
    • Compliance of the script with Unicode.
These will be treated within the relevant sections of the script grammar.
4.  TERMINOLOGY [1]
Abjad: A writing system in which each symbol always or usually stands for a consonant.
The long vowels are indicated, but the short vowels are rarely marked and the reader
needs to supply them. Example: Urdu, written in the Perso-Arabic script.
Abugida: Also called an alphasyllabary. A segmental writing system in which
consonant-vowel sequences are written as a unit: each unit is based on a consonant letter,
and vowel notation is obligatory but secondary.[2]
Akshar: see Abugida    
Allographs: Variants of the representation of a character. Thus ae and æ [U+00E6] in the
Latin alphabet are allographs.
Allo-Script: The term relates to languages that share a common script. Thus Devanāgarī is
used to write 9 official languages. However, these languages do not use the same set of
characters. Marathi uses the retroflex lla ळ [U+0933], which Hindi does not use. Conversely,
the flaps used in Hindi, ड़ [U+095C] and ढ़ [U+095D], are not used in Marathi. These subsets
of scripts based on a single matricial script are termed allo-scripts.
Alphabet: A set of letters used in writing a language. Example: The English Alphabet.  
Aspirated consonant: A consonant pronounced with an extra puff of air coming out when the
oral obstruction is released. It sounds like an extra "h".
Basic alphabet: The minimal set of letters that can be used to uniquely encode every word
of a language. The basic alphabet for English consists of only the upper-case letters A-Z.
Catenators: Also termed concatenators. Characters that are concatenated to another
character. In the Brahmi script these are the Mātrās or vowel modifiers, which are adjoined
to the consonant and add a vocalic value to it.
Conjunct: The Indic scripts are noted for a large number of consonant conjunct forms
that serve as orthographic abbreviations (ligatures) of two or more adjacent letterforms.
This abbreviation takes place only in the context of a consonant cluster. Under normal
circumstances, a consonant cluster is depicted with a conjunct glyph if such a glyph is
available in the current font. In the absence of a conjunct glyph, the one or more dead
consonants that form part of the cluster are depicted using half-form glyphs. In the
absence of half-form glyphs, the dead consonants are depicted using the nominal
consonant forms combined with visible virama signs.[3]
[1] As in the case of the BIS document, examples have been chosen from English/Latin
scripts wherever possible, to make the terminology accessible to all readers. Some
definitions have been excerpted from the BIS ISCII-91 document and suitably modified where
necessary.
[2] Wikipedia definition.
Consonant: A letter representing a speech sound in which the breath is at least partly 
obstructed.  
Diacritic: A mark added to a letter that distinguishes it from the same letter without the
mark, usually indicating a different phonetic value or stress.
Displaced Catenator: (see Catenator) Within the Brahmi script, the writing system is
linear and moves from left to right. However, some catenators do not observe this rule:
the catenator is placed (wholly or partially) to the left of the consonant to which it
relates. The short vowel sign i (ि) in Devanāgarī is an example of a displaced catenator.
Display composing: The process of organizing the basic shapes available in a font in 
order to display (or print) a word.  
Display rendition: The process by which a string of characters is displayed (or printed).
In this process several consecutive characters may combine with each other on the screen.
The order in which the characters are displayed may also differ from the order in which
they are stored.
Eyebrow repha: (See Eyelash ra)  
Eyelash ra: The eyelash ra is used in Konkani, Gujarati and Marathi. Certain linguists
treat it as different from the repha: the former is treated as a flap, while the latter is
a continuant trill (cf. Kalyan Kale and Anjali Soman, 1986).
Font: A set of symbols used for display or printing of a script in a particular style.  
International numerals: The conventional digits 0 to 9 used in English for denoting
numbers. These are also known as Indo-Arabic numerals (to differentiate them from Roman
numerals such as IX for 9).
Latin alphabet: The alphabet used for writing the language of ancient Rome. Also 
known as the Roman alphabet. The alphabet is used today for writing English and 
European languages.  
Letter: A character representing one or more of the simple or compound sounds used in 
speech. It can be any of the alphabetic symbols.  
Ligature: (see Conjunct)                                                   
[3] Unicode ver. 6.0, Chapter 9.0, pp. 6-7.
Nasal consonant: A consonant pronounced with the breath passing through the nose.
Example: m and n in English.
Nasalized vowel: A vowel pronounced with the breath passing through both the nose and
the mouth. In Indian scripts this is denoted by a Chandrabindu, which gives a nasal value to
the vowel/vowel modifier over which it is placed.
Phonetic alphabet: An alphabet with a direct correspondence between letters and sounds.
Example: the International Phonetic Alphabet.
Pure consonant: A consonant which does not have any vowel implicitly associated with 
it.  
Rafar: A special case of a ligature formed when ra, followed by a halanta, is adjoined to a
consonant. The resulting combination places the ra on top of the consonant to which it is
adjoined. If that consonant is itself adjoined to another consonant, the rafar is still
placed above the consonant.
Rakar: A special case of a ligature formed when a consonant, followed by a halanta, is
adjoined to ra. In a large number of Brahmi scripts, the ra is adjoined to the stem of the
consonant to which it relates. Some consonants have no stem, such as the retroflexes in
Devanāgarī. In their case, the rakar is placed below the consonant to which it relates.
Repha: (see Rafar)  
Roman script: The script based on the ancient Roman alphabet, with the letters A-Z and 
additional diacritic marks. Used for writing a language which is not usually written in the 
Roman alphabet.  
Script: A distinctive and complete set of characters used for the written form of one or 
more languages.  
Script numerals: The 0 to 9 digits in a script, which have shapes distinct from their 
international counterparts.  
Syllable: A unit of pronunciation uttered without interruption, forming whole or part of a 
word, and usually having one vowel or diphthong sound optionally surrounded by one or 
more consonants  
Transliteration: Representation of words with the closest corresponding letters in an 
alphabet of a different language.  
Vowel: A letter representing a speech sound made with the vibration of the vocal cords,
but without audible obstruction.
Vowel sign: A graphic character associated with a letter to indicate the vowel to be
associated with that character (mātrā in Hindi).
5.  PHILOSOPHY AND UNDERLYING PRINCIPLES
The script grammar is based on the following principles:
1.  The Grammar aims to depict the surface grammar of the written language, i.e. the
    manner in which characters as well as conjuncts are depicted.
2.  Where a given script serves many languages, it is presupposed that these languages
    will prescribe different representations for a given shape or conjunct, according to
    the perception of their native users.
3.  As a corollary, the result is a script and its allo-scripts. In other words, a script
    shared by many languages is not deployed uniformly across all of them but is subject
    to variations and modulations.
4.  The term Grammar is used here in a non-normative sense: what is prescribed takes the
    form of recommendations provided by experts who visualize the shape of the given
    script in their mother tongue in a specific manner. Subjective variations may occur.[4]
5.  The Grammar is limited to synchronic use, i.e. the manner in which a given language
    today admits a character set within the script used to write it. It is not diachronic
    or historical in nature and does not study the evolution of the script across
    centuries.
[4] It is recommended that such variations be culled by placing the Grammars of different
scripts in public review.
6.  SCRIPT GRAMMAR STRUCTURE
The script grammar provided below has the following parts.
Part 6.1 deals with peripheral elements, such as the ISO code of the language and the
writing system used: (alphasyllabic) Abugida or Abjad.
Part 6.2 treats the syllabic structure. It verifies whether the character set of the
language complies with the ISCII syllabic structure and, if not, which cases are not
compliant.
Part 6.3 is the script grammar proper. It describes the character set and the conjunct
shapes of the given script, along with the collation order.
6.1. PERIPHERAL ELEMENTS OF THE SCRIPT GRAMMAR
These constitute the elements that are peripheral to the Script Grammar. The main
parameters considered are the mnemonic and name of the language (needed for CLDR and
also for language tags), the writing system used to inscribe the language and, wherever
possible, a short history of the language.
6.1.1. Name of the language and its representation in the mnemonics of ISO 639-1
(2-letter) and ISO 639-3 (3-letter)
Name of the Language: GUJARATI
ISO Mnemonics: gu (ISO 639-1), guj (ISO 639-3)
This is a one-line description of the language and its mnemonic representation as per
ISO.
6.1.2. Identification of the writing system(s) used to inscribe the given language
Gujarati is written using the Gujarati script. It is an alphasyllabary with the akshar as
its core.
This is a one-line description of the script used to write the language. If the language
uses more than one script, all the scripts in question are specified, provided they
constitute the official language of the given state.
All scripts derived from Brahmi are Abugidas, i.e. syllable-driven systems. The main
features of Abugidas are as follows:
• The consonant has a built-in implicit vowel, which is normally the schwa.
• The inherent vowel can be modified by the addition of other vowels (as mātrās) or
  muted by a diacritic termed a Virama or Halanta.
• Vowels can also be written as full (independent) vowels with their own vocalic value.
• When two or more consonants join, they form ligatures. These can either be recognized by
  the shapes of their components or take an entirely new shape.
Because of their syllabic structure, Abugidas/alphasyllabaries require a special
description. This is the subject of the discussion in 6.2 below.
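In Unicode text, a conjunct such as those described above is stored as the underlying sequence consonant + virama (halanta) + consonant. The font decides whether that sequence is rendered as a ligature, as a half form, or with a visible virama. The short sketch below uses only Python's standard `unicodedata` module to list the code points behind ક્ષ, chosen purely as an illustration. The helper name `describe` is our own.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point stored in `text`."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# KA + VIRAMA + SSA: one visual conjunct, three stored characters.
conjunct = "\u0A95\u0ACD\u0AB7"
for line in describe(conjunct):
    print(line)
```

The same helper can be used to check any of the character inventories in 6.3.1 against their Unicode names.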
6.1.3. Amendments needed in Unicode for the Gujarati language
None have been proposed by the Gujarati Sahitya Parishad, which mandated this script
grammar.
6.2. CONFORMITY TO THE SYLLABLE STRUCTURE
The Gujarati language complies with the syllable (akshar) structure described above. It
admits consonant clusters of up to 3 consonants.
Alphasyllabaries are determined by the notion of the syllable or akshar. The
compositional grammar of the syllable determines its well-formedness through a series of
formal constraints based on a Backus-Naur formalism, given below. The ISCII document
(1991) first defined the syllable (akshar). It identifies the following character subsets
for the purpose of identifying the syllable. In what follows, the syllable analysis is
restricted to Gujarati.
(C) Consonants
ક ખ ગ ઘ ઙ
ચ છ જ ઝ ઞ
ટ ઠ ડ ઢ ણ
ત થ દ ધ ન
પ ફ બ ભ મ
ય ર લ વ
શ ષ સ હ ળ
(V) Vowels
અ આ ઇ ઈ ઉ ઊ ઋ એ ઐ ઓ ઔ ઍ ઑ
(M) Mātrās or Vowel Modifiers
ા િ ી ુ ૂ ૃ ૅ ે ૈ ૉ ો ૌ
(D) Diacritics
ં : Anuswar
Anuswara, a nasal, is denoted by a dot placed above the letter after which it is
pronounced. It falls under the Nasal category.
ઁ : Chandrabindu
Chandrabindu, a nasal, is denoted by a breve with a dot superposed, placed above the
letter after which it is pronounced. It falls under the Nasal category.
ઃ : Visarga
Visarga is denoted by two dots placed one above the other.
ઽ : Avagraha
Used for extra length with long vowels, as seen in Sanskrit texts.
(H) Halanta ( ્ ): used in most Brahmi-derived writing systems to signify the absence of
the inherent vowel.
(N) Nukta[5]: not used in Gujarati.
Each of these subtypes has restrictions on what can precede or follow it within a
syllable (akshar), as shown in the table below:
• C can be preceded by H or by no subtype, and followed by any one of M, D or H.
• V can be preceded by no subtype. It can be followed by D, but not by any other subtype.
• M can be preceded by C and followed by D.
• D can be preceded by C, V or M and followed by no other subtype. It closes the syllable
  (akshar).
• H can be preceded by C alone. It can be followed only by C and by no other subtype.
PRECEDED BY   SUBTYPE   FOLLOWED BY
-, H          C         M, D, H
-             V         D
C             M         D
C, V, M       D         -
C             H         C
[5] The nukta is a small dot placed under a character in Northern scripts to show that it is
flapped. It is also used to derive 5 additional consonants in the Devanāgarī and Punjabi
scripts, required for Urdu.
6.2.1. Syllable (akshar) Types
The formalism defines the syllable (akshar) in terms of both what can constitute a
syllable (akshar) and what cannot. A valid syllable (akshar) as per this definition can
be of only two types:
1. A vowel syllable (akshar): a full vowel.
2. A consonant syllable (akshar): a full consonant (with its inherent vowel or a mātrā).
The three other subsets, viz. Mātrās, Diacritics and Halanta, cannot constitute a
syllable (akshar) by themselves or in combination with each other.
6.2.1.1. The Vowel syllable (akshar) is of the following types:
6.2.1.1.1. A pure vowel by itself: અ /a/, આ /ā/, etc.
6.2.1.1.2. A vowel followed by a modifier, i.e. a nasal marker, a visarga or an avagraha.
6.2.1.2. The Consonant syllable (akshar) can be of the following types:
6.2.1.2.1. A full consonant, i.e. with the inherent vowel: ક /ka/
6.2.1.2.2. A consonant followed by a mātrā, i.e. the inherent vowel is replaced by another
vowel: કી /ki:/
6.2.1.2.3. A consonant followed by a modifier: હઃ /haH/[6]
6.2.1.2.4. A consonant followed by a mātrā and a modifier: દુઃ /duH/.
6.2.1.2.5. A consonant cluster, i.e. a dead or half consonant (Consonant + Halanta)
followed by a full consonant, optionally followed by a mātrā, a modifier or both. These
result in a ligature, often termed a yuktakshara: ત્ક /tka/, ત્કઃ /tkaH/, ત્કુ /tku/.
The above permutations and combinations result in 7 major syllable (akshar) types. Of
these, the last type raises the question of how many consonants a cluster may contain.
ISCII (91, p. 23) provides for clusters of up to three consonants as the worst case, i.e.
the longest possible string. This is adequate for the modern Prakrits, where the largest
consonantal cluster rarely exceeds three consonants. Sanskrit is an exception: in a single
word, four consonants can come together, as in /kārtsnya/ "wholeness", "entirety".
This means that the following forms can theoretically be postulated:
1.  Vowel set: with the vowel as the node.
    V    VD
2.  Consonant set: with the consonant as the node (an implicit or modified vowel is
    presupposed).
    Node       Mātrā       Modifier    Mātrā+Modifier
    C          CM          CD          CMD
    CHC        CHCM        CHCD        CHCMD
    CHCHC      CHCHCM      CHCHCD      CHCHCMD
    CHCHCHC    CHCHCHCM    CHCHCHCD    CHCHCHCMD
The consonant set therefore yields 16 theoretical syllable forms, in addition to the 2
vowel forms. The written syllable (akshar) is thus not very different in structure from
the phonetic syllable. Moving from the written to the spoken level is made possible by
applying certain rules.
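The forms in the table above follow mechanically from one pattern: one to four consonants joined by H, optionally followed by M and/or D, in that order. The small sketch below regenerates them. The function name `syllable_patterns` is ours, for illustration only.

```python
def syllable_patterns(max_consonants: int = 4) -> list[str]:
    """Enumerate the theoretical akshar patterns over the subtypes C, V, M, D, H."""
    vowel_forms = ["V", "VD"]
    consonant_forms = []
    for n in range(1, max_consonants + 1):
        node = "CH" * (n - 1) + "C"         # n consonants joined by halanta
        for tail in ("", "M", "D", "MD"):   # none, mātrā, modifier, or both
            consonant_forms.append(node + tail)
    return vowel_forms + consonant_forms

patterns = syllable_patterns()
print(len(patterns) - 2)   # 16 consonant forms
print(patterns[-1])        # CHCHCHCMD
```

Passing `max_consonants=3` restricts the output to the ISCII worst case of three-consonant clusters, which gives 12 consonant forms.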
[6] This character phonetically represents the weak implicit vowel, termed schwa, which is
also often shown as /a/.
This formal structure of the syllable (akshar) explained above is common to all Brahmi 
based scripts (with a few variations). It will form the basis of an exhaustive description of 
the characters as well as their ligatural representations.      
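The subtype restrictions in 6.2 amount to a regular language, so a single syllable can be checked with a regular expression over its string of subtype letters. The sketch below makes several assumptions that are ours, not the grammar's. It expects Unicode Gujarati input and classifies characters by code-point ranges of the Gujarati block. The names `subtype` and `is_valid_akshar` are hypothetical. The default limit of three consonants follows the statement in 6.2 that Gujarati admits clusters of up to three consonants.

```python
import re

HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA
# Chandrabindu, anuswar, visarga, avagraha.
DIACRITICS = {"\u0A81", "\u0A82", "\u0A83", "\u0ABD"}

def subtype(ch: str) -> str:
    """Map one character to C, V, M, D or H; '?' for anything else."""
    cp = ord(ch)
    if 0x0A95 <= cp <= 0x0AB9:
        return "C"  # consonants KA..HA (range also spans a few unassigned code points)
    if 0x0A85 <= cp <= 0x0A94:
        return "V"  # independent vowels A..AU
    if 0x0ABE <= cp <= 0x0ACC:
        return "M"  # dependent vowel signs (mātrās) AA..AU
    if ch in DIACRITICS:
        return "D"
    if ch == HALANT:
        return "H"
    return "?"

def is_valid_akshar(text: str, max_consonants: int = 3) -> bool:
    """Check one syllable against V D? | (C H){0,n-1} C M? D?."""
    pattern = f"(?:VD?|(?:CH){{0,{max_consonants - 1}}}CM?D?)"
    types = "".join(subtype(ch) for ch in text)
    return re.fullmatch(pattern, types) is not None

print(is_valid_akshar("\u0AA4\u0ACD\u0A95\u0AC1"))  # ત્કુ -> True
print(is_valid_akshar("\u0A95\u0ACD"))              # ક્ (dangling halanta) -> False
```

Passing `max_consonants=4` admits the Sanskrit-style four-consonant forms of the table above.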
6.3. SCRIPT GRAMMAR PROPER
This section lays down in detail the different parameters of the Script Grammar for
Gujarati. These are:
6.3.1. The Character Set of Gujarati.
6.3.2. The Consonant Mātrā Combinations of Gujarati.
6.3.3. The Ligature Set of Gujarati.
6.3.4. The Collation Order of Gujarati.
6.3.1. The Character Set of Gujarati.
This section lists the characters of the language. More importantly, it shows how each
character is to be written. Each subsection therefore comprises two parts: the basic
character set, and the shape each character should have. These shapes are mandated by
the experts who designed the script grammar of Gujarati.
This comprises the following:
6.3.1.1. The Consonant Set
6.3.1.2. The Vowel Set
6.3.1.3. The Mātrā Set
6.3.1.4. Displaced Catenators
6.3.1.5. Shape of the combination of ra (rakar, repha)
6.3.1.6. The Set of Diacritics
6.3.1.7. Numerals
6.3.1.8. Punctuation marks
6.3.1.9. Other symbols
Each of these will be analysed in detail:
6.3.1.1. The Consonant Set 
The Consonant set of Gujarati comprises the following characters: 
A basic consonant inventory arranged as per their vargas:
            -voiced      -voiced      +voiced      +voiced
            -aspirated   +aspirated   -aspirated   +aspirated   Nasal
Velar       ક            ખ            ગ            ઘ            ઙ
Palatal     ચ            છ            જ            ઝ            ઞ
Retroflex   ટ            ઠ            ડ            ઢ            ણ
Dental      ત            થ            દ            ધ            ન
Bilabial    પ            ફ            બ            ભ            મ
Other consonants
ય ર લ વ શ ષ સ હ ળ
Note: Ligatures such as ક્ષ and જ્ઞ are not listed in the consonant inventory, because they
are ligatures.
The exact shapes as desired by the experts are provided in the table below:
            -voiced      -voiced      +voiced      +voiced
            -aspirated   +aspirated   -aspirated   +aspirated   Nasal
Velar       ક            ખ            ગ            ઘ            ઙ
Palatal     ચ            છ            જ            ઝ            ઞ
Retroflex   ટ            ઠ            ડ            ઢ            ણ
Dental      ત            થ            દ            ધ            ન
Bilabial    પ            ફ            બ            ભ            મ
Other consonants
ય ર લ વ શ ષ સ હ ળ
6.3.1.2. The Vowel Set 
The Vowel set of Gujarati is as under:  
Character   Name
અ           GUJARATI LETTER A
આ           GUJARATI LETTER AA
ઇ           GUJARATI LETTER I
ઈ           GUJARATI LETTER II
ઉ           GUJARATI LETTER U
ઊ           GUJARATI LETTER UU
ઋ           GUJARATI LETTER VOCALIC R
એ           GUJARATI LETTER E
ઐ           GUJARATI LETTER AI
ઓ           GUJARATI LETTER O
ઔ           GUJARATI LETTER AU
ઍ           GUJARATI LETTER CANDRA E
ઑ           GUJARATI LETTER CANDRA O
As recommended by the experts, the vowels should be written as follows:
અ આ ઇ ઈ ઉ ઊ ઋ એ ઐ ઓ ઔ ઍ ઑ
6.3.1.3. The Mātrā Set
The Mātrā (vowel modifier) set of Gujarati is as follows:
No.  Mātrā name                       Sign   Vowel   Consonant shape formed
1.   Gujarati sign AA                 ા      આ       ક્ + આ = કા
2.   Gujarati sign I (stands to the   િ      ઇ       ક્ + ઇ = કિ
     left of the consonant)
3.   Gujarati sign II                 ી      ઈ       ક્ + ઈ = કી
4.   Gujarati sign U                  ુ      ઉ       ક્ + ઉ = કુ
5.   Gujarati sign UU                 ૂ      ઊ       ક્ + ઊ = કૂ
6.   Gujarati sign vocalic R          ૃ      ઋ       ક્ + ઋ = કૃ
7.   Gujarati sign candra E           ૅ      ઍ       ક્ + ઍ = કૅ
8.   Gujarati sign E                  ે      એ       ક્ + એ = કે
9.   Gujarati sign AI                 ૈ      ઐ       ક્ + ઐ = કૈ
10.  Gujarati sign candra O           ૉ      ઑ       ક્ + ઑ = કૉ
11.  Gujarati sign O                  ો      ઓ       ક્ + ઓ = કો
12.  Gujarati sign AU                 ૌ      ઔ       ક્ + ઔ = કૌ
As recommended by the experts, the mātrās should be written as follows:
ા, િ, ી, ુ, ૂ, ૃ, ૅ, ે, ૈ, ૉ, ો, ૌ
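The table above pairs each independent vowel with its mātrā. In Unicode this pairing is a fixed code-point mapping. The rule "dead consonant + vowel = syllable" can therefore be sketched as two steps: drop the halanta, then append the corresponding vowel sign. The inherent vowel અ leaves the consonant bare. The helper name `attach_vowel` is hypothetical. How the result is rendered, including the leftward placement of િ, is left to the font.

```python
# Independent vowel -> dependent vowel sign (mātrā), Unicode Gujarati block.
VOWEL_TO_MATRA = {
    "\u0A86": "\u0ABE",  # AA -> sign AA
    "\u0A87": "\u0ABF",  # I -> sign I
    "\u0A88": "\u0AC0",  # II -> sign II
    "\u0A89": "\u0AC1",  # U -> sign U
    "\u0A8A": "\u0AC2",  # UU -> sign UU
    "\u0A8B": "\u0AC3",  # vocalic R -> sign vocalic R
    "\u0A8D": "\u0AC5",  # candra E -> sign candra E
    "\u0A8F": "\u0AC7",  # E -> sign E
    "\u0A90": "\u0AC8",  # AI -> sign AI
    "\u0A91": "\u0AC9",  # candra O -> sign candra O
    "\u0A93": "\u0ACB",  # O -> sign O
    "\u0A94": "\u0ACC",  # AU -> sign AU
}
INHERENT_A = "\u0A85"   # GUJARATI LETTER A
HALANT = "\u0ACD"       # GUJARATI SIGN VIRAMA

def attach_vowel(dead_consonant: str, vowel: str) -> str:
    """Combine a (dead) consonant such as ક્ with an independent vowel such as આ."""
    base = dead_consonant.removesuffix(HALANT)  # drop the halanta if present
    if vowel == INHERENT_A:
        return base                             # inherent vowel: no sign needed
    return base + VOWEL_TO_MATRA[vowel]

syllable = attach_vowel("\u0A95\u0ACD", "\u0A86")  # ક્ + આ -> કા (U+0A95 U+0ABE)
```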
6.3.1.4. Displaced Catenators
Vowel modifiers are also known as catenators, since they concatenate to the preceding
consonant. Under normal circumstances, in Brahmi-based scripts they are written from
left to right in linear order (with the exception of consonant stacks). However, certain
modifiers are displaced and are placed to the left of the consonant to which they
concatenate. As in all Devanāgarī-based languages, Gujarati has only one displaced
catenator:
CATENATOR   POSITION                       EXAMPLE
િ           To the left of the character   કિ
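The displacement of િ concerns display only. In Unicode the sign is stored in logical order, after the consonant (or after the whole conjunct) that it follows in pronunciation. The shaping engine then moves it to the left at rendering time. The sketch below imitates that reordering for plain consonant clusters, to show the difference between stored and displayed order. It assumes no nukta, repha or other marks inside the cluster. It is an illustration, not a substitute for a real shaping engine, and `to_display_order` is a hypothetical name.

```python
import re

I_SIGN = "\u0ABF"  # GUJARATI VOWEL SIGN I, the displaced catenator
# A consonant cluster: zero or more (consonant + halanta), then a consonant, then sign I.
CLUSTER_THEN_I = re.compile(r"((?:[\u0A95-\u0AB9]\u0ACD)*[\u0A95-\u0AB9])\u0ABF")

def to_display_order(logical: str) -> str:
    """Move each vowel sign I in front of the consonant cluster it belongs to."""
    return CLUSTER_THEN_I.sub(lambda m: I_SIGN + m.group(1), logical)

stored = "\u0A95\u0ABF"           # કિ as stored: KA, then sign I
shown = to_display_order(stored)  # sign I first, then KA
```

For a conjunct such as ત્કિ, the whole cluster ત્ક is moved after the sign. This mirrors the rule that the catenator stands to the left of the consonant group it modifies.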
6.3.1.5. Shape of the combination of ra (rakar, repha)
The consonant ર (ra) takes a variety of shapes, known as rakar and repha (rafar), depending
on its position. When it is conjoined before a consonant by means of the halanta, it
changes shape and is placed on top of the consonant or consonant cluster to which it
relates. This is called a repha or rafar. Gujarati admits a special repha known as the
eyelash ra. When ra is conjoined after a consonant with the help of a halanta, it is
appended to the consonant as a slanting stroke attached to the stem (side rakar). For
consonants that have no stem, it is appended as a ^ attached to the bottom of the
character (bottom rakar). Gujarati has the following combinations of ra:
RAFARS 
Top rafar: 5          for ex top rafars will be formed in case of following words.  
 ,   
RAKARS 
1.  Bottom rakar          
2.  Side rakar   z   > 
Bottom rakar 
|   
Side rakar 
, , t  
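In stored text, rafar and rakar are built from the same three characters, and only their order distinguishes them. ર + halanta + consonant yields a repha, while consonant + halanta + ર yields a rakar. The font then decides which visual shape appears: top rafar, side rakar or bottom rakar. The hypothetical helper below simply reports which of the two sequences occur in a string. It does not model the eyelash ra.

```python
RA = "\u0AB0"      # GUJARATI LETTER RA
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA (halanta)

def is_consonant(ch: str) -> bool:
    """True for code points in the Gujarati consonant range KA..HA."""
    return "\u0A95" <= ch <= "\u0AB9"

def ra_combinations(text: str) -> list[tuple[int, str]]:
    """Return (index, 'repha' | 'rakar') for each ra + halanta combination."""
    found = []
    for i in range(len(text) - 2):
        first, middle, last = text[i], text[i + 1], text[i + 2]
        if middle != HALANT:
            continue
        if first == RA and is_consonant(last):
            found.append((i, "repha"))   # ra written before the consonant
        elif is_consonant(first) and last == RA:
            found.append((i, "rakar"))   # ra written after the consonant
    return found
```

For example, ર્ક (U+0AB0 U+0ACD U+0A95) is reported as a repha, and પ્ર (U+0AAA U+0ACD U+0AB0) as a rakar.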
6.3.1.6. Diacritics
These are as follows in the case of Gujarati:
ં : Anuswar
ઁ : Chandrabindu/Anunasika, rarely used in Gujarati
્ : Halant
ઃ : Visarga
ઽ : Avagraha, used for extra length with long vowels, mainly in Sanskrit texts
6.3.1.7. Numerals
The following numerals are used in the Gujarati language.
The international (Indo-Arabic) numerals 0-9 are used in official documents, but Gujarati
numerals should be preferred in Gujarati text. They are as follows:
૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯
Numeral shape   Explanation
૦               Gujarati Digit Zero
૧               Gujarati Digit One
૨               Gujarati Digit Two
૩               Gujarati Digit Three
૪               Gujarati Digit Four
૫               Gujarati Digit Five
૬               Gujarati Digit Six
૭               Gujarati Digit Seven
૮               Gujarati Digit Eight
૯               Gujarati Digit Nine
6.3.1.8. Punctuation Markers
Gujarati uses punctuation markers from the Latin set, such as . , ; : ( ) [ ] etc. The
English full stop [.] is used. The Purna Virama and Deergha Virama (danda and double
danda, U+0964 and U+0965 in the Devanāgarī code block) are commonly used in poetry.
A list of punctuation marks is provided below:
Sr. No.  Name of the marker          Marker shape
01       Full Stop or Period         .
02       Question Mark               ?
03       Exclamation Mark            !
04       Apostrophe                  ,
05       Semicolon                   ;
06       Colon                       :
07       Hyphen                      -
08       Dash                        --
09       Ellipsis mark               ...
10       Oblique                     /
11       Double quotation mark       " "
12       Single quotation mark       ‘ ’
13       Cross                       XXX
14       As Above                    - - " - -
15       Round Brackets              ( )
16       Square Brackets             [ ]
17       Curly Brackets              { }
18       Devanāgarī Danda            ।
19       Devanāgarī Double Danda     ॥
6.3.1.9 Other Symbols
These include religious symbols, currency signs, etc., encoded in Unicode:
ૐ : Om (as written in Gujarati)
₹ : Rupee sign, as mandated by the Government of India.
Note: The old Gujarati rupee sign ૱ [U+0AF1], which was followed by the abbreviation
marker, has been replaced by ₹ [U+20B9].
24  
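The preference for Gujarati numerals in 6.3.1.7 and the move to U+20B9 in 6.3.1.9 can be expressed as simple text normalization. This is a minimal sketch. It assumes that digits map one-to-one onto U+0AE6 to U+0AEF and that replacing U+0AF1 with U+20B9 is the desired policy. The helper names are hypothetical.

```python
# Gujarati digits occupy U+0AE6 (zero) to U+0AEF (nine), in order.
GUJARATI_ZERO = 0x0AE6
TO_GUJARATI = str.maketrans({str(d): chr(GUJARATI_ZERO + d) for d in range(10)})
TO_LATIN = str.maketrans({chr(GUJARATI_ZERO + d): str(d) for d in range(10)})

OLD_RUPEE = "\u0AF1"  # GUJARATI RUPEE SIGN (old form)
RUPEE = "\u20B9"      # INDIAN RUPEE SIGN


def to_gujarati_digits(text: str) -> str:
    """Replace Latin digits 0-9 with Gujarati digits, leaving other text alone."""
    return text.translate(TO_GUJARATI)


def to_latin_digits(text: str) -> str:
    """Replace Gujarati digits with Latin digits, e.g. for official documents."""
    return text.translate(TO_LATIN)


def normalize_rupee(text: str) -> str:
    """Replace the old Gujarati rupee sign with U+20B9 (assumed policy)."""
    return text.replace(OLD_RUPEE, RUPEE)


if __name__ == "__main__":
    price = normalize_rupee("\u0AF1 " + to_gujarati_digits("250"))
    print(price, int(to_latin_digits(price[2:])))
```

Python's built-in `int()` already accepts Gujarati digits, because they are Unicode decimal digits. The sketch does not handle the abbreviation marker that followed the old rupee sign.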
6.3.2. Consonant Matra Combinations.
These are the shapes generated when a Matra (vowel sign) is attached to a consonant. They are laid out as a matrix in which the first horizontal row gives the active consonant and the first vertical column gives the vowel modifier.
Due to constraints of space, and for clarity, the combinations for each class are split over a series of three tables, each covering the consonants listed:
Table 1:  3    -    "    O  ^       
Table 2:  8    '  8  Q  d         
Table 3:      >        -         
       C  
Wherever an X appears, the experts have deemed that such a combination is not used in the language. For the font developer, however, an X indicates that the combination, although non-existent in the language, must still be accommodated in the font table, and a simple linear combination should be provided.
For example, although the combination of " + Matra is theoretically not possible, it needs to be handled at the font level in anticipation that a user could type it. The font would show the following: "t
The classes are as follows:
6.3.2.1 covers simple concatenations of a Consonant and a Matra.
6.3.2.2 covers concatenations of a Consonant, a Matra and a Nasal marker.
Other diacritics, such as avagraha and visarga, have been left out because they are linear in nature: they are simply appended to the combination and do not modify the structure of the shapes.
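In stored text, each cell of these tables is just a consonant followed by a matra. Shaping into the displayed form, or the linear fallback for an X cell, is left to the font. The sketch below builds one table column for a given consonant. It assumes the twelve row headers are the Gujarati vowel signs listed in `MATRAS`. That is an assumption, because the row glyphs were lost in this copy. The function names are hypothetical.

```python
import unicodedata

# Assumption: the twelve table rows are these Gujarati vowel signs (matras),
# in Unicode order: AA, I, II, U, UU, VOCALIC R, CANDRA E, E, AI, CANDRA O, O, AU.
MATRAS = ["\u0ABE", "\u0ABF", "\u0AC0", "\u0AC1", "\u0AC2", "\u0AC3",
          "\u0AC5", "\u0AC7", "\u0AC8", "\u0AC9", "\u0ACB", "\u0ACC"]


def combine(consonant: str, matra: str) -> str:
    """Store a consonant-matra combination as the plain sequence consonant + matra.

    Only single consonant letters (KA..HA range) are accepted; ligatures that
    the tables list alongside consonants are outside this sketch.
    """
    if len(consonant) != 1 or not ("\u0A95" <= consonant <= "\u0AB9"):
        raise ValueError(f"not a Gujarati consonant: {consonant!r}")
    if unicodedata.category(matra) not in ("Mn", "Mc"):
        raise ValueError(f"not a combining vowel sign: {matra!r}")
    return consonant + matra


def table_column(consonant: str) -> list[str]:
    """All combinations for one consonant, i.e. one column of a table."""
    return [combine(consonant, m) for m in MATRAS]


if __name__ == "__main__":
    print(" ".join(table_column("\u0A95")))  # column for KA
```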
6.3.2.1 Consonant and Matra combinations.
This set covers simple concatenations of a Consonant and a Matra.
Consonant and Matra combinations Set 1
  3    -    "    O  ^     
t  3t  t  -t  t  X  t  Ot    t  X 
l  l3  l  l-  l  X  l  lO  l^  l  X 
l  3l  l  -l  l  X  l  Ol    l  X 
          X  X  O    X  X 
  3      -     X     O   ^      X 
  3     -     X    O   ^     X 
  3      -      X     O  ^     X 
t  3t  t  -t  t  X  t  Ot  ^t  t  X 
t  3t  t  -t  t  X  t  Ot  ^t  t  X 
  3      -     X     O   ^    X 
  3   X  -     X     O   ^      X 
t  3t  X  -t  t  X  t  Ot  ^t  t  X    
Remark 1: "  and  are rarely used as the first members of clusters.
Consonant and Matra combinations Set 2
This set continues Set 1 and shows further consonant and Matra combinations.
  8    '  8  Q  d         
t  8t  t  't  8t  Qt  dt  t  t  t  t 
l  l8  l  l'  l8  lQ  ld  l  l  l  l 
l  8l  l  'l  8l  Ql  dl  l  l  l  l 
  8  X  X  X  X    X  X     
  8      '   8     d          
  8    '  8    d            
  8     '   8   Q  d           
  8      '   8   Q   d             
t  8t  t  't  8t  Qt  dt  t  t  t  t 
t  8t  t  't  8t  X  dt  t  t  t  t 
  8      '   8   X  X     X  X   
t  8t  t  't  X  X  dt  t  X  X  t  
Consonant and Matra combinations Set 3
This set continues Set 2 and shows further consonant and Matra combinations.
      >        -            C      O 
t  t  t  >t  t  t  t  -t  t  t  t  t  t  Ct  t  t  Ot 
l  l  l  l>  l  l  l  l-  l  l  l  l  l  lC  l  l  lO 
l  l  l  >l  l  l  l  -l  l  l  l  l  l  Cl  l  l  Ol 
    X        X  X         X    X  X  X  X      
  >                          C         X 
     
  >                         C        X 
        >          -               C         O 
       >           -                 C        X 
t  t  t  >t  t  t  t  -t  t  t  t  t  t  Ct  t  t  Ot 
t  t  t  >t  t  t  t  -t  t  t  t  t  t  Ct  X  t  X 
        >   X     X  -           X     C  X  X  X 
t  t  t  >t  X  t  t  -t  t  t  t  X  t  Ct  X  X  X 
 
 
   
 
 
6.3.2.2 Consonant and Matra + Nasal combinations.
This set covers combinations of a Consonant, a Matra and a Nasal marker.

Consonant and Matra + Nasal combinations - Set 1
 
  3    -    "    O  ^     
  3      -      X     O   ^     X 
t   3t  t  -t  t  X  t  Ot     t  X 
l   l3  l  l-  X  X  l  X  l^  l  X 
l   3l  l  -l  l  X  l  Ol    l  X 
  X  X  X  X  X  X  O       X  X 
  3        -      X     O    ^       X 
  3        -      X     O    ^      X 
  3      -      X     O   ^     X 
  3   X  X    X  X  X  X  X  X 
t   3    -    X    O  ^    X 
t   3    -    "    O  ^    X 
  3        -       "       O   ^       X 
t   3    -    "    O  ^     
   
 
 
Consonant and Matra + Nasal combinations - Set 2

This set continues Set 1 above and shows combinations of a Consonant, a Matra and a Nasal marker.
  8    '  8  Q  d         
  8      '   8   Q   d             
t  8t  t  't  8t  Qt  dt  t  t  t  t 
l  l8  l  l'  l8  lQ  ld  X  l  l  l 
l  8l  l  'l  8l  Ql  dl  l  l  l  l 
  X  X  X  X  X  X  X  X  X  X 
  8       '   8      d               
  8      '   8     d              
  8      '   8   Q   d              
  8      '   8   Q   d              
t  8    '  8  X  d         
t  8    '  8  Q  d         
  8        '    8    Q    d                
t  8    '  8  Q  d         
 
 
 
Consonant and Matra + Nasal combinations - Set 3

This set continues Set 2 above and shows combinations of a Consonant, a Matra and a Nasal marker.
 
      >        -            C    
7
  O
8
 
        >           -                C      X  X 
t  t  t  >t  t  t  t  -t  t  t  t  t  t  Ct  t  t  Ot 
l  l  l  l>  l  l  X  l-  l  l  l  l  l  lC  l  X  X 
                                                 
7 Inserted by the experts, although this is not a single consonant but a ligature.
8 Inserted by the experts, although this is not a single consonant but a ligature.
 
l  l  l  >l  l  l  X  -l  l  l  l  l  l  Cl  l  X  X 
  X  X      X  X  X  X           X  X  X  X  X 
      
   >                                 C        X  X 
      
   >         X                     C       X  X 
       >        X  -                C     X  X 
        >   X     X  -                 C      X  X 
t      >      X  -            C    X  X 
t      >      X  -            C    X  X 
          >             -                   C        X  X 
t      >        -            C    X  X 
 
 
Consonant and Matra + Nasal combinations: With Chandrabindu
Since Chandrabindu is rarely used in Gujarati, the experts have deemed these combinations invalid.
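The nasal combinations above can be built and checked in stored text. This minimal sketch assumes the Unicode storage order consonant + matra + nasal marker, with anuswar at U+0A82 and Chandrabindu at U+0A81. By default it rejects Chandrabindu, following the experts' ruling above. `nasalize`, `nasal_order_ok` and the `strict` flag are hypothetical names.

```python
# Code points assumed from the Unicode Gujarati block.
ANUSWAR = "\u0A82"       # GUJARATI SIGN ANUSVARA
CHANDRABINDU = "\u0A81"  # GUJARATI SIGN CANDRABINDU
MATRAS = set("\u0ABE\u0ABF\u0AC0\u0AC1\u0AC2\u0AC3\u0AC5\u0AC7\u0AC8\u0AC9\u0ACB\u0ACC")
NASALS = {ANUSWAR, CHANDRABINDU}


def nasalize(consonant: str, matra: str = "", marker: str = ANUSWAR,
             strict: bool = True) -> str:
    """Build consonant + optional matra + nasal marker, in storage order."""
    if matra and matra not in MATRAS:
        raise ValueError(f"not a matra: {matra!r}")
    if marker not in NASALS:
        raise ValueError(f"not a nasal marker: {marker!r}")
    if strict and marker == CHANDRABINDU:
        # The experts deem Chandrabindu combinations invalid for Gujarati.
        raise ValueError("Chandrabindu combinations are treated as invalid")
    return consonant + matra + marker


def nasal_order_ok(text: str) -> bool:
    """False if a nasal marker is stored before a matra instead of after it."""
    return not any(a in NASALS and b in MATRAS for a, b in zip(text, text[1:]))


if __name__ == "__main__":
    kaa_nasal = nasalize("\u0A95", "\u0ABE")  # KA + AA matra + anuswar
    print(kaa_nasal, nasal_order_ok(kaa_nasal))
```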
   
 
 
6.3.3. The Ligature Set of Gujarati.
Gujarati has a large set of ligature forms. These are combinations of Consonant+Halanta+Consonant (CHC), CHCHC, or, more rarely, CHCHCHC. The most frequent CHC combinations are arranged as a matrix: the abscissa (horizontal axis) gives the consonant that completes the ligature, and the ordinate (vertical axis) gives the consonant that begins the ligature and is followed by a halanta.
As in 6.3.2, the ligature sets are divided as follows:
6.3.3.1 CHC (in a matrix)
6.3.3.2 CHCHC
6.3.3.3 CHCHCHC
6.3.3.1. CHC (combination of two consonants)
As with the Consonant+Matra combinations, these ligatures are presented in three sets. Many slots are marked with an X, showing that the experts have deemed such a ligature not possible in the language. In these cases, however, the font developer should assume that the ligature is linear in nature.
The following set shows combinations of two consonants. To find how a particular combination is formed, select the first consonant from the first column and the second from the first row. For example, the combination of consonants 3 and 3 is the ligature .
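In stored text, each matrix cell is the first consonant (first column), a halanta, and the second consonant (first row). This is a minimal sketch, assuming the Unicode encoding with halanta at U+0ACD. `make_cluster` and `linear_form` are hypothetical names. `linear_form` uses the Unicode convention that a ZERO WIDTH NON-JOINER (U+200C) after a virama requests the visible-halant form. That is one way to ask for the linear rendering the experts prescribe for X slots. In the font-developer sense, the fallback itself remains the font's job.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA (halanta)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def make_cluster(*consonants: str) -> str:
    """Join consonants with halanta: CHC for two, CHCHC for three, and so on."""
    if len(consonants) < 2:
        raise ValueError("a ligature needs at least two consonants")
    return VIRAMA.join(consonants)


def linear_form(cluster: str) -> str:
    """Request the linear form: ZWNJ after each halanta keeps the halanta visible."""
    return cluster.replace(VIRAMA, VIRAMA + ZWNJ)


if __name__ == "__main__":
    ksha = make_cluster("\u0A95", "\u0AB7")  # KA + halanta + SSA
    print(ksha, linear_form(ksha))
```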
CHC (combination of two consonants) - Set 1
  3    -    "    O  ^     
g    3  X  X  X  3  3O  X  3  X 
  X    X  X  X  X  X  ^  X  X 
_  X  X  --  -  X  X  X  -^  -  X 
q  X  X  X    X  X  X  X  X  X 
  X  X  X  X  X  X  X  X  X  X 
_  X  X  X  X  X    O  X  X  X 
  X  X  X  X  X  X    X  X  X 
_  X  X  X  X  X  X  X  ^    X 
(  X  X  X  X  X  X  X  X    X 
  X  X  X  X  X  X  X  X  X  X 
  3  X  X  X  X    X  X  X  X 
 
 
  X  X  X  X  X  X  X  X  X  X 
{  X  X  {-  X  X  X  X  {^  {  X 
q  X  X  X  X  X  X  X  X  X  X 
_  X  X  X  X  X  X  X  X  X  X 
q  3    X  X  X  X  X  X    X 
q  X  X  X  X  X  X  X  X  X  X 
q  X  X  q  -  q  X  X  X  X  X  X 
q  X  X  X  X  X  X  X  X  X  X 
_  3  X  -  X  X    X  ^    X 
}  3  X  X  X  X    X  X  X  X 
g  X  X  X  X  X  X  X  ^  X  X 
  X  X  X  X  X  X  X  >^  >  X 
  X  X  X  X  X  X  X  X  X  X 
  X  X  X  X  X  X  X  X  X  X 
q  X  X  X  X  X  X  X  X  X  X 
  3  X  -  X  X    X  ^  X  X 
Q  X  X  X  X  X  X  X  X  X  X 
_  X  X  X  X  X  X  X  -^  -  X 
  -3  -  X  X  X  -  X  X  X  X 
j  3    -  X  X  X  X  X  X  X 
  -3  -  X  X  X  -  X  -^  X  X 
  X  X  X  X  X  X  X  X  X  X 
 
   
 
 
CHC Set 2:
The following set shows combinations of two consonants. To find how a particular combination is formed, select the first consonant from the first column and the second from the first row. For example, the combination of consonants g and 8 is the ligature 38.

CHC (combination of two consonants) - Set 2
  8    '  8  Q  d         
g  38  X  X  X  X  3d  3  X  X  3 
  X  X  X  X  X  d  X  X  X  X 
_  X  X  X  X  -Q  X  X  -  -  - 
q  X  X  X  X  X  X  X  X  X   
  X  X  X  X  X  X  X  X  X  X 
_  X  X  X  X  X  -d  X  X  X  X 
  X  X  X  X  X  X  X  X  X  X 
_  X  X  X  X  X  X  X  X  X   
(  X  X  '  X  X  X  X    X  X 
  X  X  X  X  X  X  X  X  X  X 
         '  X  X  X  X  X  X  X 
  X    X  X  X  X  X  X  X  X 
{  X  X    {  8   X  X  X  {  X  X 
q  X  X  X    X  X  X  X  X  X 
_  X  X  X  X  X  X  X  X  X  X 
q  X  X  X  X  X      X  X   
q  X  X  X  X  X  X    X  X  X 
q  X  X  X  X  X  X  X  q    q     X 
q  X  X  X  X  X  X  X  X  -  - 
_  8  X  '  X  X  d          
}  8  X  X  X  X  d  X  X  X   
 
 
g  8  X  X  X  X  d    X  X  X 
  X  X  >'  X  X  X  X  >  >  > 
  X  X  X  X  X  X  X  X  X  X 
  X  X  X  X  Q  d  X  X  X   
q  X  X  X  X  X  X  X  X  X  X 
  8  X  '  X  X  d        X 
Q  X  X  X  X  X  X  X  X  X  X 
_  X  X  -'  X  X  X  X  X  X  X 
  X  X  X  X  X  -d  X  X  X  - 
j      X  X  Q  X  X  X  X  X 
  -8  X  -'  X  X  -d  -  -  X  - 
  X  X  X  X  X  X  X  X  X  c  
 
     
CHC Set 3:
The following set shows combinations of two consonants. To find how a particular combination is formed, select the first consonant from the first column and the second from the first row. For example, the combination of consonants g and  is the ligature .
CHC (combination of two consonants) - Set 3
      >        -            C 
g        X  X  3  3  z  3  3  3  X  3  X 
  X  X  X  X        X      X    X 
_  -  X  ->  -  -  -    -  -  X  X  -  X 
q  X  X  X  X  X      X    X  X  X  X 
  X  X  X  X  {  X  X  X  X  X  X  X  X 
_  X  X  X  X  X    >  X  X  X  X  X  X 
  X  X  X  X  X    O   X    X  X  X  X 
_  X  X  X  X        X    X  X  X  X 
 
 
(  X  X  >  X        X  X  X  X  X  X 
  X  X  X  X  X  X  X  X  X  X  X  X  X 
    X  X  X        X      X    X 
  X  X  X  X  X    X  X  X  X  X  X  X 
{  X  X  X  {  {      X  {  X  X  {  {C 
q  X  X  X  X  X    )  X  X  X  X  X  X 
_  X  X  X  _  _  _  X  X  X  X  X  X  X 
q      >  X      ?        X    X 
q  X  X  X  X  X      X  X  X  X  X  X 
q  q  X  q>  q      _  X    X  X  X  qC 
q  X  X  X  X  -  -  >  X  -  X  X  X  X 
_  X    X  X      X        X    C 
}      X  X  X        X    X    C 
g  X  X  X  X  X    g    X  X  X  X  X 
  X  X  >>  >  X  >    >  >  X  X  >  >C 
  X  X  X    X      X    X  X  X  X 
      >            X    X    C 
q  X  X  X  X  X  -  X  X  X  X  X  X  X 
      >        X        X    C 
Q  X  X  X  X  X  !  X  X  X  X  X  X  X 
_  X  X  X  X  X  -    X  -  X  X  X  -C 
  -  -  ->  -  -  -        -  X  X  -C 
j      X  X      X  X    X    X  X 
  -  -  ->  X  -  -    -  -  X  X  -  X 
  X  X  X  X        /
 
  X  X    cC 
                         
 
 
 
6.3.3.2 CHCHC (combination of three consonants)
These are less frequent than the CHC combinations, so only the major ones are listed below. With a few exceptions, they are mainly linear in nature.
Q            O     
z                 
   -   -   q-   _     
   O        O   _       
_    -    -    -           
   z            -z  
-      -        
6.3.3.3 CHCHCHC (combination of four consonants)
This cluster is rare in most languages, and the experts have deemed that it is not found in Gujarati.
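Taken together, 6.3.3.1 to 6.3.3.3 suggest a simple validity check for stored clusters: two consonants (CHC) or three (CHCHC) are expected, and four or more are not. This is a minimal sketch, assuming clusters are stored as single consonant code points separated by halanta (U+0ACD). It does not verify that each part is actually a consonant. `cluster_kind` is a hypothetical name.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA (halanta)


def cluster_kind(cluster: str) -> str:
    """Classify a halanta-joined consonant cluster as 'CHC' or 'CHCHC'."""
    parts = cluster.split(VIRAMA)
    if len(parts) < 2 or any(len(p) != 1 for p in parts):
        raise ValueError(f"not a consonant cluster: {cluster!r}")
    if len(parts) == 2:
        return "CHC"
    if len(parts) == 3:
        return "CHCHC"
    # Four or more consonants: the experts deem these not to occur in Gujarati.
    raise ValueError(f"CHCHCHC or longer cluster not expected: {cluster!r}")


if __name__ == "__main__":
    print(cluster_kind("\u0AB8\u0ACD\u0AA4\u0ACD\u0AB0"))  # SA + TA + RA
```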
 
6.3.4 The Collation Order of Gujarati.
Collation is one of the most important features of a script grammar. It determines the order in which a given culture indexes its characters. This is best seen in a dictionary, where words are sorted and arranged in a specific order for easy search. Within a given script, each allo-script may have a different sort order. Thus in Devanagari the conjunct glyph  is sorted along with , since the first letter of that conjunct is , and on a similar principle  is sorted along with . Different scripts admit different sort orders, and for all high-end NLP applications, sorting is crucial to ensure that applications index data according to the cultural perception of the community. In quite a few States, the sort order is clearly defined by the statutory bodies of that State, so it is crucial that such a sort order be ascertained and incorporated in the script grammar.
In the case of Gujarati, the following is the traditional sort order as determined by the experts. The order given below is pertinent to sorting by a computer program and is compliant with the Unicode CLDR (Common Locale Data Repository).
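A sort order like this is usually applied by mapping each word to a sequence of ranks. Multi-character elements such as conjuncts are matched before their first letter. This is a minimal sketch, not the experts' order, since most of the glyphs of that order were lost in this copy. `ORDER` is a hypothetical, illustrative sequence: the traditional vowels, then the consonants, then the conjuncts KSSA and JNYA placed after LLA as in the traditional alphabet. Characters not in `ORDER`, such as matras and signs, fall back to code point order after all listed elements.

```python
# Hypothetical illustrative order, not the order determined by the experts.
VOWELS = list("અઆઇઈઉઊઋએઐઓઔ")
CONSONANTS = list("કખગઘઙચછજઝઞટઠડઢણતથદધનપફબભમયરલવશષસહળ")
CONJUNCTS = ["\u0A95\u0ACD\u0AB7", "\u0A9C\u0ACD\u0A9E"]  # KA+halant+SSA, JA+halant+NYA
ORDER = VOWELS + CONSONANTS + CONJUNCTS
RANK = {element: i for i, element in enumerate(ORDER)}
MAX_LEN = max(len(e) for e in ORDER)


def collation_key(word: str) -> tuple[int, ...]:
    """Map a word to ranks, matching the longest listed element first."""
    key = []
    i = 0
    while i < len(word):
        for size in range(min(MAX_LEN, len(word) - i), 0, -1):
            element = word[i:i + size]
            if element in RANK:
                key.append(RANK[element])
                i += size
                break
        else:
            # Unlisted characters sort after all listed elements, by code point.
            key.append(len(ORDER) + ord(word[i]))
            i += 1
    return tuple(key)


if __name__ == "__main__":
    words = ["\u0A95\u0ACD\u0AB7\u0AAE\u0ABE", "\u0AB9\u0ABE\u0AA5", "\u0A95\u0AAE\u0AB3"]
    print(sorted(words, key=collation_key))
    print(sorted(words))  # plain code point order, for comparison
```

With this key, a word beginning with the KSSA conjunct sorts after a word beginning with HA. Plain code point sorting places it directly after the other KA word. A production system would more likely use a CLDR-based collator, for example through ICU, than a hand-built key like this one.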
 
 
 
 
         -   -         U   3       
-   -   -   -   -   -  3        -       "   
   O   ^          8      '   8  Q    
d                       >       
   -                      C       
t   l   l                     t   t   t 
 
 
(1) The characters marked 1 are used only for Sanskrit loanwords.
 
   
 
 
7.  REFERENCES 
1.  http://www.unicode.org 
2.  ISCII-91 (Indian Script Code for Information Interchange, 1991)
 
   
 
 
8.  ANNEXURES 
Annexure 1: Names of experts who have contributed to the script grammar 
   
 
 
Annexure 2: Unicode Table of Gujarati (9)

9 The Unicode chart provided is for version 5.1, since the Script Grammar was prepared at that time. Later versions of Unicode bring no considerable change to the script grammar, apart from the addition of the Indian Rupee Sign (U+20B9) in Unicode 6.0.
 
 