1                                   Introduction
2                         Language Grammar and Automata
 3
 4   The formal language and automata theory based on three fundamental concepts Language, Grammar and
 5   automata.
 6   Language:
 7   We are familiar with the notion of natural language, such Hindi, Bengali. The word “Language” means a
 8   medium through which we can express our views, thoughts and ideas. But it is not sufficient for the
 9   definition of formal languages. We need a precise definition of the term,
10    We start with a non empty finite set of symbols “∑” called alphabet.
11   If we consider an alphabet ∑ = {a,b} where the elements of the alphabets are called symbols, if no other
12   rules are specified then the string w contains the element of ∑ belongs to the language L(G).
13   L(G)={w:wε∑={a,b}}
14   For example w1=ab
15                 W2=aab
16                 W3=ba
17                 W4=abab
18                 W5=aabb
19   Where w1, W2, W3, W4, W5ε L(G)
20   If we specify a rule in L(G)
21        1. Equal number of a= Equal number of b.
22   Then w1, W4, W5ε L (G) and W2, W3 does not belongs to L (G)
23   If we specify two rules in L (G)
24        1. Equal number of a= Equal number of b.
25        2. All a should precede all b.
26   Then w1, W5ε L (G) and W2, W3, W4 does not belongs to L (G)
27   And L(G)={ an bn:n≥1}
28   Examples of languages:
29        • The set of legal English words
30        • The set of legal C programs
31   The concatenation of two strings w1 and W2 is the strings obtained by appending the symbols of W2 to the
32   right end of w1 that is
33   w1.W2= abaab.
34   The reverse of a string is obtained by writing the symbols in the reverse order. That is if,
35   w1=ab then w1R=ba.
36   The length of a string w, denoted by |w|, is the number of symbols in the string. For example |w1|=2.
37   Sometimes we need to refer λ, the empty string, which is a string with no symbols at all. The relation of λ
38   is that,
39    |λ|=0,
40   wλ =w= λw, hold for all w.
41   Any string of consecutive characters in some w is said to be the substring of w. For example W6= W4. W5.
42   W4, W5 are the substring of W6. W4 is the prefix of W6 and W5 is the suffix of W6.
                                                         1
43   If ∑ is the alphabet, ∑ * is the set of strings obtained by concatenating zero or more symbols from∑. The
44   set ∑ * always contain λ and it is called star closure. ∑+ denotes it excludes λ and it is called positive
45   closure. If it is given ∑ it is to be assumed that it is star closure and contains λ. We can now define,
46   ∑+= ∑*-{ λ}
47   A language is defined very generally as of subset of ∑*. A string in a language L will be called a sentence
48   of L. Note that ∑* is countably infinite set. Hence ∑*=∑0U∑1 U∑2……. and ∑+=∑1 U∑2U∑3…….
49   For example, if ∑={a, b}, the following are the language over ∑
50   L1= {λ, a, b}
51   L2= {ab, aabb, aaabbb …}
52   L3= {w ε∑*| number of a’s and number of b’s in w is equal}
53   In the above example L1 is finite language but L2 and L3 is infinite language.
54   Grammar:
55   In 1959, Noam Chomsky tried to give a mathematical definition for grammar. He defined four types of
56   grammar viz., type 0, type 1, type2 and type 3. At the same time, the programming language ALGOL was
57   being considered. It was considered as a block-structured language and a grammar was required which
58   could describe all syntactically correct programs. The definition given was called Backus-Naur Form
59   (BNF). This definition found to be same as the definition given by Chomsky for type2 grammar. A
60   grammar for the English language tells us whether a particular sentence is well-formed or not. A typical
61   rule for grammar is “a sentence can consist of a noun phrase followed by a predicate.”
62   Consider the following two sentences and their parse tree,
                       Sentence
             Noun Phrase      Verb phrase
        Article            Noun                 running
                                   is
           The             Dog
63
64   Fig.1.1 “The dog is running”
                             Sentence
      Noun Phrase                                         Verb phrase
        PropNoun                        verve                            Noun Phrase
         Kolkata                          is                   Article    Adjective    Noun
                                                                  a       beautiful    city
65
66   Fig.1.2 “Kolkata is a beautiful city”
                                                                  2
 67   The rules of the grammar for a sentence1 can be written as follows:
 68   Sentence<Noun Phrase>< Verb Phrase>
 69   Noun Phrase<Article><Noun>
 70   Verb Phrase<Verb><Noun Phrase>
 71   The derivation of the grammar is as follows
 72   Sentence<Noun Phrase>< Verb Phrase>
 73   Sentence<Article><Noun><Verb Phrase>
 74   SentenceThe <Noun><Verb Phrase>
 75   SentenceThe dog <Verb Phrase>
 76   SentenceThe dog <Verb><Noun Phrase>
 77   SentenceThe dog is running
 78   The rules of the grammar for a sentence2 can be written as follows:
 79   Sentence<Noun Phrase>< Verb Phrase>
 80   Noun Phrase<PropNoun>
 81   Verb Phrase<Verb><Noun Phrase>
 82   Noun Phrase<Article><Adjective><Noun>
 83   Sentence<Noun Phrase>< Verb Phrase>
 84   Sentence <PropNoun>< Verb Phrase>
 85   SentenceKolkata < Verb Phrase>
 86   SentenceKolkata <Verb><Noun Phrase>
 87   SentenceKolkata is <Noun Phrase>
 88   SentenceKolkata is <Article><Adjective><Noun>
 89   SentenceKolkata is a <Adjective><Noun>
 90   SentenceKolkata is a beautiful <Noun>
 91   SentenceKolkata is a beautiful city.
 92   A grammar G is defined by the quadruple
 93   G= (V, T, S, P)
 94   Where V is the finite set of objects called variables,
 95   T is the finite set of objects called terminal symbols,
 96   SεV is a special symbol called start symbol,
 97   And P is a finite set of productions.
 98   The production rule is the heart of the grammar; they specify how the grammar transforms one string to
 99   another. This will be denoted by “”.In this book we will assume that all production rules are of the form
100   xy and yz where x, y ε (VUT)+ and z ε (VUT)*
101   Now if we have a string w1=uxv then it can be written as w2=uyv and then w3=uzv, or it can can be
102   directly written as w1=*=>w3, * indicates that some unspecified number of steps (including zero) can be
103   taken to derive w3 from w1.
104   Definition
105   Let G= (V, T, S, P) be a grammar. Then the set
106   L(G)={wεT*: S=*=>w3}
107   Is the language generated by G.
108   If wεL(G), then the sequence
                                                          3
109   S==> w1==> w2==> w3…………..==> wn==> w is a derivation of a sentence w. The strings S, w1, w2,
110   w3, wn which contains variables as well as terminals, are called sentential forms of the derivation. If w
111   contains terminal symbol then it is called sentence.
112   Now Let us discuss what should be the production rule for the language
113   Example 1:
114   L(G)={ an bn:n≥0}
115   If we consider a variable S and the production
116   SaSb
117   Sλ.
118   Then SaSbaaSbbaaaSbbbaaaλbbbaaabbb
119   So the grammar,
120   V=S
121   T=a,b
122   S=S
123   P= SaSb
124       Sλ
125   If L(G)={ an bn:n≥1}
126   SaSb
127   Sab.
128   Then SaSbaaSbbaaaSbbbaaaabbbbaaaabbbb
129   So the grammar,
130   V={S}
131   T={a,b}
132   S=S
133   P= SaSb
134       Sab
135   Example 2:
136   Consider the language,
137   L(G)={ an bn+1:n≥0}
138   The production rule is,
139   SaSb
140       Sb
141   The production rule can be written in the following manner also,
142   SAb
143   AaAb/ λ
144   In the 1st case number of production require is 2 and in the 2nd case number of production required is 3.
145   Hence we obviously choose the 1st case.
146   Example 3:
147   L(G)={ an b2n:n≥0}
148   In this problem b will appear twice than a.
149   So the production of the grammar is
150   SaSbb
151       Sλ
152   Example 3:
                                                         4
153   L(G)={ an+2 bn:n≥0}
154   In this problem two more “a” requires than equal no of a and b.
155   So the production of the grammar is
156   SaSb
157       Saa.
158   Example 4:
159   L(G)={ an bn-3:n≥3}
160   This problem is quit similar to the previous one.
161   SaSb
162       Saaa.
163   Example 5:
164   L(G)={ an bm:n≥0, m>n}
165   In this problem three condition have to be considered.
166   a)Number of a may be “0” or more than “0” but number of b appears more than “1”
167   b)Equal number a= Equal number b.
168   c)More “b” appears than “a”
169   So the production rule is,
170   SaSb/Bb
171       Bb/ λ
172   V={S, B}
173   T= {a, b}
174   S= {S}
175   P= SaSb/Bb
176       Bb/ λ
177   Example 6:
178   Find the grammars for ∑= {a,b} that generate the sets of all strings with exactly one ”a”
179   SAaA
180   AAb/ λ
181   Example 7:
182   Find the grammars for ∑= {a,b} that generate the sets of all strings with atleast one ”a”
183   SAaA
184   AAa/Ab/ λ
185   Example 8:
186   Find the grammars for ∑= {a,b} that generate the sets of all strings with no more than three “a”s.
187   SAaA/ AaAaA/AaAaAaA
188   AAb/ λ
189   Example 9:
190   Find the grammars for ∑={a,b} that generate the sets of all strings with at least three “a”s.
191   S AaAaAaA
192   AAa/Ab/ λ
193   Example 10:
194   Find grammars for the languages on ∑= {a} and L(G)={w:|w|mod3=0}
195   SaaaS/λ
196   Find grammars for the languages on ∑= {a} and L(G)={w:|w|mod3>0}
                                                           5
197   This problem can be divided into two sub problem |w|mod3=1 and |w|mod3=2
198   |w|=1 can be represented as 1+ 0X3           |w|=2 can be represented as 2+ 0X3
199   |w|=4 can be represented as 1+ 1X3           |w|=5 can be represented as 2+ 1X3
200   |w|=7 can be represented as 1+ 2X3           |w|=8 can be represented as 2+ 2X3
201   |w|=10 can be represented as 1+ 3X3          |w|=11 can be represented as 2+ 3X3
202   ……………………………..                                 ………………………………….
203   Hence the production rule is
204   SS1/S2
205   S1aaaS1/a
206   S2aaaS1/aa
207    Example 11:
208   Give a simple description of the language generated by the grammar with the production
209   SaA
210   AbS
211   S λ
212   Let’s make an arbitrary derivation of the grammar
213   SaAabSabaAababSababλabab
214   Hence the language of the grammar is (ab)n
215   Example 12:
216   What language does the grammar with this production generate?
217   SaA
218   AB
219   BAa
220   Let us make an arbitrary derivation of the grammar
221   SaAaBaAaaBa
222   This derivation will not stop ever. Hence this type of language is called infinite language.
223   Example 13:
224   Let L(G)={ab, aa, baa}. Which of the following strings are L*: abaabaab, ababbaa, baababaa.
225   Example 14:
226   Are the two grammars with respective productions
227   SaSb/ab/λ
228   And
229   SaAb/ab
230   AaAb/ λ
231   Equivalent or not where S is the start symbol.
232   Let us make a derivation of the two production with minimum condition.
233   For the first production
234   S λ
235   For the second production
236   Sab
237   As the two grammars producing separate terminals in the minimum condition, hence they are not
238   equivalent.
239   Example 15:
240   Find the language of the grammar with the production
                                                   6
241   SSS
242   S λ
243   SaSb
244   SbSa
245   The language is L (G) = {na(w)= nb(w)}
246   Where na(w) is the number of a in w and nb(w) is the number of b in w.
247   Example 16:
248   Find the grammar of the language
249   L(G) ={ an bn am bm:n≥0, m≥0}
250   The production rule of the grammar is
251   SSS
252   SaSb/λ
253   Automata
254   The Automaton is an abstract model of any digital computer. Any smallest unit of a machine or computer
255   having a basic job to perform is called automaton and the plural form is called automata. There are
256   different varieties of such abstract models (also called models of computation) which can be
257   defined mathematically. Some of them are as powerful in principle as today's real computers,
258   while the simpler ones are less powerful. We will study the simpler machines as they are easier
259   to introduce some formalism used in theory. The simplest computational model is called a finite state
260   machine or a finite automaton. The basic block diagram of a finite state machine is shown in fig 1.3. It
261   consists of an input file, which the automaton can read but not change. The input file is divided into cells,
262   each of which can hold one symbol. The input mechanism can read the input file from left to right, one
263   symbol at a time. The input mechanism can also sense the end of the input string. The automaton can
264   produce output of the same form. It contains a temporary storage device, consisting of unlimited number
265   of cells, each capable of holding symbol corresponding to an alphabet (it is not necessary that alphabet
266   should be stored in the storage device). The automation can read and change the contents of the storage
267   cells. Finally the automation has a control unit, which can be in any of the in any of a finite number of
268   internal states, and which can change states in a defined manner.
                                                   Input File
                                  0   0    1   0     1      0       1   0   0   0
                                                                                                       0
                                                                                                       0
                                                                                                       1
                                                     Control Unit
                                                                                        Storage Unit
                                                                                                       1
                                                                                                       0
                                                                                                       0
                                                                                                       0
                                      0    0   0     1      0       1   0   0   0   0
269                                                      Output File
270                           Fig 1.3: Schematic representation of a general automation
                                                                7
271   The concept of a finite automation appears to have arise in the 1943 paper. A logical calculus of the ideas
272   immanent in nervous activity, by Warren McCullock and Walter Pitts. In 1951 Kleene introduced regular
273   expressions to describe the behavior of finite automata. In 1959, Dana Scott and Michael Rabin
274   introduced non-deterministic automata and showed the surprising theorem that they are equivalent to
275   deterministic automata.
276   Let us consider a finite state machine that takes input bits serially and produce the output “1” when the
277   last four inputs are 0101 else it produce the output “0” as shown in fig 1.3. Input file contains the input
278   0010101000 and the control unit read the input from left to right and produce the output serially
279   0000101000. Storage unit store the contents of the input file, alphabets.
280   At any given time, the control unit is in some internal state, and the input mechanism is scanning a
281   particular symbol on the input file. The internal state of the control unit at the next transition is
282   determined by the next state or transition function. The transition function gives the next state in terms of
283   the current state, the current input symbol, and the information currently in the temporary input storage.
284   Automata generally of two types accepter and transducer. When the automata produces the output in
285   simple “yes” or “no” or “0” or “1” then it is called accepter. The sequence detector we discussed earlier
286   is an example of accepter. A more general automata, capable of producing strings of symbols (number or
287   character) as output, is called a transducer. Example of a transducer is counter, shift register.
288
289
290
291
292
293
294
295