0% found this document useful (0 votes)
17 views11 pages

LANGUAGES

Uploaded by

leonelbulawan485
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views11 pages

LANGUAGES

Uploaded by

leonelbulawan485
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Republic of the Philippines

Sorsogon State University


Bulan Campus

MODULE 5.1

LANGUAGES

CS 211
DISCRETE STRUCTURES 2
1st Semester, S.Y.2023 - 2024
BSCS II

Engr. Rey C. Rodrigueza, MIT


Associate Professor IV
Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

1 Introduction
This module delves on the fundamental principles of formal languages and
the operations that can performed on languages. According to logician
D.I.A.Cohen[1], the word ”formal” refers to the fact that all the rules
for the language are explicitly stated in terms of what strings of symbols
can occur. No liberties are tolerated, and no reference to any ”deeper
understanding” is required. Language will be considered solely as symbols
on paper and not as expressions of ideas in the minds of humans. In this
basic model, language is not communication among intellects, but a game
of symbols with formal rules. the term ”formal” emphasizes that it is the
form of the string of symbols we are interested in, not the meaning.

2 Objectives
After completing this module, you will be able to:

• Produce the words that form the language that can be generated
from an alphabet.
• Perform operations on languages such as union, concatenation, Kleene-
star, complementation and reversal.

3 Discussion of Content
3.1 Alphabets, Strings and Languages
3.1.1 Alphabets
An alphabet is any finite set of characters.

Here are some examples for such alphabets:


(i) Σ1 = {0, 1}
(ii) Σ2 = {a, b, c}
(iii) Σ3 = {0, 1, #}
(iv) Σ4 = {a, . . . z, A, . . . Z}: all the letters in the English language.
(v) Σ5 = ASCII - this is the standard encoding schemes used by comput-
ers mappings bytes (i.e., integers in the range 0 to 255) to characters.
As such, a is 65, and the space character ∆ is 32.

Author: Engr. Rey C. Rodrigueza, MIT 1


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

(vi) Σ6 = {for, while, if, main, do, else, break, continue, return}.

3.1.2 Strings
A string over an alphabet Σ is a finite sequence of characters from Σ.

Some sample strings with alphabet (say) Σ = {a, b, c} are abc, baba, and
aaaabbbbccc.

The length of a string x is the number of characters in x, and it is denoted


by |x|. Thus, the length of the string w = abcdef is |w| = 6.

The empty string is denoted by ϵ, and it (of course) has length 0. The
empty string is the string containing zero characters in it.

The concatenation of two strings x and w is denoted by xw, and it is the


string formed by the string x followed by the string w. As a concrete ex-
ample, consider x = automata, w = theory and the concatenated strings
xw = automatatheory and wx = theoryautomata.

Naturally, concatenating with the empty string results in no change in the


string. Formally, for any string x, we have that xϵ = x. As such ϵϵ = ϵ.

For a string w, the string x is a substring of w if the string x appears


contiguously in w.
As such, for w = abcdef
we have that bcd is a substring of w,
but ace is not a substring of w.

A string x is a suffix of w if its a substring of w appearing in the end of


w. Similarly, y is a prefix of w if y is a substring of w appearing in the
beginning of w.
As such, for w = abcdef
we have that abc is a prefix of w,
and def is a suffix of w.

Here is a formal definition of prefix and substring.

Definition The string x is a prefix of a string w, if there exists a string


z, such that w = xz.

Author: Engr. Rey C. Rodrigueza, MIT 2


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

Similarly, x is a substring of w if there exist strings y and z such that


w = yxz.

3.1.3 Languages
A language is a set of strings. One special language is Σ∗ , which is the set
of all possible strings generated over the alphabet Σ∗ [1]. For example, if

Σ = {a, b, c}, then


Σ∗ = {ϵ, a, b, c, aa, ab, ac, ba, . . . , aaaaaabbbaababa, . . . }.

Namely, Σ∗ is the “full” language made of characters of Σ. Naturally, any


language over Σ is going to be a subset of Σ∗ .

Some abstract examples of languages[2].


1. L = {b, ba, baa, baaa, baaaa, . . . }.
2. The language of all strings consisting of n 0’s followed by n 1’s for
some n ≥ 0 : {ϵ, 01, 0011, 000111, . . . }.

3. The set of strings of 0’s and 1’s with an equal number of each:
{ϵ, 01, 10, 0011, 0101, 1001, . . . }.
4. The set of binary numbers whose value is a prime:
{10, 11, 101, 111, 1011, . . . }.

5. Σ∗ is a language for any alphabet Σ.


6. ϕ, the empty language, is a language over any alphabet.
7. {ϵ}, the language consisting of only the empty string, is also a lan-
guage over any alphabet. ϕ ̸= {ϵ}; the former has no strings and
the latter has one string.
8. The only important constraint on what can be a language is that
all alphabets are finite. Thus languages, although they can have an
infinite number of strings, are restricted to consist of strings drawn
from one fixed, finite alphabet.

In automata theory, a problem is the question of deciding whether a given


string is a member of some particular language.

Author: Engr. Rey C. Rodrigueza, MIT 3


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

Now, is the following a language?


{aa,ab,ba,ϵ}.

How about {aa,ab,ba,ϕ}. Is this a language?

Lexicographic ordering of a set of strings is an ordering of strings that


have shorter strings first, and sort the strings alphabetically within each
length. Naturally, we assume that we have an order on the given alphabet.

Thus, for Σ = {a, b}, the Lexicographic ordering of Σ∗ is

ϵ, a, b, aa, ab, ba, bb, aaa, aab, . . . .

The reversal wR of a string w is the sequence of the symbols of w but


in reverse order. So, (ALIBAT A)R = AT ABILA. Also, aR = a for
any symbol a, and εR = ε. A string is a palindrome if wR = w. The
following are examples of palindromes: ε, I, LEVEL, ROTATOR, abba,
and 0100010.

The nth power of a string w, denoted by wn , is the concatenation of


n w′ s. If w = ab over Σ = {a,b,c}, then w1 = ab = w, w2 = (ab)2 =
abab, w5 = ababababab and w0 = ϵ.

3.1.4 Operations on Languages


Since languages are sets, any set operation may be performed on lan-
guages. Let L1 = {0, 1}, L2 = {1, 11, 111} be languages over the alphabet
Σ = {0, 1}. Then
L1 ∪ L2 = {0, 1, 11, 111}.
L1 ∩ L2 = {1}.
L1 − L2 = {11, 111}, and
L1 × L2 = {(0, 1), (0, 11), (0, 111), (1, 1), (1, 11), (1, 111)}.

These operations apply even if the languages are infinite. For example, if
L3 = {(11)n | n ≥ 0} and L4 = {(111)n | n ≥ 0} then L3 ∪ L4 = {1n |
n is a multiple of 2 or 3, n ≥ 0}.

The concatenation of two languages L1 and L2 , denoted by L1 · L2 or


simply L1 L2 , is the set of all strings which are concatenations of a string
in L1 and a string in L2 . Formally, L1 ·L2 = L1 L2 = {xy | x ∈ L1 and y ∈
L2 }. For example, if L = {a, ba} and M = {1, 00, 111} then LM = {a1,

Author: Engr. Rey C. Rodrigueza, MIT 4


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

a00, a111, ba1, ba00, ba111}.

A language L to the nth power, denoted by Ln , is the set of all con-


catenations of n strings from L. For language L = {1, 01} over Σ = {0, 1},
L2 = {11, 101, 011, 0101},
L1 = L,
L0 = {ϵ},
L3 = {111, 1101, 1011, 0111, 10101, 01101, 01011, 010101}, etc.

Similarly, the alphabet Σ can be raised to the nth power. Σn is the


set of all strings over Σ that is of length n. If Σ = {0, 1}, then
Σ0 = {ϵ},
Σ1 = {0, 1},
Σ2 = {00, 01, 10, 11},
Σ2 = {001, 001, 010, 011, 100, 101, 110, 111}, and so on.

Note that Σ contain symbols, while Σ1 contains strings of length 1.


They look the same when written but are actually different sets [3].

The Kleene-star or Kleene-closure Σ∗ of the alphabet Σ is the union


of all the powers of Σ. For the set N of natural numbers,

[
Σ∗ = Σi = Σ0 ∪ Σ1 ∪ Σ2 ∪ . . .
i∈N

Σ∗ is the set of all strings of all lengths over Σ. The positive closure
Σ of Σ is the set of all non-empty strings over Σ. Σ+ = Σ∗ − {ϵ} is the
+

set of all strings over Σ of positive length. If Σ = {0,1}, Σ∗ is the set of


all (non-empty) bit strings.

Similarly, the Kleene-star L∗ of a language L is the union of all powers


of L. That is, for the set N of natural numbers,

[
L∗ = Li = L0 ∪ L1 ∪ L2 ∪ . . .
i∈N

Also, the positive closure of L is L+ = L1 ∪ L2 ∪ L3 ∪ . . . . Observe that


ϵ ∈ L if and only if ϵ ∈ L+ .

Author: Engr. Rey C. Rodrigueza, MIT 5


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

The complement of a language L over alphabet Σ is L = L′ = Σ∗ − L.



Σ and the empty language ϕ are complements of each other. For L5 =
{w | w begins with a over Σ = {a,b}, then
L5 = {w | w does not begin witha}
= {ϵ, b, ba, bb, baa, bab, bba, bbb, baaa, . . . }.

Note that the complement includes ϵ because ϵ is not in L5 . It is not ex-


actly the language of strings that begin with b. It is also not those strings
that end in b.

The reversal LR of a language L is the set of all reversals of strings


in L. That is, LR = {wR | w ∈ L}. For example, over Σ = {a, b}
the reversal of L5 above is LR
5 = {w | w ends with a}. The reversal of
L1 = {w | |w| ≤ 2} is itself. That is, LR
1 = L1 . The reversal of L4 =
{an bn | n is a natural number} is L4 = {bn an | n is a natural number}.
Clearly, |L| = LR for any language L.

3.1.5 Why should we care about languages?


Consider the language Lprimes that contains all strings over Σ = {0, 1, . . . , 9}
which are prime numbers. If we can build a fast computer program (or
an automata) that can tell us whether a string s (i.e., a number) is in
Lprimes , then we decide if a number is prime or not. And this is a very
useful program to have, since most encryption schemes currently used by
computers (i.e., RSA) rely on the ability to find very large prime numbers.

Let us state it explicitly: The ability to decide if a word is in a spe-


cific language (like Lprimes ) is equivalent to performing a computational
task (which might be extremely non-trivial). You can think about this
schematically, as a program that gets as input a number (i.e., string made
out of digits), and decides if it is prime or not. If the input is a prime
number, it outputs Yes and otherwise it outputs No. See figure below.

Author: Engr. Rey C. Rodrigueza, MIT 6


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

3.1.6 Strings and Programs


A text file (i.e., source code of a program) is a long one dimensional string
with special ⟨N L⟩ (i.e., newline) characters that instruct the computer
how to display the file on the screen. That is, the special ⟨N L⟩ characters
instruct the computer to start a new line. Thus, the text file

if x=y then
jump up and down and scream

Is in fact encoded on the computer as the string

if∆x=y∆then⟨N L⟩∆∆jump∆up∆and∆down∆and∆scream.

Here, ∆ denote the special space character and ⟨N L⟩ is the new-line char-
acter.

Program input and output can be consider to be files. So a standard


program can be thought of as a function that maps strings to strings.1
That is P : Σ∗ −→ Σ∗ . Most machines in this class map input strings
to two outputs: “yes” and “no”. A few automatas and most real-world
machines produce more complex output.

Author: Engr. Rey C. Rodrigueza, MIT 7


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

4 Self-Check Exercises
1. Consider the language S ∗ , where S = {a, b}. How many words does
this language have of length 2? of length 3? of length n?
2. Consider the language S ∗ , where S = {aa, b}. How many words does
this language have of length 5? of length 6? of length n? What can
be said in general?

3. Let S = {ab, bb} and let T = {ab,bb, bbb}. Show that S ∗ = T ∗ .


4. Let us define (S ∗∗ )∗ = S ∗∗∗ . Is this set bigger than S ∗ ? Is it bigger
than S?
5. Find the reversal of each of the languages below over the alphabet
Σ = {0, 1}:
(a) L1 = {ϵ, 1, 011010, 10001}
(b) L2 = {w | w is a palindrome }
6. Find the complement of each of the languages below over the alpha-
bet Σ = {0, 1}:
(a) L1 = {000, 001, 010, 011, 100, 101, 110, 111}
(b) L2 = {w | w ends with a 0 }

Author: Engr. Rey C. Rodrigueza, MIT 8


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

5 Problem Set
1. If a language L has k elements, what is the maximum value for |Ln |?
Give an example L and n where |Ln | is less than the maximum value
for |Ln |.
2. Let L1 = {w ∈ {0, 1}∗ | |w| ≤ 3} and L2 = {w ∈ {0, 1}∗ |
|w ends with a 1 }. Describe L1 ∪L2 , L1 ∩L2 , L1 −L2 , and L1 ×L2 .
3. Determine the nth power of L = {aa, b, bc} for n = 0,1,2,3.
4. Consider the language S ∗ , where S = {ab, ba}. Write out all the
words in S ∗ that have seven or fewer letters. Can any word in this
language contain the substrings aaa or bbb? What is the smallest
word that is not in this language?
5. Show that {0}∗ and {0, 1}∗ are countably infinite languages.
6. Let S = {ab, bb} and let T = {ab,bb, bbb}. Show that S ∗ ̸= T ∗ ,
but that S ∗ ⊂ T ∗ .

7. Find the reversal of each of the languages below over the alphabet
Σ = {0, 1}:
(a) L3 = {w | w has many 0’s as 1’s }
(b) L4 = {w | w every 1 in w is preceded by a 0 }

8. Find the complement of each of the languages below over the alpha-
bet Σ = {0, 1}:
(a) L3 = {w | the middle symbol of w is the symbol 1 }
(b) L4 = {w | every 1 in w is preceded by a 0 }

Author: Engr. Rey C. Rodrigueza, MIT 9


Automata Theory & Formal Languages Sorsogon State University
Module 5.1: Languages Bulan Campus

References
[1] D. I. Cohen, Introduction to Computer Theory, 2nd ed. Jones and
Bartlett Publishers, Inc., 1996.
[2] J. E. Hopcroft, R. Motwani, and J. D. Ullman, Introduction to
Automata Theory, Languages and Computation. Addison-Wesley.
http://www-db. stanford. edu/ullman/ialc. html, 2019.

[3] C. J. Delfinado, Automata Theory. C & E Publishing, Inc., 2016.

Author: Engr. Rey C. Rodrigueza, MIT 10

You might also like