0% found this document useful (0 votes)
24 views10 pages

Unit 1

Unit 1 covers the basics of formal languages, focusing on their significance in programming languages, including concepts of syntax, semantics, grammars, and Backus-Naur Form (BNF). It explains the structure of formal languages, the rules governing their syntax and semantics, and how grammars can generate languages. The unit aims to equip learners with an understanding of these foundational concepts essential for programming language theory.

Uploaded by

afomh2019
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views10 pages

Unit 1

Unit 1 covers the basics of formal languages, focusing on their significance in programming languages, including concepts of syntax, semantics, grammars, and Backus-Naur Form (BNF). It explains the structure of formal languages, the rules governing their syntax and semantics, and how grammars can generate languages. The unit aims to equip learners with an understanding of these foundational concepts essential for programming language theory.

Uploaded by

afomh2019
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

UNIT 1: BASICS OF FORMAL LANGUAGES

Contents
1.0 Aims and Objectives

1.1. Introduction
1.2. Syntax and Semantics

1.3. Grammars
1.4. Backus Naur Form

1.5. Summary
1.6. Model Examination Questions

1.0. AIMS AND OBJECTIVES

This unit introduces to the concepts of formal languages and its importance in the field of
programming languages.
After you have studied this unit, you will be able to:
• Understand the importance of syntax and semantics in programming languages
• Know what and how BNF works
• Explain the role of grammars

1
1.1. INTRODUCTION

A formal language is an abstraction of the general characteristics of programming languages.


A formal language consists of a set of symbols and some rules of formation by which these
symbols can be combined into entities called sentences. A formal language is the set of all
sentences permitted by the rules of formation. Although some of the formal languages we
study here are simpler than programming languages, they have many of the same essential
features. We can learn a great deal about programming languages from formal languages.

We are all familiar with the notion of natural languages, such as English and French. Still,
most of us would probably find it difficult to say exactly what the word “language” means.
Dictionaries define the term informally as a system suitable for the expression of certain
ideas, facts, or concepts, including a set of symbols and rules for their manipulation. While
this gives us an intuitive idea of what a language is, it is not sufficient as a definition for the
study of formal languages. We need a precise definition for the term.

We start with a finite, nonempty set ∑ of symbols, called the alphabet. From the individual
symbols we construct strings, which are finite sequences of symbols from the alphabet. For
example, if the alphabet ∑ = {a, b}, then abab and aaabbba are strings on ∑. With few
exceptions, we will use lowercase letters a, b, c... for elements of ∑ and u, υ, ω,...for string
names. We will write, for example, w = abaaa to indicate that the string named w has the
specific value abaaa.

The concatenation of two strings w and v is the string obtained by appending the symbols of
υ to the right end of w, that is, if w = a1a2 . . . an; and v = b1b2 . . . bm; then the concatenation of
w and v, denoted by wv, is: wv = a1a2 . . . an b1b2 . . . bm.

The reverse of a string is obtained by writing the symbols in reverse order. The length of a
string w, denoted by |w|, is the number of symbols in the string. We will frequently need to
refer to the empty string, which is a string with no symbols at all.

To study languages mathematically, we need a mechanism to describe them. Everyday


language is imprecise and ambiguous, so informal descriptions in English are often
inadequate. A grammar for the English language tells us whether a particular sentence is

2
well-formed or not. A typical rule of English grammar is “a sentence can consist of a noun
phrase followed by a predicate.”

1.2. SYNTAX AND SEMANTICS

Programming languages are examples of formal languages. Formal languages are defined by
two sets of rules:
• Syntax: precise rules that tell you the symbols you are allowed to use and how to put
them together into legal expressions.
• Semantics: precise rules that tell you the meanings of the symbols and legal
expressions.
Consider learning a new language (even a language such as French). How are you taught
what words there are and how to put these together to make grammatical sentences (the
syntax)? How are you taught what the words and phrases mean (the semantics)?
Object language: the language under discussion e.g. the one you are being taught;
Meta language: the language in which the object language is discussed.

Syntax is anything having to do with formal languages without regard to any interpretation
or meaning given to them. Syntax is concerned with the rules used for constructing, or
transforming the symbols and words of a language, as contrasted with the semantics of a
language which is concerned with its meaning. The symbols, formulas, systems, proofs, and
interpretations expressed in formal languages are syntactic entities whose properties may
be studied without regard to any meaning they may be given, and, in fact, need not be given
any.

Syntax is usually associated with the rules (or grammar) governing the composition of texts
in a formal language that constitute the well formed formulas of a formal system. The term
syntax refers to the rules governing the composition of well-formed expressions in a
programming language. As in mathematical logic, it is independent of semantics and
interpretation.

3
The syntax of a formal language is its structure, and is specified by a formal grammar of the
formal language. Syntax is usually associated with the rules (or grammar) governing the
composition of texts in a formal language that constitute the well-formed formulas of a
formal system. For example:

Semantics: precise rules that tell you the meanings of the symbols and legal expressions.
In programming languages theory, semantics is the field concerned with the rigorous
mathematical study of the meaning of programming languages. It does so by evaluating the
meaning of syntactically valid strings defined by a specific programming language, showing
the computation involved. In such a case that the evaluation would be of syntactically invalid
strings, the result would be non-computation. Semantics describes the processes a computer
follows when executing a program in that specific language. This can be shown by describing
the relationship between the input and output of a program, or an explanation of how the
program will be executed on a certain platform, hence creating a model of computation.

Check your progress-1


1. Explain what syntax and semantics means in programming languages?
_________________________________________________________________________________________________
________________________________________________________________________________________________
__________________________________________________

4
1.2. GRAMMARS

In the literary sense of the term, grammars denote syntactical rules for conversation in
natural languages.
A grammar G can be formally written as a 4-tuple (N, T, S, P) where:

• N or VN is a set of variables or non-terminal symbols.

• T or ∑ is a set of Terminal symbols.

• S is a special variable called the Start symbol, S ∈ N

• P is Production rules for Terminals and Non-terminals. A production rule has the
form α → β, where α and β are strings on VN ∪ ∑ and least one symbol of α belongs
to VN.

Example:
Grammar G is given by ({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})

Here,

• S, A, and B are Non-terminal symbols;

• a and b are Terminal symbols

• S is the Start symbol, S ∈ N

• Productions, P: S → AB, A → a, B → b

The set of all strings that can be derived from a grammar is said to be the language
generated from that grammar. A language generated by a grammar G is a subset formally
defined by

L(G)= {W|W ∈ ∑*, S ⇒G W}

If L(G1) = L(G2), the Grammar G1 is equivalent to the Grammar G2.

Example
Suppose we have the following grammar:

5
G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b}

The language generated by this grammar −

L(G) = {ab, a2b, ab2, a2b2, ………}

= {am bn | m ≥ 1 and n ≥ 1}

We’ll consider some languages and convert it into a grammar G which produces those
languages.

Example
Problem − Suppose, L (G) = {am bn | m ≥ 0 and n > 0}. We have to find out the
grammar G which produces L(G).

Solution

Since L(G) = {am bn | m ≥ 0 and n > 0}

the set of strings accepted can be rewritten as −

L(G) = {b, ab,bb, aab, abb, …….}

Here, the start symbol has to take at least one ‘b’ preceded by any number of ‘a’ including
null.

To accept the string set {b, ab, bb, aab, abb, …….}, we have taken the productions −

S → aS , S → B, B → b and B → bB

S → B → b (Accepted)

S → B → bB → bb (Accepted)

S → aS → aB → ab (Accepted)

S → aS → aaS → aaB → aab(Accepted)

S → aS → aB → abB → abb (Accepted)

Thus, we can prove every single string in L(G) is accepted by the language generated by the
production set.

6
Hence the grammar −

G: ({S, A, B}, {a, b}, S, {S → aS | B, B → b | bB })

1.4. BACKUS NAUR FORM

Backus-Naur notation (shortly BNF) is a formal mathematical way to describe a language,


(to describe the syntax of the programming languages). The Backus-Naur Form is a way of
defining syntax. It consists of
• a set of terminal symbols
• a set of non-terminal symbols
• a set of production rules of the form

Left-Hand-Side: = Right-Hand-Side
where the LHS is a non-terminal symbol and the RHS is a sequence of symbols (terminals or
non-terminals).
• The meaning of the production rule is that the non-terminal on the left hand side may
be replaced by the expression on the right hand side.
• Any sentence which is derived using the production rules is said to be syntactically
correct. It is possible to check the syntax of a sentence by building a parse tree to show
how the sentence is derived from the production rules. If it is not possible to build such
a tree, then the sentence has syntax errors.
• Syntax rules define how to produce well-formed sentences. But this does not imply that
a well formed sentence has any sensible meaning. Semantics define what a sentence
means.
• It is used to formally define the grammar of a language.
BNF is sort of like a mathematical game: you start with a symbol (called the start symbol and
by convention usually named S in examples) and are then given rules for what you can
replace this symbol with. The language defined by the BNF grammar is just the set of all
strings you can produce by following these rules.
The rules are called production rules, and look like this:

7
symbol: = alternative1 | alternative2
A production rule simply states that the symbol on the left-hand side of the: = must be
replaced by one of the alternatives on the right hand side. The alternatives are separated by
|s. (One variation on this is to use: = instead of: =, but the meaning is the same.) Alternatives
usually consist of both symbols and something called terminals. Terminals are simply pieces
of the final string that are not symbols. There is one special symbol in BNF: @, which simply
means that the symbol can be removed. If you replace a symbol by @ you do it by just
removing the symbol. This is useful because in some cases it is difficult to end the
replacement process without using this trick. So, the language described by a grammar is the
set of all strings you can produce with the production rules. If a string cannot in any way be
produced by using the rules the string is not allowed in the language. BNF and context free
grammars was nearly identical.

Key features of Backus-Naur Form


• Non-Terminals: defined by a production rule
• Terminals: These are the basic components of the language being defined, e.g.
symbols, keywords, variable identifiers, etc. in the language being defined
• Production Rule: Each production rule has a non-terminal symbol on the left-hand
side, and the right-hand side may contain nonterminal or terminal symbols, possibly
in specified sequences.
• Start Symbol: A `top-level' non- terminal symbol which stands for the ‘legal
expressions' in the language.

Here is an example of a grammar:


<identifier> ::= <letter>
| <identifier> <digit>
| <identifier> <letter>
<letter> ::= a|b|c|d| … |x|y|z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
The essential features of the BNF formalism are:
1. Angle brackets. These signify non-terminal symbols.
2. The symbol: = which is read `is defined as'.

8
3. The symbol | which means ‘or'.
4. The idea of a production rule.
5. A terminal symbol: anything not enclosed in angle brackets.

Check your progress-2


1. Describe what Backus Naur Form means in formal languages?

_________________________________________________________________________________________________
________________________________________________________________________________________________

1.5. SUMMARY

In logic, syntax is anything having to do with formal languages or formal systems without
regard to any interpretation or meaning given to them. Syntax is concerned with the rules
used for constructing, or transforming the symbols and words of a language, as contrasted
with the semantics of a language which is concerned with its meaning.

There are two relationships involving the semantics and the syntax:
• one which ensures that each semantic element (meaningful thing) has at least one
syntactic representation
• one which ensures that each syntactic representation has a unique meaning.

Backus-Naur / Backus-Normal Form (BNF) is a meta language, by meta language, we mean


a language used to define another language. Using BNF to express a language, we can clearly
identify which constructs are legal in a language and which are not.

9
1.6. MODEL EXAMINATION QUESTIONS.

I: True/False questions
1. The Backus-Naur Form is a way of defining syntax.
2. The term semantics refers to the rules governing the composition of well-formed
expressions in a programming language.
3. A formal language is an abstraction of the general characteristics of a
programming languages.
4. Grammars are set of rules for generating strings.

II: Short Answer Questions


1. Generate a syntax tree for the following expression?
a. A = B * C + D
2. Explain the difference of syntax and semantics in programming languages?
3. What are the advantages of BNF in formal languages?

10

You might also like