Programming Language Principles
Programming Language Principles
St.PETER’SENGINEERINGCOLLEGE
                             (UGC-Autonomous)
              Affiliated to JNTUH, Approved by AICTE, Accredited by NBA &NAAC with “A” Grade
             Maisammaguda Village, Dhulapally , Opp.to T.S. Forest Academy, Hyderabad-500100
                                   Departmentof
                                        CSE
LECTURENOTES
                                                     ON
       PRINCIPLES OF PROGRAMMING
                LANGUAGE
III-I
                                       Prepared by
                         Mr.D,Harith Reddy/J.Sagar Babu
SPEC                                                                                           Page 1
    Principles Of Programming Language
                                                       UNIT-II
        Names, Bindings, and Scopes: Introduction, names, variables, concept of binding,
        scope, scope and lifetime, referencing environments, named constants.
         Data types: Introduction, primitive, character, string types, user defined ordinal types,
        array, associative arrays, record, tuple types, list types, union types, pointer and
        reference types, type checking, strong typing, type equivalence
        Expressions and Statements: Arithmetic expressions, overloaded operators, type
        conversions, relational and boolean expressions, short- circuit evaluation, assignment
        statements, mixed-mode assignment
        Control Structures: introduction, selection                    statements,      iterative    statements,
        unconditional branching, guarded commands.
 Names, Bindings, and Scopes:
Introduction
            Imperative languages are abstractions of von Neumann architecture.
Names
        The term identifier is often used interchangeably with name.
        Design issues for names:
            Are names case sensitive?
            Are special words reserved words or keywords?
        Name Forms
            A name is a string of characters used to identify some entity in a program.
            If too short, they cannot be connotative
            Language examples:
                    FORTRAN I: maximum 6
                    COBOL: maximum 30
                    FORTRAN 90 and ANSI C: maximum 31
                    Ada and Java: no limit, and all are significant
                    C++: no limit, but implementers often impose one b/c they do not want the symbol
                       table in which identifiers are stored during compilation to be too large. Also, to simplify
                       the maintenance of that table.
    SPEC                                                                                                        Page 2
Principles Of Programming Language
        Names in most programming languages have the same form: a letter followed by a string
         consisting of letters, digits, and (_).
SPEC                                                                                             Page 3
Principles Of Programming Language
        Although the use of the _ was widely used in the 70s and 80s, that practice is far less popular.
        C-based languages, replaced the _ by the “camel” notation, as in myStack.
        Prior to Fortran 90, the following two names are equivalent:
         Sum Of Salaries         // names could have embedded
         spaces
          SumOfSalaries          // which were ignored
        Case sensitivity
             Disadvantage : readability (names that look alike are different)
                    • worse in C++ and Java because predefined names are mixed case (e.g.
                        IndexOutOfBoundsException)
                    • In C, however, exclusive use of lowercase for names.
             C, C++, and Java names are case sensitive  rose, Rose, ROSE
                are distinct names “What about Readability”
              Special words
        An aid to readability; used to delimit or separate statement clauses
        A keyword is a word that is special only in certain contexts.
       Exanple : Fortran
          Real Apple             // if found at the beginning and followed by a name, it is a declarative
                                 statement
        Disadvantage: poor readability. Compilers and users must recognize the difference.
        A reserved word is a special word that cannot be used as a user-defined name.
        As a language design choice, reserved words are better than keywords.
       Example : Fortran
                         Integer Real
                         Real Integer
          Variables
        A variable is an abstraction of a memory cell(s).
        Variables can be characterized as a sextuple of
         attributes: (name, address, value, type, lifetime, and
         scope)
Name
SPEC                                                                                                        Page 4
Principles Of Programming Language
       different calls may result in that var having different addresses.
SPEC                                                                        Page 5
Principles Of Programming Language
  Type
      Determines the range of values of variables and the set of operations that are defined for values
        of that type; in the case of floating point, type also determines the precision.
  Value
      The contents of the location with which the variable is associated.
      Abstract memory cell - the physical cell or collection of cells associated with a variable.
  Aliases
          If two variable names can be used to access the same memory location, they are called aliases
          Aliases are harmful to readability (program readers must remember all of them)
          How aliases can be created?
          Pointers, reference variables, C and C++ unions.
          FORTRAN Some of the original justifications for aliases are no longer valid.
           Concept Of Binding
        A binding is an association between an attribute and an entity, such as between a variable and
         its type or value, or between an operation and a symbol.
        The time at which a binding takes place is called binding time.
        Binding and binding times are prominent concepts in the semantics of programming languages.
        Bindings can take place at language design time, language implementation time, compile time,
         load time, link time, or run time.
  Example: the asterisk symbol (*) is usually bound to the multiplication operation at language design
  time.
  Some of the bindings and their binding times for the parts of this assignment statement are as follows:
SPEC                                                                                                   Page 6
Principles Of Programming Language
  Types can be specified statically through some form of explicit or implicit declaration.
SPEC                                                                                         Page 7
Principles Of Programming Language
SPEC                                                                                                         Page 8
Principles Of Programming Language
        It is not possible to create machine code instructions whose operand types are not known at
         compile time.
SPEC                                                                                                   Page 9
Principles Of Programming Language
      Pure interpretation typically takes at least ten times as long as to execute equivalent machine
       code.
      Type error detection by the compiler is difficult b/c any var can be assigned a value of any
       type.
      Incorrect types of right sides of assignments are not detected as errors; rather, the type of the
       left side is simply changed to the incorrect type.
  Example:
                        i, x    Integer
                        y        floating-point array
                        i=x      what the user meant to type
                        i=y      what the user typed instead
  No error is detected by the compiler or run-time system. i is simply changed to a floating-point array
  type. Hence, the result is erroneous. In a static type binding language, the compiler would detect the
  error and the program would not get to execution.
  Disadvantages:
      Inefficient, because all attributes are dynamic “run-time.”
      Loss of error detection by the compiler.
  Storage Bindings & Lifetime
  Static Variables: Bound to memory cells before execution begins and remains bound to the same
  memory cell throughout execution.
  Example: all FORTRAN 77 variables, C static variables.
  Advantages:
       Efficiency: (direct addressing): All addressing of static vars can be direct. No run-time
         overhead is incurred for allocating and deallocating vars.
       History-sensitive: have vars retain their values between separate executions of the subprogram.
  Disadvantage:
       Lack of flexibility (no recursion) is supported
       Storage cannot be shared among variables.
  Example : if two large arrays are used by two subprograms, which are never active at the same time,
  they cannot share the same storage for their arrays.
  Stack-dynamic Variables: Storage bindings are created for variables when their declaration
  statements are elaborated, but whose types are statically bound.
  Elaboration of such a declaration refers to the storage allocation and binding process indicated by the
  declaration, which takes place when execution reaches the code to which the declaration is attached.
SPEC                                                                                                  Page 10
Principles Of Programming Language
  Example : The variable declarations that appear at the beginning of a Java method are elaborated
  when the method is invoked and the variables defined by those declarations are deallocated when the
  method completes its execution.
  Advantages:
     Allows recursion: each active copy of the recursive subprogram has its own version of the
       local variables.
     In the absence of recursion it conserves storage b/c all subprograms share the same memory
       space for their locals.
  Disadvantages:
      Overhead of allocation and deallocation.
      Subprograms cannot be history sensitive.
      Inefficient references (indirect addressing) is required b/c the place in the stack where a
        particular var will reside can only be determined during execution.
      In Java, C++, and C#, variables defined in methods are by default stack-dynamic.
int *intnode;
                  …
                  intnode = new int; // allocates an int cell
                  …
                  delete intnode;        // deallocates the cell to which
                                         // intnode points
SPEC                                                                                                  Page 11
Principles Of Programming Language
SPEC                                                                Page 12
Principles Of Programming Language
         Java objects are explicitly heap-dynamic and are accessed through reference vars.
         Java uses implicit garbage collection.
         Explicit heap-dynamic vars are used for dynamic structures, such as linked lists and trees that
          need to grow and shrink during execution.
  Advantage:
      Provides for dynamic storage management.
  Disadvantage:
      Inefficient “Cost of allocation and deallocation” and unreliable “difficulty of using pointer
        and reference variables correctly”
Scope
      Scope
             The scope of a var is the range of statements in which the var is visible.
             A var is visible in a statement if it can be referenced in that statement.
             Local var is local in a program unit or block if it is declared there.
             Non-local var of a program unit or block are those that are visible within the program unit or block but
              are not declared there.
                   Static Scope
         Binding names to non-local vars is called static scoping.
         There are two categories of static scoped languages:
              Nested Subprograms.
              Subprograms that can’t be nested.
       Ada, and JavaScript allow nested subprogram, but the C-based languages do not.
       When a compiler for static-scoped language finds a reference to a var, the attributes of the var
        are determined by finding the statement in which it was declared.
       Ex: Suppose a reference is made to a var x in subprogram Sub1. The correct declaration is found
        by first searching the declarations of subprogram Sub1.
       If no declaration is found for the var there, the search continues in the declarations of the
        subprogram that declared subprogram Sub1, which is called its static parent.
       If a declaration of x is not found there, the search continues to the next larger enclosing unit (the
        unit that declared Sub1’s parent), and so forth, until a declaration for x is found or the largest unit’s
        declarations have been searched without success.  an undeclared var error has been detected.
       The static parent of subprogram Sub1, and its static parent, and so forth up to and including the
        main program, are called the static ancestors of Sub1.
SPEC                                                                                                              Page 13
Principles Of Programming Language
           end;            -- of Sub1
       Procedure Sub2 is
           X Integer;
           Begin           -- of Sub2
           …X…
           end;            -- of Sub2
       Begin               -- of Big
       …
       end;                -- of Big
    Under static scoping, the reference to the var X in Sub1 is to the X declared in the procedure Big.
    This is true b/c the search for X begins in the procedure in which the reference occurs, Sub1, but no
     declaration for X is found there.
   The search thus continues in the static parent of Sub1, Big, where the declaration of X is found.
  Example : Skeletal C#
  void sub()
  {
       int count;
       …
       while (…)
       {
           int count;
           count ++;
           …
       }
       …}
SPEC                                                                                                 Page 14
Principles Of Programming Language
    The reference to count in the while loop is to that loop’s local count. The count of sub is hidden
     from the code inside the while loop.
   A declaration for a var effectively hides any declaration of a var with the same name in a
     larger enclosing scope.
   C++ and Ada allow access to these "hidden" variables
   In Ada: Main.X
   In C++:
              class_name::nam
              e Blocks
   Allows a section of code to have its own local vars whose scope is minimized.
   Such vars are stack dynamic, so they have their storage allocated when the section is entered
     and deallocated when the section is exited.
   From ALGOL 60:
  Example:
       C and C++:
       for (...)
       {
             int index;
             ...
       }
       Ada:
                          A calls C and D
                          B calls A and E
SPEC                                                                                                 Page 15
Principles Of Programming Language
        The program contains an overall scope for main, with two procedures that defined scopes inside
         main, A, and b.
        Inside A are scopes for the procedures C and D.
        Inside B is the scope for the procedure E.
        It is convenient to view
       Fig 2.1 : The structure of the program as a tree in which each node represents a procedure and
                                                thus a scope.
SPEC                                                                                                  Page 16
Principles Of Programming Language
  Example:
  Procedure Big is
       X : Integer;
       Procedure Sub1 is
        Begin              -- of Sub1
        …X…
        end;               -- of Sub1
       Procedure Sub2 is
        X Integer;
SPEC                                                                                                Page 17
Principles Of Programming Language
       Begin    -- of Sub2
SPEC                                 Page 18
Principles Of Programming Language
        …X…
        end;                   -- of Sub2
       Begin                   -- of Big
       …
       end;                    -- of Big
             The search proceeds from the local procedure, Sub1, to its caller, Sub2, where a declaration of X
              is found.
             Big calls Sub1
             The dynamic parent of Sub1 is Big. The reference is to the X in Big.
              {
              int sum;
              …
              printheader();
              }       /* end of compute */
SPEC                                                                                                              Page 19
Principles Of Programming Language
       visible variables in all active subprograms.
SPEC                                                  Page 20
Principles Of Programming Language
          procedure Example is
              A, B : Integer;
              …
              procedure Sub1 is
               X, Y : Integer;
               begin       -- of Sub1
               …                         1
              end          -- of Sub1
              procedure Sub2 is
               X : Integer;
               …
               procedure Sub3 is
                  X : Integer;
                  begin    -- of Sub3
                  …                      2
                  end;     -- of Sub3
              begin        -- of Sub2
              …                          3
           end; { Sub2}
          begin
          …                                     4
          end;             {Example}
 Consider the following program; assume that the only function calls are the
SPEC                                                                                     Page 21
Principles Of Programming Language
       following: main calls sub2, which calls sub1
SPEC                                                  Page 22
Principles Of Programming Language
          void sub1()
          {
              int a, b;
              …                      1
          }           /* end of sub1 */
          void sub2()
          {
              int b, c;
              …                      2
              sub1;
          }           /* end of sub2 */
          void main ()
          {
              int c, d;
              …                      3
              sub2();
          }           /* end of main */
        The referencing environments of the indicated program points are as follows:
Named Constants
        It is a var that is bound to a value only at the time it is bound to storage; its value can’t be
         change by assignment or by an input statement.
SPEC                                                                                                        Page 23
    Principles Of Programming Language
 Data Types:
Introduction
          A data type defines a collection of data objects and a set of predefined operations on those objects.
          Computer programs produce results by manipulating data.
          ALGOL 68 provided a few basic types and a few flexible structure-defining operators that allow a
           programmer to design a data structure for each need.
          A descriptor is the collection of the attributes of a variable.
          In an implementation a descriptor is a collection of memory cells that store variable attributes.
          If the attributes are static, descriptor are required only at compile time.
          They are built by the compiler, usually as a part of the symbol table, and are used
           during compilation.
          For dynamic attributes, part or all of the descriptor must be maintained during execution.
          Descriptors are used for type checking and by allocation and deallocation operations.
    Primitive
          Those not defined in terms of other data types.
          The primitive data types of a language, along with one or more type constructors provide
           structured types.
       Numeric Types
       1. Integer
              Model real numbers, but only as approximations for most real values.
              On most computers, floating-point numbers are stored in binary, which exacerbates the
               problem.
              Another problem, is the loss of accuracy through arithmetic operations.
              Languages for scientific use support at least two floating-point types; sometimes more (e.g.
               float, and double.)
              The collection of values that can be represented by a floating-point type is defined in terms of
               precision and range.
              Precision: is the accuracy of the fractional part of a value, measured as the number of
               bits. Figure below shows single and double precision.
              Range: is the range of fractions and exponents.
    SPEC                                                                                                     Page 24
Principles Of Programming Language
                          Fig : Floating Point representation
SPEC                                                            Page 25
   Principles Of Programming Language
           3. Decimal
           –   Larger computers that are designed to support business applications have hardware support for
               decimal data types.
           –   Decimal types store a fixed number of decimal digits, with the decimal point at a fixed
               position in the value.
           –   Advantage: accuracy of decimal values.
           –   Disadvantages: limited range since no exponents are allowed, and its representation wastes
               memory.
                    Boolean Types
           –   Introduced by ALGOL 60.
           –   They are used to represent switched and flags in programs.
           –   The use of Booleans enhances readability.
                    Character Types
           –   Char types are stored as numeric codings (ASCII / Unicode).
   Character String Types
           –   A character string type is one in which values are sequences of characters.
           –   Important Design Issues:
               1. Is it a primitive type or just a special kind of array?
               2. Is the length of objects static or dynamic?
           –   C and C++ use char arrays to store char strings and provide a collection of string
               operations through a standard library whose header is string.h.
           –   How is the length of the char string decided?
           –   The null char which is represented with 0.
           –   Ex:
               char *str = “apples”; // char ptr points at the str apples0
        Operations:
              Assignment
              Comparison (=, >, etc.)
              Catenation
              Substring reference
              Pattern matching
(Ada)
           N := N1 & N2       (catenation)
           N(2..4)            (substring reference)
  Evaluation
      Aid to writability.
      As a primitive type with static length, they are inexpensive to provide--why not have them?
      Dynamic length is nice, but is it worth the expense?
              Implementation of Character String Types
      Static length - compile-time descriptor has three fields:
            1. Name of the type
            2. Type’s length
            3. Address of first char
      Limited dynamic length Strings - may need a run-time descriptor for length to store both
        the fixed maximum length and the current length(but not in C and C++)
      Dynamic length Strings–
             Need run-time descriptor b/c only current length needs to be stored.
             Allocation/deallocation is the biggest implementation problem. Storage to which it is
                bound must grow and shrink dynamically.
             There are two approaches to supporting allocation and deallocation:
            1. Strings can be stored in a linked list “Complexity of string operations, pointer chasing”
            2. Store strings in adjacent cells. “What about when a string grows?” Find a new area of
                memory and the old part is moved to this area. Allocation and deallocation is slower but
                using adjacent cells results in faster string operations and requires less storage.
SPEC                                                                                                Page 27
    Principles Of Programming Language
Enumeration Types
• All possible values, which are named constants, are provided in the definition
• C# example
Design issues
              Is an enumeration constant allowed to appear in more than one type definition, and if so, how is
               the type of an occurrence of that constant checked?
              Are enumeration values coerced to integer?
              Any other type coerced to an enumeration type?
       Ada, C#, and Java 5.0 provide better support for enumeration than C++ because enumeration type
       variables in these languages are not coerced into integer types
Subrange Types
 Ada‘s design
Day1: Days;
Day2: Weekday;
    SPEC                                                                                                   Page 28
Principles Of Programming Language
  Day2 := Day1;
SPEC                                 Page 29
Principles Of Programming Language
Subrange Evaluation
 Aid to readability
Make it clear to the readers that variables of subrange can store only certain range of values
 Reliability
Assigning a value to a subrange variable that is outside the specified range is detected as an error
Array
      An array is an aggregate of homogeneous data elements in which an individual element is
       identified by its position in the aggregate, relative to the first element.
      A reference to an array element in a program often includes one or more non-constant subscripts.
      Such references require a run-time calculation to determine the memory location being referenced.
  Design Issues
  1.   What types are legal for subscripts?
  2.   Are subscripting expressions in element references range checked?
  3.   When are subscript ranges bound?
  4.   When does allocation take place?
  5.   What is the maximum number of subscripts?
  6.   Can array objects be initialized?
      Ex: Ada
       Sum := Sum + B(I);
       Because ( ) are used for both subprogram parameters and array subscripts in Ada, this results in
       reduced readability.
SPEC                                                                                                        Page 30
Principles Of Programming Language
SPEC                                                                 Page 31
Principles Of Programming Language
      Ada allows other types as subscripts, such as Boolean, char, and enumeration.
      Among contemporary languages, C, C++, Perl, and Fortran don’t specify range checking
       of subscripts, but Java, and C# do.
      The binding of subscript type to an array variable is usually static, but the subscript value ranges
       are sometimes dynamically bound.
      In C-based languages, the lower bound of all index ranges is fixed at zero.
  1. A static array is one in which the subscript ranges are statically bound and storage allocation is
     static.
         o Advantages: efficiency “No allocation & deallocation.”
  2. A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the
     allocation is done at elaboration time during execution.
         o Advantages: Space efficiency. A large array in one subprogram can use the same space as a
             large array in different subprograms.
  3. A stack-dynamic array is one in which the subscript ranges are dynamically bound, and the
     storage allocation is dynamic “during execution.” Once bound they remain fixed during the
     lifetime of the variable.
          o Advantages: Flexibility. The size of the array is known at bound time.
  4. A fixed heap-dynamic array is one in which the subscript ranges are dynamically bound, and the
     storage allocation is dynamic, but they are both fixed after storage is allocated.
     The bindings are done when the user program requests them, rather than at elaboration time, and the
     storage is allocated on the heap, rather than the stack.
  5. A heap-dynamic array is one in which the subscript ranges are dynamically bound, and the
     storage allocation is dynamic, and can change any number of times during the array’s
     lifetime.
          o Advantages: Flexibility. Arrays can grow and shrink during program execution as the need
             for space changes.
   Examples of the five categories:
       Arrays declared in C & C++ function that includes the static modifier are static.
       Arrays declared in C & C++ function without the static modifier are fixed stack-dynamic arrays.
       Ada arrays can be stack dynamic:
       Get (List_Len);
       declare
         List : array (1..List_Len) of Integer;
         begin
          ...
SPEC                                                                                                      Page 32
Principles Of Programming Language
       end;
SPEC                                 Page 33
Principles Of Programming Language
       The user inputs the number of desired elements for array List. The elements are then dynamically
       allocated when execution reaches the declare block.
       When execution reaches the end of the block, the array is deallocated.
       C & C++ also provide fixed heap-dynamic arrays. The function malloc and free are used in C.
       The operations new and delete are used in C++.
       In Java all arrays are fixed heap dynamic arrays. Once created, they keep the same subscript
       ranges and storage.
       C# provides heap-dynamic arrays using an array class ArrayList.
       ArrayList intList = new ArrayList( );
       Elements are added to this object with the Add method, as in intArray.Add(nextOne);
  Array Initialization
      Usually just a list of values that are put in the array in the order in which the array elements are
       stored in memory.
      Fortran uses the DATA statement, or put the values in / ... / on the declaration.
      The array will have 8 elements because the null character is implicitly included by the compiler.
      In Java, the syntax to define and initialize an array of references to String objects.
SPEC                                                                                                          Page 34
Principles Of Programming Language
      Pascal does not allow array initialization.
SPEC                                                 Page 35
Principles Of Programming Language
Associative Arrays
  An associative array is an unordered collection of data elements that are indexed by an equal number of
  values called keys.
  Design Issues:
       1. What is th eform of references to elements? 2. Is the size static or dynamic?
       2. Is the size static or dynamic?
           Structure and Operations in Perl - Names begin with %
           Literals are delimited by parentheses
Record
  A record is a possibly heterogeneous aggregate of data elements in which the individual elements are
  identified by names
   Design Issues:
  1. What is the form of references?
  2. What unit operations are defined?
  Record Operations
  1. Assignment
SPEC                                                                                                Page 36
Principles Of Programming Language
          In Ada, = and /=; one operand can be an aggregate constant
SPEC                                                                    Page 37
Principles Of Programming Language
4. MOVE CORRESPONDING
          In COBOL : it moves all fields in the source record to fields with the same names in the
  destination record
Tuple Types
          A tuple is a data type that is similar to a record, except that the elements are not named.
          Python includes an immutable tuple type.
          If a tuple needs to be changed, it can be converted to an array with the list function.
          After the change, it can be converted back to a tuple with the tuple function.
          One use of tuples is when an array must be write protected, such as when it is sent as a
           parameter to an external function and the user does not want the function to be able to modify
           the parameter.
          Python’s tuples are closely related to its lists, except that tuples are immutable. A tuple is
           created by assigning a tuple literal, as in the following
  EX:
  myTuple = (3, 5.8, 'apple')
  myTuple[1]
          This references the first element of the tuple, because tuple indexing begins at 1.
          Tuples can be catenated with the plus (+) operator. They can be deleted with the del statement.
          There are also other operators and functions that operate on tuples.
SPEC                                                                                                     Page 38
Principles Of Programming Language
          A tuple pattern is simply a sequence of names, one for each element of the tuple, with or
           without the delimiting parentheses.
          When a tuple pattern is the left side of a let construct, it is a multiple assignment.
   EX:
  let tup = (3, 5, 7);;
  let a, b, c = tup;;
  This assigns 3 to a, 5 to b, and 7 to c.
Tuples are used in Python, ML, and F# to allow functions to return multiple values.
  List Types
  Lists were first supported in the first functional programming language, LISP. They have always been
  part of the functional languages, but in recent years they have found their way into some imperative
  languages. Lists in Scheme and Common LISP are delimited by parentheses and the elements are not
  separated by any punctuation.
  EX:
  List     (A B C D)
  Nested lists       (A (B C) D)
  In this list, (B C) is a list nested inside the outer list. Data and code have the same syntactic form in
  LISP and its descendants. If the list (A B C) is interpreted as code, it is a call to the function A with
  parameters B and C.
  The CAR function returns the first element of its list parameter.
   (CAR '(A B C))
                    This call to CAR returns A.
  The CDR function returns its parameter list minus its first element.
   (CDR '(A B C))
                    This function call returns the list (B C).
        In Scheme and Common LISP, new lists are constructed with the CONS and LIST functions.
  The function CONS takes two parameters and returns a new list with its first parameter as the first
  element and its second parameter as the remainder of that list.
  (CONS 'A '(B C))
                    This call returns the new list (A B C).
  The LIST function takes any number of parameters and returns a new list with the parameters as its
  elements.
SPEC                                                                                                   Page 39
Principles Of Programming Language
             This call returns the new list (A B (C D)).
SPEC                                                        Page 40
Principles Of Programming Language
  ML has lists and list operations, although their appearance is not like those of Scheme.
  Lists are specified in square brackets, with the elements separated by commas.
  [5, 7, 9]
  [] is the empty list, which could also be specified with nil.
  The Scheme CONS function is implemented as a binary infix operator in ML, represented as ::. For
  example,
  3 :: [5, 7, 9]
                      Returns the following new list: [3, 5, 7, 9].
  The elements of a list must be of the same type, so the following list would be illegal:
  [5, 7.3, 9]
  ML has functions that correspond to Scheme’s CAR and CDR, named hd (head) and tl (tail).
  For example,
  hd [5, 7, 9] is 5
tl [5, 7, 9] is [7, 9]
SPEC                                                                                                      Page 41
Principles Of Programming Language
  Union Types
  A union is a type whose variables are allowed to store different type values at different times during
  execution
  Design Issues for unions:
  1. What kind of type checking, if any, must be done?
  2. Should unions be integrated with
  records? Examples:
        1. FORTRAN - with EQUIVALENCE
        2. Algol 68
                   discriminated unions
                    Use a hidden tag to maintain the current type
                   Tag is implicitly set by assignment
                   References are legal only in conformity clauses
                   This runtime type selection is a safe method of accessing union objects
        3. Pascal
                   both discriminated and non discriminated unions
                    type intreal = record tagg : Boolean of true : (blint : integer); false : (blreal : real); end;
  Problem with Pascal’s design: Type checking is ineffective
  Reasons:
        a. User can create inconsistent unions (because the tag can be individually assigned) var blurb :
           intreal; x : real;
           blurb.tagg := true; { it is an integer }
            blurb.blint := 47; { ok }
            blurb.tagg := false; { it is a real }
           x := blurb.blreal; { assigns an integer to a real }
   b.    The tag is optional! - Now, only the declaration and the second and last assignments are
  required to cause trouble
   4.       Ada : Discriminated unions
j = *ptr
sets j to 206.
Pointer Problems
SPEC                                                                                                      Page 43
Principles Of Programming Language
            a change to the location to cause the storage manager to fail.
SPEC                                                                         Page 44
Principles Of Programming Language
        A heap-dynamic variable that is no longer referenced by any program pointer “no longer
         accessible by the user program.”
        Such variables are often called garbage b/c they are not useful for their original purpose,
         and also they can’t be reallocated for some new use by the program.
        Creating Lost Heap-Dynamic Variables:
         a) Pointer p1 is set to point to a newly created heap-dynamic variable.
         b) p1 is later set to point to another newly created heap-dynamic variable.
         c) The first heap-dynamic variable is now inaccessible, or lost.
        The process of losing heap-dynamic variables is called memory leakage.
SPEC                                                                                                     Page 45
Principles Of Programming Language
  Type Checking
      Type checking is the activity of ensuring that the operands of an operator are of compatible types.
      A compatible type is one that is either legal for the operator, or is allowed under language rules to
       be implicitly converted, by compiler-generated code, to a legal type.
      This automatic conversion is called a coercion.
      Ex: an int var and a float var are added in Java, the value of the int var is coerced to float and a
       floating-point is performed.
      A type error is the application of an operator to an operand of an inappropriate type.
      Ex: in C, if an int value was passed to a function that expected a float value, a type error would
       occur (compilers didn’t check the types of parameters)
      If all type bindings are static, nearly all type checking can be static.
      If type bindings are dynamic, type checking must be dynamic and done at run-time.
  Strong Typing
      A programming language is strongly typed if type errors are always detected. It requires that the
       types of all operands can be determined, either at compile time or run time.
      Advantage of strong typing: allows the detection of the misuses of variables that result in type
       errors.
      Java and C# are strongly typed. Types can be explicitly cast, which would result in type error.
       However, there are no implicit ways type errors can go undetected.
      The coercion rules of a language have an important effect on the value of type checking.
      Coercion results in a loss of part of the reason of strong typing – error detection.
      Ex:
       int a, b;
       float d;
       a + d;     // the programmer meant a + b, however
  –    The compiler would not detect this error. Var a would be coerced to float.
       Type Equivalence
          A major question involved in the application of types to type checking has to do with type
           equivalence: When are two types the same? One way of trying to answer this question is to
           compare the sets of values simply as sets. Two sets are the same if they contain the same values.
  EX:
  Any type defined as the Cartesian product A X B is the same as any other type defined in the same
  way. On the other hand, if we assume that type B is not the same as type A, then a type defined as A X
  B is not the same as a type defined as B X A, since A X B contains the pairs (a, b) but B X A contains
  the pairs (b, a).
  This view of type equivalence is that two data types are the same if they have the same structure: They
  are built in exactly the same way using the same type constructors from the same simple types. This
  form of type equivalence is called structural equivalence and is one of the principal forms of type
  equivalence in programming languages.
SPEC                                                                                                     Page 46
Principles Of Programming Language
   EX:
  struct Rec1{
  char x;
  int y;
  char z[10];
  };
  struct Rec2{
  char x;
  int y;
  char z[10];
  };
  struct Rec3{
  int y;
  char x;
  };
           Structural equivalence is relatively easy to implement. What’s more, it provides all the
            information needed to perform error checking and storage allocation.
           It is used in such languages as Algol60, Algol68, FORTRAN, COBOL, and a few modern
            languages such as Modula-3.
           It is also used selectively in such languages as C, Java, and ML,
           To check structural equivalence, a translator may represent types as trees and check equivalence
            recursively on subtrees . Questions still arise, however, in determining how much information
            is included in a type under the application of a type constructor.
  EX:
  the two types A1 and A2 defined by
  typedef int A1[10];
  typedef int A2[20];
  structurally equivalent? Perhaps yes, if the size of the index set is not part of an array type; otherwise,
  no. A similar question arises with respect to member names of structures. If structures are taken to be
  justCartesian products.
SPEC                                                                                                    Page 47
Principles Of Programming Language
  EX:
  struct RecA{
  char x;
  int y;
  };
  and
  struct RecB{
  char a;
  int b;
  };
  Expressions And Statements:
      Introduction
Arithmetic Expressions
       Arithmetic Expressions
       Their evaluation was one of the motivations for the development of the first programming
        languages.
       Most of the characteristics of arithmetic expressions in programming languages were inherited from
        conventions that had evolved in math.
       Arithmetic expressions consist of operators, operands, parentheses, and function calls.
       The operators can be unary, or binary. C-based languages include a ternary operator.
       The purpose of an arithmetic expression is to specify an arithmetic computation.
       An implementation of such a computation must cause two actions:
            o Fetching the operands from memory
            o Executing the arithmetic operations on those operands.
       Design issues for arithmetic expressions:
        1. What are the operator precedence rules?
        2. What are the operator associativity rules?
        3. What is the order of operand evaluation?
        4. Are there restrictions on operand evaluation side effects?
        5. Does the language allow user-defined operator overloading?
        6. What mode mixing is allowed in expressions?
SPEC                                                                                                 Page 48
Principles Of Programming Language
  1. Precedence
  1. The operator precedence rules for expression evaluation define the order in which “adjacent”
     operators of different precedence levels are evaluated (“adjacent” means they are separated by
     at most one operand).
  2. Typical precedence levels:
             1. parentheses
             2. unary operators
             3. ** (if the language supports
             it) 4. *, /
             5. +, -
  3. Many languages also include unary versions of addition and subtraction.
  4. Unary addition is called the identity operator because it usually has no associated operation and
     thus has no effect on its operand.
  5. In Java, unary plus actually does have an effect when its operand is short or byte. An implicit
     conversion of short and byte operands to int type takes place.
  6. Ex:
             A + (- B) * C                 // is legal
             A+-B*C                        // is illegal
  2. Associativity
         The operator associativity rules for expression evaluation define the order in which adjacent
            operators with the same precedence level are evaluated. “An operator can be either left or
            right associative.”
         Typical associativity rules:
                o Left to right, except **, which is right to left
                o Sometimes unary operators associate right to left (e.g., FORTRAN)
         Ex: (Java)
 Ex: (Fortran)
A ** B ** C // right to left
SPEC                                                                                                  Page 49
Principles Of Programming Language
             APL is different; all operators have equal precedence and all operators associate right to left.
             Ex:
              AXB+C              // A = 3, B = 4, C = 5  27
         Precedence and associativity rules can be overridden with parentheses.
  Parentheses
             Programmers can alter the precedence and associativity rules by placing parentheses
              in expressions.
             A parenthesized part of an expression has precedence over its adjacent un-
              parenthesized parts.
             Ex:
              (A + B) * C
  3. Conditional Expressions
         Sometimes if-then-else statements are used to perform a conditional expression assignment.
         if (count == 0)
             average = 0;
         else
             average = sum / count;
             In the C-based languages, this can be specified more conveniently in an assignment
              statement using a conditional expressions
              average = (count == 0) ? 0 : sum / count;
               Operand evaluation order
             The process:
              1. Variables: just fetch the value from memory.
              2. Constants: sometimes a fetch from memory; sometimes the constant is in the machine
              language instruction.
              3. Parenthesized expressions: evaluate all operands and operators first.
SPEC                                                                                                     Page 50
Principles Of Programming Language
  1. Side Effects
         A side effect of a function, called a functional side effect, occurs when the
            function changes either one of its parameters or a global variable.
         Ex:
         a + fun(a)
            If fun does not have the side effect of changing a, then the order of evaluation of the
             two operands, a and fun(a), has no effect on the value of the expression.
            However, if fun changes a, there is an effect.
            Ex:
             Consider the following situation: fun returns the value of its argument
             divided by 2 and changes its parameter to have the value 20, and:
             a = 10;
             b = a + fun(a);
            If the value of a is returned first (in the expression evaluation process), its value is 10 and
             the value of the expression is 15.
            But if the second is evaluated first, then the value of the first operand is 20 and the value of
             the expression is 25.
            The following shows a C program which illustrate the same problem.
             int a = 5;
             int fun1() {
                 a = 17;
                 return 3;
             }
             void fun2() {
                 a = a + fun1();
             }
             void main() {
                 fun2();
             }
            The value computed for a in fun2 depends on the order of evaluation of the operands in the
             expression a + fun1(). The value of a will be either 8 or 20.
            Two possible solutions:
             1. Write the language definition to disallow functional side effects
                o No two-way parameters in functions.
                o No non-local references in functions.
                o Advantage: it works!
SPEC                                                                                                      Page 51
Principles Of Programming Language
Java guarantees that operands are evaluated in left-to-right order, eliminating this problem.
           Overloaded Operators
          The use of an operator for more than one purpose is operator overloading.
          Some are common (e.g., + for int and float).
          Java uses + for addition and for string catenation.
          Some are potential trouble (e.g., & in C and C++)
Type Conversions
       A narrowing conversion is one that converts an object to a type that cannot include all of the
        values of the original type e.g., float to int.
      A widening conversion is one in which an object is converted to a type that can include at least
        approximations to all of the values of the original type e.g., int to float.
  Coercion in Expressions
      A mixed-mode expression is one that has operands of different types.
      A coercion is an implicit type conversion.
      The disadvantage of coercions:
             They decrease in the type error detection ability of the compiler
      In most languages, all numeric types are coerced in expressions, using widening conversions
      Language are not in agreement on the issue of coercions in arithmetic expressions.
      Those against a broad range of coercions are concerned with the reliability problems that
        can result from such coercions, because they eliminate the benefits of type checking.
      Those who would rather include a wide range of coercions are more concerned with the loss in
        flexibility that results from restrictions.
      The issue is whether programmers should be concerned with this category of errors or whether
        the compiler should detect them.
SPEC                                                                                                       Page 52
Principles Of Programming Language
           Ex:
           void mymethod() {
               int a, b, c;
               float d;
               …
               a = b * d;
               …
           }
  Example
  :
     Ada:            FLOAT(INDEX)--INDEX is INTEGER type
       Java:
                     (int)speed /*speed is float type*/
  Errors in Expressions
      Caused by:
            – Inherent limitations of arithmetic           e.g. division by zero
            – Limitations of computer arithmetic           e.g. overflow or underflow
      Such errors are often ignored by the run-time system.
SPEC                                                                                                     Page 53
Principles Of Programming Language
  Greater than or
  equal                >=            >=   .GE. or >=
  Less than or equal   <=            <=   .LE. or >=
SPEC                                                   Page 54
  Boolean Expressions
     If the programmer assumed b would change every time this expression is evaluated during
      execution, the program will fail.
     C, C++, and Java: use short-circuit evaluation for the usual Boolean operators (&& and ||), but also
      provide bitwise Boolean operators that are not short circuit (& and |)
Assignment Statements
     This design treats the assignment operator much like any other binary operator, except that it has
      the side effect of changing its left operand.
     Ex:
           while ((ch = getchar())!=EOF)
                {…}                                 // why ( ) around assignment?
          The assignment statement must be parenthesized b/c the precedence of the assignment operator is
           lower than that of the relational operators.
       Disadvantage:
               o Another kind of expression side effect which leads to expressions that are difficult to read
                  and understand.
        There is a loss of error detection in the C design of the assignment operation that frequently leads to
          program errors.
          if (x = y) …
           instead of
           if (x == y) …
       Mixed-Mode Assignment
          In FORTRAN, C, and C++, any numeric value can be assigned to any numeric scalar
           variable; whatever conversion is necessary is done.
          In Pascal, integers can be assigned to reals, but reals cannot be assigned to integers (the
           programmer must specify whether the conversion from real to integer is truncated or rounded.)
          In Java, only widening assignment coercions are done.
          In Ada, there is no assignment coercion.
          In all languages that allow mixed-mode assignment, the coercion takes place only after the
           right side expression has been evaluated.
 Control Structures
Introduction
          A control structure is a control statement and the statements whose execution it controls.
          Overall Design Question:
              o What control statements should a language have, beyond selection and pretest logical loops?
    Selection Statements
          A selection statement provides the means of choosing between two or more paths of execution.
          Two general categories:
               o Two-way selectors
               o Multiple-way selectors
                    Two-Way Selection Statements
          The general form of a two-way selector is as follows:
           if control_expression
               then clause
               else clause
1. Design issues
       What is the form and type of the control expression?
       How are the then and else clauses specified?
       How should the meaning of nested selectors be specified?
2. The control Expression
    Control expressions are specified in parenthesis if the then reserved word is not used to
      introduce the then clause, as in the C-based languages.
    In C89, which did not have a Boolean data type, arithmetic expressions were used as control
      expressions.
    In contemporary languages, such as Java and C#, only Boolean expressions can be used
      for control expressions.
    Prior to FORTRAN95 IF:
      IF (boolean_expr) statement
    Problem: can select only a single statement; to select more, a GOTO must be used, as in the
     following example
Clause Form
    In most contemporary languages, the then and else clauses either appear as single statements or
     compound statements.
    C-based languages use braces to form compound statements.
    In Ada the last clause in a selection construct is terminated with end and if.
Nesting Selectors
    In Java and contemporary languages, the static semantics of the language specify that the else
     clause is always paired with the nearest unpaired then clause.
if (sum == 0)
         if (count == 0)
                    result = 0;
       else
                    result = 1;
             else
             result = 1;
    C, C++, and C# have the same problem as Java with selection statement nesting.
           Multiple Selection Constructs
    The multiple selection construct allows the selection of one of any number of statements or
      statement groups.
Design Issues
   {
       case constant_expression_1 : statement_1;
       ...
       case constant_expression_n : statement_n;
       [default: statement_n+1]
   }
            The control expression and the constant expressions are integer type.
            The switch construct does not provide implicit branches at the end of those code segments.
            Selectable segments can be statement sequences, blocks, or compound statements.
            Any number of segments can be executed in one execution of the construct (a trade-off
             between reliability and flexibility—convenience.)
            To avoid it, the programmer must supply a break statement for each segment.
            default clause is for unrepresented values (if there is no default, the whole statement does
             nothing.)
            C# switch statement rule disallows the implicit execution of more than one segment.
            The rule is that every selectable segment must end with an explicit unconditional branch
             statement, either a break, which transfers control out of the switch construct, or a goto, which
             can transfer control to on of the selectable segments.
switch (value) {
       case -1: Negatives++;
                 break;
       case 0: Positives++;
                 goto case 1;
       case 1: Positives ++;
       default: Console.WriteLine(“Error in switch \n”);
   }
    Bad aspects:
         o Not encapsulated (selectable segments could be anywhere)
         o Segments require GOTOs
         o FORTRAN computed GOTO and assigned GOTO
    To alleviate the poor readability of deeply nested two-way selectors, Ada has been
     extended specifically for this use.
         else
              if Count < 100 then
               Bag2 := True;
         else
               if Count < 1000 then
                 Bag3 := True;
               end if;
              end if;
         end if;
       The elsif version is the more readable of the two.
  This example is not easily simulated with a case statement, b/c each selectable statement is chosen on
  the basis of a Boolean expression.
Iterative Statements
     The repeated execution of a statement or compound statement is accomplished by iteration
      zero, one, or more times.
     Iteration is the very essence of the power of computer.
     The repeated execution of a statement is often accomplished in a functional language by
      recursion rather then by iteration.
     General design issues for iteration control statements:
      1. How is iteration controlled?
      2. Where is the control mechanism in the loop?
     The primary possibilities for iteration control are logical, counting, or a combination of the two.
     The main choices for the location of the control mechanism are the top of the loop or the bottom
      of the loop.
     The body of a loop is the collection of statements whose execution is controlled by the iteration
      statement.
     The term pretest means that the loop completion occurs before the loop body is executed.
     The term posttest means that the loop completion occurs after the loop body is executed.
     The iteration statement and the associated loop body together form an iteration
                construct. Counter-Controlled Loops
     A counting iterative control statement has a var, called the loop var, in which the count value is
      maintained.
     It also includes means of specifying the intial and terminal values of the loop var, and
      the difference between sequential loop var values, called the stepsize.
     The intial, terminal and stepsize are called the loop parameters.
     Design Issues:
           – What are the type and scope of the loop variable?
           – What is the value of the loop variable at loop termination?
           – Should it be legal for the loop variable or loop parameters to be changed in the loop body,
               and if so, does the change affect loop control?
           – Should the loop parameters be evaluated only once, or once for every iteration?
     FORTRAN 90’s DO
     Syntax:
      DO label variable = initial, terminal [, stepsize]
         …
      END DO [name]
     The label is that of the last statement in the loop body, and the stepsize, when absent, defaults to 1.
     The loop params are allowed to be expressions and can have negative or positive values.
     They are evaluated at the beginning of the execution of the DO statement, and the value is used to
      compute an iteration count, which then has the number of times the loop is to be executed.
     The loop is controlled by the iteration count, not the loop param, so even if the params are
      changed in the loop, which is legal, those changes cannot affect loop control.
   The iteration count is an internal var that is inaccessible to the user code.
   The DO statement is a single-entry structure.
    count2 = 1.0
    loop:
       if count1 > 10 goto out
       if count2 > 100.0 goto out
       count1 = count1 + 1
       sum = count1 + count2
       count2 = count2 * 2.5
       goto loop
    out…
   The loop above does not need and thus does not have a loop body.
   C99 and C++ differ from earlier version of C in two ways:
    1. It can use a Boolean expression for loop control.
    2. The first expression can include var definitions.
             Logically Controlled Loops
   Design Issues:
        1. Pretest or posttest?
        2. Should this be a special case of the counting loop statement (or a separate statement)?
   C and C++ also have both, but the control expression for the posttest version is treated just like in
    the pretest case (while - do and do - while)
   These two statements forms are exemplified by the following C# code:
    sum = 0;
    indat = Console.ReadLine( );
    while (indat >= 0) {
        sum += indat;
        indat = Console.ReadLine( );
    }
    value = Console.ReadLine( );
    do {
        value /= 10;
        digits ++;
    } while (value > 0);
   The only real difference between the do and the while is that the do always causes the loop body to
    be executed at least once.
   Java does not have a goto, the loop bodies cannot be entered anywhere but at their beginning.
             User-Located Loop Control Mechanisms
   It is sometimes convenient for a programmer to choose a location for loop control other than the top
    or bottom of the loop.
   Design issues:
         1. Should the conditional be part of the exit?
         2. Should control be transferable out of more than one loop?
   C and C++ have unconditional unlabeled exits (break).
   Java, Perl, and C# have unconditional labeled exits (break in Java and C#, last in Perl).
   The following is an example of nested loops in C#:
OuterLoop:
        for (row = 0; row < numRows; row++)
            for (col = 0; col < numCols; col++)
                 { sum += mat[row][col];
                 if (sum > 1000.0)
                    break outerLoop;
        }
   C and C++ include an unlabeled control statement, continue, that transfers control to the control
    mechanism of the smallest enclosing loop.
   This is not an exit but rather a way to skip the rest of the loop statements on the current iteration
    without terminating the loop structure. Ex:
      The notation {0} in the parameter to Console.WriteLine above indicates the position in the string to
       be displayed where the value of the first named variable, name in this example, is to be placed.
      Control mechanism is a call to a function that returns the next element in some chosen order, if
       there is one; else exit loop
      C's for can be used to build a user-defined iterator.
      The iterator is called at the beginning of each iteration, and each time it is called, the iterator returns
       an element from a particular data structure in some specific order.
Unconditional Branching
    An unconditional branch statement transfers execution control to a specified place in the
     program.
  Problems with Unconditional Branching
   The unconditional branch, or goto, is the most powerful statement for controlling the flow of
     execution of a program’s statements.
   However, using the goto carelessly can lead to serious problems.
   Without restrictions on use, imposed either by language design or programming standards, goto
     statements can make programs virtually unreadable, and as a result, highly unreliable and
     difficult to maintain.
   There problems follow directly from a goto’s capability of forcing any program statement to
     follow any other in execution sequence, regardless of whether the statement proceeds or follows
     the first in textual order.
   Java doesn’t have a goto. However, most currently popular languages include a goto statement.
   C# uses goto in the switch statement.
Guarded Commands
  To support a new programming methodology (verification during program development)
          Selection: if -> [] -> ... [] -> fi -
              Semantics: When this construct is
              reached,
               o Evaluate all boolean expressions
               o If more than one are true, choose one non deterministically
               o If none are true, it is a runtime error
  Idea: if the order of evaluation is not important.
          Loops do -> [] -> ... [] -> od Semantics:
                   For each iteration Evaluate all boolean expressions
o If more than one are true, choose one non deterministically; then start loop again
o If none are true, Connection between control statements and program verification is
  intimate
o Verification is impossible with gotos
o Verification is possible with only selection and logical pretest loops
o Verification is relatively simple with only guarded commands.