SIMD ARRAY
PROCESSORS
                   Chapter 4
NOTE: Refer two author book Kai Hwang and Briggs page no:325
          SIMD ARRAY PROCESSORS
• ILLIAC-IV
• Burroughs Scientific Processor(BSP)
ILLIAC-IV
ILLIAC IV
• Constitutes of multiple synchronized processing
  elements under one control unit
• Each processing elements contain an ALU, registers
  and local memory
• User programs are loaded into the control unit
  memory from an external source
• The control unit then decodes and decides where the
  instruction should be executed.
                   ILLIAC IV
• Branching instructions are executed directly on the
  control unit
• Vector instructions are sent to the processing
  elements for distributive execution to achieve spatial
  parallelism via duplicate arithmetic units
• The data can be loaded into the processing element’s
  memory from external source via system bus and
  broadcast mode of control unit
ILLIAC
   •
       IV
    Masking scheme is used to control the status of
    processing element during execution
   • A PE may either be activated or deactivated during an
     instruction cycle
   • Enabled PE only perform execution
   • Data exchange between the PE is done via
     interconnection network that performs data routing
     and manipulation function
ILLIAC IV
• Interconnection network is too under the supervision
  of control unit
• A host computer is interfaced with array processor
  through control unit
• A host computer is a general purpose machine which
  serves as a overall manager of the entire system.
   • Host : front-end machine; manages resources, i/o activity
   • Array Processor can be considered as back-end attached
     computer
   • Note: each node has local memory
 Burroughs Scientific Processor(BSP)
• Effectively a successor to the ILLIAC IV machine
• But with an architecture modified to reflect the fact that the BSP was
  intended to be a commercial product
• It has fewer processing units than ILLIAC IV
• All processors enjoyed equal access to a common logical address space
  which was divided into a number of physically separate memory
  modules
• Each processing element is nothing more than an arithmetic unit with
  input and output registers, and these units are homogeneous and non-
  pipelined
  • Note: All nodes share the same memory howsoever the memory are divided
    into modules
SIMD Interconnection Network
     1D : Linear; 2D: Ring, Mesh,Star, Tree; 3D: Hypercube
   SIMD Interconnection Network
   • Data exchange among the PE’s are done via interconnection
     network
   • They perform all data routing and manipulation functions
   • Architecture of an interconnection network is based on
     topology
• Static: pattern is fixed and cannot be reconfigured
• Dynamic : pattern inside the network are not fixed
   • Depending on the number of stages it is divided into two
     types
      • Single stage
      • Multi stage
    NOTE: refer 334 page no of two author kai Hwang and Briggs
Single Stage
• Only one stage is used
• Depending on the interstage connection used a single stage is also
  known as recirculating network
• Data may recirculate single stage many times before reaching their
  destination
   • E.g : Crossbar Network
             NOTE: refer 334 page no of two author kai Hwang and Briggs
Multi Stage
    • Consist of many stages interconnected switch
    • Characterized by switch box and network connectivity
    • The connectivity is controlled by choice of interstage
      connection pattern
    • A switch box may have any of the four patterns
      mentioned in the next slide.
         NOTE: refer 337 page no of two author kai Hwang and Briggs
                     1: Straight
                     2:Exchange
                     3: Lower Broadcast
                     4:Upper Broadcast
NOTE: refer 337 page no of two author kai Hwang and Briggs
Mesh Connected Illilac Network
No of Nodes = N=16
No of interconnection per node (r) =√N=4
Max no of hops≤ √N-1                        Example:
Routing function for connecting ith node:   Node 3 will be connected with:
R+i(1)= (i+1) mod N          0≤ i ≤N-I      i+1= 4th node
R-i(1)= (i-1) mod N                         i-1=3rd node
R+r(i)= (r+i) mod N                         i+4=7th node
R-r(i)= (r-i) mod N                         i-4=(-1)=15
Shuffle exchange and omega Networks:
• The class of shuffle- exchange network is based on two routing function
  shuffle(S) and exchange(G).
• Two types: perfect shuffle and Inverse shuffle
• For perfect shuffle
• Let A=an-1, an-2,…….a1,a0 be a PE address.
• S(an-1, an-2,…….a1,a0 )=an-2,…….a1,a0 , an-1 ..
• N= number of PE’s.
• n= log n
• The cyclic shifting of bits in A to the left for one bit position is performed by
  the S.
               NOTE: 350 page no of two author kai Hwang and Briggs
PERFECT SHUFFLE
     NOTE: 351 page no of two author kai Hwang and Briggs
Shuffle exchange and omega Networks:
• Inverse shuffle
• Let A=an-1, an-2,…….a1,a0 be a PE address.
• S(an-1, an-2,…….a1,a0 )=a0, an-1, an-2,…….a1
• N= number of PE’s.
• n= log n
• The cyclic shifting of bits in A to the right for one bit position is
  performed by the S.
             NOTE: 350 page no of two author kai Hwang and Briggs
Inverse Shuffle
       NOTE: 351 page no of two author kai Hwang and Briggs
OMEGA NETWORK (by using perfect
shuffle)
      NOTE: refer Video
 BLOCKING STATE
 • If the i/o ports are on the same side then the network
   is called one-sided networks, also known as full
   switches
 • Two side multistage has an input and output side
    • This can be classified further as blocking or non blocking
• If the simultaneous connections of some multiple i/o pairs result in
  conflicts in switches or links, then the multistage network is known
  as blocking network. Eg: OMEGA NETWORK
• If the network can perform all possible connections sources (input)
  and destination(output) by rearranging its connection then its
  known as non-blocking network. Eg: Benes Network
Cube Interconnection Network
• Cube network can be implemented as either a re-circular network or
  as multistage network for SIMD.
• In cube network by single cube we can able to connect 8 PEs.
• To connect 16 node we need two cube and so on.
• For representing 8 PEs 3bits are required.
• Interconnection made by using following rule:
   • vertical lines connect vertices(PEs) differ in the most significant bit.
   • Horizontal line differs in the least significant bit.
   • Vertices at both ends of diagonal lines differs in the middle bit position.
                             (refer page no:343
Cube Interconnection Network
Assignment:
• Design a 4-cube network for an array processor consisting of 16
  Processing Elements (PEs). Trace the path to route a packet of data
  from the node 0110 to 1101.
• Barrel Shifter Network (refer page no:345)
• With the aid of an example explain the ‘Masking’ and ‘Data Routing’
  mechanism in an SIMD Array processor.
Thank You