Finite Element Method
Finite Element Method
      01m /2 hE
          0FCII  ENGIER       BN OUROMODE  [   E 84
   432N004-6
mSIFEDNEL-R
EN
 homhmhI
                          12.8
                        -21
11111.25
     111
   ul1111.6
                                                                      Sponsored by
                                                                      NAVAL FACILITIES ENGINEERING COMMAND)
~~ September 1984ji
LJJ
                                                                                  DTIG -"
                                                                                               .I
       An Investigation Conducted by:
       DEPARTMENT 01- CIVIL ENGINEI*.RIN;                             ~ELECTE
       UNIVERSITY OF CALIFORNIA
BerkclcN, CA 95663OC2
       N00014-76-C-OO1 3
       N62583/83 M T265
                                                                 84      10       23 008                      1
                                                t                                                          t
            E~             I. S
 z          0N            CI w-O                                              l4"
0-0         00            -.           0N
                                        ri                      0-~
E E ar
I..I rp 'U.I... II IN
                                     S,.-
                                                         >I                              it
                                                                                         I-
N '.C
00
E. C                 5r
                      Unclassified
 SC.JRITY CLASIICATION OF TH~IS PAGE (WP,.n noW. F..,...d)
                                                   REPOT DCUMNTATON
                                                            AGEREAD
                                                  REPOT DCUMNTATON
                                                             AGEBEFORE
                                                                                                                                          INSTRUCTIONS
                                                                                                                                         COMPLETING FORM
     IREPORT         NUMBER                                                                        0             3             IE        CATALOG NUMBER
      CR-84 .032
          AILE (.d    S,b-1I.)
                                                                              -0"''
                                                                                U1
                                                                                                                 S
                                                                                                                     7   YP
                                                                                                                               WO   RPORT A PERIOD COV'EREO
14MONITORING AGENCY NAME # ADDRESS,I dIlefr.I I,,, Co-,II,nll OfiIr.J IS SECURITY CLASS W~ thA.I.Po.I)
17 DISTRIBUTION STATEMENT (.1 111. *.st-erent-,d in Block 20. 1 d,ffe-.o I,-. R*Porf)
      cos duig
     I/O
   Ii)                            easebysae
       C       ...
        Anrali           tmsdeiethtsatwihheeentc
                         Unnclassified                                                                  -
                      neciviy
                       iforatin adgnertesUneclassifed                                      dt
                     structure.
                           The element matrices are              then assembledPto               form
Page
CONCLUSION.......................... 9
List of Figures
List of Tables
0v                                                   V5
                                           -1-
Introduction
The analysis of large structures, especially in three dimensions, can result in .if:
matrices that demand an exceptionally large amount of computer storage. The storage needs of
these matrices depend to a large extent on their sparsity and the data structure that is used t.
store them. The choice of the data structure in turn depends on the method that is usc , ..
the associated system of equations. Presently, most solution schemes used in finite element com-
puter programs are based on direct methods, i.e. triangular factorization of the stiffness matrix,
K. Starting from a given mesh description, a finite element program performs the following steps:
In many applications the available primary memory is not sufficient to store the assembled
matrix, and therefore secondary storage is used. In this circumstance, steps 4 through 7 involve
data transfers between primary and secondary store, often referred to as 1/0. In this case, K is
partitioned into blocks and each block is assembled and stored on secondary store. The blocks
are then brought back into the main memory to form the factors of K. For large enough prob.-
A great deal of effort has been expended to develop new procedures for reordering the equili-
brium equations, thus reducing the overall storage requirements in the solution steps 5, 6 and 7
11,2,M. This ismotivated by the fact that reduction in storage translates directly into a savings
in the I/O costs. Among the many solution schemes used, the frontal method 141and the prolle
or skyline method II are probably the most popular. In 181it is shown that when the same nodal
                                                                                          V.l
                                                -2-
elimination ordering is used the profile method performs the same number of operations a., lot
frontal method; in [51 an algorithm is described that delivers a good frontal node ordering for the
profile method. The significant difference between these methods is that the frontal method otel.
combines steps 4, 5 and . In this way the I/O during the assembly step is overlapped with tht
1/O in the factorization step; thus, the frontal method results in a saving that is equal to the c.
of 1/O in step 4. Alternatively, the assembly of a profile stored matrix which is partitioned into
blocks requires a multiple pass through the elements to perform the assembly. The principle
differences in the I/O costa of the two methods during factorization may be traced to the 3bove
differences in the partitioning of the matrix into a frontal or a profile form. The principle disa4-.
vantage of the frontal method is the added overhead to retain a small front width during the tri-
angular factorization in step 6 and subsequent resolutions in step 7. For example, in resolution of
equations this added overhead may lead to CPU costs which are several times those of a profile
stored resolution.
In this paper we take a different approach. We use the following simple observation:
In other words, one should use the most efficient data structure for the assembly process, step 5,
and then restructure the data for ones favorite solution scheme, i.e., either frontal or profile. In
this way one can achieve the same reductions in 1/0 as the frontal method and at the same time
maintain high modularity of the program. Here we develop a data structure that stores only the
nonzero terms in the stiffness matrix in a compacted form, and present an algorithm for the
assembly of K for this storage method. This approach results in considerable reduction in the
storage needs during the assembly process. Therefore large matrices often may be fully assembled
This approach has the added advantage that the program is not built around a single equa-
tion solver. One can have many solution procedures by simply expanding the compacted strue-
ture of K into a form appropriate for each particular solution method. Furthermore, the com-
pacted structure can be used directly for iterative solution techniques such as the conjugate
                                                 -3-
Storage Scheme
We now describe the compacted structure that is used to store K. We only consider sym-
metric matrices, although the extension to the nonsymmetric case is trivial, and store only L,-
upper triangular part of K is stored. The diagonal terms of K will be stored separately in a eta-
gle array of length n, where a is the total number of equations. The remaining off-diagonals will
be placed in a second array of length r, where r is the total number of nonzero off-diagonal terms
in the upper triangular part of K. All the elements in the same column will be placed consecu-
tively in this array, starting from the top of the column down to the diagonal (excluding the diag-
onal term). The columns are stored consecutively from the second to the last. For each entry in
this array we store its row number in a corresponding integer array of length r. The example
Ezample:
                                           9        12           15
                                           2 10 11     13 14
                                              3     16 17
                                                  4       19     21
                                 K                   5 1820
                                                        6        22
                                                           7     23
                                                                  8
                                     dis= 11,2,3,4,5,6,7,81.
A real array then stores the off-diagonal terms of K and a corresponding integer array denotes the
     The total storage requirement is r + n real words and r + n integer words. Using a 16
                  5
bit integer word,   -(n + r) real words (64 bit) will be sufficient. Then the largest number of
equations that can be solved this way is 215 - 1 P 32000. With a 32 bit integer word, t,               .-
                                                                                                        ,c
In this section we give a step by step derivation of the assembly algorithm. Each step is
demonstrated with the aid of the mesh example in Fig. 1. First we introduce some notation. A
finite element mesh is denoted by M - (E , N) where E and N represent the collection of ele-
ments and nodes in the mesh. A part of the input information provided to a finite element pro-
gram is the set of nodes belonging to an element. This we denote as NCN. For example ele-
ment 4 in Fig. I has the connectivity set N 4 = {5,8,9,6). In Table I we give the complete list
of the connectivity sets N, for each element in example 1. These data are usually assembled in a
single array known as the connectivity array. The complete set ( N, V sEE ) is sufficient to =
describe the connectivity of a given mesh. Another part of the input data is the boundary condi-
tions that determine the set of the indices of all the active degrees of freedom at node p. We
denote this set as U.. In Table I we give the set ( Up VpEN) for the example in Fig. 1. In this
example, we assume that there are two degree of freedom at each node. The collection of N,
column 2, and U., column 4, given in Table I is sufficient to determine the sparsity structure of
                                           b:A
                     Element Connectivity                Boundary Conditions
                       1        (1,4,5,2)             2             (1)
                       2        (2,5,6,3)            4              (2)
                     3          (4,7,8,5)            5             (3,4)
                       4        (5,8,9,6)            6              (5)
                                                     7              (6)
                     675        {(7,11,8)
                                  10,11l, 7)
                                 (8,11,9)
                                                     89            (7,8)
                                                                    (9)
8 (11,12,9) 11 (10)
Table 1. Connectivity sets and active degrees of freedom for the mesh in Fig. 1.
Our objective here is to find the set of indices of the unknowns that are coupled with a
given degree of freedom. This is precisely the row number of each nonzero term in a given
column of the stiffness matrix, irow, that is required for the storage scheme described in the pre-
vious section.
First, we must establish the set of elements that are connected to each node. This can be
done by inspecting the element connectivity sets. Looking at the second column of Table 1, for
example node 4 appears twice, in rows 1 and 3. We then conclude that node 4 is connected to
elements 1 and 3. This process must be repeated for each node. The difficulty here is that we do
not know apriori the number of storage locations needed to identiry the set of elements for each
node. For this reason the above process iscarried out in two steps. The number of elements con-
nected to each node is determined and stored first. We refer to this as the E-degree (element
degree) of each node. In the example the -degree of node 4 is 2. The E-degree also determines
the length of the array that is required to keep the set of elements connected to each node. For
each node p we demote this set by Ep C E. Then the above process is simply to evaluate
                                      E, -c      Ip N)                                          (E)
for each node p. This equation may be thought of as fnding the pseudo-inverse of the connec-
tivity array. The &,degreeof mode p is the number of terms inE (the cardisality of E. ). See
                                                            -6-
columns two and three of Table 2 for the Edegree and the complete set of E, for the nodes in
example 1.
Next, we And for the set of nodes that are adjacent to each node p. We denote thib L1
    ACN. This is the set of all nodes that belong to an element with p as one of its nodes                 IR*:vi-7
    established the set of elements connected to node p, the adjacent nodes are all the 4t'
belonging to these elements. In example I node 4 is connected to elements I and 3 (see column 2
of Table 2). The set of nodes belonging to elements 1 and 3 are obtained by inspecting column 2
of Table 1; these are ( 1, 4, 5, 2 ) and ( 4, 7, 8, 5 ) respectively. Then the set of nodes adjacent
cent to it and is the cardinality of A,. In columns 4 and 5 of Table 2 we give the N-degree and
the adjacency set of each node in the nodal graph of example 1 (Fig. 2). This step is simply to
A,= U N, - n (2)
Note that both A, and N-degree can be obtained in the same loop.
      Node            E                            N                                   S
        p         degree            Ep          degree                  AP           degree          SP
         1        1                 1)             3                (4,5,2)            0
         2            2          (1,2)             5             1, 4, 5,6,3)          1            {
         3            1           (2)              3               (2,5,0)             1            (2)
         4            2          (1,3)             5           (1,2,5,7, 8             2           (1,2)
         5            4        (1,3,4,21           8         1, 4, 7, 8, 9,6, 3, 2     4         (1,4,3,2)
*        6            2           (2,4)            5              (2, 5,8,9, 3 )       3          (2,5,3)
         7            3          (3,5,6)           5          (10, 11,8,,     4)       2           (4,5)
         8            4        (3,5,7,4)           6          (11, 9,6, 5, 4, 7        4         (6,5,4,7)
         9            3         (4,7,8)            5           (11, 12, 6, 5,8)        3          16,5,8)
        10            1            (6)             2                 17,11)             1           (7)
        11            4        (6,5,7,8)           5          (10,12,9,8,7)            4        (10,9,8,7)
        12            1           (8)              2              (11,9)               2          (11,9)
    Table 2.              The result of algorithm for establishing the row index of the nonzero terms in K
                          for example 1.
                                                                 -7-
Since we want to store only the upper triangular part of K we need to store only a subset of
AP. This will be the set of nodes in A, with an index less than p; that is
Sp= (i iEA, and i < p ). We refer to the number of terms in S, as the S-degree (.,rc.-
degree) of a node. The set S, is only useful when the numbering of the unknowns are such tha-t
when i < j all the unknowns at node i have a smaller index than the urknowns at n,,Ie
Whenever this is not true it is necessary to use the complete set of adjacent nodes A. together
with the numbers of the unknowns for each node (e.g., see listing in Appendix A.).
Finally, for a given unknown at node p with index EUV, we find the set of the indices of all
other unknowns that are coupled with u,. This will be the set of row indices R, for nonzeros in
                                                       R,              U U,                                                    (3)
                                                                     tIE A,
For example 1 ffj is the adjacency set of j in the graph of the unknowns in Fig. 3. Since we only
store the upper triangular part of K we scan through ifj and use the subset defined by:
R) is the row index of all the nonzero terms in the i-tb column of the upper triangular part of K.
Ij                                    D.O.F.          Node
                                                        p                        R
                                             1              2                   (0)
                                             2              4                    11)
                                             3              5                  (1,2)
                                             4              5                 (3,2, 1
                                             6              6                 (4,3,1)
                                             6              7                 (3,4,2)
                                             7              8           (6,2, 4, 5, 3)
                                                S           8            2, 3,4,,6,,)
                                             0,                         (7,4, 5, 3, 8
                                            10          11                (9,6,7,8)
Table 3. The row indices of the nonzero terms in the upper triangular part of K.
            _                                                                                                                        r
                                                -8-
The listing of a FORTRAN program that performs all the steps that is described in Lbis
section is given in Appendix A. In this Appendix we also provide the subroutine that use, the
Numerical Result
We use the algorithm described in the previous section to assemble the stiffnems matr!,,
the problems described in Table 6. The total storage required during the assembly step is
evaluated. We compare these results to similar results obtained when the assembly is performed
directly into a profile data structure. The storage requirement of the compacted assembly is not
effected by the node ordering. For the assembly into a profile form, we numbered the nodes
across the width of the mesh to reduce the bandwidth of the stiffness matrix. Although, the
bandwidth could have been reduced further using a renumbering scheme such as 12,31, we omitted
this step for simplicity. The results for examples 2 to 6, given in Table 6, are presented in Table
4 below.
Table 4.      Comparison of the storage demands of profile and compacted assembly for K in
              examples 2 to 6.
The reults in Table 4 is obtained based on the assumption that a real word is twice as long as an
integer word. We observe a reduction from 40% for two dimensional (2-D) problems to 80% for
3-D problems for these examples. The reductions will be more if short integer words are used. It      4
is interesting to note that the required storage for compacted structure varies linearly with the
number of equations. Therefore, the saving will be more for larger problems. In Table 5 we give
the storage counts for the two methods considered here on square mesh in 2-D and c      '    .
a function of the number of equations. To obtain these estimates we assumed that there ib ,ady
1 2 5/2 n 3n
                              2                          n/        15/2 n
                              3            n 2/          n6/3        21n
Table 5. Estimated storage needs for each scheme on regular mesh. n is the number of equations.
Conclusion
The essential steps in a finite element program can be modified to make use of the com-
3. Choose a solution procedure and renumber the equations to reduce the storage
4. expand the compacted K into a data structure suitable for the solution method.
When there is insufficient primary storage, the assembly of the matrix in compacted form
opens a number of possible avenues that one can take to reduce the 1/o cost. The expnamsloa of
the compacted form need not be done immediately after it's assembly. The matrix can be kept in
compact form and put on secondary store and expanded into a full prolle form only when a fac-
torization mut be performed. This way the number of data entries that is read (in the lpet
phse of I/O) cam be reduced considerably, which is turn results in a reductiom is the solution
Ume.
                                       *    10-
10 4 7
$ -
I
                                             -   12-
                                                                   .,   i
Example 2: Cantilever Struc-
ture, left end fixed, plane stress
elements with 2 degrees of free-                       .,
do. per mode.
Reference[
III    A. George and J. W. Liu, Computer Solution of Large Sparse Positive Definite Systems,
       Prentice-Hall, Englewood Cliffs, 1981.
121    E. Cuthill and J. McKee, "Reducing the Bandwidth of Sparse Symmetric Matrices," Proc.
       ACM Nat. Con., New-York, 1969.
131 N. E. Gibbs, W. G. Poole, Jr., and P. K. Stockmeyer, "An Algorithm for Reducing Lbe
       Bandwidth and Profile of A Sparse Matrix," SIAM 1. Num. Anal., Vol. 13, pp. 236-250,
       1976.
14)    B. Irons, "A Frontal Solution Program for Finite Element Analysis," Int. J. Num. Meth.
       Engng., Vol. 2, pp. 5-32, 1970.
151 M. Hoit and E. L. Wilson, "An Equation Numbering Algorithm Based on a Minimum Front
       Criteria," Computers and Structures,Vol. 16, No. 1-4, pp. 225-239, 1983.
161    B. Nour-Omid, B. N. Parlett and R. L. Taylor, "A Newton-Lanczos Method for Solution of
       Nonlinear Finiie Element Equations," Computers and Structures, Vol. 16, No. 1-4, pp. 241-
       252, 1983.
181    R. L. Taylor, E. L. Wilson and S. Sackett, "Direct Solution of Equations by Frontal and
       Variable Band, Active Column Methods," in Nonlinear Finite Element Analysis in Structural
       Mechanics, Proc. Europe-US Workshop, July 1976.
191    R. L. Taylor, "Computer Procedure for Finite Element Analysis," Ch. 24, The Finite Ele-
       ment Method, 3-rd Ed., by 0. C. Zienkiwicz, McGraw-Hill, London, 1977.
1101   E. L. Wilson, "Solution of Sparse Stiffness Matrices for Structural Systems," Proc. of Sparse
       Matrices, Ed. 1. S. Duff, SIAM, Philadelphia, 1979.
           m&
                                   -14-
            DO 360 1 =            I,NUMNP
                NE       =   IC( I)
                DO 340 I = I,NDF
                   NEQ = ID(II,I)
                   IF ( NEQ GT. 0 ) THEN
                      JCOLE(NEQ) = KP
                      KPO = KP + I
                      IF ( NEP LE. NE ) THEN
                         DO 330 N = NEP,NE
                            NN = IELC(N)
                            DO 320 J   I,NEN
                                            K - IX(J,NN)
                                            DO 310 JJ = INDF
                                               NEQJ - ID(JJ,K)
                                               IF (NEQJ GE NEQ                OR   NEQJ   LT.   0) GO TO 310
      C
      C..                                      CHECK TO SEE IF NODE ALREADY IN LIST
      C
                                    IF ( KPO LE. KP ) THEN
                                       DO 300 ICK = KPO,KP
                                          IF( IRON(KK)   EQ NEQJ                          )   GO TO 310
      Soo                              CONTINUE
                                   END IF
                                   lCP - KP + I
                                    IROW(KP) - NEQJ
      310                       CONTINUE
      320                    CONTINUE
      330                 CONTINUE
                          JCOLE(NEQ) =K?
                      END IF
                   END IF
      340       CONTINUE
                NEP - NE + 1
      350   CONTINUE
            RETURN
            END
                                                    -16-
                 SUBROUTINE IZERO(IA,NN)
                 DIMENSION IA(NN)
                 DO l00 N = I,NN
                     IA(N) = 0
         100     CONTINUE
                 RETURN
                 END
               i[
t_
e . . .. l I