UNIT - I 1
[COMPUTATIONAL FLUID DYNAMICS]
REPRESENTATION OF INTEGERS
Integers are whole numbers or fixed-point numbers with the radix point fixed after the least-significant bit. They contrast with real numbers or floating-point numbers, where the position of the radix point varies.
It is important to note that integers and floating-point numbers are treated differently in computers. They have different representations and are processed differently (e.g., floating-point numbers are processed in a so-called floating-point processor). Floating-point numbers will be discussed later.
Computers use a fixed number of bits to represent an integer. The commonly-
used bit-lengths for integers are 8-bit, 16-bit, 32-bit or 64-bit. Besides bit-
lengths, there are two representation schemes for integers:
1. Unsigned Integers: can represent zero and positive integers.
2. Signed Integers: can represent zero, positive and negative integers.
Three representation schemes have been proposed for signed integers:
a. Sign-Magnitude representation
b. 1's Complement representation
c. 2's Complement representation
You, as the programmer, need to decide on the bit-length and representation
scheme for your integers, depending on your application's requirements.
Suppose that you need a counter for counting a small quantity from 0 up to 200; you might choose the 8-bit unsigned integer scheme, as no negative numbers are involved.
3.1 n-bit Unsigned Integers
Unsigned integers can represent zero and positive integers, but not negative
integers. The value of an unsigned integer is interpreted as "the magnitude of
its underlying binary pattern".
Example 1: Suppose that n=8 and the binary pattern is 0100 0001B, the value
of this unsigned integer is 1×2^0 + 1×2^6 = 65D.
Example 2: Suppose that n=16 and the binary pattern is 0001 0000 0000
1000B, the value of this unsigned integer is 1×2^3 + 1×2^12 = 4104D.
Example 3: Suppose that n=16 and the binary pattern is 0000 0000 0000
0000B, the value of this unsigned integer is 0.
An n-bit pattern can represent 2^n distinct integers. An n-bit unsigned integer
can represent integers from 0 to (2^n)-1, as tabulated below:
n    Minimum    Maximum
8    0          2^8 - 1 (= 255)
16   0          2^16 - 1 (= 65,535)
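As a quick check of Example 1 and of these ranges, here is a minimal Java sketch (Java is the language used by the notes' own examples later on; the class name is illustrative):

```java
public class UnsignedDemo {
    public static void main(String[] args) {
        // Example 1 above: the 8-bit pattern 0100 0001B read as an unsigned integer.
        System.out.println(Integer.parseInt("01000001", 2));   // 65
        // An n-bit unsigned integer ranges from 0 to 2^n - 1:
        System.out.println((1 << 8) - 1);    // 255
        System.out.println((1 << 16) - 1);   // 65535
    }
}
```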
In sign-magnitude representation:
The most-significant bit (msb) is the sign bit, with a value of 0 representing a positive integer and 1 representing a negative integer.
The remaining n-1 bits represent the magnitude (absolute value) of the integer. The absolute value of the integer is interpreted as "the magnitude of the (n-1)-bit binary pattern".
Example 1: Suppose that n=8 and the binary representation is 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D
Example 2: Suppose that n=8 and the binary representation is 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is 000 0001B = 1D
Hence, the integer is -1D
Example 3: Suppose that n=8 and the binary representation is 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D
Because of the fixed precision (i.e., fixed number of bits), an n-bit 2's
complement signed integer has a certain range. For example, for n=8, the range
of 2's complement signed integers is -128 to +127. During addition (and
subtraction), it is important to check whether the result exceeds this range, in
other words, whether overflow or underflow has occurred.
Example 4: Overflow: Suppose that n=8, 127D + 2D = 129D (overflow - beyond the range)
  127D →  0111 1111B
+   2D →  0000 0010B
          1000 0001B → -127D (wrong)
The following diagram explains how the 2's complement works. By re-arranging
the number line, values from -128 to +127 are represented contiguously by
ignoring the carry bit.
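The wrap-around in Example 4 can be reproduced directly in Java, whose byte type is an 8-bit 2's complement integer (the class name is illustrative):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        byte a = 127, b = 2;
        // Arithmetic is carried out in int, then truncated back to 8 bits:
        byte sum = (byte) (a + b);
        // 127 + 2 wraps past the maximum (+127) around the number circle to -127.
        System.out.println(sum);   // -127
    }
}
```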
Modern computers store one byte of data in each memory address or location, i.e., memory is byte-addressable. A 32-bit integer is, therefore, stored in 4 memory addresses.
The term "Endian" refers to the order of storing bytes in computer memory. In the "Big Endian" scheme, the most significant byte is stored first, in the lowest memory address (big end first), while "Little Endian" stores the least significant byte in the lowest memory address.
For example, the 32-bit integer 12345678H (305419896D) is stored as 12H 34H 56H 78H in big endian, and 78H 56H 34H 12H in little endian. The 16-bit pattern 00H 01H is interpreted as 0001H in big endian, and as 0100H in little endian.
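A small Java sketch makes both byte orders visible; java.nio.ByteBuffer lets us choose either scheme explicitly (the class name is illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        int n = 0x12345678;
        byte[] big = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(n).array();
        byte[] little = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(n).array();
        // Big endian: most significant byte (12H) in the lowest address.
        System.out.printf("big:    %02X %02X %02X %02X%n", big[0], big[1], big[2], big[3]);
        // Little endian: least significant byte (78H) in the lowest address.
        System.out.printf("little: %02X %02X %02X %02X%n", little[0], little[1], little[2], little[3]);
    }
}
```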
3.10 Exercise (Integer Representation)
1. What are the ranges of 8-bit, 16-bit, 32-bit and 64-bit integers, in "unsigned" and "signed" representation?
2. Give the value of 88, 0, 1, 127, and 255 in 8-bit unsigned representation.
3. Give the value of +88, -88, -1, 0, +1, -128, and +127 in 8-bit 2's complement signed representation.
4. Give the value of +88, -88, -1, 0, +1, -127, and +127 in 8-bit sign-magnitude representation.
5. Give the value of +88, -88, -1, 0, +1, -127 and +127 in 8-bit 1's complement representation.
6. [TODO] more.
Answers
1. The range of unsigned n-bit integers is [0, 2^n - 1]. The range of n-
bit 2's complement signed integer is [-2^(n-1), +2^(n-1)-1];
2. 88 (0101 1000), 0 (0000 0000), 1 (0000 0001), 127 (0111 1111), 255
(1111 1111).
3. +88 (0101 1000), -88 (1010 1000), -1 (1111 1111), 0 (0000 0000), +1
(0000 0001), -128 (1000 0000), +127 (0111 1111).
4. +88 (0101 1000), -88 (1101 1000), -1 (1000 0001), 0 (0000 0000 or
1000 0000), +1 (0000 0001), -127 (1111 1111), +127 (0111 1111).
5. +88 (0101 1000), -88 (1010 0111), -1 (1111 1110), 0 (0000 0000 or
1111 1111), +1 (0000 0001), -127 (1000 0000), +127 (0111 1111).
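Answers 3-5 can be spot-checked in Java; the byte type is natively 2's complement, while the sign-magnitude and 1's complement decodings are written out by hand (the class name is illustrative):

```java
public class SignedPatterns {
    public static void main(String[] args) {
        // 2's complement (answer 3): Java's byte uses it natively, so casting
        // the 8-bit pattern to byte recovers the signed value.
        System.out.println((byte) 0b1010_1000);   // -88
        System.out.println((byte) 0b1000_0000);   // -128

        // Sign-magnitude (answer 4): sign bit, then magnitude of the low 7 bits.
        int sm = 0b1101_1000;
        System.out.println(((sm & 0x80) != 0 ? -1 : 1) * (sm & 0x7F));   // -88

        // 1's complement (answer 5): a negative value is the bitwise NOT of its magnitude.
        int oc = 0b1010_0111;
        System.out.println((oc & 0x80) != 0 ? -((~oc) & 0x7F) : oc);     // -88
    }
}
```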
FLOATING POINT ARITHMETIC
A floating-point number (or real number) can represent a very large value (e.g., 1.23×10^88) or a very small value (e.g., 1.23×10^-88). It can also represent a very large negative number (e.g., -1.23×10^88) and a very small negative number (e.g., -1.23×10^-88), as well as zero. However, a fixed number of bits can represent only a finite set of 2^n distinct numbers; hence, not all real numbers can be represented. The nearest approximation is used instead, resulting in a loss of accuracy.
It is also important to note that floating-point arithmetic is much less efficient than integer arithmetic. It can be sped up with a dedicated floating-point co-processor. Hence, use integers if your application does not require floating-point numbers.
In computers, floating-point numbers are represented in scientific notation
of fraction (F) and exponent (E) with a radix of 2, in the form of F×2^E.
Both E and F can be positive as well as negative. Modern computers adopt IEEE
754 standard for representing floating-point numbers. There are two
representation schemes: 32-bit single-precision and 64-bit double-precision.
4.1 IEEE-754 32-bit Single-Precision Floating-Point Numbers
Normalized Form
Let's illustrate with an example, suppose that the 32-bit pattern is 1 1000
0001 011 0000 0000 0000 0000 0000, with:
S=1
E = 1000 0001
F = 011 0000 0000 0000 0000 0000
In the normalized form, the actual fraction is normalized with an implicit leading
1 in the form of 1.F. In this example, the actual fraction is 1.011 0000 0000 0000
0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D.
The sign bit represents the sign of the number, with S=0 for positive and S=1 for
negative number. In this example with S=1, this is a negative number, i.e., -
1.375D.
In normalized form, the actual exponent is E-127 (so-called excess-127 or bias-
127). This is because we need to represent both positive and negative exponent.
With an 8-bit E, ranging from 0 to 255, the excess-127 scheme could provide
actual exponent of -127 to 128. In this example, E-127=129-127=2D.
Hence, the number represented is -1.375×2^2=-5.5D.
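This example can be verified with the JDK's bit-level conversion methods: the pattern 1 10000001 011 0...0 is the hex word C0B00000H.

```java
public class Ieee754Demo {
    public static void main(String[] args) {
        // S=1, E=1000 0001 (129), F=011 0...0  ->  1 10000001 011...0 = 0xC0B00000
        System.out.println(Float.intBitsToFloat(0xC0B00000));   // -5.5
        // And the reverse direction: the bit pattern of -5.5f.
        System.out.println(Integer.toHexString(Float.floatToIntBits(-5.5f)));   // c0b00000
    }
}
```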
De-Normalized Form
Normalized form has a serious problem, with an implicit leading 1 for the
fraction, it cannot represent the number zero! Convince yourself on this!
De-normalized form was devised to represent zero and other numbers.
For E=0, the numbers are in the de-normalized form. An implicit leading 0
(instead of 1) is used for the fraction; and the actual exponent is always -126.
Hence, the number zero can be represented with E=0 and F=0 (because 0.0×2^-126 = 0).
We can also represent very small positive and negative numbers in de-
normalized form with E=0. For example, if S=1, E=0, and F=011 0000 0000 0000
0000 0000. The actual fraction is 0.011=1×2^-2+1×2^-3=0.375D. Since S=1, it is
a negative number. With E=0, the actual exponent is -126. Hence the number
is -0.375×2^-126 = -4.4×10^-39, which is an extremely small negative number
(close to zero).
Summary
In summary, the value (N) is calculated as follows:
For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127). These numbers are in the so-called normalized form. The sign bit represents the sign of the number. The fractional part is normalized with an implicit leading 1 (1.F). The exponent is biased by 127 (in excess-127), so as to represent both positive and negative exponents. The range of the actual exponent is -126 to +127.
For E = 0, N = (-1)^S × 0.F × 2^(-126). These numbers are in the so-called denormalized form. The exponent of 2^-126 evaluates to a very small number. The denormalized form is needed to represent zero (with F=0 and E=0). It can also represent very small positive and negative numbers close to zero.
For E = 255, it represents special values, such as ±INF (positive and
negative infinity) and NaN (not a number). This is beyond the scope of
this article.
Example 1: Suppose that the IEEE-754 32-bit floating-point representation pattern is 0 10000000 110 0000 0000 0000 0000 0000.
Sign bit S = 0 ⇒ positive number
E = 1000 0000B = 128D (in normalized form)
Fraction is 1.11B (with an implicit leading 1) = 1 + 1×2^-1 + 1×2^-2 = 1.75D
Hence, the number represented is +1.75 × 2^(128-127) = +3.5D
Hints:
1. Largest positive number: S=0, E=1111 1110 (254), F=111 1111 1111
1111 1111 1111.
Smallest positive number: S=0, E=0000 0001 (1), F=000 0000 0000 0000 0000 0000.
2. Same as above, but S=1.
3. Largest positive number: S=0, E=0, F=111 1111 1111 1111 1111
1111.
Smallest positive number: S=0, E=0, F=000 0000 0000 0000 0000
0001.
4. Same as above, but S=1.
Notes For Java Users
You can use JDK methods Float.intBitsToFloat(int
bits) or Double.longBitsToDouble(long bits) to create a single-precision 32-
bit float or double-precision 64-bit double with the specific bit patterns, and
print their values. For example,
System.out.println(Float.intBitsToFloat(0x7fffff));
System.out.println(Double.longBitsToDouble(0x1fffffffffffffL));
E = 254, F = all 1's:  N(max) = 1.1...1B × 2^127 = (2 - 2^-23) × 2^127 (≈ 3.4028235 × 10^38)
E = 1, F = 0:          N(min) = 1.0B × 2^-126 (≈ 1.17549435 × 10^-38)
Special Values
Zero: Zero cannot be represented in the normalized form, and must be
represented in denormalized form with E=0 and F=0. There are two
representations for zero: +0 with S=0 and -0 with S=1.
Infinity: The value of +infinity (e.g., 1/0) and -infinity (e.g., -1/0) are represented
with an exponent of all 1's (E = 255 for single-precision and E = 2047 for double-
precision), F=0, and S=0 (for +INF) and S=1 (for -INF).
Not a Number (NaN): NaN denotes a value that cannot be represented as a real number (e.g., 0/0). NaN is represented with an exponent of all 1's (E = 255 for single-precision and E = 2047 for double-precision) and any non-zero fraction.
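These special bit patterns can be checked in Java (the class name is illustrative):

```java
public class SpecialValues {
    public static void main(String[] args) {
        // +INF: S=0, E=255 (all 1's), F=0  ->  0x7F800000
        System.out.println(Float.intBitsToFloat(0x7F800000));   // Infinity
        // -INF: S=1, E=255, F=0  ->  0xFF800000
        System.out.println(Float.intBitsToFloat(0xFF800000));   // -Infinity
        // NaN: E=255 with any non-zero fraction, e.g. 0x7FC00000
        System.out.println(Float.isNaN(Float.intBitsToFloat(0x7FC00000)));   // true
        // The same values arise from arithmetic:
        System.out.println(1f / 0f);   // Infinity
        System.out.println(0f / 0f);   // NaN
    }
}
```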
LOSS OF SIGNIFICANCE
The term loss of significance refers to an undesirable effect in calculations
using floating-point arithmetic.
Computers and calculators employ floating-point arithmetic to express and
work with fractional numbers. Numbers in this arithmetic consist of a fixed
number of digits, with a decimal point that “floats” among them. In calculations,
only that fixed number of digits is maintained, so that the results of calculations
are only approximate — which is often quite adequate.
Loss of significance occurs in the subtraction of two nearly equal numbers,
which produces a result much smaller than either of the original numbers. The
effect can be an unacceptable reduction in the number of accurate (significant)
digits in the result.
An algorithm for calculating solutions of a mathematical problem is
called stable if a small change to its input does not result in a large change to its
result.
A floating-point representation of a fractional number can be thought of as
a small change in the number. So, loss of significance will always result if an
unstable algorithm is used with floating-point arithmetic.
Floating-point subtraction
A better algorithm
If
x1 = ( -b + (b^2 - 4ac)^(1/2) ) / ( 2a )
and
x2 = ( -b - (b^2 - 4ac)^(1/2) ) / ( 2a ),
then the identity
x1 · x2 = c / a
holds, which expresses the value of one root in terms of the other.
The algorithm is: use the quadratic formula to find the larger of the two solutions (the one where two like values are not subtracted), then use this identity to compute the other root, avoiding the subtraction of nearly equal numbers.
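A minimal Java sketch of this idea (the method and class names are illustrative): the root formed by adding like-signed quantities is computed from the formula, and the other root follows from x1·x2 = c/a.

```java
public class StableQuadratic {
    // Roots of ax^2 + bx + c = 0, avoiding subtraction of nearly equal numbers.
    static double[] roots(double a, double b, double c) {
        double sqrtD = Math.sqrt(b * b - 4 * a * c);
        // Choose the sign that ADDS like quantities, so no cancellation occurs:
        double q = -0.5 * (b + Math.copySign(sqrtD, b));
        double x1 = q / a;    // the larger-magnitude root, from the quadratic formula
        double x2 = c / q;    // the other root, from the identity x1 * x2 = c / a
        return new double[]{x1, x2};
    }

    public static void main(String[] args) {
        // b >> a, c: the naive formula loses most significant digits in -b + sqrt(b^2 - 4ac).
        double[] r = roots(1.0, 1e8, 1.0);
        System.out.println(r[0]);   // about -1.0e8
        System.out.println(r[1]);   // about -1.0e-8
    }
}
```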
Discussion
Other examples
Every time I have seen this messy fact come up, someone was doing an eigenanalysis where it was quite uncalled-for. For example, a simple iteration will give the largest eigenvalue; often that is all that is needed.
Numerical differentiation
The calculus process of differentiation involves a subtraction of
numbers that are by nature almost the same. This causes a lot of trouble
in numerical calculations.
The obvious view of solving differential equations is one of integrating a differential expression; the usual approach is to try to get an adequate result with a step size large enough that loss of significance in the differences does not begin to dominate. More often, practitioners rely on dumb luck.
Note that the inverse process, integration, does not suffer from
this…often re-writing the problem in terms of integrals provides an
attractive alternative algorithm.
This is a nice example of the dictum: an instability is a stability in
reverse.
ERROR PROPAGATION
Propagation of error (or propagation of uncertainty) is defined as the effect of a variable's uncertainty on a function of that variable. It is a calculus-derived statistical calculation designed to combine uncertainties from multiple variables in order to provide an accurate measurement of uncertainty.
Introduction
Every measurement has an air of uncertainty about it, and not all uncertainties
are equal. Therefore, the ability to properly combine uncertainties from
different measurements is crucial. Uncertainty in measurement comes about in
a variety of ways: instrument variability, different observers, sample differences,
time of day, etc. Typically, error is given by the standard deviation (σx) of a
measurement.
Suppose a calculated result x depends on measured quantities a, b, and c:
x = f(a, b, c)   (1)
Because each measurement has an uncertainty about its mean, it can be written that the uncertainty dx_i of the i-th measurement of x depends on the uncertainty of the i-th measurements of a, b, and c:
dx_i = f(da_i, db_i, dc_i)   (2)
The total deviation of x is then derived from the partial derivative of x with respect to each of the variables:
dx = (∂x/∂a) da + (∂x/∂b) db + (∂x/∂c) dc   (3)
In the first step - squaring Equation 3 - two distinct kinds of terms appear on the right-hand side of the equation: square terms and cross terms.
Square Terms
(∂x/∂a)^2 (da)^2,  (∂x/∂b)^2 (db)^2,  (∂x/∂c)^2 (dc)^2   (4)
Cross Terms
(∂x/∂a)(∂x/∂b) da db,  (∂x/∂a)(∂x/∂c) da dc,  (∂x/∂b)(∂x/∂c) db dc   (5)
Square terms, due to the nature of squaring, are always positive, and therefore
never cancel each other out. By contrast, cross terms may cancel each other out,
due to the possibility that each term may be positive or negative. If da, db,
and dc represent random and independent uncertainties, about half of the cross
terms will be negative and half positive (this is primarily due to the fact that the variables represent uncertainty about a mean). In effect, the sum of the cross terms should approach zero, especially as N increases. However, if the variables are correlated rather than independent, the cross terms may not cancel out.
Assuming the cross terms do cancel out, the second step - summing from i = 1 to i = N - gives:
∑(dx_i)^2 = (∂x/∂a)^2 ∑(da_i)^2 + (∂x/∂b)^2 ∑(db_i)^2   (6)
Dividing both sides by N - 1:
∑(dx_i)^2 / (N-1) = (∂x/∂a)^2 ∑(da_i)^2 / (N-1) + (∂x/∂b)^2 ∑(db_i)^2 / (N-1)   (7)
The previous step created a situation where Equation 7 can mimic the standard deviation equation. This is desired, because it creates a statistical relationship between the variable x and the other variables a, b, c, etc., as follows:
∑(dx_i)^2 / (N-1) = ∑(x_i - x̄)^2 / (N-1) = σ_x^2   (8)
Rewriting Equation 7 using this statistical relationship yields the Exact Formula for Propagation of Error:
σ_x^2 = (∂x/∂a)^2 σ_a^2 + (∂x/∂b)^2 σ_b^2 + (∂x/∂c)^2 σ_c^2   (9)
Addition or Subtraction
If x = a + b - c, then
σ_x = √(σ_a^2 + σ_b^2 + σ_c^2)   (10)
Multiplication or Division
If x = (a × b)/c, then
σ_x/x = √((σ_a/a)^2 + (σ_b/b)^2 + (σ_c/c)^2)   (11)
Exponential
If x = a^y, then
σ_x/x = y(σ_a/a)   (12)
Logarithmic
If x = log(a), then
σ_x = 0.434(σ_a/a)   (13)
Anti-logarithmic
If x = antilog(a), then
σ_x/x = 2.303(σ_a)   (14)
Note
The Exact Formula for Propagation of Error in Equation 9 can be used to derive the arithmetic examples noted above. Starting with a simple equation:
x = (a × b)/c   (15)
where x is the desired result with a given standard deviation, and a, b, and c are experimental variables, each with a different standard deviation. Taking the partial derivative with respect to each experimental variable, a, b, and c:
(∂x/∂a) = b/c   (16)
(∂x/∂b) = a/c   (17)
and
(∂x/∂c) = -ab/c^2   (18)
Substituting into Equation 9:
σ_x^2 = (b/c)^2 σ_a^2 + (a/c)^2 σ_b^2 + (-ab/c^2)^2 σ_c^2   (19)
Dividing both sides by x^2 = (ab/c)^2:
σ_x^2/x^2 = (b/c)^2 σ_a^2/(ab/c)^2 + (a/c)^2 σ_b^2/(ab/c)^2 + (-ab/c^2)^2 σ_c^2/(ab/c)^2   (20)
Canceling out terms and square-rooting both sides yields Equation 11:
σ_x/x = √((σ_a/a)^2 + (σ_b/b)^2 + (σ_c/c)^2)
Example 1
In Beer's Law, A = εlc, so ε = A/(lc). Suppose the measured values are A = 0.172807 ± 0.000008, l = 1.0 ± 0.1, and c = 13.7 ± 0.3.
Solution
Since Beer's Law deals with multiplication/division, we'll use Equation 11:
σ_ε/ε = √((0.000008/0.172807)^2 + (0.1/1.0)^2 + (0.3/13.7)^2) = 0.10237
As stated in the note above, Equation 11 yields a relative standard deviation, i.e., a percentage of the ε variable. Using Beer's Law, ε = 0.012614 L mol^-1 cm^-1.
Therefore, σ_ε for this example would be 10.237% of ε, which is 0.001291.
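The arithmetic of this example, sketched in Java (the class name is illustrative; the measured values are those appearing in the worked formula above):

```java
public class BeerLawError {
    public static void main(String[] args) {
        // Relative uncertainties of the three measured quantities:
        double relA = 0.000008 / 0.172807;   // absorbance
        double relL = 0.1 / 1.0;             // path length
        double relC = 0.3 / 13.7;            // concentration
        // Equation 11: relative errors add in quadrature for multiplication/division.
        double relEps = Math.sqrt(relA * relA + relL * relL + relC * relC);
        System.out.println(relEps);                 // ~0.10237
        double eps = 0.172807 / (1.0 * 13.7);       // epsilon = A / (l c), ~0.012614
        System.out.println(relEps * eps);           // ~0.001291, the absolute uncertainty
    }
}
```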
If you are given an equation that relates two different variables and given the
relative uncertainties of one of the variables, it is possible to determine the
relative uncertainty of the other variable by using calculus. In problems, the
uncertainty is usually given as a percent. Let's say we measure the radius of a
very small object. The problem might state that there is a 5% uncertainty when
measuring this radius.
(dx/x) = (Δx/x) = uncertainty
Example 2
Let's look at the example of the radius of an object again. If we know the uncertainty of the radius to be 5%, the uncertainty is defined as (dr/r) = (Δr/r) = 5% = 0.05. Suppose the volume of the object is given by
V(r) = c·r^2
What is the uncertainty of the volume?
Solution
The first step to finding the uncertainty of the volume is to understand our given
information. Since we are given the radius has a 5% uncertainty, we know that
(∆r/r) = 0.05. We are looking for (∆V/V).
Now that we have done this, the next step is to take the derivative of this
equation to obtain:
dV/dr = ΔV/Δr = 2cr
ΔV = 2cr(Δr)
ΔV/V = 2cr(Δr)/V
We are given the equation of the volume to be V = c·r^2, so we can plug this back into our previous equation for V to get:
ΔV/V = 2cr(Δr)/(c·r^2)
Now we can cancel variables that are in both the numerator and denominator to get:
ΔV/V = 2Δr/r = 2(Δr/r)
We have now narrowed down the equation so that Δr/r is left. We know the value of uncertainty for Δr/r to be 5%, or 0.05. Plugging this value in for Δr/r we get:
ΔV/V = 2(0.05) = 0.1 = 10%
The uncertainty of the volume is 10%. This method can be used in chemistry as
well, not just the biological example shown above.
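The 10% result can be sanity-checked by perturbing r directly, without calculus (the values of c and r below are arbitrary placeholders; only the relative change matters):

```java
public class VolumeUncertainty {
    public static void main(String[] args) {
        // V(r) = c r^2 with a 5% uncertainty in r; the rule above predicts
        // dV/V = 2 (dr/r) = 10%. Check it by direct perturbation.
        double c = 3.0, r = 2.0, dr = 0.05 * r;
        double v = c * r * r;
        double dv = c * (r + dr) * (r + dr) - v;   // actual change in V
        System.out.println(dv / v);   // ~0.1025: about 10%, plus a small second-order term
    }
}
```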
Whether a problem is considered well-conditioned or ill-conditioned depends on the context of the problem and the uses of the results.
Example:
Suppose we want to evaluate the expression y = x/(1 − x).
With x = 0.93
we get y = 13.28...,
but with x = 0.94
we get y = 15.66....
So we would probably say that this expression is ill-conditioned when evaluated
for x near 0.93.
On the other hand,
if we use x = −0.93 and x = −0.94,
we get values of −0.4818... and −0.4845..., and we would say that the expression is well-conditioned for x near −0.93.
For many types of problems we can compute a condition number that indicates
the magnification of the changes. The condition number is defined by
Relative error in the output ≈ Condition number × Relative error in the input.
For example, consider evaluating a function f(x) at a point x = x0. The input is x0 and the output is f(x0). If we perturb the input to x = x0 + h, then the output is f(x0 + h), and by applying the Mean Value Theorem we get the (relative) condition number
Cf(x) = |x f′(x) / f(x)|.
Applying this to f(x) = x/(1 − x) we get Cf(x) = 1/|1 − x|. Then Cf(0.93) = 14.28... and Cf(−0.93) = 0.5181.... This is consistent with what we saw in the example above.
With the condition number as defined above, we still have no sharp cutoff
between well- and ill-conditioned. We do know that if the condition number is
less than 1 then it is well-conditioned and if the condition number is arbitrarily
large then it is ill-conditioned. We usually study condition in limiting cases where
these extremes are observed. For example, f(x) = x/(1 − x) is clearly ill-
conditioned for x near 1 and well-conditioned for x < 0 and x > 2.
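A short Java sketch of the condition number of f(x) = x/(1 − x) (the class name is illustrative):

```java
public class ConditionDemo {
    static double f(double x) { return x / (1 - x); }

    public static void main(String[] args) {
        // Condition number of f(x) = x/(1-x): C_f(x) = |x f'(x)/f(x)| = 1/|1-x|
        System.out.println(1.0 / Math.abs(1 - 0.93));     // 14.28...: ill-conditioned
        System.out.println(1.0 / Math.abs(1 - (-0.93)));  // 0.518...: well-conditioned

        // Verify: a small relative input change is magnified by roughly C_f.
        double relIn  = (0.94 - 0.93) / 0.93;
        double relOut = (f(0.94) - f(0.93)) / f(0.93);
        // ~16.7 here: larger than 14.28 because the step is not infinitesimal.
        System.out.println(relOut / relIn);
    }
}
```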
Stability of an Algorithm
When we study an algorithm our interest is the same as for an expression: we
want small changes in the input to only produce small changes in the output. An
algorithm or numerical process is called stable if this is true and it is called
unstable if large changes in the output are produced. Analyzing an algorithm for stability proceeds along the same lines as analyzing the condition of a problem.
where f is your function, t is the time of the reading, and h is the distance to the
next time step.
The error between our approximations and true values can be found as follows:
As can be seen, the smaller root has a larger error associated with it because
deviations will be more apparent with smaller numbers than larger numbers.
If you have the insight to see that your computation will involve operations with
numbers of differing magnitudes, the equations can sometimes be cleverly
manipulated to reduce roundoff error. In our example, if the quadratic formula
equation is rationalized, the resulting absolute error is much smaller because
fewer operations are required and numbers of similar magnitudes are being
multiplied and added together:
Truncation Error
Truncation errors are introduced when exact mathematical formulas are
represented by approximations. An effective way to understand truncation
error is through a Taylor Series approximation. Let’s say that we want to
approximate some function f(x) at the point x_(i+1), which is some distance h away from the base point x_i, whose true value is shown in black in Figure 1. The Taylor
series approximation starts with a single zero order term and as additional terms
are added to the series, the approximation begins to approach the true value.
However, an infinite number of terms would be needed to reach this true value.
where R_n is a remainder term used to account for all of the terms that were not included in the series and is therefore a representation of the truncation error. The remainder term is generally expressed as R_n = O(h^(n+1)), which shows that the truncation error is proportional to the step size h raised to the power n+1, where n is the number of terms included in the expansion. It is clear that as the step size decreases, so does the truncation error.
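Truncation error can be watched shrinking as Taylor terms are added; here e^h is approximated by its series about 0 (a standard series, with illustrative names):

```java
public class TruncationDemo {
    // Approximate e^h with the Taylor terms up to order n about 0: sum of h^k / k!
    static double taylorExp(double h, int n) {
        double term = 1.0, sum = 1.0;
        for (int k = 1; k <= n; k++) {
            term *= h / k;   // next term h^k / k!
            sum += term;
        }
        return sum;
    }

    public static void main(String[] args) {
        double h = 0.1;
        // The truncation error shrinks roughly like h^(n+1) as terms are added:
        for (int n = 0; n <= 4; n++) {
            double err = Math.abs(Math.exp(h) - taylorExp(h, n));
            System.out.println("n=" + n + "  error=" + err);
        }
    }
}
```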
The Tradeoff in Errors
The total error of an approximation is the summation of roundoff error and
truncation error. As seen from the previous sections, truncation error decreases
as step size decreases. However, when step size decreases, this usually results
in the necessity for more precise computations which consequently results in an
increase in roundoff error. Therefore, the errors are in direct conflict with one
another: as we decrease one, the other increases.
However, the optimal step size to minimize error can be determined. Using an
iterative method of trying different step sizes and recording the error between
the approximation and the true value, the following graph shown in Figure 2 will
result. The minimum of the curve corresponds to the minimum error achievable
and corresponds to the optimal step size. Any error to the right of this point
(larger step sizes) is primarily due to truncation error and the increase in error
to the left of this point corresponds to where roundoff error begins to dominate.
While this graph is specific to a certain function and type of approximation, the
general rule and shape will still hold for other cases.
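The tradeoff can be observed numerically with a forward-difference derivative: the error first falls with h (truncation) and then rises again (roundoff). The function and point below are arbitrary choices for illustration:

```java
public class StepSizeTradeoff {
    public static void main(String[] args) {
        // Forward-difference derivative of sin at x = 1 (true value cos(1)).
        // Truncation error ~ h/2 * |f''|; roundoff error ~ eps/h. The total
        // error is minimized at an intermediate h (around 1e-8 in double precision).
        double x = 1.0, truth = Math.cos(x);
        for (double h = 1e-1; h >= 1e-14; h /= 1e3) {
            double approx = (Math.sin(x + h) - Math.sin(x)) / h;
            System.out.printf("h=%.0e  error=%.3e%n", h, Math.abs(approx - truth));
        }
    }
}
```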
CONVERGENCE OF SEQUENCES
SIMULTANEOUS LINEAR EQUATIONS
A pair of simultaneous linear equations in two variables x and y has the general form:
a1x + b1y = c1
a2x + b2y = c2
Here, a1 and a2 are the coefficients of x, b1 and b2 are the coefficients of y, and c1 and c2 are constants. The solution for the system of linear equations is the ordered pair (x, y) which satisfies both equations.
In this article, we are going to learn different methods of solving simultaneous
linear equations with steps and many solved examples in detail.
Substitution Method
Elimination Method
Graphical Method
Now, let us discuss all three methods in detail with examples.
Substitution Method
Follow the steps below to solve a system of linear equations, or simultaneous linear equations, using the substitution method:
Example 1:
Solve the following system of linear equations using the substitution method:
3x – 4y = 0
9x – 8y = 12
Solution:
Given:
3x – 4y = 0 …(1)
9x – 8y = 12 …(2)
Step 1: Rearrange equation (1) to express x in terms of y:
⇒3x = 4y
⇒x = 4y/3 …(3)
Now substitute x = 4y/3 in (2), we get
⇒9(4y/3) – 8y = 12
⇒(36y/3) – 8y = 12
⇒12y – 8y = 12
⇒4y = 12
⇒y = 12/4 = 3
Hence, the value of y is 3
Now, substitute y= 3 in (3) to get the value of x
⇒ x = [4(3)]/3 = 12/3 = 4
Therefore, x = 4
Thus, the solution for the simultaneous linear equations (x, y) is (4, 3).
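The substitution steps translate line-for-line into Java (the class name is illustrative):

```java
public class Substitution {
    public static void main(String[] args) {
        // Solve 3x - 4y = 0, 9x - 8y = 12 by substitution, as in Example 1:
        // from (1): x = 4y/3; substitute into (2): 9(4y/3) - 8y = 12 -> 4y = 12.
        double y = 12.0 / 4.0;      // y = 3
        double x = 4.0 * y / 3.0;   // x = 4
        System.out.println("x = " + x + ", y = " + y);
        // Check both equations:
        System.out.println(3 * x - 4 * y);   // 0.0
        System.out.println(9 * x - 8 * y);   // 12.0
    }
}
```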
Elimination Method
To find the solution (x, y) for a system of linear equations using the elimination method, go through the steps below: multiply one or both equations so that one variable has equal coefficients, add or subtract the equations to eliminate that variable, solve for the remaining variable, and substitute back to find the other. Finally, the ordered pair (x, y) is the solution of the simultaneous equations.
Example 2:
Solve the system of equations using the elimination method:
2x+3y= 11
x+2y = 7
Solution:
Given:
2x+3y= 11 …(1)
x+2y = 7 …(2)
Now, multiply equation (2) by 2, to get
2x + 4y = 14 …(3)
Now, subtract equation (3) from equation (1):
   2x + 3y = 11
− (2x + 4y = 14)
________________
        −y = −3
⇒ y = 3
Now, substitute y = 3 in equation (2),
x + 2(3) = 7
x+6=7
x=1
Hence, x = 1 and y = 3
Therefore, the solution for the system of equations 2x + 3y = 11 and x + 2y = 7 is x = 1 and y = 3.
Graphical Method
To solve simultaneous linear equations graphically, follow the steps below:
First, take the equation x + y = 4. When x = 1, 2, 3, we get y = 3, 2, 1:
x: 1  2  3
y: 3  2  1
Now, take the second equation, x − y = 0.
When x = 1: −y = 0 − 1 ⇒ y = 1
When x = 2: −y = 0 − 2 ⇒ y = 2
When x = 3: −y = 0 − 3 ⇒ y = 3
x: 1  2  3
y: 1  2  3
Now, plot these points in the XY coordinate plane.
From the graph, it is observed that the point of intersection of two straight lines
is (2, 2), which is the solution for the given simultaneous linear equation.
MATRIX INVERSE METHOD
Consider the system of equations:
x - y + 3z = 2
2x + y + 2z = 2
-2x - 2y + z = 3
On Prof. McFarland's tests, you would be asked to solve the above problem.
[1] Write the given system (above) as a single matrix equation AX = B.
[2] Solve the matrix equation obtained in step [1]; i.e., find X. In the MATRIX INVERSE METHOD (unlike Gauss/Jordan), we solve for the matrix variable X by left-multiplying both sides of the matrix equation AX = B by A^-1, giving X = A^-1·B. Typically, A^-1 is calculated as a separate exercise; otherwise, we must pause here to calculate A^-1.
[3] Using [2] above, write the solution to the original system:
x = -3
y = 14/5
z = 13/5
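A Java sketch that solves the same system numerically. It uses Gaussian elimination with partial pivoting rather than forming A^-1 explicitly (both compute X = A^-1·B); the names are illustrative.

```java
public class MatrixInverseSolve {
    // Solve the n x n system A X = B by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int p = 0; p < n; p++) {
            // Partial pivoting: bring the largest remaining pivot to row p.
            int max = p;
            for (int i = p + 1; i < n; i++)
                if (Math.abs(a[i][p]) > Math.abs(a[max][p])) max = i;
            double[] tr = a[p]; a[p] = a[max]; a[max] = tr;
            double tb = b[p]; b[p] = b[max]; b[max] = tb;
            // Eliminate the p-th variable from the rows below.
            for (int i = p + 1; i < n; i++) {
                double f = a[i][p] / a[p][p];
                b[i] -= f * b[p];
                for (int j = p; j < n; j++) a[i][j] -= f * a[p][j];
            }
        }
        // Back substitution.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{1, -1, 3}, {2, 1, 2}, {-2, -2, 1}};
        double[] b = {2, 2, 3};
        double[] x = solve(a, b);
        // Expected: x = -3, y = 14/5 = 2.8, z = 13/5 = 2.6
        System.out.println(x[0] + " " + x[1] + " " + x[2]);
    }
}
```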
Band matrix
Bandwidth
Formally, consider an n×n matrix A = (a_{i,j}). If all matrix elements are zero outside a diagonally bordered band whose range is determined by constants k1 and k2:
a_{i,j} = 0 if j < i - k1 or j > i + k2, with k1, k2 ≥ 0,
then the quantities k1 and k2 are called the lower bandwidth and upper bandwidth, respectively.[1] The bandwidth of the matrix is the maximum of k1 and k2; in other words, it is the number k such that a_{i,j} = 0 whenever |i - j| > k.
Examples
Applications
In numerical analysis, matrices from finite element or finite difference problems
are often banded. Such matrices can be viewed as descriptions of the coupling
between the problem variables; the banded property corresponds to the fact
that variables are not coupled over arbitrarily large distances. Such matrices can
be further divided – for instance, banded matrices exist where every element in
the band is nonzero. These often arise when discretising one-dimensional
problems.
Problems in higher dimensions also lead to banded matrices, in which case the
band itself also tends to be sparse. For instance, a partial differential equation
on a square domain (using central differences) will yield a matrix with a
bandwidth equal to the square root of the matrix dimension, but inside the band
only 5 diagonals are nonzero. Unfortunately, applying Gaussian elimination (or
equivalently an LU decomposition) to such a matrix results in the band being
filled in by many non-zero elements.
Band storage
Band matrices are usually stored by storing the diagonals in the band; the rest is
implicitly zero.
For example, a tridiagonal matrix has bandwidth 1. The 6-by-6 matrix is stored
as the 6-by-3 matrix
A further saving is possible when the matrix is symmetric. For example,
consider a symmetric 6-by-6 matrix with an upper bandwidth of 2:
This matrix is stored as the 6-by-3 matrix:
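A sketch of the storage mapping for the tridiagonal case (bandwidth 1): the 6-by-6 matrix collapses to a 6-by-3 array holding the sub-, main, and super-diagonal. The matrix entries below are arbitrary test values; the names are illustrative.

```java
public class BandStorage {
    public static void main(String[] args) {
        int n = 6;
        // A tridiagonal 6x6 matrix: nonzeros only where |i - j| <= 1.
        double[][] full = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = Math.max(0, i - 1); j <= Math.min(n - 1, i + 1); j++)
                full[i][j] = 10 * (i + 1) + (j + 1);   // arbitrary test values

        // Band storage: keep only the 3 diagonals, as an n x 3 array.
        // Column d holds diagonal j - i = d - 1 (sub, main, super).
        double[][] band = new double[n][3];
        for (int i = 0; i < n; i++)
            for (int d = 0; d < 3; d++) {
                int j = i + d - 1;
                if (j >= 0 && j < n) band[i][d] = full[i][j];
            }

        // Element lookup goes through the band: A[i][j] = band[i][j - i + 1]
        System.out.println(band[2][1]);   // A[2][2] = 33.0
        System.out.println(band[2][2]);   // A[2][3] = 34.0
    }
}
```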