ps2 Normalmapping
Morten Mikkelsen
IO Interactive
mm@ioi.dk
November 4, 2004
Abstract
This paper describes a method for doing PC-style normal mapping on the
Playstation 2 by taking advantage of the GS and VU1 units. Two variations
are described: A cheap two-pass solution without per-pixel normalization
and a per-pixel normalized alternative which requires four passes.
1 Introduction
2 Previous work
One problem is that T L is clamped to zero per vertex and not per pixel
which may result in incorrect dot products. Figure 1 illustrates the interpo-
lation problem.
3 The Approach
3.1 Overview
The two normal mapping methods share the same general rendering steps.
The only difference between the two is how they achieve signed multiplications.
In the general case, two buffers are needed: A 32 bit light accumulation
buffer (LAB) and a 32 bit dot product buffer (DPB). The rendering steps
(1-4) for rendering normal mapped objects are as follows:
1. Render all the visible normal mapped objects to fill the z-buffer and to
set the ambient color in the LAB.
2. For each light:
2.1 Clear the DPB to 0x00808080. This pass is free for every light
after the first one (as will be explained in section 3.4).
2.2 Disable color clamp.
2.3 Render all objects that are hit by the current light so that signed
multiplications are delivered to red, green, and blue.
2.4 Apply a 2D post filter pass over the DPB to achieve R + G + B,
resulting in the final dot product.
2.5 Enable color clamp.
2.6 Add the lighting contribution of the DPB to the LAB. To do this
the DPB is read as an 8 bit texture with the dot products as texels.
We use an intensity look-up palette (ILP) to add the dot product
to all 3 channels of the LAB. The ILP entries of the negative dot
products are set to zero. The look-up is finally multiplied by the
color of the light and added to the LAB.
3. We are done with the DPB. Clear it to zero and render all objects unlit
with their diffuse textures.
4. Multiply the buffer (unsigned) of the diffuse layer with the LAB:
$(red_{diff} \cdot red_{light}, green_{diff} \cdot green_{light}, blue_{diff} \cdot blue_{light})$
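The per-pixel arithmetic of steps 2.6 and 4 can be sketched on the CPU as follows. This is only an illustration of the data flow, not GS code. It assumes the common GS fixed-point convention in which 128 represents 1.0, and the names `Rgb`, `mul_128`, `add_clamped`, `accumulate_light` and `apply_lighting` are hypothetical helpers introduced for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb;

/* Saturating add, as performed when color clamp is enabled. */
static uint8_t add_clamped(uint8_t a, uint8_t b) {
    int v = a + b;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Unsigned fixed-point multiply where 128 represents 1.0 (assumed convention). */
static uint8_t mul_128(uint8_t a, uint8_t b) {
    int v = (a * b) >> 7;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Step 2.6: the ILP look-up result (same intensity in all three channels)
   is multiplied by the light color and added to the LAB pixel. */
static Rgb accumulate_light(Rgb lab, uint8_t intensity, Rgb light) {
    assert(intensity <= 128); /* ILP entries never exceed 128 */
    Rgb out;
    out.r = add_clamped(lab.r, mul_128(intensity, light.r));
    out.g = add_clamped(lab.g, mul_128(intensity, light.g));
    out.b = add_clamped(lab.b, mul_128(intensity, light.b));
    return out;
}

/* Step 4: the unlit diffuse layer is multiplied (unsigned) with the LAB. */
static Rgb apply_lighting(Rgb diffuse, Rgb lab) {
    Rgb out;
    out.r = mul_128(diffuse.r, lab.r);
    out.g = mul_128(diffuse.g, lab.g);
    out.b = mul_128(diffuse.b, lab.b);
    return out;
}
```

A caller would run `accumulate_light` once per light for every LAB pixel and `apply_lighting` once at the end.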
Even if there is not enough VRAM for two draw buffers, it is still possible
to use the techniques in this paper. Instead, use a single light and save the
dot product layer in a free alpha channel like the one in the display buffer.
The intensities may be applied at any convenient point in the rendering
pipeline. For a single light source, pre-filling the z-buffer (step 1) is not
necessary.
Alternatively, 2 to 3 lights can be used by storing their dot product layers
in unused alpha pixels in the VRAM. Then use the frame buffer as the LAB.
Once the LAB is done, copy the red, green, and blue to the free alphas and
render the unlit diffuse buffer. Afterwards, at step 4, apply the LAB by
using these alphas. Note, this is not necessary if more than one draw buffer
is available.
3.2 Achieving signed multiplication with unsigned input without the per vertex clamping problem
Assume we have two signed values a and b and we wish to calculate the
product a · b (implicit signed shift right by 7). Since the input is 8 bit,
we assume a ∈ {−127, −126, ..., 128} and b ∈ {−128, −127, ..., 127} so that
a · b ∈ {−128, −127, ..., 127}. We create two intermediate values a2 and b2
by the equations (1) and (2):
$a_2 = 128 - a$ (1)
$b_2 = b + 128$ (2)
$a \cdot b = (128 - a_2) \cdot (b_2 - 128) = 128 b_2 - a_2 b_2 + 128 a_2 - 128^2$ (3)
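Equation (3) is a plain algebraic identity, so it can be checked exhaustively over the stated input ranges. The following minimal sketch does so. The function names are hypothetical, and the implicit shift right by 7 is deliberately left out so that only the expansion itself is tested.

```c
#include <assert.h>

/* Right-hand side of equation (3), evaluated from the unsigned intermediates. */
static int product_from_unsigned(int a2, int b2) {
    return 128 * b2 - a2 * b2 + 128 * a2 - 128 * 128;
}

/* Returns the number of (a, b) pairs for which equation (3) fails.
   A return value of 0 means the identity holds over the whole range. */
static int count_equation3_mismatches(void) {
    int mismatches = 0;
    for (int a = -127; a <= 128; ++a) {
        for (int b = -128; b <= 127; ++b) {
            int a2 = 128 - a; /* equation (1) */
            int b2 = b + 128; /* equation (2) */
            /* both intermediates fit in an unsigned 8 bit value */
            assert(a2 >= 0 && a2 <= 255 && b2 >= 0 && b2 <= 255);
            if (product_from_unsigned(a2, b2) != a * b) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```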
We can rearrange a little and take advantage of the fact that every mul-
tiplication has an implicit shift to the right by 7:
Now (4) is quite close to an equation that can be computed using the GS
blend mode function. Furthermore, a2 and b2 are both 8 bit unsigned inputs.
Alternatively, the signed product can be expressed as (5):
The last term, however, is still signed. In order to fix that, we create
another intermediate value c2 and rewrite (5) to:
3.3 Adding R+G+B.
Similar calculations can be made for (7), by again leaving out the sub-
traction by 128:
As mentioned in step 2.6, an ILP is used to add the dot products to the
LAB. Assuming the pixels we look-up in the DPB are the final dot products,
the ILP must contain 1, 2, 3, . . . , 128 in red, green, and blue of the first 128
entries. For free clamping, we keep zero in the last 128 entries.
It is possible to simplify equation (9) to the sum of r + g + b. Since the
result is used as a palette look-up, we can simply compensate by reordering
the palette, so the new palette is a simple permutation of the original palette.
Leaving out the subtraction by 128 in (9) just offsets the entries of the
ILP by 128.
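The ILP layout described above can be sketched as follows. The sketch assumes that the look-up index is the dot product stored as a byte modulo 256, so negative dot products land in the upper 128 entries, and that leaving out the subtraction by 128 simply rotates the palette by 128 entries. `build_ilp` and `sum_rgb_mod256` are hypothetical names. Only one channel is shown since red, green, and blue hold the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a 256-entry intensity look-up palette (one channel).
   index_offset = 0:   index is the dot product as a byte modulo 256.
   index_offset = 128: index is the dot product plus 128, i.e. the
                       subtraction by 128 has been left out. */
static void build_ilp(uint8_t ilp[256], int index_offset) {
    assert(index_offset >= 0 && index_offset < 256);
    for (int i = 0; i < 256; ++i) {
        int dot_byte = (i - index_offset + 256) & 0xFF; /* undo the offset */
        /* first 128 entries hold 1..128, the rest stay zero (free clamping) */
        ilp[i] = (dot_byte < 128) ? (uint8_t)(dot_byte + 1) : 0;
    }
}

/* Sum three channel bytes modulo 256, as happens with color clamp disabled. */
static uint8_t sum_rgb_mod256(uint8_t r, uint8_t g, uint8_t b) {
    return (uint8_t)((r + g + b) & 0xFF);
}
```

If each channel holds a signed product offset by 128, the wrapped sum equals the dot product plus 384, which is the dot product plus 128 modulo 256. This is why the rotated palette (`index_offset = 128`) can be used directly on r + g + b.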
The main principle of fetching any channel in a 24/32 bit frame buffer
on the PS2 is using the buffer as an 8 bit texture twice the width and twice
the height. Looking at the tables in section 8.3 of the GS users manual
[Sony02] makes it clear that real-time swizzling is needed to get this to work.
¹ Note that all calculations are made modulo 256 on the GS with color clamping
disabled.
The good news is that this can be done using a selection of pre-tessellated
sprites and the region repeat mode. The details of how to fetch channels in
a 24/32 bit buffer on the PS2 are available on the playstation2-linux website
[Breugelmans01].
When using multiple lights, one trick is to clear the DPB by setting the
z-buffer (24 bit) to point to the DPB when adding the contribution of the
current light to the LAB (step 2.6). The depth is set to 0x00808080 and the
TEST register is set to all pixels pass. This involves reading from the DPB
as an 8 bit texture and writing to it as a z-buffer. There are two ways to
make this work without getting texels overwritten before they are read:
• Set the XYZ2s of the sprites according to the Z24/Z32 layout by setting
them so the four 32x16 regions in the pages (a page is 64x32) are
swapped along the diagonals.
They both give the same result. It works because the z-buffer is forced
to write to pixels inside the 32x16 region that is currently being read as a
texture (and is already cached). This prevents pixels in 32x16 regions that
have not yet been read from being overwritten. Of course this also means the
contribution gets added to the LAB in PSMZ32 layout. This can be fixed by
rendering the signed multiplications in the DPB in PSMZ32 layout as well,
which will take us back to PSMCT32 layout in the LAB. Alternatively, one
could also just unswizzle the LAB on the GS at the end, once all lights have
been processed (before step 3).
Step 2.3 can be achieved in two passes. In order to do so, the tolight vector
must be packed and passed per vertex as vertex colors. The packed
tolight vector $(l_x, l_y, l_z)$ is the normalized direction towards the light source
$(T_x, T_y, T_z)$ (transformed into tangent space), scaled and then offset by 128
using equation (2):
$(l_x, l_y, l_z) = (128, 128, 128) + ((\mathrm{char})(s \cdot T_x), (\mathrm{char})(s \cdot T_y), (\mathrm{char})(s \cdot T_z))$
The value s is an empirical scale factor and is given later in this section.
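A minimal sketch of this packing step is given below. It assumes the tolight vector is already normalized and in tangent space, and that s is passed in by the caller. The struct `PackedVec` and the function `pack_tolight` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t x, y, z; } PackedVec;

/* Pack a normalized tangent space tolight vector into vertex colors:
   each component is scaled by s, truncated to a signed char and then
   offset by 128 as in equation (2). */
static PackedVec pack_tolight(float tx, float ty, float tz, float s) {
    /* keeps s * T inside the signed char range for unit length input */
    assert(s > 0.0f && s <= 127.0f);
    PackedVec p;
    p.x = (uint8_t)(128 + (signed char)(s * tx));
    p.y = (uint8_t)(128 + (signed char)(s * ty));
    p.z = (uint8_t)(128 + (signed char)(s * tz));
    return p;
}
```

Scaling the input vector down before packing gives the per vertex attenuation without any change to this function.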
The surface normal $(n_x, n_y, n_z)$ of length 128 in tangent space is packed
by splitting it into a positive and a negative side. This is done similarly to
[Breugelmans02] but in addition an alpha term is computed. Two palettes
are used, one for each side. An additional difference is that the tolight vectors
are not divided into positive/negative sides but simply offset by 128.
In order to do the per-pixel shading, we need to compute the following
(according to equation (10)):
The vector $(L_x, L_y, L_z)$ is the barycentrically weighted result of the
surrounding per-vertex packed tolight vectors. The final dot product is
computed as in equation (9) (without subtraction by 128), which yields:
This means that we can precompute the three last terms and store the
result in alpha of the normal map.
[Figure 2 panels: (a) Pass 1, (b) Pass 2, (c) alpha added to rgb]
Figure 2: (a) A low resolution model rendered using MODULATE and the
positive palette (first pass). (b) Second pass uses the negative palette and
subtracts the source from the frame buffer. Furthermore, the alpha of the
normal map is passed to alpha of the frame buffer. (c) Third shot shows the
result after adding the alpha. This is not a part of step 2.3, but done at step
2.4, i.e. purely 2D without using the geometry of the model. Adding r + g + b
yields the final dot products (see section 3.3).
In the two-pass case, we have to modify the post filter of the DPB (step
2.4) to get the final dot product result: r + g + b + a + a + a. This is done
by using one 2D pass reading the DPB in PSMT8H and adding the contents
to red, green, and blue (see figure 2(c)). The result at this point is not the
signed multiplications since we have just added the last term na by adding
an equally large slice (one third) of it to each channel. The final dot products
are obtained by completing the post filter adding red, green, and blue.
Per vertex attenuation of any kind may be achieved by scaling down the
vectors towards the light source $(T_x, T_y, T_z)$.
Through trial and error, good results (i.e., without wrapping errors) have
been achieved using the factor s = 122. The code for packing the normals
is shown in appendix 2. It is possible that larger factors may be used for s
depending on how rounding was performed during normal map creation.
3.6 Four-pass and per pixel normalization solution
$m = 2 \cdot \sqrt{T_x^2 + T_y^2 + (T_z + 1)^2} = 2 \cdot \sqrt{2 \cdot T_z + 2}$
$s = \frac{T_x}{m} + 0.5$
$t = \frac{T_y}{m} + 0.5$
So using this method, the tolight vectors are delivered via the texture
coordinates and not the vertex colors.
After the first pass, the DPB contains the rasterized triangles with their
normalized tolight vectors. To execute the signed dot product multiplications
on the GS, we take advantage of equation (8). So, for the subsequent 3 passes
the following GS blend mode is used:
$(normalmap_{rgb} - framebuffer_{rgb}) \cdot normalmap_{a} + framebuffer_{rgb}$

This blend mode implies that X, Y and Z of the normal map must be passed
through via alpha, so 3 palettes have to be used: One for each X, Y and Z
of the normal map. A quantized 8 bit normal map can be used with three
palettes or alternatively three 8 bit normal maps, which will give the same
quality as using a standard 24 bit normal map. The RGB of the 3 palettes
should be set to 128 (see figure 3(b)) so tolight will be subtracted from
128 during blending (the first part of equation (8)).

(a) Ordinary palette
entry    0                  1                  ...  255
channel  r    g    b    a   r    g    b    a   ...  r     g     b     a
value    X1   Y1   Z1   0   X2   Y2   Z2   0   ...  X256  Y256  Z256  0

(b) Reordered palette
entry    0                  1                  ...  255
channel  r    g    b    a   r    g    b    a   ...  r     g     b     a
value    128  128  128  X1  128  128  128  X2  ...  128   128   128   X256

Figure 3: (a) The structure of a palette of an 8 bit normal map. (b) The
reordered palette used to represent X during the signed multiplication pass.
Similar ones are made for Y and Z.

The resulting signed multiplications do not have rounding errors, and are
identical with an ordinary char multiplication a · b with a signed shift right
by 7 (offset by 128).

To summarize, the procedure is as follows:
• Pass 1: Render triangles with the sphere map applied into the DPB.
• Pass 2: Render the triangles again but use palette for X, set mask to
affect red only and use the blend mode above, use the normal map as
a texture.
• Pass 3: Same as pass 2 but use palette for Y and affect green only.
• Pass 4: Same as pass 2 but use palette for Z and affect blue only.

² Code for precomputing the sphere map is shown in Appendix 1.
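The claim that passes 2 to 4 produce exact signed multiplications can be checked with a small CPU simulation of one channel. The sketch assumes the GS evaluates the blend as ((Cs - Cd) · As >> 7) + Cd with an arithmetic shift and wraps modulo 256 while color clamping is disabled. It also assumes the frame buffer holds the tolight component packed by equation (1) and the palette alpha holds the normal map component packed by equation (2). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 128): an arithmetic shift right by 7, portable for negative v. */
static int shr7(int v) {
    return (v >= 0) ? (v >> 7) : -((-v + 127) >> 7);
}

/* Assumed blend: ((Cs - Cd) * As >> 7) + Cd, wrapped modulo 256. */
static uint8_t blend_channel(uint8_t cs, uint8_t cd, uint8_t as) {
    int v = shr7(((int)cs - (int)cd) * (int)as) + (int)cd;
    return (uint8_t)((v + 512) & 0xFF); /* +512 keeps the operand non-negative */
}

/* One signed multiplication pass for a single channel.
   a: tolight component in -127..128, b: normal component in -128..127. */
static uint8_t signed_mul_pass(int a, int b) {
    assert(a >= -127 && a <= 128 && b >= -128 && b <= 127);
    uint8_t a2 = (uint8_t)(128 - a); /* frame buffer after pass 1, equation (1) */
    uint8_t b2 = (uint8_t)(b + 128); /* palette alpha, equation (2) */
    return blend_channel(128, a2, b2); /* palette RGB is fixed at 128 */
}

/* Counts inputs where the pass differs from (a * b >> 7) + 128. */
static int count_blend_mismatches(void) {
    int mismatches = 0;
    for (int a = -127; a <= 128; ++a) {
        for (int b = -128; b <= 127; ++b) {
            if (signed_mul_pass(a, b) != shr7(a * b) + 128) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Under these assumptions the result is exact because $a \cdot b_2 = a \cdot b + 128a$ and the term $128a$ survives the shift without any rounding.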
[Figure 4 panels: (a) Spheremap, (b) Pass 1: Apply sphere map]
Figure 4: (a) The sphere map normalization table/texture. (b) A low resolu-
tion model rendered with the sphere map applied (first pass). (c-e) The 2nd,
3rd and 4th pass. Each pass updates a single channel in the framebuffer (red,
green and blue respectively). Thus, (e) shows the final signed multiplication
offset by 128. Adding these yields the final dot products (see section 3.3)
The sphere map forms tolight vectors which are packed according to equation (1)
and hence are in the set {0, 1, ..., 255} (see the code in appendix 1).
As mentioned in section 3, a traditional normal map is used, centered at 128
and also in the set {0, 1, ..., 255} (equation (2)).
The term n·l is the dot product between the unit length vertex normal and
the unit length direction towards the light in object space. This is actually
fortunate for this four-pass method since it fixes a well-known sphere mapping
artifact, i.e., that projecting onto a sphere map per vertex and not per pixel
goes wrong once the look-ups reach the far back of the sphere. Applying this
factor will make sure triangles facing away from the light remain unlit.
The question is how the attenuation factor should be applied now that
the light vectors are in a sphere map and not in the vertex colors. Applying
any kind of attenuation factor can be done by scaling down the vectors in
the sphere map during the first pass. The attenuation cannot be put directly
into the vertex colors of this pass since the sphere map is stored as vectors
subtracted from 128. A solution is to use any of the HIGHLIGHT texture
functions to get the correct result, by undoing equation (1), applying the
scale, and applying equation (1) again:
So by setting red, green, and blue of the vertex color to scale · 128 and
the vertex color alpha to (1 − scale) · 128, and by using a HIGHLIGHT
texture function during the first pass, per vertex attenuation is possible.
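Written out for a packed sphere map texel x, undoing equation (1), scaling, and packing again gives $128 - scale \cdot (128 - x) = scale \cdot x + (1 - scale) \cdot 128$. The sketch below evaluates this on the CPU. It assumes that the HIGHLIGHT texture function computes (texel · vertex color >> 7) + vertex alpha with clamping, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed HIGHLIGHT behaviour for one color channel:
   (texel * vertex_rgb >> 7) + vertex_alpha, clamped to 255. */
static uint8_t highlight(uint8_t texel, uint8_t vertex_rgb, uint8_t vertex_alpha) {
    int v = ((texel * vertex_rgb) >> 7) + vertex_alpha;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Attenuate a packed sphere map texel by scale in [0, 1]. */
static uint8_t attenuate_packed(uint8_t texel, float scale) {
    assert(scale >= 0.0f && scale <= 1.0f);
    const uint8_t vertex_rgb = (uint8_t)(scale * 128.0f);
    const uint8_t vertex_alpha = (uint8_t)((1.0f - scale) * 128.0f);
    return highlight(texel, vertex_rgb, vertex_alpha);
}
```

A scale of 1 leaves the texel untouched and a scale of 0 collapses every texel to 128, the packed zero vector.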
This is done by using the GS blend mode to apply the attenuation in the
same way as when using the HIGHLIGHT texture function. Alternatively,
this can also be applied after all four passes are complete.
4 Results
The methods have been implemented and tested on the PS2. The results
for a low resolution model rendered using the four-pass method running in
real-time on the PS2 are shown in figure 6. The model was lit by two point
light sources, a green and a blue. The frame buffer resolution was 512 x 448
and the normal map (figure 5) is 256 x 256 in 8 bit. A comparison of the
two methods is shown in figure 7.
[Figure 6 panels: (a) Wireframe, (b) Gouraud shading]
Figure 6: (a) Wireframe model (412 triangles), to give an idea of the amount
of actual detail in the model. (b) Traditional Gouraud shading. (c) The
model with normal mapping applied. (d) Normal mapping with specular
highlights.
[Figure 7 panels: (a) Two-pass front, (b) Four-pass front]
Figure 7: Shown to the left, (a) and (c) are the front and the back of the head
rendered using the two-pass DOT3 solution. To the right, (b) and (d) show
the same shots using the four-pass method. (b) is the same picture as seen
in 6(c). The edge highlighting is especially noticeable in (c).
5 Conclusion
6 Acknowledgements
The author would like to thank Kasper Høy Nielsen for his help restructuring
this paper and for his many suggestions that improved its readability.
Thanks also to Mircea Marghidanu, Steven Osman, Lionel Lemarie and
Trine Mikkelsen for additional proof reading and their insightful comments.
Finally, thanks to IO Interactive and to Eidos for letting me publish this
paper.
References
[Blinn78] Blinn, J.F.: "Simulation of wrinkled surfaces", Proceedings of the
5th annual conference on Computer graphics and interactive techniques,
ACM Press, pp. 286–292, 1978.
[Cohen98] Cohen, J., Olano, M., Manocha, D.: "Appearance-Preserving
Simplification", Computer Graphics, SIGGRAPH Proceedings, July, 1998.
Appendix 1: Code to generate the sphere map texture
// det = B^2 - 4AC
// lz = (-B +- sqrt(det)) / 2A
// this can be reduced since det = (2 - B)^2
// lz = (-B +- (2 - B)) / 2A
// so we have two roots
// lz_1 = (-B + (2 - B)) / 2 which is 1 - B (usable)
// lz_2 = (-B - (2 - B)) / 2 which is -1 (not usable)
// so this means lz = 1 - B
// once we have the lz component we
// can calculate lx and ly as well
float Lz = 1 - (s2 * s2 + t2 * t2) * 2;
// zero vector
int r = 128;
int g = 128;
int b = 128;
if (Lz >= -1)
{
const float m = 2 * sqrt(2 * Lz + 2);
float Lx = (s - 0.5) * m;
float Ly = (t - 0.5) * m;
// write
const int vect = (int)((b << 16) | (g << 8) | (r << 0));
((int *)mem)[y * iWidth + x] = vect;
}
}
}
Appendix 2: Packing normals for two-pass
// scale to range
int iX = (int)(nx * scale);
int iY = (int)(ny * scale);
int iZ = (int)(nz * scale);
// positive side
const int R_pos = Max(iX, 0);
const int G_pos = Max(iY, 0);
const int B_pos = Max(iZ, 0);
// negative side
const int R_neg = Max(-iX, 0);
const int G_neg = Max(-iY, 0);
const int B_neg = Max(-iZ, 0); // B_neg should always be zero
const int delta = (R_neg - R_pos) + (G_neg - G_pos) + (B_neg - B_pos);
int alpha = (3 * 128 + delta + 1) / 3;
// clamp alpha to the 8 bit range
alpha = (alpha < 0) ? 0 : ((alpha > 255) ? 255 : alpha);