This is my work to build a software 3D renderer as guided by the "Learn Computer Graphics Programming" course by Gustavo Pezzi.
Here's where I take videos of significant milestones to look back on the progress I've made.
A cube made up of a cloud of points with simple perspective projection applied.
01.Simple-Perspective-Cube-Points.mp4
Rotating the cube with simple rotation transformations.
02.Simple-Vector-Rotation-Transformation.mp4
The cube is now expressed as a collection of triangle faces and is rendered as a wireframe using a simple line rasterization algorithm.
03.Wireframe-Cube.mp4
Instead of a static cube, the renderer can now read in arbitrary OBJ files to render.
04.Render-OBJ-File.mp4
Implemented a bunch of vector functions and used them to implement back-face culling; mesh faces that aren't visible by the camera are no longer rendered.
05.Back-Face-Culling.mp4
Triangles are now filled in with a set of static colors. Render modes can be changed at runtime between wireframe and rasterized, plus an option for showing dots on vertices and enabling/disabling back-face culling.
06.Triangle-Rasterization.mp4
With a naive sorting algorithm, faces are rasterized in the correct depth order, preventing back faces from "bleeding through."
07.Face-Depth-Sorting.mp4
Use rotation, scaling, and translation matrices to apply transformations to mesh vertices.
08.Transformation-Matrices.mp4
Simple lighting appearance by shading each face in relation to a global light source.
09.Flat-Shading-Global-Lighting.mp4
Textures are mapped onto triangle faces using barycentric weighting.
10.Texture-UV-Mapping.mp4
Textures are mapped with perspective-corrected barycentric weighting.
11.Perspective-Correct-Interpolation.mp4
Textures can be loaded from PNG files and mapped using UV coordinates from associated OBJ files.
12.Obj-Texture-Loading.mp4
Z-Buffer is used to determine which pixels are rendered on top, reducing glitching of triangles popping on top of others.
13.Z-Buffer.mp4
A simple camera that can rotate left/right and translate up/down/forward/back.
14.Simple-Camera.mp4
Meshes are now clipped against the edges of the camera frustum, adjusting triangle and texture coordinates to only draw what is visible.
15.Frustum-Clipping.mp4
Updated input processing to smooth out camera movement. Reduced resolution to achieve a retro look with higher frame rates.
16.Camera-Adjustments.mp4
Multiple meshes can now be rendered in the scene.
17.Multiple-Meshes.mp4
Concepts that I still don't fully understand:
- Coordinate system handedness and how it affects operations such as cross product
- Perspective correct interpolation
The two triangles that make up the viewer's angle to the screen-space projected
point and the 3D point are similar triangles, which share a constant ratio
This can be simplified:
Note
Similarly, the Y perspective projection:
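As a minimal sketch of the perspective divide described above, the function below divides x and y by z. The names `vec2_t`, `project`, and `fov_factor` are assumptions for illustration, not necessarily what the renderer uses, and the point is assumed to have a non-zero z.

```c
typedef struct { float x, y, z; } vec3_t;
typedef struct { float x, y; } vec2_t;

// Perspective divide: the larger z is (the farther away the point),
// the closer the projected point lands to the center of the screen.
// fov_factor is a hypothetical scale applied to the projected result.
vec2_t project(vec3_t point, float fov_factor)
{
    vec2_t projected = {
        .x = (fov_factor * point.x) / point.z,
        .y = (fov_factor * point.y) / point.z,
    };
    return projected;
}
```

Doubling a point's z halves its projected x and y, which is the similar-triangles ratio at work.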
The "handedness" of the coordinate system defines how the axes are oriented relative to each other.
- "Left-handed" coordinate systems define Z values as growing "into" the screen, away from the viewer.
- "Right-handed" coordinate systems define Z values as growing "out of" the screen, toward the viewer.
DirectX uses a left-handed coordinate system, while OpenGL uses a right-handed coordinate system.
Illustration from https://www.oreilly.com/library/view/learn-arcore/9781788830409
A separate but related convention is triangle "winding order," or the order in which the vertices around the edge of a triangle are traversed. The winding order can be clockwise, or counter-clockwise.
The coordinate system and winding order convention can determine how normal values should be calculated.
There's additional context in this lesson video.
Given a triangle with points
Use the same method as above, but with the right hand. Notice that the normal direction is inverted given the same triangle.
Physicists normally use the right hand rule.
Goal: Rotate a 2D vector
Let
Let
Since
Since
After applying the angle
Trig functions that add two values can be expanded using the angle addition formula:
You can substitute
Similarly,
These are the formulas that are used by a rotation transformation matrix.
The same principle applies to 3 dimensions, but with one dimension at a time:
vec3_t Vec3RotateX(vec3_t v, float angle)
{
vec3_t rotated_vector = {
.x = v.x,
.y = v.y * cosf(angle) - v.z * sinf(angle),
.z = v.y * sinf(angle) + v.z * cosf(angle),
};
return rotated_vector;
}
vec3_t Vec3RotateY(vec3_t v, float angle)
{
vec3_t rotated_vector = {
.x = v.x * cosf(angle) - v.z * sinf(angle),
.y = v.y,
.z = v.x * sinf(angle) + v.z * cosf(angle),
};
return rotated_vector;
}
vec3_t Vec3RotateZ(vec3_t v, float angle)
{
vec3_t rotated_vector = {
.x = v.x * cosf(angle) - v.y * sinf(angle),
.y = v.x * sinf(angle) + v.y * cosf(angle),
.z = v.z,
};
return rotated_vector;
}
Magnitude refers to the length of the vector:
Adding vectors is basically starting one vector from the end of the other:
Subtraction is the same as addition, but invert/negate the second vector:
The cross product helps to calculate the normal vector of a plane.
The cross product of two vectors
To calculate the cross product:
There are two possible perpendicular vectors for any given pair of vectors. The order of operands will determine which direction is calculated.
The magnitude of the cross product is related to the angle between the two input vectors:
Resource for additional information on how to derive the cross product
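A minimal sketch of the cross product in code, using the standard component formula. `Vec3Cross` is an assumed name, not necessarily the one used in the renderer.

```c
typedef struct { float x, y, z; } vec3_t;

// Returns a vector perpendicular to both a and b.
// Swapping the operands flips the direction of the result.
vec3_t Vec3Cross(vec3_t a, vec3_t b)
{
    vec3_t result = {
        .x = a.y * b.z - a.z * b.y,
        .y = a.z * b.x - a.x * b.z,
        .z = a.x * b.y - a.y * b.x,
    };
    return result;
}
```

For example, crossing the X axis with the Y axis gives `(0, 0, 1)`, while crossing Y with X gives `(0, 0, -1)`.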
The dot product of two vectors produces a scalar: the sum of the products of their corresponding components.
When used with unit vectors, the dot product can be used to produce a "projection" of one vector onto the other.
The more "aligned" two unit vectors are, the larger the dot product is. If they are
exactly the same, the dot product is $1$.
At a 90 degree offset, the dot product is $0$.
If the two vectors are complete opposites, the dot product is $-1$.
A normalized vector is a vector with a magnitude of 1.
If you don't care about the length of a vector, it's often better to express it as a normalized vector.
If we'd like to avoid rendering faces that are facing away from the camera, we can simply compare their normal vector to the vector of the camera.
Here's how we can get the normal vector of a triangle face:
Note that we take our vertices in clockwise order, consistent with our chosen coordinate system.
Once we have the normal vector, we can compare it to the camera ray vector using the dot product to determine if the face is facing toward the camera or away from it.
To find the camera ray vector, we simply subtract the camera position from the point we are observing.
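Putting those steps together, here is a rough sketch of a back-face test. It assumes a left-handed system with clockwise vertex order, and it takes the camera ray as the vertex position minus the camera position, so a face pointing at the camera yields a negative dot product. `IsBackFace` and the helper names are hypothetical.

```c
typedef struct { float x, y, z; } vec3_t;

static vec3_t Sub(vec3_t a, vec3_t b)
{
    vec3_t r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static vec3_t Cross(vec3_t a, vec3_t b)
{
    vec3_t r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

static float Dot(vec3_t a, vec3_t b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns 1 if triangle (a, b, c) faces away from the camera and can be skipped.
int IsBackFace(vec3_t a, vec3_t b, vec3_t c, vec3_t camera_position)
{
    // Face normal from two edges that share vertex a.
    vec3_t normal = Cross(Sub(b, a), Sub(c, a));
    // Ray from the camera to the face (assumed direction, see lead-in).
    vec3_t camera_ray = Sub(a, camera_position);
    // A normal pointing along the ray (or perpendicular to it) is not visible.
    return Dot(normal, camera_ray) >= 0.0f;
}
```

If the ray is instead taken from the face toward the camera, the comparison flips sign.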
A matrix is just a way of expressing and manipulating a set of values in rows and columns.
Matrix
A matrix has a set of elements that can be referenced as follows:
Matrices are useful for solving systems of equations:
In computer graphics, matrices are useful in converting sets of geometric data into different coordinate systems. They can be used to apply translation, rotation, projection, and many other transformations.
To add two matrices, simply add their corresponding elements.
To subtract two matrices, simply subtract their corresponding elements.
Matrix multiplication is more complex. For each combination of row and column you must multiply the row elements with the column elements and sum the results:
Multiplication is only possible when the number of columns on the left matrix is equal to the number of rows on the right matrix.
The dimension of the resulting matrix will have the number of rows of the left matrix and the number of columns of the right matrix.
Matrix multiplication is not commutative:
The identity matrix is a square matrix with 1's on the diagonal and 0's everywhere else.
Multiplying any matrix by the identity matrix returns that matrix unchanged.
Earlier, we determined that you can calculate the new
This can be represented in matrix form:
This matrix is called a 2D rotation matrix:
When it is multiplied against a set of coordinates, it produces a set of
transformed coordinates rotated by
In linear algebra, linear transformations can be represented by matrices.
4x4 matrices are usually used to represent 3D transformations (scale, translation, rotation, etc.)
We use 4x4 instead of 3x3 because some transformations (ex. translation) require an extra row/column.
To enable multiplication, an extra component
Performing this multiplication yields the following:
Performing the multiplication yields the following:
These are defined in a left-handed coordinate system, such that each axis is rotated counter-clockwise around its axis. See direction.
The rotation matrix for the X axis looks like this:
The rotation matrix for the Y axis looks like this:
The rotation matrix for the Z axis looks like this:
By combining translation, rotation, and scaling matrices via matrix multiplication, we can express the location of an object in the world with a single matrix.
The order of transformations matters. The usual order is:
- Scale
- Rotate
- Translate
If these are performed out of order, the results may be unexpected. For example, if translation is applied before rotation, so that the object has already moved away from the origin (0, 0), the rotation will still be applied around the origin (0, 0), exaggerating the result of the rotation transformation.
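To illustrate why order matters, here is a sketch that combines a scale and a translation in both orders. It assumes column vectors (the matrix is on the left of the vertex), so the transformation applied first is the rightmost operand. All type and function names are assumptions, and rotation is left out to keep the example short.

```c
typedef struct { float m[4][4]; } mat4_t;
typedef struct { float x, y, z, w; } vec4_t;

mat4_t Mat4Identity(void)
{
    mat4_t m = {{ {1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1} }};
    return m;
}

mat4_t Mat4Scale(float sx, float sy, float sz)
{
    mat4_t m = Mat4Identity();
    m.m[0][0] = sx; m.m[1][1] = sy; m.m[2][2] = sz;
    return m;
}

mat4_t Mat4Translate(float tx, float ty, float tz)
{
    mat4_t m = Mat4Identity();
    m.m[0][3] = tx; m.m[1][3] = ty; m.m[2][3] = tz;
    return m;
}

// Each result element is a row of a multiplied with a column of b, then summed.
mat4_t Mat4Mul(mat4_t a, mat4_t b)
{
    mat4_t result;
    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            result.m[row][col] = 0.0f;
            for (int k = 0; k < 4; k++) {
                result.m[row][col] += a.m[row][k] * b.m[k][col];
            }
        }
    }
    return result;
}

vec4_t Mat4MulVec4(mat4_t m, vec4_t v)
{
    vec4_t r = {
        m.m[0][0] * v.x + m.m[0][1] * v.y + m.m[0][2] * v.z + m.m[0][3] * v.w,
        m.m[1][0] * v.x + m.m[1][1] * v.y + m.m[1][2] * v.z + m.m[1][3] * v.w,
        m.m[2][0] * v.x + m.m[2][1] * v.y + m.m[2][2] * v.z + m.m[2][3] * v.w,
        m.m[3][0] * v.x + m.m[3][1] * v.y + m.m[3][2] * v.z + m.m[3][3] * v.w,
    };
    return r;
}
```

With a scale of 2 and a translation of 10 along x, the point `(1, 0, 0, 1)` ends up at x = 12 when scaled first (`Mat4Mul(translate, scale)`), but at x = 22 when translated first (`Mat4Mul(scale, translate)`).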
You can also use matrices to achieve projection of points onto a plane.
Projection Matrices handle:
- Aspect ratio: adjust x and y values based on screen width and height
- Field of view: adjust x and y values based on FOV angle
- Normalization: adjust x, y, and z values to sit between -1 and 1
The aspect ratio is the ratio of the screen's height to its width.
The field of view is defined as the 'scale factor' for how points should be adjusted to fit within the given FOV angle.
We must also normalize z to a 'normalized device coordinate' between 0 and 1.
We do this by defining two planes;
We can use values
We can substitute in the values we defined above:
To apply this using matrix multiplication, we can use a matrix like the following:
Note the
This note has additional explanation for the values contained within the projection matrix.
In the examples above, vertices are represented in "column-major" order:
An alternative representation is "row-major":
Different graphics APIs may choose to use different representations for a variety of reasons.
One implication to row-major vs column-major is the order of operands for matrix multiplication. Vertices defined in row-major format are "post-multiplied" against a projection matrix:
Whereas column-major vertices are "pre-multiplied":
This note has more details.
Simple lighting can be achieved by implementing one global light with a direction vector. The direction vector can be compared against each face normal via dot product to determine a lighting intensity.
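A minimal sketch of that idea follows. It assumes both vectors are normalized, that the light direction points from the light toward the scene (hence the negated dot product), and that colors are packed as 32-bit ARGB. The function names are hypothetical.

```c
#include <stdint.h>

typedef struct { float x, y, z; } vec3_t;

// Intensity in [0, 1]: 1 when the face points straight at the light,
// 0 when it is edge-on or faces away.
float FaceLightIntensity(vec3_t normal, vec3_t light_direction)
{
    float intensity = -(normal.x * light_direction.x +
                        normal.y * light_direction.y +
                        normal.z * light_direction.z);
    if (intensity < 0.0f) intensity = 0.0f;
    if (intensity > 1.0f) intensity = 1.0f;
    return intensity;
}

// Scales the R, G, and B channels by the intensity and keeps alpha as is.
uint32_t ApplyLightIntensity(uint32_t color, float intensity)
{
    uint32_t a = color & 0xFF000000u;
    uint32_t r = (uint32_t)(((color >> 16) & 0xFFu) * intensity);
    uint32_t g = (uint32_t)(((color >> 8) & 0xFFu) * intensity);
    uint32_t b = (uint32_t)((color & 0xFFu) * intensity);
    return a | (r << 16) | (g << 8) | b;
}
```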
Texture coordinates are represented as
UV mapping is the process of mapping the vertices of a face to positions on a texture.
Barycentric coordinates are like applying a set of weight values on vertices to decide where a point is located in the middle of a triangle face.
These 'weight values' also represent the areas of the three sub-triangles made
by the point
The sum of the barycentric weights is always $1$.
The weights 'pull' the vertices to result in coordinate
Given the triangle and point
To calculate the area of the triangle
Ensure that the order of the cross product matches the coordinate system in use, left-handed (clockwise) in this case.
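Here is one way the weights could be computed for 2D screen-space points, as a sketch. It uses the 2D cross product (signed parallelogram area) of each sub-triangle divided by the area of the whole triangle. The names are assumptions, and a degenerate triangle with zero area is not handled.

```c
typedef struct { float x, y; } vec2_t;
typedef struct { float alpha, beta, gamma; } weights_t;

// Signed area of the parallelogram spanned by (b - a) and (p - a).
static float EdgeCross(vec2_t a, vec2_t b, vec2_t p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// alpha, beta, gamma are the weights of vertices a, b, c for point p.
weights_t BarycentricWeights(vec2_t a, vec2_t b, vec2_t c, vec2_t p)
{
    float area = EdgeCross(a, b, c);        // whole triangle
    weights_t w;
    w.alpha = EdgeCross(b, c, p) / area;    // sub-triangle opposite a
    w.beta = EdgeCross(c, a, p) / area;     // sub-triangle opposite b
    w.gamma = 1.0f - w.alpha - w.beta;      // weights always sum to 1
    return w;
}
```

Because each sub-area is divided by the full area computed with the same vertex order, the signs cancel and the result does not depend on the winding order.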
The straight mapping we achieved so far is called "affine mapping." It does not take into account perspective.
To find depth values, you cannot simply interpolate
However, the reciprocal of the Z components is linear. So we can use
The original
To achieve perspective correct mapping, we:
- Use the reciprocal of all attributes ($\frac{1}{w}$) (now linear in screen space)
- Interpolate over the triangle face (using barycentric weights, $\frac{1}{w}$ factor)
- Divide all attributes by $\frac{1}{w}$ (undoes the perspective transform).
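The steps above can be sketched for a single attribute (for example, a texture `u` coordinate). The function name and parameter layout are assumptions; `w0`, `w1`, and `w2` are the per-vertex w values and are assumed to be non-zero.

```c
// Interpolates one vertex attribute with perspective correction.
// alpha, beta, gamma are the barycentric weights of the pixel.
float InterpolatePerspectiveCorrect(
    float attr0, float attr1, float attr2,
    float w0, float w1, float w2,
    float alpha, float beta, float gamma)
{
    // Steps 1 and 2: interpolate attribute / w, which is linear in screen space.
    float attr_over_w = alpha * (attr0 / w0) + beta * (attr1 / w1) + gamma * (attr2 / w2);
    // Interpolate 1 / w the same way.
    float reciprocal_w = alpha / w0 + beta / w1 + gamma / w2;
    // Step 3: divide by the interpolated 1 / w to undo the perspective transform.
    return attr_over_w / reciprocal_w;
}
```

When all three w values are equal, the result matches plain affine interpolation. When they differ, the result is pulled toward the vertex that is closer to the camera.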
There is a good academic paper describing the derivation of perspective correct interpolation here.
Additional resources:
- https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/perspective-correct-interpolation-vertex-attributes.html
- https://www.youtube.com/watch?v=zPLfyj-Szow&t=2218s
The Z-buffer can also be called the depth buffer.
It stores the depth of each screen pixel in an array, and helps determine which pixel is "in front."
It is an alternative to the painter's algorithm.
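A minimal sketch of the per-pixel depth test is below. It assumes the buffer is a flat array initialized to the farthest depth (for example 1.0) and that smaller stored values are closer to the camera. `DepthTestAndSet` is a hypothetical name.

```c
// Returns 1 and records the new depth if the fragment at (x, y) is closer
// than what is already stored; returns 0 if the pixel should be skipped.
int DepthTestAndSet(float *z_buffer, int width, int x, int y, float depth)
{
    int index = (width * y) + x;
    if (depth < z_buffer[index]) {
        z_buffer[index] = depth;
        return 1;
    }
    return 0;
}
```

The caller would draw the pixel's color only when the function returns 1, so the final image no longer depends on the order in which triangles are drawn.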
As explained in perspective-correct texture interpolation, the Z depth is not
linear in screen space across the surface of the triangle. Instead, like
texture mapping, the reciprocal
A "view matrix" is used to transform the 3D scene into the perspective of a camera or view.
One approach to doing this is implementing a "look at" function that returns a matrix that can transform world vertices into camera space from a certain point looking at another point.
The matrix will:
- Translate the whole scene inversely from the camera eye position to the origin (matrix $M_T$)
- Rotate the scene with reverse orientation (matrix $M_R$) so the camera is positioned at the origin and facing the positive Z axis (since our renderer is left-handed).
The translation matrix will simply be the negated coordinates of the eye position:
For the rotation matrix, we need to compute the forward (
This matrix is used to convert between coordinate systems. Note, it must be inverted (since the scene must move 'around' the camera). An inverted matrix can be thought of like an "undo" of the original matrix.
For orthogonal matrices, inversion is a simple matter of transposing (flipping so that rows become columns and columns become rows).
Multiplying the rotation and translation matrices with the values above:
The last column can be simplified using dot product:
Clipping is the process of removing objects or line segments that are outside the viewing volume.
For frustum clipping, six planes are used:
- Top
- Bottom
- Left
- Right
- Near
- Far
A plane is defined by a point
Notably, the camera origin point is present on every frustum plane. So it
makes for a convenient starting point
To calculate the right frustum plane, simply draw the normal vector at 90
degrees from the right camera boundary vector (
A similar process can be used for the left, top, and bottom planes.
Unlike the other planes, the point
Similarly, for the far plane:
The negative
A point
A point
A point
The linear interpolation equation allows us to calculate any point along a line:
The interpolation factor
Given a plane that intersects the line between
You can use the dot product to determine each point's relationship with the plane:
We can use these along with the linear interpolation equation:
And take the dot product of each term with
We know the value of
We need to isolate
or
First, list each of the vertices along the boundary of the polygon and determine if each is inside or outside of the plane. The lines with vertices that straddle the boundary must be clipped, and the intersection point should be added to both lists.
| Inside | Outside |
|---|---|
The resulting polygon is the set of points from the "inside" vertices list.
This operation needs to be repeated for each plane in the view frustum in order to achieve frustum space clipping.
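The clipping of a polygon against a single plane can be sketched as below. It assumes a plane given by a point and a normal that points toward the inside, treats vertices exactly on the plane as outside, and uses a fixed-size vertex array (`MAX_POLY_VERTICES` is an assumed limit). All names are hypothetical.

```c
#define MAX_POLY_VERTICES 10

typedef struct { float x, y, z; } vec3_t;
typedef struct {
    vec3_t vertices[MAX_POLY_VERTICES];
    int num_vertices;
} polygon_t;

// Signed distance-like value: positive means q is on the inside of the plane.
static float PlaneDot(vec3_t q, vec3_t plane_point, vec3_t plane_normal)
{
    return (q.x - plane_point.x) * plane_normal.x +
           (q.y - plane_point.y) * plane_normal.y +
           (q.z - plane_point.z) * plane_normal.z;
}

void ClipPolygonAgainstPlane(polygon_t *polygon, vec3_t plane_point, vec3_t plane_normal)
{
    vec3_t inside[MAX_POLY_VERTICES];
    int num_inside = 0;
    if (polygon->num_vertices == 0) return;

    // Start with the last vertex so the closing edge is also visited.
    vec3_t previous = polygon->vertices[polygon->num_vertices - 1];
    float previous_dot = PlaneDot(previous, plane_point, plane_normal);

    for (int i = 0; i < polygon->num_vertices; i++) {
        vec3_t current = polygon->vertices[i];
        float current_dot = PlaneDot(current, plane_point, plane_normal);

        // The edge straddles the plane: add the intersection point.
        if (current_dot * previous_dot < 0.0f && num_inside < MAX_POLY_VERTICES) {
            float t = previous_dot / (previous_dot - current_dot);
            vec3_t intersection = {
                previous.x + t * (current.x - previous.x),
                previous.y + t * (current.y - previous.y),
                previous.z + t * (current.z - previous.z),
            };
            inside[num_inside++] = intersection;
        }
        // Keep vertices that are on the inside of the plane.
        if (current_dot > 0.0f && num_inside < MAX_POLY_VERTICES) {
            inside[num_inside++] = current;
        }
        previous = current;
        previous_dot = current_dot;
    }

    for (int i = 0; i < num_inside; i++) {
        polygon->vertices[i] = inside[i];
    }
    polygon->num_vertices = num_inside;
}
```

Since only the inside list is needed for the result, this sketch tracks that list alone. Calling the function once per frustum plane clips the polygon against the whole frustum.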
To turn a polygon back into a set of triangles, we can simply iterate through sets of 3 vertices like so:
// Fan triangulation: every triangle shares the polygon's first vertex.
for (int i = 0; i < (num_vertices - 2); ++i) {
    int index0 = 0;
    int index1 = i + 1;
    int index2 = i + 2;
    create_triangle(index0, index1, index2);
}
UV coordinates for triangles can be clipped via linear interpolation using the same interpolation factor that produces the new vertices along the edges of the clipping plane.
Usually, graphics pipelines will perform clipping after projection but before perspective divide. There are several advantages to doing this:
- Perspective divide is where x, y, and z are divided by w. Thus, before perspective divide, every vertex that is inside the frustum is between $-1 * w$ and $1 * w$, making frustum culling as trivial as comparing each component against $w$.
- Texture coordinates can still be interpolated linearly in this space, since the perspective divide has not happened yet.
- Division by zero is avoided, since clipping and culling are against $z_{near}$.
An additional resource on homogeneous clipping: https://fabiensanglard.net/polygon_codec/index.php
Conventions that handle how geometry primitives should be rasterized.
Defined conventions can make sure cases like shared edges can be handled properly without gaps or overdraw.
A "fill convention" handles these cases with neighboring triangles. One such convention is called the "top left rule," where pixels are defined as "inside" a triangle if they are along the top edge or left edge.
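One possible way to classify an edge under the top-left rule is sketched here. It assumes integer screen coordinates with y growing downward and triangles whose vertices are in clockwise order on screen. `vec2i_t` and `IsTopLeft` are hypothetical names.

```c
typedef struct { int x, y; } vec2i_t;

// Returns 1 if the edge from start to end is a top edge or a left edge.
int IsTopLeft(vec2i_t start, vec2i_t end)
{
    int edge_x = end.x - start.x;
    int edge_y = end.y - start.y;
    // Top edge: perfectly horizontal and pointing to the right.
    int is_top_edge = (edge_y == 0 && edge_x > 0);
    // Left edge: pointing up the screen (y decreases).
    int is_left_edge = (edge_y < 0);
    return is_top_edge || is_left_edge;
}
```

A rasterizer could then count pixels lying exactly on an edge as inside only when this function returns 1 for that edge, so a pixel on an edge shared by two triangles is drawn by just one of them.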
Use floating points to represent sub-pixels. Bias towards the center of each pixel when calculating geometry (0.5, 0.5).
Fixed-point math becomes important here (vs floating points).