Report 3
Viewing
Miriam de Bengoa Aletta
INDEX
1. Introduction
2. Viewing transformations
3. Projective transformations
4. Some properties of the perspective transform
5. Field-of-view
6. References
Introduction
Previously, we saw how to use matrix transformations to arrange geometric objects in 2D or 3D
space. Another important use of geometric transformations is moving objects between their 3D
locations and their positions in a 2D view of the 3D world. This 3D-to-2D mapping is called a viewing
transformation, and it plays an important role in object-order rendering, where we need to quickly
determine the image-space location of each object in the scene.
We will focus on how to use matrix transformations to express any parallel or perspective view.
The transformations project 3D points in the world space to 2D points in image space, allowing us to
project any point along a given pixel's viewing ray back to its position in image space.
On its own, the ability to project points from 3D to 2D is only useful for wireframe renderings,
where only the edges of objects are drawn and closer surfaces do not occlude distant ones. However,
to produce solid renderings, where the surface closest to the viewer must be determined at every
point, more complex methods are required. For now, we work with a model consisting of 3D line
segments, defined by the (x, y, z) coordinates of their endpoints.
Viewing transformations
The viewing transformation maps 3D coordinates to 2D image pixels and is broken into three steps:
1. Camera transformation: Positions the camera at the origin with the correct orientation.
2. Projection transformation: Projects points from camera space to fit within a specific range (-1 to 1).
3. Viewport transformation: Maps the projected points to screen space in pixel coordinates.
These transformations are applied sequentially to convert world space to camera space, then to the
canonical view, and finally to screen space.
We begin with a problem whose solution will be reused for any viewing condition. We will
assume that all line segments to be drawn are completely inside the canonical view volume.
The canonical view is mapped to screen coordinates, where x = -1 maps to the left, x = +1
to the right, y = -1 to the bottom, and y = +1 to the top of the screen (right picture).
Since the viewport transformation maps one axis-aligned rectangle to another, it is a case
of the windowing transform. For an nx × ny image with pixel centers at integer coordinates
from (0, 0) to (nx - 1, ny - 1), it is given by

M_vp = [ nx/2   0      0   (nx-1)/2 ]
       [ 0      ny/2   0   (ny-1)/2 ]
       [ 0      0      1   0        ]
       [ 0      0      0   1        ]

The screen coordinates produced by this matrix ignore the z-coordinate of the points in the
canonical view volume, because a point's distance along the projection direction doesn't affect
where that point projects in the image; the third row simply carries z through unchanged so it
remains available for later depth processing.
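The viewport transformation just described can be sketched and sanity-checked in a few lines of pure Python. This is a minimal illustration; the function names, the 640 × 480 resolution, and the sample points are mine, not from the text.

```python
def viewport_matrix(nx, ny):
    """Windowing transform from the canonical view volume ([-1,1]^2 in x, y)
    to screen space, with pixel centers at integer coordinates from
    (0, 0) to (nx-1, ny-1). z is carried through unchanged."""
    return [[nx / 2, 0,      0, (nx - 1) / 2],
            [0,      ny / 2, 0, (ny - 1) / 2],
            [0,      0,      1, 0],
            [0,      0,      0, 1]]

def apply(m, p):
    """Apply a 4x4 matrix to the point (x, y, z, 1); w stays 1 here."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(4))[:3]

# The canonical corner (-1, -1) lands on the bottom-left corner of the
# screen (pixel 0's center is at 0, its lower-left edge at -0.5):
print(apply(viewport_matrix(640, 480), (-1.0, -1.0, 0.5)))  # -> (-0.5, -0.5, 0.5)
```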
The orthographic projection transformation
To render geometry in a region other than the canonical view volume, the view direction and
orientation are kept fixed (looking along -z with +y up), but arbitrary rectangles can be viewed.
Instead of replacing the viewport matrix, it is multiplied by another matrix on the right.
Under these conditions, the view volume is an axis-aligned box defined by [l, r] × [b, t] × [f, n],
known as the orthographic view volume. The bounding planes are named as follows:
• x = l (left plane),
• x = r (right plane),
• y = b (bottom plane),
• y = t (top plane),
• z = n (near plane),
• z = f (far plane).
Assuming a viewer looking along the -z axis with their head pointing in the +y direction, the
entire orthographic view volume has negative z values, so n > f: the near plane z = n is the
less negative bound and therefore the one closer to the viewer, which may seem counterintuitive
at first. The transformation from the orthographic view volume to the canonical view volume is
another windowing transformation, obtained by substituting the bounds of the orthographic and
canonical view volumes into the windowing equation:

M_orth = [ 2/(r-l)   0         0         -(r+l)/(r-l) ]
         [ 0         2/(t-b)   0         -(t+b)/(t-b) ]
         [ 0         0         2/(n-f)   -(n+f)/(n-f) ]
         [ 0         0         0          1           ]
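A quick numeric check of this windowing transform, sketched in pure Python. The bounds l, r, b, t, n, f below are invented example values: corners of the orthographic view volume should land on corners of the canonical cube, with z = n mapping to +1 and z = f to -1.

```python
def orth_matrix(l, r, b, t, n, f):
    """Windowing transform from the orthographic view volume
    [l,r] x [b,t] x [f,n] to the canonical cube [-1,1]^3.
    Note n > f: both are negative for a viewer looking along -z."""
    return [[2 / (r - l), 0, 0, -(r + l) / (r - l)],
            [0, 2 / (t - b), 0, -(t + b) / (t - b)],
            [0, 0, 2 / (n - f), -(n + f) / (n - f)],
            [0, 0, 0, 1]]

def apply(m, p):
    """Apply a 4x4 matrix to the point (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][j] * v[j] for j in range(4)) for i in range(4))[:3]

M = orth_matrix(-10.0, 10.0, -5.0, 5.0, -1.0, -20.0)
print(apply(M, (-10.0, -5.0, -1.0)))  # left/bottom/near corner -> about (-1, -1, +1)
print(apply(M, (10.0, 5.0, -20.0)))   # right/top/far corner   -> about (+1, +1, -1)
```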
The camera transformation
To change the viewpoint in 3D and look in any direction, the following
conventions are used:
• Eye position (e): The location from which the viewer "sees,"
like the center of a camera lens.
• Gaze direction (g): The direction the viewer is looking.
• View-up vector (t): A vector in the plane that bisects the
viewer's head into left and right halves and points "upward."
These vectors set up a coordinate system with origin e and basis
vectors u, v, and w:
w = -g/‖g‖,    u = (t × w)/‖t × w‖,    v = w × u
To transform points from the world coordinate system (origin o with x, y, z axes) to the camera's uvw
coordinate system, a transformation matrix is needed. This matrix, known as the canonical-to-basis
matrix, changes the coordinates from the world system to the camera's frame:

M_cam = [ u v w e ]⁻¹ = [ x_u  y_u  z_u  0 ] [ 1  0  0  -x_e ]
                        [ x_v  y_v  z_v  0 ] [ 0  1  0  -y_e ]
                        [ x_w  y_w  z_w  0 ] [ 0  0  1  -z_e ]
                        [ 0    0    0    1 ] [ 0  0  0   1   ]
This transformation can be understood as first moving the eye position e to the origin, then aligning
the basis vectors u, v, and w with the x, y, and z axes.
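The construction of u, v, w and the camera transform can be sketched in pure Python. The function names and the sample camera are mine; a camera at (0, 0, 5) gazing along -z should see the world origin 5 units in front of it, at z = -5 in camera coordinates.

```python
import math

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def camera_matrix(e, g, t):
    """World-to-camera transform built from eye e, gaze g, and view-up t:
    w = -g/|g|, u = (t x w)/|t x w|, v = w x u, then translate e to the
    origin and rotate u, v, w onto the x, y, z axes."""
    w = normalize([-x for x in g])
    u = normalize(cross(t, w))
    v = cross(w, u)
    return [[u[0], u[1], u[2], -sum(u[i] * e[i] for i in range(3))],
            [v[0], v[1], v[2], -sum(v[i] * e[i] for i in range(3))],
            [w[0], w[1], w[2], -sum(w[i] * e[i] for i in range(3))],
            [0.0, 0.0, 0.0, 1.0]]

# The world origin should map to (0, 0, -5) in camera coordinates:
M = camera_matrix(e=[0.0, 0.0, 5.0], g=[0.0, 0.0, -1.0], t=[0.0, 1.0, 0.0])
v = (0.0, 0.0, 0.0, 1.0)
print([sum(M[i][j] * v[j] for j in range(4)) for i in range(4)])
```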
Projective transformations
Perspective projection requires special handling because it involves dividing by the z-coordinate,
which cannot be achieved with standard affine transformations. In camera space, with the viewpoint
at the origin and the camera looking along the z-axis, the screen size of an object is proportional
to 1/z.
To adapt perspective projection to matrix operations, homogeneous coordinates are used, where a
point (x, y, z) is represented by the vector [x y z 1]ᵀ. Affine transformations keep w = 1 by
using [0 0 0 1]ᵀ as the fourth row of the transformation matrix. By allowing w to vary, the
4-vector [x y z w]ᵀ represents the point (x/w, y/w, z/w), expanding the range of possible
transformations.
This approach enables perspective transformations, where expressions like

x' = (a1·x + b1·y + c1·z + d1) / (e·x + f·y + g·z + h)

can be computed, treating w as the common denominator for all transformed coordinates. This lets
us represent x', y', and z' as "linear rational functions" sharing the same denominator.
Expressed as a matrix transformation:

[ x~ ]   [ a1 b1 c1 d1 ] [ x ]
[ y~ ] = [ a2 b2 c2 d2 ] [ y ]
[ z~ ]   [ a3 b3 c3 d3 ] [ z ]
[ w  ]   [ e  f  g  h  ] [ 1 ]

with (x', y', z') = (x~/w, y~/w, z~/w).
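As a tiny illustration of this machinery (the matrix entries and the sample point below are made up, not from the text): setting the bottom row [e f g h] to [0 0 1 0] makes the output w equal to z, so the homogeneous divide becomes a division of every coordinate by z.

```python
# Hypothetical projective matrix: bottom row [0 0 1 0], so w comes out as z.
M = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

x, y, z = 4.0, 2.0, 2.0
v = (x, y, z, 1.0)
h = [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]
print(tuple(c / h[3] for c in h[:3]))  # -> (2.0, 1.0, 1.0): each coordinate over z
```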
Perspective projection
Projective transformations implement the division by z needed for perspective projection. In 2D,
the transformation can be represented by a matrix that maps a homogeneous vector [y, z, 1]ᵀ to
[dy, z]ᵀ, which after the homogeneous divide corresponds to the 1D point dy/z. For 3D, with the
camera at the origin looking along the -z axis, the near plane z = n and the far plane z = f
bound the viewing range; since n is negative, the eye-to-near-plane distance is -n, and the near
plane acts as the projection plane.
The 3D perspective matrix is:

P = [ n  0  0    0   ]
    [ 0  n  0    0   ]
    [ 0  0  n+f  -fn ]
    [ 0  0  1    0   ]
This matrix scales the x and y coordinates by n/z (after the homogeneous divide), implementing
the desired perspective. The third row carries the z-coordinate through for depth processing
(e.g., hidden surface removal), although the perspective projection changes z values
non-linearly. The matrix leaves points on the near plane (z = n) entirely unchanged, and it
preserves the z-coordinate of points on the far plane (z = f) while scaling their x and y by the
appropriate factor.
In summary, although z cannot be perfectly preserved during
perspective projection, this matrix handles it well for rendering
purposes.
In perspective projection, the transformation scales x and y
and divides them by z, with n and z being negative within the
view volume, ensuring no "flips" in x and y. The transform
preserves the order of z-values between z = n and z = f,
which is crucial for depth sorting, essential for hidden
surface elimination.
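These properties can be sketched numerically in pure Python (the helper names and sample points are mine, and n = -1, f = -10 are example values): points on the near plane come through unchanged, points on the far plane keep their z while x and y shrink by n/f, and z-order between the planes is preserved.

```python
def perspective_matrix(n, f):
    """Perspective matrix P as given above: after the homogeneous divide,
    x and y are scaled by n/z, and z = n and z = f map to themselves."""
    return [[n, 0, 0, 0],
            [0, n, 0, 0],
            [0, 0, n + f, -f * n],
            [0, 0, 1, 0]]

def project(m, p):
    """Apply m in homogeneous coordinates, then divide by w."""
    v = (p[0], p[1], p[2], 1.0)
    h = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return tuple(c / h[3] for c in h[:3])

P = perspective_matrix(n=-1.0, f=-10.0)
print(project(P, (2.0, 1.0, -1.0)))   # -> (2.0, 1.0, -1.0): near-plane point unchanged
print(project(P, (2.0, 1.0, -10.0)))  # -> (0.2, 0.1, -10.0): z kept, x and y scaled by n/f
# z-order between the planes is preserved (needed for depth sorting):
print(project(P, (0.0, 0.0, -2.0))[2] > project(P, (0.0, 0.0, -8.0))[2])  # -> True
```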
The inverse of the perspective matrix P, useful for converting screen coordinates back to the
original space (e.g., for picking), is given by:

P⁻¹ ~ [ f   0   0    0   ]
      [ 0   f   0    0   ]
      [ 0   0   0    fn  ]
      [ 0   0   -1   n+f ]

While this matrix isn't the literal inverse of P (it differs by the constant factor 1/(fn)), it
performs the inverse transformation described by P, because homogeneous vectors that differ by a
nonzero scale factor represent the same point.
When combined with the orthographic projection matrix M_orth, the perspective matrix P maps the
frustum-shaped perspective view volume to an orthographic view volume, an axis-aligned box. This
integration allows us to apply all the orthographic transformations and algorithms by simply adding
one matrix for perspective projection and a division by w. Thus, the full potential of the 4x4 matrix
is used, adding efficiency to the projection process.
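Putting the pieces together, the full pipeline can be sketched in pure Python as one composed matrix plus a per-point divide by w. The view-volume bounds and image resolution below are invented example values; a point straight ahead of the camera should land at the center of the image.

```python
def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, p):
    """Apply m to (x, y, z, 1) and perform the homogeneous divide."""
    v = (p[0], p[1], p[2], 1.0)
    h = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return tuple(c / h[3] for c in h[:3])

# Example volume and resolution (invented values for illustration):
n, f = -1.0, -100.0
l, r, b, t = -1.0, 1.0, -0.5, 0.5
nx, ny = 200, 100

P = [[n, 0, 0, 0], [0, n, 0, 0], [0, 0, n + f, -f * n], [0, 0, 1, 0]]
M_orth = [[2 / (r - l), 0, 0, -(r + l) / (r - l)],
          [0, 2 / (t - b), 0, -(t + b) / (t - b)],
          [0, 0, 2 / (n - f), -(n + f) / (n - f)],
          [0, 0, 0, 1]]
M_vp = [[nx / 2, 0, 0, (nx - 1) / 2],
        [0, ny / 2, 0, (ny - 1) / 2],
        [0, 0, 1, 0],
        [0, 0, 0, 1]]

# One matrix for the whole pipeline, plus a divide by w per point:
M = matmul(M_vp, matmul(M_orth, P))

# A point on the -z axis lands at the image center (99.5, 49.5):
print(transform(M, (0.0, 0.0, -5.0)))
```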
Some properties of the perspective transform
A key property of the perspective transform is that it maps lines to lines and planes to planes,
and it maps line segments within the view volume to line segments in the canonical volume. For
a line segment defined as q + t(Q − q), applying a 4 × 4 matrix M results in

Mq + t(MQ − Mq) ≡ r + t(R − r),

a homogeneous segment whose homogenization (with endpoint w-coordinates w_r and w_R) yields

( r + t(R − r) ) / ( w_r + t(w_R − w_r) ).

This can be rewritten as

r/w_r + f(t)·(R/w_R − r/w_r),    where  f(t) = w_R·t / ( w_r + t(w_R − w_r) ).
This ensures that the transformed line segment remains a 3D line, preserving the relative
ordering of points (i.e., no reordering or "tearing" occurs). Consequently, the perspective
transform also maps the edges and vertices of triangles to the edges and vertices of another
triangle, maintaining the properties of triangles and planes.
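The reparameterization f(t) can be checked numerically in a short pure-Python sketch; w_r = 2 and w_R = 5 are arbitrary same-sign example weights. The endpoints stay fixed (f(0) = 0, f(1) = 1) and f is monotonic, which is exactly the "no reordering or tearing" property.

```python
def f(t, wr, wR):
    """Reparameterization produced by homogenizing the transformed segment:
    f(t) = wR * t / (wr + t * (wR - wr))."""
    return wR * t / (wr + t * (wR - wr))

ts = [i / 10 for i in range(11)]
vals = [f(t, 2.0, 5.0) for t in ts]
print(vals[0], vals[-1])  # -> 0.0 1.0  (endpoints are fixed)
print(all(vals[i] < vals[i + 1] for i in range(10)))  # -> True (no reordering)
```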
Field-of-view
To simplify window specifications, we can impose constraints where the window is centered, meaning:
l=−r, b=−t.
To ensure square pixels without image distortion, the ratio r/t should match the ratio of
horizontal to vertical pixel counts:

nx / ny = r / t
With nx and ny set, only one degree of freedom remains, often defined by the vertical field-of-view
(θ). This angle represents the vertical extent of the view and is distinct from the horizontal or
diagonal field-of-view angles. From the picture we can see that:
tan(θ/2) = t / |n|
Given n and θ, t can be calculated, allowing us to adapt
code for more generalized viewing. In some systems,
n is preset, further reducing the degrees of freedom
for configuration.
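This calculation can be sketched in a few lines of Python (the function names are mine; θ = 90° and n = -1 are example values): given the vertical field-of-view and the near plane, t follows from tan(θ/2) = t/|n|, and r then follows from the pixel-aspect constraint.

```python
import math

def top_from_fov(theta_deg, n):
    """Window top t from the vertical field-of-view theta and the near
    plane z = n (n is negative): tan(theta/2) = t / |n|."""
    return abs(n) * math.tan(math.radians(theta_deg) / 2)

def right_from_pixels(t, nx, ny):
    """Square pixels require nx / ny = r / t."""
    return t * nx / ny

t = top_from_fov(90.0, n=-1.0)      # a 90-degree fov gives t close to |n|
r = right_from_pixels(t, 640, 480)  # 4:3 image -> r close to (4/3) * t
print(round(t, 6), round(r, 6))
```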
References
Marschner, S., & Shirley, P. Fundamentals of Computer Graphics, Fourth Edition.
http://repo.darmajaya.ac.id/5422/1/Fundamentals%20of%20Computer%20Graphics%2C%20Fourth%20Edition%20%28%20PDFDrive%20%29.pdf