The vertices that represent the objects forming a scene undergo several transformations through the OpenGL pipeline until they become pixels on the screen.
These pipeline steps transform the vertex coordinates from one space into another, and each new space references the objects according to its own coordinate system. Some of these transformations are represented by matrices, as we'll see.
Each object is typically represented in its own coordinate system, known as its model space (or local space). In order to assemble a scene it's necessary to transform the vertices from their local spaces to the world space, which is common to all the objects.
So, the model transformation, represented by a matrix called Mmodel, transforms each vertex of each object from its model space to the world space. In other words, it applies the appropriate coordinate change (rotation, translation, scale) to each vertex.
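As a minimal sketch of this idea (the concrete scale, rotation and translation values below are made up for illustration), a model matrix can be composed from elementary transformations and applied to a homogeneous vertex:

```python
import numpy as np

def translation(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

def rotation_z(angle_rad):
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    M = np.eye(4)
    M[0, 0], M[0, 1] = c, -s
    M[1, 0], M[1, 1] = s, c
    return M

def scale(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

# Model matrix: scale first, then rotate, then translate (applied right to left).
M_model = translation(5, 0, 0) @ rotation_z(np.pi / 2) @ scale(2, 2, 2)

v_local = np.array([1.0, 0.0, 0.0, 1.0])   # homogeneous vertex (w = 1)
v_world = M_model @ v_local
print(v_world)   # → approximately [5, 2, 0, 1]
```

Note the order: the matrix written rightmost acts on the vertex first, which is why the translation is written leftmost here.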
The next step is to transform the vertices from the world space to the eye (or camera) space. This transformation is much like the previous one: it's a coordinate change between the world coordinate system and the new orthogonal coordinate system of the camera.
So, we need another matrix (Mview) to apply this transformation. Moreover, in this case, the matrix is the same for all scene objects.
Typically, the model and view matrices are combined into a single model-view matrix:
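The composition can be sketched as follows. This is a gluLookAt-style construction of Mview (the camera position and orientation values are assumptions chosen for the example), combined with a model matrix into one model-view matrix:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def look_at(eye, center, up):
    f = normalize(center - eye)          # forward direction
    s = normalize(np.cross(f, up))       # side (right) direction
    u = np.cross(s, f)                   # recomputed orthogonal up
    M = np.eye(4)
    M[0, :3], M[1, :3], M[2, :3] = s, u, -f
    M[:3, 3] = [-s @ eye, -u @ eye, f @ eye]
    return M

eye = np.array([0.0, 0.0, 5.0])          # camera 5 units along +z, looking at the origin
M_view = look_at(eye, np.array([0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))

M_model = np.eye(4)                      # identity model matrix, for simplicity
M_modelview = M_view @ M_model           # one matrix for the whole object

origin = np.array([0.0, 0.0, 0.0, 1.0])
print(M_modelview @ origin)   # → [0, 0, -5, 1]: 5 units in front of the camera
```

The world origin ends up at z = -5 in eye space, consistent with OpenGL's convention that the camera looks down its negative z-axis.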
In this step of the pipeline it's decided what falls into the field of view and how the objects are projected onto the screen. This is done by specifying a viewing volume (or clipping volume) and selecting a projection mode (perspective or orthographic; I focus on perspective projection from here on). Objects (or parts of them) outside the clipping volume are clipped out of the scene.
The goal of this transformation is to map the viewing volume into a canonical unit cube, which requires deforming the space and translating the cube to the origin. This means that all the vertices are squashed into a unit cube, where all the coordinates are within the range [-1,-1,-1] to [1,1,1]. Any coordinate falling outside this range will be clipped (not drawn), since it lies outside the visible area of the screen.
To achieve the above transformation two steps are required:
- Projection matrix transformation:
Once we have the vertices in eye coordinates, we use the projection matrix (Mprojection) to transform them into clip coordinates.
where Mprojection is formed this way:
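For reference, the standard OpenGL perspective frustum matrix, for a frustum with left/right bounds $l, r$, bottom/top bounds $b, t$ and near/far distances $n, f$, takes this form (its last row is what makes $w_{clip} = -z_{eye}$, as noted below):

$$
M_{projection} =
\begin{pmatrix}
\dfrac{2n}{r-l} & 0 & \dfrac{r+l}{r-l} & 0 \\[2mm]
0 & \dfrac{2n}{t-b} & \dfrac{t+b}{t-b} & 0 \\[2mm]
0 & 0 & -\dfrac{f+n}{f-n} & -\dfrac{2fn}{f-n} \\[2mm]
0 & 0 & -1 & 0
\end{pmatrix}
$$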
Frustum culling (clipping) is performed after the above transformation, in clip coordinates, just before dividing by wclip.
The OpenGL vertex shader output gl_Position contains the vertex position in homogeneous clip space. Once all the vertices have been processed by the vertex shader, triangles completely outside the clip volume are culled, and those crossing its boundary are clipped. After clipping, the vertices are normalized by the next step, the perspective division. A good explanation of frustum clipping: About the Projection Matrix, the GPU Rendering Pipeline and Clipping - Scratchapixel
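The inside/outside test in clip space reduces to comparing each component against w. A small sketch (the clip-space values are made up for illustration):

```python
# A vertex in clip coordinates is inside the clipping volume when
# -w_clip <= c <= w_clip holds for each of its x, y and z components.
def inside_clip_volume(v_clip):
    x, y, z, w = v_clip
    return all(-w <= c <= w for c in (x, y, z))

print(inside_clip_volume((2.0, -1.0, 2.0, 4.0)))   # → True
print(inside_clip_volume((5.0, 0.0, 0.0, 4.0)))    # → False (x > w)
```

Testing against w here is equivalent to testing the NDC components against [-1, 1] after the division, but it avoids dividing (and works even when w varies per vertex).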
- Perspective division (necessary because it is a perspective transformation):
The clip coordinates are still homogeneous coordinates. In order to get normalized device coordinates (NDC), they have to be divided by the w-component:
It's worth noting that until this step the vertex w-component has been equal to 1. As we can deduce from Mprojection, wclip takes a value that depends entirely on zeye:
wclip = -zeye
Therefore, the perspective division is what really deforms the space: it maps the truncated pyramid (the frustum) to a unit cube. This division is performed in hardware (by the GPU itself) because it cannot be expressed as a matrix-vector multiplication.
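The division itself is simple; here is a sketch (the clip-space vertex values are made up for illustration):

```python
import numpy as np

# Perspective division: clip coordinates -> normalized device coordinates.
v_clip = np.array([2.0, -1.0, 2.0, 4.0])   # (x, y, z, w) in clip space

v_ndc = v_clip[:3] / v_clip[3]             # divide x, y, z by w
print(v_ndc)   # → [0.5, -0.25, 0.5]
```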
A detail to consider before moving forward: it's interesting to study the relation between zeye and zndc. It is a non-linear relationship, which means there is very high precision near the near plane but very little precision near the far plane (this depends on the range [-n, -f]). A long distance between n and f can cause depth buffer precision problems.
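The non-linearity is easy to see numerically. This sketch uses the third row of the standard perspective matrix with assumed frustum values n = 0.1 and f = 100:

```python
n, f = 0.1, 100.0
A = -(f + n) / (f - n)        # z coefficient in the third row of M_projection
B = -2.0 * f * n / (f - n)    # w coefficient in the third row of M_projection

def z_ndc(z_eye):
    z_clip = A * z_eye + B
    w_clip = -z_eye            # from the last row of M_projection
    return z_clip / w_clip

print(z_ndc(-n))     # → -1.0 (near plane)
print(z_ndc(-f))     # → 1.0 (far plane)
print(z_ndc(-50.0))  # → ~0.998: half the eye-space depth range already
                     #   occupies almost the whole upper end of NDC depth
```

So nearly all of the NDC depth resolution is spent close to the near plane, which is why pushing n out as far as the scene allows helps depth precision.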
The viewport is a rectangular display area on the application window measured in screen's coordinates (pixels). In 3D graphics, a viewport is 3-dimensional to support z-ordering.
So, in the Viewport transformation the vertices (in NDC-space) have to be mapped onto a 2D screen (screen pixel coordinates) and the z-component of the vertex gets written into the OpenGL depth buffer for visibility testing.
The viewport transformation is made up of a y-axis reflection; scalings along the x, y and z axes; and a translation (of the origin from the center of the near plane of the clipping volume to the top-left corner of the 3D viewport). These are all linear transformations, so there is a linear relationship between NDC and the window coordinates:
The parameters of the transformation are provided by the OpenGL functions:
- glViewport(x, y, w, h): this function is used to set the size of the viewport on the screen. The viewport can fill the entire screen, or it can fill only a portion of the screen, and it is the area where OpenGL will render the image.
The function parameters x and y specify the lower left corner of the viewport rectangle; w and h specify the width and height of the viewport.
- glDepthRange(n, f): this function determines the range of z values in window coordinates. By default, the depth buffer range is 0 to 1, where 0.0 is closest to the viewer and 1.0 is furthest away.
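Putting those parameters together, the standard mapping from NDC to window coordinates can be sketched as below (the glViewport values 0, 0, 800, 600 are assumptions; glDepthRange is left at its defaults, and window coordinates follow OpenGL's lower-left-origin convention):

```python
def viewport_transform(ndc, x=0, y=0, w=800, h=600, dn=0.0, df=1.0):
    xn, yn, zn = ndc
    xw = (xn + 1) * (w / 2) + x            # scale [-1,1] to [x, x+w]
    yw = (yn + 1) * (h / 2) + y            # scale [-1,1] to [y, y+h]
    zw = (df - dn) / 2 * zn + (df + dn) / 2  # scale [-1,1] to [dn, df]
    return (xw, yw, zw)

print(viewport_transform((0.0, 0.0, 0.0)))     # center → (400.0, 300.0, 0.5)
print(viewport_transform((-1.0, -1.0, -1.0)))  # → (0.0, 0.0, 0.0)
```

The resulting (xw, yw) is the pixel position, and zw is the value written into the depth buffer for visibility testing.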
More web resources on this topic:
A very good explanation of homogeneous coordinates and of the difference between affine transformation and projection matrices (at the end of the article):
Projection Matrices: What You Need to Know First - Scratchapixel