
# OpenGL vertex transformations

19 Jun 2015   |   iker hurtado
This post is an overview of the OpenGL vertex transformations, more or less valid for the current OpenGL flavors: OpenGL, OpenGL ES and WebGL. It doesn't cover the mathematical derivation of the matrices but links to web pages that do that properly.

The vertices that represent the objects forming a scene undergo several transformations through the OpenGL pipeline until they become pixels on the screen.

These pipeline steps transform the vertex coordinates from one space into another, and each new space describes the objects in terms of its own coordinate system. Some of these transformations are represented by matrices, as we'll see.

### Model transformation

Each object is typically represented in its own coordinate system, known as its model space (or local space). In order to assemble a scene it's necessary to transform the vertices from their local spaces to the world space, which is common to all the objects.

So, the model transformation, represented by a matrix called Mmodel, transforms each vertex of each object from its model space to the world space. In other words, it applies the appropriate coordinate change (rotation, translation, scale) to each vertex.
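
As an illustration, here is a minimal pure-Python sketch of a model matrix built from a scale, a rotation and a translation. The helper names (`mat_mul`, `transform`, `translation`, `scale`, `rotation_y`) and the sample values are assumptions made for this example, not part of any OpenGL API; matrices are row-major nested lists applied to column vectors.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply a 4x4 matrix to a column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scale(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

def rotation_y(angle):
    """Rotation about the Y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

# With column vectors the rightmost matrix is applied first:
# scale, then rotate, then translate (M_model = T * R * S).
m_model = mat_mul(translation(5, 0, 0),
                  mat_mul(rotation_y(math.pi / 2), scale(2, 2, 2)))

v_model = [1, 0, 0, 1]                  # a vertex in model space (w = 1)
v_world = transform(m_model, v_model)   # the same vertex in world space
print(v_world)
```

The vertex is first scaled to (2, 0, 0), then rotated a quarter turn about Y onto the -Z axis, and finally moved 5 units along X, so it ends up at approximately (5, 0, -2) in world space.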

### View transformation

The next step is to transform the vertices from the world space to the eye (or camera) space. This transformation is much like the previous one: it's a coordinate change between the world coordinate system and the new orthogonal coordinate system of the camera.

So, we need another matrix (Mview) to apply this transformation. In this case, moreover, the matrix is the same for all the objects in the scene.

Typically, the model and view matrices are combined into a single model-view matrix:

$M_{modelview}=M_{view}\cdot M_{model}$

It's important to note that taking the camera as the coordinate system (located at (0, 0, 0) and facing down the -Z axis) means that if we want to move the camera we must move the world in the opposite direction. For example, moving a real camera to the left is the same as moving the world to the right.
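
The "move the world in the opposite direction" idea can be sketched as follows. This is a simplified example that assumes a camera that is only translated (no rotation); the helper names and the positions are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply a 4x4 matrix to a column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

# Assumed camera position in world space; the camera is not rotated.
camera_pos = (3, 0, 0)

# The view matrix moves the whole world by the opposite offset.
m_view = translation(-camera_pos[0], -camera_pos[1], -camera_pos[2])

# An object whose origin sits 5 units in front of the camera.
m_model = translation(3, 0, -5)

# Model first, then view: M_modelview = M_view * M_model.
m_modelview = mat_mul(m_view, m_model)

v_eye = transform(m_modelview, [0, 0, 0, 1])
print(v_eye)
```

In eye space the object's origin lands on the -Z axis, 5 units away from the camera, which is what we expect for something straight ahead of it. A camera that is also rotated would need the inverse of its full placement matrix, not just a negated translation.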

### Projection transformation

In this step of the pipeline it is decided what falls within the field of view and how the objects are projected onto the screen. This is done by specifying a viewing volume (or clipping volume) and selecting a projection mode, perspective or orthographic; from here on I focus on perspective projection. Objects (or parts of them) outside the clipping volume are clipped out of the scene.

A very good explanation of the Orthographic Projection: OpenGL ES 2.0 orthographic projection - The Code Crate

The goal of this transformation is to map the viewing volume onto a canonical cube, which requires deforming the space and translating the volume to the origin. After it, all the visible vertices have been squashed into a cube centered at the origin, with coordinates ranging from (-1, -1, -1) to (1, 1, 1). If any coordinate falls outside this range, it will be ‘clipped’ (not drawn), since it’ll be outside the visible area of the screen.

The projection matrix remaps the viewing frustum into a unit cube or canonical view volume.

To achieve the above transformation two steps are required:

• Projection matrix transformation:

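Once we have the vertices in eye coordinates, we use the projection matrix (Mprojection) to transform them into clip coordinates:

$\left(\begin{array}{c}x_{clip}\\ y_{clip}\\ z_{clip}\\ w_{clip}\end{array}\right)=M_{projection}\cdot \left(\begin{array}{c}x_{eye}\\ y_{eye}\\ z_{eye}\\ w_{eye}\end{array}\right)$

where Mprojection, for a symmetric viewing frustum with near and far plane distances n and f and a near plane of half-width r and half-height t, is formed this way: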

$\left(\begin{array}{cccc}\frac{n}{r}& 0& 0& 0\\ 0& \frac{n}{t}& 0& 0\\ 0& 0& \frac{-\left(f+n\right)}{f-n}& \frac{-2fn}{f-n}\\ 0& 0& -1& 0\end{array}\right)$

The frustum culling (clipping) is performed after the above transformation, in the clip coordinates, just before dividing by wclip.

The vertex shader's gl_Position output contains the vertex position in homogeneous clip space. Once the vertices have been processed by the vertex shader and assembled into primitives, triangles lying entirely outside the clip volume are discarded and those that cross its boundary are clipped. After clipping, the vertices are normalized by the next step, the perspective division.
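
A small sketch of this first step, using the symmetric-frustum matrix shown above. The function names and the sample frustum (n = 1, f = 10, r = t = 1) are assumptions chosen for illustration; the inside test is the usual clip-space condition -w <= x, y, z <= w for a single point, whereas real clipping works on whole primitives.

```python
def perspective(n, f, r, t):
    """Symmetric perspective projection matrix (row-major nested lists)."""
    return [
        [n / r, 0, 0, 0],
        [0, n / t, 0, 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]

def transform(m, v):
    """Apply a 4x4 matrix to a column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def inside_clip_volume(clip):
    """True when a clip-space point satisfies -w <= x, y, z <= w."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))

m_proj = perspective(n=1.0, f=10.0, r=1.0, t=1.0)

v_eye = [0.5, 0.5, -2.0, 1.0]           # 2 units in front of the camera
v_clip = transform(m_proj, v_eye)

too_far = transform(m_proj, [0.0, 0.0, -20.0, 1.0])  # beyond the far plane

print(v_clip, inside_clip_volume(v_clip))
print(too_far, inside_clip_volume(too_far))
```

Note that the test happens before any division: comparing each coordinate against w is equivalent to checking the [-1, 1] range after the perspective division, but it avoids dividing by a w that may be zero or negative.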

• Perspective division (necessary because it is a perspective transformation):

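The clip coordinates are still homogeneous coordinates. In order to get the normalized device coordinates (NDC) they have to be divided by the w-component:

$\left(\begin{array}{c}x_{ndc}\\ y_{ndc}\\ z_{ndc}\end{array}\right)=\left(\begin{array}{c}x_{clip}/w_{clip}\\ y_{clip}/w_{clip}\\ z_{clip}/w_{clip}\end{array}\right)$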

It's worth noting that until this step the vertex w-component has been equal to 1. As we can deduce from the bottom row of Mprojection, wclip takes a value that depends only on zeye:

```
wclip = -zeye
```

Therefore, the perspective division is what really deforms the space: it is where the truncated-pyramid frustum is mapped to the canonical cube.

This division is performed in hardware (by the GPU itself), after the vertex shader, because dividing by a component of the vector is not a linear operation and so cannot be expressed as a matrix-vector multiplication.
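
To see the deformation in numbers, the sketch below projects one corner of the near plane and one corner of the far plane and then divides by w. It reuses the symmetric-frustum matrix from above with assumed values n = 1, f = 10, r = t = 1; the function names are hypothetical.

```python
def perspective(n, f, r, t):
    """Symmetric perspective projection matrix (row-major nested lists)."""
    return [
        [n / r, 0, 0, 0],
        [0, n / t, 0, 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]

def transform(m, v):
    """Apply a 4x4 matrix to a column vector (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def perspective_divide(clip):
    """Turn homogeneous clip coordinates into 3D NDC."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]

m_proj = perspective(n=1.0, f=10.0, r=1.0, t=1.0)

# Top-right corner of the near plane: x = r, y = t, z = -n.
near_corner = perspective_divide(transform(m_proj, [1.0, 1.0, -1.0, 1.0]))

# Top-right corner of the far plane: the frustum is f/n = 10 times wider there.
far_corner = perspective_divide(transform(m_proj, [10.0, 10.0, -10.0, 1.0]))

print(near_corner)
print(far_corner)
```

Up to floating-point rounding, the near corner maps to (1, 1, -1) and the far corner to (1, 1, 1): two points that are ten times further apart in eye space end up on the same edge of the cube, which is exactly the "squashing" of the pyramid described above.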

A detail to consider before moving forward: it's interesting to study the relation between zeye and zndc. It is a non-linear relationship, which means there is very high precision close to the near plane but very little precision close to the far plane (how severe this is depends on the range [-n, -f]). A long distance between n and f, and in particular a large f/n ratio, can cause depth buffer precision problems.

Comparison of depth buffer precisions.
This topic is very well explained here: OpenGL Projection Matrix - songho.ca
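
The relation can be written out by dividing the third row of Mprojection by wclip = -zeye, which gives zndc = (f + n)/(f - n) + 2fn/((f - n) zeye). The following sketch tabulates it for an assumed frustum with n = 1 and f = 100; the function name `z_ndc` is made up for this example.

```python
def z_ndc(z_eye, n, f):
    """NDC depth of an eye-space z (negative in front of the camera)."""
    return (f + n) / (f - n) + (2 * f * n) / ((f - n) * z_eye)

n, f = 1.0, 100.0

# Distances from the camera, from the near plane to the far plane.
for distance in (1.0, 2.0, 10.0, 50.5, 100.0):
    print(f"distance {distance:6.1f} -> z_ndc {z_ndc(-distance, n, f):+.4f}")

# Eye-space distance at which z_ndc crosses zero (the middle of the NDC range).
half_range_distance = 2 * f * n / (f + n)
print(half_range_distance)
```

With these values, half of the NDC depth range is already used up within the first two units in front of the camera, while the point halfway between the planes (distance 50.5) maps to a zndc of roughly 0.98. Everything in the far half of the frustum has to share the last couple of percent of the range.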

### Viewport transformation

The viewport is a rectangular display area of the application window, measured in screen coordinates (pixels). In 3D graphics, a viewport is 3-dimensional to support z-ordering.

So, in the Viewport transformation the vertices (in NDC-space) have to be mapped onto a 2D screen (screen pixel coordinates) and the z-component of the vertex gets written into the OpenGL depth buffer for visibility testing.

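The viewport transformation is made up of scalings of the x, y and z axes and a translation (of the origin from the center of the NDC cube to the lower-left corner of the viewport). OpenGL window coordinates have their origin at the lower-left corner with y pointing up, so no y-axis reflection is involved; a reflection only appears when converting to a windowing system whose origin is at the top-left. Together these steps give a linear relationship between the NDC and the window coordinates:

$\left(\begin{array}{c}x_{w}\\ y_{w}\\ z_{w}\end{array}\right)=\left(\begin{array}{c}\frac{w}{2}x_{ndc}+\left(x+\frac{w}{2}\right)\\ \frac{h}{2}y_{ndc}+\left(y+\frac{h}{2}\right)\\ \frac{f-n}{2}z_{ndc}+\frac{f+n}{2}\end{array}\right)$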

The parameters of the transformation are provided by the OpenGL functions:

• glViewport(x, y, w, h): this function is used to set the size of the viewport on the screen. The viewport can fill the entire screen, or it can fill only a portion of the screen, and it is the area where OpenGL will render the image.
The function parameters x and y specify the lower left corner of the viewport rectangle; w and h specify the width and height of the viewport.
• glDepthRange(n, f): this function sets the range of z values in window coordinates (here n and f are the depth range limits, not the near and far planes of the frustum). By default the depth range is 0 to 1, where 0.0 is the closest to the viewer and 1.0 the farthest away.

If the aspect ratios of the viewport and the projection plane are not the same, the shapes will be distorted. Hence, it is advisable to use the same aspect ratio for both. Programmatically, this means that the window resize handler should both call glViewport() and reconfigure the aspect ratio of the projection plane.
This topic is very well explained here: OpenGL viewport transformation matrix - The Code Crate
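
As a rough sketch, the mapping from NDC to window coordinates can be written with the glViewport and glDepthRange parameters as inputs. It assumes OpenGL's convention of a window origin at the lower-left corner (matching the x and y parameters of glViewport); the Python function names are hypothetical and only mimic what the GPU does after the perspective division.

```python
def viewport_transform(ndc, x, y, w, h, n=0.0, f=1.0):
    """Map NDC in [-1, 1] to window coordinates.

    x, y, w, h mirror the glViewport parameters; n and f mirror the
    glDepthRange parameters (defaults 0 and 1).
    """
    x_ndc, y_ndc, z_ndc = ndc
    x_win = (w / 2) * x_ndc + (x + w / 2)
    y_win = (h / 2) * y_ndc + (y + h / 2)
    z_win = ((f - n) / 2) * z_ndc + (f + n) / 2
    return [x_win, y_win, z_win]

def frustum_half_width(t, width, height):
    """Near-plane half-width r that matches the viewport's aspect ratio."""
    return t * width / height

# An 800x600 viewport covering the whole window.
center = viewport_transform([0.0, 0.0, 0.0], 0, 0, 800, 600)
lower_left_near = viewport_transform([-1.0, -1.0, -1.0], 0, 0, 800, 600)
upper_right_far = viewport_transform([1.0, 1.0, 1.0], 0, 0, 800, 600)

print(center, lower_left_near, upper_right_far)

# In a resize handler, recompute r so the projection keeps the same aspect ratio.
r = frustum_half_width(t=1.0, width=800, height=600)
print(r)
```

The center of the NDC cube lands in the middle of the viewport with a depth of 0.5, and the two opposite corners of the cube map to the viewport's lower-left corner at depth 0 and its upper-right corner at depth 1. The second helper illustrates the resize advice: with a half-height t of 1, an 800x600 viewport calls for a half-width of 4/3.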

More web resources on this topic:

OpenGL ES 2.0 matrix transformations - The Code Crate

OpenGL Transformation - songho.ca

Mathematics of Computing the 2D Coordinates of a 3D Point - Scratchapixel

The OpenGL Perspective Projection Matrix - Scratchapixel

Very good explanation on homogeneous coordinates and on difference between affine transformation and projection matrices (at the end of the article):
Projection Matrices: What You Need to Know First - Scratchapixel