Iker Hurtado's blog

Notes on OpenGL ES Graphics Pipeline

24 Mar 2015   |   iker hurtado  
In this post I gather some notes I wrote down while studying the OpenGL ES graphics pipeline. It's not intended to be an exhaustive or complete resource on the topic; I leave the mathematical theory of transformations out.

The graphics rendering pipeline transforms the description of objects in terms of vertices of primitives (triangles, points, lines, etc.) into a grid of color values (a raster image) that will be drawn on the display.

High level graphics pipeline diagram. Source: The Cg Tutorial - Chapter 1. Introduction - nVidia

Vertices transformation

The first set of transformations applied to vertices inside the GPU is a series of sequential coordinate transformations that position them in front of the viewer, with the right size for the chosen perspective.

A transformation converts a vertex V from one space (or coordinate system) to another, producing V'. It is carried out by multiplying the vector by a transformation matrix (V' = M V).
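To make the idea concrete, here is a minimal sketch (in plain Python, standing in for what the GPU does in hardware) of applying a 4x4 transformation matrix to a vertex in homogeneous coordinates:

```python
# Minimal sketch of V' = M V with a 4x4 matrix (row-major) and a
# 4-component homogeneous vertex; plain Python lists stand in for GPU math.
def mat_vec(M, v):
    """Multiply a 4x4 matrix M by a 4-component vector v."""
    return [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]

# A translation by (2, 3, 4); the vertex keeps w = 1.
T = [[1, 0, 0, 2],
     [0, 1, 0, 3],
     [0, 0, 1, 4],
     [0, 0, 0, 1]]

print(mat_vec(T, [1, 1, 1, 1]))  # [3, 4, 5, 1]
```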

Vertices coordinate transformation diagram. Source: OpenGL ES 2.x Programming Introduction - Champ Yen

I briefly describe these transformations:

Model transformation

Each object is typically represented in its own coordinate system, known as its model space (or local space). In order to assemble a scene it's necessary to transform the vertices from their local spaces to the world space, which is common to all the objects.
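As a hedged illustration (plain Python, not real GPU code), a model matrix is usually built by composing elementary transformations; here a scale followed by a translation moves a local-space vertex into world space:

```python
# Sketch: local (model) space to world space by composing a scale and a
# translation into a single model matrix (row-major, column vectors).
def mat_mul(A, B):
    """4x4 matrix product A * B."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(M, v):
    return [sum(M[r][c] * v[c] for c in range(4)) for r in range(4)]

scale2  = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
move_x5 = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Scale first, then translate: M_model = T * S (applied right to left).
M_model = mat_mul(move_x5, scale2)

local_vertex = [1, 0, 0, 1]
print(mat_vec(M_model, local_vertex))  # [7, 0, 0, 1]
```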

View transformation

The camera (viewer) position and orientation determine which objects appear in the final image and which side of them is represented. To set up this transformation OpenGL needs the camera position (EYE), the point the camera is aiming at (AT) and the approximate upward orientation (UP) of the camera (this vector forms a plane together with AT; it does not need to become the new Y axis). All of these are given in world coordinates. From these three pieces of data it is possible to calculate the new orthogonal coordinate system of the camera.

View transformation drawing. Source: 3D Graphics with OpenGL. Basic Theory - Chua Hock-Chuan
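The construction can be sketched in plain Python with the classic "lookAt" recipe (the same one helpers such as gluLookAt implement): derive the camera's orthogonal basis from EYE, AT and UP, then build the view matrix from it.

```python
import math

# Sketch of the "lookAt" view matrix: build the camera's orthogonal basis
# from EYE, AT and UP, then rotate/translate the world into camera space.
def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def look_at(eye, at, up):
    f = normalize([at[i] - eye[i] for i in range(3)])  # viewing direction
    r = normalize(cross(f, up))                        # camera +X axis
    u = cross(r, f)                                    # true camera +Y axis
    # Rotate world axes into camera axes, then translate by -eye.
    return [[ r[0],  r[1],  r[2], -dot(r, eye)],
            [ u[0],  u[1],  u[2], -dot(u, eye)],
            [-f[0], -f[1], -f[2],  dot(f, eye)],
            [  0.0,   0.0,   0.0,          1.0]]

V = look_at(eye=[0, 0, 5], at=[0, 0, 0], up=[0, 1, 0])
# A point at the world origin ends up 5 units in front of the camera (-Z).
origin = [0.0, 0.0, 0.0, 1.0]
p = [sum(V[r][c] * origin[c] for c in range(4)) for r in range(4)]
print(p)  # [0.0, 0.0, -5.0, 1.0]
```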

Projection transformation

Once the camera is positioned and oriented, it is time to decide what goes into the field of view and how the objects are projected onto the screen. This is done by specifying a viewing volume (or clipping volume) and selecting a projection mode (perspective or orthographic; from here on I focus on perspective projection). Objects (or parts of them) outside the clipping volume are clipped out of the scene.

Perspective or orthographic projections drawing. Source: Modern OpenGL tutorial (python)- Nicolas P. Rougier

The projection transformation turns the viewing volume into a cube-shaped clipping volume. It's worth noting that this geometric operation 'deforms' space by transforming the frustum (truncated pyramid) into a cube.
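As an illustration, the Python sketch below builds a symmetric perspective matrix of the same form glFrustum produces (near plane at z = -n, far plane at z = -f, half-extents r and t); the matrix layout shown is one common convention, not the only one. Corners of the frustum land on the canonical cube after the perspective divide:

```python
# Symmetric perspective projection matrix (glFrustum-style), mapping the
# view frustum into the canonical [-1, 1] cube after the divide by w.
def perspective(n, f, r, t):
    return [[n / r, 0.0,   0.0,                 0.0],
            [0.0,   n / t, 0.0,                 0.0],
            [0.0,   0.0,  -(f + n) / (f - n),  -2 * f * n / (f - n)],
            [0.0,   0.0,  -1.0,                 0.0]]

def project(P, v):
    clip = [sum(P[r][c] * v[c] for c in range(4)) for r in range(4)]
    w = clip[3]
    return [x / w for x in clip[:3]]  # perspective divide -> NDC

P = perspective(n=1.0, f=10.0, r=1.0, t=1.0)
print(project(P, [1.0, 1.0, -1.0, 1.0]))   # near-plane corner -> approx [1, 1, -1]
print(project(P, [0.0, 0.0, -10.0, 1.0]))  # far-plane center  -> approx [0, 0, 1]
```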

Like the previous transformations, this vertex-processing stage transforms individual vertices. The relationships between vertices (such as primitives) are not considered in this stage.

The projection matrix remaps the viewing frustum into a unit cube or canonical view volume. Source: OpenGL Projection Matrix - songho.ca

Vertices that do not contribute to the final image are discarded to improve performance. This is known as view-frustum culling and it is done automatically by the GPU. If an object partially overlaps the view frustum, it will be clipped in a later stage.

Each vertex inside the frustum is transformed, positioned and normalized in the clipping-volume cube space, together with its vertex normal. The x and y coordinates (in the range -1 to +1) represent its position on the screen, and the z value (also in the range -1 to +1) represents its depth, which will be stored in the Z-buffer (usually remapped to the range 0 to 1).

Vertex shaders

In hardware that supports OpenGL ES (2.0 or higher), all the transformations explained above are implemented programmatically in vertex shaders.

Vertex shaders make it possible to apply transformations and deformations at the vertex level. They know nothing about the connections between vertices (the geometry), so a vertex shader only works with per-vertex information: position, normal vector, color and texture coordinates.

Primitive Assembly

Each vertex transformed by the vertex shader includes its position (considered in the clip coordinate space): the (x, y, z, w) value of the vertex. The primitive type and vertex indices passed in by the application determine the individual primitives that will be rendered in this stage.


The clipping stage clips each primitive to the clip volume. This operation is performed automatically by the GPU hardware.

In the most complicated case, triangle primitives, each triangle is clipped against the viewing volume by generating appropriate triangles whose vertices lie on the boundary of the clipping volume. This may generate more than one triangle.

When primitives are clipped, new per-vertex outputs must be generated for them. These are generated via linear interpolation (in clip-space) of the output values. Flat-shaded outputs don't get this treatment.
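The clipping-plus-interpolation step can be sketched with the Sutherland-Hodgman algorithm. The simplified Python version below clips in 2D against a single plane (x <= 1) and linearly interpolates one scalar per-vertex attribute for each new vertex (the real stage works in clip space against all six planes):

```python
# Sutherland-Hodgman sketch: clip a polygon against one plane (x <= x_max),
# linearly interpolating a per-vertex attribute for each new vertex.
def clip_against_x_max(poly, x_max=1.0):
    """poly: list of (x, y, attr) tuples. Returns the clipped polygon."""
    out = []
    for i in range(len(poly)):
        cur, nxt = poly[i], poly[(i + 1) % len(poly)]
        cur_in, nxt_in = cur[0] <= x_max, nxt[0] <= x_max
        if cur_in:
            out.append(cur)
        if cur_in != nxt_in:  # edge crosses the plane: emit the intersection
            t = (x_max - cur[0]) / (nxt[0] - cur[0])
            out.append(tuple(c + t * (n - c) for c, n in zip(cur, nxt)))
    return out

# A triangle poking out of the +x boundary gains a vertex (becomes a quad).
tri = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (0.0, 2.0, 0.5)]
print(clip_against_x_max(tri))
```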

Viewport transformation

Before the rasterization itself, one last vertex transformation is necessary: from clipping-volume coordinates to the display area (viewport) reserved for the application.

The viewport is a rectangular display area on the application window, measured in screen coordinates (pixels). In 3D graphics, a viewport is three-dimensional to support z-ordering.

The viewport transformation is made up of a y-axis reflection; scalings of the x, y and z axes; and a translation (of the origin from the center of the near plane of the clipping volume to the top-left corner of the 3D viewport).
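A hedged sketch of that mapping in Python, assuming (as the text does) a top-left window origin, hence the y-axis reflection:

```python
# Viewport transformation sketch: NDC (x, y in [-1, 1], +y up) to window
# pixels (origin top-left, +y down), plus depth remapped to [0, 1].
def viewport_transform(ndc, vp_x, vp_y, vp_w, vp_h):
    x, y, z = ndc
    sx = vp_x + (x + 1.0) * 0.5 * vp_w   # scale + translate x
    sy = vp_y + (1.0 - y) * 0.5 * vp_h   # y-axis reflection
    depth = (z + 1.0) * 0.5              # [-1, 1] -> [0, 1] for the Z-buffer
    return sx, sy, depth

# The center of NDC maps to the center of a 640x480 viewport.
print(viewport_transform((0.0, 0.0, 0.0), 0, 0, 640, 480))  # (320.0, 240.0, 0.5)
```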

If the aspect ratios of the viewport and the projection plane are not the same, shapes will be distorted. Hence, it is advisable to use the same aspect ratio for both. Programmatically, this means that the glViewport() call and the re-configuration of the projection plane's aspect ratio should go together in the window-resize handler.

Geometry transformation and shaders

Some advanced graphics hardware and APIs (though not OpenGL ES, for now) support geometry (primitive) transformation through a programmable stage after primitive assembly.

This is made possible by geometry shaders: they allow the creation and destruction of geometric primitives (vertices, lines and triangles) on the GPU at runtime.

For each input primitive, the geometry shader returns zero or more output primitives. The input and output primitives need not be of the same type. Output primitives are emitted in the same order as the input primitives arrive.

Graphics pipeline with geometry transformation stage. Source: The OpenGL - DirectX graphics pipeline - Wikimedia Commons

Raster stage

After the vertices have been transformed and the primitives have been clipped, the rasterization pipeline takes individual primitives and generates the appropriate fragments for them, aligned with the pixel grid. Fragments are a kind of 3D pixel, identified by an integer (x, y) location in screen space; their attributes (position, color, normal, texture coordinates) are interpolated from the vertices.

From vertices to fragments diagram. Source: OpenGL ES 2.x Programming Introduction - Champ Yen


Before triangles are rasterized, it is necessary to determine whether they are front-facing (facing the viewer) or back-facing. This can be decided from the normal vector and the vector connecting the surface to the camera. The culling operation discards triangles that face away from the viewer.

This optimization step is configurable and disabled by default. It should not be enabled if the object is transparent and alpha blending is active.
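In practice, GPUs decide facing from the triangle's winding order in screen space, which is equivalent to the normal test described above. A Python sketch:

```python
# Back-face culling sketch: the sign of the triangle's 2D signed area in
# screen space (its winding order) distinguishes front from back faces.
def signed_area(a, b, c):
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) -
                  (c[0] - a[0]) * (b[1] - a[1]))

def is_front_facing(a, b, c, front_is_ccw=True):
    area = signed_area(a, b, c)
    return area > 0 if front_is_ccw else area < 0

print(is_front_facing((0, 0), (1, 0), (0, 1)))  # True  (counter-clockwise)
print(is_front_facing((0, 0), (0, 1), (1, 0)))  # False (clockwise)
```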

Rasterization and interpolation

There are two sub-stages in the non-programmable part of the rasterization and interpolation process (from here on I refer to triangles, but the process is analogous for other primitives):

Triangle Setup: the 2D coordinates defining the contours of each triangle are calculated. This information is used in the following stage (and in interpolation) and is typically implemented directly in dedicated hardware.

Triangle Traversal or Scan Conversion: all fragments belonging to the triangle are generated. The fragments are calculated by interpolating the information from the three vertices defined in the Triangle Setup stage.

At this time, each fragment has a position (aligned to the pixel-grid), depth, color, normal and texture coordinates.
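The interpolated per-fragment values can be sketched with barycentric coordinates (plain Python; real hardware uses equivalent incremental formulations, and perspective-correct interpolation adds a division by w omitted here):

```python
# Barycentric interpolation sketch: each fragment's attribute value is a
# weighted mix of the three vertex values, weights given by its position.
def barycentric(p, a, b, c):
    denom = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / denom
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / denom
    return w0, w1, 1.0 - w0 - w1

def interpolate(p, verts, attrs):
    w = barycentric(p, *verts)
    return sum(wi * ai for wi, ai in zip(w, attrs))

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
depths = [0.0, 1.0, 1.0]
# A fragment at the centroid blends the three vertex depths equally (2/3).
print(interpolate((4 / 3, 4 / 3), tri, depths))
```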

Fragment processing and shaders

Once we have a set of fragments for each primitive, in this stage we can process the fragment data programmatically (with fragment shaders) however we want.

This process is usually dedicated to texture application and pixel-level lighting.

It's worth noting that, at this point, the fragments keep their depth.

The per-fragment operations stage performs the following functions (and tests) on each fragment:

Pixel ownership test: This test determines if one pixel in the framebuffer is currently owned by OpenGL ES. This test allows the window system to control which pixels in the framebuffer belong to the current OpenGL ES context.

Scissor test: The scissor test determines if a fragment lies within the scissor rectangle defined as part of the OpenGL ES state. If the fragment is outside the scissor region, the fragment is discarded.

Stencil and depth tests: These perform tests on the stencil and depth value of the incoming fragment to determine if the fragment should be rejected or not.
The z-buffer (or depth buffer) is usually used to resolve fragment visibility, in other words, to remove hidden surfaces. It works like this: the z-buffer is initialized to 1 (farthest) and the color buffer to the background color. For each fragment processed (of each primitive), its z-value is checked against the value stored in the buffer. If the fragment's z-value is smaller, its color and z-value are copied into the buffers. Otherwise, the fragment is occluded by another object and discarded.
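The algorithm just described can be sketched in a few lines of Python (using a single-pixel "framebuffer" for brevity):

```python
# Depth-test sketch on a 1-pixel framebuffer: the Z-buffer starts at 1.0
# (farthest) and a fragment only wins if it is closer than what is stored.
z_buffer = [1.0]
color_buffer = ["background"]

def depth_test(pixel, frag_z, frag_color):
    if frag_z < z_buffer[pixel]:        # closer than the stored depth
        z_buffer[pixel] = frag_z
        color_buffer[pixel] = frag_color
    # else: the fragment is occluded and discarded

depth_test(0, 0.8, "red")    # visible: 0.8 < 1.0
depth_test(0, 0.9, "green")  # occluded: 0.9 > 0.8, discarded
depth_test(0, 0.3, "blue")   # closer still: overwrites red
print(color_buffer[0], z_buffer[0])  # blue 0.3
```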

Blending: Blending combines the newly generated fragment color value with the color values stored in the framebuffer at the same location.

Hidden-surface removal works only if the front object is totally opaque; but a fragment is not necessarily opaque: it may carry an alpha value specifying its degree of transparency.

The alpha is typically normalized to the range of 0 (totally transparent) to 1 (opaque). If the fragment is not totally opaque, then part of its background object could show through, which is known as alpha blending.

In alpha blending, the order in which fragments are processed matters: they must be sorted back-to-front, with the largest z-value processed first.

Alpha-blending and hidden-surface removal are mutually exclusive.
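The back-to-front compositing described above corresponds to standard "source over" blending (in OpenGL ES terms, the common glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) setup); a Python sketch:

```python
# "Source over" alpha blending sketch: the new fragment's color is mixed
# with the framebuffer color according to the fragment's alpha.
def blend(src_rgb, src_alpha, dst_rgb):
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

framebuffer = (0.0, 0.0, 0.0)                            # black background
framebuffer = blend((1.0, 0.0, 0.0), 1.0, framebuffer)   # opaque red, far
framebuffer = blend((0.0, 0.0, 1.0), 0.5, framebuffer)   # 50% blue, near
print(framebuffer)  # (0.5, 0.0, 0.5)
```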

Dithering: Dithering can be used to minimize the artifacts that can occur from using limited precision to store color values in the frame-buffer.

At the end of the per-fragment stage, either the fragment is rejected or its color, depth, or stencil value is written to the framebuffer at that location. Whether the color, depth, and stencil values are actually written depends on whether the corresponding write masks are enabled, which gives finer control over the values written into each buffer.

The main sources of information I used for this study:

- 3D Graphics with OpenGL - The Basic Theory - Chua Hock-Chuan

- Desarrollo de videojuegos: un enfoque práctico – 2014 (Bloque II: Programación gráfica, page 267)