The graphics rendering pipeline transforms the description of objects in terms of vertices of primitives (triangles, points, lines, etc.) into a grid of color values (a raster image) that will be drawn on the display.
The first set of transformations applied to vertices inside the GPU is a series of sequential coordinate transformations that position them in front of the viewer and at the right size for the chosen perspective.
A transformation converts a vertex V from one space (or coordinate system) to another space, yielding V'. This transformation is carried out by multiplying the vector by a transformation matrix (V' = M V).
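As a concrete illustration, V' = M V can be sketched in a few lines of Python. This is an illustrative sketch only (plain lists instead of a math library; 4x4 row-major matrices and homogeneous 4-component vertices are assumed):

```python
# Minimal sketch of V' = M V: a 4x4 matrix applied to a homogeneous vertex.

def mat_vec(m, v):
    """Multiply a 4x4 matrix (row-major, list of rows) by a 4-vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

# Example: a translation by (2, 3, 4) applied to the vertex (1, 1, 1, 1)
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
print(mat_vec(translate, [1, 1, 1, 1]))  # [3, 4, 5, 1]
```

The homogeneous w component (here 1) is what allows translations, and later the perspective projection, to be expressed as matrix multiplications.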
I briefly describe these transformations:
Each object is typically represented in its own coordinate system, known as its model space (or local space). In order to assemble a scene, it is necessary to transform the vertices from their local spaces to the world space, which is common to all objects.
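A minimal sketch of how a model (local-to-world) matrix might be composed, assuming row-major 4x4 matrices, column vertices, and a right-handed coordinate system (the helper names are illustrative, not part of any OpenGL API):

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 row-major matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component vertex."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

# Model matrix: the rightmost factor is applied first, so this scales,
# then rotates 90 degrees about Y, then translates into world space.
model = mat_mul(translation(5, 0, 0),
                mat_mul(rotation_y(math.pi / 2), scaling(2, 2, 2)))
```

Because the vertex is multiplied on the right, the order of the factors matters: T * R * S scales the object in its local space before placing it in the world.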
The camera (viewer) position and orientation determine which objects appear in the final image and which side of them is represented. To set up this transformation, OpenGL needs the camera position (EYE), the point the camera is aiming at (AT), and the rough upward orientation (UP) of the camera (this vector is only used to form a plane together with the viewing direction; it does not need to be the new Y axis). All of these data are given in world coordinates. From these three pieces of data it is possible to calculate the new orthogonal coordinate system of the camera.
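A sketch of how the camera basis and view matrix can be built from EYE, AT, and UP, following the gluLookAt convention (right-handed, camera looking down its negative z axis); illustrative code, not an OpenGL API:

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def look_at(eye, at, up):
    """Camera basis a la gluLookAt: forward = AT - EYE,
    right = forward x UP, true up = right x forward.
    UP only needs to be roughly upward; it is re-orthogonalized here."""
    f = normalize([at[i] - eye[i] for i in range(3)])
    r = normalize(cross(f, up))
    u = cross(r, f)
    # View matrix: rotate world into the camera basis, then translate by -EYE.
    return [
        [ r[0],  r[1],  r[2], -dot(r, eye)],
        [ u[0],  u[1],  u[2], -dot(u, eye)],
        [-f[0], -f[1], -f[2],  dot(f, eye)],
        [0, 0, 0, 1],
    ]
```

For a camera at (0, 0, 5) looking at the origin, this matrix maps the origin to (0, 0, -5) in eye space: five units in front of the camera, along its negative z axis.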
Once the camera is positioned and oriented, it is time to decide what goes into the field of view and how the objects are projected onto the screen. This is done by specifying a viewing volume or clipping volume and selecting a projection mode (perspective or orthographic; from here on I focus on perspective projection). Objects (or parts of them) outside the clipping volume are clipped out of the scene.
The projection transformation turns the viewing volume into a cubic clipping volume. It is worth noting that this geometric operation performs a spatial 'deformation' by transforming the frustum (a truncated pyramid) into a cube.
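A sketch of the perspective projection matrix in the classic OpenGL (gluPerspective-style) convention, which performs exactly this frustum-to-cube deformation; the function name and signature are illustrative:

```python
import math

def perspective(fov_y_deg, aspect, near, far):
    """Perspective projection matrix (gluPerspective convention):
    maps the view frustum between the near and far planes into
    the clip cube, with z = -near -> -1 and z = -far -> +1."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0, 0, -1, 0],  # w_clip = -z_eye, used later in the perspective divide
    ]
```

The last row is the key to the 'deformation': it copies the eye-space depth into the w component, so that the later division by w shrinks distant objects.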
Like the previous transformations, this vertex-processing stage transforms individual vertices. The relationships between vertices (such as primitives) are not considered in this stage.
Vertices that do not contribute to the final image are discarded to improve performance. This is known as view-frustum culling, and it is done automatically by the GPU. If an object partially overlaps the view frustum, it will be clipped in a later stage.
Each vertex inside the frustum is transformed, positioned, and normalized in the clipping-volume cube space, together with its vertex normal. The x and y coordinates (in the range -1 to +1) represent its position on the screen, and the z value (in the range -1 to +1) represents its depth, which will be stored in the Z-buffer (usually remapped to the range 0 to 1).
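The normalization step and the depth remap can be sketched as follows (an illustrative version of what the GPU does between clip coordinates and the Z-buffer):

```python
def clip_to_screen_depth(x_clip, y_clip, z_clip, w_clip):
    """Perspective divide from clip coordinates to normalized device
    coordinates (each in [-1, +1]), then remap z to [0, 1] as
    typically stored in the Z-buffer."""
    x_ndc = x_clip / w_clip
    y_ndc = y_clip / w_clip
    z_ndc = z_clip / w_clip
    depth = z_ndc * 0.5 + 0.5
    return x_ndc, y_ndc, depth
```

A vertex on the near plane (z_ndc = -1) ends up with depth 0, and one on the far plane (z_ndc = +1) with depth 1.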
On hardware that supports OpenGL ES (2.0 or higher), all the transformations explained above are implemented programmatically in vertex shaders.
Vertex shaders allow transformations and deformations to be applied at the vertex level; they know nothing about connections among vertices (geometry). So the vertex shader only works with information related to the vertices themselves (position, normal vector, color, and texture coordinates).
Each vertex transformed by the vertex shader includes its position (expressed in clip coordinate space): the (x, y, z, w) value of the vertex. The primitive type and vertex indices passed in by the application determine the individual primitives that will be rendered in this stage.
The clipping stage clips each primitive to the clip volume. This operation is performed automatically by the GPU hardware.
In the most complicated case, triangle primitives, they are clipped against the viewing volume by generating appropriate triangles whose vertices lie on the boundary of the clipping volume. This may generate more than one triangle.
When primitives are clipped, new per-vertex outputs must be generated for the newly created vertices. These are generated via linear interpolation (in clip space) of the output values. Flat-shaded outputs do not get this treatment.
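Clipping against each face of the clip cube can be sketched with one step of the classic Sutherland-Hodgman algorithm (the GPU does this in fixed-function hardware; the helper names here are illustrative). Note how clipping a triangle can produce a quad, i.e. more than one output triangle, and how the boundary vertex is produced by interpolation:

```python
def clip_against_plane(vertices, inside, intersect):
    """One Sutherland-Hodgman step: clip a convex polygon (a list of
    vertices) against a single plane. `inside(v)` tests a vertex;
    `intersect(a, b)` returns the interpolated vertex on the boundary."""
    out = []
    for i, cur in enumerate(vertices):
        prev = vertices[i - 1]  # wraps around to the last vertex for i == 0
        if inside(cur):
            if not inside(prev):
                out.append(intersect(prev, cur))  # entering the half-space
            out.append(cur)
        elif inside(prev):
            out.append(intersect(prev, cur))      # leaving the half-space
    return out

# Example: clip the triangle (0,0) (2,0) (0,2) against the plane x <= 1
tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
def inside(v): return v[0] <= 1.0
def isect(a, b):
    t = (1.0 - a[0]) / (b[0] - a[0])   # parametric distance to the plane
    return (1.0, a[1] + t * (b[1] - a[1]))  # linearly interpolated vertex
quad = clip_against_plane(tri, inside, isect)
print(quad)  # [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
```

The same interpolation parameter t is used to generate any other per-vertex outputs (color, texture coordinates) of the new boundary vertices.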
Before rasterization itself, it is necessary to apply one last vertex transformation: from clipping-volume coordinates to the display area (viewport) reserved for the application.
The viewport is a rectangular display area on the application window, measured in screen coordinates (pixels). In 3D graphics, a viewport is three-dimensional in order to support z-ordering.
The viewport transformation is made up of a y-axis reflection; scalings along the x, y, and z axes; and a translation (of the origin from the center of the near plane of the clipping volume to the top-left corner of the 3D viewport).
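These three operations (reflection, scaling, translation) can be sketched as a single function, following the top-left-origin convention described above (an illustrative sketch; actual window-coordinate conventions vary by API):

```python
def viewport_transform(x_ndc, y_ndc, z_ndc, vp_x, vp_y, vp_w, vp_h):
    """Map normalized device coordinates (each in [-1, +1]) to window
    coordinates: y-axis reflection (NDC +y points up, screen +y points
    down), scaling to the viewport size, and translation to the
    viewport origin (vp_x, vp_y), taken here as the top-left corner."""
    x_win = vp_x + (x_ndc + 1.0) * 0.5 * vp_w
    y_win = vp_y + (1.0 - y_ndc) * 0.5 * vp_h  # the y-axis reflection
    z_win = (z_ndc + 1.0) * 0.5                # depth remapped to [0, 1]
    return x_win, y_win, z_win
```

With a 640x480 viewport at the window origin, the center of the clip cube maps to pixel (320, 240) with depth 0.5.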
Geometry transformation and shaders
Some advanced graphics hardware and APIs (not OpenGL ES, for now) support geometry (primitive) transformation through a programmable stage after primitive assembly.
This is made possible by geometry shaders: they facilitate the creation and destruction of geometric primitives (vertices, lines, and triangles) on the GPU at runtime.
For each input primitive, the geometry shader returns zero or more output primitives. The input and output primitives need not be of the same type. Output primitives are emitted in the same order as the corresponding input primitives.
After the vertices have been transformed and the primitives have been clipped, the rasterization pipeline takes each individual primitive and generates the appropriate fragments for it, aligned with the pixel grid. Fragments are a kind of 3D pixel (identified by an integer location (x, y) in screen space) whose values have been interpolated from the vertices; they carry the same set of attributes as the vertices, such as position, color, normal, and texture coordinates.
Before triangles are rasterized, it is necessary to determine whether they are front-facing (facing the viewer) or back-facing. This can be decided based on the normal vector and the vector connecting the surface and the camera. The culling operation discards triangles that face away from the viewer.
This optimization step is usually configurable and disabled by default. It should not be enabled if the object is transparent and alpha blending is active.
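The facing test described above can be sketched as a sign check on a dot product (an illustrative sketch; GPUs usually decide facing from the triangle's winding order in screen space instead):

```python
def is_back_facing(normal, surface_point, eye):
    """A triangle faces away from the viewer when its normal and the
    vector from the surface towards the camera point in opposite
    directions, i.e. their dot product is negative."""
    to_eye = [eye[i] - surface_point[i] for i in range(3)]
    return sum(normal[i] * to_eye[i] for i in range(3)) < 0.0
```

A surface whose normal points towards the camera gives a positive dot product and is kept; flip the camera to the other side and the same triangle is culled.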
Rasterization and interpolation
There are two sub-stages in the non-programmable part of the rasterization and interpolation process (from here on I refer to triangles, but the process is analogous for other primitives):
Triangle Setup: the 2D coordinates defining the contours of each triangle are calculated. This information is used in the following stage (and during interpolation) and is typically implemented directly in dedicated hardware.
Triangle Traversal or Scan Conversion: all the fragments belonging to the triangle are generated. The fragments are calculated by interpolating the information from the three points defined in the Triangle Setup stage.
At this time, each fragment has a position (aligned to the pixel-grid), depth, color, normal and texture coordinates.
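The interpolation of per-vertex attributes at each fragment can be sketched with barycentric coordinates (a simplified, screen-space-linear sketch; real GPUs additionally perform perspective-correct interpolation by dividing through by w):

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c):
    three weights that sum to 1 and express p as a blend of the corners."""
    d = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / d
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / d
    return wa, wb, 1.0 - wa - wb

def interpolate(p, a, b, c, attr_a, attr_b, attr_c):
    """Interpolate a per-vertex attribute (color, depth, texture
    coordinates, ...) at fragment position p using the same weights."""
    wa, wb, wc = barycentric(p, a, b, c)
    return tuple(wa * x + wb * y + wc * z
                 for x, y, z in zip(attr_a, attr_b, attr_c))
```

For example, the midpoint of an edge gets weights (0, 0.5, 0.5), so its attributes are the average of the two edge endpoints.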
Fragment processing and shaders
Once we have a set of fragments for each primitive, in this stage we can process the fragment data programmatically (via fragment shaders) however we want.
This process is usually dedicated to texture application and pixel-level lighting.
The per-fragment operations stage performs the following functions (and tests) on each fragment:
• Pixel ownership test: This test determines if one pixel in the framebuffer is currently owned by OpenGL ES. This test allows the window system to control which pixels in the framebuffer belong to the current OpenGL ES context.
• Scissor test: The scissor test determines if a fragment lies within the scissor rectangle defined as part of the OpenGL ES state. If the fragment is outside the scissor region, the fragment is discarded.
• Stencil and depth tests: These perform tests on the stencil and depth value of the incoming fragment to determine if the fragment should be rejected or not.
• Blending: Blending combines the newly generated fragment color value with the color values stored in the framebuffer at the same location.
Hidden-surface removal works only if the front object is totally opaque; but a fragment is not necessarily opaque, and it may carry an alpha value specifying its degree of transparency.
Alpha is typically normalized to the range 0 (totally transparent) to 1 (opaque). If a fragment is not totally opaque, then part of the object behind it may show through; this is known as alpha blending.
In alpha blending, the order in which fragments are processed matters. The fragments must be sorted back to front, with the largest z value processed first.
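The usual blending equation, the classic 'source over' configuration obtained in OpenGL with glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA), can be sketched as:

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """'Source over' alpha blending: the incoming fragment color is
    weighted by its alpha and combined with the color already stored
    in the framebuffer at the same location."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))
```

For example, a half-transparent red fragment over a blue background yields purple; this result depends on the blue being written first, which is why back-to-front ordering matters.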
At the end of the per-fragment stage, either the fragment is rejected or a fragment color, depth, or stencil value is written to the framebuffer at that location. The fragment color, depth, and stencil values are written depending on whether the appropriate write masks are enabled or not. This achieves finer control over the color, depth, and stencil values written into the corresponding buffers.
- Desarrollo de videojuegos: un enfoque práctico, 2014 (Block II: Programación gráfica, page 267)