6. Coordinate Spaces

Written by Marius Horga & Caroline Begbie

To easily find a point on a grid, you need a coordinate system. For example, if the grid happens to be your iPhone 13 screen, the center point might be `x: 195, y: 422`. However, that point may be different depending on what space it’s in.

In the previous chapter, you learned about matrices. By multiplying a vertex’s position by a particular matrix, you can convert the vertex position to a different coordinate space. There are typically six spaces a vertex travels through as it makes its way down the pipeline:

• Object
• World
• Camera
• Clip
• NDC (Normalized Device Coordinate)
• Screen

Since this is starting to read like a description of Voyager leaving our solar system, let’s have a quick conceptual look at each coordinate space before attempting the conversions.

Object Space

If you’re familiar with the Cartesian coordinate system, you know that it uses a pair of coordinates to map a point’s location. The following image shows a 2D grid with the possible vertices of the dog mapped using Cartesian coordinates.

The positions of the vertices are in relation to the dog’s origin, which is located at `(0, 0)`. The vertices in this image are located in object space (or local or model space). In the previous chapter, `Triangle` held an array of vertices in object space, describing the position of each point of the triangle.

World Space

In the following image, the direction arrows mark the world’s origin at `(0, 0, 0)`. So, in world space, the dog is at `(1, 0, 1)` and the cat is at `(-1, 0, -2)`.

Camera Space

Enough about the cat. Let’s move on to the dog. For him, the center of the universe is the person holding the camera. So, in camera space (or view space), the camera is at `(0, 0, 0)` and the dog is approximately at `(-3, -2, 7)`. When the camera moves, it stays at `(0, 0, 0)`, but the positions of the dog and cat move relative to the camera.

Clip Space

The main reason for doing all this math is to project with perspective. In other words, you want to project a three-dimensional scene onto a two-dimensional surface. Clip space is a distorted cube that’s ready for flattening.

NDC (Normalized Device Coordinate) Space

Projection into clip space creates a half cube of `w` size. During rasterization, the GPU divides `x`, `y` and `z` by `w` to produce normalized coordinates between `-1` and `1` for the `x`- and `y`-axes and between `0` and `1` for the `z`-axis.

Screen Space

Now that the GPU has a normalized cube, it flattens the result into two dimensions and converts everything into screen coordinates, ready to display on the device’s screen.

Converting Between Spaces

To convert from one space to another, you can use transformation matrices. In the following image, the vertex on the dog’s ear is `(-1, 4, 0)` in object space. But in world space, the origin is different, so the vertex — judging from the image — is at about `(0.75, 1.5, 1)`.
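To make the conversion concrete, here’s a minimal sketch in Python (rather than the chapter’s Swift and simd types) of a translation matrix moving that object-space vertex into world space. The model matrix values are hypothetical, chosen so the result matches the position read off the image:

```python
# Column-major 4x4 translation applied to a homogeneous point,
# mirroring what a model matrix does to each vertex.

def translation_matrix(tx, ty, tz):
    # Matrix stored as a list of columns; the translation sits in
    # the last column, as in simd's matrix_float4x4.
    return [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [tx, ty, tz, 1],
    ]

def transform(matrix, point):
    # matrix * point, column-major: each row sums over the columns.
    return [
        sum(matrix[col][row] * point[col] for col in range(4))
        for row in range(4)
    ]

# Vertex on the dog's ear in object space (w = 1 for positions).
vertex = [-1, 4, 0, 1]
# Hypothetical model matrix moving the dog so the ear lands at
# the world-space position suggested by the image.
model = translation_matrix(1.75, -2.5, 1)
world = transform(model, vertex)
print(world[:3])  # [0.75, 1.5, 1]
```

The same `transform` routine works for any 4×4 matrix, which is why chaining rotations and scales is just more matrix multiplications.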

Coordinate Systems

Different graphics APIs use different coordinate systems. You already found out that Metal’s NDC (Normalized Device Coordinates) uses `0` to `1` on the `z`-axis. You also may already be familiar with OpenGL, which uses `-1` to `1` on the `z`-axis.
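If you ever port projection code between the two APIs, the depth remap is a one-liner. A small sketch (in Python, for illustration only):

```python
# Remapping a depth value from OpenGL's NDC z range of [-1, 1]
# to Metal's [0, 1] -- a common adjustment when porting
# projection matrices between the two APIs.

def gl_z_to_metal_z(z_gl):
    return z_gl * 0.5 + 0.5

print(gl_z_to_metal_z(-1.0))  # 0.0 (near plane)
print(gl_z_to_metal_z(1.0))   # 1.0 (far plane)
```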

The Starter Project

With a better understanding of coordinate systems and spaces, you’re ready to start creating matrices.

Uniforms

Constant values that are the same across all vertices or fragments are generally referred to as uniforms. The first step is to create a uniform structure to hold the conversion matrices. After that, you’ll apply the uniforms to every vertex.

``````
#import <simd/simd.h>
``````
``````
typedef struct {
  matrix_float4x4 modelMatrix;
  matrix_float4x4 viewMatrix;
  matrix_float4x4 projectionMatrix;
} Uniforms;
``````

The Model Matrix

Your train vertices are currently in object space. To convert these vertices to world space, you’ll use `modelMatrix`. By changing `modelMatrix`, you’ll be able to translate, scale and rotate your train.

``````
var uniforms = Uniforms()
``````
``````
let translation = float4x4(translation: [0.5, -0.4, 0])
let rotation =
  float4x4(rotation: [0, 0, Float(45).degreesToRadians])
uniforms.modelMatrix = translation * rotation
``````
``````
renderEncoder.setVertexBytes(
  &uniforms,
  length: MemoryLayout<Uniforms>.stride,
  index: 11)
``````
``````
#import "Common.h"
``````
``````
vertex VertexOut vertex_main(
  VertexIn in [[stage_in]],
  constant Uniforms &uniforms [[buffer(11)]])
{
  float4 position = uniforms.modelMatrix * in.position;
  VertexOut out {
    .position = position
  };
  return out;
}
``````

View Matrix

To convert between world space and camera space, you set a view matrix. Depending on how you want to move the camera in your world, you can construct the view matrix appropriately. The view matrix you’ll create here is a simple one, best for FPS (First Person Shooter) style games.
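As a sketch of why the `.inverse` you’re about to use works: for a camera that has only been translated, the view matrix is simply a translation by the negated offset, so moving the camera right makes everything else appear to move left. This Python illustration (not the book’s Swift code) stores matrices as lists of columns, like simd:

```python
# A pure-translation camera transform and its inverse, which
# serves as the view matrix.

def translation_matrix(tx, ty, tz):
    # Columns of a 4x4 matrix; translation in the last column.
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]

def invert_translation(m):
    # For a pure translation, the inverse just negates the offset.
    tx, ty, tz, _ = m[3]
    return translation_matrix(-tx, -ty, -tz)

def transform(matrix, point):
    # matrix * point, column-major.
    return [sum(matrix[col][row] * point[col] for col in range(4))
            for row in range(4)]

camera = translation_matrix(0.8, 0, 0)   # camera moved right by 0.8
view = invert_translation(camera)        # the view matrix
# The world origin, seen from the camera, ends up at x = -0.8.
print(transform(view, [0, 0, 0, 1])[:3])
```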

``````
uniforms.viewMatrix = float4x4(translation: [0.8, 0, 0]).inverse
``````
``````
float4 position = uniforms.modelMatrix * in.position;
``````
``````
float4 position = uniforms.viewMatrix * uniforms.modelMatrix
  * in.position;
``````

``````
renderEncoder.setVertexBytes(
  &uniforms,
  length: MemoryLayout<Uniforms>.stride,
  index: 11)
``````
``````
timer += 0.005
uniforms.viewMatrix = float4x4.identity
let translationMatrix = float4x4(translation: [0, -0.6, 0])
let rotationMatrix = float4x4(rotationY: sin(timer))
uniforms.modelMatrix = translationMatrix * rotationMatrix
``````

Projection

It’s time to apply some perspective to your render to give your scene some depth.

Projection Matrix

➤ Open Renderer.swift, and add this code to `mtkView(_:drawableSizeWillChange:)`:

``````
let aspect =
  Float(view.bounds.width) / Float(view.bounds.height)
let projectionMatrix =
  float4x4(
    projectionFov: Float(70).degreesToRadians,
    near: 0.1,
    far: 100,
    aspect: aspect)
uniforms.projectionMatrix = projectionMatrix
``````
``````
mtkView(
  metalView,
  drawableSizeWillChange: metalView.bounds.size)
``````
``````
float4 position =
  uniforms.projectionMatrix * uniforms.viewMatrix
  * uniforms.modelMatrix * in.position;
``````

``````
uniforms.viewMatrix = float4x4.identity
``````
``````
uniforms.viewMatrix = float4x4(translation: [0, 0, -3]).inverse
``````

``````
renderEncoder.setTriangleFillMode(.lines)
``````

Perspective Divide

Now that you’ve converted your vertices from object space through world space, camera space and clip space, the GPU takes over to convert to NDC coordinates (that’s `-1` to `1` in the `x` and `y` directions and `0` to `1` in the `z` direction). The ultimate aim is to scale all the vertices from clip space into NDC space, and by using the fourth `w` component, that task gets a lot easier.
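Here’s what that perspective divide looks like in isolation, sketched in Python with made-up clip-space values:

```python
# The perspective divide the GPU performs during rasterization:
# each clip-space coordinate is divided by its own w, producing
# normalized device coordinates.

def perspective_divide(clip):
    x, y, z, w = clip
    return [x / w, y / w, z / w]

# Hypothetical clip-space position; after projection, w carries
# the depth-dependent scale factor.
clip_position = [2.0, -1.0, 5.0, 10.0]
print(perspective_divide(clip_position))  # [0.2, -0.1, 0.5]
```

Vertices farther from the camera end up with a larger `w`, so after the divide they land closer to the center of the cube, which is exactly the perspective foreshortening you see on screen.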

NDC to Screen

Finally, the GPU converts from normalized coordinates to whatever the device screen size is. You may already have done something like this at some time in your career when converting between normalized coordinates and screen coordinates.

``````
converted.x = point.x * screenWidth/2  + screenWidth/2
converted.y = point.y * screenHeight/2 + screenHeight/2
``````
``````
converted = matrix * point
``````
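Plugging numbers into the formulas above, a quick Python sketch with a hypothetical 390×844-point screen:

```python
# NDC-to-screen mapping: NDC (-1, -1) maps to one corner of the
# screen and (1, 1) to the opposite corner.

def ndc_to_screen(x, y, width, height):
    return (x * width / 2 + width / 2, y * height / 2 + height / 2)

print(ndc_to_screen(0, 0, 390, 844))    # (195.0, 422.0): the screen center
print(ndc_to_screen(-1, -1, 390, 844))  # (0.0, 0.0)
```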

Refactoring the Model Matrix

Currently, you set all the matrices in `Renderer`. Later, you’ll create a `Camera` structure to calculate the view and projection matrices.

``````
struct Transform {
  var position: float3 = [0, 0, 0]
  var rotation: float3 = [0, 0, 0]
  var scale: Float = 1
}
``````
``````
extension Transform {
  var modelMatrix: matrix_float4x4 {
    let translation = float4x4(translation: position)
    let rotation = float4x4(rotation: rotation)
    let scale = float4x4(scaling: scale)
    let modelMatrix = translation * rotation * scale
    return modelMatrix
  }
}
``````
``````
protocol Transformable {
  var transform: Transform { get set }
}
``````
``````
extension Transformable {
  var position: float3 {
    get { transform.position }
    set { transform.position = newValue }
  }
  var rotation: float3 {
    get { transform.rotation }
    set { transform.rotation = newValue }
  }
  var scale: Float {
    get { transform.scale }
    set { transform.scale = newValue }
  }
}
``````
``````
class Model: Transformable {
``````
``````
var transform = Transform()
``````
``````
let translation = float4x4(translation: [0.5, -0.4, 0])
let rotation =
  float4x4(rotation: [0, 0, Float(45).degreesToRadians])
uniforms.modelMatrix = translation * rotation
``````
``````
let translationMatrix = float4x4(translation: [0, -0.6, 0])
let rotationMatrix = float4x4(rotationY: sin(timer))
uniforms.modelMatrix = translationMatrix * rotationMatrix
``````
``````
model.position.y = -0.6
model.rotation.y = sin(timer)
uniforms.modelMatrix = model.transform.modelMatrix
``````

Key Points

• Coordinate spaces map different coordinate systems. To convert from one space to another, you can use matrix multiplication.
• Model vertices start off in object space. These are generally held in the file that comes from your 3D app, such as Blender, but you can procedurally generate them too.
• The model matrix converts object space vertices to world space. These are the positions that the vertices hold in the scene’s world. The origin at `[0, 0, 0]` is the center of the scene.
• The view matrix moves vertices into camera space. Generally, your matrix will be the inverse of the position of the camera in world space.
• The projection matrix applies three-dimensional perspective to your vertices.

Where to Go From Here?

You’ve covered a lot of mathematical concepts in this chapter without diving too far into the underlying mathematical principles. To get started in computer graphics, you can fill your transform matrices and continue multiplying them at the usual times, but to be sufficiently creative, you’ll need to understand some linear algebra. A great place to start is Grant Sanderson’s Essence of Linear Algebra at https://bit.ly/3iYnkN1. This video series treats vectors and matrices visually. You’ll also find some additional references in references.markdown in the resources folder for this chapter.
