Vision Tutorial for iOS: What’s New With Face Detection?

Learn what’s new with Face Detection and how the latest additions to Vision framework can help you achieve better results in image segmentation and analysis. By Tom Elliott.


Assuring Quality

Before requesting a quality score, your app needs a place to store the quality of the current frame. First, update the model to hold information about face quality.

Open CameraViewModel.swift. Underneath the FaceGeometryModel struct, add the following to store the quality state:

struct FaceQualityModel {
  let quality: Float
}

This struct contains a float property to store the most recent detected quality.

Under the declaration of faceGeometryState, add a property to publish face quality state:

// 1
@Published private(set) var faceQualityState: FaceObservation<FaceQualityModel> {
  didSet {
    // 2
    processUpdatedFaceQuality()
  }
}
  1. This follows the same pattern as the faceGeometryState property above. FaceObservation is a generic wrapper that provides type safety around the underlying model value. It has three states: face found, face not found and error (see the sketch after this list).
  2. Updates to faceQualityState call processUpdatedFaceQuality().
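
For reference, a generic wrapper like FaceObservation is shaped roughly as follows. This is a minimal sketch only; the sample project ships its own definition, so don't add this code.

// A minimal sketch of a generic observation wrapper.
// The sample project defines its own version of FaceObservation.
enum FaceObservation<T> {
  case faceFound(T)
  case faceNotFound
  case errored(Error)
}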

Don't forget to initialize the faceQualityState in init():

faceQualityState = .faceNotFound

This sets the initial value of faceQualityState to .faceNotFound.

Next, add a new published property for acceptable quality:

@Published private(set) var isAcceptableQuality: Bool {
  didSet {
    calculateDetectedFaceValidity()
  }
}

As with the other properties, initialize it in the init() method:

isAcceptableQuality = false

Now, you can write the implementation for processUpdatedFaceQuality():

switch faceQualityState {
case .faceNotFound:
  isAcceptableQuality = false
case .errored(let error):
  print(error.localizedDescription)
  isAcceptableQuality = false
case .faceFound(let faceQualityModel):
  if faceQualityModel.quality < 0.2 {
    isAcceptableQuality = false
    return
  }

  isAcceptableQuality = true
}

Here, you switch over the different states of FaceObservation. A detected face is only acceptable when its quality score is 0.2 or higher, hence the early return for lower scores.

Update calculateDetectedFaceValidity() to account for acceptable quality by replacing the last line with:

isAcceptableYaw && isAcceptableQuality
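
If you want to sanity-check the result, the updated function now reads roughly as follows. This is a sketch that assumes the bounds, roll, pitch and yaw properties from earlier in the tutorial; the exact name of the passing bounds case may differ in your project.

// A sketch of the updated validity check. Only the last line changes;
// the bounds case name here is illustrative.
func calculateDetectedFaceValidity() {
  hasDetectedValidFace =
    isAcceptableBounds == .detectedFaceAppropriateSizeAndPosition &&
    isAcceptableRoll &&
    isAcceptablePitch &&
    isAcceptableYaw && isAcceptableQuality
}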

Handling Quality Result

The faceQualityState property is now set up to store detected face quality. But, there isn't a way for anything to update that state. Time to fix that.

In the CameraViewModelAction enum, add a new action after faceObservationDetected:

case faceQualityObservationDetected(FaceQualityModel)

And, update the perform(action:) method switch to handle the new action:

case .faceQualityObservationDetected(let faceQualityObservation):
  publishFaceQualityObservation(faceQualityObservation)

Here, you're calling publishFaceQualityObservation() whenever the model performs the faceQualityObservationDetected action. Replace the function definition and empty implementation of publishFaceQualityObservation() with:

// 1
private func publishFaceQualityObservation(_ faceQualityModel: FaceQualityModel) {
  // 2
  DispatchQueue.main.async { [self] in
    // 3
    faceDetectedState = .faceDetected
    faceQualityState = .faceFound(faceQualityModel)
  }
}

Here, you're:

  1. Updating the function definition to pass in a FaceQualityModel.
  2. Dispatching to the main thread for safety.
  3. Updating the faceDetectedState and faceQualityState to record a face detection. The quality state stores the quality model.

Detecting Quality

Now the view model is all set up, and it's time to do some detecting. Open FaceDetector.swift.

Add a new request in captureOutput(_:didOutput:from:) after setting the revision for detectFaceRectanglesRequest:

let detectCaptureQualityRequest =
  VNDetectFaceCaptureQualityRequest(completionHandler: detectedFaceQualityRequest)
detectCaptureQualityRequest.revision =
  VNDetectFaceCaptureQualityRequestRevision2

Here, you create a new face capture quality request with detectedFaceQualityRequest as its completion handler. Then, you set it to use revision 2 of the request.

Add the request to the array passed to sequenceHandler a few lines below:

[detectFaceRectanglesRequest, detectCaptureQualityRequest],
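
For context, the surrounding call to the sequence handler now looks roughly like this. Treat it as a sketch rather than a line-for-line listing: the imageBuffer and orientation values are the ones already used in captureOutput(_:didOutput:from:).

// A sketch of the updated perform call, assuming the existing imageBuffer
// and the front-camera orientation used earlier in the method.
do {
  try sequenceHandler.perform(
    [detectFaceRectanglesRequest, detectCaptureQualityRequest],
    on: imageBuffer,
    orientation: .leftMirrored)
} catch {
  print(error.localizedDescription)
}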

Finally, write the implementation for the completion handler, detectedFaceQualityRequest(request:error:):

// 1
guard let model = model else {
  return
}

// 2
guard
  let results = request.results as? [VNFaceObservation],
  let result = results.first
else {
  model.perform(action: .noFaceDetected)
  return
}

// 3
let faceQualityModel = FaceQualityModel(
  quality: result.faceCaptureQuality ?? 0
)

// 4
model.perform(action: .faceQualityObservationDetected(faceQualityModel))

This implementation follows the pattern of the face rectangles completion handler above.

Here, you:

  1. Make sure the view model isn't nil, otherwise return early.
  2. Check to confirm the request contains valid VNFaceObservation results and extract the first one.
  3. Pull out the faceCaptureQuality from the result (or default to 0 if it doesn't exist). Use it to initialize a FaceQualityModel.
  4. Finally, perform the faceQualityObservationDetected action you created, passing through the new faceQualityModel.

Open DebugView.swift. After the roll/pitch/yaw DebugSection, at the end of the VStack, add a section to output the current quality:

DebugSection(observation: model.faceQualityState) { qualityModel in
  DebugText("Q: \(qualityModel.quality)")
    .debugTextStatus(status: model.isAcceptableQuality ? .passing : .failing)
}

Build and run. The debug text now shows the quality of the detected face. The shutter is only enabled when the quality score is 0.2 or higher.

Showing the quality score in the debug view

Offering Helpful Hints

The app always displays the same message when any of the acceptability criteria fails. Because the model holds separate state for each criterion, you can make the app more helpful.

Open UserInstructionsView.swift and find faceDetectionStateLabel(). Replace the entire faceDetected case with the following:

if model.hasDetectedValidFace {
  return "Please take your photo :]"
} else if model.isAcceptableBounds == .detectedFaceTooSmall {
  return "Please bring your face closer to the camera"
} else if model.isAcceptableBounds == .detectedFaceTooLarge {
  return "Please hold the camera further from your face"
} else if model.isAcceptableBounds == .detectedFaceOffCentre {
  return "Please move your face to the centre of the frame"
} else if !model.isAcceptableRoll || !model.isAcceptablePitch || !model.isAcceptableYaw {
  return "Please look straight at the camera"
} else if !model.isAcceptableQuality {
  return "Image quality too low"
} else {
  return "We cannot take your photo right now"
}

This code picks a specific instruction depending on which criterion has failed. Build and run the app and play with moving your face into and out of the acceptable region.

Improved user instructions

Segmenting Sapiens

New in iOS 15, the Vision framework now supports person segmentation. Segmentation just means separating out a subject from everything else in the image. For example, replacing the background of an image but keeping the foreground intact — a technique you've certainly seen on a video call in the last year!

In the Vision framework, person segmentation is available through VNGeneratePersonSegmentationRequest. The request analyzes a single frame at a time and offers three quality levels, so segmenting a video stream means running the request on each frame.

The results of the person segmentation request include a pixelBuffer. This contains a mask of the original image. White pixels represent a person in the original image and black pixels represent the background.
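
To see how the pieces fit together outside the sample project, here's a minimal, self-contained sketch of running person segmentation on a single frame. The function name and pixel format choice are illustrative, not part of the sample project.

import Vision

// A minimal sketch: segment people in a single frame.
// qualityLevel trades speed for accuracy: .fast, .balanced or .accurate.
func personMask(for frame: CVPixelBuffer) -> CVPixelBuffer? {
  let request = VNGeneratePersonSegmentationRequest()
  request.qualityLevel = .balanced
  // One 8-bit channel per pixel: 255 for a person, 0 for the background.
  request.outputPixelFormat = kCVPixelFormatType_OneComponent8

  let handler = VNImageRequestHandler(cvPixelBuffer: frame, options: [:])
  do {
    try handler.perform([request])
  } catch {
    print("Person segmentation failed: \(error.localizedDescription)")
    return nil
  }

  // The result is a VNPixelBufferObservation whose pixelBuffer is the mask.
  return request.results?.first?.pixelBuffer
}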

Passport photos need the person photographed against a pure white background. Person segmentation is a great way to replace the background but leave the person intact.

Using Metal

Before replacing the background in the image, you need to know a bit about Metal.

Metal is a very powerful API provided by Apple. It performs graphics-intensive operations on the GPU for high-performance image processing and is fast enough to process every frame of a video in real time. This sounds pretty useful!

Open CameraViewController.swift. Look at the bottom of configureCaptureSession(). The camera view controller displays the preview layer from the AVCaptureSession.

The class supports two modes: one that uses Metal and one that doesn't. Currently, it's set up not to use Metal. You'll change that now.

In viewDidLoad(), add the following code before the call to configureCaptureSession():

configureMetal()

This configures the app to use Metal. The view controller now draws the output from Metal instead of the AVCaptureSession preview layer. This isn't a tutorial on Metal, though, so the setup code is already written. Feel free to read the implementation of configureMetal() if you're curious.

With Metal configured to draw the view, you have complete control over what the view displays.