Face Detection Tutorial Using the Vision Framework for iOS

In this tutorial, you’ll learn how to use the Vision framework to detect faces and facial features and to overlay the results on the camera feed in real time. By Yono Mittlefehldt.

What Else Can You Detect?

Aside from face detection, the Vision framework has APIs you can use to detect all sorts of things.

  • Rectangles: With VNDetectRectanglesRequest, you can detect rectangles in the camera input, even if they are distorted due to perspective.
  • Text: You can detect the bounding boxes around individual text characters by using VNDetectTextRectanglesRequest. Note, however, that this request only detects the characters; it doesn’t recognize what they are.
  • Horizon: Using VNDetectHorizonRequest, you can determine the angle of the horizon in images.
  • Barcodes: You can detect and recognize many kinds of barcodes with VNDetectBarcodesRequest. See the full list here.
  • Objects: By combining the Vision framework with CoreML, you can detect and classify specific objects using VNCoreMLRequest.
  • Image alignment: With VNTranslationalImageRegistrationRequest and VNHomographicImageRegistrationRequest you can align two images that have overlapping content.

Amazing, right?
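To give you an idea of how these requests fit together, here’s a minimal sketch of running one of them, VNDetectRectanglesRequest, against a still image. The detectRectangles(in:) function below is purely illustrative and isn’t something you need to add to the project:

import CoreGraphics
import Vision

// Illustrative only: perform a rectangle-detection request on a CGImage.
func detectRectangles(in cgImage: CGImage) {
  let request = VNDetectRectanglesRequest { request, error in
    guard let results = request.results as? [VNRectangleObservation] else {
      return
    }
    // Each observation carries normalized corner points and a bounding box.
    for rectangle in results {
      print(rectangle.boundingBox)
    }
  }

  let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
  // perform(_:) can throw, so handle errors properly in real code.
  try? handler.perform([request])
}

Every request on the list follows the same pattern: create a request, hand it to a request handler and read the typed observations back in the completion handler or from the request’s results property.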

Well, there’s one more very important thing you can detect with the Vision framework. You can use it to detect face landmarks! Since this tutorial is all about face detection, you’ll be doing that in the next section.

Detecting Face Landmarks

The first thing you need to do is update your Vision request to detect face landmarks. To do this, open FaceDetectionViewController.swift and in captureOutput(_:didOutput:from:) replace the line where you define detectFaceRequest with this:

let detectFaceRequest = VNDetectFaceLandmarksRequest(completionHandler: detectedFace)

If you were to build and run now, you wouldn’t see any difference from before. You’d still see a red bounding box around your face.

Why?

Because VNDetectFaceLandmarksRequest will first detect all faces in the image before analyzing them for facial features.
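In other words, the results of a VNDetectFaceLandmarksRequest are still VNFaceObservation instances, so the bounding box code you already wrote keeps working; each observation now simply carries an optional landmarks property on top. The inspect(request:) function below is only an illustrative sketch of that, not code you need to add:

import Vision

// Illustrative only: the results of a VNDetectFaceLandmarksRequest are still
// VNFaceObservation instances, each one now with an optional `landmarks` property.
func inspect(request: VNRequest) {
  guard let observations = request.results as? [VNFaceObservation] else {
    return
  }

  for face in observations {
    print(face.boundingBox)                // the same normalized box as before
    print(face.landmarks?.leftEye as Any)  // new: a VNFaceLandmarkRegion2D?
  }
}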

Next, you’re going to need to define some helper methods. Right below convert(rect:), add the following code:

// 1
func landmark(point: CGPoint, to rect: CGRect) -> CGPoint {
  // 2
  let absolute = point.absolutePoint(in: rect)
  
  // 3
  let converted = previewLayer.layerPointConverted(fromCaptureDevicePoint: absolute)
  
  // 4
  return converted
}

With this code, you:

  1. Define a method which converts a landmark point to something that can be drawn on the screen.
  2. Calculate the absolute position of the normalized point by using a Core Graphics extension defined in CoreGraphicsExtensions.swift. You’ll find a sketch of what that extension does just after this list.
  3. Convert the point to the preview layer’s coordinate system.
  4. Return the converted point.
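If you’re curious about step 2, here’s roughly what that absolutePoint(in:) helper does. This is only a sketch to illustrate the math; the actual extension in CoreGraphicsExtensions.swift from the download materials may be written differently:

import CoreGraphics

// Sketch only: scale a normalized point (components in 0...1) into the
// coordinate space of the given rect.
extension CGPoint {
  func absolutePoint(in rect: CGRect) -> CGPoint {
    return CGPoint(
      x: x * rect.width + rect.origin.x,
      y: y * rect.height + rect.origin.y
    )
  }
}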

Below that method, add the following:

func landmark(points: [CGPoint]?, to rect: CGRect) -> [CGPoint]? {
  return points?.compactMap { landmark(point: $0, to: rect) }
}

This method takes an optional array of landmark points and converts each of them using the method you just defined.

Next, you’re going to refactor some of your code to make it easier to work with and add functionality. Add the following method right below your two new helper methods:

func updateFaceView(for result: VNFaceObservation) {
  defer {
    DispatchQueue.main.async {
      self.faceView.setNeedsDisplay()
    }
  }

  let box = result.boundingBox    
  faceView.boundingBox = convert(rect: box)

  guard let landmarks = result.landmarks else {
    return
  }
    
  if let leftEye = landmark(
    points: landmarks.leftEye?.normalizedPoints, 
    to: result.boundingBox) {
    faceView.leftEye = leftEye
  }
}

The new parts here are the guard statement, which safely unwraps the detected landmarks, and the if statement that follows it. The if uses your new helper methods to convert the normalized points that make up the leftEye into coordinates that work with the preview layer. If everything goes well, you assign those converted points to the leftEye property of the FaceView.

The rest looks familiar because you already wrote it in detectedFace(request:error:). So, you should probably clean that up now.

In detectedFace(request:error:), replace the following code:

let box = result.boundingBox
faceView.boundingBox = convert(rect: box)
    
DispatchQueue.main.async {
  self.faceView.setNeedsDisplay()
}

with:

updateFaceView(for: result)

This calls your newly defined method to handle updating the FaceView.
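For reference, after this change detectedFace(request:error:) should look roughly like the sketch below; your guard and view-clearing logic may differ slightly depending on your starter project:

func detectedFace(request: VNRequest, error: Error?) {
  // Extract the first face observation, or reset the view if there isn't one.
  guard
    let results = request.results as? [VNFaceObservation],
    let result = results.first
    else {
      faceView.clear()
      return
  }

  // Hand all the drawing-related bookkeeping to the new helper method.
  updateFaceView(for: result)
}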

There’s one last step before you can try out your code. Open FaceView.swift and add the following code to the end of draw(_:), right after the existing statement context.strokePath():

// 1
UIColor.white.setStroke()
    
if !leftEye.isEmpty {
  // 2
  context.addLines(between: leftEye)
  
  // 3
  context.closePath()
  
  // 4
  context.strokePath()
}

Here you:

  1. Set the stroke color to white, to differentiate from the red bounding box.
  2. Add lines between the points that define the leftEye, if there are any points.
  3. Close the path, to make a nice eye shape.
  4. Stroke the path, to make it visible.

Time to build and run!

A fun game with computer vision APIs is to look for words like left and right and guess what they mean. It’s different every time!

Note: You’ve added code to annotate the left eye, but what does that mean? With Vision, you should expect to see the outline drawn not on your left eye, but on the eye that appears on the left side of the image.

Awesome! If you open your eye wide or shut it, you should see the drawn eye change shape slightly, although not as much as your real eye does.

This is a fantastic milestone. You may want to take a quick break now, as you’ll be adding all the other face landmarks in one fell swoop.

Back already? You’re industrious! Time to add those other landmarks.

While you still have FaceView.swift open, add the following to the end of draw(_:), after the code for the left eye:

if !rightEye.isEmpty {
  context.addLines(between: rightEye)
  context.closePath()
  context.strokePath()
}
    
if !leftEyebrow.isEmpty {
  context.addLines(between: leftEyebrow)
  context.strokePath()
}
    
if !rightEyebrow.isEmpty {
  context.addLines(between: rightEyebrow)
  context.strokePath()
}
    
if !nose.isEmpty {
  context.addLines(between: nose)
  context.strokePath()
}
    
if !outerLips.isEmpty {
  context.addLines(between: outerLips)
  context.closePath()
  context.strokePath()
}
    
if !innerLips.isEmpty {
  context.addLines(between: innerLips)
  context.closePath()
  context.strokePath()
}
    
if !faceContour.isEmpty {
  context.addLines(between: faceContour)
  context.strokePath()
}

Here you’re adding drawing code for the remaining face landmarks. Note that leftEyebrow, rightEyebrow, nose and faceContour don’t close their paths; closing them would make those landmarks look funny.
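All of the properties this drawing code reads from already exist in the starter project’s FaceView. They’re presumably just stored arrays of CGPoint plus a bounding box, roughly along these lines (a sketch, not the exact file contents):

import UIKit

// Sketch only: the landmark storage that draw(_:) reads from.
class FaceView: UIView {
  var leftEye: [CGPoint] = []
  var rightEye: [CGPoint] = []
  var leftEyebrow: [CGPoint] = []
  var rightEyebrow: [CGPoint] = []
  var nose: [CGPoint] = []
  var outerLips: [CGPoint] = []
  var innerLips: [CGPoint] = []
  var faceContour: [CGPoint] = []
  var boundingBox = CGRect.zero
}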

Now, open FaceDetectionViewController.swift again. At the end of updateFaceView(for:), add the following:

if let rightEye = landmark(
  points: landmarks.rightEye?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.rightEye = rightEye
}
    
if let leftEyebrow = landmark(
  points: landmarks.leftEyebrow?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.leftEyebrow = leftEyebrow
}
    
if let rightEyebrow = landmark(
  points: landmarks.rightEyebrow?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.rightEyebrow = rightEyebrow
}
    
if let nose = landmark(
  points: landmarks.nose?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.nose = nose
}
    
if let outerLips = landmark(
  points: landmarks.outerLips?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.outerLips = outerLips
}
    
if let innerLips = landmark(
  points: landmarks.innerLips?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.innerLips = innerLips
}
    
if let faceContour = landmark(
  points: landmarks.faceContour?.normalizedPoints, 
  to: result.boundingBox) {
  faceView.faceContour = faceContour
}

With this code, you add the remaining face landmarks to the FaceView and that’s it! You’re ready to build and run!

Nice work!