Face Detection Tutorial Using the Vision Framework for iOS
In this tutorial, you’ll learn how to use the Vision framework to detect faces and facial features and overlay the results on the camera feed in real time. By Yono Mittlefehldt.
Precise gene editing technology has been around since about 2012. So why don’t we all have superpowers yet?!?
And what’s the greatest superpower? No. Not flying. That’s far too dangerous.
The correct answer is laser heat vision!
Imagine what you could do with laser heat vision! You could save money on a microwave, easily light any candle in sight and don’t forget the ability to burn your initials into your woodworking projects. How cool would that be?
Well, apparently real-life superpowers aren’t here yet, so you’ll have to deal with the next best thing. You’ll have to use your iPhone to give yourself pretend laser heat vision.
Fortunately, Apple has a framework that can help you out with this plan B.
In this tutorial, you’ll learn how to use the Vision framework to:
- Create requests for face detection and detecting face landmarks.
- Process these requests.
- Overlay the results on the camera feed to get real-time, visual feedback.
Get ready to super power your brain and your eyes!
Getting Started
Click the Download Materials button at the top or bottom of this tutorial. Open the starter project and explore to your heart’s content.
Currently, the Face Lasers app doesn’t do a whole lot. Well, it does show you your beautiful mug!
There’s also a label at the bottom that reads Face. You may have noticed that if you tap the screen, this label changes to read Lasers.
That’s exciting! Except that there don’t seem to be any lasers. That’s less exciting. Don’t worry — by the end of this tutorial, you’ll be shooting lasers out of your eyes like Super(wo)man!
You’ll also notice some useful Core Graphics extensions. You’ll make use of these throughout the tutorial to simplify your code.
Vision Framework Usage Patterns
All Vision framework APIs use three constructs:
- Request: The request defines the type of thing you want to detect and a completion handler that will process the results. This is a subclass of VNRequest.
- Request handler: The request handler performs the request on the provided pixel buffer (think: image). This will be either a VNImageRequestHandler for single, one-off detections or a VNSequenceRequestHandler to process a series of images.
- Results: The results will be attached to the original request and passed to the completion handler defined when creating the request. They are subclasses of VNObservation.
Simple, right?
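To see all three constructs together before you work with the live camera feed, here's a minimal sketch of a one-off detection on a single, static image using a VNImageRequestHandler. This isn't part of the starter project; it's just the same pattern in its simplest form:

import UIKit
import Vision

// A minimal sketch of the request/handler/results pattern on a single,
// static image (not code from the starter project).
func detectFaces(in image: UIImage) {
  guard let cgImage = image.cgImage else { return }

  // Request: what to detect, plus a completion handler for the results.
  let request = VNDetectFaceRectanglesRequest { request, error in
    guard let faces = request.results as? [VNFaceObservation] else { return }
    print("Found \(faces.count) face(s)")
  }

  // Request handler: performs the request on this one image.
  let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .up)

  // Results: delivered to the completion handler as VNObservation subclasses.
  do {
    try handler.perform([request])
  } catch {
    print(error.localizedDescription)
  }
}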
Writing Your First Face Detector
Open FaceDetectionViewController.swift and add the following property at the top of the class:
var sequenceHandler = VNSequenceRequestHandler()
This defines the request handler you’ll be feeding images to from the camera feed. You’re using a VNSequenceRequestHandler because you’ll perform face detection requests on a series of images, instead of on a single static one.
Now scroll to the bottom of the file where you’ll find an empty captureOutput(_:didOutput:from:) delegate method. Fill it in with the following code:
// 1
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
  return
}

// 2
let detectFaceRequest = VNDetectFaceRectanglesRequest(completionHandler: detectedFace)

// 3
do {
  try sequenceHandler.perform(
    [detectFaceRequest],
    on: imageBuffer,
    orientation: .leftMirrored)
} catch {
  print(error.localizedDescription)
}
With this code you:
- Get the image buffer from the passed-in sample buffer.
- Create a face detection request to detect face bounding boxes and pass the results to a completion handler.
- Use your previously defined sequence request handler to perform your face detection request on the image. The orientation parameter tells the request handler what the orientation of the input image is.
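By the way, if you're wondering why captureOutput(_:didOutput:from:) fires at all: the starter project already configures an AVCaptureSession with the front camera and makes this view controller the video output's delegate. The setup is roughly along these lines; this is a sketch for orientation rather than the starter's exact code, and names like CameraViewController and dataOutputQueue are assumptions:

import AVFoundation
import UIKit

// A rough sketch of the capture setup the starter performs; the class and
// property names here are assumptions, not the starter's actual code.
class CameraViewController: UIViewController,
                            AVCaptureVideoDataOutputSampleBufferDelegate {
  let session = AVCaptureSession()
  let dataOutputQueue = DispatchQueue(label: "video data queue")

  func configureCaptureSession() {
    // Use the front-facing camera so the preview shows your face.
    guard let camera = AVCaptureDevice.default(
      .builtInWideAngleCamera, for: .video, position: .front) else {
      return
    }

    do {
      let cameraInput = try AVCaptureDeviceInput(device: camera)
      session.addInput(cameraInput)
    } catch {
      print(error.localizedDescription)
      return
    }

    // Route every captured frame to captureOutput(_:didOutput:from:).
    let videoOutput = AVCaptureVideoDataOutput()
    videoOutput.setSampleBufferDelegate(self, queue: dataOutputQueue)
    session.addOutput(videoOutput)

    session.startRunning()
  }
}

The call to setSampleBufferDelegate(_:queue:) is what routes each camera frame into the delegate method you just filled in.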
Now you may be wondering: But what about detectedFace(request:error:)? In fact, Xcode is probably wondering the same thing.
You’ll define that now.
Add the following code for detectedFace(request:error:) to the FaceDetectionViewController class, wherever you like:
func detectedFace(request: VNRequest, error: Error?) {
  // 1
  guard
    let results = request.results as? [VNFaceObservation],
    let result = results.first
  else {
    // 2
    faceView.clear()
    return
  }

  // 3
  let box = result.boundingBox
  faceView.boundingBox = convert(rect: box)

  // 4
  DispatchQueue.main.async {
    self.faceView.setNeedsDisplay()
  }
}
In this method you:
- Extract the first result from the array of face observation results.
- Clear the FaceView if something goes wrong or no face is detected.
- Set the bounding box to draw in the FaceView after converting it from the coordinates in the VNFaceObservation.
- Call setNeedsDisplay() to make sure the FaceView is redrawn.
The result’s bounding box coordinates are normalized between 0.0 and 1.0 to the input image, with the origin at the bottom left corner. That’s why you need to convert them to something useful.
Unfortunately, convert(rect:) doesn’t exist yet. Fortunately, you’re a talented programmer!
Right above where you placed the method definition for detectedFace(request:error:), add the following method definition:
func convert(rect: CGRect) -> CGRect {
  // 1
  let origin = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.origin)

  // 2
  let size = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.size.cgPoint)

  // 3
  return CGRect(origin: origin, size: size.cgSize)
}
Here you:
- Use a handy method from AVCaptureVideoPreviewLayer to convert a normalized origin to the preview layer’s coordinate system.
- Then use the same handy method along with some nifty Core Graphics extensions to convert the normalized size to the preview layer’s coordinate system.
- Create a CGRect using the new origin and size.
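A quick aside: cgPoint and cgSize aren't standard Core Graphics API. They come from the starter's CoreGraphicsExtensions.swift and most likely boil down to something like this sketch (check the actual file for the real definitions):

import CoreGraphics

// A sketch of the conversions convert(rect:) relies on; the starter's
// CoreGraphicsExtensions.swift contains the real definitions.
extension CGSize {
  var cgPoint: CGPoint {
    return CGPoint(x: width, y: height)
  }
}

extension CGPoint {
  var cgSize: CGSize {
    return CGSize(width: x, height: y)
  }
}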
You’re probably tempted to build and run this. And if you did, you would be disappointed to see nothing on the screen except your own face, sadly free of lasers.
Currently FaceView has an empty draw(_:) method. You need to fill that in if you want to see something on screen!
Switch to FaceView.swift and add the following code to draw(_:):
// 1
guard let context = UIGraphicsGetCurrentContext() else {
  return
}

// 2
context.saveGState()

// 3
defer {
  context.restoreGState()
}

// 4
context.addRect(boundingBox)

// 5
UIColor.red.setStroke()

// 6
context.strokePath()
With this code, you:
- Get the current graphics context.
- Push the current graphics state onto the stack.
- Restore the graphics state when this method exits.
- Add a path describing the bounding box to the context.
- Set the color to red.
- Draw the actual path described in step four.
Phew! You’ve been coding for quite a while now, and it’s finally time to see your work in action!
Go ahead and build and run your app.

What a good looking detected face!
What Else Can You Detect?
Aside from face detection, the Vision framework has APIs you can use to detect all sorts of things.
- Rectangles: With VNDetectRectanglesRequest, you can detect rectangles in the camera input, even if they are distorted due to perspective.
- Text: You can detect the bounding boxes around individual text characters by using VNDetectTextRectanglesRequest. Note, however, that this doesn’t recognize what the characters are; it only detects them.
- Horizon: Using VNDetectHorizonRequest, you can determine the angle of the horizon in images.
- Barcodes: You can detect and recognize many kinds of barcodes with VNDetectBarcodesRequest. See the full list here.
- Objects: By combining the Vision framework with Core ML, you can detect and classify specific objects using VNCoreMLRequest.
- Image alignment: With VNTranslationalImageRegistrationRequest and VNHomographicImageRegistrationRequest, you can align two images that have overlapping content.
Amazing, right?
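As a taste of how similar these all feel, here's a sketch of a barcode request. It isn't part of the Face Lasers app, but it follows exactly the same request pattern you used for faces:

import Vision

// A sketch of a barcode request; you'd perform it with a VNImageRequestHandler
// or VNSequenceRequestHandler, just like the face request earlier.
let barcodeRequest = VNDetectBarcodesRequest { request, error in
  guard let barcodes = request.results as? [VNBarcodeObservation] else {
    return
  }
  for barcode in barcodes {
    print(barcode.symbology, barcode.payloadStringValue ?? "no payload")
  }
}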
Well, there’s one more very important thing you can detect with the Vision framework. You can use it to detect face landmarks! Since this tutorial is all about face detection, you’ll be doing that in the next section.
Detecting Face Landmarks
The first thing you need to do is update your Vision request to detect face landmarks. To do this, open FaceDetectionViewController.swift and, in captureOutput(_:didOutput:from:), replace the line where you define detectFaceRequest with this:
let detectFaceRequest = VNDetectFaceLandmarksRequest(completionHandler: detectedFace)
If you were to build and run now, you wouldn’t see any difference from before. You’d still see a red bounding box around your face.
Why?
Because VNDetectFaceLandmarksRequest will first detect all faces in the image before analyzing them for facial features.
Next, you’re going to need to define some helper methods. Right below convert(rect:), add the following code:
// 1
func landmark(point: CGPoint, to rect: CGRect) -> CGPoint {
  // 2
  let absolute = point.absolutePoint(in: rect)

  // 3
  let converted = previewLayer.layerPointConverted(fromCaptureDevicePoint: absolute)

  // 4
  return converted
}
With this code, you:
- Define a method which converts a landmark point to something that can be drawn on the screen.
- Calculate the absolute position of the normalized point by using a Core Graphics extension defined in CoreGraphicsExtensions.swift.
- Convert the point to the preview layer’s coordinate system.
- Return the converted point.
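As with cgPoint and cgSize, absolutePoint(in:) lives in the starter's CoreGraphicsExtensions.swift. Conceptually, it scales a point that's normalized to the face's bounding box into that box's own coordinate space, roughly like this sketch (not necessarily the starter's exact implementation):

import CoreGraphics

extension CGPoint {
  // Scale a point normalized to rect (0.0 to 1.0) into rect's coordinates.
  func absolutePoint(in rect: CGRect) -> CGPoint {
    return CGPoint(
      x: x * rect.size.width + rect.origin.x,
      y: y * rect.size.height + rect.origin.y)
  }
}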
Below that method, add the following:
func landmark(points: [CGPoint]?, to rect: CGRect) -> [CGPoint]? {
  return points?.compactMap { landmark(point: $0, to: rect) }
}
This method takes an array of these landmark points and converts them all.
Next, you’re going to refactor some of your code to make it easier to work with and add functionality. Add the following method right below your two new helper methods:
func updateFaceView(for result: VNFaceObservation) {
  defer {
    DispatchQueue.main.async {
      self.faceView.setNeedsDisplay()
    }
  }

  let box = result.boundingBox
  faceView.boundingBox = convert(rect: box)

  guard let landmarks = result.landmarks else {
    return
  }

  if let leftEye = landmark(
    points: landmarks.leftEye?.normalizedPoints,
    to: result.boundingBox) {
    faceView.leftEye = leftEye
  }
}
The only thing new here is the if statement at the end of the function. That if uses your new helper methods to convert the normalized points that make up the leftEye into coordinates that work with the preview layer. If everything goes well, the converted points are assigned to the leftEye property of the FaceView.
The rest looks familiar because you already wrote it in detectedFace(request:error:). So, you should probably clean that up now.
In detectedFace(request:error:), replace the following code:
let box = result.boundingBox
faceView.boundingBox = convert(rect: box)
DispatchQueue.main.async {
  self.faceView.setNeedsDisplay()
}
with:
updateFaceView(for: result)
This calls your newly defined method to handle updating the FaceView.
There’s one last step before you can try out your code. Open FaceView.swift and add the following code to the end of draw(_:), right after the existing statement context.strokePath():
// 1
UIColor.white.setStroke()

if !leftEye.isEmpty {
  // 2
  context.addLines(between: leftEye)

  // 3
  context.closePath()

  // 4
  context.strokePath()
}
Here you:
- Set the stroke color to white, to differentiate from the red bounding box.
- Add lines between the points that define the leftEye, if there are any points.
- Close the path, to make a nice eye shape.
- Stroke the path, to make it visible.
Time to build and run!
A fun game with computer vision APIs is to look for words like left and right and guess what they mean: the subject’s left, or the image’s left? It’s different every time!

Awesome! If you try to open your eye wide or shut it, you should see the drawn eye change shape slightly, although not as much as your real eye does.
This is a fantastic milestone. You may want to take a quick break now, as you’ll be adding all the other face landmarks in one fell swoop.

Back already? You’re industrious! Time to add those other landmarks.
While you still have FaceView.swift open, add the following to the end of draw(_:), after the code for the left eye:
if !rightEye.isEmpty {
  context.addLines(between: rightEye)
  context.closePath()
  context.strokePath()
}

if !leftEyebrow.isEmpty {
  context.addLines(between: leftEyebrow)
  context.strokePath()
}

if !rightEyebrow.isEmpty {
  context.addLines(between: rightEyebrow)
  context.strokePath()
}

if !nose.isEmpty {
  context.addLines(between: nose)
  context.strokePath()
}

if !outerLips.isEmpty {
  context.addLines(between: outerLips)
  context.closePath()
  context.strokePath()
}

if !innerLips.isEmpty {
  context.addLines(between: innerLips)
  context.closePath()
  context.strokePath()
}

if !faceContour.isEmpty {
  context.addLines(between: faceContour)
  context.strokePath()
}
Here you’re adding drawing code for the remaining face landmarks. Note that leftEyebrow, rightEyebrow, nose and faceContour don’t need to close their paths. Otherwise, they look funny.
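If you're wondering where all these point arrays live, FaceView simply stores one array of CGPoints per landmark, plus the bounding box, and resets them in clear(). The starter's declarations probably look something like this sketch; see FaceView.swift for the real thing:

import UIKit

// A sketch of the state FaceView keeps; FaceView.swift in the starter
// contains the actual declarations.
class FaceView: UIView {
  var leftEye: [CGPoint] = []
  var rightEye: [CGPoint] = []
  var leftEyebrow: [CGPoint] = []
  var rightEyebrow: [CGPoint] = []
  var nose: [CGPoint] = []
  var outerLips: [CGPoint] = []
  var innerLips: [CGPoint] = []
  var faceContour: [CGPoint] = []
  var boundingBox = CGRect.zero

  func clear() {
    leftEye = []
    rightEye = []
    leftEyebrow = []
    rightEyebrow = []
    nose = []
    outerLips = []
    innerLips = []
    faceContour = []
    boundingBox = .zero

    DispatchQueue.main.async {
      self.setNeedsDisplay()
    }
  }
}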
Now, open FaceDetectionViewController.swift again. At the end of updateFaceView(for:), add the following:
if let rightEye = landmark(
  points: landmarks.rightEye?.normalizedPoints,
  to: result.boundingBox) {
  faceView.rightEye = rightEye
}

if let leftEyebrow = landmark(
  points: landmarks.leftEyebrow?.normalizedPoints,
  to: result.boundingBox) {
  faceView.leftEyebrow = leftEyebrow
}

if let rightEyebrow = landmark(
  points: landmarks.rightEyebrow?.normalizedPoints,
  to: result.boundingBox) {
  faceView.rightEyebrow = rightEyebrow
}

if let nose = landmark(
  points: landmarks.nose?.normalizedPoints,
  to: result.boundingBox) {
  faceView.nose = nose
}

if let outerLips = landmark(
  points: landmarks.outerLips?.normalizedPoints,
  to: result.boundingBox) {
  faceView.outerLips = outerLips
}

if let innerLips = landmark(
  points: landmarks.innerLips?.normalizedPoints,
  to: result.boundingBox) {
  faceView.innerLips = innerLips
}

if let faceContour = landmark(
  points: landmarks.faceContour?.normalizedPoints,
  to: result.boundingBox) {
  faceView.faceContour = faceContour
}
With this code, you add the remaining face landmarks to the FaceView and that’s it! You’re ready to build and run!

Nice work!
Using Detected Faces
Face detection is something you’ve probably been seeing more of recently. It can be especially useful for image processing, when you want to really make the people in the images shine.
But you’re going to do something way cooler than that. You’re going to shoot lasers out of your eyes!
Time to get started.
While still in FaceDetectionViewController.swift, right below updateFaceView(for:), add the following method:
// 1
func updateLaserView(for result: VNFaceObservation) {
  // 2
  laserView.clear()

  // 3
  let yaw = result.yaw ?? 0.0

  // 4
  if yaw == 0.0 {
    return
  }

  // 5
  var origins: [CGPoint] = []

  // 6
  if let point = result.landmarks?.leftPupil?.normalizedPoints.first {
    let origin = landmark(point: point, to: result.boundingBox)
    origins.append(origin)
  }

  // 7
  if let point = result.landmarks?.rightPupil?.normalizedPoints.first {
    let origin = landmark(point: point, to: result.boundingBox)
    origins.append(origin)
  }
}
Whew! That was quite a bit of code. Here’s what you did with it:
- Define a new method that will update the LaserView. It’s a bit like updateFaceView(for:).
- Clear the LaserView.
- Get the yaw from the result. The yaw is a number that tells you how much your face is turned. If it’s negative, you’re looking to the left. If it’s positive, you’re looking to the right.
- Return if the yaw is 0.0. If you’re looking straight forward, no face lasers. 😞
- Create an array to store the origin points of the lasers.
- Add a laser origin based on the left pupil.
- Add a laser origin based on the right pupil.
Note: The pupil landmarks are estimated from the detected eyes rather than tracked directly, so if you glance to the side without turning your head, the pupils reported in the VNFaceObservation would not move.
OK, you’re not quite done with that method, yet. You’ve determined the origin of the lasers. However, you still need to add logic to figure out where the lasers will be focused.
At the end of your newly created updateLaserView(for:), add the following code:
// 1
let avgY = origins.map { $0.y }.reduce(0.0, +) / CGFloat(origins.count)

// 2
let focusY = (avgY < midY) ? 0.75 * maxY : 0.25 * maxY

// 3
let focusX = (yaw.doubleValue < 0.0) ? -100.0 : maxX + 100.0

// 4
let focus = CGPoint(x: focusX, y: focusY)

// 5
for origin in origins {
  let laser = Laser(origin: origin, focus: focus)
  laserView.add(laser: laser)
}

// 6
DispatchQueue.main.async {
  self.laserView.setNeedsDisplay()
}
Here you:
- Calculate the average y coordinate of the laser origins.
- Determine what the y coordinate of the laser focus point will be, based on the average y of the origins. If your pupils are above the middle of the screen, you'll shoot down. Otherwise, you'll shoot up. You calculated midY in viewDidLoad().
- Calculate the x coordinate of the laser focus based on the yaw. If you're looking left, you should shoot lasers to the left.
- Create a CGPoint from your two focus coordinates.
- Generate some Lasers and add them to the LaserView.
- Tell the iPhone that the LaserView should be redrawn.
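For reference, Laser and LaserView are small types provided by the starter: a laser is just an origin and a focus point, and the view collects lasers to draw. Their declarations probably look something like this sketch (LaserView.swift has the real ones):

import UIKit

// A sketch of the starter's laser types; LaserView.swift contains the
// actual declarations used by the app.
struct Laser {
  var origin: CGPoint
  var focus: CGPoint
}

class LaserView: UIView {
  private var lasers: [Laser] = []

  func add(laser: Laser) {
    lasers.append(laser)
  }

  func clear() {
    lasers.removeAll()
    DispatchQueue.main.async {
      self.setNeedsDisplay()
    }
  }
}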
Now you need to call this method from somewhere. detectedFace(request:error:) is the perfect place! In that method, replace the call to updateFaceView(for:) with the following:
if faceViewHidden {
  updateLaserView(for: result)
} else {
  updateFaceView(for: result)
}
This logic chooses which update method to call based on whether or not the FaceView is hidden.
Currently, if you were to build and run, you would only shoot invisible lasers out of your eyes. While that sounds pretty cool, wouldn't it be better to see the lasers?
To fix this, you need to tell the iPhone how to draw the lasers.
Open LaserView.swift and find the draw(_:) method. It should be completely empty. Now add the following code to it:
// 1
guard let context = UIGraphicsGetCurrentContext() else {
  return
}

// 2
context.saveGState()

// 3
for laser in lasers {
  // 4
  context.addLines(between: [laser.origin, laser.focus])
  context.setStrokeColor(red: 1.0, green: 1.0, blue: 1.0, alpha: 0.5)
  context.setLineWidth(4.5)
  context.strokePath()

  // 5
  context.addLines(between: [laser.origin, laser.focus])
  context.setStrokeColor(red: 1.0, green: 0.0, blue: 0.0, alpha: 0.8)
  context.setLineWidth(3.0)
  context.strokePath()
}

// 6
context.restoreGState()
With this drawing code, you:
- Get the current graphics context.
- Push the current graphics state onto the stack.
- Loop through the lasers in the array.
- Draw a thicker white line in the direction of the laser.
- Then draw a slightly thinner red line over the white line to give it a cool laser effect.
- Pop the saved graphics state off the stack to restore the context to its original state.
That's it. Build and run time!
Tap anywhere on the screen to switch to Lasers mode.

Great job!
Where to Go From Here?
You can do all sorts of things with your newfound knowledge. Imagine combining face detection with depth data from the camera to create cool effects focused on the people in your photos. To learn more about using depth data, check out this tutorial on working with image depth maps and this tutorial on working with video depth maps.
Or how about trying out a Vision and Core ML tag team? That sounds really cool, right? If that piques your interest, we have a tutorial for that!
You could learn how to do face tracking using ARKit with this awesome tutorial.
There are, of course, plenty of other Vision APIs you can play with. Now that you have a foundational knowledge of how to use them, you can explore them all!
We hope you enjoyed this tutorial and, if you have any questions or comments, please join the forum discussion below!