Capturing Text From Camera Using SwiftUI

Learn how to capture text from the iPhone camera into your SwiftUI app so your users can enter data more quickly and easily. By Audrey Tam.


Your iPhone camera lets you capture landscapes, people and events, but it’s also a useful information-gathering tool. You see a concert poster or an ad for a service you need or a restaurant that looks interesting, and you take a photo. Later, you type or speak some text from the photo — a URL, a date, a phone number — into a search engine or a new contact.

But keyboard or voice input is error-prone. What if you could just copy and paste text from a photo or directly from the camera view? Better still, what if you could scan text in a photo or camera view directly into your app? Give a big welcome to iOS 15 Live Text!

In this tutorial, you’ll learn how to capture text from the iPhone camera into your SwiftUI app to let your users enter data more quickly and easily.

Note: You should be comfortable with using SwiftUI, Swift and Xcode to develop iOS apps. You’ll need Xcode 13 and an iPhone running iOS 15. Your iPhone must have been manufactured in 2018 or later, so it has an A12 (or later) Neural Engine. Its list of preferred languages must include at least one of these: English, Spanish, Chinese, French, Italian, German, Portuguese. And check this list for your region.

iOS 15 Live Text

The wonderful new iOS 15 Live Text feature works right out of the box, but only for fairly new boxes. And only for the languages and regions listed in the note above.

You need a newish (2018+) iPhone, one with an A12 or later Neural Engine. Live Text uses Apple’s Vision machine learning model, which needs the Neural Engine.

Live Text works on iPhone XS, iPhone XR and later. These iPhones do not support Live Text: iPhone SE (first generation), iPhone 6S, iPhone 6S Plus, iPhone 7, iPhone 7 Plus and iPhone X.

Live Text sort of works in Photos on 2018-or-later iPads, but not iPad Camera. This article is about using Live Text with Camera, so it’s only about iPhones.

Now grab your iPhone and make sure Live Text is turned on: In Settings, turn on Camera ▸ Live Text and General ▸ Language & Region ▸ Live Text:

Turn on Live Text in Settings.

Note: If you don’t see these settings, your iPhone doesn’t have a Neural Engine. If you’re looking for an excuse to get an iPhone 13, you’re welcome!

Live Text in Photos

Live Text detects text in photos and in the Camera ▸ Photo view finder. In this tutorial, you’ll use your iPhone’s camera. But first, see how great it is on your existing photos. It’s also easier to practice using Live Text on an image that isn’t moving around in a hand-held camera view.

Open Photos and find a photo that has some text, especially a URL, phone number, email or street address.

I have this photo I took at a yarn expo, as a reminder about a vendor I wanted to look up later.

Live Text in Photos: Tapping a URL opens it in Safari.

I tapped the Live Text button (three lines in a viewfinder square); it turned blue. Then I tapped tarndie.com, and their web page opened in Safari!

If there’s a map address in your photo, tapping it opens Maps. Tapping a phone number displays the usual menu to Call, Send Message, FaceTime etc.

Live Text in Photos: Tapping address or phone number

And if there’s an email address, tapping it opens a new message in your email app.

If you want to copy text from an app that doesn’t let you select text, just take a screenshot and open the preview photo:

Live Text in screenshot

Note: Thanks to Harshil Shah for tweeting this. The screen is from Chris Wu’s Museum Shuffle.

Live Text works the same way in the Camera app, but you need a steady hand. If you can’t quite get it to focus on what you want, just take a photo, then use Live Text on that.

Now keep reading to see how you can use Live Text in your apps.

Getting Started

Download the project materials using the Download Materials button at the top or bottom of this tutorial.

Open the WaitForIt project in the starter folder. This is a simple app where you can keep track of how long you have to wait for someone’s birthday. It uses the new Date.RelativeFormatStyle method relative(presentation:unitsStyle:).

WaitForIt
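To see what that formatting API does, here's a minimal sketch of relative(presentation:unitsStyle:), run outside the app; the 30-day offset is just an example value, not something from WaitForIt:

```swift
import Foundation

// A minimal sketch of Date.RelativeFormatStyle, the API WaitForIt uses.
// The 30-day offset is a hypothetical example value.
let nextBirthday = Calendar.current.date(
  byAdding: .day, value: 30, to: Date())!
let phrase = nextBirthday.formatted(
  .relative(presentation: .named, unitsStyle: .wide))
print(phrase)  // something like "in 1 month", depending on locale and date
```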

To get camera input, you must run this app on your iPhone-with-Neural-Engine.

Build and Run on Your Device

Connect your iPhone to your Mac with a cable. In the target’s Signing & Capabilities tab, customize the Bundle Identifier and set a Team.

Customize bundle ID and set team.

Select your iPhone from the run destination menu, then build and run.

Write or type your name and birthday in this format:

Name and birthday sample

Camera input works with handwriting but, in my experience, the writing needs to be really clear, more like careful printing than cursive.

Tap the + button. In the Add Person view, tap the Name text field then tap it a second time to show the Scan Text button:

Double-tap text field to show Scan Text button.

Note: You might see just the scan button icon, alongside the Paste button. This tends to happen after you’ve used the scan button a few times, and the system decides you don’t need the text label anymore.

Tap this button to open the camera and point the camera at your name and birthday text:

Point camera at text.

Brackets appear around detected text, and the detected text also appears in the text field. The brackets and text field text can change as your hand moves the camera, detecting different amounts of text.

You can tap to indicate where you want the camera to focus, and you can drag up on the camera view to enlarge it:

Drag up to enlarge camera view.

If you want only part of the detected text, tap the scan button in the lower right corner to display the text:

Tap scan button to display detected text.

Then tap or swipe to select what you want from the detected text:

Select from detected text.

And tap Insert to accept the text you selected:

Insert selected text.

Now add the birthday text in the same way, then tap Done to return to the list view:

New person added

It’s Magic

Now look at the code in AddPersonView.swift. There is absolutely nothing in the code about scanning text from the camera. This feature is part of iOS 15, and you get it for free in any editable view.

So what’s in the rest of this article? A couple of features to improve the user experience:

  • Filtering camera input for specific text content types like dates, phone numbers and email addresses.
  • Displaying a Scan Text button to make the camera input feature visible to your users.

You can also implement a Scan Text button to create an editable view that isn’t a text field or text view, like the image-with-label example in the WWDC presentation.

Filtering Text Content Types

You’re probably a bit underwhelmed by this scanning, tapping and swiping procedure. If your app is looking for a specific format of information — URL, date, email or phone number — you want the camera to “see” only the relevant text and ignore the rest.

Your apps might already specify keyboard type to make it more convenient for users to enter numbers or email addresses. Maybe you also specify text content type to guide the keyboard’s suggestions and autofill.

Good news: You can use text content type to filter camera text input!

Filtering Date Text

Start by adding this modifier to the second (Birthday) TextField in AddPersonView.swift:

.textContentType(.dateTime)

This tells the system you expect the input text to be some form of date, time or duration. The Neural Engine’s Vision model will use this hint to filter the camera’s input for date or time text.

There are several text content types related to a person’s name, so why don’t you modify the Name text field? Well, for now, camera input only works with a few text content types.

Of all the text content types in the Attributes inspector Text Content Type menu, the camera currently filters for only fullStreetAddress, telephoneNumber, emailAddress, URL, shipmentTrackingNumber, flightNumber and dateTime.

Camera auto-detects only the content types with check marks.
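As a sketch of how those filters look in SwiftUI — the form and its fields below are hypothetical, not part of WaitForIt:

```swift
import SwiftUI

// Hypothetical fields showing the content types the camera
// currently filters for. Only the modifier values matter here.
struct ScanFriendlyForm: View {
  @State private var address = ""
  @State private var phone = ""
  @State private var email = ""
  @State private var website = ""

  var body: some View {
    Form {
      TextField("Address", text: $address)
        .textContentType(.fullStreetAddress)
      TextField("Phone", text: $phone)
        .textContentType(.telephoneNumber)
      TextField("Email", text: $email)
        .textContentType(.emailAddress)
      TextField("Website", text: $website)
        .textContentType(.URL)
      // Also supported: .shipmentTrackingNumber, .flightNumber, .dateTime
    }
  }
}
```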

OK, time to see if your modifier helps.

Build and run on your device and tap the + button. In the Add Person view, tap the Birthday text field then tap it again:

Button label might be Scan Date or Time when textContentType is dateTime.

Note: As with the Scan Text button label, you might see Paste | scan-button-icon instead of Scan Date or Time. The date filter still works.

Now the camera only highlights text that relates to dates or times:

Camera detects only date or time.

As before, any detected date or time text appears immediately in the text field. You must still tap Insert to accept the text.

What a great way to speed up text input from the camera!

Note: I added an email address to try out that text content type. If you change dateTime to emailAddress, the camera will focus only on email addresses.

Display a Camera Button

Everything so far is all built into iOS 15. But you can add the relevant code to your apps, too.

For example, to make the camera input feature more visible to your users, you can add a button that sets the whole magical process in motion. Once you know how the magic happens, you can use it to scan text from the camera into views that aren’t text fields or text views.

Magic Method

The new method captureTextFromCamera(responder:identifier:) is the key to the magic, which starts when your app calls this method to launch the camera. The responder must be a UIResponder that conforms to UIKeyInput. A responder uses UIKeyInput methods to implement simple text entry.

Uh oh, UI prefixes … Yes indeed, captureTextFromCamera(responder:identifier:) is a UIAction, so you need a UIView to call it. You’ll create a UIButton that AddPersonView can display. You’ll set the action of this button to captureTextFromCamera(responder:identifier:). And the action’s responder will pass any text captured from the camera to a TextField in AddPersonView.

UIViewRepresentable

To create a UIView you can use in a SwiftUI app, you need to build a structure that conforms to UIViewRepresentable.

Note: Learn more about UIViewRepresentable in SwiftUI Apprentice and SwiftUI by Tutorials.

First, create a new Swift file and name it ScanButton. In this new file, replace import Foundation with the following code:

import SwiftUI

struct ScanButton: UIViewRepresentable {
  func makeUIView(context: Context) -> UIButton {
    let button = UIButton()
    return button
  }

  func updateUIView(_ uiView: UIButton, context: Context) { }
}

To conform to UIViewRepresentable, ScanButton must implement makeUIView(context:) and updateUIView(_:context:).

In this minimal form, makeUIView(context:) simply creates a UIButton. AddPersonView won’t update the button, so updateUIView(_:context:) is empty.

Coordinator

Tapping the button captures text, which ScanButton must pass to AddPersonView. To transfer data from a UIView to a SwiftUI View, ScanButton needs a coordinator.

Add this code inside ScanButton:

func makeCoordinator() -> Coordinator {
  Coordinator(self)
}

class Coordinator: UIResponder, UIKeyInput {
  let parent: ScanButton
  init(_ parent: ScanButton) { self.parent = parent }

  var hasText = false
  func insertText(_ text: String) { }
  func deleteBackward() { }
}

ScanButton calls makeCoordinator() before makeUIView(context:) and stores the Coordinator object in context.coordinator.

The action captureTextFromCamera(responder:identifier:) needs a UIResponder argument that conforms to UIKeyInput, so you make Coordinator a subclass of UIResponder and add the UIKeyInput protocol. Implementing this protocol will enable the coordinator to control text input.

UIKeyInput requires you to provide hasText, insertText(_:) and deleteBackward(). You want camera input and not keyboard input, so you only have to implement insertText(_:) to handle the camera input. The value of hasText doesn’t matter, so set it to false. And deleteBackward() doesn’t need to do anything.

The purpose of Coordinator is to pass text from the camera back to the SwiftUI view that calls ScanButton, so ScanButton needs a binding to a String property in the SwiftUI view.

Add this property at the top of ScanButton:

@Binding var text: String

AddPersonView will pass either $name or $birthday to ScanButton.

Now you can finish setting up Coordinator. Add this line to insertText:

parent.text = text

Yes, this really is all Coordinator needs to do!

Setting the Button’s Action

Now back to making your UIButton.

In makeUIView(context:), replace the button declaration with the following:

let textFromCamera = UIAction.captureTextFromCamera(
  responder: context.coordinator,
  identifier: nil)
let button = UIButton(primaryAction: textFromCamera)

You create a UIAction to capture text from the camera with your Coordinator object as the responder.

Then, you create the button with this action as its primary action. This sets the button’s title and image to the action’s title and image.

So what does this look like? Set up a preview to see…

Add this code, below ScanButton:

struct ScanButton_Previews: PreviewProvider {
  static var previews: some View {
    ScanButton(text: .constant(""))
      .previewLayout(.sizeThatFits)
  }
}

The preview canvas appears.

If the app is still running on your phone, stop it.

Select a simulator in the run destination menu, then press Option-Command-P or click Resume to see this:

Button preview

Here, sizeThatFits is actually the full screen size of the selected simulator. You’ll trim it down in AddPersonView by setting width and height values for its frame.

Adding ScanButton to AddPersonView

Now head back to AddPersonView.swift. There are two ways you can use ScanButton here.

One way is to display the button next to the text field.

Replace the Name TextField with this HStack:

HStack {
  TextField("Name", text: $name)
  ScanButton(text: $name)
    .frame(width: 100, height: 56, alignment: .leading)
}

You pass a binding to name, set the button’s width and height, and make sure the action’s image is at the leading edge of the button’s frame.

Refresh the preview to see your button:

Scan Text button next to text field

Another way to display the button is in the keyboard toolbar.

Add this modifier to the Birthday TextField:

.toolbar {
  ToolbarItemGroup(placement: .keyboard) {
    Spacer()
    ScanButton(text: $birthday)
  }
}

This time, you pass a binding to birthday. The toolbar constrains the button’s frame, and you insert Spacer() to push the button to the toolbar’s trailing edge.

Run Live Preview, then tap in the Birthday text field to see your button in the keyboard toolbar:

Scan Text button in keyboard toolbar

Note: Actually, this screenshot is from the simulator. The keyboard doesn’t appear in Live Preview in Xcode 13.

Now, turn off Live Preview and run the app on your device to try out both buttons.

Using Scan Text buttons

Your ScanButton doesn’t behave exactly like the built-in scan button. Text detected by the camera doesn’t immediately appear in the text field. Also, the Birthday Scan Text button ignores the text field’s text content type setting. At least, this happens with the RC versions of Xcode 13 and iOS 15. You might see better results in a later Xcode/iOS version.

Double-tapping in the Birthday text field, then using the built-in scan button, still filters for date text.

Scan Text Into Title

ScanButton helps your users know about this input option, but it’s not really necessary. Once users know about camera input, they’ll expect to use it with any text field or text view. And this just works, with no help required from you.

What if you want to scan text into something like a label? No problem, ScanButton doesn’t care where it inserts captured text: Simply give it a String variable to insert text into, then the calling view can use this String value anywhere it wants.

For example, you can use scanned text to change the navigation title of AddPersonView.

In ScanButton, add this @Binding:

@Binding var title: String

In insertText(_:), add this line:

parent.title = "Add \(text)"

And in previews, replace ScanButton(text:) with this:

ScanButton(text: .constant(""), title: .constant(""))

Now, AddPersonView.swift is complaining about the missing argument for parameter title.

Add this @State property to AddPersonView:

@State private var title = "Add a Person"

In the Name text field’s HStack, replace ScanButton(text:) with this line:

ScanButton(text: $name, title: $title)

In the Birthday text field’s toolbar, replace ScanButton(text:) with this line:

ScanButton(text: $birthday, title: .constant(title))

You don’t want the birthday text to affect the view’s navigation title.

Finally, change .navigationTitle("Add a Person") to this:

.navigationTitle(title)

Build and run on your phone, tap the + button, then tap the Scan Text button next to the Name text field. Scan in a name and insert it.

The navigation title changes to match the inserted name:

Scanned text in navigation bar title

This only works with your Scan Text button. Double-tapping in the Name text field, then using that scan button, won’t change the title.

Also check that scanning text into the Birthday text field doesn’t affect the navigation title.

Button Menu

If you want the Scan Text option to appear in a button menu, or you just want the scan button to have a smaller footprint, change the way you create ScanButton.

Modify makeUIView(context:) in ScanButton: Comment out the button declaration and add this code instead:

let button = UIButton()
button.setImage(
  UIImage(systemName: "camera.badge.ellipsis"),
  for: .normal)
button.menu = UIMenu(children: [textFromCamera])

You set the button image to a small SF Symbol and add the textFromCamera action to the button’s menu.

In AddPersonView.swift, change the Name text field’s ScanButton width to 56 (or less):

ScanButton(text: $name, title: $title)
  .frame(width: 56, height: 56, alignment: .leading)

Check this out in Live Preview, Simulator or on your device. Now, a long press shows the Scan Text option.

Scan Text as a menu option

Where To Go From Here?

Download the final project using the Download Materials button at the top or bottom of the tutorial.

In this tutorial, you practiced using Live Text in Photos and used the built-in scan-text function to get text input from your phone’s camera. Then, you specified the textContentType of a text field to make the Vision model filter camera input for text in that format.

To make the camera input feature more visible to your users, you created a UIViewRepresentable button to launch captureTextFromCamera(responder:identifier:). For text fields and text views, this provides functionality similar to the built-in scan-text button. You easily extended your button to scan text into a non-editable view. Finally, you modified your button to have a menu with scan-text as one of its items.

If you have any comments or questions, feel free to join in the forum discussion below!