Building an iOS App like Siri
Learn how to build an app that recognizes voice commands and speaks like Siri, using the SpeechKit framework. By Subhransu Behera.
Contents
Building an iOS App like Siri
30 mins
- What is SpeechKit?
- SpeechKit Framework Quick Reference
- Getting Started
- Getting a Yelp API Token
- Getting a Nuance Development Key
- Adding the SpeechKit Framework
- Speech Recognition with SKRecognizer
- Using Yelp Search to Find Matching Restaurants
- Text-to-speech Synthesis Using SKVocalizer
- Comparison with Other SDKs
- Where To Go From Here?
Using Yelp Search to Find Matching Restaurants
Once SpeechKit can recognize a voice, you need to identify the search keyword and find matching restaurants using Yelp’s API.
To do this, open ViewController.h and import the YelpAPIService.h header, which contains the definition of the YelpAPIService class:
#import "YelpAPIService.h"
Next, declare that the ViewController class conforms to YelpAPIServiceDelegate:
@interface ViewController : UIViewController <UITextFieldDelegate, UITableViewDelegate, UITableViewDataSource, SpeechKitDelegate, SKRecognizerDelegate, YelpAPIServiceDelegate>
The YelpAPIServiceDelegate protocol provides the loadResultWithDataArray: method, which is called when the Yelp search finishes and delivers an array of matching restaurants.
Next, declare a few properties and methods:
@property (strong, nonatomic) YelpAPIService *yelpService;
@property (strong, nonatomic) NSString* searchCriteria;
- (NSString *)getYelpCategoryFromSearchText;
- (void)findNearByRestaurantsFromYelpbyCategory:(NSString *)categoryFilter;
This declares a property for the YelpAPIService included in the starter project to interact with the Yelp API, a property for the search string, and two methods that will find Yelp search categories and matching restaurants.
Next, switch to ViewController.m and implement getYelpCategoryFromSearchText as follows:
- (NSString *)getYelpCategoryFromSearchText {
    NSString *categoryFilter;
    if ([[self.searchTextField.text componentsSeparatedByString:@" restaurant"] count] > 1) {
        NSCharacterSet *separator = [NSCharacterSet whitespaceAndNewlineCharacterSet];
        NSArray *trimmedWordArray = [[[self.searchTextField.text componentsSeparatedByString:@"restaurant"] firstObject] componentsSeparatedByCharactersInSet:separator];
        if ([trimmedWordArray count] > 2) {
            int objectIndex = (int)[trimmedWordArray count] - 2;
            categoryFilter = [trimmedWordArray objectAtIndex:objectIndex];
        }
        else {
            categoryFilter = [trimmedWordArray objectAtIndex:0];
        }
    }
    else if (([[self.searchTextField.text componentsSeparatedByString:@" restaurant"] count] <= 1)
             && self.searchTextField.text && self.searchTextField.text.length > 0) {
        categoryFilter = self.searchTextField.text;
    }
    return categoryFilter;
}
getYelpCategoryFromSearchText extracts a category or keyword from the search text by looking for a particular pattern. For example, if the user says “Japanese restaurants nearby”, “nearby Japanese restaurants”, “Japanese restaurant” or “Japanese restaurants”, it detects the keyword “Japanese” and passes that to the Yelp API.
The method takes the text that comes before the word “restaurant”, splits it on whitespace, and keeps the word that immediately precedes “restaurant”. If “restaurant” does not appear at all, the whole search text is used as the keyword. For a more complex application, a complete grammar may be specified to fit the context or search category. For many applications, the search text is simply whatever the user says.
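The keyword extraction described above can also be sketched as a standalone Foundation function, which is easier to experiment with outside the view controller. This is only an illustration, not the tutorial's implementation: the name CategoryFromSearchText is hypothetical, and unlike the method above it matches “restaurant” case-insensitively and ignores repeated spaces.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the starter project). Returns the word that
// precedes "restaurant"/"restaurants", or the whole trimmed text when that word
// never appears. Returns nil for nil or blank input.
static NSString *CategoryFromSearchText(NSString *text) {
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *trimmed = [text stringByTrimmingCharactersInSet:whitespace];
    if ([trimmed length] == 0) {
        return nil;
    }

    // Split on whitespace and drop the empty strings produced by repeated spaces.
    NSMutableArray *words = [NSMutableArray array];
    for (NSString *word in [trimmed componentsSeparatedByCharactersInSet:whitespace]) {
        if ([word length] > 0) {
            [words addObject:word];
        }
    }

    for (NSUInteger i = 0; i < [words count]; i++) {
        NSString *lower = [[words objectAtIndex:i] lowercaseString];
        if ([lower hasPrefix:@"restaurant"]) {
            // If "restaurant" is the first word there is nothing before it,
            // so fall back to the whole text.
            return (i > 0) ? [words objectAtIndex:i - 1] : trimmed;
        }
    }
    return trimmed;
}
```

With this sketch, both “Japanese restaurants nearby” and “nearby Japanese restaurants” yield “Japanese”, while a phrase without the word “restaurant”, such as “sushi”, is returned unchanged.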
Next add this new method:
- (void)findNearByRestaurantsFromYelpbyCategory:(NSString *)categoryFilter {
    if (categoryFilter && categoryFilter.length > 0) {
        if (([CLLocationManager authorizationStatus] != kCLAuthorizationStatusDenied)
            && self.appDelegate.currentUserLocation &&
            self.appDelegate.currentUserLocation.coordinate.latitude) {
            [self.tableViewDisplayDataArray removeAllObjects];
            [self.resultTableView reloadData];
            self.messageLabel.text = @"Fetching results..";
            self.activityIndicator.hidden = NO;
            self.yelpService = [[YelpAPIService alloc] init];
            self.yelpService.delegate = self;
            self.searchCriteria = categoryFilter;
            [self.yelpService searchNearByRestaurantsByFilter:[categoryFilter lowercaseString] atLatitude:self.appDelegate.currentUserLocation.coordinate.latitude andLongitude:self.appDelegate.currentUserLocation.coordinate.longitude];
        }
        else {
            UIAlertView *alert = [[UIAlertView alloc] initWithTitle:@"Location is Disabled"
                                                            message:@"Enable it in settings and try again"
                                                           delegate:nil
                                                  cancelButtonTitle:@"OK"
                                                  otherButtonTitles:nil];
            [alert show];
        }
    }
}
This method accepts a search category and uses the Yelp API to look for nearby restaurants that might quell the user’s hunger pangs. The category is usually a cuisine or type of restaurant, such as Chinese, Japanese, Barbecue, Sandwiches or Indian. The method passes it to the Yelp service along with the user’s latitude and longitude. If location access is denied or no location is available, it shows an alert instead.
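One detail worth noting: the location check in the method above tests coordinate.latitude for truthiness, so a latitude of exactly 0 (on the equator) would be treated as “no location”. A possible alternative, sketched below, is to validate the coordinate with Core Location's CLLocationCoordinate2DIsValid function. The helper name HasUsableLocation is hypothetical and is not part of the starter project.

```objc
#import <CoreLocation/CoreLocation.h>

// Hypothetical helper: YES when a location object exists and its coordinate
// lies within the valid latitude/longitude ranges. A latitude of 0 is valid.
static BOOL HasUsableLocation(CLLocation *location) {
    return location != nil && CLLocationCoordinate2DIsValid(location.coordinate);
}
```

In the method above, this could stand in for the last two conditions of the if statement, for example HasUsableLocation(self.appDelegate.currentUserLocation), assuming currentUserLocation is a CLLocation as its use suggests.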
Almost done. Add the following code to the recognizer:didFinishWithResults: delegate method, below the line self.recordButton.selected = !self.recordButton.isSelected;
// This will extract category filter from search text
NSString *yelpCategoryFilter = [self getYelpCategoryFromSearchText];
// This will find nearby restaurants by category
[self findNearByRestaurantsFromYelpbyCategory:yelpCategoryFilter];
The above code uses the two methods you just implemented to get a search category and then execute a Yelp search.
YelpAPIService calls loadResultWithDataArray: once it has a response, so as a final step, implement that method:
#pragma mark - Yelp API Delegate Method
- (void)loadResultWithDataArray:(NSArray *)resultArray {
    self.messageLabel.text = @"Tap on the mic";
    self.activityIndicator.hidden = YES;
    self.tableViewDisplayDataArray = [resultArray mutableCopy];
    [self.resultTableView reloadData];
}
Once the application has Yelp’s response, it reloads the table view with the results. tableView:cellForRowAtIndexPath: is already implemented in the sample project; it displays a thumbnail, name, address and rating for each restaurant as received from Yelp.
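The callback flow described above is the standard delegate pattern. The following is a minimal, self-contained sketch of that pattern using Foundation only. The names DemoSearchDelegate, DemoSearchService and DemoResultsReceiver are hypothetical stand-ins, and the real YelpAPIService in the starter project may be structured differently.

```objc
#import <Foundation/Foundation.h>

// Hypothetical protocol mirroring the single callback used in this tutorial.
@protocol DemoSearchDelegate <NSObject>
- (void)loadResultWithDataArray:(NSArray *)resultArray;
@end

// Hypothetical service: holds a weak reference to its delegate and notifies it
// when results are ready.
@interface DemoSearchService : NSObject
@property (weak, nonatomic) id<DemoSearchDelegate> delegate;
- (void)finishWithResults:(NSArray *)results;
@end

@implementation DemoSearchService
- (void)finishWithResults:(NSArray *)results {
    [self.delegate loadResultWithDataArray:results];
}
@end

// Hypothetical receiver: plays the role of ViewController and stores the rows
// it would hand to the table view.
@interface DemoResultsReceiver : NSObject <DemoSearchDelegate>
@property (strong, nonatomic) NSMutableArray *rows;
@end

@implementation DemoResultsReceiver
- (void)loadResultWithDataArray:(NSArray *)resultArray {
    self.rows = [resultArray mutableCopy];
}
@end
```

The delegate property is weak so that the service and the object that owns it do not retain each other.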
Build and run. Once the application launches, tap the Mic icon and say something like “Japanese Restaurants” or “Chinese Restaurants nearby”, or whatever kind of restaurant you’d like the app to find for you.
Note: If you don’t get any results, there may be no matching restaurants near your location in Yelp’s database. Try a restaurant type you are sure exists nearby.
Text-to-speech Synthesis Using SKVocalizer
Now you’re almost there! In this next exercise, you will learn how to use SKVocalizerDelegate for text-to-speech synthesis.
In ViewController.h, declare that ViewController conforms to SKVocalizerDelegate. The ViewController interface declaration should look like this:
@interface ViewController : UIViewController <UITextFieldDelegate, UITableViewDelegate, UITableViewDataSource, SpeechKitDelegate, SKRecognizerDelegate, YelpAPIServiceDelegate, SKVocalizerDelegate>
Declare these two properties:
@property (strong, nonatomic) SKVocalizer* vocalizer;
@property BOOL isSpeaking;
This declares a property for the vocalizer, and a BOOL that keeps track of whether text-to-speech is currently in progress.
Next, in ViewController.m, add the following code in the else section of the recordButtonTapped: method:
if (self.isSpeaking) {
    [self.vocalizer cancel];
    self.isSpeaking = NO;
}
When the user taps the record button, the above code stops the current speech, if there is one in progress, and cancels all pending speech requests.
Now, add the following code at the end of the loadResultWithDataArray: method:
if (self.isSpeaking) {
    [self.vocalizer cancel];
}
self.isSpeaking = YES;
// 1
self.vocalizer = [[SKVocalizer alloc] initWithLanguage:@"en_US" delegate:self];
if ([self.tableViewDisplayDataArray count] > 0) {
    // 2
    [self.vocalizer speakString:[NSString stringWithFormat:@"I found %lu %@ restaurants",
                                 (unsigned long)[self.tableViewDisplayDataArray count],
                                 self.searchCriteria]];
}
else {
    [self.vocalizer speakString:[NSString stringWithFormat:@"I could not find any %@ restaurants",
                                 self.searchCriteria]];
}
These lines of code configure a new SKVocalizer. Why? Text-to-speech synthesis uses SKVocalizer to make your application speak text; in this app, it announces the number of restaurants found.
This happens in two steps, marked // 1 and // 2 in the code above:
- First, you initialize the vocalizer object using initWithLanguage:delegate:.
- Second, you make the vocalizer object speak using the speakString: method.
You’ve already added the code required to initialize the vocalizer and speak the text in the previous step.
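As written, the format string passed to speakString: always uses the plural, so a single match is announced as “1 Japanese restaurants”. A small helper could build the sentence with the correct noun before it is handed to the vocalizer. The sketch below uses Foundation only; the function name SpokenSummary is hypothetical and is not part of SpeechKit or the starter project.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the sentence to speak for a given result count
// and search keyword, choosing the singular noun for exactly one match.
static NSString *SpokenSummary(NSUInteger count, NSString *criteria) {
    if (count == 0) {
        return [NSString stringWithFormat:@"I could not find any %@ restaurants", criteria];
    }
    NSString *noun = (count == 1) ? @"restaurant" : @"restaurants";
    return [NSString stringWithFormat:@"I found %lu %@ %@",
            (unsigned long)count, criteria, noun];
}
```

The result could then be passed to the vocalizer in a single call, for example [self.vocalizer speakString:SpokenSummary([self.tableViewDisplayDataArray count], self.searchCriteria)], replacing the if/else around the two speakString: calls.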
Next, add the SKVocalizer delegate methods, which track the speaking state and cancel the vocalizer if there’s an error:
- (void)vocalizer:(SKVocalizer *)vocalizer willBeginSpeakingString:(NSString *)text {
    self.isSpeaking = YES;
}
- (void)vocalizer:(SKVocalizer *)vocalizer didFinishSpeakingString:(NSString *)text withError:(NSError *)error {
    if (error != nil) {
        UIAlertView *alert = [[UIAlertView alloc] initWithTitle:@"Error"
                                                        message:[error localizedDescription]
                                                       delegate:nil
                                              cancelButtonTitle:@"OK"
                                              otherButtonTitles:nil];
        [alert show];
        if (self.isSpeaking) {
            [self.vocalizer cancel];
        }
    }
    self.isSpeaking = NO;
}
These methods are called when the vocalizer starts and stops speaking, respectively. Note that you set the isSpeaking flag appropriately in each.
Guess what – you’re finally done! Build and run, tap the Mic icon and ask the app to find restaurants, for example, “Japanese Restaurants” or “Chinese Restaurants nearby.” The results should be similar to your previous results, but this time SKVocalizer will make your application say how many restaurants it found.
Note: If you don’t hear SpeechKit saying anything, make sure your device isn’t in silent mode and that the volume is turned up. This is an easy mistake to make because earcons such as ‘listening’ and ‘done listening’ play even when the device is in silent mode, so you might not realize the device is muted.
Watch out – your app now talks back!