Image Recognition With ML Kit

See how to use the new ML Kit library from Google to easily enable machine learning capabilities in your Android app and perform image recognition. By Aldo Olivares.

A few months back, Google introduced ML Kit, a new API to help developers add machine learning (ML) capabilities to their apps. Thanks to ML Kit, adding ML to your app is super easy and no longer restricted to ML experts.

In this tutorial, you’ll learn how to use Google’s ML Kit in your Android apps by creating an app capable of detecting food in your photographs. By the end of this tutorial, you’ll have learned:

  • What Image Recognition is and how it is useful.
  • How to set up ML Kit with your Android app and Firebase.
  • How to run image recognition on-device and on-cloud.
  • How to use the results from running image recognition with ML Kit.

Note: This tutorial assumes you have basic knowledge of Kotlin and Android. If you’re new to Android, check out our catalog of Android tutorials. If you know Android, but are unfamiliar with Kotlin, take a look at Kotlin for Android: An Introduction.

If you’ve never used Firebase before, check out the Firebase Tutorial for Android.

Getting Started

Instagram is a site regularly used by food bloggers. People love taking food pictures to share with family and friends. But how do you know if the food is delicious or not?

The project you’re working on, Delicious Food, will allow you to take a picture of some food with your camera and identify if the food is as good as it looks.

Start by downloading the materials for this tutorial using the Download materials button at the top or bottom of this tutorial. Open Android Studio 3.1.4 or greater and, from the welcome screen, click Open an existing Android Studio project, then select the build.gradle file in the root of the DeliciousFoodStarterProject project.

If you explore the project, you will find two layout files in the layout folder (activity_main.xml, activity_splash.xml) and three Kotlin files in the java folder: MainActivity.kt, SplashActivity.kt and Extensions.kt.

The interface is already built for you, so you will only focus on writing code for this tutorial inside the MainActivity.kt file.

Build and run the app on a device or emulator.

Right now, it is an empty canvas with a button at the bottom. That is about to change! :]

Before diving into the project, first a little about image recognition.

Understanding Image Recognition

Image recognition, in the context of ML, is the ability of software to identify objects, places, people, writing and actions in images. Computers can use machine vision technologies, in combination with a camera and artificial intelligence software, to achieve image recognition. It is used to perform a large number of machine-based visual tasks, such as labeling the content of images with meta-tags.

Various types of image recognition are possible, including:

  • Image Labeling to classify common elements in pictures.
  • Text Recognition to process and recognize text from pictures.
  • Face Detection to help you know if a face is smiling, tilted, or frowning in pictures.
  • Barcode Scanning to read data encoded in standard barcode formats like QR Codes.
  • Landmark Recognition to identify popular places in images.

Current and Future Uses of Image Recognition

Image Labeling on Social Networks

If you have a Facebook or Instagram account, you might be familiar with face recognition.

Whenever you upload a photo, Facebook immediately suggests tagging some of your friends. Besides the tagging feature, image recognition describes image content for visually impaired people who use screen readers. It also helps to recognize inappropriate or offensive images.

There are privacy concerns around using people’s pictures to train ML and Artificial Intelligence (AI) technologies. Facebook states it only uses public pictures and not pictures from private accounts, but most users are not even aware of that usage.

Security and privacy aside, it’s always interesting to know how ML and AI work behind the scenes.

Organization of Pictures

Another popular use of image recognition is the automated organization of photo albums. Have you ever traveled to another country and ended up with hundreds of pictures stored on your phone?

Google Photos is a great example of such an app. It helps you organize your pictures in albums by identifying common places, objects, friends or even pets.

Image recognition improves the user experience of organizing photos inside the app, enabling better discovery with the ability to accurately search through images. This is possible thanks to new discoveries in ML technologies, which identify patterns and groups of objects.

Image recognition is also used commercially to organize pictures on stock photography websites, which provide photographers with a platform to sell their content.

A problem with stock photo websites before ML was that many photographers were not tech savvy, or they had thousands of pictures to upload. Manual image classification is very time consuming and tedious.

Image recognition is thus critical for stock photography websites. It makes life easier for contributors by providing instant keyword suggestions and categories. It also helps users by making visual content available for search engines.

Self-Driving Cars

In the past couple of years, self-driving cars have evolved dramatically. Companies like Uber use computer vision technologies to create different versions of self-driving vehicles ranging from delivery trucks to taxis.

Computer vision and AI are the main technologies used to power self-driving cars. Image recognition helps to predict the speed and location of other objects in motion on the roads.

Augmented Reality

Augmented Reality has long been one of the most researched topics due to its uses in fields like gaming and user experience (UX).

With the help of image recognition, you can superimpose digital information on top of objects that you can see in the world, providing rich user experiences and unprecedented interaction.

Pokémon Go, for example, uses augmented reality to put Pokémon in the landscape of places like the Eiffel Tower or the Empire State Building.

Now that you have some background on the possible use cases for image recognition, it’s time to learn about on-device and on-cloud APIs in Firebase.

On-Device vs. On-Cloud APIs

On-device APIs can process data quickly without the need for an Internet connection. This is useful if you don’t want to consume the mobile data of your users and you need fast processing.

The main drawback is the lower confidence of the results provided by ML. The confidence is a value showing how sure the ML algorithm is of the answer it provided. On-device APIs only have so much information to consult, so don’t be surprised if your device thinks that photo of a hot dog is a hamburger.
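To make the idea of confidence more concrete, here is a minimal sketch in plain Kotlin of how an app might keep only the labels it trusts. The Label class, the confidentLabels function and the sample values are hypothetical stand-ins for illustration, not ML Kit types, and the sketch assumes confidence is reported as a number between 0.0 and 1.0.

```kotlin
// Hypothetical stand-in for a label returned by an image labeler.
// This is NOT an ML Kit class; it only models a name and a confidence score.
data class Label(val text: String, val confidence: Float)

// Keeps only the labels at or above the threshold, most confident first.
fun confidentLabels(labels: List<Label>, threshold: Float): List<Label> =
    labels
        .filter { it.confidence >= threshold }
        .sortedByDescending { it.confidence }

fun main() {
    // Made-up sample results for a photo of a hot dog.
    val labels = listOf(
        Label("Hamburger", 0.62f),
        Label("Hot dog", 0.58f),
        Label("Food", 0.93f),
        Label("Table", 0.31f)
    )

    // With a threshold of 0.6, the "Hot dog" and "Table" labels are dropped.
    for (label in confidentLabels(labels, threshold = 0.6f)) {
        println("${label.text}: ${label.confidence}")
    }
}
```

Picking the threshold is a trade-off: a higher value hides more wrong guesses but may also hide the right answer when the model is unsure.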

On-cloud APIs offer much more powerful processing capabilities thanks to Google Cloud Platform’s ML technology, but these APIs require an Internet connection to work. Google Cloud Platform also requires payment after the first 1,000 requests.

You can read a comparison of on-device and on-cloud APIs here, provided by Google:
Image Recognition on device vs in cloud

On-device APIs have 400+ labels for each category. They cover the most commonly found concepts in photos (like ‘food’). Cloud APIs have more than 1,000 labels in many categories, making it more likely that you get an accurate result.

Overall, the recommendation on which to use, on-device or on-cloud, is that you first carefully analyze your project needs. If you believe that you will need a high level of accuracy and money is not an issue, then go for the on-cloud APIs. Otherwise, stick with the on-device APIs; they are usually enough for most projects.
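The recommendation above can be sketched as a small decision helper. This is only an illustration of the reasoning, assuming three yes/no questions about your project; the Labeler enum, the ProjectNeeds class and the chooseLabeler function are hypothetical names and are not part of ML Kit or Firebase.

```kotlin
// Hypothetical names for the two options discussed in this section.
enum class Labeler { ON_DEVICE, ON_CLOUD }

// Hypothetical summary of what a project requires.
data class ProjectNeeds(
    val needsHighAccuracy: Boolean,
    val hasCloudBudget: Boolean,
    val mustWorkOffline: Boolean
)

// On-cloud only makes sense when accuracy matters, the budget allows it
// and an Internet connection can be assumed. Otherwise, stay on-device.
fun chooseLabeler(needs: ProjectNeeds): Labeler =
    if (needs.needsHighAccuracy && needs.hasCloudBudget && !needs.mustWorkOffline) {
        Labeler.ON_CLOUD
    } else {
        Labeler.ON_DEVICE
    }

fun main() {
    val hobbyApp = ProjectNeeds(
        needsHighAccuracy = false,
        hasCloudBudget = false,
        mustWorkOffline = true
    )
    println(chooseLabeler(hobbyApp))
}
```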

Note: Since this project is rather simple, all the code lives inside the MainActivity.kt file. However, for more complex projects, you will normally use design patterns like MVC or MVVM. Remember that activities should focus on interacting with the UI rather than your API or database.