
Machine Learning by Tutorials

Second Edition · iOS 13 · Swift 5.1 · Xcode 11


Section I: Machine Learning with Images


4. Getting Started with Python & Turi Create
Written by Audrey Tam & Matthijs Hollemans


Congratulations! If you’ve made it this far, you’ve developed a strong foundation for absorbing machine learning material. However, before we can move forward, we need to address the 10,000-pound snake in the room… Python. Until this point, you’ve made do with Xcode and Swift; however, if you’re going to get serious about machine learning, it’s best to prepare yourself to learn some Python. In this chapter:

  • You’ll learn how to set up and use tools from the Python ecosystem for data science and machine learning (ML).
  • You’ll install Anaconda, a very popular distribution of Python (and R).
  • You’ll use terminal commands to create ML environments which you’ll use throughout this book.
  • Finally, you’ll use Jupyter Notebooks, which are very similar to Swift Playgrounds, to explore the Python language, data science libraries, and Turi Create, Apple’s open-source Python library for creating ML models that you can export to Core ML.

Starter folder

The starter folder for this chapter contains:

  • A notebook folder: The sample Jupyter Notebook data files.
  • .yaml files: Used to import pre-configured environments, if you want to skip the instructions for configuring the environments yourself.

Python

Python is the dominant programming language used for data science and machine learning. As such, the Python community has a myriad of tools available to support data science and machine learning development.

Packages and environments

Python is already installed on macOS. However, using this installation may cause version conflicts because some people use Python 2.7 while others use Python 3.x, which are incompatible branches of the same language. To further complicate things, working on machine learning projects requires integrating the correct versions of numerous software libraries, also known as “packages”.
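One concrete example of that incompatibility is division: the same expression can give different answers in the two branches. The snippet below assumes Python 3; the comments note what Python 2.7 did instead.

```python
# Python 3 semantics. Under Python 2.7, 7 / 2 evaluated to 3 (integer division),
# and print was a statement rather than a function.
true_division = 7 / 2    # 3.5 in Python 3
floor_division = 7 // 2  # 3 in both branches
print(true_division, floor_division)  # 3.5 3
```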

Conda

The data science community developed Conda to make life easier. Conda handles Python language versions, Python packages, and associated native libraries. It’s both an environment manager and a package manager. And if you need a package that Conda doesn’t know about, you can use pip within a Conda environment to grab the package.

Installing Anaconda

In a browser, navigate to https://www.anaconda.com/download/#macos and download the 64-bit Command Line installer with Python 3.7.

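# Run the installer from the folder you downloaded it to
sh Anaconda3-2019.07-MacOSX-x86_64.sh

# Check that conda is available
conda --version
# If the shell can't find conda, add Anaconda's bin directory to your PATH
export PATH="/Users/<username>/anaconda3/bin":"${PATH}"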

Using Anaconda Navigator

Anaconda comes with a desktop GUI, Anaconda Navigator, that you can use to create environments and install packages in them. In this book, however, you’ll do everything from the command line, so it’s worth going over some basic Conda commands first.

Useful Conda commands

As mentioned before, Conda is a package and environment management system. When working with Python projects, you’ll often find it useful to create a new environment and install only the packages you need before writing your code. In this section, we’ll explore Conda commands you’ll reuse many times when working with Python.

Basic workflow

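The basic workflow looks like this:

# Create a new, empty environment
conda create -n <env name>
# Create a new environment by cloning an existing one
conda create -n <new env name> --clone <existing env name>
# Create an environment from a .yaml file
conda env create -f <.yaml file>
# Activate an environment; your prompt then starts with its name
conda activate <env name>
(envname) $
# Install packages in the active environment
conda install <pkg names>
# Install packages in another environment without activating it
conda install -n <env name> <pkg names>
# Install the packages listed in a requirements file, using pip
pip install -r requirements.txt
# Start Jupyter in a directory
jupyter notebook <directory path>
# Deactivate the active environment
conda deactivate
# Remove an environment and all its packages (either command works)
conda remove -n <env name> --all
conda env remove -n <env name>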
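From inside Python, for example in a notebook, you can double-check which environment you’re in. This sketch relies on the CONDA_DEFAULT_ENV variable that `conda activate` sets; the helper name active_conda_env is our own, not part of Conda.

```python
import os
import sys

def active_conda_env():
    """Return the active conda environment's name, or None if none is active."""
    # `conda activate` sets CONDA_DEFAULT_ENV to the environment's name
    return os.environ.get("CONDA_DEFAULT_ENV")

print("Active conda environment:", active_conda_env() or "none")
# The interpreter path shows whether this is the system Python or an environment's
print("Python executable:", sys.executable)
print("Python version: %d.%d.%d" % sys.version_info[:3])
```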

Listing environments or packages

List the environments you’ve created; the one with the * is the currently active environment:

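# List environments (both commands are equivalent)
conda info --envs
conda env list
# List all packages in the active environment, or just one package
(activeenv) $ conda list
(activeenv) $ conda list <package name>
# List packages in a specific environment, or just one package
conda list -n <env name>
conda list -n <env name> <package name>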

Setting up a base ML environment

In this section, you’ll set up some environments. If you prefer a quicker start, create an environment from mlenv.yaml and skip down to the Jupyter Notebooks section. You can do this by importing mlenv.yaml into Anaconda Navigator or by running the following command from a Terminal window:

conda env create -f starter/mlenv.yaml

Python libraries for data science

Begin by creating a custom base environment for ML, with NumPy, Pandas, Matplotlib, SciPy and scikit-learn. You’ll be using these data science libraries in this book, but they’re not automatically included in new Conda environments.

conda create -n mlenv python=3.7
conda activate mlenv
conda install numpy pandas matplotlib seaborn scipy scikit-learn scikit-image ipython jupyter

An important note about package versions

Technology moves fast, and the Python world is no exception. By the time you read this book, newer versions of the packages we’re using will likely be available, and these newer versions may not be 100% compatible with older ones.
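If something behaves differently from what the book describes, a good first step is to check which versions you actually have. Here’s a minimal sketch, assuming the import names of the libraries installed in mlenv above; installed_versions is a hypothetical helper, and it relies on the common (but not universal) `__version__` attribute.

```python
import importlib

# Import names of the libraries installed in mlenv (scikit-learn imports as sklearn)
LIBRARIES = ["numpy", "pandas", "matplotlib", "scipy", "sklearn"]

def installed_versions(names):
    """Map each import name to its version string, or None if it isn't installed."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most libraries expose __version__; fall back if one doesn't
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

for name, version in installed_versions(LIBRARIES).items():
    print("%-12s %s" % (name, version or "not installed"))
```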

Jupyter Notebooks

With Jupyter Notebooks, which are a lot like Swift Playgrounds, you can write and run code, and you can write and render Markdown to explain the code.

Starting Jupyter

From Terminal, first activate your environment and then start Jupyter:

$ conda activate mlenv
$ jupyter notebook
/anaconda3/envs/mlenv/bin/jupyter_mac.command ; exit;

Pandas and Matplotlib

The notebook has a single empty cell. In that cell, type the following lines:

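import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the records in corpus.json into a DataFrame and show the first rows
data = pd.read_json('corpus.json', orient='records')
data.head()

# IPython syntax: show the documentation for a method
?data.tail

# Show the last three rows
data.tail(3)

# Sort the rows alphabetically by title
data.sort_values(by='title')

# Count how many entries each author has, most frequent first
authors = data.author
freq = authors.value_counts()
freq

# Plot a histogram of those counts, using 100 bins
plt.hist(freq, bins=100)
plt.show()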
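If sort_values(), value_counts() and the histogram feel opaque, here is roughly the same computation in plain Python with collections.Counter. The records are hypothetical, shaped like the rows in corpus.json (assuming author and title fields, as the code above uses). Note that plt.hist(freq) bins the counts themselves, so its bars show how many authors fall at each entry count, with nearby counts grouped into bins.

```python
from collections import Counter

# Hypothetical records with the same fields as corpus.json
records = [
    {"author": "Austen", "title": "Emma"},
    {"author": "Austen", "title": "Persuasion"},
    {"author": "Dickens", "title": "Bleak House"},
    {"author": "Austen", "title": "Sense and Sensibility"},
    {"author": "Shelley", "title": "Frankenstein"},
]

# Like data.sort_values(by='title'): rows ordered by title
titles = [r["title"] for r in sorted(records, key=lambda r: r["title"])]
print(titles)
# ['Bleak House', 'Emma', 'Frankenstein', 'Persuasion', 'Sense and Sensibility']

# Like data.author.value_counts(): entries per author, most frequent first
freq = Counter(r["author"] for r in records).most_common()
print(freq)  # [('Austen', 3), ('Dickens', 1), ('Shelley', 1)]

# What the histogram summarizes: for each entry count, how many authors have it
authors_per_count = Counter(count for _, count in freq)
print(sorted(authors_per_count.items()))  # [(1, 2), (3, 1)]
```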

Differences between Python and Swift

In this section, you’ll spend some time getting familiar with common Python syntax.

a, b, c = 5, 5, 3  # example values so the snippet runs
if a == b:
    print('a and b are equal')
    if a > c:
        print('and a is also greater than c')
if authors is None:
    print('authors is None')
else:
    print('authors is not None')
authors is not None
def mysum(x, y):
    result = x + y
    return result

print(mysum(1, 3))
mylist = [1, 2]
mylist.append(3)
if mylist:
    print('mylist is not empty')

for value in mylist:
    print(value)

print('List length: %d' % len(mylist))
mylist is not empty
1
2
3
List length: 3
for value in mylist:
    print(value)

    print('List length: %d' % len(mylist))
1
List length: 3
2
List length: 3
3
List length: 3
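The `if mylist:` test above works because Python treats empty containers, zero and None as false in a condition, whereas Swift requires an actual Bool there. A short sketch of which values count as false, plus an f-string as an alternative to the `%` formatting used above:

```python
# Python treats these values as False in an if or while condition
falsy_values = [None, False, 0, 0.0, "", [], {}, set()]
print(all(not value for value in falsy_values))  # True

# A non-empty list is truthy, which is why `if mylist:` works
mylist = [1, 2, 3]
if mylist:
    # An f-string (Python 3.6+) is an alternative to 'List length: %d' % len(mylist)
    print(f"List length: {len(mylist)}")  # List length: 3
```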

Transfer learning with Turi Create

Despite the difference in programming languages, deep down Turi Create shares a lot with Create ML, including transfer learning. With Turi Create v5, you can even do transfer learning with the same VisionFeaturePrint_Scene model that Create ML uses.

Creating a Turi Create environment

First, you need a new environment with the turicreate package installed. You’ll clone the mlenv environment to create turienv and then install turicreate in the new environment. Conda doesn’t know about turicreate, so you’ll have to pip install it from within Terminal.

conda create -n turienv --clone mlenv
#
# To activate this environment, use:
# > conda activate turienv
#
# To deactivate an active environment, use:
# > conda deactivate
#
conda activate turienv
pip install -U turicreate==5.8

List pip-installed packages

In Terminal, use these commands to list all of the packages in the active environment, or only a specific package:

conda list
conda list coremltools
# packages in environment at /Users/amt1/anaconda3/envs/mlenv:
#
# Name                    Version                   Build  Channel
coremltools               3.0                       <pip>

Turi Create notebook

Note: If you skipped the manual environment setup and imported turienv.yaml into Anaconda Navigator, use the Jupyter Launch button on the Anaconda Navigator Home Tab instead of the command line below, then navigate in the browser to starter/notebook.

jupyter notebook <drag the starter/notebook folder in Finder to here>

import turicreate as tc
import matplotlib.pyplot as plt
train_data = tc.image_analysis.load_images("snacks/train",
                                           with_path=True)
len(train_data)
train_data.head()
The first rows in the SFrame

train_data.explore()
Explore the training images

plt.imshow(train_data[0]["image"].pixel_data)
Looking at an image with matplotlib

snacks/train/hot dog/8ace0d8a912ed2f6.jpg
# Grab the full path of the first training example
path = train_data[0]["path"]
print(path)

# Find the class label
import os
os.path.basename(os.path.split(path)[0])
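To see what each step of that expression does, here it is applied to the example path shown above, with pathlib as an equivalent alternative:

```python
import os
from pathlib import Path

path = "snacks/train/hot dog/8ace0d8a912ed2f6.jpg"

# os.path.split separates the last component from the rest of the path
folder, filename = os.path.split(path)
print(folder)    # snacks/train/hot dog
print(filename)  # 8ace0d8a912ed2f6.jpg

# The basename of the containing folder is the class label
label = os.path.basename(folder)
print(label)     # hot dog

# pathlib gives the same answer: the name of the parent directory
assert Path(path).parent.name == label
```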

Getting the class labels

OK, now you know how to extract the class name for a single image, but there are over 4,800 images in the dataset. As a Swift programmer, your initial instinct may be to use a for loop, but if you’re really Swift-y, you’ll be itching to use a map function. Each column of an SFrame is an SArray, which has a handy apply() method that, like Swift’s map, applies a function to every element in the column and returns the results as a new SArray:

train_data["path"].apply(lambda path: ...)  # replace ... with an expression that uses path
train_data["label"] = train_data["path"].apply(lambda path:
    os.path.basename(os.path.split(path)[0]))
The SFrame now has a new column

train_data["label"].summary()
Summary for the label column

train_data["label"].value_counts().print_rows(num_rows=20)
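Because apply() takes any one-argument function, a named function works as well as a lambda, for example train_data["path"].apply(label_from_path). The sketch below mimics apply() and value_counts() on an ordinary Python list; label_from_path and the file paths are hypothetical, following the snacks/train folder layout.

```python
import os
from collections import Counter

def label_from_path(path):
    """The class label is the name of the folder that holds the image."""
    return os.path.basename(os.path.split(path)[0])

# Hypothetical paths with the same folder layout as snacks/train
paths = [
    "snacks/train/hot dog/a.jpg",
    "snacks/train/apple/b.jpg",
    "snacks/train/apple/c.jpg",
    "snacks/train/banana/d.jpg",
]

# Like train_data["path"].apply(label_from_path): one label per path
labels = list(map(label_from_path, paths))
print(labels)  # ['hot dog', 'apple', 'apple', 'banana']

# Like train_data["label"].value_counts(): images per label, most common first
print(Counter(labels).most_common())  # [('apple', 2), ('hot dog', 1), ('banana', 1)]
```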

Let’s do some training

Once you have your data in an SFrame, training a model with Turi Create takes only a single line of code (OK, it’s three lines, but only because we have to fit it on the page):

model = tc.image_classifier.create(train_data, target="label",
                                   model="VisionFeaturePrint_Scene",
                                   verbose=True, max_iterations=50)
# Alternatively, skip training and load a model saved earlier with model.save()
model = tc.load_model("HealthySnacks.model")

Validation

After 15 iterations, validation accuracy is close to training accuracy, at ~90%. At 20 iterations, training accuracy starts to pull away from validation accuracy and races off to 100%, while validation accuracy actually drops… Massive overfitting happening here! If the validation accuracy gets worse while the training accuracy keeps improving, you’ve got an overfitting problem.
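That rule of thumb is easy to express in code: scan the per-iteration accuracies for the first point where training accuracy rises while validation accuracy falls, and note where validation accuracy peaks. This is a rough sketch; the function names and the accuracy values are hypothetical, made up for illustration rather than taken from the training run described above.

```python
def first_overfitting_iteration(train_acc, val_acc):
    """First iteration (1-based) where validation accuracy drops while
    training accuracy still improves, or None if that never happens."""
    for i in range(1, len(val_acc)):
        if train_acc[i] > train_acc[i - 1] and val_acc[i] < val_acc[i - 1]:
            return i + 1
    return None

def best_iteration(val_acc):
    """Iteration (1-based) with the highest validation accuracy."""
    return max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1

# Hypothetical per-iteration accuracies, NOT the chapter's training log
train = [0.70, 0.80, 0.88, 0.93, 0.97, 0.99]
val = [0.68, 0.78, 0.86, 0.87, 0.85, 0.84]
print(first_overfitting_iteration(train, val))  # 5
print(best_iteration(val))  # 4
```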

Testing

Run these commands to load the test dataset and get the class labels:

test_data = tc.image_analysis.load_images("snacks/test", with_path=True)

test_data["label"] = test_data["path"].apply(lambda path:
    os.path.basename(os.path.split(path)[0]))

len(test_data)
metrics = model.evaluate(test_data)
print("Accuracy: ", metrics["accuracy"])
print("Precision: ", metrics["precision"])
print("Recall: ", metrics["recall"])
print("Confusion Matrix:\n", metrics["confusion_matrix"])
Accuracy:  0.8697478991596639
Precision:  0.8753552272362406
Recall:  0.8695450680272108
Confusion Matrix:
+--------------+-----------------+-------+
| target_label | predicted_label | count |
+--------------+-----------------+-------+
|  ice cream   |      candy      |   1   |
|    apple     |      banana     |   3   |
|    orange    |    pineapple    |   2   |
|    apple     |    strawberry   |   1   |
|  pineapple   |      banana     |   1   |
|  strawberry  |      salad      |   2   |
|   popcorn    |      waffle     |   1   |
|    carrot    |      salad      |   2   |
|    orange    |    watermelon   |   1   |
|   popcorn    |     popcorn     |   36  |
+--------------+-----------------+-------+
[107 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
The confusion matrix
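Each row of the confusion matrix counts how often images of target_label were predicted as predicted_label, and the three metrics can be derived from those counts. The sketch below assumes that Turi Create’s precision and recall are macro-averaged (computed per class, then averaged), which would explain why they differ slightly from accuracy. The function metrics_from_confusion and its two-class counts are hypothetical, not the chapter’s test results.

```python
from collections import defaultdict

def metrics_from_confusion(rows):
    """rows: (target_label, predicted_label, count) tuples, the same columns as
    the confusion-matrix SFrame. Returns (accuracy, macro precision, macro recall)."""
    total = correct = 0
    true_pos = defaultdict(int)   # images of class c that were predicted as c
    predicted = defaultdict(int)  # images predicted as class c
    actual = defaultdict(int)     # images whose true class is c
    for target, pred, count in rows:
        total += count
        actual[target] += count
        predicted[pred] += count
        if target == pred:
            correct += count
            true_pos[target] += count
    classes = set(actual) | set(predicted)
    # A class that is never predicted (or never occurs) contributes 0 here
    precision = sum(true_pos[c] / predicted[c] for c in classes if predicted[c]) / len(classes)
    recall = sum(true_pos[c] / actual[c] for c in classes if actual[c]) / len(classes)
    return correct / total, precision, recall

# Hypothetical two-class counts, NOT the chapter's test results
rows = [
    ("apple", "apple", 8), ("apple", "banana", 2),
    ("banana", "banana", 9), ("banana", "apple", 1),
]
accuracy, precision, recall = metrics_from_confusion(rows)
print(round(accuracy, 4), round(precision, 4), round(recall, 4))  # 0.85 0.8535 0.85
```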

Exporting to Core ML

In the next cell, type this command and press Shift-Enter to run it:

model
Class                                    : ImageClassifier

Schema
------
Number of classes                        : 20
Number of feature columns                : 1
Input image shape                        : (3, 299, 299)

Training summary
----------------
Number of examples                       : 4590
Training loss                            : 1.2978
Training time (sec)                      : 174.5081
model.save("HealthySnacks.model")
model.export_coreml("HealthySnacks.mlmodel")

Shutting down Jupyter

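To shut down Jupyter, click the Logout button in this browser window and also in the window showing your ML directory. If you started Jupyter from Terminal, also press Control-C in that Terminal window to stop the notebook server.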

Deactivating the active environment

If you activated turienv at the terminal command line, enter this command to deactivate it:

conda deactivate

Docker and Colab

There are two other high-level tools for supporting machine learning in Python: Docker and Google Colaboratory. These can be useful for developing machine learning projects, but we’re not covering them in detail in this book.

Docker

Docker is like a virtual machine but simpler. It’s a container-based system that lets you package environments into modular, reusable units, and it’s a fundamental building block for scaling services and applications on the Internet efficiently. Installing Docker gives you access to a large number of ML resources distributed as Docker images, such as Jupyter notebooks like hwchong/kerastraining4coreml or Python projects like the bamos/openface face recognition model. Our Beginning Machine Learning with Keras & Core ML (bit.ly/36cS6KU) tutorial builds and runs a keras-mnist Docker image, and you can get comfortable using Docker with our Docker on macOS: Getting Started tutorial (bit.ly/2os0KnY).

Search Docker Hub for image classifier

Google Colaboratory

Google Research’s Colaboratory at colab.research.google.com is a Jupyter Notebook environment that runs in a browser. It comes with many of the machine learning libraries you’ll need already installed. Its best feature is that you can set a notebook’s runtime type to GPU to use Google’s GPUs for free. It even lets you use Google’s TPUs (tensor processing units).

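from google.colab import drive
# Mount your Google Drive inside the Colab runtime
drive.mount('/content/drive/')

# In a notebook cell, ! runs a shell command; here it lists a folder on the mounted Drive
!ls "/content/drive/My Drive/machine-learning/snacks"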

Key points

  • Get familiar with Python. Its widespread adoption among academics in the machine learning field means that if you want to keep up to date with machine learning, you’ll have to get on board.
  • Get familiar with Conda. It makes working with Python significantly more pleasant by letting you try Python libraries in a controlled environment without damaging any existing environment.
  • Get familiar with Jupyter Notebooks. Like Swift Playgrounds, they provide a quick way to test all things Python, especially when used in combination with Conda.

Where to go from here?

You’re all set to continue learning about machine learning for image classification using Python tools. The next chapter shows you a few more Turi Create tricks. After that, you’ll be ready to learn how to create your own deep learning model in Keras.

© 2024 Kodeco Inc.
