Machine Learning by Tutorials

6. Taking Control of Training with Keras
Written by Matthijs Hollemans

In the previous chapters, you’ve learned how to train your own models using Create ML and Turi Create. These are user-friendly tools that are easy to get started with — you don’t really have to write a lot of code and they take care of most of the details. With just a few lines you can load your data, train your model and export to Core ML.

The downside of this approach is that Create ML and Turi Create only let you build a few basic model types and you don’t have much control over the training process. This is fine if you’re just getting your feet wet with machine learning. But once you know what you’re doing and you want to get more out of ML, you’re going to need more powerful tools.

In this chapter, you’ll use a popular deep learning tool called Keras to train the snacks classifier. Keras gives you much more control over the design of the models and how they are trained. Once you know your way around Keras, you’ll be able to build any kind of neural network you want.

Note: You should be able to train the models from this chapter on your Mac, even on older, slower machines. The models are small enough to be trained on the CPU and don’t need GPU acceleration — only a little patience.

Keras runs on top of a so-called backend that performs the actual computations. The most popular of these is TensorFlow, so that is what you’ll be using. TensorFlow is one of the most widely used machine-learning tools. However, it can be a little tricky to use due to its low-level nature. Keras makes using TensorFlow a lot easier.

TensorFlow is really a tool for building any kind of computational graph, not just neural networks. Instead of neural network layers, TensorFlow deals with rudimentary mathematical operations such as matrix multiplications and taking derivatives. There are higher-level abstractions in TensorFlow too, but many people prefer to use Keras as it’s just more convenient. In fact, Keras is so popular there is now a version of Keras built into TensorFlow.

Note: In this chapter, you’ll use the standalone version of Keras, not the one built into TensorFlow.

Getting started

First, you need to set up a Python environment for running Keras. The quickest way is to perform these commands from a Terminal window:

$ cd /path/to/chapter/resources
$ conda env create --file=starter/kerasenv.yaml
$ conda activate kerasenv
$ jupyter notebook

If you downloaded the snacks dataset for a previous chapter, copy or move it into the starter folder. Otherwise, double-click starter/snacks-download-link.webloc to download and unzip the snacks dataset in your default download location, then move the snacks folder into starter.

Note: In this book we’re using Keras version 2.2.4 and TensorFlow version 1.14. Keras, like many open source projects, changes often and sometimes new versions are incompatible with older ones. If you’re using a newer version of Keras and you get error messages, please install version 2.2.4 into your working environment. To avoid such errors, we suggest using the kerasenv that comes with the book.

Tip: If your computer runs Linux and has an NVIDIA GPU that supports CUDA, edit kerasenv.yaml and replace tensorflow=1.14 with tensorflow-gpu=1.14. Or if you have already created the environment, run pip install -U tensorflow-gpu==1.14. This will install the GPU version of TensorFlow, which runs a lot faster.

Back to basics with logistic regression

One of the key topics in this book is transfer learning: a logistic regression model is trained on top of features extracted from the training images. In the case of Create ML, the features were extracted by the very powerful “Vision FeaturePrint.Scene” neural network that is built into iOS 12. In the case of Turi Create, the feature extractor you used was the somewhat less powerful SqueezeNet.

A quick refresher

Logistic regression is a statistical model used in machine learning that tries to find a straight line between your data points that best separates the classes.

Logistic regression

Let’s talk math

In the above illustration, data points are two dimensional: They have two coordinates, x[0] and x[1]. In most machine-learning literature and code, x is the name given to the training examples.

y = a*x + b
y = a[0]*x[0] + a[1]*x[1] + b
The values of y

func formula(x0: Double, x1: Double) -> Double {
  let a0 = 1.2
  let a1 = -1.5
  let b = 0.2
  return a0*x0 + a1*x1 + b
}

Into the 150,000th dimension

Two-dimensional data is easy enough to understand, but how does this work when you have data points with 150,000 or more dimensions? You just keep adding coefficients to the formula:

y = a[0]*x[0] + a[1]*x[1] + a[2]*x[2]
	 + ... + a[149999]*x[149999] + b
y = dot(a, x) + b
func dot(_ v: [Double], _ w: [Double]) -> Double {
  var sum: Double = 0
  for i in 0..<v.count {
    sum += v[i] * w[i]
  }
  return sum
}

From linear to logistic

To turn the linear regression formula into a classifier, you extend the formula to make it a logistic regression:

probability = sigmoid(dot(a, x) + b)
sigmoid(x) = 1 / (1 + exp(-x))
The logistic sigmoid function
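
To make the two formulas above concrete, here is a minimal Python sketch. It reuses the illustrative coefficients from the Swift formula function earlier in this chapter. The helper name predict_probability and the sample input are assumptions made for this example, not part of Keras.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dot(v, w):
    # Sum of the element-wise products of two equal-length lists.
    return sum(vi * wi for vi, wi in zip(v, w))

def predict_probability(a, x, b):
    # Logistic regression: the linear formula followed by the sigmoid.
    return sigmoid(dot(a, x) + b)

# Illustrative coefficients (same values as the Swift example).
a = [1.2, -1.5]
b = 0.2

# Hypothetical data point with two coordinates.
p = predict_probability(a, [1.0, 0.5], b)
print(p)  # a value between 0 and 1
```

A result above 0.5 means the linear part of the formula was positive, so the point lies on one side of the line. A result below 0.5 means it lies on the other side.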

Not everything is black and white…

What if you have more than two classes? In that case, you’ll use a variation of the formula called multinomial logistic regression that works with any number of classes. Instead of one output, you now compute a separate prediction for each class:

probability_A = sigmoid(dot(a_A, x) + b_A)
probability_B = sigmoid(dot(a_B, x) + b_B)
probability_C = sigmoid(dot(a_C, x) + b_C)
probability_D = sigmoid(dot(a_D, x) + b_D)
...and so on...
output = matmul(W, x) + b
probability_A = sigmoid(output[0])
probability_B = sigmoid(output[1])
probability_C = sigmoid(output[2])
...and so on...
probabilities = softmax(matmul(W, x) + b)
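
The last formula can be sketched in plain Python as follows. The weights, biases and input are made-up values for three classes and two features, chosen only to keep the example small. A real model would learn them during training.

```python
import math

def matmul(W, x):
    # Multiply a matrix (a list of rows) by a vector.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def softmax(values):
    # Subtracting the maximum first avoids overflow in exp().
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: one row of coefficients per class.
W = [[1.2, -1.5],
     [0.3, 0.8],
     [-0.7, 0.1]]
b = [0.2, 0.0, -0.1]
x = [1.0, 0.5]

output = [o + b_k for o, b_k in zip(matmul(W, x), b)]
probabilities = softmax(output)
print(probabilities)       # one probability per class
print(sum(probabilities))  # adds up to 1, within floating-point error
```

Unlike applying sigmoid to each output separately, softmax makes the outputs compete: raising the probability of one class lowers the others, so the results always form a proper probability distribution.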

Building the model

In this section, you’ll turn the above math into code using Keras. Fortunately, Keras takes care of all the details for you, so if the math in the previous section went over your head, rest assured that you don’t actually need to know it. Phew!

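import numpy as np
import keras
from keras.models import Sequential
from keras.layers import *
from keras import optimizers

# Input images are resized to 32x32 pixels; the dataset has 20 classes.
image_width = 32
image_height = 32
num_classes = 20

model = Sequential()
# Flatten the 32x32x3 image into a vector of 3,072 numbers.
model.add(Flatten(input_shape=(image_height, image_width, 3)))
# The next two layers are missing from this excerpt. They are inferred
# from the model summary below (Dense with 20 outputs, then Activation).
model.add(Dense(num_classes))
model.add(Activation("softmax"))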
The logistic regression model in Keras

Flatten turns the 3D image into a 1D vector

What the Dense layer does

y = a[0]*x[0] + a[1]*x[1] + ... + a[3071]*x[3071] + b
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 3072)              0         
dense_1 (Dense)              (None, 20)                61460     
activation_1 (Activation)    (None, 20)                0         
Total params: 61,460
Trainable params: 61,460
Non-trainable params: 0
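
You can check the parameter count in the summary by hand. The sketch below assumes a fully connected layer has one weight for every input/output pair plus one bias per output. The helper name dense_params is made up for this example.

```python
def dense_params(num_inputs, num_outputs):
    # One weight per (input, output) pair, plus one bias per output.
    return num_inputs * num_outputs + num_outputs

image_width = 32
image_height = 32
num_classes = 20

# The Flatten layer turns the 32x32x3 image into a single vector.
flattened_size = image_height * image_width * 3
print(flattened_size)                             # 3072
print(dense_params(flattened_size, num_classes))  # 61460
```

Flatten and Activation only reshape or transform values, which is why the summary lists zero parameters for them.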

Compiling the model

Before you can use the model you first need to compile it. This tells Keras how to train the model.


Loading the data

You’ve already seen the snacks dataset in the previous chapters. It consists of three folders (train, val, test). Each contains 20 subfolders, one per class, and each subfolder holds anywhere from several dozen to a few hundred images.

The snacks dataset

images_dir = "snacks"
train_data_dir = images_dir + "/train/"
val_data_dir = images_dir + "/val/"
test_data_dir = images_dir + "/test/"
from keras.preprocessing import image
img = image.load_img(
  train_data_dir + "apple/cecd90f5d46f57b0.jpg",
  target_size=(image_width, image_height))
%matplotlib inline
import matplotlib.pyplot as plt
Viewing the image with matplotlib

x = image.img_to_array(img)
array([[[215., 215., 217.],
        [211., 211., 211.],
        [207., 207., 207.],
        [152., 150., 137.],
        [148., 146., 133.],
        [149., 147., 132.]], ...
def normalize_pixels(image):
    return image / 127.5 - 1
x = image.img_to_array(img)
x = normalize_pixels(x)
x = np.expand_dims(x, axis=0)
array([[[[ 0.6862745 ,  0.6862745 ,  0.7019608 ],
         [ 0.654902  ,  0.654902  ,  0.654902  ],
         [ 0.62352943,  0.62352943,  0.62352943],
         [ 0.19215691,  0.17647064,  0.07450986],
         [ 0.16078436,  0.14509809,  0.04313731],
         [ 0.1686275 ,  0.15294123,  0.03529418]], ...
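
To see what normalize_pixels does to individual values, here is a small sketch that applies the same formula to plain numbers instead of an image array. It suggests that the pixel range 0 to 255 is mapped onto the range -1 to 1.

```python
def normalize_pixels(value):
    # Same formula as above, applied to a single number.
    return value / 127.5 - 1

# 215 is the first pixel value printed in the unnormalized array.
for pixel in (0, 127.5, 215, 255):
    print(pixel, "->", normalize_pixels(pixel))
```

The value 215 comes out as roughly 0.6862745, which matches the first entry of the normalized array.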

Too soon to start making predictions?

Even though the model isn’t trained yet, you can already make a prediction on the input image:

pred = model.predict(x)
[[0.04173137 0.00418671 0.02269506 0.02889681 0.08140159 0.03577968
  0.03044504 0.04758682 0.07940029 0.07274284 0.04531444 0.0115772
  0.17158438 0.02129039 0.0233359  0.1150756  0.00603842 0.08578367
  0.03525693 0.03987688]]

Using generators

You’ve seen how to load an image into a tensor and how to plot it in the notebook. That’s handy for verifying that the training data is correct. During training, you won’t have to load the training images by hand. Keras has a useful helper class called ImageDataGenerator that can automatically load images from folders.

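from keras.preprocessing.image import ImageDataGenerator

# The arguments elided in this excerpt are filled in as assumptions.
# Assumed: normalize every image with normalize_pixels as it is loaded.
datagen = ImageDataGenerator(preprocessing_function=normalize_pixels)

batch_size = 64

train_generator = datagen.flow_from_directory(
                    train_data_dir,
                    target_size=(image_width, image_height),
                    batch_size=batch_size)

val_generator = datagen.flow_from_directory(
                    val_data_dir,
                    target_size=(image_width, image_height),
                    batch_size=batch_size)

test_generator = datagen.flow_from_directory(
                    test_data_dir,
                    target_size=(image_width, image_height),
                    batch_size=batch_size)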
Found 4838 images belonging to 20 classes.
Found 955 images belonging to 20 classes.
Found 952 images belonging to 20 classes.
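
With a batch size of 64, you can estimate how many batches make up one pass over the training set. This sketch assumes the final, partial batch is also used, which would explain the 76 steps per epoch shown in the training output later in this chapter.

```python
import math

batch_size = 64
num_train_images = 4838  # from the "Found 4838 images" line above

# Round up so the leftover images form one last, smaller batch.
steps_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 76
print(last_batch_size)  # 38
```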
x, y = next(train_generator)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 1., 0., 0., 0., 0., 0., 0.], dtype=float32)
{'apple': 0,
 'banana': 1,
 'cake': 2,
 'candy': 3,
 'carrot': 4,
 'cookie': 5,
 'doughnut': 6,
 'grape': 7,
 'hot dog': 8,
 'ice cream': 9,
 'juice': 10,
 'muffin': 11,
 'orange': 12,
 'pineapple': 13,
 'popcorn': 14,
 'pretzel': 15,
 'salad': 16,
 'strawberry': 17,
 'waffle': 18,
 'watermelon': 19}
index2class = {v:k for k,v
  in train_generator.class_indices.items()}
'apple'      [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
'banana'     [0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
'cake'       [0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
'candy'      [0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
              . . .
'waffle'     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0]
'watermelon' [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1]
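
The mapping between class names and one-hot vectors can be sketched in plain Python. The helper names one_hot and decode are made up for this example. Keras builds these vectors for you, so this is only meant to show the idea.

```python
class_names = ["apple", "banana", "cake", "candy", "carrot", "cookie",
               "doughnut", "grape", "hot dog", "ice cream", "juice",
               "muffin", "orange", "pineapple", "popcorn", "pretzel",
               "salad", "strawberry", "waffle", "watermelon"]

# Same structure as train_generator.class_indices and its inverse.
class_indices = {name: i for i, name in enumerate(class_names)}
index2class = {v: k for k, v in class_indices.items()}

def one_hot(name):
    # All zeros, except a 1 at the position of the class.
    vector = [0.0] * len(class_names)
    vector[class_indices[name]] = 1.0
    return vector

def decode(vector):
    # Index of the largest value, mapped back to its class name.
    best = max(range(len(vector)), key=lambda i: vector[i])
    return index2class[best]

# The label printed earlier has its 1 at index 13.
label = [0.0] * 13 + [1.0] + [0.0] * 6
print(one_hot("candy"))
print(decode(label))  # pineapple
```

The same decode step works on a prediction: the index of the largest probability is the class the model considers most likely.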

The first evaluation

At this point, it’s a good idea to run the untrained model on the entire test set, to verify that the model and the generators actually work.

[3.311799808710563, 0.059873949579831935]
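
To put these two numbers in context, it helps to know what pure guessing would score. The sketch below assumes the first number is a cross-entropy loss and the second is the accuracy (the compile step is not visible in this excerpt), and that the 20 classes are roughly equally represented.

```python
import math

num_classes = 20
num_test_images = 952  # from the "Found 952 images" line earlier

# A model that guesses uniformly is right 1 time in 20.
chance_accuracy = 1 / num_classes
# Cross-entropy of assigning probability 1/20 to the correct class.
chance_loss = math.log(num_classes)

print(chance_accuracy)
print(chance_loss)

# Convert the reported accuracy back into a number of correct images.
reported_accuracy = 0.059873949579831935
num_correct = round(reported_accuracy * num_test_images)
print(num_correct)  # 57
```

An accuracy of about 0.06 is close to the 0.05 you would expect from guessing, which is what you'd expect from an untrained model.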

Training the logistic regression model

All the pieces are in place to finally train the model. First, do the following:

import warnings

What happens during training?

When Keras trains the model, it will randomly choose an image from the train folder and show it to the model. Say it picks an image from the banana folder. The model will then make a prediction, for example pretzel. Of course, this is totally wrong.

[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
  0., 0., 0., 0., 0., 0., 0., 0., 0., 0. ]
The ground-truth probabilities

[ 0.01360181, 0.21590623, 0.00830788, 0.01217055, 0.05090828,
  0.01749134, 0.01430813, 0.07134261, 0.02015499, 0.00142231,
  0.01328659, 0.01184934, 0.01497147, 0.04739711, 0.00372085,
  0.38552788, 0.03598726, 0.0047219 , 0.01521332, 0.04171015 ]
The predicted probabilities
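
One common way to measure how wrong a prediction is, is the categorical cross-entropy loss. The excerpt shown here does not confirm which loss the model uses, so treat the following as a sketch under that assumption. It compares the two vectors above.

```python
import math

ground_truth = [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
                0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]

predicted = [0.01360181, 0.21590623, 0.00830788, 0.01217055, 0.05090828,
             0.01749134, 0.01430813, 0.07134261, 0.02015499, 0.00142231,
             0.01328659, 0.01184934, 0.01497147, 0.04739711, 0.00372085,
             0.38552788, 0.03598726, 0.0047219, 0.01521332, 0.04171015]

def cross_entropy(truth, prediction):
    # Only the entry for the true class contributes to the sum.
    return -sum(t * math.log(p)
                for t, p in zip(truth, prediction) if t > 0)

loss = cross_entropy(ground_truth, predicted)
predicted_index = max(range(len(predicted)), key=lambda i: predicted[i])

print(loss)             # roughly 1.53
print(predicted_index)  # 15, the index of "pretzel"
```

The loss shrinks toward zero as the probability for the true class (banana, index 1) approaches 1, so lowering the loss pushes the model toward the right answer.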

Hey, it’s progress!

While the training process is happening, Keras outputs a progress bar:

Epoch 1/5
76/76 [==============================] - 3s 38ms/step -
	loss: 3.2150 - acc: 0.1050 -
	val_loss: 3.2654 - val_acc: 0.1162
Epoch 2/5
76/76 [==============================] - 2s 26ms/step -
	loss: 2.7257 - acc: 0.2079 -
	val_loss: 3.2375 - val_acc: 0.1152
Epoch 3/5
76/76 [==============================] - 2s 27ms/step -
	loss: 2.4124 - acc: 0.2990 -
	val_loss: 3.2756 - val_acc: 0.1120
Epoch 4/5
76/76 [==============================] - 2s 27ms/step -
	loss: 2.1712 - acc: 0.3722 -
	val_loss: 3.2727 - val_acc: 0.1246
Epoch 5/5
76/76 [==============================] - 2s 26ms/step -
	loss: 1.9735 - acc: 0.4462 -
	val_loss: 3.3359 - val_acc: 0.1141
[3.142886356145394, 0.12079831951556086]
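
A quick way to read a log like this is to compare training accuracy with validation accuracy for each epoch. The sketch below copies the numbers from the output above. A gap that keeps growing is usually taken as a sign of overfitting.

```python
# (training accuracy, validation accuracy) per epoch, from the log above.
history = [(0.1050, 0.1162),
           (0.2079, 0.1152),
           (0.2990, 0.1120),
           (0.3722, 0.1246),
           (0.4462, 0.1141)]

gaps = [acc - val_acc for acc, val_acc in history]

for epoch, ((acc, val_acc), gap) in enumerate(zip(history, gaps), start=1):
    print(f"epoch {epoch}: acc={acc:.4f} val_acc={val_acc:.4f} gap={gap:+.4f}")
```

Here the training accuracy climbs every epoch while the validation accuracy stays around 0.11 to 0.12, so the model is mostly memorizing the training images rather than learning something that carries over to new ones.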

It could be better…

What does this mean? Well, the model did learn something. After all, you started with a validation accuracy of 0.05 and it went up to about 0.12. So the model did gain a little bit of knowledge about the dataset. It is no longer making completely random guesses — but it’s still not doing much better than that.

Your first neural network

Logistic regression is considered to be one of the classical machine-learning algorithms. Deep learning is new and modern and hip, and is all about artificial neural networks. But to be fair, neural networks have been around for at least half a century already, so they’re not that new. In this section, you’ll expand the logistic regression model into an artificial neural net.

An old-school fully-connected neural network

model = Sequential()
model.add(Flatten(input_shape=(image_height, image_width, 3)))
model.add(Dense(500, activation="relu"))  # this line is new
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 3072)              0         
dense_1 (Dense)              (None, 500)               1536500   
dense_2 (Dense)              (None, 20)                10020     
activation_1 (Activation)    (None, 20)                0         
Total params: 1,546,520
Trainable params: 1,546,520
Non-trainable params: 0
The ReLU activation function

y = max(0, x)
output_dense_1 = relu(matmul(W_1, x) + b_1)
output_dense_2 = softmax(matmul(W_2, output_dense_1) + b_2)
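
The two formulas above can be traced by hand on a tiny network. This sketch uses made-up weights for 3 inputs, 2 hidden neurons and 2 classes. The real model has 3,072 inputs, 500 hidden neurons and 20 classes, and learns its weights during training.

```python
import math

def relu(values):
    # y = max(0, x), applied to every element.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtracting the maximum first avoids overflow in exp().
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(W, x, b):
    # matmul(W, x) + b for a matrix stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]

# Hypothetical weights and input, chosen for easy arithmetic.
W_1 = [[0.5, -1.0, 0.25],
       [-0.5, 0.5, -1.0]]
b_1 = [0.1, 0.0]
W_2 = [[1.0, -1.0],
       [-1.0, 1.0]]
b_2 = [0.0, 0.0]
x = [1.0, 0.5, -1.0]

output_dense_1 = relu(dense(W_1, x, b_1))
output_dense_2 = softmax(dense(W_2, output_dense_1, b_2))

print(output_dense_1)  # the first neuron is negative, so ReLU outputs 0
print(output_dense_2)  # two probabilities that add up to 1
```

Without the ReLU in between, the two layers would collapse into a single linear formula, and the network could not do anything that plain logistic regression cannot.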
Epoch 1/3
76/76 [==============================] - 2s 28ms/step -
	loss: 3.2228 - acc: 0.1315 -
	val_loss: 3.1306 - val_acc: 0.1351
Epoch 2/3
76/76 [==============================] - 2s 24ms/step -
	loss: 2.4553 - acc: 0.2849 -
	val_loss: 3.0794 - val_acc: 0.1466
Epoch 3/3
76/76 [==============================] - 2s 27ms/step -
	loss: 2.0033 - acc: 0.4284 -
	val_loss: 3.1929 - val_acc: 0.1613


Challenge 1: Add layers to the neural network

Try adding more layers to the neural network, and varying the number of neurons inside these layers. Can you get a better test score this way? You’ll find that the more layers you add, the harder it actually becomes to train the model.

Key points

© 2023 Kodeco Inc.