Beginning Machine Learning with Keras & Core ML
In this Keras machine learning tutorial, you’ll learn how to train a convolutional neural network model, convert it to Core ML, and integrate it into an iOS app. By Audrey Tam.
Sequential
You first create an empty Sequential model, then add a linear stack of layers: the layers run in the sequence in which they're added to the model. The Keras documentation has several examples of Sequential models.
The first layer must have information about the input shape, which for MNIST is (28, 28, 1). The other layers infer their input shape from the output shape of the previous layer. Here’s the output shape part of the model summary:
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_6 (Conv2D)            (None, 24, 24, 32)        832
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 12, 12, 32)        0
_________________________________________________________________
dropout_6 (Dropout)          (None, 12, 12, 32)        0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 10, 10, 64)        18496
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 5, 5, 64)          0
_________________________________________________________________
dropout_7 (Dropout)          (None, 5, 5, 64)          0
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 5, 5, 128)         8320
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 2, 2, 128)         0
_________________________________________________________________
dropout_8 (Dropout)          (None, 2, 2, 128)         0
_________________________________________________________________
flatten_3 (Flatten)          (None, 512)               0
_________________________________________________________________
dense_5 (Dense)              (None, 128)               65664
_________________________________________________________________
dense_6 (Dense)              (None, 10)                1290
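Working backward from the summary, the model can be reconstructed as the sketch below. The dropout rates and the final softmax activation are assumptions — the summary doesn't show them (Dropout and activations add no parameters) — but every shape and parameter count matches the table above.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Sketch of the model behind the summary above.
# Dropout rates and output activation are assumed, not taken from the summary.
model_m = Sequential()
model_m.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
model_m.add(MaxPooling2D(pool_size=(2, 2)))
model_m.add(Dropout(0.5))
model_m.add(Conv2D(64, (3, 3), activation='relu'))
model_m.add(MaxPooling2D(pool_size=(2, 2)))
model_m.add(Dropout(0.5))
model_m.add(Conv2D(128, (1, 1), activation='relu'))
model_m.add(MaxPooling2D(pool_size=(2, 2)))
model_m.add(Dropout(0.2))
model_m.add(Flatten())
model_m.add(Dense(128, activation='relu'))
model_m.add(Dense(10, activation='softmax'))

model_m.summary()  # prints a table like the one above
```

Running summary() on this model reproduces the output shapes and the total of 94,602 trainable parameters.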
Conv2D
This model has three Conv2D layers:
Conv2D(32, (5, 5), input_shape=input_shape, activation='relu')
Conv2D(64, (3, 3), activation='relu')
Conv2D(128, (1, 1), activation='relu')
 The first parameter — 32, 64, 128 — is the number of filters, or features, you want to train this layer to detect. This is also the depth — the last dimension — of the output shape.
 The second parameter — (5, 5), (3, 3), (1, 1) — is the kernel size: a tuple specifying the width and height of the convolution window that slides over the input space, computing weighted sums — dot products of the kernel weights and the input unit values.
 The third parameter, activation='relu', specifies the ReLU (Rectified Linear Unit) activation function. When the kernel is centered on an input unit, the unit is said to activate or fire if the weighted sum is greater than a threshold value: weighted_sum > threshold. The bias value is -threshold: the unit fires if weighted_sum + bias > 0. Training the model calculates the kernel weights and the bias value for each filter. ReLU is the most popular activation function for deep neural networks.
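In plain Python, the per-unit computation just described looks like this (an illustrative sketch, not the Keras implementation):

```python
def relu(x):
    # ReLU passes positive values through unchanged and clamps negatives to 0.
    return max(0.0, x)

def unit_output(inputs, weights, bias):
    # Weighted sum: dot product of the kernel weights and the input values.
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    # The unit "fires" (outputs a positive value) when weighted_sum + bias > 0.
    return relu(weighted_sum + bias)
```

Training adjusts `weights` and `bias`; the ReLU itself has no trainable parameters.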
MaxPooling2D
MaxPooling2D(pool_size=(2, 2))
A pooling layer slides an n-row by m-column filter across the previous layer, replacing the n x m values with their maximum value. Pooling filters are usually square: n = m. The most commonly used 2 x 2 pooling filter halves the width and height of the previous layer, thus reducing the number of parameters, which helps control overfitting.
Malireddi’s model has a pooling layer after each convolutional layer, which greatly reduces the final model size and training time.
Chollet’s model has two convolutional layers before pooling. This is recommended for larger networks, as it allows the convolutional layers to develop more complex features before pooling discards 75% of the values.
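As a plain-Python illustration of 2 x 2 max pooling (not the Keras implementation), here is the operation on a small 4 x 4 grid:

```python
def max_pool_2x2(grid):
    # Slide a non-overlapping 2x2 window over the grid and keep only the
    # maximum of each window, halving both width and height.
    rows, cols = len(grid), len(grid[0])
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

pooled = max_pool_2x2([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
])
# pooled == [[6, 5], [7, 9]] — each value is the max of one 2x2 window
```

Three of every four values are discarded, which is why pooling after every convolutional layer shrinks the model so much.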
Conv2D and MaxPooling2D parameters determine each layer's output shape and number of trainable parameters:
Output Shape = (input width – kernel width + 1, input height – kernel height + 1, number of filters)
You can’t center a 3×3 kernel over the first and last units in each row and column, so the output width and height are 2 pixels less than the input. A 5×5 kernel reduces output width and height by 4 pixels.

Conv2D(32, (5, 5), input_shape=(28, 28, 1)): (28-4, 28-4, 32) = (24, 24, 32)
MaxPooling2D halves the input width and height: (24/2, 24/2, 32) = (12, 12, 32)
Conv2D(64, (3, 3)): (12-2, 12-2, 64) = (10, 10, 64)
MaxPooling2D halves the input width and height: (10/2, 10/2, 64) = (5, 5, 64)
Conv2D(128, (1, 1)): (5-0, 5-0, 128) = (5, 5, 128)
Param # = number of filters x (kernel width x kernel height x input depth + 1 bias)

Conv2D(32, (5, 5), input_shape=(28, 28, 1)): 32 x (5x5x1 + 1) = 832
Conv2D(64, (3, 3)): 64 x (3x3x32 + 1) = 18,496
Conv2D(128, (1, 1)): 128 x (1x1x64 + 1) = 8,320
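The two formulas can be checked with a small helper (plain Python; the function name is illustrative):

```python
def conv2d_layer(input_shape, filters, kernel):
    # Output shape: (input width - kernel width + 1,
    #                input height - kernel height + 1, number of filters)
    width, height, depth = input_shape
    out_shape = (width - kernel[0] + 1, height - kernel[1] + 1, filters)
    # Param #: filters x (kernel width x kernel height x input depth + 1 bias)
    params = filters * (kernel[0] * kernel[1] * depth + 1)
    return out_shape, params

print(conv2d_layer((28, 28, 1), 32, (5, 5)))   # ((24, 24, 32), 832)
print(conv2d_layer((12, 12, 32), 64, (3, 3)))  # ((10, 10, 64), 18496)
print(conv2d_layer((5, 5, 64), 128, (1, 1)))   # ((5, 5, 128), 8320)
```

The same helper works for the challenge below: feed it each layer's input shape in turn.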
Challenge: Calculate the output shapes and parameter numbers for Chollet's architecture model_c.
[spoiler title="Solution"]
Output Shape = (input width – kernel width + 1, input height – kernel height + 1, number of filters)

Conv2D(32, (3, 3), input_shape=(28, 28, 1)): (28-2, 28-2, 32) = (26, 26, 32)
Conv2D(64, (3, 3)): (26-2, 26-2, 64) = (24, 24, 64)
MaxPooling2D halves the input width and height: (24/2, 24/2, 64) = (12, 12, 64)
Param # = number of filters x (kernel width x kernel height x input depth + 1 bias)

Conv2D(32, (3, 3), input_shape=(28, 28, 1)): 32 x (3x3x1 + 1) = 320
Conv2D(64, (3, 3)): 64 x (3x3x32 + 1) = 18,496
[/spoiler]
Dropout
Dropout(0.5)
Dropout(0.2)
A dropout layer is often paired with a pooling layer. It randomly sets a fraction of input units to 0. This is another method to control overfitting: neurons are less likely to be influenced too much by neighboring neurons, because any of them might drop out of the network at random. This makes the network less sensitive to small variations in the input, so more likely to generalize to new inputs.
Aurélien Géron, in Hands-On Machine Learning with Scikit-Learn & TensorFlow, compares this to a workplace where, on any given day, some percentage of the people might not come to work: everyone would have to be able to do critical tasks, and would have to cooperate with more coworkers. This would make the company more resilient, and less dependent on any single worker.
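A toy sketch of how dropout is typically applied at training time ("inverted dropout": survivors are scaled up so the expected activation stays the same — the names here are illustrative, not Keras internals):

```python
import random

def dropout(values, rate, seed=42):
    # Zero each value with probability `rate`; scale the survivors by
    # 1 / (1 - rate) so the expected sum of activations is unchanged.
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

out = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5)
# Each element is now either 0.0 or twice its original value.
```

At inference time Keras disables dropout entirely, so the layer costs nothing in the final Core ML model.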
Flatten
The weights from the convolutional layers must be made 1-dimensional — flattened — before passing them to the fully connected Dense layer.
model_m.add(Dropout(0.2))
model_m.add(Flatten())
model_m.add(Dense(128, activation='relu'))
The output shape of the previous layer is (2, 2, 128), so the output of Flatten() is an array with 512 elements.
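A plain-Python sketch of what flattening does to a (2, 2, 128) volume (illustrative only — Keras operates on tensors, not nested lists):

```python
def flatten(volume):
    # Unroll a nested (rows, cols, depth) list into a flat 1-D list,
    # preserving element order.
    return [v for row in volume for col in row for v in col]

# A (2, 2, 128) volume flattens to 2 * 2 * 128 = 512 values.
volume = [[[0.0] * 128 for _ in range(2)] for _ in range(2)]
flat = flatten(volume)
# len(flat) == 512
```

Those 512 values are what the Dense(128) layer consumes: 512 x 128 weights + 128 biases = 65,664 parameters, matching the model summary.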