Beginning Machine Learning with Keras & Core ML

In this Keras machine learning tutorial, you’ll learn how to train a convolutional neural network model, convert it to Core ML, and integrate it into an iOS app. By Audrey Tam.


Sequential

You first create an empty Sequential model, then add a linear stack of layers: the layers run in the sequence that they’re added to the model. The Keras documentation has several examples of Sequential models.

Note: Keras also has a functional API for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers. Google’s Inception and Microsoft Research Asia’s Residual Networks are examples of complex models with nonlinear connectivity structures.
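For orientation, here's a sketch of how the model in this section could be assembled as a Sequential stack. The layer parameters are taken from the snippets discussed below; the exact dropout rate at each position and the final Dense(10, activation='softmax') output layer are assumptions based on the summary table, not the tutorial's verbatim code.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

input_shape = (28, 28, 1)  # MNIST: 28x28 grayscale images, one channel

model_m = Sequential()
model_m.add(Conv2D(32, (5, 5), input_shape=input_shape, activation='relu'))
model_m.add(MaxPooling2D(pool_size=(2, 2)))
model_m.add(Dropout(0.5))        # dropout rate at each position is assumed
model_m.add(Conv2D(64, (3, 3), activation='relu'))
model_m.add(MaxPooling2D(pool_size=(2, 2)))
model_m.add(Dropout(0.2))
model_m.add(Conv2D(128, (1, 1), activation='relu'))
model_m.add(MaxPooling2D(pool_size=(2, 2)))
model_m.add(Dropout(0.2))
model_m.add(Flatten())
model_m.add(Dense(128, activation='relu'))
model_m.add(Dense(10, activation='softmax'))  # assumed 10-class output layer

model_m.summary()  # prints the layer-by-layer table shown below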

The first layer must have information about the input shape, which for MNIST is (28, 28, 1). The other layers infer their input shape from the output shape of the previous layer. Here’s the output shape part of the model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_6 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 10, 10, 64)        18496     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 5, 5, 128)         8320      
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 2, 2, 128)         0         
_________________________________________________________________
dropout_8 (Dropout)          (None, 2, 2, 128)         0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 128)               65664     
_________________________________________________________________
dense_6 (Dense)              (None, 10)                1290      

Conv2D

This model has three Conv2D layers:

Conv2D(32, (5, 5), input_shape=input_shape, activation='relu')
Conv2D(64, (3, 3), activation='relu')
Conv2D(128, (1, 1), activation='relu')
  • The first parameter — 32, 64, 128 — is the number of filters, or features, you want to train this layer to detect. This is also the depth — the last dimension — of the output shape.
  • The second parameter — (5, 5), (3, 3), (1, 1) — is the kernel size: a tuple specifying the width and height of the convolution window that slides over the input space, computing weighted sums — dot products of the kernel weights and the input unit values.
  • The third parameter activation='relu' specifies the ReLU (Rectified Linear Unit) activation function. When the kernel is centered on an input unit, the unit is said to activate or fire if the weighted sum is greater than a threshold value: weighted_sum > threshold. The bias value is -threshold: the unit fires if weighted_sum + bias > 0, as the sketch after this list illustrates. Training the model calculates the kernel weights and the bias value for each filter. ReLU is the most popular activation function for deep neural networks.
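Here's a minimal sketch of that computation for a single kernel position, with made-up numbers purely for illustration; Keras repeats this for every position of every filter:

import numpy as np

# One 3x3 patch of input values and one 3x3 kernel (made-up numbers).
patch  = np.array([[0.0, 0.2, 0.9],
                   [0.1, 0.8, 1.0],
                   [0.0, 0.3, 0.7]])
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])
bias = -0.5  # bias = -threshold; learned during training

weighted_sum = np.sum(patch * kernel)       # dot product of kernel and patch
activation = max(0.0, weighted_sum + bias)  # ReLU: output only if weighted_sum + bias > 0
print(weighted_sum, activation)             # 2.5 2.0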

MaxPooling2D

MaxPooling2D(pool_size=(2, 2))

A pooling layer slides an n-rows by m-columns filter across the previous layer, replacing the n x m values with their maximum value. Pooling filters are usually square: n = m. The most commonly used 2 x 2 pooling filter, shown below, halves the width and height of the previous layer, thus reducing the number of parameters, which helps control overfitting.
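Here's a minimal NumPy illustration of 2 x 2 max pooling on a single 4 x 4 channel; this is only to show the idea, as Keras' MaxPooling2D does the equivalent across all channels:

import numpy as np

# One 4x4 channel of activations (made-up values).
x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 8, 3],
              [1, 9, 4, 4]], dtype=float)

# 2x2 max pooling with stride 2: keep the maximum of each 2x2 block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 5.]
#  [9. 8.]]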

Malireddi’s model has a pooling layer after each convolutional layer, which greatly reduces the final model size and training time.

Chollet’s model has two convolutional layers before pooling. This is recommended for larger networks, as it allows the convolutional layers to develop more complex features before pooling discards 75% of the values.
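In code, that pattern looks something like the following sketch; the layer sizes match the challenge below, and the rest of model_c is omitted here:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

# Two convolutional layers back-to-back, then one pooling layer.
model_c = Sequential()
model_c.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1), activation='relu'))
model_c.add(Conv2D(64, (3, 3), activation='relu'))
model_c.add(MaxPooling2D(pool_size=(2, 2)))
# ... dropout, flatten and dense layers would follow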

Conv2D and MaxPooling2D parameters determine each layer’s output shape and number of trainable parameters:

Output Shape = (input width – kernel width + 1, input height – kernel height + 1, number of filters)

You can’t center a 3×3 kernel over the first and last units in each row and column, so the output width and height are 2 pixels less than the input. A 5×5 kernel reduces output width and height by 4 pixels.

  • Conv2D(32, (5, 5), input_shape=(28, 28, 1)): (28-4, 28-4, 32) = (24, 24, 32)
  • MaxPooling2D halves the input width and height: (24/2, 24/2, 32) = (12, 12, 32)
  • Conv2D(64, (3, 3)): (12-2, 12-2, 64) = (10, 10, 64)
  • MaxPooling2D halves the input width and height: (10/2, 10/2, 64) = (5, 5, 64)
  • Conv2D(128, (1, 1)): (5-0, 5-0, 128) = (5, 5, 128)

Param # = number of filters x (kernel width x kernel height x input depth + 1 bias)

  • Conv2D(32, (5, 5), input_shape=(28, 28, 1)): 32 x (5x5x1 + 1) = 832
  • Conv2D(64, (3, 3)): 64 x (3x3x32 + 1) = 18,496
  • Conv2D(128, (1, 1)): 128 x (1x1x64 + 1) = 8320
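If you'd like to check these numbers programmatically, here's a small helper that applies the two formulas above to Malireddi's layer stack; it's a sketch for verification only, not part of the tutorial's code:

def conv2d(shape, filters, kernel):
    # Output Shape = (w - kernel width + 1, h - kernel height + 1, filters)
    # Param # = filters * (kernel width * kernel height * input depth + 1 bias)
    w, h, depth = shape
    kw, kh = kernel
    return (w - kw + 1, h - kh + 1, filters), filters * (kw * kh * depth + 1)

def max_pool(shape):
    # 2 x 2 pooling halves width and height; no trainable parameters
    w, h, depth = shape
    return (w // 2, h // 2, depth)

shape = (28, 28, 1)
for filters, kernel in [(32, (5, 5)), (64, (3, 3)), (128, (1, 1))]:
    shape, params = conv2d(shape, filters, kernel)
    print('Conv2D', shape, params)   # (24, 24, 32) 832, then (10, 10, 64) 18496, then (5, 5, 128) 8320
    shape = max_pool(shape)
    print('MaxPooling2D', shape)     # (12, 12, 32), then (5, 5, 64), then (2, 2, 128)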

Challenge: Calculate the output shapes and parameter numbers for Chollet’s architecture model_c.

[spoiler title="Solution"]
Output Shape = (input width – kernel width + 1, input height – kernel height + 1, number of filters)

  • Conv2D(32, (3, 3), input_shape=(28, 28, 1)): (28-2, 28-2, 32) = (26, 26, 32)
  • Conv2D(64, (3, 3)): (26-2, 26-2, 64) = (24, 24, 64)
  • MaxPooling2D halves the input width and height: (24/2, 24/2, 64) = (12, 12, 64)

Param # = number of filters x (kernel width x kernel height x input depth + 1 bias)

  • Conv2D(32, (3, 3), input_shape=(28, 28, 1)): 32 x (3x3x1 + 1) = 320
  • Conv2D(64, (3, 3)): 64 x (3x3x32 + 1) = 18,496

[/spoiler]

Dropout

Dropout(0.5)
Dropout(0.2)

A dropout layer is often paired with a pooling layer. It randomly sets a fraction of input units to 0. This is another method to control overfitting: neurons are less likely to be influenced too much by neighboring neurons, because any of them might drop out of the network at random. This makes the network less sensitive to small variations in the input, so more likely to generalize to new inputs.

Aurélien Géron, in Hands-on Machine Learning with Scikit-Learn & TensorFlow, compares this to a workplace where, on any given day, some percentage of the people might not come to work: everyone would have to be able to do critical tasks, and would have to cooperate with more co-workers. This would make the company more resilient, and less dependent on any single worker.
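Here's a conceptual sketch of what a dropout pass does during training. It illustrates the idea (inverted dropout), not Keras' actual implementation:

import numpy as np

def dropout(x, rate, training=True):
    # Zero a random `rate` fraction of the values during training and scale
    # the survivors by 1 / (1 - rate) so the expected sum stays the same.
    if not training:
        return x  # dropout does nothing at inference time
    keep = np.random.rand(*x.shape) >= rate
    return x * keep / (1.0 - rate)

x = np.ones((4, 4))
print(dropout(x, 0.5))  # roughly half the values become 0, the rest become 2.0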

Flatten

The output of the convolutional layers must be flattened into a 1-dimensional array before it's passed to the fully connected Dense layer.

model_m.add(Dropout(0.2))
model_m.add(Flatten())
model_m.add(Dense(128, activation='relu'))

The output shape of the previous layer is (2, 2, 128), so the output of Flatten() is an array with 2 x 2 x 128 = 512 elements.
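From that value you can also check the Dense layer parameter counts in the summary table. This is a quick back-of-the-envelope calculation, not part of the tutorial's code:

flattened = 2 * 2 * 128          # output of Flatten(): 512 values
dense_5 = flattened * 128 + 128  # 128 units, each with 512 weights plus one bias
dense_6 = 128 * 10 + 10          # 10-class output layer
print(flattened, dense_5, dense_6)  # 512 65664 1290, matching the summary table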