Image recognition using Keras

Introduction

In this assignment you will use the Keras Neural Network API for Python to build neural networks for image classification.  

Data Set

The data set contains 60,000 images, each labeled with one of 10 categories:

Numeric ID   Category Name
0            airplane
1            automobile
2            bird
3            cat
4            deer
5            dog
6            frog
7            horse
8            ship
9            truck

Each image is 32×32 pixels and has three color channels (red, green, and blue). Each image can therefore be represented as three 32×32 matrices or as one 32x32x3 cube.
[Figure: some example images from the data set.]

Getting started: Installing NumPy, Keras, and TensorFlow

Keras is a high-level Python API that allows you to easily construct, train, and apply neural networks.
However, Keras is not a neural network library itself; it depends on one of several neural network backends. We will use the TensorFlow backend. TensorFlow is an open-source library for neural networks (and other mathematical models based on sequences of matrix and tensor computations), originally developed by Google.

NumPy is a numeric computing package for Python. Keras uses NumPy data structures.
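To verify that everything is installed, you can import the packages in a Python shell and print their versions (a quick sanity check; the exact version numbers will depend on your installation):

import numpy
import tensorflow
import keras
print(numpy.__version__, tensorflow.__version__, keras.__version__)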

Using an interactive Python shell

For this homework assignment, it can be useful to use an interactive Python interpreter to run experiments. We recommend using Jupyter Notebook, a web application that allows you to type and run Python code in a web browser. Other options include the IPython qtconsole or the built-in IPython console in Anaconda’s Spyder IDE.

Part 1 – Loading the CIFAR-10 Data

The following code fragment imports the CIFAR-10 data using Keras.

from keras.datasets import cifar10

train, test = cifar10.load_data()
xtrain, ytrain = train
xtest, ytest = test

xtrain, ytrain, xtest, and ytest are NumPy n-dimensional arrays containing the training and testing data.
You can look at the format of these arrays:

>>> xtrain.shape
(50000, 32, 32, 3)
>>> ytrain.shape
(50000, 1)

The input training data (xtrain) is a 4-dimensional array containing 50000 images, each of them a 32x32x3 tensor. NumPy arrays can be indexed like nested Python lists, so xtrain[0] will give you the first 32x32x3 image.

The input label (ytrain) is a vector containing the numeric class for each image. For example, xtrain[0] is an image of a frog and therefore ytrain[0] contains the value 6.
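For example, you can inspect a single image and its label (the dtype shown may vary with your library versions):

>>> xtrain[0].shape
(32, 32, 3)
>>> ytrain[0]
array([6], dtype=uint8)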

Visualizing Images

In a Jupyter notebook, you can display an image using matplotlib:

from matplotlib import pyplot as plt
%matplotlib inline
xtrain, ytrain_1hot, xtest, ytest_1hot = load_cifar10()
plt.imshow(xtrain[6])

1-hot representation for class labels

The output layer of the neural network will contain 10 neurons corresponding to the 10 classes. The classifier predicts the class whose corresponding neuron has the highest activation. We need to convert the numeric label for each image into a 1-hot vector of length 10, so that for class label n, the n-th element is 1 and all other elements are 0.
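One way to perform this conversion is to index into a NumPy identity matrix; here is a minimal sketch (the same idea is used in the solution at the end of this handout):

import numpy as np

labels = np.array([6, 0, 3])    # numeric class IDs for three images
one_hot = np.eye(10)[labels]    # shape (3, 10); row i has a 1 in column labels[i]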

Write the function load_cifar10(), which should load the CIFAR-10 data as described above and return four NumPy arrays: xtrain, ytrain_1hot, xtest, ytest_1hot. Your function should convert the y arrays into the 1-hot representation.

Your function should also perform the following normalization on the data. The R, G, and B values for each pixel range between 0 and 255. Before returning the training data, normalize it so that these values range between 0.0 and 1.0.
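For example:

xtrain = xtrain / 255.0    # pixel values now range between 0.0 and 1.0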

Part 2 – Multilayer Neural Network

Designing the Network

We will start by building a Neural Network with a single hidden layer. Keras neural networks are created in two steps. First, you specify the layers of the network (the “computation graph”), then you compile this network so you can train the weights and evaluate it. This split is typical of most neural network packages.

You will complete the function build_multilayer_nn(), which creates and returns a Keras model object but does not train it.

Each neural network consists of a number of layers. In Keras, layers are more general: you can think of each layer as a function that takes an n-dimensional array as input and produces an m-dimensional array as output. We start with an empty Sequential model (Sequential is imported from keras, and the layer classes used below come from keras.layers):

nn = Sequential()

You can then add layers to this object one-by-one.

The single hidden layer we will use is an instance of the class keras.layers.Dense. A dense layer is one in which all neurons are connected to all inputs of the layer (i.e., a typical fully connected neural network layer). The following creates a layer with 100 neurons, using the rectifier ("relu") function as the activation function.

hidden = Dense(units=100, activation="relu")
nn.add(hidden)

The output layer, as mentioned above, will be a dense layer containing 10 neurons. As the activation function, we will use the softmax function, which adjusts the activations at the output layer so that they sum up to 1.0. It is then possible to interpret the output activations as a "probability distribution" indicating the probability that the input belongs to each class.
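Concretely, for a vector z of output activations, softmax(z)_i = exp(z_i) / Σ_j exp(z_j).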

output = Dense(units=10, activation="softmax")
nn.add(output)

When defining the first layer of the network (here, the hidden layer), we also need to define the shape of the input array to that layer. For all other layers, the input shape is inferred automatically.

The problem is that Dense layers only accept one-dimensional input, but each image is a 32x32x3 array. We could convert each 32x32x3 array into a flat array of size 3072 and then specify the input shape of the hidden layer as

hidden = Dense(units=100, activation="relu", input_shape=(3072,))
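Alternatively, you can let the network do the reshaping by adding a keras.layers.Flatten layer as the first layer; this is what produces the flatten_1 entry in the summary below:

from keras.layers import Flatten

nn = Sequential()
nn.add(Flatten(input_shape=(32, 32, 3)))    # reshapes each 32x32x3 image into a vector of size 3072
nn.add(Dense(units=100, activation="relu"))
nn.add(Dense(units=10, activation="softmax"))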

The function build_multilayer_nn() returns the Keras model object. You can call the method summary() on this object to print a summary of the network topology.

>>> nn = build_multilayer_nn()
>>> nn.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_1 (Flatten)          (None, 3072)              0
_________________________________________________________________
dense_1 (Dense)              (None, 100)               307300
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1010
=================================================================
Total params: 308,310
Trainable params: 308,310
Non-trainable params: 0
_________________________________________________________________

Training the Model

The function train_multilayer_nn(model, xtrain, ytrain) has already been written for you.

def train_multilayer_nn(model, xtrain, ytrain):
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    model.fit(xtrain, ytrain, epochs=20, batch_size=32)

Before the training process, the model needs to be configured ("compiled") by providing a loss function, an optimizer algorithm, and a metric to be measured. The loss function we will use is categorical crossentropy, which essentially measures how different the output distribution is from the target distribution (i.e., the one-hot target vector).
The optimizer is stochastic gradient descent (SGD). The idea behind SGD is similar to the gradient descent optimization methods discussed in class, but the training samples are shuffled in every training epoch and the weights are updated using the gradient computed over a small batch of training samples rather than the full training set. We use a learning rate of 0.01, set the batch size to 32, and train for 20 epochs.

If you set up the model topology correctly, you should be able to train the model like this:

>>> xtrain, ytrain_1hot, xtest, ytest_1hot = load_cifar10()
>>> nn = build_multilayer_nn()
>>> train_multilayer_nn(nn, xtrain, ytrain_1hot)

The training should not take more than 5 minutes and should report a final accuracy of about 0.52 on the training set. The model is not yet fitted well to the training data, so training has not yet converged. You can continue running the training process for more epochs, but model performance will improve only slowly.

Evaluating the Trained Model

Once the model has been trained, you should evaluate it on the test set.

>>> nn.evaluate(xtest, ytest_1hot) 
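evaluate() returns the loss followed by the metrics specified at compile time, so here it returns a [loss, accuracy] pair:

>>> loss, accuracy = nn.evaluate(xtest, ytest_1hot)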

Part 3 – Convolutional Neural Network

In this part you will build a convolutional neural network. Keras provides the necessary building blocks:

  • You can create a convolution layer using the keras.layers.Conv2D layer type.

Conv2D(32, (3, 3), activation="relu", padding="same")

will use filters of size 3×3 to create 32 feature maps of size 32×32. The output shape will be 32x32x32.

  • You can create a pooling layer using the keras.layers.MaxPooling2D layer type.

MaxPooling2D(pool_size=(2, 2))

will take the feature maps created by a convolution layer as input and scale down their size. The pool_size parameter specifies the factor by which the feature maps are downsampled. Applying this layer to the 32x32x32 representation output by the convolution layer will result in a 16x16x32 representation.

To make the model generalize well to the test data, some dropout is useful. Dropout randomly sets the output of some units to 0 during each training update, which prevents the model from over-fitting.

  • You can create a dropout layer using the keras.layers.Dropout layer type.

Dropout(0.25)

will create a layer that drops 25% of the units in each training step.
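Putting these building blocks together, the feature-extraction part of such a network might begin like this (a sketch using the layer parameters shown above):

from keras import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout

model = Sequential()
model.add(Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(32, 32, 3)))  # output: 32x32x32
model.add(MaxPooling2D(pool_size=(2, 2)))    # output: 16x16x32
model.add(Dropout(0.25))                     # drop 25% of the units during training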

Designing the Network

Write the function build_convolution_nn() that constructs a convolutional neural network consisting of:

  • Two convolution layers, each producing 32 feature maps with a filter size of 3×3.
  • One pooling layer that reduces the size of each feature map to 16×16.
  • One dropout layer that drops 25% of the units.
  • Two more convolution layers, each producing 32 feature maps with a filter size of 3×3.
  • Another pooling layer that reduces the size of each feature map to 8×8. The output shape should be 8x8x32.
  • One dropout layer that drops 50% of the units.
  • Feed this output into a regular multilayer neural network with two hidden layers of size 250 and 100. The output layer should be of size 10, as before.

Note that the first convolution layer will have to specify the input shape as 32x32x3. You will have to flatten the output of the feature extraction stage before you feed it into the dense layers.  

Training the model

Write the function train_convolution_nn(model, xtrain, ytrain) that trains the model using the same parameters we used for the multilayer neural network. The two functions will be identical at this point.

On the test set, this model should reach accuracies in the high 60s. This is not great (the state of the art on this task is about 96%), but it is a significant achievement considering that the random baseline is 10%.

Optimizing the model

Experiment by modifying the network topology (adding/removing layers etc.) or training parameters (learning rate, number of epochs, batch size, …).
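For example, you might try a smaller learning rate together with more epochs or a larger batch size (the values below are illustrative, not recommendations):

sgd = optimizers.SGD(lr=0.005)                         # smaller learning rate
model.fit(xtrain, ytrain, epochs=40, batch_size=64)    # more epochs, larger batches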

Part 4 – Convolutional Neural Network for Binary Classification

The 10 categories in the CIFAR-10 data can be grouped into 2 super-categories: animals and vehicles. Your task is to modify the model above so that it performs binary classification: is the input image an animal (1) or a vehicle (0)?

You need to change the following:

  1. Write the function get_binary_cifar10() that returns the arrays xtrain, ytrain, xtest, ytest as before, but now ytrain should be a binary vector of size (50000,) and ytest should be a vector of size (10000,), where 1 indicates an animal and 0 indicates a vehicle.
  2. Write the function build_binary_classifier() that creates the structure of a convolution neural network that performs binary classification. The output to this network should be a single neuron (a dense layer of size 1) with sigmoid activation.
  3. Write a function train_binary_classifier() that trains the model on the new output data produced by get_binary_cifar10(). Use binary_crossentropy instead of categorical_crossentropy.
  4. Evaluate the model. Answer the following questions in a comment at the beginning of the file: Is the binary classification task easier or more difficult than classification into 10 categories? Justify your response.
The skeleton file for this assignment is shown below. Complete the functions that are stubbed out with pass.

import numpy as np
import tensorflow as tf
from keras.datasets import cifar10
from keras import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from keras import optimizers

def load_cifar10():
    train, test = cifar10.load_data()
    xtrain, ytrain = train
    xtest, ytest = test
    #return xtrain, ytrain_1hot, xtest, ytest_1hot

def build_multilayer_nn():
    pass

def train_multilayer_nn(model, xtrain, ytrain):
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    model.fit(xtrain, ytrain, epochs=30, batch_size=32)

def build_convolution_nn():
    pass

def train_convolution_nn(model, xtrain, ytrain):
    pass

def get_binary_cifar10():
    pass

def build_binary_classifier():
    pass

def train_binary_classifier(model, xtrain, ytrain):
    pass

if __name__ == "__main__":
    # Write any code for testing and evaluation in this main section.
    pass

Solution 

import numpy as np
import tensorflow as tf
from keras.datasets import cifar10
from keras import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from keras import optimizers

#
# The binary classification task is simpler than multi-class classification:
# it is much easier to distinguish between animals and vehicles than to
# recognize each of the 10 classes separately.
#

def load_cifar10():
    train, test = cifar10.load_data()
    xtrain, ytrain = train
    xtest, ytest = test
    # Convert the numeric labels to 1-hot vectors by indexing into an
    # identity matrix; squeeze removes the extra axis of the (n, 1) arrays.
    ytrain_1hot = np.squeeze(np.eye(10)[ytrain])
    ytest_1hot = np.squeeze(np.eye(10)[ytest])
    # Normalize pixel values from [0, 255] to [0.0, 1.0].
    return xtrain / 255.0, ytrain_1hot, xtest / 255.0, ytest_1hot

# nn.evaluate: [loss, accuracy]
# [1.4599219238281249, 0.4849]
def build_multilayer_nn():
    model = Sequential()
    flatten = Flatten(input_shape=(32, 32, 3))
    model.add(flatten)
    hidden = Dense(units=100, activation="relu")
    model.add(hidden)
    output = Dense(units=10, activation="softmax")
    model.add(output)
    return model

def train_multilayer_nn(model, xtrain, ytrain):
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    model.fit(xtrain, ytrain, epochs=30, batch_size=32)

# cnn.evaluate: [loss, accuracy]
# [0.66798335824012756, 0.7713]
def build_convolution_nn():
    model = Sequential()
    # Feature extraction: two blocks of (convolution, convolution, pooling, dropout).
    conv1 = Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(32, 32, 3))
    model.add(conv1)
    conv2 = Conv2D(32, (3, 3), activation="relu", padding="same")
    model.add(conv2)
    pool1 = MaxPooling2D(pool_size=(2, 2))
    model.add(pool1)
    dropout1 = Dropout(0.25)
    model.add(dropout1)
    conv3 = Conv2D(32, (3, 3), activation="relu", padding="same")
    model.add(conv3)
    conv4 = Conv2D(32, (3, 3), activation="relu", padding="same")
    model.add(conv4)
    pool2 = MaxPooling2D(pool_size=(2, 2))
    model.add(pool2)
    dropout2 = Dropout(0.5)
    model.add(dropout2)
    # Classification: flatten the 8x8x32 feature maps and feed them into
    # a multilayer network with two hidden layers.
    flatten = Flatten()
    model.add(flatten)
    hidden1 = Dense(units=250, activation="relu")
    model.add(hidden1)
    hidden2 = Dense(units=100, activation="relu")
    model.add(hidden2)
    output = Dense(units=10, activation="softmax")
    model.add(output)
    return model

def train_convolution_nn(model, xtrain, ytrain):
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
    model.fit(xtrain, ytrain, epochs=30, batch_size=32)

def get_binary_cifar10():
    train, test = cifar10.load_data()
    xtrain, ytrain = train
    xtest, ytest = test
    # mask[c] is 1 if class c is an animal and 0 if it is a vehicle
    # (classes 2-7: bird, cat, deer, dog, frog, horse are the animals).
    mask = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0])
    ytrain_bin = mask[ytrain.flatten()]
    ytest_bin = mask[ytest.flatten()]
    return xtrain / 255.0, ytrain_bin, xtest / 255.0, ytest_bin

# bcnn.evaluate: [loss, accuracy]
# [0.14210637753903865, 0.9412]
def build_binary_classifier():
    model = Sequential()
    conv1 = Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(32, 32, 3))
    model.add(conv1)
    conv2 = Conv2D(32, (3, 3), activation="relu", padding="same")
    model.add(conv2)
    pool1 = MaxPooling2D(pool_size=(2, 2))
    model.add(pool1)
    dropout1 = Dropout(0.25)
    model.add(dropout1)
    conv3 = Conv2D(32, (3, 3), activation="relu", padding="same")
    model.add(conv3)
    conv4 = Conv2D(32, (3, 3), activation="relu", padding="same")
    model.add(conv4)
    pool2 = MaxPooling2D(pool_size=(2, 2))
    model.add(pool2)
    dropout2 = Dropout(0.5)
    model.add(dropout2)
    flatten = Flatten()
    model.add(flatten)
    hidden1 = Dense(units=250, activation="relu")
    model.add(hidden1)
    hidden2 = Dense(units=100, activation="relu")
    model.add(hidden2)
    # Single output neuron with sigmoid activation for binary classification.
    output = Dense(units=1, activation="sigmoid")
    model.add(output)
    return model

def train_binary_classifier(model, xtrain, ytrain):
    sgd = optimizers.SGD(lr=0.01)
    model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
    model.fit(xtrain, ytrain, epochs=30, batch_size=32)

if __name__ == "__main__":
    # Write any code for testing and evaluation in this main section.
    xtrain, ytrain_1hot, xtest, ytest_1hot = load_cifar10()

    nn = build_multilayer_nn()
    nn.summary()
    train_multilayer_nn(nn, xtrain, ytrain_1hot)
    print(nn.evaluate(xtest, ytest_1hot))

    cnn = build_convolution_nn()
    cnn.summary()
    train_convolution_nn(cnn, xtrain, ytrain_1hot)
    print(cnn.evaluate(xtest, ytest_1hot))

    xtrain, ytrain, xtest, ytest = get_binary_cifar10()
    bcnn = build_binary_classifier()
    bcnn.summary()
    train_binary_classifier(bcnn, xtrain, ytrain)
    print(bcnn.evaluate(xtest, ytest))