I'm currently taking a deep learning course, which used learning the XOR function as its first example of feedforward networks. The XOR function has the following truth table:
\(x\) | \(y\) | \(x \oplus y\) |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
which, when graphed, is not linearly separable: the 1s cannot be separated from the 0s by drawing a single straight line.
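Here's a quick sketch of why. Suppose some linear model \(w_1 x + w_2 y + b\) could compute XOR by outputting a positive value exactly when \(x \oplus y = 1\). The truth table would then require
\[
\begin{aligned}
w_1 \cdot 0 + w_2 \cdot 0 + b &\le 0 \\
w_1 \cdot 0 + w_2 \cdot 1 + b &> 0 \\
w_1 \cdot 1 + w_2 \cdot 0 + b &> 0 \\
w_1 \cdot 1 + w_2 \cdot 1 + b &\le 0
\end{aligned}
\]
Adding the two middle inequalities gives \(w_1 + w_2 + 2b > 0\), while adding the first and last gives \(w_1 + w_2 + 2b \le 0\), a contradiction. So no choice of weights works.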
So if a linear model won't work, that means we need a nonlinear one. We can get one by applying the \(relu(x)\) activation function to the outputs of our neurons. \(relu(x)\) is defined as
\[relu(x) = \max\{0, x\}\]
and graphed below.
However, \(relu(x)\) has a sharp corner at \(x = 0\), where it is not differentiable, so gradient-based learning methods won't work as well. Instead we use the \(softplus(x)\) function, a smoothed version of \(relu(x)\) defined as
\[softplus(x) = \log(1 + \exp(x))\]
shown below.
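If you want to play with the two functions yourself, here is a small NumPy sketch (my own aside, not from the course) showing that \(softplus(x)\) hugs \(relu(x)\) away from the origin while staying smooth everywhere:

import numpy as np

def relu(x):
    # max{0, x}, applied elementwise
    return np.maximum(0, x)

def softplus(x):
    # log(1 + exp(x)); np.log1p(z) computes log(1 + z) accurately for small z
    return np.log1p(np.exp(x))

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(relu(xs))      # [0. 0. 0. 1. 5.]
print(softplus(xs))  # approximately [0.0067 0.3133 0.6931 1.3133 5.0067]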
We begin with the usual imports
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
Then define the inputs and expected outputs of the neural network
inputs = np.array([[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]])
xor_outputs = np.array([0, 1, 1, 0])
Next, we define the structure of the neural network. Note that I had to increase the learning rate from the default value.
XOR = Sequential()
XOR.add(Dense(2, activation='softplus', input_dim=2))
XOR.add(Dense(1, activation='sigmoid'))
# Make the model learn faster (take bigger steps) than by default.
sgd = SGD(lr=0.1)
XOR.compile(loss='binary_crossentropy',
            optimizer=sgd,
            metrics=['accuracy'])
This defines the network
where the hidden layer activation function is \(softplus(x)\) and the output layer activation function is the traditional sigmoid, which squashes its input to a number between 0 and 1 that can be read as the probability of the output being a logical 1 rather than a logical 0. Note that Keras does not require us to explicitly define the input layer.
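One caveat: the import paths and argument names above are for the standalone Keras I was using. With the Keras bundled in recent TensorFlow releases, I believe the same model would look roughly like this, with learning_rate in place of lr:

# A sketch assuming the TensorFlow-bundled Keras (TF 2.x); import paths and
# argument names differ slightly from the standalone Keras used above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

XOR = Sequential()
XOR.add(Dense(2, activation='softplus', input_dim=2))
XOR.add(Dense(1, activation='sigmoid'))
XOR.compile(loss='binary_crossentropy',
            optimizer=SGD(learning_rate=0.1),
            metrics=['accuracy'])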
Now we actually train the network.
XOR.fit(inputs, xor_outputs, epochs=5000, verbose=0)
cost, acc = XOR.evaluate(inputs, xor_outputs, verbose=0)
print(f'cost: {cost}, acc: {acc * 100}%')
print(XOR.predict(inputs))
which outputs
cost: 0.007737404201179743, acc: 100.0%
[[0.00496492]
[0.9978434 ]
[0.98019916]
[0.00380662]]
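The raw predictions are probabilities, so to get hard 0/1 answers you can threshold them at 0.5 (a small extra step, not part of the original script):

# Round each probability to a hard 0/1 prediction.
predictions = (XOR.predict(inputs) > 0.5).astype(int)
print(predictions.flatten())  # should print [0 1 1 0]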
Training the network on other Boolean functions works exactly the same way; in fact, the only difference is using a different output array.
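For example, to learn AND or OR instead, only the target arrays change (the arrays below are my own illustration, built the same way as xor_outputs):

# Truth-table outputs for AND and OR over the same `inputs` array.
and_outputs = np.array([0, 0, 0, 1])
or_outputs = np.array([0, 1, 1, 1])

# Train exactly as before, e.g.
# XOR.fit(inputs, and_outputs, epochs=5000, verbose=0)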
This was my first experience with a neural network, so here are some things that I learned for your amusement:
Note that Boolean functions are bad functions for neural networks to learn, because their domains and ranges are discrete and (typically) small. Learning such a function takes more time and space than simply listing its truth table.