Learning Boolean Functions with Neural Networks

I'm currently taking a deep learning course, which uses learning the XOR function as its first example of a feedforward network. The XOR function has the following truth table:

| \(x\) | \(y\) | \(x \oplus y\) |
|-------|-------|----------------|
| 0     | 0     | 0              |
| 0     | 1     | 1              |
| 1     | 0     | 1              |
| 1     | 1     | 0              |

which, when graphed, is not linearly separable (the 1s cannot be separated from the 0s by drawing a single straight line):

Figure: XOR is not linearly separable
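To convince myself of this, here is a minimal sketch (my own aside, using the same Keras API as the rest of this post) of a purely linear classifier attempting XOR:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_outputs = np.array([0, 1, 1, 0])

# A single sigmoid unit is a linear classifier: it can only draw one line.
linear = Sequential()
linear.add(Dense(1, activation='sigmoid', input_dim=2))
linear.compile(loss='binary_crossentropy',
               optimizer=SGD(lr=0.1),
               metrics=['accuracy'])
linear.fit(inputs, xor_outputs, epochs=5000, verbose=0)

# A linear separator can classify at most 3 of the 4 points correctly,
# so accuracy never reaches 100% no matter how long it trains.
print(linear.evaluate(inputs, xor_outputs, verbose=0))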

So if a linear model won't work, I guess that means we need a nonlinear one. We can do this by using the \(relu(x)\) activation function on the outputs of our neurons. \(relu(x)\) is defined as

\[relu(x) = \max\{0, x\}\]

and graphed below.

Figure: the ReLU function
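As a quick NumPy sketch (my own aside, separate from the Keras model later on), \(relu(x)\) is just an elementwise maximum:

import numpy as np

def relu(x):
    # Elementwise max(0, x): negative inputs are clipped to zero.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # 0, 0, 0, 0.5, 2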

However, since \(relu(x)\) has a sharp corner, it is not differentiable at \(x = 0\), so gradient-based learning methods won't work as well. So we use the \(softplus(x)\) function instead, which is a softened version of \(relu(x)\), defined as

\[softplus(x) = \log(1 + \exp(x))\]

shown below.

Figure: the softplus function
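And a matching sketch for \(softplus(x)\) (again my own check): it tracks \(relu(x)\) closely away from the origin but is smooth at \(x = 0\).

import numpy as np

def softplus(x):
    # log(1 + exp(x)); log1p is slightly more accurate when exp(x) is small.
    return np.log1p(np.exp(x))

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(softplus(xs))       # roughly 0.0067, 0.313, 0.693, 1.313, 5.0067
print(np.maximum(0, xs))  # relu for comparison: 0, 0, 0, 1, 5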

We begin with the usual imports:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

Then we define the inputs and expected outputs of the neural network:

inputs = np.array([[0, 0],
                   [0, 1],
                   [1, 0],
                   [1, 1]])

xor_outputs = np.array([0, 1, 1, 0])

Next, we define the structure of the neural network. Note that I had to increase the learning rate from the default value.

XOR = Sequential()
XOR.add(Dense(2, activation='softplus', input_dim=2))
XOR.add(Dense(1, activation='sigmoid'))

# Make the model learn faster (take bigger steps) than by default.
sgd = SGD(lr=0.1)
XOR.compile(loss='binary_crossentropy',
            optimizer=sgd,
            metrics=['accuracy'])

This defines the network

Figure: the XOR network

where the hidden layer activation function is \(softplus(x)\) and the output layer activation function is the traditional sigmoid, which outputs a number between 0 and 1 indicating the probability that the result is a logical 1 (rather than a logical 0). Note that Keras does not require us to explicitly define the input layer.
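To make that concrete, here is a small sketch (my own addition, reusing the NumPy import and the inputs array from above) of the forward pass the network computes; whether you run it before or after training, it reproduces what XOR.predict(inputs) returns.

def softplus(x):
    return np.log(1 + np.exp(x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# get_weights() returns the kernels and biases of the two Dense layers in order.
W1, b1, W2, b2 = XOR.get_weights()

hidden = softplus(inputs @ W1 + b1)  # shape (4, 2): two hidden units per input pair
probs = sigmoid(hidden @ W2 + b2)    # shape (4, 1): probability that x XOR y is 1
print(probs)                         # matches XOR.predict(inputs)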

Now we actually train the network.

XOR.fit(inputs, xor_outputs, epochs=5000, verbose=0)
cost, acc = XOR.evaluate(inputs, xor_outputs, verbose=0)
print(f'cost: {cost}, acc: {acc * 100}%')
print(XOR.predict(inputs))

which outputs

cost: 0.007737404201179743, acc: 100.0%
[[0.00496492]
 [0.9978434 ]
 [0.98019916]
 [0.00380662]]
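Since the predictions are probabilities, thresholding them at 0.5 recovers the truth table (a small usage note of my own):

print((XOR.predict(inputs) > 0.5).astype(int).flatten())  # [0 1 1 0]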

Training the network on other boolean functions works exactly the same way; the only difference is the output array.
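For example, here is a sketch (my own helper, reusing the imports and inputs from above) that trains the same architecture on AND and OR simply by swapping in a different output array:

and_outputs = np.array([0, 0, 0, 1])
or_outputs = np.array([0, 1, 1, 1])

def train_boolean(outputs):
    # Same two-layer network as XOR above, fit to a different truth table.
    model = Sequential()
    model.add(Dense(2, activation='softplus', input_dim=2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=SGD(lr=0.1),
                  metrics=['accuracy'])
    model.fit(inputs, outputs, epochs=5000, verbose=0)
    return model

AND = train_boolean(and_outputs)
OR = train_boolean(or_outputs)
print(AND.predict(inputs).round().flatten())  # expecting [0. 0. 0. 1.]
print(OR.predict(inputs).round().flatten())   # expecting [0. 1. 1. 1.]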


This was my first experience with a neural network, so here are some things that I learned for your amusement:

Boolean functions are bad targets for a neural network to learn: their domains and ranges are discrete and (typically) small, so learning the function takes more time and space than simply listing the truth table.