# 1.d.iii. Feed-Forward Neural Network Models

Feed-forward neural networks were first seriously applied to modeling natural language by Yoshua Bengio and collaborators in 2003. Bengio used a feed-forward network in which each word in the vocabulary is mapped to an $$m$$-dimensional vector in a continuous vector space.

The paper is light on the details of the mapping $$C : V \to \mathbb{R}^m$$ and never explicitly explains how $$C$$ is constructed. It does seem to indicate that $$C$$ is learned at the same time as the rest of the network: $$C$$ is simply a $$|V| \times m$$ matrix of parameters, which would mean the actual inputs to the model are word indices used to select rows of $$C$$.
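As a concrete sketch of that reading (not code from the paper, and with illustrative sizes), $$C$$ can be realized as a learned lookup table whose rows receive gradients like any other layer, with integer word indices as the inputs:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
vocab_size, m = 10_000, 60

# C is just a |V| x m parameter matrix; nn.Embedding looks up rows
# by word index, and its weights are trained by backpropagation
# along with the rest of the network.
C = nn.Embedding(vocab_size, m)

word_indices = torch.tensor([42, 7, 1001])  # the actual network inputs
vectors = C(word_indices)                   # shape: (3, m)
print(vectors.shape)                        # torch.Size([3, 60])
```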

Each word in the sequence $$w_{i-k:i}$$ is then mapped to its corresponding vector, and the $$k$$ vectors are concatenated to form the $$k \cdot m$$-dimensional input vector for the neural network.
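A minimal sketch of the full forward pass, under the same assumptions (hypothetical sizes, a tanh hidden layer as in Bengio's model, and the paper's optional direct input-to-output connections omitted for brevity):

```python
import torch
import torch.nn as nn

class FFLM(nn.Module):
    """Sketch of a Bengio-style feed-forward language model."""
    def __init__(self, vocab_size, m, k, d_hidden):
        super().__init__()
        self.C = nn.Embedding(vocab_size, m)        # shared lookup table C
        self.hidden = nn.Linear(k * m, d_hidden)    # hidden layer (H, d)
        self.out = nn.Linear(d_hidden, vocab_size)  # output layer (U, b)

    def forward(self, context):          # context: (batch, k) word indices
        x = self.C(context).flatten(1)   # concatenate k vectors -> (batch, k*m)
        h = torch.tanh(self.hidden(x))
        return self.out(h)               # logits; softmax is applied in the loss

model = FFLM(vocab_size=10_000, m=60, k=4, d_hidden=100)
logits = model(torch.randint(0, 10_000, (8, 4)))  # batch of 8 contexts
print(logits.shape)                               # torch.Size([8, 10000])
```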

The softmax output layer shown above is the most computationally expensive part of Bengio's language model — given a vocabulary of size $$v$$, a single softmax computation requires a matrix-vector multiplication with a matrix of size $$d_\text{hidden} \times v$$, followed by $$v$$ exponentiations. This makes large vocabularies prohibitively expensive.
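To put rough numbers on this (with the same hypothetical sizes as above, not figures from the paper), the output layer dwarfs the hidden layer in per-word cost:

```python
# Per-word multiplication counts for the two layers,
# using illustrative sizes (not taken from the paper).
v, d_hidden, k, m = 10_000, 100, 4, 60

hidden_mults = (k * m) * d_hidden  # Hx: 240 * 100 = 24,000 multiplications
output_mults = d_hidden * v        # Uh: 100 * 10,000 = 1,000,000 multiplications
exps = v                           # one exponentiation per vocabulary entry

print(output_mults / hidden_mults)  # ~41.7x: the softmax layer dominates
```

Every factor-of-ten increase in vocabulary size multiplies the output-layer cost by ten while leaving the hidden layer untouched, which is why this term dominates at realistic vocabulary sizes.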