RNN Encoder-Decoder
-
An RNN Encoder-Decoder is a Recurrent Neural Network architecture that comprises two units, both of which are RNNs:
- An encoder, which takes a variable-length sequence as input and encodes it into a hidden state of fixed length.
- A decoder, which takes the hidden state from the encoder and the leftward context of the target sequence, and predicts the subsequent token in the target sequence. A minimal sketch of how the two units compose follows this list.
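The sketch below shows one way the two units might be wired together, assuming PyTorch; the `EncoderDecoder` name and the `init_state` hook are illustrative, not from a particular library.

```python
import torch
from torch import nn

class EncoderDecoder(nn.Module):
    """Composes an encoder RNN and a decoder RNN (illustrative interface)."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, enc_X: torch.Tensor, dec_X: torch.Tensor):
        # Encode the variable-length source sequence into hidden states.
        enc_outputs = self.encoder(enc_X)
        # Seed the decoder with the encoder's fixed-length hidden state.
        dec_state = self.decoder.init_state(enc_outputs)
        # Predict the next token at each position of the target sequence.
        return self.decoder(dec_X, dec_state)
```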
-
At training time, the decoder is conditioned on the preceding tokens of the ground-truth target sequence; this is known as teacher forcing. At test time, however, we condition the decoder on the tokens it has already predicted. The sketch below contrasts the two regimes.
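To make the two regimes concrete, here is a hedged sketch assuming a hypothetical `decoder_step(prev_token, state)` function that returns logits over the vocabulary and the next hidden state:

```python
# Training (teacher forcing): condition on the ground-truth prefix.
def train_logits(decoder_step, targets, state):
    logits = []
    for t in range(len(targets) - 1):
        out, state = decoder_step(targets[t], state)  # feed the true token
        logits.append(out)                            # predicts targets[t + 1]
    return logits

# Test time: condition on the tokens already predicted (greedy decoding).
def generate(decoder_step, bos_token, state, max_len):
    token, preds = bos_token, []
    for _ in range(max_len):
        out, state = decoder_step(token, state)
        token = out.argmax(dim=-1)  # feed our own prediction back in
        preds.append(token)
    return preds
```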
-
The encoder transforms the input data as follows.
Suppose our input sequence is $x_1, \ldots, x_T$, where $x_t$ is the input feature vector at time step $t$. Then, at time step $t$, we obtain a hidden state as

$$h_t = f(x_t, h_{t-1}),$$

where $f$ concatenates the input feature vector with the previous hidden state (before a learned transformation).
After this, we transform the hidden states into a context variable $c$ such that

$$c = q(h_1, \ldots, h_T),$$

where a common choice is simply $c = h_T$.
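As a concrete instance of $h_t = f(x_t, h_{t-1})$ with $c = q(h_1, \ldots, h_T) = h_T$, here is a sketch of a GRU encoder in PyTorch; the class name and hyperparameters are illustrative:

```python
import torch
from torch import nn

class RNNEncoder(nn.Module):
    """Maps a variable-length token sequence to hidden states and a context."""
    def __init__(self, vocab_size: int, embed_size: int, num_hiddens: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size, num_hiddens)  # h_t = f(x_t, h_{t-1})

    def forward(self, X: torch.Tensor):
        # X: (batch, num_steps) of token ids -> (num_steps, batch, embed_size)
        emb = self.embedding(X.t())
        # outputs holds h_1..h_T for every step; state is the final state h_T
        outputs, state = self.rnn(emb)
        return outputs, state  # context variable c = q(h_1..h_T) = h_T here
```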
Let $y_1, \ldots, y_{T'}$ be an output sequence. For each time step $t'$, we assign a conditional probability based on the previous outputs and the context variable. That is,

$$P(y_{t'} \mid y_1, \ldots, y_{t'-1}, c).$$

To perform the actual prediction, we simply take:
- the target token from the previous time step, $y_{t'-1}$,
- the hidden state from the previous time step, $s_{t'-1}$,
- the context variable $c$.

We obtain the new hidden state $s_{t'}$ as

$$s_{t'} = g(y_{t'-1}, c, s_{t'-1}),$$

where $g$ is some function that describes the decoder's transformation. An output layer is then applied to $s_{t'}$ to compute the conditional probabilities. Finally, we use softmax to generate our token $y_{t'}$. In practice, we may continue generating entries using the decoder simply by feeding the shifted sequence to the model again.
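A matching decoder sketch, implementing $s_{t'} = g(y_{t'-1}, c, s_{t'-1})$ by concatenating the context $c$ onto each embedded input token (one common choice); again the names are illustrative:

```python
class RNNDecoder(nn.Module):
    """Predicts the next target token from y_{t'-1}, s_{t'-1}, and c."""
    def __init__(self, vocab_size: int, embed_size: int, num_hiddens: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Input: embedded previous token concatenated with the context c.
        self.rnn = nn.GRU(embed_size + num_hiddens, num_hiddens)
        self.dense = nn.Linear(num_hiddens, vocab_size)  # output layer

    def init_state(self, enc_outputs):
        _, enc_state = enc_outputs
        return enc_state  # start from the encoder's final hidden state

    def forward(self, X: torch.Tensor, state: torch.Tensor):
        emb = self.embedding(X.t())               # (num_steps, batch, embed)
        c = state[-1].repeat(emb.shape[0], 1, 1)  # broadcast c to every step
        outputs, state = self.rnn(torch.cat((emb, c), dim=2), state)
        logits = self.dense(outputs)  # softmax over these gives P(y_t' | ...)
        return logits, state
```

During training, `X` would be the shifted target sequence (teacher forcing); at test time, a loop like the earlier `generate` sketch feeds each prediction back in.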
-
Since the decoder produces a softmax distribution over tokens, we use the cross-entropy Loss Function.
-
We perform an additional step during training and mask irrelevant entries with zeros so that they do not affect the loss (at the current time step). The masking is necessary since we pad sequences to a common length, and the padding tokens should not contribute to the loss.
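A sketch of the masked loss, assuming logits shaped `(num_steps, batch, vocab)` as in the decoder above and a hypothetical `valid_len` tensor holding each sequence's unpadded length:

```python
import torch
from torch import nn

def masked_cross_entropy(logits, labels, valid_len):
    """Cross-entropy that zeroes out the loss at padded positions."""
    # logits: (num_steps, batch, vocab); labels: (batch, num_steps)
    num_steps = labels.shape[1]
    # mask[i, t] = 1 while t < valid_len[i], 0 on padding
    mask = (torch.arange(num_steps)[None, :] < valid_len[:, None]).float()
    loss = nn.functional.cross_entropy(
        logits.permute(1, 2, 0),   # -> (batch, vocab, num_steps)
        labels, reduction='none')  # per-token loss: (batch, num_steps)
    return (loss * mask).sum() / mask.sum()  # average over real tokens only
```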
Transformers
- See Transformer Model. Transformers have the advantage that their latent representation can vary in length with the input, rather than being compressed into a single fixed-length context vector.