Architectural Details

  • A hidden state is a state which is not necessarily observed, but which holds some form of latent representation of the inputs. Typically, it is used to aggregate sequential data.
    • We use hidden states to avoid having to store many parameters, since otherwise we would need to keep the input's values from all earlier time steps.

      Let $h_t$ denote the hidden state at time step $t$ and $x_t$ the input.

      We calculate the hidden state as

      $$h_t = f(x_t, h_{t-1})$$

      That is, for an RNN, we want the current state to depend on the previous state. However, unlike under the Markov property, we actually do retain some information about all previously seen states, since each $h_t$ is built recursively from $h_{t-1}$ (a minimal sketch follows the figure below).

RNN computation. Image taken from Zhang et al.
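
A minimal Python sketch of this recurrence; the update rule `f` here is a stand-in assumption, and the concrete parameterized form appears in the next bullet:

```python
import math

def f(x_t, h_prev):
    # Stand-in update rule (assumption); a real RNN uses learned
    # weights, as in the parameterized form below.
    return math.tanh(x_t + h_prev)

xs = [1.0, -0.5, 2.0]   # toy input sequence
h = 0.0                 # initial hidden state h_0
for x_t in xs:
    h = f(x_t, h)       # h_t = f(x_t, h_{t-1})
print(h)                # h now summarizes the entire sequence seen so far
```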
  • In practice, we calculate the hidden states as follows. Let $n$ be the size of a minibatch, $d$ the size of the inputs in each example, $\mathbf{H}_t \in \mathbb{R}^{n \times h}$ the hidden state (with $h$ hidden units), $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ a minibatch of inputs, and $\phi$ some activation function.

    Then

    $$\mathbf{H}_t = \phi(\mathbf{X}_t \mathbf{W}_{xh} + \mathbf{H}_{t-1} \mathbf{W}_{hh} + \mathbf{b}_h)$$

    where $\mathbf{W}_{xh} \in \mathbb{R}^{d \times h}$ and $\mathbf{W}_{hh} \in \mathbb{R}^{h \times h}$ are weights, and $\mathbf{b}_h \in \mathbb{R}^{1 \times h}$ is a bias term.
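
A minimal numpy sketch of this update; the sizes `n`, `d`, and `h` below are illustrative assumptions:

```python
import numpy as np

n, d, h = 4, 3, 5                  # minibatch, input, and hidden sizes (assumed)
rng = np.random.default_rng(0)

X_t    = rng.normal(size=(n, d))   # minibatch of inputs at time step t
H_prev = np.zeros((n, h))          # previous hidden state H_{t-1}
W_xh   = rng.normal(size=(d, h))   # input-to-hidden weights
W_hh   = rng.normal(size=(h, h))   # hidden-to-hidden weights
b_h    = np.zeros((1, h))          # bias

# H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), with phi = tanh
H_t = np.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)
print(H_t.shape)                   # (n, h)
```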

  • We make use of Recurrent Layers. These are layers which use hidden states obtained from previous computations.

    More formally, let $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ be a minibatch of inputs at time step $t$, $\mathbf{H}_t \in \mathbb{R}^{n \times h}$ the hidden layer output of time step $t$, and $\phi$ an activation function.

    We perform the calculation of the output as

    $$\mathbf{O}_t = \mathbf{H}_t \mathbf{W}_{hq} + \mathbf{b}_q$$

    • We parameterize on the weights of the hidden states ($\mathbf{W}_{hh}$), the weights of the inputs ($\mathbf{W}_{xh}$, as in a fully connected layer), the output weights ($\mathbf{W}_{hq}$), and the bias terms ($\mathbf{b}_h$, $\mathbf{b}_q$).
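
Putting the two equations together, a self-contained numpy sketch of a recurrent layer unrolled over a toy sequence; all sizes and the random data are assumptions:

```python
import numpy as np

n, d, h, q, T = 4, 3, 5, 2, 6            # batch, input, hidden, output sizes; sequence length (assumed)
rng = np.random.default_rng(0)

W_xh = rng.normal(size=(d, h)) * 0.1     # input-to-hidden weights
W_hh = rng.normal(size=(h, h)) * 0.1     # hidden-to-hidden weights
b_h  = np.zeros((1, h))                  # hidden bias
W_hq = rng.normal(size=(h, q)) * 0.1     # hidden-to-output weights
b_q  = np.zeros((1, q))                  # output bias

H = np.zeros((n, h))                     # initial hidden state
outputs = []
for t in range(T):
    X_t = rng.normal(size=(n, d))        # toy minibatch of inputs at step t
    H = np.tanh(X_t @ W_xh + H @ W_hh + b_h)  # H_t reuses the previous H
    outputs.append(H @ W_hq + b_q)       # O_t = H_t W_hq + b_q

print(len(outputs), outputs[0].shape)    # T outputs, each of shape (n, q)
```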
  • Typical training involves using Backpropagation Through Time (BPTT): ordinary backpropagation applied to the network unrolled across the time steps of the sequence.
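
A hedged sketch of BPTT using PyTorch's autograd, which differentiates through the unrolled recurrence automatically; the model, data, and sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

n, d, h, T = 4, 3, 5, 6                 # batch, input, hidden sizes; sequence length (assumed)
rnn  = nn.RNN(input_size=d, hidden_size=h, batch_first=True)
head = nn.Linear(h, 1)                  # maps each hidden state to a scalar output

X = torch.randn(n, T, d)                # toy input sequences
y = torch.randn(n, T, 1)                # toy targets

H, _ = rnn(X)                           # unroll the recurrence over all T steps
loss = nn.functional.mse_loss(head(H), y)
loss.backward()                         # BPTT: gradients flow back through every step

print(rnn.weight_hh_l0.grad.shape)      # gradient on the hidden-to-hidden weights
```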

Links