LSTM
Memory
M. Stanley Fujimoto
CS778 – Winter 2016
30 Jan 2016
Why
• List the alphabet forwards
    List the alphabet backwards
• Much of the information you store in your brain is not random access
    You learned it as a sequence
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why – Use Cases
• Predict the next word in a sentence
     The woman took out ________ purse
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why – Markov Models
• Traditional Markov model approaches are limited because their states must
  be drawn from a modestly sized discrete state space S
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
Why – Neural Network
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why – Neural Network, Extended
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why – Recurrent Neural Network
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
  Why – Recurrent Neural Network
“Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano.” WildML, October 27, 2015.
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
Why – Recurrent Neural Network
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why – Continued Influence
Neural Network, Extended                                                      Recurrent Neural Network
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why – Recurrent Neural Network
“Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
Why - LSTM
• Designed to overcome:
   The difficulty of learning long-term dependencies
   Vanishing/exploding gradients
Why – Long-Term Dependencies
• We don’t want to remember everything, just the important things for a long time
“(1) How Does LSTM Help Prevent the Vanishing (and Exploding) Gradient Problem in a Recurrent Neural Network? - Quora.” Accessed
February 19, 2016. https://www.quora.com/How-does-LSTM-help-prevent-the-vanishing-and-exploding-gradient-problem-in-a-recurrent-neural-
network.
“Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano.” WildML, October 27, 2015.
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
Why – Vanishing Gradients
“Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients.” WildML, October 8, 2015.
http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/.
Why – Vanishing Gradients
“Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients.” WildML, October 8, 2015.
http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/.
How – Vanishing Gradients
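A minimal numeric sketch of the problem (not from the slides): in a plain RNN the gradient reaching step t is multiplied by the recurrent weight and by the tanh derivative once per time step, so with modest weights it shrinks toward zero over long spans, and with large weights it blows up instead.

import numpy as np

T = 50                      # number of time steps to backpropagate through
w_hh = 0.5                  # recurrent weight; try 2.5 to watch the gradient explode instead
h = 0.0                     # hidden state of a one-unit vanilla RNN

# forward pass with a constant input, recording the states
states = []
for t in range(T):
    h = np.tanh(w_hh * h + 1.0)
    states.append(h)

# backward pass: carry dE/dh_t from the last step toward the first
grad = 1.0                  # arbitrary error at the final step
for t in reversed(range(T)):
    grad *= w_hh * (1.0 - states[t] ** 2)    # chain rule through w_hh and tanh'
    if t % 10 == 0:
        print("step %2d: |dE/dh_t| = %.3e" % (t, abs(grad)))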
How – History
• Foundational research done in the 1980s
      1982 – Hopfield: Introduction of a family of recurrent neural networks
      1986 – Jordan: RNN architecture
      1989 – Williams and Zipser: Truncated BackProp Through Time (TBPTT)
      1990 – Elman: simpler RNN architecture
      1997 – Hochreiter and Schmidhuber: LSTM networks
      1999 – Gers, Schmidhuber, Cummins: Forget gate
      2005 – Graves and Schmidhuber: Bidirectional LSTM
      2012 – Pascanu, Mikolov, Bengio: Gradient clipping
      2014 – Cho, Bahdanau, van Merrienboer, Bougares: Gated Recurrent Unit
      2014 – Sutskever, et al.: Sequence-to-Sequence Learning with Neural Nets
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
How – 1982, Hopfield
• Recurrent neural networks with
  pattern recognition capabilities
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
How – 1989, Williams and Zipser
• Truncated BackPropagation Through Time (TBPTT)
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
How – 1990, Elman
• A simpler RNN architecture than the Jordan network
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
How – 1997, Sepp & Jürgen
• Hochreiter and Schmidhuber introduce the LSTM network
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
How – 1999, Gers, Schmidhuber, Cummins
• Added the Forget Gate to the LSTM structure, published in “Learning to
  Forget: Continual Prediction with LSTM”
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. “Learning to Forget: Continual Prediction with LSTM.” Neural Computation 12,
no. 10 (October 1, 2000): 2451–71. doi:10.1162/089976600300015015.
      How – A Reddit Explanation
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
How – LSTM Structure
How – Step by Step: Cell State
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
How – Step by Step: Forget Layer
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
How – Step by Step: Input Gate Layer
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
How – Step by Step: Cell State Update
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
How – Step by Step: Output Value
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
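A minimal NumPy sketch of the step-by-step equations above (forget layer, input gate layer, cell state update, output value); the weight layout, names, and sizes are illustrative assumptions, not code from the cited posts.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step over the concatenated [h_{t-1}, x_t] vector.

    W and b are dicts keyed 'f', 'i', 'a', 'o' (forget, input, candidate, output).
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W['f'] @ z + b['f'])   # forget layer: what to discard from c_{t-1}
    i_t = sigmoid(W['i'] @ z + b['i'])   # input gate layer: what to write
    a_t = np.tanh(W['a'] @ z + b['a'])   # candidate values (input node)
    c_t = f_t * c_prev + i_t * a_t       # cell state update
    o_t = sigmoid(W['o'] @ z + b['o'])   # output gate
    h_t = o_t * np.tanh(c_t)             # output value
    return h_t, c_t

# usage with random weights: 2 inputs, 3 hidden units, 5 time steps
rng = np.random.default_rng(0)
n_in, n_hid = 2, 3
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in 'fiao'}
b = {k: np.zeros(n_hid) for k in 'fiao'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
print(h)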
How - LSTM Memory Cell
• Input node: g_c(t)
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
How – Backpropagation
 Mazur. “A Step by Step Backpropagation Example.” Matt Mazur, March 17, 2015. http://mattmazur.com/2015/03/17/a-step-
 by-step-backpropagation-example/.
How – Backpropagation
How – RNN Backpropagation
• Normal BPTT (Backprop Through Time)
   How – Clarifications
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to Forget: Continual Prediction with LSTM." (1999).
How – Clarifications
• CEC: constant error carousel
• The cell's recurrent self-connection has a fixed weight of 1, so error carried along the cell state is preserved across time steps
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to Forget: Continual Prediction with LSTM." (1999).
   How – Clarifications
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to Forget: Continual Prediction with LSTM." (1999).
“Deep Learning Lecture 12: Recurrent Neural Nets and LSTMs - YouTube.” Accessed February 27, 2016.
https://www.youtube.com/watch?v=56TYLaQN4N8.
    How – Clarifications
“Deep Learning Lecture 12: Recurrent Neural Nets and LSTMs - YouTube.” Accessed February 27, 2016.
https://www.youtube.com/watch?v=56TYLaQN4N8.
 How – Clarifications
 • Forget Gate
 • LSTMs (with no forget gate) have difficulty with continual input streams
      A continual input stream is not segmented into subsequences (no starts, no ends)
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. “Learning to Forget: Continual Prediction with LSTM.” Neural Computation 12,
no. 10 (October 1, 2000): 2451–71. doi:10.1162/089976600300015015.
How – Clarifications
• Vanishing/Exploding Gradients
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to Forget: Continual Prediction with LSTM." (1999).
How – Solving Life’s Problems
• The derivative of the cell state with respect to the previous cell state is the forget gate activation
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to Forget: Continual Prediction with LSTM." (1999).
“Why Can Constant Error Carousels (CECs) Prevent LSTM from the Problems of Vanishing/exploding Gradients? • /r/MachineLearning.”
Reddit. Accessed March 1, 2016.
https://www.reddit.com/r/MachineLearning/comments/34piyi/why_can_constant_error_carousels_cecs_prevent/.
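In equations, using the cell update from the step-by-step slides (a minimal derivation; f_t is the forget gate, i_t the input gate, a_t the candidate input):

c_t = f_t \odot c_{t-1} + i_t \odot a_t
\partial c_t / \partial c_{t-1} = f_t

Without a forget gate (the original 1997 formulation), c_t = c_{t-1} + i_t \odot a_t and \partial c_t / \partial c_{t-1} = 1: the constant error carousel carries error back across time steps unchanged, so it neither vanishes nor explodes.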
How – 2012, Pascanu et al.
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. “On the Difficulty of Training Recurrent Neural Networks.” arXiv:1211.5063
[cs], November 21, 2012. http://arxiv.org/abs/1211.5063.
How – Exploding Gradients
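The standard remedy, from the Pascanu et al. slide above, is gradient clipping: rescale the whole gradient whenever its norm exceeds a threshold. A minimal sketch (the threshold value and the list-of-arrays layout are illustrative):

import numpy as np

def clip_gradients(grads, threshold=5.0):
    """Rescale a list of gradient arrays so that their global norm is at most threshold."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > threshold:
        grads = [g * (threshold / total_norm) for g in grads]
    return grads

# usage: clip just before the parameter update
grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm is 13
print(clip_gradients(grads))                       # rescaled so the norm is 5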
How – Forward Pass
How – Forward Pass
                      0             1
       c_t   0.786032719    0.85228885
       h_t   0.518914596   0.496404923
How – Forward Pass
Gate pre-activations               Gate activations
         0       1                               0             1
  â    0.9    1.04     a = tanh(â)      0.71629787   0.777888067
  î   0.41    0.89     i = sig(î)      0.601087879   0.708890173
  f̂    0.9    1.11     f = sig(f̂)     0.710949503   0.752129111
  ô   1.33    0.93     o = sig(ô)      0.790840635   0.717075285

         0             1
  c_t   0.786032719    0.85228885
  h_t   0.518914596   0.496404923
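The numbers above can be reproduced in a few lines. The previous cell state c_{t-1} is not shown on the slide; the values used below (roughly 0.5 and 0.4) were back-solved from the printed c_t, so treat them as an assumption.

import numpy as np

sig = lambda z: 1.0 / (1.0 + np.exp(-z))

a_hat = np.array([0.90, 1.04])
i_hat = np.array([0.41, 0.89])
f_hat = np.array([0.90, 1.11])
o_hat = np.array([1.33, 0.93])
c_prev = np.array([0.5, 0.4])     # assumed previous cell state (not shown on the slide)

a = np.tanh(a_hat)                # 0.71629787, 0.777888067
i = sig(i_hat)                    # 0.601087879, 0.708890173
f = sig(f_hat)                    # 0.710949503, 0.752129111
o = sig(o_hat)                    # 0.790840635, 0.717075285
c_t = f * c_prev + i * a          # 0.786032719, 0.85228885
h_t = o * np.tanh(c_t)            # 0.518914596, 0.496404923
print(c_t, h_t)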
How – Backward Pass: Equations
How – Backward Pass
∂h_t ≡ ∂E/∂h_t
∂o_t ≡ ∂E/∂o_t
∂c_t ≡ ∂E/∂c_t
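A minimal sketch of one backward step for this cell (c_t = f·c_{t-1} + i·a and h_t = o·tanh(c_t)), using the shorthand above; ∂h_t is whatever error arrives from the layer above and from step t+1, and the gate activations are saved from the forward pass.

import numpy as np

def lstm_backward_step(dh_t, dc_next, a, i, f, o, c_t, c_prev):
    """Backprop one time step; returns gate pre-activation deltas and the gradient to c_{t-1}.

    dh_t    : ∂E/∂h_t arriving at this step
    dc_next : ∂E/∂c_t carried back from step t+1 (zero at the last step)
    """
    do = dh_t * np.tanh(c_t)                               # output gate
    dc = dc_next + dh_t * o * (1.0 - np.tanh(c_t) ** 2)    # cell state
    da = dc * i                                            # input node (candidate)
    di = dc * a                                            # input gate
    df = dc * c_prev                                       # forget gate
    dc_prev = dc * f                                       # gradient to c_{t-1}: scaled by the forget gate

    # deltas with respect to the pre-activations â, î, f̂, ô
    da_hat = da * (1.0 - a ** 2)
    di_hat = di * i * (1.0 - i)
    df_hat = df * f * (1.0 - f)
    do_hat = do * o * (1.0 - o)
    return da_hat, di_hat, df_hat, do_hat, dc_prev

The weight deltas reported a few slides later come from combining these pre-activation deltas with each step's inputs and summing over time.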
How – Calculate ∂c_t-1
    How – Backward Pass
      deltas
               0             1               0              1
W   c   0.006839763   0.061557871   -0.005264398   -0.047379579
    i   0.004013809   0.036124282   -0.003018884   -0.027169952
    f   0.002401211   0.021610899   -0.001402398   -0.012621584
    o   0.005632084   0.050688752   -0.007072752   -0.06365477
How – Backward Pass
• Numerical stability can be an issue
What - Datasets
• Text generation (char-to-char, word-to-word); a sketch of the char-to-char setup follows this list
   Book of Mormon
      English
      Portuguese
      Spanish
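A minimal sketch of what the char-to-char setup looks like (the file path and window length are illustrative assumptions, not the configuration actually used for these samples): each training example is a window of characters, and the target is the same window shifted one character ahead.

import numpy as np

# illustrative corpus path; any plain-text file works
text = open("book_of_mormon.txt", encoding="utf-8").read().lower()
chars = sorted(set(text))
char_to_ix = {ch: ix for ix, ch in enumerate(chars)}

seq_len = 50
inputs, targets = [], []
for start in range(0, len(text) - seq_len - 1, seq_len):
    window = text[start:start + seq_len + 1]
    inputs.append([char_to_ix[ch] for ch in window[:-1]])   # characters 0..49
    targets.append([char_to_ix[ch] for ch in window[1:]])   # characters 1..50 (next char)

X = np.array(inputs)   # (num_examples, seq_len) integer-encoded input sequences
y = np.array(targets)  # same shape, shifted by one character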
What – Book of Mormon, c2c
• Seed: and it came to pass
• and it came to pass that the Lord of the vineyard said unto them: behold, I
 am a descendant of the death of the seed of Joseph, have I prepared for the
 same being was a men of God, or of our prophecies which I have spoken,
 which is the season of his words, and I will bring the Father in the name of
 Jesus.
• Capitals added
• Translation
What – O Livro de Mórmon, c2c
• Seed: e o senhor disse
• Capitals added
• Translation
What – El Libro de Mormón, c2c
• Seed: y el senhor dice
• y el senhor dice todas las cosas que están escritas de las palabras que habló
 el ángel del señor de los ejércitos, de modo que no pudieron ser en ellas; y
 también le dije a su pueblo: esto fue para llevar a causa de la iniquidad de
 ammoríah por el espacio de muchos días
• Translate
What – Book of Mormon, w2w
•   and thus it did come to pass that the people began to wax strong in wickedness
    and abominations ; and he begat sons and daughters ; and he begat coriantum ,
    and he anointed coriantum to reign in his stead .
•   and it came to pass that the lamanites did come down against us to battle .
•   and we did gather in the city of gid , when they did come unto the city of night ,
    the city desolation .
•   and it was one day that there was no contention among all the people , but
    insomuch that there was no wars , and rumors of wars ; and every kind that they
    might read the ways which he desired to murder the people .
•   and he did teach them the words which had been spoken by the mouth of all the
    prophets ; and all the people who were in the land were called by the people of
    nephi .
•   and thus did the people of nephi , throughout all the land of zarahemla , and
    there was no contention by the nephites .
•   and thus they had become weak , because of their wickedness and their
    abominations . and it came to pass that the people did wax more strong in the
    land .
What - Translation
    What – Gated Recurrent Units
• Differences from LSTMs (a minimal GRU step is sketched after the citation below):
         GRU has 2 gates while LSTM has 3 gates
         GRU’s internal memory is completely exposed as output
            No output gate
“Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano.” WildML, October 27, 2015.
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
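For comparison with the LSTM step sketched earlier, a minimal GRU step (the weight names and layout are illustrative assumptions): two gates, and the hidden state h_t itself is the output, with no separate output gate. Setting the update gate to all 0s and the reset gate to all 1s reduces it to a plain RNN, which is the reduction described on a later slide.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step; keys: 'z' update gate, 'r' reset gate, 'h' candidate state."""
    z_t = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])              # update gate
    r_t = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])              # reset gate
    h_tilde = np.tanh(W['h'] @ x_t + U['h'] @ (r_t * h_prev) + b['h'])  # candidate state
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde   # memory fully exposed: h_t is also the output
    return h_t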
      What – 2005, Graves and Schmidhuber
      • Bidirectional LSTM networks
Graves, Alex, and Jürgen Schmidhuber. “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures.”
Neural Networks 18, no. 5–6 (July 2005): 602–10. doi:10.1016/j.neunet.2005.06.042.
“Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs.” WildML, September 17, 2015. http://www.wildml.com/2015/09/recurrent-
neural-networks-tutorial-part-1-introduction-to-rnns/.
What – 2012, Pascanu et al.
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. “On the Difficulty of Training Recurrent Neural Networks.” arXiv:1211.5063
[cs], November 21, 2012. http://arxiv.org/abs/1211.5063.
What – 2014, Cho, et al.
“Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
    What – Gated Recurrent Unit
• Reduces to a plain RNN by setting:
     Reset gate to all 1s
     Update gate to all 0s
“Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano.” WildML, October 27, 2015.
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
    What – LSTM vs. GRU
    • Unclear which is better
“Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano.” WildML, October 27, 2015.
http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
          What – 2014, Sutskever, et al.
“NIPS: Oral Session 4 - Ilya Sutskever - Microsoft Research.” Accessed March 1, 2016. http://research.microsoft.com/apps/video/?id=239083.
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to Sequence Learning with Neural Networks.” arXiv:1409.3215 [cs], September 10,
2014. http://arxiv.org/abs/1409.3215.
What – Input/Output Architectures
Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.”
arXiv:1506.00019 [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
    Bibliography
•   “(1) How Does LSTM Help Prevent the Vanishing (and Exploding) Gradient Problem in a Recurrent Neural Network? - Quora.” Accessed February 19,
    2016. https://www.quora.com/How-does-LSTM-help-prevent-the-vanishing-and-exploding-gradient-problem-in-a-recurrent-neural-network.
•   “Anyone Can Learn To Code an LSTM-RNN in Python (Part 1: RNN) - I Am Trask.” Accessed January 31, 2016.
    https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/.
•   “Deep Learning Lecture 12: Recurrent Neural Nets and LSTMs - YouTube.” Accessed February 27, 2016.
    https://www.youtube.com/watch?v=56TYLaQN4N8.
•   Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. “Learning to Forget: Continual Prediction with LSTM.” Neural Computation 12, no. 10
    (October 1, 2000): 2451–71. doi:10.1162/089976600300015015.
•   Graves, Alex, and Jürgen Schmidhuber. “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures.”
    Neural Networks 18, no. 5–6 (July 2005): 602–10. doi:10.1016/j.neunet.2005.06.042.
•   Hochreiter, S, and J Schmidhuber. “Long Short-Term Memory.” Neural Computation 9, no. 8 (November 1997): 1735–80. doi:10.1162/neco.1997.9.8.1735.
•   Lipton, Zachary C., John Berkowitz, and Charles Elkan. “A Critical Review of Recurrent Neural Networks for Sequence Learning.” arXiv:1506.00019
    [cs], May 29, 2015. http://arxiv.org/abs/1506.00019.
•   Mazur. “A Step by Step Backpropagation Example.” Matt Mazur, March 17, 2015. http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-
    example/.
•   Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. “On the Difficulty of Training Recurrent Neural Networks.” arXiv:1211.5063 [cs], November 21,
    2012. http://arxiv.org/abs/1211.5063.
•   “Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs.” WildML, September 17, 2015. http://www.wildml.com/2015/09/recurrent-neural-
    networks-tutorial-part-1-introduction-to-rnns/.
•   “Recurrent Neural Networks Tutorial, Part 3 – Backpropagation Through Time and Vanishing Gradients.” WildML, October 8, 2015.
    http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/.
•   “Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano.” WildML, October 27, 2015.
    http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/.
•   “SCRN vs LSTM • /r/MachineLearning.” Reddit. Accessed February 10, 2016. https://www.reddit.com/r/MachineLearning/comments/44bxdj/scrn_vs_lstm/.
•   “Simple LSTM.” Accessed March 1, 2016. http://nicodjimenez.github.io/2014/08/08/lstm.html.
•   “The Unreasonable Effectiveness of Recurrent Neural Networks.” Accessed November 24, 2015. http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
•   “Understanding LSTM Networks -- Colah’s Blog.” Accessed January 25, 2016. http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
•   “Vanishing Gradient Problem.” Wikipedia, the Free Encyclopedia, December 29, 2015.
    https://en.wikipedia.org/w/index.php?title=Vanishing_gradient_problem&oldid=697291222.
•   “Why Can Constant Error Carousels (CECs) Prevent LSTM from the Problems of Vanishing/exploding Gradients? • /r/MachineLearning.” Reddit.
    Accessed March 1, 2016. https://www.reddit.com/r/MachineLearning/comments/34piyi/why_can_constant_error_carousels_cecs_prevent/.
•   Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to Sequence Learning with Neural Networks.” arXiv:1409.3215 [cs], September 10, 2014.
    http://arxiv.org/abs/1409.3215.
•   “NIPS: Oral Session 4 - Ilya Sutskever - Microsoft Research.” Accessed March 1, 2016. http://research.microsoft.com/apps/video/?id=239083.