Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
Recurrent Neural Networks
Why sequence models?
deeplearning.ai
Examples of sequence data
Speech recognition: audio clip → "The quick brown fox jumped over the lazy dog."
Music generation: ∅ → music
Sentiment classification: "There is nothing to like in this movie." → rating
DNA sequence analysis: AGCCCCTGTGAGGAACTAG → AGCCCCTGTGAGGAACTAG (with the relevant subsequence labeled)
Machine translation: "Voulez-vous chanter avec moi?" → "Do you want to sing with me?"
Video activity recognition: video clip → "Running"
Named entity recognition: "Yesterday, Harry Potter met Hermione Granger." → "Yesterday, Harry Potter met Hermione Granger." (with the names labeled)
Recurrent Neural Networks
Notation
deeplearning.ai
Motivating example
  x:   Harry Potter and Hermione Granger invented a new spell.
Representing words
  x:   Harry Potter and Hermione Granger invented a new spell.
       x^<1>  x^<2>  x^<3>   ⋯   x^<9>
Representing words
  x:   Harry Potter and Hermione Granger invented a new spell.
       x^<1>  x^<2>  x^<3>   ⋯   x^<9>
                              And = 367
                              Invented = 4700
                              A=1
                              New = 5976
                              Spell = 8376
                              Harry = 4075
                              Potter = 6830
                              Hermione = 4200
                              Gran… = 4000
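A minimal sketch of this representation in NumPy, assuming the 10,000-word vocabulary used in the lecture; the helper name `one_hot` is just for illustration.

```python
import numpy as np

vocab_size = 10000                      # assumed dictionary size from the lecture

def one_hot(index, vocab_size):
    """Return a one-hot column vector with a 1 at the word's (1-based) dictionary position."""
    v = np.zeros((vocab_size, 1))
    v[index - 1] = 1
    return v

x1 = one_hot(4075, vocab_size)          # x^<1> represents "Harry", index 4075 in the vocabulary
```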
Recurrent Neural Networks
Recurrent Neural Network Model
deeplearning.ai
Why not a standard network?
x^<1>, x^<2>, ⋯, x^<Tx>  →  (standard, fully connected network)  →  ŷ^<1>, ŷ^<2>, ⋯, ŷ^<Ty>
Problems:
 - Inputs, outputs can be different lengths in different examples.
 - Doesn't share features learned across different positions of text.
Recurrent Neural Networks
     He said, “Teddy Roosevelt was a great President.”
     He said, “Teddy bears are on sale!”
Forward Propagation
(Diagram: the unrolled RNN. a^<0> feeds into the first cell; each cell takes the input x^<t> and the previous activation a^<t-1>, and produces the output ŷ^<t> and the next activation a^<t>, for t = 1, …, Tx.)
Simplified RNN notation
a^<t> = g(W_aa a^<t-1> + W_ax x^<t> + b_a)
ŷ^<t> = g(W_ya a^<t> + b_y)
Stacking W_a = [W_aa | W_ax] and the vector [a^<t-1>, x^<t>], this simplifies to:
a^<t> = g(W_a [a^<t-1>, x^<t>] + b_a)
ŷ^<t> = g(W_y a^<t> + b_y)
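A minimal NumPy sketch of one forward step under this notation; the weight shapes and the softmax output activation are assumptions for illustration, not part of the slide.

```python
import numpy as np

def rnn_step(a_prev, x_t, Waa, Wax, Wya, ba, by):
    """One time step: a^<t> = tanh(Waa a^<t-1> + Wax x^<t> + ba), y_hat^<t> = softmax(Wya a^<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)       # hidden state, shape (n_a, 1)
    z = Wya @ a_t + by
    y_hat_t = np.exp(z) / np.sum(np.exp(z), axis=0)    # softmax output, shape (n_y, 1)
    return a_t, y_hat_t
```

In the simplified notation, Waa and Wax would be concatenated horizontally into a single matrix W_a applied to the stacked vector [a^<t-1>, x^<t>].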
Recurrent Neural Networks
Backpropagation through time
deeplearning.ai
Forward propagation and backpropagation
(Diagram: the same unrolled RNN. Forward propagation runs left to right through a^<1>, …, a^<Tx>; backpropagation runs right to left through the same chain, "through time".)
Forward propagation and backpropagation
ℒ^<t>(ŷ^<t>, y^<t>) = −y^<t> log ŷ^<t> − (1 − y^<t>) log(1 − ŷ^<t>)
ℒ(ŷ, y) = Σ_t ℒ^<t>(ŷ^<t>, y^<t>)
Backpropagation through time
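A small sketch of the overall loss as the sum of the per-time-step losses (binary cross-entropy, matching the name-entity-recognition example; the list-of-time-steps layout is an assumption for illustration).

```python
import numpy as np

def total_loss(y_hats, ys, eps=1e-12):
    """L = sum over t of L^<t>(y_hat^<t>, y^<t>), with per-step binary cross-entropy."""
    loss = 0.0
    for y_hat, y in zip(y_hats, ys):   # one (prediction, label) pair per time step t = 1..Ty
        loss += np.sum(-(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps)))
    return loss
```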
Recurrent Neural Networks
Different types of RNNs
deeplearning.ai
Examples of sequence data (recap of the slide above): speech recognition, music generation, sentiment classification, DNA sequence analysis, machine translation, video activity recognition, named entity recognition.
Examples of RNN architectures
(Diagrams only; summarized on the next slide.)
Summary of RNN types
One to one: a standard network, one input x and one output ŷ.
One to many: one input x (or none), a sequence of outputs ŷ^<1>, ŷ^<2>, …, ŷ^<Ty> (e.g. music generation).
Many to one: a sequence of inputs x^<1>, …, x^<Tx>, one output ŷ (e.g. sentiment classification).
Many to many, Tx = Ty: an output ŷ^<t> at every input time step (e.g. named entity recognition).
Many to many, Tx ≠ Ty: an encoder reads x^<1>, …, x^<Tx>, then a decoder emits ŷ^<1>, …, ŷ^<Ty> (e.g. machine translation).
Recurrent Neural Networks
Language model and sequence generation
deeplearning.ai
What is language modelling?
 Speech recognition
      The apple and pair salad.
      The apple and pear salad.
      P(The apple and pair salad) =
      P(The apple and pear salad) =
Language modelling with an RNN
  Training set: large corpus of English text.
  Cats average 15 hours of sleep a day.
  The Egyptian Mau is a breed of cat. <EOS>
RNN model
 Cats average 15 hours of sleep a day. <EOS>
 ℒ^<t>(ŷ^<t>, y^<t>) = − Σ_i y_i^<t> log ŷ_i^<t>
 ℒ = Σ_t ℒ^<t>(ŷ^<t>, y^<t>)
Recurrent Neural Networks
Sampling novel sequences
deeplearning.ai
Sampling a sequence from a trained RNN
(Diagram: start from a^<0> = 0 and x^<1> = 0; at each step, sample ŷ^<t> from the softmax output and feed the sampled word back in as the next input x^<t+1>, until <EOS> is sampled or a maximum length is reached.)
Character-level language model
Note: ŷ^<t> is a softmax vector whose length equals the vocabulary size. The entry with the highest probability gives the word (or character) for ŷ^<t>, or you can instead sample at random, weighted by those probabilities. Each softmax gives P(y^<t> | earlier outputs), so the probability of the whole sequence y is the product P(y^<1>) · P(y^<2> | y^<1>) · ⋯
 Vocabulary = [a, aaron, …, zulu, <UNK>]
(Diagram: the same sampling network, with each sampled ŷ^<t> fed back in as the next input.)
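A sketch of the sampling loop described in the note above, assuming a trained step function `step_fn(a_prev, x_t)` that returns the next hidden state and the softmax distribution over the vocabulary; the function and argument names are placeholders, not part of the slides.

```python
import numpy as np

def sample_sequence(step_fn, vocab_size, a0, max_len=50, eos_index=None):
    """Sample token indices one at a time, feeding each sample back in as the next input."""
    a_prev = a0
    x_t = np.zeros((vocab_size, 1))            # x^<1> = vector of zeros
    indices = []
    for _ in range(max_len):
        a_prev, y_hat = step_fn(a_prev, x_t)   # y_hat: softmax over the vocabulary
        idx = np.random.choice(vocab_size, p=y_hat.ravel())
        indices.append(idx)
        if eos_index is not None and idx == eos_index:
            break
        x_t = np.zeros((vocab_size, 1))        # next input: one-hot of the sampled word
        x_t[idx] = 1
    return indices
```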
Sequence generation
If the model is trained on a news corpus, the generated text reads like news, with no lyricism or metaphor. If it is trained on Shakespeare, the output is lyrical and metaphorical, with many figures of speech.
News:
President enrique peña nieto, announced sench's sulk former coming football langston paring.
“I was not at all surprised,” said hich langston.
“Concussion epidemic”, to be examined.
The gray football the told some and this has on the uefa icon, should money as.
Shakespeare:
The mortal moon hath her eclipse in love.
And subject of this thou art another this fold.
When besser be my love to me see sabl's.
For whose are ruse of mine eyes heaves.
Recurrent Neural Networks
Vanishing gradients with RNNs
deeplearning.ai
Vanishing gradients with RNNs
Example: when translating from Vietnamese to English, the phrase "đã là" has to become either "was" or "were". The RNN needs information from much earlier in the sentence to make that choice (whether the subject, e.g. "cat" vs. "cats", was singular or plural), but that word sits many time steps back. The network does carry history forward, yet for such a long-range dependency the gradient signal barely reaches the early time steps, so those computations are hard to update; this is why GRU/LSTM units are needed.
(Diagram: a deep unrolled RNN over x^<1>, …, x^<Tx>; the error at a late output ŷ^<t> has to propagate back through many time steps to influence the computation near x^<1>.)
Exploding gradients.
Recurrent Neural Networks
Gated Recurrent Unit (GRU)
deeplearning.ai
RNN unit
a^<t> = g(W_a [a^<t-1>, x^<t>] + b_a)
GRU (simplified)
         The cat, which already ate …, was full.
[Cho et al., 2014. On the properties of neural machine translation: Encoder-decoder approaches]
[Chung et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling]
Full GRU
c̃^<t> = tanh(W_c [Γ_r ∗ c^<t-1>, x^<t>] + b_c)
Γ_u = σ(W_u [c^<t-1>, x^<t>] + b_u)
Γ_r = σ(W_r [c^<t-1>, x^<t>] + b_r)
c^<t> = Γ_u ∗ c̃^<t> + (1 − Γ_u) ∗ c^<t-1>
a^<t> = c^<t>
Notes: in a plain RNN unit we simply had a^<t> = c^<t> = tanh(W_c [c^<t-1>, x^<t>] + b_c). The idea is to replace that tanh output with a candidate c̃^<t> = tanh(…) and add a gate Γ_u that varies between 0 and 1, like a switch adjusting the "volume" of each term: c^<t> = Γ_u ∗ c̃^<t> + (1 − Γ_u) ∗ c^<t-1>. Γ_u thus controls how much of the new candidate versus the stored history ends up in c^<t>: without the gate a^<t> = c^<t> = c̃^<t>, but with it the unit keeps part of the past value, with Γ_u setting the split between present and past. (The relevance gate Γ_r controls how much c^<t-1> matters when forming the candidate.)
The cat, which ate already, was full.
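A minimal NumPy sketch of one full-GRU step following these equations; the weight shapes are assumptions, and the bracket [·, ·] is implemented as vertical concatenation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wc, bc, Wu, bu, Wr, br):
    """One GRU step: returns (a^<t>, c^<t>) from c^<t-1> and x^<t>."""
    concat = np.vstack([c_prev, x_t])                  # [c^<t-1>, x^<t>]
    gamma_r = sigmoid(Wr @ concat + br)                # relevance gate
    gamma_u = sigmoid(Wu @ concat + bu)                # update gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)   # candidate memory
    c_t = gamma_u * c_tilde + (1 - gamma_u) * c_prev   # blend candidate with stored memory
    a_t = c_t
    return a_t, c_t
```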
Recurrent Neural Networks
LSTM (long short term memory) unit
deeplearning.ai
GRU and LSTM
GRU:
c̃^<t> = tanh(W_c [Γ_r ∗ c^<t-1>, x^<t>] + b_c)
Γ_u = σ(W_u [c^<t-1>, x^<t>] + b_u)
Γ_r = σ(W_r [c^<t-1>, x^<t>] + b_r)
c^<t> = Γ_u ∗ c̃^<t> + (1 − Γ_u) ∗ c^<t-1>
a^<t> = c^<t>
(Note: the simplified GRU has no Γ_r, only Γ_u and (1 − Γ_u).)
LSTM: unlike the GRU, the LSTM has a separate gate for c̃^<t> (the update gate Γ_u) and for c^<t-1> (the forget gate Γ_f), plus an output gate Γ_o that turns c^<t> into a^<t>; effectively c̃^<t> is weighted by Γ_o ∗ Γ_u and c^<t-1> by Γ_o ∗ Γ_f.
[Hochreiter & Schmidhuber 1997. Long short-term memory]
LSTM units
GRU:
c̃^<t> = tanh(W_c [Γ_r ∗ c^<t-1>, x^<t>] + b_c)
Γ_u = σ(W_u [c^<t-1>, x^<t>] + b_u)
Γ_r = σ(W_r [c^<t-1>, x^<t>] + b_r)
c^<t> = Γ_u ∗ c̃^<t> + (1 − Γ_u) ∗ c^<t-1>
a^<t> = c^<t>
LSTM:
c̃^<t> = tanh(W_c [a^<t-1>, x^<t>] + b_c)
Γ_u = σ(W_u [a^<t-1>, x^<t>] + b_u)
Γ_f = σ(W_f [a^<t-1>, x^<t>] + b_f)
Γ_o = σ(W_o [a^<t-1>, x^<t>] + b_o)
c^<t> = Γ_u ∗ c̃^<t> + Γ_f ∗ c^<t-1>
a^<t> = Γ_o ∗ c^<t>
[Hochreiter & Schmidhuber 1997. Long short-term memory]
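A matching NumPy sketch of one LSTM step, again with assumed weight shapes; the slide uses a^<t> = Γ_o ∗ c^<t>, while some variants apply Γ_o ∗ tanh(c^<t>).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, Wc, bc, Wu, bu, Wf, bf, Wo, bo):
    """One LSTM step: returns (a^<t>, c^<t>) from a^<t-1>, c^<t-1>, and x^<t>."""
    concat = np.vstack([a_prev, x_t])          # [a^<t-1>, x^<t>]
    c_tilde = np.tanh(Wc @ concat + bc)        # candidate memory
    gamma_u = sigmoid(Wu @ concat + bu)        # update gate
    gamma_f = sigmoid(Wf @ concat + bf)        # forget gate
    gamma_o = sigmoid(Wo @ concat + bo)        # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    a_t = gamma_o * c_t                        # as on the slide; some variants use gamma_o * np.tanh(c_t)
    return a_t, c_t
```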
LSTM in pictures
c̃^<t> = tanh(W_c [a^<t-1>, x^<t>] + b_c)
Γ_u = σ(W_u [a^<t-1>, x^<t>] + b_u)
Γ_f = σ(W_f [a^<t-1>, x^<t>] + b_f)
Γ_o = σ(W_o [a^<t-1>, x^<t>] + b_o)
c^<t> = Γ_u ∗ c̃^<t> + Γ_f ∗ c^<t-1>
a^<t> = Γ_o ∗ c^<t>
(Diagram: a single LSTM cell in which the forget gate, update gate, tanh candidate, and output gate combine c^<t-1>, a^<t-1>, and x^<t> into c^<t> and a^<t>, followed by a softmax producing ŷ^<t>; several cells are then chained over time steps t = 1, 2, 3, ⋯.)
Note: the a path carries the activations (the "value" line), while the c path carries the history; it stores past values across time steps and feeds them back in to supplement a.
Recurrent Neural Networks
Bidirectional RNN
deeplearning.ai
Getting information from the future
 He said, “Teddy bears are on sale!”
 He said, “Teddy Roosevelt was a great President!”
(Diagram: a unidirectional RNN over the seven words of "He said, 'Teddy bears are on sale!'"; at the word "Teddy" the network has only seen the words to its left, so it cannot yet tell whether "Teddy" is part of a person's name.)
Bidirectional RNN (BRNN)
(Diagram: a forward RNN reads x^<1> → x^<Tx> and a backward RNN reads x^<Tx> → x^<1>; both activations feed every output.)
Note: W_y is applied to both directions at once, ŷ^<t> = g(W_y [a→^<t>, a←^<t>] + b_y), so each prediction combines the forward activation (context from the past) and the backward activation (context from the future). For example, when training "I love you" → "anh yêu em", each output position can draw on the words that come after it as well as the ones before it.
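A small sketch of that output computation, assuming the forward and backward activations have already been computed and are stored so that element t of each list corresponds to time step t; names and shapes are placeholders.

```python
import numpy as np

def brnn_outputs(a_forward, a_backward, Wy, by):
    """y_hat^<t> = softmax(Wy [a_fwd^<t>, a_bwd^<t>] + by) for every time step."""
    y_hats = []
    for a_f, a_b in zip(a_forward, a_backward):        # a_backward comes from the right-to-left pass
        z = Wy @ np.vstack([a_f, a_b]) + by            # concatenate both directions, then project
        y_hats.append(np.exp(z) / np.sum(np.exp(z), axis=0))
    return y_hats
```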
Recurrent Neural Networks
Deep RNNs
deeplearning.ai
Deep RNN example
(Diagram: three stacked recurrent layers with activations a^[1]<t>, a^[2]<t>, a^[3]<t>, inputs x^<1>, …, x^<4> at the bottom and outputs y^<1>, …, y^<4> at the top.)
Notes (a machine-translation example): take "I love you too" with a dictionary of 1023 English words and 923 Vietnamese words. Each input x^<t> is then a one-hot vector of size 1×1023 (for "I", "love", "you", "too"), and each output is a softmax of size 1×923 over the Vietnamese vocabulary (e.g. "tôi", "yêu"); compare this with a single softmax over classes such as (dog, cat), which would be 1×2, or a flattened image input of fixed size (e.g. 12×12 = 144 values). A deep RNN makes each block in the chain deeper, so the computation grows from the sequence length L to roughly L × depth blocks, with each layer keeping its own weights, shared across time steps.
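A minimal sketch of the stacking, where `rnn_step` stands for any recurrent cell (basic RNN, GRU, or LSTM block) that returns its new hidden state; each layer keeps its own parameters, shared across time steps. All names here are placeholders.

```python
def deep_rnn_forward(xs, layer_params, rnn_step, a0_per_layer):
    """Run stacked recurrent layers over a sequence xs = [x^<1>, ..., x^<Tx>]."""
    a_prev = list(a0_per_layer)                 # one hidden state per layer
    top_activations = []                        # activation of the top layer at each time step
    for x_t in xs:                              # move right through time
        layer_input = x_t
        for l, params in enumerate(layer_params):      # move up through the layers
            a_prev[l] = rnn_step(a_prev[l], layer_input, *params)
            layer_input = a_prev[l]             # this layer's activation feeds the layer above
        top_activations.append(layer_input)
    return top_activations
```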