PyTorch and Neural Nets
CS285 Deep RL
Instructor: Kyle Stachowicz
[Adapted from Marwa Abdulhai's CS285 Fa22 Slides]
PyTorch Tutorial (Colab)
https://colab.research.google.com/drive/12nQiv6aZHXNuCfAAuTjJenDWKQbIt2Mz
http://bit.ly/cs285-pytorch-2023
Goal of this course
Train an agent to perform useful tasks.

[Diagram: a loop in which data is used to train the model, and the agent collects more data]

The "train the model" step is the focus for today's lecture!
How do you train a model?

[Diagram: a dataset feeds a neural network, which produces a loss that is minimized by gradient descent]

PyTorch does all of these!
What is PyTorch?
Python library for:
• Defining neural networks
• Automatically computing gradients
• And more! (datasets, optimizers, GPUs, etc.)
How does PyTorch work?
You define the computation; PyTorch computes the gradients.
[Picture of a computation graph, from Stanford's CS231n]

NumPy:                        PyTorch:
• Fast CPU implementations    • Fast CPU implementations
• CPU-only                    • Allows GPU
• No autodiff                 • Supports autodiff
• Imperative                  • Imperative

Other features include:
• Datasets and dataloading
• Common neural network operations
• Built-in optimizers (Adam, SGD, …)
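A minimal sketch of that division of labor (the tensor values are illustrative):

    import torch

    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    y = (x ** 2).sum()    # you define the forward computation...
    y.backward()          # ...PyTorch computes the gradients
    print(x.grad)         # dy/dx = 2x -> tensor([2., 4., 6.])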
The Basics

[Code demo: 100x faster!]
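A plausible version of the "100x faster" comparison, a pure Python loop vs. the vectorized equivalent (this particular demo is an assumption, not from the slides):

    import time
    import torch

    x = torch.randn(1_000_000)

    t0 = time.time()
    total = 0.0
    for v in x:             # pure Python loop: slow
        total += float(v)
    print("loop:      ", time.time() - t0)

    t0 = time.time()
    total = x.sum()         # vectorized: typically ~100x faster or more
    print("vectorized:", time.time() - t0)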
Multidimensional Arrays
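Numpy arrays and Torch tensors share the same basic interface; a quick sketch (values illustrative):

    import numpy as np
    import torch

    A_np = np.array([[1, 2, 3], [4, 5, 6]])      # numpy: 2-D array
    A_t = torch.tensor([[1, 2, 3], [4, 5, 6]])   # torch: the same data as a tensor
    print(A_np.shape)   # (2, 3)
    print(A_t.shape)    # torch.Size([2, 3])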
Multidimensional Indexing

[Figure: a 3x5 array A; axis 0 indexes rows, axis 1 indexes columns]

        32  27   5  54   1
    A = 99   4  23   3  57
        76  42  34  82   5

A.shape == (3, 5)

Indexing examples:
• A[0, 3]    (a single element: row 0, column 3)
• A[:, 3]    (all of column 3)
• A[0, :]    (all of row 0)
• A[0, 2:4]  (columns 2 and 3 of row 0)
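The same examples as runnable code (values copied from the figure):

    import torch

    A = torch.tensor([[32, 27,  5, 54,  1],
                      [99,  4, 23,  3, 57],
                      [76, 42, 34, 82,  5]])
    assert A.shape == (3, 5)

    print(A[0, 3])    # tensor(54)
    print(A[:, 3])    # tensor([54,  3, 82])
    print(A[0, :])    # tensor([32, 27,  5, 54,  1])
    print(A[0, 2:4])  # tensor([ 5, 54])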
Multidimensional Indexing

[Figure: a 3-D array A with a third axis, axis 2, extending the 2-D array above]

A.shape == (3, 5, 4)

Ellipsis indexing:
• A[0, ...]  (the first slice along axis 0; equivalent to A[0, :, :], shape (5, 4))
• A[..., 1]  (index 1 along the last axis; equivalent to A[:, :, 1], shape (3, 5))
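A quick check of the ellipsis shapes:

    import torch

    A = torch.randn(3, 5, 4)

    print(A[0, ...].shape)   # torch.Size([5, 4]), same as A[0, :, :]
    print(A[..., 1].shape)   # torch.Size([3, 5]), same as A[:, :, 1]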
Broadcasting

TL;DR: an array of shape (1, 3, 2) acts like one of shape (6, 5, 4, 3, 2) when added to an array of shape (6, 5, 4, 3, 2).

(Shapes are aligned starting from the trailing dimensions; size-1 axes are repeated to match, and missing leading axes are added.)

https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
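A quick check of the TL;DR shapes:

    import torch

    a = torch.randn(1, 3, 2)
    b = torch.randn(6, 5, 4, 3, 2)

    c = a + b          # a is broadcast against b
    print(c.shape)     # torch.Size([6, 5, 4, 3, 2])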
Shape Operations
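A sketch of common shape operations (the example shapes are arbitrary):

    import torch

    x = torch.randn(2, 3, 4)

    print(x.reshape(6, 4).shape)          # torch.Size([6, 4]): collapse dims 0 and 1
    print(x.flatten(0, 1).shape)          # torch.Size([6, 4]): same, more explicit
    print(x.permute(2, 0, 1).shape)       # torch.Size([4, 2, 3]): reorder axes
    print(x.unflatten(0, (1, 2)).shape)   # torch.Size([1, 2, 3, 4]): split dim 0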
Device Management
• Numpy: all arrays live on the CPU’s RAM
• Torch: tensors can either live on CPU or GPU memory
   • Move to GPU with .to(“cuda”)/.cuda()
   • Move to CPU with .to(“cpu”)/.cpu()
   YOU CANNOT PERFORM OPERATIONS BETWEEN
         TENSORS ON DIFFERENT DEVICES!
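A short sketch (guarded so it also runs on CPU-only machines):

    import torch

    x = torch.randn(3)                # lives in CPU memory
    if torch.cuda.is_available():
        x_gpu = x.to("cuda")          # copy to GPU memory
        y_gpu = torch.randn(3).cuda()
        print(x_gpu + y_gpu)          # OK: both on GPU
        # print(x + x_gpu)            # RuntimeError: tensors on different devices!
        print(x + x_gpu.cpu())        # OK: moved back to CPU first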
Computing Gradients

[Figure: a computation graph; the input x passes through parameters P and b to produce y, which feeds into the loss, and gradients flow backward through the same graph]

[Figure: the same graph with .detach() applied, cutting the detached tensor out of the backward pass]
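A minimal sketch using the names from the figure (the shapes are illustrative):

    import torch

    P = torch.randn(4, 3, requires_grad=True)   # parameters
    b = torch.randn(3, requires_grad=True)
    x = torch.randn(4)                           # input, no gradient needed

    y = x @ P + b
    loss = (y ** 2).sum()
    loss.backward()                  # gradients flow backward through the graph
    print(P.grad.shape, b.grad.shape)

    # .detach() cuts a tensor out of the graph: no gradient flows through it
    y_stop = (x @ P).detach() + b
    (y_stop ** 2).sum().backward()   # only b receives a gradient here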
Training Loop

REMEMBER THIS!
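A sketch of the canonical loop (the model, data, and loss function are stand-ins):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # stand-in for a real dataloader
    batches = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(100)]

    for x, y in batches:
        loss = loss_fn(model(x), y)   # load batch, compute loss
        optimizer.zero_grad()         # clear stale gradients
        loss.backward()               # backprop
        optimizer.step()              # update the parameters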
Converting Numpy / PyTorch
Numpy -> PyTorch:
    torch.from_numpy(numpy_array).float()

PyTorch -> Numpy:
• (If requires_grad) Get a copy without the graph with .detach()
• (If on GPU) Move to CPU with .to("cpu") / .cpu()
• Convert to numpy with .numpy()

All together:
    torch_tensor.detach().cpu().numpy()
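A round-trip sketch:

    import numpy as np
    import torch

    arr = np.zeros((3, 2))               # numpy defaults to float64!
    t = torch.from_numpy(arr).float()    # numpy -> torch, cast to float32
    t.requires_grad_(True)

    back = t.detach().cpu().numpy()      # torch -> numpy
    print(back.dtype)                    # float32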
Custom networks
• Prefer net() over net.forward()
• Everything (network and its inputs) on the same device!!! (see the sketch below)
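A minimal sketch of a custom network (the architecture itself is illustrative):

    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        def __init__(self, in_dim, hidden_dim, out_dim):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(in_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, out_dim),
            )

        def forward(self, x):
            return self.layers(x)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    net = MLP(4, 64, 2).to(device)
    x = torch.randn(8, 4).to(device)   # inputs on the same device as the network!
    y = net(x)                         # call net(x), not net.forward(x)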
Torch Best Practices
• When in doubt, assert is your friend
    assert x.shape == (B, N), \
        f"Expected shape {(B, N)} but got {x.shape}"
• Be extra careful with .reshape/.view
  • If you use it, assert before and after
  • Only use it to collapse/expand a single dim
  • In Torch, prefer .flatten()/.permute()/.unflatten()
• If you do some complicated operation, test it!
  • Compare to a pure Python implementation (see the sketch below)
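A sketch of that testing advice (the batched dot product is a made-up example):

    import torch

    def batched_dot(a, b):
        return (a * b).sum(dim=-1)    # the vectorized operation under test

    a, b = torch.randn(5, 3), torch.randn(5, 3)

    # slow but obviously-correct pure Python reference
    ref = torch.tensor([
        float(sum(a[i, j] * b[i, j] for j in range(3)))
        for i in range(5)
    ])
    assert torch.allclose(batched_dot(a, b), ref, atol=1e-5)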
Torch Best Practices (continued)
• Don't mix numpy and Torch code
  • Understand the boundaries between the two
  • Make sure to cast 64-bit numpy arrays to 32 bits
  • torch.Tensor only in nn.Module!
• The training loop will always look the same
  • Load batch, compute loss
  • .zero_grad(), .backward(), .step()