PyTorch Lightning example #3189

Merged
eb8680 merged 16 commits into dev from svi-lightning on Mar 16, 2023

Conversation

@ordabayevy
Member

This example shows how to train Pyro models using PyTorch Lightning and is adapted from the Horovod example.

Yerdos Ordabayev added 2 commits March 13, 2023 02:49
@ordabayevy
Member Author

Addresses #3171.

Member

@eb8680 eb8680 left a comment

Nice! I've been using Lightning recently as well, so I left some (optional) suggestions aimed at making the example slightly more PyTorch-idiomatic, using new features from #3149.

Comment thread examples/svi_lightning.py

def main(args):
    # Create a model, synthetic data, a guide, and a lightning module.
    pyro.set_rng_seed(args.seed)
Member

This option added in #3149 ensures that parameters of PyroModules will not be implicitly shared across model instances via the Pyro parameter store:

Suggested change
-     pyro.set_rng_seed(args.seed)
+     pyro.set_rng_seed(args.seed)
+     pyro.settings.set(module_local_params=True)

It's not really exercised in this simple example, since there's only one model and guide, but I think it's good practice to enable it whenever models and guides can be written as PyroModules and trained using generic PyTorch infrastructure like torch.optim and PyTorch Lightning.

Comment thread examples/svi_lightning.py Outdated
Comment on lines +79 to +80
guide = AutoNormal(model)
training_plan = PyroLightningModule(model, guide, args.learning_rate)
Member

This change uses the new __call__ method added to the base pyro.infer.elbo.ELBO in #3149, which takes a model and guide and returns a torch.nn.Module wrapper around the loss:

Suggested change
-     guide = AutoNormal(model)
-     training_plan = PyroLightningModule(model, guide, args.learning_rate)
+     guide = AutoNormal(model)
+     loss_fn = Trace_ELBO()(model, guide)
+     training_plan = PyroLightningModule(loss_fn, args.learning_rate)

It saves you from having to pass around a model and guide everywhere or deal with the Pyro parameter store, which makes SVI a little easier to use with other PyTorch tools like Lightning and the PyTorch JIT.

Member Author

I didn't know about ELBOModule. This is much neater!

Comment thread examples/svi_lightning.py Outdated
Comment on lines +86 to +90
    # All relevant parameters need to be initialized before ``configure_optimizer`` is called.
    # Since we used AutoNormal guide our parameters have not be initialized yet.
    # Therefore we warm up the guide by running one mini-batch through it.
    mini_batch = dataset[: args.batch_size]
    guide(*mini_batch)
Member

Suggested change
-     # All relevant parameters need to be initialized before ``configure_optimizer`` is called.
-     # Since we used AutoNormal guide our parameters have not be initialized yet.
-     # Therefore we warm up the guide by running one mini-batch through it.
-     mini_batch = dataset[: args.batch_size]
-     guide(*mini_batch)
+     # All relevant parameters need to be initialized before ``configure_optimizer`` is called.
+     # Since we used an AutoNormal guide, our parameters have not been initialized yet.
+     # Therefore we initialize the model and guide by running one mini-batch through the loss.
+     mini_batch = dataset[: args.batch_size]
+     loss_fn(*mini_batch)

Comment thread examples/svi_lightning.py
Comment on lines +4 to +7
# Distributed training via Pytorch Lightning.
#
# This tutorial demonstrates how to distribute SVI training across multiple
# machines (or multiple GPUs on one or more machines) using the PyTorch Lightning
Member

Where is the distributed training in this example? Is it hidden in the default configuration of the DataLoader and TrainingPlan in main below?

Member Author

Argparse arguments are passed to the pl.Trainer:

trainer = pl.Trainer.from_argparse_args(args)

So you can run the script as follows:

$ python examples/svi_lightning.py --accelerator gpu --devices 2 --max_epochs 100 --strategy ddp

When there are multiple devices, the DataLoader will use a DistributedSampler automatically.

Comment thread examples/svi_lightning.py Outdated
Comment on lines +54 to +58
    def __init__(self, model, guide, lr):
        super().__init__()
        self.pyro_model = model
        self.pyro_guide = guide
        self.loss_fn = Trace_ELBO().differentiable_loss
Member

Suggested change
-     def __init__(self, model, guide, lr):
-         super().__init__()
-         self.pyro_model = model
-         self.pyro_guide = guide
-         self.loss_fn = Trace_ELBO().differentiable_loss
+     def __init__(self, loss_fn: pyro.infer.elbo.ELBOModule, lr: float):
+         super().__init__()
+         self.loss_fn = loss_fn
+         self.model = loss_fn.model
+         self.guide = loss_fn.guide

Comment thread examples/svi_lightning.py Outdated

    def training_step(self, batch, batch_idx):
        """Training step for Pyro training."""
        loss = self.loss_fn(self.pyro_model, self.pyro_guide, *batch)
Member

Suggested change
-         loss = self.loss_fn(self.pyro_model, self.pyro_guide, *batch)
+         loss = self.loss_fn(*batch)

Comment thread examples/svi_lightning.py Outdated

    def configure_optimizers(self):
        """Configure an optimizer."""
        return torch.optim.Adam(self.pyro_guide.parameters(), lr=self.lr)
Member

Suggested change
-         return torch.optim.Adam(self.pyro_guide.parameters(), lr=self.lr)
+         return torch.optim.Adam(self.loss_fn.parameters(), lr=self.lr)

Comment thread examples/svi_lightning.py
Comment on lines +59 to +60
self.lr = lr

Member

Adding a forward method that calls Predictive is sometimes helpful:

Suggested change
-         self.lr = lr
+         self.lr = lr
+         self.predictive = pyro.infer.Predictive(self.model, guide=self.guide)
+
+     def forward(self, *args):
+         return self.predictive(*args)

Member Author

@ordabayevy ordabayevy left a comment

Thanks for reviewing @eb8680. I think it is much neater now using ELBOModule!


fritzo previously approved these changes Mar 14, 2023
Member

@fritzo fritzo left a comment

Looks great! Can you just confirm the generated docs are readable, i.e. after running make tutorial? Also ensure the title isn't too long when it appears on the left hand side TOC.

@ordabayevy
Member Author

@fritzo There is something wrong with building tutorials when I run make tutorial:

make tutorial
make -C tutorial html
make[1]: Entering directory '/mnt/disks/dev/repos/pyro/tutorial'
Running Sphinx v6.1.3
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 80 source files that are out of date
updating environment: [new config] 80 added, 0 changed, 0 removed
reading sources... [100%] svi_part_ii .. working_memory                                                                                                             

Warning, treated as error:
/mnt/disks/dev/repos/pyro/tutorial/source/gp.ipynb:973:Duplicate substitution definition name: "image0".
make[1]: *** [Makefile:20: html] Error 2
make[1]: Leaving directory '/mnt/disks/dev/repos/pyro/tutorial'
make: *** [Makefile:18: tutorial] Error 2

Trying to figure out what is wrong ... (if you know a quick fix I would appreciate it)

@fritzo
Member

fritzo commented Mar 15, 2023

@ordabayevy not sure what's causing the build issue...

Unrelated, I see

.../pyro-ppl/pyro/tutorial/source/svi_lightning.rst: WARNING: document isn't included in any toctree

Could you add svi_lightning to tutorial/source/index.rst so it shows up on the website?

@ordabayevy
Member Author

Still no luck with make tutorial. When I try to build tutorials on the dev branch I get this:

Details
make -C tutorial html
make[1]: Entering directory '/mnt/disks/dev/repos/pyro/tutorial'
Running Sphinx v6.1.3
making output directory... done
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 79 source files that are out of date
updating environment: [new config] 79 added, 0 changed, 0 removed
reading sources... [100%] tensor_shapes .. working_memory                                                                                                           

Warning, treated as error:
/mnt/disks/dev/repos/pyro/tutorial/source/logistic-growth.ipynb:1220:File not found: 'workflow.html'
make[1]: *** [Makefile:20: html] Error 2
make[1]: Leaving directory '/mnt/disks/dev/repos/pyro/tutorial'
make: *** [Makefile:18: tutorial] Error 2

@ordabayevy
Member Author

> Can you just confirm the generated docs are readable, i.e. after running make tutorial? Also ensure the title isn't too long when it appears on the left hand side TOC.

I was able to build the tutorial by ignoring warnings and can confirm that the generated doc is readable and the title in the left hand side TOC is not too long.

Member

@fritzo fritzo left a comment

Looks great, thanks for building tutorials. I'll look into fixing those warnings.

@eb8680 any further comments? I'll hold off merging, feel free to merge

Member

@eb8680 eb8680 left a comment

LGTM

@eb8680 eb8680 merged commit c6851b8 into dev Mar 16, 2023
@eb8680 eb8680 deleted the svi-lightning branch March 16, 2023 03:43
@ordabayevy
Member Author

Thanks @eb8680 and @fritzo for reviewing!

luisdiaz1997 added a commit to luisdiaz1997/pyro that referenced this pull request Mar 16, 2023
luisdiaz1997 added a commit to luisdiaz1997/pyro that referenced this pull request Mar 16, 2023