Inf2 example #2399
Conversation
* fix INF2 example handler
* Add logging for padding in inf2 handler
* update response timeout and model
* Update documentation to show opt-6.7b as the example model
* Update model batch log

Co-authored-by: Naman Nandan <namannan@amazon.com>
Codecov Report

@@           Coverage Diff           @@
##           master    #2399   +/-  ##
=======================================
  Coverage   72.01%   72.01%
=======================================
  Files          78       78
  Lines        3648     3648
  Branches       58       58
=======================================
  Hits         2627     2627
  Misses       1017     1017
  Partials        4        4
Thanks very much @namannandan LGTM
```python
model_name = ctx.model_yaml_config["handler"]["model_name"]

# allocate "tp_degree" number of neuron cores to the worker process
os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
```
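For context, a minimal sketch of how these lines could fit into the handler's `initialize` method. This is illustrative only: the `tp_degree` config key, its default value, and the class name are assumptions, not necessarily what the example handler does.

```python
import os


class LLMHandler:
    """Illustrative skeleton, not the actual example handler."""

    def initialize(self, ctx):
        # Handler settings come from the model's YAML config via ctx.model_yaml_config
        handler_config = ctx.model_yaml_config["handler"]

        # model_name selects the pretrained checkpoint to load (loading omitted here)
        model_name = handler_config["model_name"]

        # Assumption: the tensor parallel degree is also read from the YAML config;
        # the key name and default value are illustrative
        tp_degree = int(handler_config.get("tp_degree", 2))

        # allocate "tp_degree" number of neuron cores to the worker process,
        # before the Neuron runtime is initialized
        os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
```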
How do you make sure Neuron has enough cores to support tp_degree?
I believe torch-neuronx currently does not have an API that provides the number of available (unallocated) Neuron cores. Here, if the required number of Neuron cores, i.e. tp_degree, is not available, then model loading will fail with an error of the form:

ERROR  TDRV:db_vtpb_get_mla_and_tpb  Could not find VNC id 1
Turns out that torch-neuronx does have a method to query the number of available (unallocated) cores: torch_neuronx.xla_impl.data_parallel.device_count(). Updated the handler to verify that the necessary number of cores is available before proceeding with model loading.
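A minimal sketch of such a check, assuming (per the comment above) that `torch_neuronx.xla_impl.data_parallel.device_count()` reports the number of unallocated Neuron cores; the `tp_degree` value is illustrative:

```python
import os

import torch_neuronx

tp_degree = 2  # illustrative tensor-parallel degree

# Assumption (from the discussion above): device_count() returns the number of
# Neuron cores that have not yet been allocated to another process
available_cores = torch_neuronx.xla_impl.data_parallel.device_count()

if available_cores < tp_degree:
    raise RuntimeError(
        f"{tp_degree} Neuron cores required but only {available_cores} are available"
    )

# Claim the cores only after confirming that enough of them exist
os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
```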
Force-pushed from ecc5e02 to 50668c5
Successfully tested the example:
Description

Inferentia2 example based on the opt-6.7b model.

Type of change

Feature/Issue validation/testing

* 125m parameter variant of the opt model
* 6.7b parameter variant of the opt model