TorchInductor CPU Performance Dashboard #93531
Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate TorchInductor across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an ICX 8375C. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
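The headline statistics above can be computed as follows; this is a minimal sketch with made-up per-model numbers (not taken from the dashboard tables). Speedup is eager latency divided by inductor latency, and the compression ratio is eager peak memory divided by inductor peak memory, so higher is better for both:

```python
from math import prod

# Hypothetical per-model speedups: eager latency / inductor latency.
speedups = [1.20, 0.95, 1.50, 1.10]
geomean_speedup = prod(speedups) ** (1 / len(speedups))

# Hypothetical peak memory (MB) for eager vs. inductor runs.
eager_peak = [900.0, 1200.0]
inductor_peak = [800.0, 1000.0]
ratios = [e / c for e, c in zip(eager_peak, inductor_peak)]
geomean_compression = prod(ratios) ** (1 / len(ratios))

# Pass rate: fraction of models that pass the accuracy check.
passed, total = 3, 4
passrate = passed / total

print(f"{geomean_speedup:.2f}x, {geomean_compression:.2f}x, {passrate:.0%}")
# → 1.17x, 1.16x, 75%
```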
torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
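The accuracy results above come from an elementwise tolerance comparison against native PyTorch. A minimal pure-Python sketch of the `torch.allclose`-style criterion this relies on (the sample values are illustrative, not dashboard data):

```python
def allclose(a, b, rtol=1e-5, atol=1e-8):
    """Elementwise |a - b| <= atol + rtol * |b|, mirroring torch.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# Small numerical drift between eager and compiled outputs passes;
# a real divergence fails.
eager_out = [0.1, 2.5, -3.0]
inductor_out = [0.100001, 2.5, -3.0000001]
print(allclose(eager_out, inductor_out))  # True
```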
Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate TorchInductor across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
This is inference? Or training?
Inference.
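For reference, the inference speedup metric is wall-clock eager latency over compiled latency. A minimal timing-harness sketch; `eager_fn` and `compiled_fn` are illustrative stand-ins for a model run eagerly versus the same model under `torch.compile` (torch itself is deliberately not imported here):

```python
import time
from statistics import median

def bench(fn, *args, warmup=3, iters=20):
    """Median wall-clock latency of fn(*args), in milliseconds."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    return median(samples)

# Stand-ins: in the dashboard these would be model(x) in eager mode
# versus torch.compile(model)(x).
def eager_fn(n):
    return sum(i * i for i in range(n))

def compiled_fn(n):  # algebraically "optimized" variant of the same computation
    return (n - 1) * n * (2 * n - 1) // 6

speedup = bench(eager_fn, 100_000) / bench(compiled_fn, 100_000)
print(f"speedup: {speedup:.1f}x")
```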
Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-Socket Multi-threads

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-core Single-thread

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2022-11-09 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-core Single-thread (2022-11-09 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2022-11-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance Dashboard for float32 precision -- Single-core Single-thread (2022-11-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Update: We use single-instance mode in this round.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2022-11-16 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance Dashboard for float32 precision -- Single-core Single-thread (2022-11-16 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward pass. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-09-29 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Test command:
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-09-29 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Test command:
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Test command:
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Test command:
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Test command:
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on an Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

SW information
HW information

Test command:
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
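Two other columns reported above can be sketched the same way: "Passrate" is the fraction of models that pass the accuracy check, and "Peak Memory Compression Ratio" compares eager peak memory against inductor peak memory, so higher means the compiled run used less memory. The model names and numbers below are illustrative assumptions, not real dashboard data:

```python
def passrate(results):
    """Fraction of models passing the accuracy check, as in 'passed/total' rows."""
    passed = sum(1 for ok in results.values() if ok)
    return passed, len(results)

def memory_compression(eager_peak_mb: float, compiled_peak_mb: float) -> float:
    """Peak memory footprint compression ratio (higher is better)."""
    return eager_peak_mb / compiled_peak_mb

# Illustrative accuracy outcomes and peak-memory numbers, not real dashboard data.
accuracy = {"resnet50": True, "bert_base": True, "llama": False}
passed, total = passrate(accuracy)
print(f"passrate: {passed}/{total}")
print(f"compression: {memory_compression(1024.0, 800.0):.2f}x")
```

As the caveats note, models failing the accuracy check are removed before the performance, latency, and memory columns are aggregated, so the passrate denominator and the aggregation population differ.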
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-13 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-10-14 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-10-14 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-19 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-19 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-19 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-19 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-10-20 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-10-20 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-10-20 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-10-20 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

SW information:
HW information:

Test command:

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)

torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-Socket Multi-threads (2024-10-26 nightly release)Executive Summarysee moreWe evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of forward pass and backward pass for training and forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and peak memory footprint reduction ratio.Caveats
SW information:
HW information
Test command

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=amp --inference --compilers=inductor --extra-args="--timeout 9000"
```
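The accuracy gate compares compiled forward-pass outputs against native pytorch under a numeric tolerance. In spirit it is an allclose-style check like this plain-Python sketch; the `rtol`/`atol` values and the sample outputs are illustrative assumptions, not the harness defaults (the real harness uses torch's tensor comparisons):

```python
def allclose(a, b, rtol=1e-4, atol=1e-5):
    """torch.allclose-style elementwise tolerance check on flat lists:
    passes when |x - y| <= atol + rtol * |y| for every element pair."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

eager_out    = [0.10000, -2.50000, 3.14159]
inductor_out = [0.10001, -2.50010, 3.14150]  # tiny numerical drift is tolerated

ok = allclose(eager_out, inductor_out)
print("accuracy check:", "PASS" if ok else "FAIL")
```

Models that fail this gate are excluded from the speedup, latency, and memory summaries below.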
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[amp] Performance Dashboard for amp precision -- Single-core Single-thread (2024-10-26 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8488C. Each experiment runs one iteration of forward pass and backward pass for training and forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and peak memory footprint reduction ratio.

Caveats
SW information:
HW information
Test command

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=amp --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-10-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of forward pass and backward pass for training and forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and peak memory footprint reduction ratio.

Caveats
SW information:
HW information
Test command

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_static_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-10-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of forward pass and backward pass for training and forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and peak memory footprint reduction ratio.

Caveats
SW information:
HW information
Test command

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-Socket Multi-threads (2024-10-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of forward pass and backward pass for training and forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and peak memory footprint reduction ratio.

Caveats
SW information:
HW information
Test command

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
CORES=$(lscpu | grep Core | awk '{print $4}')
export OMP_NUM_THREADS=$CORES
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--node_id 0" --devices=cpu --dtypes=float32 --inference --compilers=inductor --extra-args="--timeout 9000"
```
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
[cppwrapper_dynamic_shape] Performance Dashboard for float32 precision -- Single-core Single-thread (2024-10-27 nightly release)

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz. Each experiment runs one iteration of forward pass and backward pass for training and forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and peak memory footprint reduction ratio.

Caveats
SW information:
HW information
Test command

```bash
export LD_PRELOAD=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libiomp5.so:${CONDA_PREFIX:-"$(dirname $(which conda))/../"}/lib/libjemalloc.so
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
export TORCHINDUCTOR_FREEZING=1
export OMP_NUM_THREADS=1
python benchmarks/dynamo/runner.py --enable_cpu_launcher --cpu_launcher_args "--core_list 0 --ncores_per_instance 1" --devices=cpu --dtypes=float32 --inference --compilers=inductor --batch_size=1 --threads 1 --extra-args="--timeout 9000"
```
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Dashboard to track the performance of torchinductor on CPU.
cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @soumith @ngimel @chauhang