Description
Hello, thank you for your team’s awesome work!
I have some questions about using the bolt framework.
Here's my working environment:
- Target platform: Android-aarch64
- Build platform: Linux
- Device: Arm v8.2+
- Inference precision: BNN_FP16
- Tested bolt versions: both 1.2.1 and 1.3.0
- When I run the same model on both versions (1.2.1 and 1.3.0) using their X2bolt and benchmark tools,
the final latency results are almost the same, but the per-operator breakdowns (the statistics time report) are different.
Both cases run with the loops=1 option, but the statistics time of version 1.2.1 looks like the result of running 10 times. Is this normal? (See the sketch right after this item for what I suspect.)
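Purely as an illustration of my guess (this is not bolt's actual code; the accumulator and the warm-up count here are my assumptions): if a fixed number of warm-up iterations feed the same per-op time accumulator that the final report reads, the report would show roughly 10x the single-run per-op time even with loops=1.

```cpp
// Hypothetical sketch: per-op statistics that also accumulate during warm-up
// would report ~10x the per-op time even when loops == 1.
#include <iostream>
#include <map>
#include <string>

std::map<std::string, double> opTimeMs;  // per-op accumulated time (assumed)

void runOnce() {
    // Stand-in for one inference pass; a real runtime would measure each op.
    opTimeMs["Conv_0"] += 1.5;
    opTimeMs["Mul_21"] += 0.3;
}

int main() {
    const int warmup = 9;  // assumption: fixed warm-up iterations
    const int loops = 1;   // the value I pass on the command line

    for (int i = 0; i < warmup; ++i) runOnce();  // warm-up also fills the stats
    for (int i = 0; i < loops; ++i) runOnce();

    // The "statistics time report" would then show 10 runs' worth of time.
    for (const auto& kv : opTimeMs) {
        std::cout << kv.first << ": " << kv.second << " ms" << std::endl;
    }
    return 0;
}
```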
- I made a custom network with a structure like the following:
[ custom network graph image ]
This network works with version 1.3.0, but not with 1.2.1, because X2bolt from version 1.2.1 does not convert it properly.
[ X2bolt debug log (version 1.2.1) ]
[ X2bolt debug log (version 1.3.0) ]
As we can see, the 1.2.1 X2bolt cannot detect an input tensor of Mul_21, so I guess the benchmark program stops at (or near) the following line, checked with debug options:

bolt/inference/engine/src/cnn.cpp, line 696 at commit 4bdc81e:

```cpp
std::vector<std::string> curOpInputTensorName = this->operatorTensorMap[opName][0];
```
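If I read that line correctly, a minimal sketch of the failure mode could look like the following. I am assuming operatorTensorMap is roughly a std::map keyed by operator name, where entry [0] holds the input tensor names; if Mul_21 was never registered, operator[] would default-construct an empty entry, and indexing [0] on it would be undefined behavior:

```cpp
// Hypothetical reconstruction, assuming:
//   std::map<std::string, std::vector<std::vector<std::string>>> operatorTensorMap;
// where entry [0] holds the input tensor names and [1] the output tensor names.
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    std::map<std::string, std::vector<std::vector<std::string>>> operatorTensorMap;
    operatorTensorMap["Conv_0"] = {{"x"}, {"conv_out"}};

    std::string opName = "Mul_21";  // never registered by the 1.2.1 converter

    // Defensive lookup: operator[] would silently insert an empty entry,
    // and reading element [0] of that empty vector is undefined behavior.
    auto it = operatorTensorMap.find(opName);
    if (it == operatorTensorMap.end() || it->second.empty()) {
        std::cerr << "operator " << opName << " has no registered tensors" << std::endl;
        return 1;
    }
    std::vector<std::string> curOpInputTensorName = it->second[0];
    std::cout << curOpInputTensorName.size() << " input tensor(s)" << std::endl;
    return 0;
}
```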
In the case of Mul_21, it is executed after the whole left path of the graph image above, so I suspect it was difficult for the converter to keep the result of the ReduceMean op alive for reuse. Of course, there is no problem with the latest version of X2bolt. Is there a way to solve this in the previous version as well? (A toy reproduction of my guess follows.)
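To make my guess concrete (again, a toy model under my own assumptions, not bolt's actual conversion pass): if the converter releases a tensor after its first consumer, then a second consumer that appears much later, like Mul_21 reusing the ReduceMean output, would find the tensor missing:

```cpp
// Toy graph walk: releasing a tensor after its first consumer breaks a
// later consumer (the pattern I suspect hits Mul_21 in version 1.2.1).
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Op {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

int main() {
    // ReduceMean feeds both the long "left path" and, much later, Mul_21.
    std::vector<Op> ops = {
        {"ReduceMean_0", {"x"}, {"rm_out"}},
        {"LeftPath_Tail", {"rm_out"}, {"left_out"}},  // first consumer
        {"Mul_21", {"left_out", "rm_out"}, {"y"}},    // second, later consumer
    };

    std::map<std::string, bool> live = {{"x", true}};
    for (const Op& op : ops) {
        for (const std::string& in : op.inputs) {
            if (!live.count(in) || !live[in]) {
                std::cout << op.name << ": input tensor '" << in
                          << "' not found (the 1.2.1 symptom)" << std::endl;
                return 1;
            }
        }
        // Hypothetical over-eager reuse: drop inputs after their first consumer.
        for (const std::string& in : op.inputs) live[in] = false;
        for (const std::string& out : op.outputs) live[out] = true;
    }
    std::cout << "conversion ok" << std::endl;
    return 0;
}
```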
- The reason I use the previous version of the bolt framework is that I saw a significant difference in latency between versions for a specific network model (e.g. the Real-to-Binary network, https://arxiv.org/pdf/2003.11535.pdf?fname=cm&font=TypeI).
I wonder whether the faster result from version 1.2.1 is a kind of reporting bug in that version, or a genuine outcome of implementation differences.
Thank you for reading my long issue, and I look forward to your answers.