[inductor] refine loop split logic #124060

Closed
wants to merge 27 commits into from

@zhuhaozhe zhuhaozhe commented Apr 15, 2024

This fix aims to make better use of OpenMP `parallel for collapse`. The problem was first exposed by #122281.
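To illustrate why collapsing can matter, here is a minimal Python sketch (not Inductor's code) of how work might be distributed across threads with and without collapsing two nested loops. The function names, loop sizes, and thread count are hypothetical, and the even contiguous partitioning is a simplifying assumption about how an OpenMP static schedule behaves.

```python
def static_chunks(total, num_threads):
    """Split range(total) into num_threads contiguous chunks, as evenly as possible."""
    base, extra = divmod(total, num_threads)
    chunks, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def work_per_thread_outer_only(outer, inner, num_threads):
    """Parallelize only the outer loop: each outer iteration carries `inner` units of work."""
    return [len(chunk) * inner for chunk in static_chunks(outer, num_threads)]


def work_per_thread_collapsed(outer, inner, num_threads):
    """Collapse both loops into one space of outer * inner iterations, then split it."""
    return [len(chunk) for chunk in static_chunks(outer * inner, num_threads)]


def unflatten(flat_index, inner):
    """Recover the (i, j) loop indices from a collapsed index."""
    return divmod(flat_index, inner)


if __name__ == "__main__":
    # Hypothetical sizes: 4 outer iterations, 1000 inner iterations, 32 threads.
    print(work_per_thread_outer_only(4, 1000, 32))
    print(work_per_thread_collapsed(4, 1000, 32))
```

With these assumed sizes, parallelizing only the outer loop leaves 28 of the 32 threads idle, while the collapsed iteration space gives every thread 125 iterations. This is presumably the kind of imbalance the loop-split refinement targets, although the description does not spell out the exact case.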

## ICX results for FP32 multi-thread

|            | before | after | speedup |
|------------|--------|-------|---------|
| hf         | 1.43   | 1.44  | 1.003   |
| timms      | 2.06   | 2.08  | 1.008   |
| torchbench | 1.61   | 1.62  | 1.006   |

## SPR results for BF16 multi-thread

|            | before | after | speedup |
|------------|--------|-------|---------|
| hf         | 2.27   | 2.28  | 1.003   |
| timms      | 2.75   | 2.79  | 1.016   |
| torchbench | 2.29   | 2.34  | 1.019   |

No model regresses by more than 5%, and most models perform about the same.
The previously regressed model basic_gnn_gcn has recovered, and several other models benefit.

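## Per model

| model                   | precision | before | after | speedup |
|-------------------------|-----------|--------|-------|---------|
| basic_gnn_gcn           | amp       | 1.08   | 1.18  | 1.09    |
| cait_m36_384            | amp       | 0.82   | 1.48  | 1.78    |
| mobilevit_s             | amp       | 1.529  | 2.79  | 1.82    |
| cait_m36_384            | float32   | 1.13   | 1.35  | 1.19    |
| mobilevit_s             | float32   | 1.63   | 2.21  | 1.35    |
| basic_gnn_gcn           | float32   | 1.33   | 1.49  | 1.12    |
| pyhpc_isoneutral_mixing | float32   | 0.64   | 0.79  | 1.23    |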

## ICX results for FP32 multi-thread

|            | before | after | speedup |
|------------|--------|-------|---------|
| hf         | 1.17   | 1.18  | 1.005   |
| timms      | 1.52   | 1.53  | 1.003   |
| torchbench | 2.40   | 2.38  | 0.989   |

For the FP32 single-thread test, the overall speedup is similar.
A regression >% case was found; after using C10_UNLIKELY/C10_LIKELY in the `if` condition check, the performance is restored.

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot bot commented Apr 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124060

Note: Links to docs will display an error until the docs builds have been completed.

❌ 50 New Failures, 55 Unrelated Failures

As of commit 8d67fcf with merge base 8a09940 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zhuhaozhe added a commit that referenced this pull request Apr 15, 2024
ghstack-source-id: 302a21b51aac87601b742a6bf161bd378b0b122c
Pull Request resolved: #124060
@zhuhaozhe zhuhaozhe marked this pull request as draft April 15, 2024 13:52
@zhuhaozhe zhuhaozhe changed the title [inductor] refine loop split logic [WIP][inductor] refine loop split logic Apr 15, 2024
@zhuhaozhe zhuhaozhe added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 15, 2024
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
zhuhaozhe added a commit that referenced this pull request Apr 16, 2024
ghstack-source-id: 8f1c475e72431cdbcae59e32e58fb5c8a739f1cf
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request Apr 16, 2024
ghstack-source-id: 6ed2a9c807c915f826a694514a3a353387c4632e
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request Apr 29, 2024
ghstack-source-id: b64a2fb805e363a4e9c4e5852157a57063e4c696
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request Apr 29, 2024
ghstack-source-id: 7a40f4cfad10e7aea8155a3ef7fbcec3860e55d7
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request Apr 30, 2024
ghstack-source-id: debd4963fa06b4e3e585afe5b788e2505303b697
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request May 3, 2024
ghstack-source-id: 00b2d01edd2c6f49546c456691704a424fa47b41
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request May 6, 2024
ghstack-source-id: d6511a4c7205452688262848e3e8124fe096fc9c
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request May 8, 2024
ghstack-source-id: 6d439f29043c854d942e715254bd316d191ab353
Pull Request resolved: #124060
## ICX results for FP32 multi-thread

|            | before | after | speedup |
|------------|--------|-------|---------|
| hf         | 1.36   | 1.37  | 1.007   |
| timms      | 1.91   | 1.94  | 1.015   |
| torchbench | 1.46   | 1.48  | 1.013   |

## SPR results for BF16 multi-thread

|            | before | after | speedup |
|------------|--------|-------|---------|
| hf         | WIP    | WIP   | WIP     |
| timms      | WIP    | WIP   | WIP     |
| torchbench | WIP    | WIP   | WIP     |

## Regression >5% (WIP, pending further analysis)
maml 85% for FP32

zhuhaozhe added a commit that referenced this pull request May 9, 2024
ghstack-source-id: 4d61ed3915f5053ba5dc1bf07c56a0e4150598e5
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request May 15, 2024
ghstack-source-id: c6bb4d21fdea9b8d275b5ed52712706b52889f8b
Pull Request resolved: #124060
zhuhaozhe added a commit that referenced this pull request May 17, 2024
ghstack-source-id: e75c2b2919a4dabf267745e2bf891410ffdfd9eb
Pull Request resolved: #124060
@zhuhaozhe zhuhaozhe requested a review from jgong5 May 20, 2024 02:05
@zhuhaozhe zhuhaozhe changed the title [WIP][inductor] refine loop split logic [inductor] refine loop split logic May 20, 2024
zhuhaozhe added a commit that referenced this pull request May 21, 2024
ghstack-source-id: 99ee0935403d96c5692ac391273b0fb463d7b48d
Pull Request resolved: #124060
@zhuhaozhe
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict gh/zhuhaozhe/20/orig returned non-zero exit code 1

Rebasing (1/1)
Auto-merging torch/_inductor/codegen/cpp.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/cpp.py
error: could not apply 294a6236de0... [inductor] refine loop split logic
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 294a6236de0... [inductor] refine loop split logic

Raised by https://github.com/pytorch/pytorch/actions/runs/9183859297

zhuhaozhe added a commit that referenced this pull request Jun 12, 2024
ghstack-source-id: 25e68a0e52b4b99c0b5136037ec6635e54b44a30
Pull Request resolved: #124060
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 11, 2024
@zhuhaozhe
Collaborator Author

Moved to #128812

@github-actions github-actions bot closed this Sep 19, 2024
@github-actions github-actions bot deleted the gh/zhuhaozhe/20/head branch October 20, 2024 02:09
Projects
Status: Done

3 participants