[inductor] refine loop split logic #124060
## ICX results for FP32 multi-thread

| | before | after | speedup |
|------------|--------|-------|---------|
| hf | 1.36 | 1.37 | 1.007 |
| timms | 1.91 | 1.94 | 1.015 |
| torchbench | 1.46 | 1.48 | 1.013 |

## SPR results for BF16 multi-thread

| | before | after | speedup |
|------------|--------|-------|---------|
| hf | WIP | WIP | WIP |
| timms | WIP | WIP | WIP |
| torchbench | WIP | WIP | WIP |

## Regression >5% (WIP, pending further analysis)

maml 85% for FP32
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

Rebase failed due to Command

Raised by https://github.com/pytorch/pytorch/actions/runs/9183859297
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as

Moved to #128812
A fix aimed at better leveraging `omp parallel collapse`. First exposed by #122281.

## ICX results for FP32 multi-thread

## SPR results for BF16 multi-thread

No regression > 5%; performance is similar for most models. The previously regressed model `basic_gnn_gcn` has recovered, and some other models benefit.

## Per model

### ICX results for FP32 multi-thread

For the FP32 single-thread test, the overall speedup is similar. One regression case (>%) was found; after using `C10_UNLIKELY`/`C10_LIKELY` in the `if` condition check, the performance recovers.

Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang
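
For readers unfamiliar with the two ingredients named in the description, here is a minimal sketch of how `omp parallel for collapse` and branch hints can combine in one loop nest. It is an illustration under assumptions, not the code Inductor generates: the function `add_one`, the tile width `kStep`, and the `SKETCH_LIKELY`/`SKETCH_UNLIKELY` macros are all hypothetical stand-ins (the macros imitate the role of `C10_LIKELY`/`C10_UNLIKELY` so the example compiles without PyTorch headers). The idea shown is that keeping the main and tail paths inside a single inner loop, selected by an `if` check, leaves the two loops perfectly nested, which is the shape `collapse(2)` requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical local stand-ins for the C10_LIKELY / C10_UNLIKELY hints
// mentioned in the PR; defined here so the sketch needs no PyTorch headers.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 1))
#define SKETCH_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
#else
#define SKETCH_LIKELY(expr) (expr)
#define SKETCH_UNLIKELY(expr) (expr)
#endif

// Hypothetical tile width standing in for a SIMD vector width.
constexpr long kStep = 8;

// Adds 1.0f to every element of a rows x cols row-major buffer.
// The two loops are perfectly nested (no statements between them), so
// `collapse(2)` can merge them into a single parallel iteration space.
inline void add_one(std::vector<float>& buf, long rows, long cols) {
  assert(rows >= 0 && cols >= 0);
  assert(static_cast<long>(buf.size()) == rows * cols);
  // First column index that is not covered by a full tile.
  const long main_end = cols / kStep * kStep;
#pragma omp parallel for collapse(2)
  for (long r = 0; r < rows; ++r) {
    for (long c = 0; c < cols; c += kStep) {
      float* p = buf.data() + r * cols + c;
      if (SKETCH_LIKELY(c < main_end)) {
        // Main path: a full tile of kStep elements.
        for (long i = 0; i < kStep; ++i) p[i] += 1.0f;
      }
      if (SKETCH_UNLIKELY(c >= main_end)) {
        // Tail path: the remaining cols - main_end elements of this row.
        for (long i = 0; i < cols - c; ++i) p[i] += 1.0f;
      }
    }
  }
}
```

With GCC or Clang, compiling with `-fopenmp` enables the pragma; without that flag the pragma is ignored and the function runs serially with the same result. Whether the hints help depends on the compiler and workload, so they are best validated by measurement.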