Remove j and block Kmkhz 9c #221
Running into some KGO issues at fast-debug. The full-debug testing below indicates that the KGOs are good: the 1T and 4T KGOs are technically new additions, 2T is consistent with trunk, and all of them hold between runs, which was not the case with the denom LHS issue above. The most recent changes here increase the number of ii blocking loops present with the dynamic schedule.
Test Suite Results - lfric_apps - kmkhz_9c_pysclone/run20: ✅ succeeded tasks - 12
Given that full-debug does not change and holds between runs, is this likely to be an optimisation occurrence from CCE? Updating the CCE KGOs shows that they hold. Further testing needed: do they hold as the seg size changes?
Test Suite Results - lfric_apps - kmkhz_9c_pysclone/run21: ✅ succeeded tasks - 51
The refactor of the damping layer matrix (#139) means that the full-debug results used as a reference are out of date, so I'm generating new ones to preserve the datapoint that the changes are safe.
Barriers have been left. They should be removable (after removing some of the nowaits), but I'll look at this further in the PSyclone version of this ticket.
Current leading thought is that the KGO changes are optimisation changes, but they are more widespread. Likely the initial KGO change was caused by the introduction of the seg size loops: the bounds changed but the nowaits didn't, so the threaded runs changed, then stabilised. Changing the segment size after updating the KGOs does not cause them to change. Changing further synchronisation points may have allowed the compiler to change how it has optimised, generating further KGO shifts. Full-debug otherwise remains consistent as a reference point, where previously, with a genuine bug, it did not. This reinforces that the KGO change is optimisation driven.
After updating KGO, but no change to seg len (for this PR):
After changing seg len to 16:
I'm putting this into review for now, but I still expect KGO changes for most of LFRic atm. Given that these have been changing rapidly on trunk recently, and that the 1T tasks updated as part of the OMP dev group are representative (enough) of the lfric atm mpi 1T jobs, I'll leave these until CR unless requested by CO/SR.
iboutle left a comment
Not a request to do it now - happy to wait until it's closer to trunk - but I'd like to see the full set of KGO changes before giving approval
Your CLA signature was found on the base branch, but you appear to have modified the CONTRIBUTORS.md file in this PR. Please do not edit the CONTRIBUTORS.md file. If you have already signed the CLA, revert changes to the file and your signature will be picked up.
The conflicts have been quite horrendous on this PR. For clarity, I'm going to open a new PR with just the changes to kmkhz and the KGOs. The version.py changes can go in their own PR.
See new PR #325
Closing this PR due to commit issues. See #325, which lifts this work minus the version changes; those will be handled in a new issue: https://github.com//issues/327
PR Summary
Sci/Tech Reviewer: @christophermaynard
Code Reviewer:
Remove the j loop which is adversely affecting performance in the boundary layer, and improve blocking loops.
Initial KGO issues were fixed by adding denom to the private list (L2546).
Introducing further blocking loops has introduced new KGO changes, which appear to be driven by fast-debug O2 optimisation.
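For readers less familiar with why a missing private entry shows up as KGOs that do not hold between runs, here is a minimal sketch in C with OpenMP. It is not the kmkhz_9c code: the function name, the arrays and the arithmetic are all hypothetical, and only the scratch variable name denom is borrowed from the fix above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a loop that reuses a scratch scalar.
   Not the real kmkhz_9c loop; only the name `denom` comes from the PR. */
void scale_columns(const double *a, const double *b, double *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    double denom; /* scratch value written and read on every iteration */

    /* Without private(denom) the scalar would be shared, so one thread
       could overwrite it between another thread's write and read. That is
       a data race, and the results can differ from run to run. */
    #pragma omp parallel for private(denom)
    for (size_t i = 0; i < n; ++i) {
        denom = 1.0 + b[i] * b[i];  /* always >= 1, so the division is safe */
        out[i] = a[i] / denom;
    }
}
```

With private(denom) each thread works on its own copy, so the output no longer depends on thread timing. If the file is built without OpenMP support the pragma is ignored and the loop simply runs serially.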
I've also noted that with differing loop ranges over threads, reducing nowaits is required, which opens the option of removing the barrier to aid further PSyclone work in the future (The barriers will remain for now).
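A minimal sketch of the nowait point, again in C with OpenMP and with hypothetical names (two_stage, tmp and the arithmetic are illustrative, not taken from kmkhz_9c). When two worksharing loops in the same parallel region cover different ranges, a thread in the second loop can need values that a different thread wrote in the first, so the first loop has to keep its implicit barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-stage calculation; names and arithmetic are
   illustrative only. Stage 1 runs over 0..n-1, stage 2 over 0..n-2. */
void two_stage(const double *in, double *tmp, double *out, int n)
{
    assert(in != NULL && tmp != NULL && out != NULL && n >= 1);

    #pragma omp parallel
    {
        /* No nowait here. Stage 2 has a different trip count and reads
           tmp[i + 1], so it may touch elements written by another thread.
           The implicit barrier at the end of this loop makes that safe. */
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            tmp[i] = 2.0 * in[i];

        /* nowait is fine on the last loop: the end of the parallel
           region is itself a barrier. */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n - 1; ++i)
            out[i] = tmp[i] + tmp[i + 1];
    }
}
```

A nowait on the first loop is only safe when each thread in the second loop is guaranteed to read just the elements it wrote itself, which is what stops holding once the loop ranges differ.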
The changes above have likely contributed to the KGO changes at 1T. Full-debug remains unaffected, which still points to optimisation as the likely root cause.
As the KGO updates are holding (for all of the tests involved in the OMP dev group) at 1, 2 or 4T, I'm reasonably confident that it's not a newly introduced race condition or similar.
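As background on how optimisation alone can move a KGO even at 1T, the sketch below shows that floating-point addition is not associative, so regrouping the same sum can change the answer. This is a generic C illustration with hypothetical function names. It does not claim to show the specific transformation CCE applies here.

```c
/* Sum three values left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* The same three values, regrouped: a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With IEEE double arithmetic and no value-changing optimisation flags, sum_left(1e30, -1e30, 1.0) gives 1.0 while sum_right(1e30, -1e30, 1.0) gives 0.0, because the 1.0 is absorbed when it is added to -1e30 first. Both results are reproducible from run to run, which matches the pattern of KGOs that change once and then hold.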
Code Quality Checklist
Testing
trac.log
Security Considerations
Performance Impact
AI Assistance and Attribution
Documentation
PSyclone Approval
Sci/Tech Review
(Please alert the code reviewer via a tag when you have approved the SR)
Code Review