Conversation
kmkhz_9c_after.txt Listing files before and after.
From #221: Running into some KGO issues at fast-debug. The testing below at full-debug indicates that the KGO is good: whilst 1T and 4T are technically added, 2T is consistent with trunk, and the results hold between runs, which they did not with the denom LHS issue above. The most recent changes here increase the number of ii blocking loops present with the dynamic schedule.
Test Suite Results - lfric_apps - kmkhz_9c_pysclone/run20
Suite Information
Task Information
✅ succeeded tasks - 12
From #221: Updating the CCE KGOs shows that they hold. Further testing: do they hold as the segment size changes?
Test Suite Results - lfric_apps - kmkhz_9c_pysclone/run21
Suite Information
Task Information
✅ succeeded tasks - 51
From #221: Barriers have been left in. They should be removable (after removing some of the nowaits), but I'll look at this further in the PSyclone version of this ticket. The current leading thought is that the KGO changes are optimisation changes, but they are more widespread. The initial KGO change was likely caused by the introduction of the segment-size loops: as the loop bounds changed but the nowaits did not, the threaded runs changed and then stabilised. Changing the segment size after updating the KGOs does not cause them to change. Changing further synchronisation points may have allowed the compiler to change how it optimises, generating further KGO shifts. Full-debug otherwise remains consistent as a reference point, whereas previously, with a genuine bug, it did not. This reinforces that the KGO change is optimisation driven.
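As a generic illustration of how an optimisation level can shift a KGO without any bug being present, the sketch below shows that floating-point addition is not associative: regrouping the same sum changes the last bits, yet each grouping is fully reproducible. This is not taken from the boundary layer code, and it is only an assumption that reordering of this kind is what the compiler does at fast-debug; the values are arbitrary.

```python
# Illustration only: floating-point addition is not associative, so an
# optimiser that regroups operations can shift the last bits of a result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one grouping of the sum
regrouped = a + (b + c)       # the same sum, grouped differently


def repeat(grouping, n=5):
    """Evaluate a grouping n times and return the set of distinct results."""
    return {grouping() for _ in range(n)}


# The two groupings differ in the last bits...
print(left_to_right == regrouped)
# ...but each grouping gives the same answer on every evaluation.
print(len(repeat(lambda: (a + b) + c)))
```

A difference of this kind is stable from run to run, which is the behaviour described above for the updated KGOs, whereas a race condition would normally show up as run-to-run variation.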
Adrian-Lock
left a comment
All looks fine to me
PR Summary
Sci/Tech Reviewer: @christophermaynard
Code Reviewer: @harry-shepherd
Remove the j loop which is adversely affecting performance in the boundary layer, and improve blocking loops.
Performance data can be found in the umbrella issue, #106
closes #217
Initial KGO issues were fixed by adding denom to the private list (L2546).
Taking over from #221.
Introducing further blocking loops has introduced new KGO changes, which appear to be driven by fast-debug O2 optimisation.
I've also noted that, with differing loop ranges over threads, reducing the number of nowaits is required. This opens the option of removing the barriers to aid further PSyclone work in the future (the barriers will remain for now).
The changes above have likely contributed to the KGO changes at 1T. Full-debug remains unaffected, which still points to optimisation as the likely root cause.
As the KGO updates are holding at 1T, 2T and 4T (for all of the tests involved in the OMP dev group), I'm reasonably confident that this is not a newly introduced race condition or similar.
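The reasoning above (results that hold between runs suggest no race condition) can be sketched as a small check. This is a hypothetical helper, not part of lfric_apps or its test suite: the function name `classify_kgo_runs`, the input layout, and the checksum strings are all assumptions made for illustration.

```python
def classify_kgo_runs(checksums_by_threads):
    """Classify repeated-run checksums per thread count.

    checksums_by_threads maps a thread count to the list of checksums
    obtained from repeated runs at that thread count (hypothetical layout).
    """
    # A thread count is unstable if repeated runs gave more than one checksum.
    unstable = sorted(
        threads
        for threads, sums in checksums_by_threads.items()
        if len(set(sums)) > 1
    )
    if unstable:
        return "varies between runs at " + ", ".join(f"{t}T" for t in unstable)
    return "reproducible at every thread count"


# Made-up checksums: stable at 1T, 2T and 4T.
stable = {1: ["a1", "a1"], 2: ["b2", "b2"], 4: ["c4", "c4"]}
# Made-up checksums: the 4T runs disagree with each other.
racy = {1: ["a1", "a1"], 2: ["b2", "b2"], 4: ["c4", "d9"]}

print(classify_kgo_runs(stable))
print(classify_kgo_runs(racy))
```

Run-to-run variation at a given thread count is the signature to look for when a race is suspected; stable results are consistent with, but do not by themselves prove, the absence of one.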
Code Quality Checklist
Testing
trac.log
Test Suite Results - lfric_apps - Kmkhz-9c-by-hand-2/run7
Suite Information
Task Information
✅ succeeded tasks - 51
Test Suite Results - lfric_apps - Kmkhz-9c-by-hand-2/run6
Suite Information
Task Information
❌ failed tasks - 2
Security Considerations
Performance Impact
AI Assistance and Attribution
Documentation
PSyclone Approval
Sci/Tech Review
(Please alert the code reviewer via a tag when you have approved the SR)
Code Review