HANA_CALL - handle all timeout return codes, not only 124 #248
Comments
Why? |
What was the CONCRETE error condition? |
@fmherschel in the SAPHanaTopology RA there are already checks for other return values, for example Line 981 in 8db3d75
Line 1091 in 8db3d75
Line 1095 in 8db3d75
Line 1114 in 8db3d75
Line 1118 in 8db3d75
So the question for me would be why the other error codes for the timeout command are taken into account in the SAPHanaTopology RA but not in the SAPHana RA? |
@fdanapfel RC 124 happens regularly. I have never seen 125, 126, 127. RC 124 is by plan. |
@fmherschel Yes we do. :) We have a case of RC 134 in a scale-out environment which leads to a failover, due to the role being set to some rubbish from that error. In the scale-up SAPHanaTopology RA this happens to be automatically covered by the Do you have specific concerns about generally allowing this bit of tolerance of any RC >= 124? |
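For readers following the return codes mentioned here: the sketch below maps the exit statuses documented in the GNU `timeout` man page to short descriptions. Values above 128 follow the usual shell convention of 128 plus the signal number, so RC 134 would correspond to signal 6 (SIGABRT on Linux). The function name `describe_timeout_rc` is hypothetical and not part of either resource agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SAPHana/SAPHanaTopology RAs):
# translate an exit status returned via `timeout` into a description.
describe_timeout_rc() {
    case "$1" in
        124) echo "command timed out" ;;
        125) echo "timeout itself failed" ;;
        126) echo "command found but could not be invoked" ;;
        127) echo "command not found" ;;
        *)
            if [ "$1" -gt 128 ]; then
                # Shell convention: 128 + signal number
                echo "terminated by signal $(( $1 - 128 ))"
            else
                echo "exit status of the command"
            fi
            ;;
    esac
}

# Walk through the codes discussed in this thread.
for rc in 124 125 126 127 134 137; do
    printf 'rc=%s: %s\n' "$rc" "$(describe_timeout_rc "$rc")"
done
```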
@fdanapfel , @ja9fuchs Ah, so you just like to change from " |
I'm fine with that change, but the RC should be part of the warn message |
@fmherschel Correct, this is my suggestion. Although it is not exactly the same as @PeterPitterling originally requested. But I think it would be a good solution to cover these RCs in a consistent way in the different RAs. Maybe with a slight change of wording in the log output, to make it more generic and not refer to the result as a timeout only. For example call it "command error" instead of "timeout". By including the RC in the output this should be enough info for individual troubleshooting. |
@ja9fuchs Yes, I also want to change that. The angi code will also have some changes in that part. It will take some time, as this also needs some testing. I was hesitant to differentiate, e.g., rc==125 in the RA's reaction, because that is really not easy. But we could of course treat, e.g., rc==125 in the same way as rc==124, as a "failed answer". And following the comment of @PeterPitterling, we will add the RC to the log messages. Please give me some time to do that. |
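A minimal sketch of the approach agreed on above, assuming a simplified wrapper: any RC of 124 or higher is treated as a "failed answer", and the RC is included in a generically worded warning. The names `is_command_error` and `run_with_timeout` are hypothetical stand-ins; this is not the actual `HANA_CALL` implementation from the resource agents.

```shell
#!/bin/sh
# Hypothetical helper: true if the RC should count as a failed answer.
# Uses -ge 124 rather than -eq 124 so that 125, 126, 127 and
# signal-related codes such as 134 are covered as well.
is_command_error() {
    [ "$1" -ge 124 ]
}

# Hypothetical stand-in for a HANA_CALL-like wrapper.
# Usage: run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
    rwt_limit="$1"
    shift
    rwt_output=$(timeout "$rwt_limit" "$@")
    rwt_rc=$?
    if is_command_error "$rwt_rc"; then
        # Generic wording ("command error", not "timeout") plus the RC
        echo "WARN: command error (rc=$rwt_rc) while running: $*" >&2
        return "$rwt_rc"
    fi
    # Only hand the output to the caller when the call succeeded in time
    printf '%s\n' "$rwt_output"
    return "$rwt_rc"
}
```

A caller could then run, for example, `run_with_timeout 5 some_command` and would only parse the output when the function returns an RC below 124, which avoids deriving a role from the output of a crashed or failed call.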
@fmherschel If I understand it correctly, you will work on adjustments of the angi code. In the meanwhile, if that is ok, I would create a small PR in the separate scale-out repository to make the minimum of adjustment there to cover the agreed change. This way the work is distributed and we can clarify details in the PR to get it right. |
@ja9fuchs That would be great! |
@fmherschel My main question actually was why the resource agents use " I agree with @ja9fuchs that it would be good to make it more consistent that only one type of check is used, preferably " And I also like the suggestion from @PeterPitterling to include the RC in the messages that get logged. |
@fdanapfel Why does the code have different handling? It might be that we learned during the "aging" of SAPHanaSR and missed the other code sections. I currently do not expect that there is a good reason why we should differentiate between -eq and -ge. But I will also ask @angelabriel whether I am missing something that explains the difference. |
@fmherschel Thank you for accepting and merging the PR in scale-out! 👍 @PeterPitterling Would you please check if that covers your request well enough, for classic scale-out? If yes, it might help aligning the angi code in a consistent way. |
@ja9fuchs Do you also check the scale-up-classic code in the branch maintenence-classic or should I do this? |
@fmherschel I actually compared to the classic scale-up code side-by-side while doing the scale-out changes. The scale-up Note: |
I checked all conditionals around 124 in each of the classic RAs and those that are still |
That was actually my initial request. Also this section should handle ge 124 and provide the RC in the error log |
Since improved handling of rc >= 124 has been added to scale-out and also to the Angi code (007e761), has this request been resolved or did we miss anything? |
@ja9fuchs Oops, it seems the change was lost in space ... I could provide a patch in a branch. Would you be able to test it closely from your side? |
@fmherschel What patch do you mean? As far as I found, the Angi code was already updated. See the link to the commit in my above comment. |
@ja9fuchs: Maybe I was unclear. I just asked if you would be able to test a patch closely if I would prepare it. Just to be sure the patch (to be created) will not break something. And by the way: Happy New Year! |
https://man7.org/linux/man-pages/man1/timeout.1.html
SAPHanaSR/ra/SAPHana
Lines 692 to 697 in 8db3d75