Infinite loop in nksol.m subroutine model #16
Noticed that in the above output, when I compile without
|
Replace -Ofast with -O3 -fstack-arrays and see what happens. -Ofast enables unsafe optimizations, such as -ffast-math and allowing the compiler to introduce store data races.
|
Just checked and found that -O3 -fstack-arrays has the same behavior as -Ofast in this case.
|
Did you check that you don't get any NaN while evaluating the RHS? You can put in a loop over iv=1,neq with if (isnan(yldot(iv))) stop
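If it helps to see which equations go bad instead of just stopping, the same check can be run from Python after an RHS evaluation. This is a minimal sketch. It assumes you can get the RHS as a flat sequence of floats, for example a copy of the `yldot` array as Forthon exposes it. The helper name `nan_indices` is hypothetical. It returns 1-based indices so the results line up with `iv` in the Fortran loop suggested above.

```python
import math
from typing import Iterable, List


def nan_indices(values: Iterable[float], limit: int = 20) -> List[int]:
    """Return up to `limit` 1-based indices whose value is NaN.

    1-based so the result matches `iv` in `do iv=1,neq`.
    """
    bad = []
    for iv, value in enumerate(values, start=1):
        if math.isnan(value):
            bad.append(iv)
            if len(bad) >= limit:
                break
    return bad


# Usage with a stand-in for yldot (hypothetical values):
rhs = [0.0, float("nan"), 1.5, float("nan")]
print(nan_indices(rhs))  # [2, 4]
```

One caveat: -Ofast turns on -ffast-math, which includes -ffinite-math-only. That option lets the compiler assume values are never NaN, so an `isnan` test compiled with -Ofast may be optimized away. Running the check from Python, or compiling the check without -Ofast, avoids this problem.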
On Tue, Apr 28, 2020, 19:38 Sean Ballinger ***@***.***> wrote:
Just checked and found that -O3 -fstack-arrays has the same behavior as
-Ofast in this case.
|
(Just keeping this issue up to date.) I put the following code at the end of subroutine pandf1 in oderhs.m:
```fortran
do iv=1,neq
  if (isnan(yldot(iv))) then
    stop
  endif
enddo
```
and found that the code stopped at this spot even with -O3 -fstack-arrays.
|
Maxim, Bill, and Sean,
I know there was some effort to get a comparable case on singe.llnl.gov so I could look at it, but I haven’t heard anything for several days. What is the status here?
I am assuming that Sean is running some variant of UEDGE V7.08.04. Some weeks ago, Roman Smirnov pointed out a couple of bugs. These have been corrected in the CVS version but not yet uploaded to GitHub, because I am working on one additional update before releasing it. The bugs Roman found are simple to fix in any version, however. They are as follows:
In bbb/odesetup.m:
odesetup.m-c... Construct second intermediate velocity grid (xvnrmnx,yvnrmnx)
odesetup.m- do ir = 1, 3*nxpt
odesetup.m: call grdintpy(ixsto(ir),ixendo(i),ixst(ir),ixend(ir),
For the last line, make the change ixendo(i) --> ixendo(ir)
In bbb/oderhs.m:
oderhs.m- 255 continue
oderhs.m- do igsp = 1, ngsp
oderhs.m- nbg2dot(igsp) = 0.
oderhs.m: if(isngonxy(ix,iy,ifld) == 1) then
For the last line, make the change isngonxy(ix,iy,ifld) --> isngonxy(ix,iy,igsp)
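For anyone applying Roman's two fixes by hand, here is a small sketch that makes the same one-line substitutions. It checks that each old text occurs exactly once before editing. The helper name `apply_fix` and the relative paths are assumptions, so adjust them to wherever your `bbb` sources live. The helper is not part of UEDGE. If either substring occurs more than once in your copy of a file, the helper refuses to edit it, and you should make the change by hand.

```python
from pathlib import Path


def apply_fix(path, old: str, new: str) -> None:
    """Replace `old` with `new` in the file at `path`, requiring exactly one match."""
    p = Path(path)
    text = p.read_text()
    count = text.count(old)
    if count != 1:
        raise ValueError(f"{p}: expected exactly one match for {old!r}, found {count}")
    p.write_text(text.replace(old, new))


# Hypothetical paths relative to the UEDGE source tree; the substrings
# are the two fixes described above.
FIXES = [
    ("bbb/odesetup.m", "ixendo(i),", "ixendo(ir),"),
    ("bbb/oderhs.m", "isngonxy(ix,iy,ifld)", "isngonxy(ix,iy,igsp)"),
]
```

To use it, run `for path, old, new in FIXES: apply_fix(path, old, new)` from the source directory, then rebuild.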
There was also a problem with cases that solve for the potential (isphion=1). That has been fixed, but I don’t think Sean is evolving the potential equation.
My understanding is that the problems Sean finds appear when some form of compiler optimization is used, but go away when a debuggable (-g) version is used. But this may be incorrect.
Please let me know where this all stands for Sean’s cases, and I am glad to participate in a call if that is a good way to make progress.
-Tom
------
Thomas D. Rognlien Email: [email protected]<mailto:[email protected]>
L-440 (B3725, R432) Tel: 925-422-9830
LLNL, 7000 East Ave, P.O. Box 808 Admin support: 925-422-7446
Livermore, CA 94551
From: Sean Ballinger <[email protected]>
Reply-To: LLNL/UEDGE <[email protected]>
Date: Friday, May 1, 2020 at 8:20 PM
To: LLNL/UEDGE <[email protected]>
Cc: Subscribed <[email protected]>
Subject: Re: [LLNL/UEDGE] Infinite loop in nksol.m subroutine model (#16)
(Just keeping this issue up to date.) I put the following code at the end of subroutine pandf1 in oderhs.m:
do iv=1,neq
if (isnan(yldot(iv))) then
stop
endif
enddo
and found that the code stopped at this spot even with -O3 -fstack-arrays.
|
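Maybe we could get you a temporary PSFC account? We have TotalView. I am also happy to debug over Zoom, as I have it all set up.
I am using UEDGE version 7.0.8.4.14 and not evolving the potential equation. It appears to me that bugs happen both with -Ofast and without, but they manifest differently. I don't think -g prevents optimization or otherwise affects the code.
I tried Roman's fixes with and without -Ofast, and the output/hanging behavior was the same.
|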
Sean,
Check the bottom of odepandf.m in the src/bbb folder of my UEDGE fork. There are some routines there for debugging purposes. You can also print out the Jacobian.
Also, here is a subroutine generator for debugging purposes:
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Mar 25 22:04:48 2020
@author: jguterl
"""
import re  # used by fw90 to find line-break points
from uedge import *
#%%
class WriteDebugRoutine():
    def __init__(self, FileName, ListVariable, Doc):
        # Drop duplicate variable names and sort them.
        self.ListVariable = list(dict.fromkeys(ListVariable))
        self.ListVariable.sort()
        self.Doc = Doc
        self.FileName = FileName
        self.ListUse = []
        self.VarDic = {}
        self.GetVarDoc()
        self.GetListGrp()
        self.WriteFortranSubroutine()

    def SetFfile(self):
        """
        Set the ffile attribute, which is the Fortran file object.
        If the attribute hasn't been created, then open the file with write status.
        If it has, and the file is closed, then open it with append status.
        """
        if 'ffile' in self.__dict__:
            status = 'a'
        else:
            status = 'w'
        if status == 'w' or (status == 'a' and self.ffile.closed):
            self.ffile = open(self.FileName, status)

    def fw90(self, text, noreturn=0):
        i = 0
        while len(text[i:]) > 132 and text[i:].find('&') == -1:
            # --- If the line is too long, then break it up, adding line
            # --- continuation marks in between any variable names.
            # --- This is the same as \W, but also skips %, since PG compilers
            # --- don't seem to like a line continuation mark just before a %.
            ss = re.search('[^a-zA-Z0-9_%]', text[i+130::-1])
            assert ss is not None, "Forthon can't find a place to break up this line:\n" + text
            text = text[:i+130-ss.start()] + '&\n' + text[i+130-ss.start():]
            i += 130 - ss.start() + 1
        if noreturn:
            self.ffile.write(' ' + text)
        else:
            self.ffile.write(' ' + text + '\n')

    def GetVarDoc(self):
        # Look up the documentation entry (group, type, dimension) of each variable.
        for VarName in self.ListVariable:
            VarDoc = self.Doc.GetVarInfo(VarName)
            if len(VarDoc) < 1:
                raise ValueError('Cannot find variable {}'.format(VarName))
            elif len(VarDoc) > 1:
                raise ValueError('Found variable {} in two groups'.format(VarName))
            else:
                self.VarDic[VarName] = VarDoc[0]

    def GetListGrp(self):
        # Collect the unique, sorted list of groups the subroutine must "use".
        for VarName, VarDoc in self.VarDic.items():
            self.ListUse.append(VarDoc['Group'])
        self.ListUse = list(dict.fromkeys(self.ListUse))
        self.ListUse.sort()

    def WriteFortranSubroutine(self):
        self.SetFfile()
        self.fw90('subroutine WriteArrayReal(array,s,iu)')
        self.fw90('implicit none')
        self.fw90('real:: array(*)')
        self.fw90('integer:: i,s,iu')
        self.fw90('do i=1,s')
        self.fw90('write(iu,*) array(i)')
        self.fw90('enddo')
        self.fw90('end subroutine WriteArrayReal')
        self.fw90('subroutine WriteArrayInteger(array,s,iu)')
        self.fw90('implicit none')
        self.fw90('integer:: array(*)')
        self.fw90('integer:: i,s,iu')
        self.fw90('do i=1,s')
        self.fw90('write(iu,*) array(i)')
        self.fw90('enddo')
        self.fw90('end subroutine WriteArrayInteger')
        self.fw90('subroutine DebugHelper(FileName)')
        self.fw90('')
        for UseGrp in self.ListUse:
            self.fw90('Use {} '.format(UseGrp))
        self.fw90('implicit none')
        self.fw90('integer:: iunit')
        self.fw90('character(len = *) :: filename')
        self.fw90('open (newunit = iunit, file = trim(filename))')
        # For each variable: write its name, then its value(s), one per line.
        for VarName, VarDoc in self.VarDic.items():
            self.fw90('write(iunit,*) "{}"'.format(VarName))
            if VarDoc['Dimension'] is None:
                self.fw90('write(iunit,*) {}'.format(VarName))
            else:
                if 'integer' in VarDoc['Type']:
                    self.fw90('call WriteArrayInteger({},size({}),iunit)'.format(VarName, VarName))
                elif 'real' in VarDoc['Type'] or 'double' in VarDoc['Type']:
                    self.fw90('call WriteArrayReal({},size({}),iunit)'.format(VarName, VarName))
                else:
                    raise ValueError('Unknown type for {}'.format(VarName))
        self.fw90('close(iunit)')
        self.fw90('end subroutine DebugHelper')
        self.ffile.close()
#%%
# DicFile is generated by the script UEDGEFortranParser.py; Doc (an object with a
# GetVarInfo method) must also already exist. Neither is defined in this snippet.
ListVariable = (DicFile['convert']['convsr_vo']['AssignedNonLocalVars']
                + DicFile['convert']['convsr_aux']['AssignedNonLocalVars']
                + DicFile['odepandf']['pandf']['AssignedNonLocalVars'])
dbg = WriteDebugRoutine('DebugHelper.F90', ListVariable, Doc)
#%%
def CompareDump(FileName1, FileName2):
    Dic1 = ReadDumpFile(FileName1)
    Dic2 = ReadDumpFile(FileName2)
    VarCheck = {}
    for Var in Dic1.keys():
        VarCheck[Var] = True
        if Var not in Dic2 or len(Dic1[Var]) != len(Dic2[Var]):
            print(Var)
            VarCheck[Var] = False
            # raise ValueError('dics of different length')
            continue
        isfirst = True
        for i, (L1, L2) in enumerate(zip(Dic1[Var], Dic2[Var])):
            if L1 != L2:
                VarCheck[Var] = False
                if isfirst:
                    print(Var, i)
                    isfirst = False
    return VarCheck

def ReadDumpFile(FileName):
    # Lines that parse as numbers are values; any other line starts a new variable.
    with open(FileName, 'r') as file:
        Lines = file.readlines()
    Dic = {}
    for L in Lines:
        L = L.strip()
        if not L:
            continue
        try:
            Lf = float(L)
        except ValueError:
            VarName = L
            Dic[VarName] = []
        else:
            Dic[VarName].append(Lf)
    return Dic

FileName1 = '/home/jguterl/Dropbox/python/UEDGERunDir/dumpregular.txt'
FileName2 = '/home/jguterl/Dropbox/python/UEDGERunDir/dumpomp.txt'
VarCheck = CompareDump(FileName1, FileName2)
for V, B in VarCheck.items():
    if not B:
        print(V)
```
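There is one caveat with `CompareDump`. It compares values with `!=`, and NaN never compares equal to itself, so a variable that is NaN at the same place in both dumps is always reported as different. Below is a sketch of a NaN-aware comparison with an optional relative tolerance. It works on the dictionaries returned by `ReadDumpFile`. The function names `values_match` and `compare_dumps`, as well as the default tolerance, are assumptions and are not part of the script above.

```python
import math
from typing import Dict, List, Union


def values_match(a: float, b: float, rel_tol: float, abs_tol: float) -> bool:
    """Count NaN vs NaN as a match; otherwise compare with math.isclose."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def compare_dumps(dic1: Dict[str, List[float]], dic2: Dict[str, List[float]],
                  rel_tol: float = 1e-12,
                  abs_tol: float = 0.0) -> Dict[str, Union[int, str]]:
    """Map each differing variable to its first mismatching 0-based index,
    or to a short reason if it is missing or has a different length."""
    diffs: Dict[str, Union[int, str]] = {}
    for name, vals1 in dic1.items():
        vals2 = dic2.get(name)
        if vals2 is None:
            diffs[name] = "missing in second dump"
            continue
        if len(vals1) != len(vals2):
            diffs[name] = f"length {len(vals1)} vs {len(vals2)}"
            continue
        for i, (a, b) in enumerate(zip(vals1, vals2)):
            if not values_match(a, b, rel_tol, abs_tol):
                diffs[name] = i
                break
    for name in sorted(dic2.keys() - dic1.keys()):
        diffs[name] = "missing in first dump"
    return diffs


d1 = {"ne": [1.0, float("nan")], "te": [2.0]}
d2 = {"ne": [1.0, float("nan")], "te": [2.0000001], "ti": [3.0]}
print(compare_dumps(d1, d2))  # {'te': 0, 'ti': 'missing in first dump'}
```

With real dumps, call `compare_dumps(ReadDumpFile(FileName1), ReadDumpFile(FileName2))`. Pass `rel_tol=0.0` to get exact comparison with only the NaN handling changed.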
On Tue, May 5, 2020, 11:14 Sean Ballinger ***@***.***> wrote:
Maybe we could get you a temporary PSFC account? We have totalview. I am
also happy to debug over Zoom as I have it all set up.
I am using UEDGE version 7.0.8.4.14 and not evolving the potential
equation. It appears to me that bugs happen both with -Ofast and without,
but they manifest differently. I don't think -g prevents optimization or
otherwise affects the code.
I tried Roman's fixes with and without -Ofast, and the output/hanging
behavior was the same.
|
Sean,
I think a Zoom session is the place to start. If you are finding a NaN in one of the yldot values, the components that go into yldot are available to look at from the parser, and one of those must also contain a NaN. So we can hopefully drill down to identify the specific source. Please send me the source files in the subdirectories bbb and svr. If these are in a tar or zip file, you must change the extension to something like .tax or .zix before sending it to me, so that it will make it through the LLNL mail filter. Tar and zip files with those extensions are not allowed.
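The drill-down idea (if some yldot entry is NaN, one of the terms feeding it must be NaN too) can be scripted once those arrays are pulled into Python. This is a minimal sketch. It assumes the candidate arrays have been collected into a dictionary mapping each name to a flat sequence of floats (flatten NumPy arrays with `.ravel()` first). The helper name `find_nan_sources` is hypothetical. The array names in the usage example are placeholders and do not say which UEDGE arrays actually feed yldot.

```python
import math
from typing import Dict, Iterable, List, Tuple


def find_nan_sources(arrays: Dict[str, Iterable[float]]) -> List[Tuple[str, int, int]]:
    """For each array containing NaN, return (name, first 0-based NaN index, NaN count),
    sorted by name."""
    report = []
    for name in sorted(arrays):
        first = None
        count = 0
        for i, value in enumerate(arrays[name]):
            if math.isnan(value):
                count += 1
                if first is None:
                    first = i
        if count:
            report.append((name, first, count))
    return report


# Placeholder names and values, for illustration only.
components = {"term_a": [1.0, 2.0], "term_b": [float("nan"), 0.5, float("nan")]}
for name, first, count in find_nan_sources(components):
    print(f"{name}: {count} NaN(s), first at index {first}")
```

Running the check on each layer of terms in turn narrows the search to the first quantity that goes NaN.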
I can do a Zoom session after 3 PDT today or after 1:30 PDT tomorrow.
-Tom
From: jguterl <[email protected]>
Reply-To: LLNL/UEDGE <[email protected]>
Date: Tuesday, May 5, 2020 at 12:10 PM
To: LLNL/UEDGE <[email protected]>
Cc: Tom Rognlien <[email protected]>, Comment <[email protected]>
Subject: Re: [LLNL/UEDGE] Infinite loop in nksol.m subroutine model (#16)
Sean,
Check at the bottom of odepandf.m in the folder src/bbb of my uedge fork,
there are some routines for debugging purpose. You can also print out the
Jacobian.
|
I've been encountering a problem where certain cases "hang": the text output stops and the code keeps running, but nothing else happens (for example, the intermediate HDF5 save files are not updated). Here is an example of the output (I'm using `rundt.py`):
It is also possible to get the code to continue by compiling without the `-Ofast` flag in `Makefile.Forthon`. In this case, you start seeing `fnrm= nan`, but the output continues until `dtreal` goes below `dtkill` and the code quits by itself.
I examined the case with `-Ofast` that hangs and found that the `pandf` subroutine was being called many times. By adding some print statements and going up the call stack, I found this infinite loop inside subroutine `model` in `nksol.m`:
I also put print statements around the `pandf1` calls in `oderhs.m`. The `pandf1` calls followed by line numbers (shifted by the additional lines I inserted into the file) correspond to the other `pandf1` calls in the `jac_calc` subroutine. This produces the following output:
Because `pthrsh` is always 0, `ipcur` never gets set to anything other than 0, and a nonzero `ipcur` is required to exit the loop. `pthrsh` is set to 0 if `icntnu != 0`. `icntnu` indicates whether this is a continuation call to nksol that reuses old values.
Another interesting thing is that the arguments supplied to `pandf1` change around the time of the hang. -1 is not an invalid argument, but apparently means "full RHS evaluation" rather than "poloidal/radial index of perturbed variable for Jacobian calc".
Also tested a case that does not hang (even with `-Ofast`) and found that it also has a long period of `pandf(-1, -1, ...)` calls, where the following form repeats many times but the code eventually moves on:
From the absence of certain print statements (compare to the end of the 4th code block in this post), we can see that these calls are being made from a different loop. This supports the claim that the loop identified above is the one that needs fixing.
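Checks like "the hanging run is entirely `pandf(-1, -1, ...)` calls" can be made by post-processing the print output instead of scrolling through it. This is a sketch, and it assumes each instrumented call prints a line of the hypothetical form `pandf1 <i> <j>`. Adapt the regular expression to whatever your print statements actually emit. Following the observation above, a call with both index arguments equal to -1 is counted as a full RHS evaluation (`rhs`), and any other call as a Jacobian-perturbation call (`jac`). The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical print format: "pandf1 <i> <j>" with integer (possibly negative) indices.
CALL_RE = re.compile(r"pandf1\s+(-?\d+)\s+(-?\d+)")


def call_kinds(lines: Iterable[str]) -> List[str]:
    """Classify each matching line as 'rhs' (both indices -1) or 'jac'."""
    kinds = []
    for line in lines:
        m = CALL_RE.search(line)
        if m:
            i, j = int(m.group(1)), int(m.group(2))
            kinds.append("rhs" if (i, j) == (-1, -1) else "jac")
    return kinds


def runs(kinds: List[str]) -> List[Tuple[str, int]]:
    """Run-length encode the call sequence, e.g. ['rhs','rhs','jac'] -> [('rhs', 2), ('jac', 1)]."""
    out: List[Tuple[str, int]] = []
    for k in kinds:
        if out and out[-1][0] == k:
            out[-1] = (k, out[-1][1] + 1)
        else:
            out.append((k, 1))
    return out


log = ["pandf1 3 4", "pandf1 3 5", "pandf1 -1 -1", "pandf1 -1 -1", "other output"]
print(runs(call_kinds(log)))  # [('jac', 2), ('rhs', 2)]
```

In a run that hangs, the last entry should be a single ('rhs', n) run whose count keeps growing. In a run that recovers, a long ('rhs', n) run should eventually be followed by other calls.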
Also ran the hanging case for several minutes to make sure that it was entirely `pandf(-1, -1, ...)` calls and didn't start doing other things.
Also checked out Jerome Guterl's pandf issue, but that seems to be a problem inside `pandf`, not outside as we have here.