SDR runtime error for 3rd granule

Posted: Thu May 30, 2013 10:21 am
by yli
Has anyone run into this issue? I run ProSdrViirsController for a couple of granules with runAdlChainrunnerGui in parallel (default 2). OCCASIONALLY, after the first two granules finish and the run moves on to the third granule, the third one fails, whereas the following granules (4th and beyond) run successfully. The error report for the failed granule shows something like "no error". When I rerun the failed granule, it simply goes through.

Thanks.

Yue

Re: SDR runtime error for 3rd granule

Posted: Thu May 30, 2013 10:47 am
by bhenders
Yue,

We have not seen this, but we'd be very interested in looking at any log files for failure cases. Is the "no error" what comes to the console or what is found in the log file? The log files should be in $ADL_HOME/log unless you've redirected them elsewhere. I can't think of a reason right now that might cause the intermittent failures, so any additional information you can provide would help us. As you asked, if there are others out there experiencing the same issue, that would be good info also.

Thanks,

Bryan Henderson
Raytheon Company

Re: SDR runtime error for 3rd granule

Posted: Thu May 30, 2013 12:00 pm
by yli
Bryan,

That is from the console. Next time I see it, I will paste it from the log file in $ADL_HOME/log.

Thanks.

Yue

Re: SDR runtime error for 3rd granule

Posted: Wed Jun 12, 2013 2:41 pm
by yli
I encountered the same error again. However, it does not seem to be limited to the 3rd granule. Could it be related to memory usage?

The error message:

"Log Message Value TRACE - (30723.47063431984720): DBG_HIGH ProCmnMethodAudit.cpp|207|ProCmnAppl[ProCmnViirsAppl]::initDMSClient() [0x7fff73f93650] ROOT PRO_FAIL Error with: DMS client initialization: No error from file ProCmnAppl.cpp, line 618
Log Message Value TRACE - (30723.47063431984720): DBG_HIGH ProCmnMethodAudit.cpp|207|ProCmnAppl[ProCmnViirsAppl]::init(ADL 4.1.1 PRO VERSION built by yli on Tue May 28 10:54:12 CDT 2013) [0x7fff73f93650] PRO_FAIL initDMSClient() call from file ProCmnAppl.cpp, line 261
Log Message Value ProCmnAppl[ProCmnViirsAppl]::initDMSClient() [0x7fff73f93650] ROOT PRO_FAIL Error with: DMS client initialization: No error from file ProCmnAppl.cpp, line 618"

Re: SDR runtime error for 3rd granule

Posted: Wed Jun 12, 2013 2:58 pm
by scottm
I believe this is the kind of error you can see if a path in the XML is wrong or does not exist. Check the packer, unpacker, and log paths.
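
One rough way to run that check is to scan a configuration XML for values that look like paths and report the ones that do not exist. This is only a sketch: it assumes nothing about the actual ADL configuration schema, and the rule "a value starting with /, ~ or $ is a path" is a heuristic introduced here for illustration.

```python
import os
import xml.etree.ElementTree as ET


def looks_like_path(value):
    """Heuristic (an assumption, not an ADL rule): values starting with /, ~ or $."""
    return value.startswith(("/", "~", "$"))


def find_missing_paths(xml_text):
    """Return path-like attribute values and element texts that do not exist on disk."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        candidates = list(elem.attrib.values())
        if elem.text:
            candidates.append(elem.text)
        for value in candidates:
            value = value.strip()
            if not looks_like_path(value):
                continue
            # Expand $VARS and ~ before testing; unset variables stay unexpanded
            # and are therefore reported as missing.
            resolved = os.path.expanduser(os.path.expandvars(value))
            if not os.path.exists(resolved):
                missing.append(value)
    return missing
```

Passing the contents of a configuration file to find_missing_paths() returns the list of suspicious values, which can then be checked by hand.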

Re: SDR runtime error for 3rd granule

Posted: Wed Jun 12, 2013 4:13 pm
by yli
Hi Scott,

Can you give more details? BTW, I built ADL from the source code.

If it is that kind of error, why would a rerun with the Delta plan go through?

Thanks.

Yue

Re: SDR runtime error for 3rd granule

Posted: Wed Jun 12, 2013 8:08 pm
by yli
Some additional information that seems related: at the terminal where Tk Chain Runner was started, some messages show up:

"WARNING 2013-06-13 00:53:14.235653 tid-1270589760 pid-11812 (DmCoreFileTools.cpp line 1089) failed to mapFile: (args: path=~/data/testdatalocation/output_atl/51b917c8-6f79e-6880ae6f-2396da05.asc,length=0,rwflag=DM_READONLY_ACCESS)
WARNING 2013-06-13 00:58:10.649989 tid-1270589760 pid-11812 (DmCoreFileTools.cpp line 1089) failed to mapFile: (args: path=~/data/testdatalocation/output_atl/51b91813-100b6-6880ae6f-43b3e1a8.asc,length=0,rwflag=DM_READONLY_ACCESS)"
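
To see which files are involved when many of these warnings scroll by, the path and length can be pulled out of captured console output. The sketch below assumes the warning text always has the shape of the two lines quoted above; the function name and regular expression are made up for illustration and are not part of ADL.

```python
import re

# Matches the "failed to mapFile" warning format quoted in this thread.
MAPFILE_WARNING = re.compile(
    r"failed to mapFile: \(args: path=(?P<path>[^,]+),length=(?P<length>\d+),"
)


def failed_map_paths(console_text):
    """Return (path, length) pairs for every 'failed to mapFile' warning found."""
    return [
        (match.group("path"), int(match.group("length")))
        for match in MAPFILE_WARNING.finditer(console_text)
    ]
```

A reported length of 0, as in both warnings above, would be consistent with a file that existed but had no content yet when it was mapped.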

Re: SDR runtime error for 3rd granule

Posted: Thu Jun 13, 2013 8:29 am
by bhenders
Yue,

I believe the issue has to do with running parallel processes and a file write race condition. With multiple processes, I think that while one process is creating outputs in the file system, another process is trying to read a newly created output metadata asc file that hasn't been completely written yet. This causes a DMS init failure in the second process. Later, when you re-run, the issue vanishes because the files are in a stable state.

We have a potential fix for this issue, however I will need a little time to code it and test it to some degree. Once I have it, I'd like you to apply the fix and see if your issue goes away.

Thanks for reporting this issue.

Bryan Henderson
Raytheon Company
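
For readers unfamiliar with this kind of race: a common general-purpose way to keep a reader from seeing a half-written file is to write to a temporary file in the same directory and then rename it into place. The sketch below illustrates that generic pattern only. It is not the ADL fix mentioned above, whose details are not given in this thread, and the function name is hypothetical.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target
    # for the final rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern a concurrent reader never maps a zero-length or partial target file, because the target name only appears once the content is complete.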

Re: SDR runtime error for 3rd granule

Posted: Thu Jun 13, 2013 8:43 am
by scottm
We have found that if ADL runs into file problems (a bad directory, a missing file, or wrong permissions), it is likely to either quit or core dump with a DMS error. This always results from a real problem with the file system, but the messages usually do not contain the path or file in question.

Re: SDR runtime error for 3rd granule

Posted: Thu Jun 13, 2013 11:43 am
by kbisanz
Regarding the comment by scottm, when you say the message usually doesn't contain the path in question, are you looking at the output to the console or at the log file? If it's the log file, what debug level are you running?