SDR runtime error for 3rd granule

Issues related to the VIIRS SDR algorithm and data
yli
Posts: 16
Joined: Fri Apr 26, 2013 1:25 pm

SDR runtime error for 3rd granule

Post by yli »

Has anyone run into this issue? I run ProSdrViirsController on several granules with runAdlChainrunnerGui in parallel (the default of 2). What happens OCCASIONALLY is that after the first two granules finish and the run moves on to the third granule, that granule fails, whereas the following granules (4th and beyond) run successfully. The error report for the failed granule shows something like "no error". When I rerun the failed granule, it simply goes through.

Thanks.

Yue
bhenders
Posts: 72
Joined: Wed Jan 05, 2011 9:27 am
Location: Omaha, NE

Re: SDR runtime error for 3rd granule

Post by bhenders »

Yue,

We have not seen this, but we'd be very interested in looking at any log files for failure cases. Is the "no error" what comes to the console or what is found in the log file? The log files should be in $ADL_HOME/log unless you've redirected them elsewhere. I can't think of a reason right now that might cause the intermittent failures, so any additional information you can provide would help us. As you asked, if others out there are experiencing the same issue, that would be good information to have as well.

Thanks,

Bryan Henderson
Raytheon Company
yli
Posts: 16
Joined: Fri Apr 26, 2013 1:25 pm

Re: SDR runtime error for 3rd granule

Post by yli »

Bryan,

That is from the console. Next time I see it, I will paste it from the log file at $ADL_HOME/log.

Thanks.

Yue
yli
Posts: 16
Joined: Fri Apr 26, 2013 1:25 pm

Re: SDR runtime error for 3rd granule

Post by yli »

I encountered the same error again. However, this does not seem to be limited to the 3rd granule. Could it be related to memory usage?

The error message:

"Log Message Value TRACE - (30723.47063431984720): DBG_HIGH ProCmnMethodAudit.cpp|207|ProCmnAppl[ProCmnViirsAppl]::initDMSClient() [0x7fff73f93650] ROOT PRO_FAIL Error with: DMS client initialization: No error from file ProCmnAppl.cpp, line 618
Log Message Value TRACE - (30723.47063431984720): DBG_HIGH ProCmnMethodAudit.cpp|207|ProCmnAppl[ProCmnViirsAppl]::init(ADL 4.1.1 PRO VERSION built by yli on Tue May 28 10:54:12 CDT 2013) [0x7fff73f93650] PRO_FAIL initDMSClient() call from file ProCmnAppl.cpp, line 261
Log Message Value ProCmnAppl[ProCmnViirsAppl]::initDMSClient() [0x7fff73f93650] ROOT PRO_FAIL Error with: DMS client initialization: No error from file ProCmnAppl.cpp, line 618"
scottm

Re: SDR runtime error for 3rd granule

Post by scottm »

I believe this is the kind of error you can see if a path in the xml is wrong or does not exist. Check the packer, unpacker, and log paths.
yli
Posts: 16
Joined: Fri Apr 26, 2013 1:25 pm

Re: SDR runtime error for 3rd granule

Post by yli »

Hi Scott,

Can you give more details? BTW, I built ADL from the source code.

If it is that kind of error, why would a rerun with the Delta plan go through?

Thanks.

Yue
yli
Posts: 16
Joined: Fri Apr 26, 2013 1:25 pm

Re: SDR runtime error for 3rd granule

Post by yli »

Some additional information that seems related. At the terminal where the Tk Chain Runner was launched, these messages show up:

"WARNING 2013-06-13 00:53:14.235653 tid-1270589760 pid-11812 (DmCoreFileTools.cpp line 1089) failed to mapFile: (args: path=~/data/testdatalocation/output_atl/51b917c8-6f79e-6880ae6f-2396da05.asc,length=0,rwflag=DM_READONLY_ACCESS)
WARNING 2013-06-13 00:58:10.649989 tid-1270589760 pid-11812 (DmCoreFileTools.cpp line 1089) failed to mapFile: (args: path=~/data/testdatalocation/output_atl/51b91813-100b6-6880ae6f-43b3e1a8.asc,length=0,rwflag=DM_READONLY_ACCESS)"
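
For anyone who wants to line these warnings up against the granules that failed, here is a minimal Python sketch that pulls the timestamp, pid, and path out of each "failed to mapFile" line. The regular expression is modeled only on the two warnings quoted above, so other ADL messages may not match it, and the function name `parse_mapfile_warnings` is made up for this illustration.

```python
import re

# Pattern modeled on the two warning lines quoted in this post; it is an
# assumption that other "failed to mapFile" warnings share the same layout.
WARNING_RE = re.compile(
    r"WARNING (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"tid-(?P<tid>\d+) pid-(?P<pid>\d+) "
    r"\((?P<source>\S+) line (?P<line>\d+)\) failed to mapFile: "
    r"\(args: path=(?P<path>[^,]+),length=(?P<length>\d+),rwflag=(?P<rwflag>\w+)\)"
)


def parse_mapfile_warnings(text):
    """Return one dict of string fields per 'failed to mapFile' warning in text."""
    return [match.groupdict() for match in WARNING_RE.finditer(text)]


# First warning from the post, split across string literals for readability.
SAMPLE = (
    "WARNING 2013-06-13 00:53:14.235653 tid-1270589760 pid-11812 "
    "(DmCoreFileTools.cpp line 1089) failed to mapFile: "
    "(args: path=~/data/testdatalocation/output_atl/"
    "51b917c8-6f79e-6880ae6f-2396da05.asc,length=0,rwflag=DM_READONLY_ACCESS)"
)

if __name__ == "__main__":
    for warning in parse_mapfile_warnings(SAMPLE):
        print(warning["date"], warning["time"], warning["pid"], warning["path"])
```

Saving the terminal output to a file and passing its contents to the function would give a list of the affected .asc paths with their times, which can then be compared with the start times of the failed granules.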
bhenders
Posts: 72
Joined: Wed Jan 05, 2011 9:27 am
Location: Omaha, NE

Re: SDR runtime error for 3rd granule

Post by bhenders »

Yue,

I believe the issue has to do with running parallel processes and a file write race condition. I think that with multiple processes, what is occurring is that as one process is creating outputs in the file system, another process at the same time is trying to read a newly created output metadata asc file, which hasn't been completely written. This causes a DMS init failure in the second process. Later when you re-run the issue vanishes because the files are in a stable state.
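
To illustrate the kind of race described here, the sketch below shows one common, general way to keep a reader from seeing a half-written file: write to a temporary file in the same directory, then rename it into place. This is a generic Python illustration, not ADL code and not the fix mentioned in this thread; the function name `write_atomically` and the `.tmp` suffix are assumptions made for the example.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write text to path so a reader sees either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename stays
    # on one filesystem, where os.replace is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # the complete file appears under its final name
    except BaseException:
        # Clean up the partial temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, a second process that only looks for files with the final extension never opens a file that is still being written, because the file does not exist under that name until the rename completes.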

We have a potential fix for this issue, however I will need a little time to code it and test it to some degree. Once I have it, I'd like you to apply the fix and see if your issue goes away.

Thanks for reporting this issue.

Bryan Henderson
Raytheon Company
scottm

Re: SDR runtime error for 3rd granule

Post by scottm »

We have found that if ADL runs into file problems, such as a bad directory, a missing file, or wrong permissions, it is likely to either quit or core dump with a DMS error. This always results from a real problem with the file system, but the messages usually do not contain the path or file in question.
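
Since the messages often omit the offending path, a quick pre-flight check of the paths before launching a run can help. The Python sketch below is only an illustration under assumptions: it does not read ADL's XML configuration (you pass in the paths yourself), and the function name `check_paths` and its parameters are hypothetical.

```python
import os


def _expand(path):
    """Expand a leading ~ and environment variables such as $ADL_HOME."""
    return os.path.expandvars(os.path.expanduser(path))


def check_paths(required_dirs=(), required_files=(), writable_dirs=()):
    """Return a list of problem descriptions; an empty list means every check passed."""
    problems = []
    for raw in required_dirs:
        path = _expand(raw)
        if not os.path.isdir(path):
            problems.append(f"missing directory: {raw}")
        elif not os.access(path, os.R_OK | os.X_OK):
            problems.append(f"directory not readable: {raw}")
    for raw in required_files:
        path = _expand(raw)
        if not os.path.isfile(path):
            problems.append(f"missing file: {raw}")
        elif not os.access(path, os.R_OK):
            problems.append(f"file not readable: {raw}")
    for raw in writable_dirs:
        path = _expand(raw)
        # Only report write permission here; a missing directory is reported
        # by the required_dirs check if it is listed there as well.
        if os.path.isdir(path) and not os.access(path, os.W_OK):
            problems.append(f"directory not writable: {raw}")
    return problems
```

For example, `check_paths(required_dirs=["$ADL_HOME/log"], writable_dirs=["$ADL_HOME/log"])` would report whether the log directory mentioned earlier in this thread exists and can be written to. Any other paths passed in would have to be copied by hand from your own configuration.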
kbisanz
Posts: 280
Joined: Wed Jan 05, 2011 7:02 pm
Location: Omaha NE

Re: SDR runtime error for 3rd granule

Post by kbisanz »

Regarding the comment by scottm, when you say the message usually doesn't contain the path in question, are you looking at the output to the console or at the log file? If it's the log file, what debug level are you running?
Kevin Bisanz
Raytheon Company