Setting the provenance portion of an output filename

houchin
Posts: 128
Joined: Mon Jan 10, 2011 6:20 am

Post by houchin »

Hi all,

In ADL is there a way (through software modification) to set the provenance portion of the filename of an output? In case I'm using the wrong terminology, I'm talking about the "PS-1-0-CCR-13-992-JPSS-DPA-059-PE" portion of a file like VIIRS-SDR-F-PREDICTED-LUT_npp_20130920143100Z_20130927001900Z_ee00000000000000Z_PS-1-0-CCR-13-992-JPSS-DPA-059-PE_noaa_all_all-_all.bin.

I would want the actual value to vary on at least a run-by-run basis.

In addition, is there a way to get just that portion of an input file without getting the entire filename and parsing it out myself?
Scott Houchin, Senior Engineering Specialist, The Aerospace Corporation
15049 Conference Center Dr CH3/310, Chantilly, VA 20151; 571-307-3914; scott.houchin@aero.org
kbisanz
Posts: 280
Joined: Wed Jan 05, 2011 7:02 pm
Location: Omaha NE

Re: Setting the provenance portion of an output filename

Post by kbisanz »

"Provenance portion" is the correct terminology.

When you mention "outputs", are you referring to H5 outputs, or to a binary/.asc file pair?

In build/envSetup.*sh you should be able to vary the values of DPE_SITE, DPE_SITE_ID, DPE_DOMAIN, and DPE_VER. The values of DPE_SITE, DPE_SITE_ID, and DPE_DOMAIN I *believe* are put in h5 file names. DPE_VER shows itself in N_Software_Version metadata (in .asc files) on outputs.

Provenance information is only on LUTs. When an output is created, the LUTs used as inputs to the process are listed in N_Aux_Filename metadata. The file name is the entry, so provenance information would be embedded in there.

I don't believe there is currently an easy way to get the provenance information other than parsing it yourself. Input file names have a fixed, underscore-delimited format (defined in the CDFCB), so the provenance information should always be the Nth field, whatever N is.
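
A minimal sketch of that parsing approach, not existing ADL code. The helper names (splitFields, extractProvenance) are hypothetical, and the field index of 5 is an assumption inferred from the example file name earlier in this thread; it should be checked against the CDFCB before being relied on.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a string on a single-character delimiter. Interior empty fields
// are kept.
std::vector<std::string> splitFields(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::string field;
    std::istringstream stream(text);
    while (std::getline(stream, field, delim))
    {
        fields.push_back(field);
    }
    return fields;
}

// ASSUMPTION: provenance is the sixth underscore-delimited field (index 5),
// as in the example file name in this thread. This only holds if the
// collection short name itself contains no underscores.
const std::size_t kProvenanceFieldIndex = 5;

// Returns the provenance field, or an empty string if the name has too few
// fields to contain one.
std::string extractProvenance(const std::string& fileName)
{
    const std::vector<std::string> fields = splitFields(fileName, '_');
    if (fields.size() <= kProvenanceFieldIndex)
    {
        return std::string();
    }
    return fields[kProvenanceFieldIndex];
}
```

Returning an empty string for a malformed name keeps the caller's error handling simple; a real implementation might prefer to throw or log instead.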

Does the above information help you out? I have a feeling you were aware of most or all of it, so I'm guessing your need remains.

What problem are you trying to solve?
Kevin Bisanz
Raytheon Company
houchin
Posts: 128
Joined: Mon Jan 10, 2011 6:20 am

Re: Setting the provenance portion of an output filename

Post by houchin »

Yeah, I knew most of that already.

One of the activities we're doing is using ADL itself to modify input LUTs based on other data, then using the modified LUT for the standard processing. I have extended the algorithm configuration to have ADL output any of the modified LUTs. What I would like to do is change the official portion of that name to include content I pulled from the name of one of the other inputs.

So for example, if I have an input VIIRS-SDR-ADJUSTMENT-LUT_npp_20020101010000Z_20020101010000Z_ee00000000000000Z_PS-blahblahblah-PE-_devl_dev_all-_all.bin

and that LUT causes the VIIRS-SDR-RVF LUT to be modified, I want to copy that blahblahblah into the filename ADL uses when it outputs the modified Delta C LUT.
Scott Houchin, Senior Engineering Specialist, The Aerospace Corporation
15049 Conference Center Dr CH3/310, Chantilly, VA 20151; 571-307-3914; scott.houchin@aero.org
bhenders
Posts: 72
Joined: Wed Jan 05, 2011 9:27 am
Location: Omaha, NE

Re: Setting the provenance portion of an output filename

Post by bhenders »

Scott,

I'm not sure how you've extended the algorithm configuration to have ADL output the modified LUTs, but you may be able to hook into ProCmnAUXOutputItem.cpp and update that class to do what you want. The method below currently hard-codes the version to "-". You could add a data member to the class that defaults to the existing "-" but can be overridden in your case to set the version from the input, or whatever you need. This may work for you, though I'm not sure I'm completely following your scenario.

I'm not aware of any existing code that pulls the version from the file name, though I suspect there is some in the ING software, which has to parse the names of files dropped in the Landing Zone. Post back if you have further questions.

In this class you will find this method:

//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------
void ProCmnAUXOutputItem::generateFileName()
{
    // the following routines can throw exceptions but we don't want to
    // catch them here since the exception would just be rethrown
    std::string rangeDateTimeStr = iet2utcString(getStartRange());
    std::string effectivityStartStr = iet2utcString(getStartEffectivity());

    // copy the string and convert them to lower case
    std::string spacecraft(algorithm_->getSpacecraft());
    std::string domain(algorithm_->getProcessingDomain());

    // convert the string to all lowercase to appease the CDFCB format
    std::transform(spacecraft.begin(), spacecraft.end(),
                   spacecraft.begin(), (Int32(*)(Int32)) std::tolower);

    std::transform(domain.begin(), domain.end(),
                   domain.begin(), (Int32(*)(Int32)) std::tolower);

    std::string shortName(getShortName());
    // The elements of the AUX filename are as follows:
    //   Collection Short Name
    //   Spacecraft ID
    //   Production Timestamp: Based on RangeDateTime Metadata
    //   Effectivity Start & Stop: Based on Effectivity Metadata
    //   Version Number: Currently no known versioning of the PRO produced AUX
    //   Origin: Based upon the Central/Site where produced
    //   Origin Domain: Based on the domain where PRO is running
    //   Destination: "all-"
    //   Destination Domain: Based on the domain where PRO is running
    //   Extension: Set to ".bin"

    std::string filename;

    filename = shortName + "_" +
               spacecraft + "_" +
               rangeDateTimeStr + "_" +
               effectivityStartStr + "_" +
               "ee00000000000000Z_" +       // unbounded effectivity
               "-_" +                       // version number (NA)
               getSiteOriginId() + "_" +    // site produced
               domain + "_" +
               "all-_" +                    // destination
               domain +
               ".bin";

    if (ProCmnLogger::getLogger().isDebugMedEnabled())
    {
        std::ostringstream oss;
        oss << getMethodContext()
            << " ProCmnAUXOutputItem::generateFileName() generated filename: "
            << filename;
        DEBUG_MED((ProCmnLogger::getLogger()),oss.str());
    }

    // set the class attribute
    setFileName(filename);
}
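
The data-member idea could look roughly like the sketch below. It is not the real ProCmnAUXOutputItem: AuxOutputItemSketch, setVersion(), getVersion(), and buildFileName() are hypothetical names, and the other filename fields are passed in as plain strings only to keep the example self-contained. The one point it illustrates is replacing the hard-coded "-_" with a member that defaults to "-".

```cpp
#include <string>

// Hypothetical stand-in for the relevant part of ProCmnAUXOutputItem.
class AuxOutputItemSketch
{
public:
    // Default to "-" so the standard case produces the same name as before.
    AuxOutputItemSketch() : version_("-") {}

    // Hypothetical setter; not part of the as-distributed class.
    void setVersion(const std::string& version) { version_ = version; }
    const std::string& getVersion() const { return version_; }

    // Same concatenation order as generateFileName(), with the literal
    // "-_" replaced by version_ + "_".
    std::string buildFileName(const std::string& shortName,
                              const std::string& spacecraft,
                              const std::string& rangeDateTimeStr,
                              const std::string& effectivityStartStr,
                              const std::string& siteOriginId,
                              const std::string& domain) const
    {
        return shortName + "_" +
               spacecraft + "_" +
               rangeDateTimeStr + "_" +
               effectivityStartStr + "_" +
               "ee00000000000000Z_" +   // unbounded effectivity
               version_ + "_" +         // version (defaults to "-")
               siteOriginId + "_" +     // site produced
               domain + "_" +
               "all-_" +                // destination
               domain +
               ".bin";
    }

private:
    std::string version_;
};
```

Because the default is unchanged, existing algorithms that never call the setter would keep producing the same file names.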

Bryan Henderson
Raytheon Company
houchin
Posts: 128
Joined: Mon Jan 10, 2011 6:20 am

Re: Setting the provenance portion of an output filename

Post by houchin »

To get the LUTs output, we're doing nothing more than specifying an output product in the Algorithm config file. If that file is to be changed, I copy data into the buffer provided by ADL. If not, I remove the output item, just as we did in RSBAutoCal to delete the Cal History when it's not updated. We're relying on ADL to handle all of this transparently to our code.

It sounds like I would have to modify the as-distributed ProCmnAUXOutputItem class to take the version from a member variable, letting it default to "-_" so nothing is changed in the standard case.

It is my understanding that the filename of the output is generated very early in the execution of the task. Assuming I added a setVersion() method to ProCmnAUXOutputItem, is there a point in the execution where at least one specific input is available (so I can get the filename of that input) and where I can get the ProCmnAUXOutputItem objects for the outputs whose names I want to customize and call setVersion() on them?

In RSBAutoCal, we call getFileName() on the output item. Given that the method you cited is generateFileName(), I assume there is some point at which the filename is statically generated.
Scott Houchin, Senior Engineering Specialist, The Aerospace Corporation
15049 Conference Center Dr CH3/310, Chantilly, VA 20151; 571-307-3914; scott.houchin@aero.org
bhenders
Posts: 72
Joined: Wed Jan 05, 2011 9:27 am
Location: Omaha, NE

Re: Setting the provenance portion of an output filename

Post by bhenders »

Scott,

You brought up a very good point about the file name being set very early, during the input stage. I have a strategy that will probably work for you, though I admit it's a little more involved than you probably wanted. I tried this on VI EDR by adding a LUT to the algorithm config file as an output, so I know it can work, but it does involve making one change to the common (CMN) code.
Here is what I did:

I overrode the getDataItems() method from ProCmnAlgorithm in the ProEdrViirsVI class:

/* virtual */ Int32 ProEdrViirsVI::getDataItems()
{
    // Set output items as already retrieved so we can modify file names
    // get the output data item
    ProCmnOutputItem *output = getOutputItem("VIIRS-AOT-LUT");
    if (output == 0)
    {
        return (PRO_FAIL);
    }

    // mark the output as already allocated so that the base-class get
    // will not retrieve it yet
    output->setIsAllocated(true);

    return (ProCmnAlgorithm::getDataItems());
}

Then I added code to doProcessing() in ProEdrViirsVI that sets the file name as desired:
/* virtual */ Int32 ProEdrViirsVI::doProcessing()
{
    ProCmnMethodAudit methodAudit(
        this,
        "ProEdrViirsVI",
        "doProcessing",
        getMethodContext());

    Int32 status = PRO_SUCCESS;

    // get the output data item
    ProCmnOutputItem *output = getOutputItem("VIIRS-AOT-LUT");
    if (output == 0)
    {
        return (PRO_FAIL);
    }

    // reset the attribute so that the get method can retrieve the data with new
    // file name
    output->setIsAllocated(false);
    std::string filename = "This IS IT";
    output->setFileName(filename);

    // retrieves the updated LUT with the correct file name at this point
    status = ProCmnAlgorithm::getDataItems();
: : :

The CMN change I had to make was to make setFileName() public; it was a protected method of the ProCmnDataItem class. One could also do all of the work in the overridden getDataItems() method, i.e., retrieve the inputs, determine the new file name for the output, set it, and then call getDataItems() a second time, manipulating the allocated flag as needed before each call.
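
The all-in-getDataItems() variant can be sketched with stand-in types. Nothing below is real ADL code: ItemSketch and AlgorithmSketch are hypothetical, baseGetDataItems() only imitates the assumed behavior of ProCmnAlgorithm::getDataItems() (skip items flagged as allocated, set the flag on retrieval), and the "MODIFIED_" prefix is an arbitrary placeholder for whatever name is derived from the input.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a data item; only the fields the sketch needs.
struct ItemSketch
{
    std::string fileName;
    bool isAllocated;

    explicit ItemSketch(const std::string& name)
        : fileName(name), isAllocated(false) {}
};

// Hypothetical stand-in for an algorithm with one input and one output.
class AlgorithmSketch
{
public:
    AlgorithmSketch(const std::string& inputName,
                    const std::string& placeholderOutputName)
        : input_(inputName), output_(placeholderOutputName) {}

    // Overridden retrieval: two passes over the base-class logic.
    int getDataItems()
    {
        // Pass 1: flag the output as allocated so only the input is retrieved.
        output_.isAllocated = true;
        int status = baseGetDataItems();
        if (status != 0)
        {
            return status;
        }

        // The input file name is now known; derive the output name from it.
        // The "MODIFIED_" prefix is purely illustrative.
        output_.fileName = "MODIFIED_" + input_.fileName;

        // Pass 2: clear the flag so the output is retrieved under its new name.
        output_.isAllocated = false;
        return baseGetDataItems();
    }

    // Names of the items retrieved so far, in retrieval order.
    const std::vector<std::string>& retrievedNames() const { return retrieved_; }

private:
    // Stand-in for ProCmnAlgorithm::getDataItems(). ASSUMPTION: items flagged
    // as allocated are skipped, and retrieval sets the flag.
    int baseGetDataItems()
    {
        retrieveIfNeeded(input_);
        retrieveIfNeeded(output_);
        return 0;
    }

    void retrieveIfNeeded(ItemSketch& item)
    {
        if (!item.isAllocated)
        {
            retrieved_.push_back(item.fileName);
            item.isAllocated = true;
        }
    }

    ItemSketch input_;
    ItemSketch output_;
    std::vector<std::string> retrieved_;
};
```

In this sketch the placeholder output name is never retrieved; the output is only fetched in the second pass, after its name has been replaced.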

This strategy can probably be adapted and extended to meet your needs.

Post back if you have further questions.

Bryan Henderson
Raytheon Company