is it possible to run multiple jobs using same ADL?

Issues related to runtime execution of algorithms in ADL
wzchen
Posts: 89
Joined: Wed Jul 18, 2012 3:01 pm

Post by wzchen »

Hi,

I am using the command line TK Chain Runner to run the VIIRS Cloud Mask. It works like a charm. However, it takes at least one week to process one day's data. To speed up processing, I have to run multiple jobs at the same time. Currently, I have had to install several copies of ADL under different names so that they do not conflict with each other, but it is annoying to maintain multiple copies of ADL for the same purpose. I am wondering whether, if I point the "landing zone area" and the "lw_properties" file to different locations, I can use the same ADL installation to run multiple jobs. Before I do that, I want to make sure these are the only two places that ADL writes to during a run. Are there any other common locations or files I need to change besides the "landing zone area" and "lw_properties"?

Thanks.

Weizhong
kbisanz
Posts: 280
Joined: Wed Jan 05, 2011 7:02 pm
Location: Omaha NE

Re: is it possible to run multiple jobs using same ADL?

Post by kbisanz »

Hi,

The ability to run in parallel depends on what version you have. When ADL 4.1 was originally released, the command line chain runner could not process algorithms in parallel. However a patch was released to fix that (to make ADL 4.1.1). That patch is available at https://jpss.ssec.wisc.edu/adl/download ... L_4.1_Pat/

If your $ADL_HOME/.version says ADL 4.1.1 you should be able to process granules in parallel. You can do this with the -m option. If you don't have the -m option, you can try to apply the patch.

However, that patch is for the version of ADL that has *not* already been converted into a virtual appliance. So, if you have installed ADL from source (i.e. you extracted source code from tar files), you should be able to use the patch. If you use the virtual appliance, I am not sure if that patch is in the latest virtual appliance. I am also not sure if that patch will successfully install in the virtual appliance. I have sent off an email to the University of Wisconsin to try to determine what the status of the 4.1.1 patch and the virtual appliance is.

If you want to try the patch, I would recommend making another ADL installation and attempting to apply the patch to that copy. If you have the virtual appliance or you have made code changes since ADL was installed, I cannot guarantee the patch will install correctly.
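The version check described above can be scripted. The following is a minimal sketch, assuming $ADL_HOME/.version is a plain text file that contains the string "4.1.1" when the patch is present; the function name has_parallel_patch is hypothetical and not part of ADL.

```python
import os
from pathlib import Path


def has_parallel_patch(adl_home: str) -> bool:
    """Return True if <adl_home>/.version mentions 4.1.1.

    Assumption: .version is plain text; its exact layout is not
    documented in this thread, so only a substring check is done.
    """
    version_file = Path(adl_home) / ".version"
    if not version_file.is_file():
        return False
    return "4.1.1" in version_file.read_text(errors="replace")


if __name__ == "__main__":
    home = os.environ.get("ADL_HOME")
    if home is None:
        print("ADL_HOME is not set")
    else:
        print("ADL 4.1.1 detected:", has_parallel_patch(home))
```

If this reports False, checking whether the chain runner accepts the -m option is still the more direct test.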

To answer your question, yes I *believe* that the landing zone and the location in the lw_properties files are the only locations to which data is written.

I have a few questions about your specific setup that may let us make some recommendations to help you:
--Are you using the virtual appliance or have you installed from source?
--Have you compiled optimized or are you compiling debug? Compiling optimized takes longer, but makes things run faster.
--How much memory and how many CPUs do you have on the machine you're using?
--What algorithms are you running? Are you running only cloud mask, or are you running algorithms earlier in the chain such as VIIRS SDR?
--How much data is in the algorithm's input directories? A quick way to find out is "find . -name \*.asc | wc -l" to count the number of input files.
--I understand the command line chainrunner is easier to use in some instances, such as scripting, but have you tried the GUI chain runner? It should be able to process things in parallel.

ADL 4.1 uses OpenMP while reading input files in an attempt to make the reading faster. In $ADL_HOME/build/envSetup.ksh you can try changing OMP_NUM_THREADS to other values to see if that helps. Unless you have very fast hard disks or a solid state drive I'd recommend leaving the value at 2 because that's the value that worked the best for us. OpenMP is *not* used during algorithm processing.

Additionally, if you have large quantities of static data (such as tiles) you could try creating "jumbo" asc files (with extension .jasc). Note that these may already exist in tile input directories. A .jasc is just a single file containing the contents of a series of .asc files. Using a .jasc file can speed up initialization because only 1 file read is performed for the .jasc file, instead of a file read for each .asc file. You can create a .jasc file using $ADL_HOME/script/createJascFiles.sh. Note that there are a couple of things to watch out for when using .jasc files:
--Check to make sure they're not already there.
--A .jasc file in a directory will cause all .asc files in that directory to be ignored.
--A .jasc file can only be used for data which does not change.
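The first two checks in the list above can be automated before running createJascFiles.sh. This is a rough sketch, not an ADL tool: it only walks a directory tree and counts .asc and .jasc files per directory. The function name scan_tile_dirs and the idea of passing a tile root directory are assumptions for illustration.

```python
import os


def scan_tile_dirs(root):
    """Yield (directory, n_asc, n_jasc) for each directory under root
    that contains at least one .asc or .jasc file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        n_asc = sum(1 for name in filenames if name.endswith(".asc"))
        n_jasc = sum(1 for name in filenames if name.endswith(".jasc"))
        if n_asc or n_jasc:
            yield dirpath, n_asc, n_jasc


def describe(root):
    """Return one human-readable status line per directory."""
    lines = []
    for dirpath, n_asc, n_jasc in sorted(scan_tile_dirs(root)):
        if n_jasc:
            # Per the thread: a .jasc causes the .asc files to be ignored.
            status = "has .jasc (%d .asc files would be ignored)" % n_asc
        else:
            status = "no .jasc yet (%d .asc files)" % n_asc
        lines.append("%s: %s" % (dirpath, status))
    return lines


if __name__ == "__main__":
    for line in describe("."):
        print(line)
```

Whether a given directory holds static data, and is therefore safe to convert, still has to be judged by hand.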

Once you get the parallel processing ability, it is limited to 2 processes by default. You can change the maximum by changing "THREAD_COUNT" in $ADL_HOME/.lw_properties. It is set to 2 because running too many processes can easily overwhelm a machine if it doesn't have much memory or many CPUs.

I look forward to hearing from you.
Kevin Bisanz
Raytheon Company
wzchen

Post by wzchen »

Hi Kevin,

Thanks for your quick and detailed reply. Let me answer your questions first:

--Are you using the virtual appliance or have you installed from source?
From source
--Have you compiled optimized or are you compiling debug? Compiling optimized takes longer, but makes things run faster.
Not sure. I just ran the "./buildAdl.ksh" script without any arguments. Is that optimized mode? If not, how do I choose it?
--How much memory and how many CPUs do you have on the machine you're using?
I have several machines I can use. Each has only 32 GB of memory and 16 CPUs.
--What algorithms are you running? Are you running only cloud mask, or are you running algorithms earlier in the chain such as VIIRS SDR?
Yes, in the chain from SDR.
--How much data is in the algorithm's input directories? A quick way to find out is "find . -name \*.asc | wc -l" to count the number of input files.
I keep all the unpacked RDRs in a separate folder. My scripts supply a set of 5 granules each time.
--I understand the command line chainrunner is easier to use in some instances, such as scripting, but have you tried the GUI chain runner? It should be able to process things in parallel.
The GUI mode is great. However, it is not an option for any task that runs longer than one day; I usually use it only to check for missing inputs and errors. I cannot guarantee that our network and my laptop will stay up without problems during a long run, and I sometimes have to take my laptop home. Therefore, I have to keep all my jobs running in the background. That has worked well so far. In fact, in my experience the command line chain runner is the most useful ADL tool for me. Thanks.

I already use the "-m" option in the command line chain runner, and it works well. However, as you mentioned, the default thread count is only 2. One day's worth of cloud mask processing takes about one week on our machine, and a run can only use one machine. Therefore, I have had to split one day's data into 3 or 5 smaller data sets and run each with a different ADL installation. This cuts the run time to 1/3 or even 1/5 of the original. If I knew exactly which locations ADL writes to during a run, I could relocate all of them outside $ADL_HOME, so I would not have to install multiple copies of ADL for the same task.
Besides the landing zone and the lw_properties file itself, what other locations or files do I have to move? I checked the lw_properties file. It includes 14 locations. It seems that all the paths in the last two variables (DPE_DMS_INST_PATH and INFTK_DM_ROOT) need to be changed to point outside $ADL_HOME, right? What else do I need to change?

Thanks.

Weizhong
kbisanz

Post by kbisanz »

Your build is compiled in debug mode. I know this because of the "-DDebug" inside buildAdl.ksh. If you remove "-DDebug", then clean and rebuild, you should get an optimized build. Alternatively, I would actually recommend using "$ADL_HOME/script/build_adl.pl -clean -makefiles -src -library -program -log build.log". Both buildAdl.ksh and build_adl.pl take a debug option (-DDebug and -debug, respectively) which causes the code to be compiled in debug mode. A debug build compiles faster and allows the use of a debugger to step through code. However, if you're processing a lot of data, compiling without the debug option optimizes the build for runtime efficiency. The advantage of build_adl.pl over buildAdl.ksh is just that build_adl.pl should build ADL faster because it attempts to run multiple build processes at once.

Your machines sound moderately powerful. I would guess that memory is the limiting factor. VIIRS SDR requires about 6-8 GB of memory for a single granule. I would guess you could run 3 or maybe 4 granules of VIIRS SDR at once before they start interfering with each other, meaning that together they could require more than 32 GB of memory. Beyond that point, the operating system will start to use swap space, which could actually slow things down.
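The memory estimate above can be written out as a small calculation. This is only a back-of-the-envelope sketch: the 4 GB reserved for the operating system and file cache is an assumption (not a figure from this thread), and the function name max_concurrent is hypothetical.

```python
def max_concurrent(total_mem_gb, per_process_gb, cpus, reserve_gb=4.0):
    """Estimate how many processes fit in RAM without swapping.

    reserve_gb is an assumed allowance for the OS and file cache.
    The result is capped by the CPU count and is never below 1.
    """
    usable = total_mem_gb - reserve_gb
    by_memory = int(usable // per_process_gb)
    return max(1, min(by_memory, cpus))


if __name__ == "__main__":
    # 32 GB machine, 16 CPUs, VIIRS SDR at roughly 6-8 GB per granule.
    print("pessimistic (8 GB each):", max_concurrent(32, 8, 16))
    print("optimistic  (6 GB each):", max_concurrent(32, 6, 16))
```

With these assumptions the two cases come out at 3 and 4 processes, in line with the guess above. Actual memory use should still be watched (for example with top or free) during a trial run.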

It sounds like your datasets are relatively small and probably aren't causing significant slowdowns. The one thing I would recommend is finding where the tiled data is stored and making sure those directories contain a .jasc file. Tiled data includes collections such as Terrain-Eco-ANC-Tile and GridIP-VIIRS-Qst-Lwm-Quarterly-Tile.

It's good that you're using the -m option. Like I said previously, you can change "THREAD_COUNT" in $ADL_HOME/.lw_properties to be something higher.

I *believe* the landing zone and $ADL_HOME/.lw_properties contain all the locations where data is written. Actually $ADL_HOME/.lw_properties is generated based on environment variables in $ADL_HOME/build/envSetup.ksh. The usage of the chain runner says this:

   Environment Variables:
                TEMP_LOC                Directory to write the log and TK files.
                INFTK_DM_ROOT           Data directories (colon separated) or path to file
                                        with data directories listed on individual lines.
                DEFAULT_EXE_PATH        Path to where your executables are located.
If you want to run multiple instances of the chain runner with 1 ADL install, I *think* it can be done. I would change:
--TEMP_LOC to be a scratch location for each dataset
--INFTK_DM_ROOT to be the paths for the data for different data sets
--DEFAULT_EXE_PATH to be $ADL_HOME/bin
Then change runAdlChainRunner.pl such that LW_TOOL_PROPERTIES is written to a location that is dataset specific. You'll notice that currently, it will write to $ADL_HOME/.lw_properties, causing your different datasets to clobber each other. I have not actually tried the above steps, so it's possible you'll find something I have overlooked.
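The environment part of the steps above can be sketched as follows. This is an untested illustration under stated assumptions: the "scratch" and "data" subdirectory names and the function name dataset_env are hypothetical, and the sketch does not address LW_TOOL_PROPERTIES, which still requires the change to runAdlChainRunner.pl described above.

```python
import os


def dataset_env(adl_home, dataset_root, base_env=None):
    """Build a per-dataset environment for one chain runner instance.

    Only the three variables named in the chain runner usage text are
    set. The scratch/ and data/ layout under dataset_root is assumed.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TEMP_LOC"] = dataset_root + "/scratch"      # logs and TK files
    env["INFTK_DM_ROOT"] = dataset_root + "/data"    # data directories
    env["DEFAULT_EXE_PATH"] = adl_home + "/bin"      # shared executables
    return env


if __name__ == "__main__":
    for name in ("set1", "set2", "set3"):
        env = dataset_env("/path/to/ADL", "/work/" + name, base_env={})
        print(name, env)
```

Each dictionary could then be passed as the env argument of subprocess.Popen when launching one chain runner per dataset, so the instances share the executables but not their scratch or data locations.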

Here are my recommendations as a short list:
--Compile optimized by removing -debug or -DDebug
--Don't run so many processes that you start using swap space. You may have to play around to find the right number.
--Change "THREAD_COUNT" in $ADL_HOME/.lw_properties
--If you want to run multiple instances of the chain runner, change $TEMP_LOC, $INFTK_DM_ROOT, $DEFAULT_EXE_PATH. Then modify ./runAdlChainRunner.pl to write .lw_properties to a different location.

Also, consider changing your log level to HIGH or OFF. LOW writes a lot of debug information, but can slow things down. HIGH writes only minimal information (only really important messages). This is done with the -l flag to runAdlChainRunner.pl. It appears to default to HIGH.

I believe I've answered your questions. If something is unclear, please post back.

I am interested in which of the above options helps you out and look forward to your response.
Kevin Bisanz
Raytheon Company
wzchen

Post by wzchen »

Hi Kevin,

Thanks for your quick reply.
After I added the parallel option, changed the thread count to 3, and added .jasc files to my tiles, the algorithm ran much faster. It used to take 35 min for the first granule and 12 min for each granule afterwards. Now it takes only 8.3 min for the first one and 6.6 min for each of the others. BTW, the parallel option only takes effect for the SDR algorithms, right?

I was able to change all the directories in lw_properties and the landing zone. However, another problem is that I don't know how to change the location of "HDF5_Unpack_Area". It is used by an executable file.

I always use log level "HIGH". It seems that it shows enough information for me to locate the problems.

I tried to rebuild ADL using the Perl script you mentioned. However, the build did not complete and gave the following error message. Do you have any idea what's going on?
.......
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
------------------------------------------
Running size validation script
./ProCmnValidateOmpsNpDictionarySize.exe -lw /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/cfg/dictgen.xml
SUCCESS: All product structure sizes match their dictionary entry definitions' sizes.


--------------------------------------------------------------
cd /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/SCIENCE/src/;make library

/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/COTS/java/jdk1.7.0/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/DMS/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/exceptions/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/cfg/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/time/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/tk/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/ING/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -c AutoDerivedScienceDictionaryEntries.cpp -o AutoDerivedScienceDictionaryEntries.o
/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include/InfTk_Defs.h:120: warning: ‘INFTK_TASK_SHUTDOWNSTR’ defined but not used
/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/COTS/java/jdk1.7.0/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/DMS/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/exceptions/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/cfg/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/time/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/tk/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/ING/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -c AdlCmnScienceProductDictionary.cpp -o AdlCmnScienceProductDictionary.o
/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include/InfTk_Defs.h:120: warning: ‘INFTK_TASK_SHUTDOWNSTR’ defined but not used
Creating Shared Library ...
/bin/rm -f /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlScienceDictEntries.so
/usr/bin/g++ -mfull-toc -m64 -Xlinker -zmuldefs -lc -shared -L/usr/lib -L/lib -o /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlScienceDictEntries.so AutoDerivedScienceDictionaryEntries.o AdlCmnScienceProductDictionary.o
/usr/bin/ld: skipping incompatible /usr/lib/libm.so when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib/libm.a when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
------------------------------------------
Creating Static Library ...
/bin/rm -f /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlScienceDictEntries.a
/usr/bin/ar -ru /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlScienceDictEntries.a AutoDerivedScienceDictionaryEntries.o AdlCmnScienceProductDictionary.o
/usr/bin/ar: creating /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlScienceDictEntries.a
/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/COTS/java/jdk1.7.0/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/DMS/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/exceptions/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/cfg/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/time/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/tk/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/ING/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -c AdlCmnValidateScienceDictionarySize.cpp -o AdlCmnValidateScienceDictionarySize.o
/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include/InfTk_Defs.h:120: warning: ‘INFTK_TASK_SHUTDOWNSTR’ defined but not used
------------------------------------------
Creating size validation script
/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -m64 -Xlinker -zmuldefs -L/usr/lib -L/lib -o AdlCmnValidateScienceDictionarySize.exe AdlCmnValidateScienceDictionarySize.o -L/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib -lProCmnIPO -lProCmnUtil -lProCmnDictionary -lProCmnDictEntries -lProCmnStrings -lInfUtil_Cfg -lInfUtil_Perf -lInfTk -lInfCmnUtil -lInfCmnExc -lInfUtil_Dbg -lInfUtil_Tim -lInfUtilGran -lDmApi -lDmCoreDb -lDmImInventory -lDmCoreLibrary -lDmImInventoryAdaptation -lDmMgmt -lDmSmStorage -lDmSmStorageAdaptation -lProCmnMath -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lpppack -lADLPacker -lADLHDF -lADLUtil -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -llog4cplus -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lhdf5 -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lboost_filesystem -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lxerces-c -L/usr/lib -lm -lc -lrt -lpthread -lgfortran -lz -L/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib -lProCmnValidateDictionarySize -lProCmnDictionary -lProCmnDictEntries -lProCmnAncDictEntries -lAdlScienceDictEntries -lProCmnMode -lProSdrCmnGeo -lProCmnGeoloc -lnovasc -lProCmnMath -lProGipViirsTileInterfaces -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lboost_regex
/usr/bin/ld: skipping incompatible /usr/lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/librt.a when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/librt.a when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.so when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.a when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.so when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.a when searching for -lpthread
/usr/bin/ld: cannot find -lProCmnAncDictEntries
collect2: ld returned 1 exit status
make: *** [AdlCmnValidateScienceDictionarySize.exe] Error 1

--------------------------------------------------------------
cd /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/TEMPLATE/src/;make library

/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/COTS/java/jdk1.7.0/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/DMS/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/exceptions/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/cfg/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/time/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/tk/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/ING/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -c AutoDerivedTemplateDictionaryEntries.cpp -o AutoDerivedTemplateDictionaryEntries.o
/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include/InfTk_Defs.h:120: warning: ‘INFTK_TASK_SHUTDOWNSTR’ defined but not used
/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/COTS/java/jdk1.7.0/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/DMS/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/exceptions/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/cfg/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/time/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/tk/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/ING/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -c AdlCmnTemplateProductDictionary.cpp -o AdlCmnTemplateProductDictionary.o
/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include/InfTk_Defs.h:120: warning: ‘INFTK_TASK_SHUTDOWNSTR’ defined but not used
Creating Shared Library ...
/bin/rm -f /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlTemplateDictEntries.so
/usr/bin/g++ -mfull-toc -m64 -Xlinker -zmuldefs -lc -shared -L/usr/lib -L/lib -o /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlTemplateDictEntries.so AutoDerivedTemplateDictionaryEntries.o AdlCmnTemplateProductDictionary.o
/usr/bin/ld: skipping incompatible /usr/lib/libm.so when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib/libm.a when searching for -lm
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
/usr/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
------------------------------------------
Creating Static Library ...
/bin/rm -f /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlTemplateDictEntries.a
/usr/bin/ar -ru /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlTemplateDictEntries.a AutoDerivedTemplateDictionaryEntries.o AdlCmnTemplateProductDictionary.o
/usr/bin/ar: creating /data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib/libAdlTemplateDictEntries.a
/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/Dictionary/Entries/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/COTS/java/jdk1.7.0/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/DMS/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/exceptions/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/common/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/cfg/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/util/time/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/tk/util/include -I/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/ING/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -I/data/data020/weizhong/ADL4.1/CSPP/common/local/include -c AdlCmnValidateTemplateDictionarySize.cpp -o AdlCmnValidateTemplateDictionarySize.o
/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/CMN/Utilities/INF/include/InfTk_Defs.h:120: warning: ‘INFTK_TASK_SHUTDOWNSTR’ defined but not used
------------------------------------------
Creating size validation script
/usr/bin/g++ -DBYTE_ORDER_LE -DADL_ENV -D_USE_FLAT_FILE_ -D_THREAD_SAFE -DGCC -m64 -fPIC -Wall -Wno-unknown-pragmas -DEXCLUDE_CRIS -DUSE_UNDERSCORE -O3 -m64 -Xlinker -zmuldefs -L/usr/lib -L/lib -o AdlCmnValidateTemplateDictionarySize.exe AdlCmnValidateTemplateDictionarySize.o -L/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib -lProCmnIPO -lProCmnUtil -lProCmnDictionary -lProCmnDictEntries -lProCmnStrings -lInfUtil_Cfg -lInfUtil_Perf -lInfTk -lInfCmnUtil -lInfCmnExc -lInfUtil_Dbg -lInfUtil_Tim -lInfUtilGran -lDmApi -lDmCoreDb -lDmImInventory -lDmCoreLibrary -lDmImInventoryAdaptation -lDmMgmt -lDmSmStorage -lDmSmStorageAdaptation -lProCmnMath -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lpppack -lADLPacker -lADLHDF -lADLUtil -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -llog4cplus -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lhdf5 -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lboost_filesystem -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lxerces-c -L/usr/lib -lm -lc -lrt -lpthread -lgfortran -lz -L/data/data020/weizhong/ADL4.1/CSPP/ADL4.1_Mx6.7_VCM/lib -lProCmnValidateDictionarySize -lProCmnDictionary -lProCmnDictEntries -lProCmnAncDictEntries -lAdlTemplateDictEntries -lProCmnMode -lProSdrCmnGeo -lProCmnGeoloc -lnovasc -lProCmnMath -lProGipViirsTileInterfaces -L/data/data020/weizhong/ADL4.1/CSPP/common/local/lib -lboost_regex
/usr/bin/ld: skipping incompatible /usr/lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/librt.a when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/librt.so when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/librt.a when searching for -lrt
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.so when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.a when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.so when searching for -lpthread
/usr/bin/ld: skipping incompatible /usr/lib/libpthread.a when searching for -lpthread
/usr/bin/ld: cannot find -lProCmnAncDictEntries
collect2: ld returned 1 exit status
make: *** [AdlCmnValidateTemplateDictionarySize.exe] Error 1
kbisanz

Post by kbisanz »

The parallel option will affect all algorithms. Basically, all the VIIRS SDRs will be run in parallel, with a maximum of THREAD_COUNT processes at once. The chain runner will then wait until all SDRs are finished. After all the SDRs complete, it will run the next "level" of algorithms, which is the masks controller (VIIRS cloud mask and active fires). The chain runner then runs all masks controllers with a maximum of THREAD_COUNT processes. Then it proceeds to the next level and again runs up to THREAD_COUNT processes, and so on. In your case, it stops after cloud mask. You can see this parallel process more easily in the GUI chain runner.
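The level-by-level behaviour described above can be modelled in a few lines. This is a toy illustration only, not the chain runner's actual code: it uses threads where the chain runner launches processes, and run_levels, run_one, and the level names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_levels(levels, granules, run_one, thread_count=2):
    """Run each level's (algorithm, granule) tasks with at most
    thread_count workers, finishing a level before starting the next.

    levels  : list of lists of algorithm names, in dependency order
    run_one : callable(algorithm, granule) that does the work
    Returns one list of results per level, in task order.
    """
    all_results = []
    for level in levels:
        tasks = [(alg, gran) for alg in level for gran in granules]
        with ThreadPoolExecutor(max_workers=thread_count) as pool:
            # list() consumes every result, so nothing from the next
            # level can start until this level has fully completed.
            results = list(pool.map(lambda task: run_one(*task), tasks))
        all_results.append(results)
    return all_results


if __name__ == "__main__":
    def fake_algorithm(alg, gran):
        return "%s done for %s" % (alg, gran)

    levels = [["VIIRS-SDR"], ["Masks-Controller"]]
    for level_results in run_levels(levels, ["g1", "g2", "g3"],
                                    fake_algorithm, thread_count=2):
        print(level_results)
```

The barrier between levels is why raising THREAD_COUNT helps most when a level contains many granules: a level with only one task leaves the other workers idle.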

You can change the HDF5_Unpack_Area in $ADL_HOME/cfg/DDSADL_CFG.xml. Change the value of DMS_PATH. Sorry I forgot about that one. Environment variables are obviously ok to use.

Regarding your compile error, I think you've stumbled across a race condition we've seen every so often. There is not a patch for it (though there probably should be). The fix is to edit build_adl.pl. Find where "my @libDictEntriesList " is being used to declare @libDictEntriesList. This is probably around line 650. Move the line "cd $ENV{ADL_HOME}/CMN/Utilities/Dictionary/Entries/ANC/src/;" from there and into the block above where "my @libPreDictEntriesList " is used to declare @libPreDictEntriesList. After your change, @libPreDictEntriesList will contain lines for ANC and VIIRS-Verified-RDR. @libDictEntriesList will contain lines for VIIRS, CrIMSS, OMPS-NP, SCIENCE, and TEMPLATE.

Make the above change and try to execute "$ADL_HOME/script/build_adl.pl -clean -makefiles -src -library -program -log build.log" again.
Kevin Bisanz
Raytheon Company

Post by wzchen »

OK. I tried the SurfaceAlbedo algorithm. I set Thread Count to 3 and picked 3 granules to run. In the GUI chain runner, I could see that all algorithms had been sorted based on the "Threads count" and "algorithm level", and then they ran in that order. Everything was just as you described.
When I gave the "-m" option to the command line chain runner, I also had to provide one granule ID as its input. (Do I have any other option here?) However, it only ran the SDRs and the first 3 masks in parallel for the input granule. Also, it seems that it can only run one granule at a time, even when I included more granules in its input. Meanwhile, the chain runner didn't sort those algorithms. All other algorithms still ran one by one. Did I do something wrong?
Thanks.
Last edited by wzchen on Thu May 30, 2013 9:02 am, edited 2 times in total.

Post by kbisanz »

Unfortunately, I believe you're using the command line chain runner correctly. Currently you can specify only 1 granule ID, which in this case is a limitation. You could work around this by calling the command line chain runner in some type of loop, either a shell loop or a loop in a script. However, this is not an ideal solution, because each invocation of the chain runner (both GUI and command line) needs to read the input data and analyze it, which adds a time penalty.
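A minimal sketch of the "loop in a script" workaround might look like the following. The chain runner's real executable name and arguments are not given here, so build_command below returns a placeholder command and must be replaced with the actual invocation; run_each_granule and build_command are hypothetical names used only for illustration.

```python
import subprocess


def run_each_granule(granule_ids, build_command, runner=subprocess.run):
    """Invoke the chain runner once per granule ID, one after another.

    Returns the granule IDs whose invocation exited with a non-zero status.
    """
    failed = []
    for granule_id in granule_ids:
        cmd = build_command(granule_id)  # argument list for one invocation
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(granule_id)
    return failed


def build_command(granule_id):
    # Placeholder only: substitute the real chain runner executable and
    # its arguments here. Nothing in this list is an actual ADL option.
    return ["echo", "chain-runner-placeholder", granule_id]


if __name__ == "__main__":
    # Hypothetical granule IDs, for illustration only.
    print(run_each_granule(["granule_A", "granule_B"], build_command))
```

Each pass through the loop is a separate chain runner start, so the input-reading and analysis cost mentioned above is paid once per granule.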

If you feel this should be fixed, you'll need to work it through Paul Meade, who can try to get it worked through the sustainment DRAT process. I have sent Paul Meade an email alerting him to this forum post. However, if you want it patched, you should also contact Paul.

Also, I don't quite understand why the GUI chain runner does not fit into your workflow.

Did you achieve any speed up from compiler optimization?
Kevin Bisanz
Raytheon Company

Post by wzchen »

Hi Kevin,

Thanks for your suggestion. I will contact Paul about this issue.

Yes, that's exactly what I am doing right now: running the command line chain runner in a loop. It works well so far. However, as we discussed, I can only give it one granule ID each time.

The problem with using the GUI chain runner is really a network issue. I have to use PuTTY to connect to our server from my laptop. If I have to take my laptop somewhere, I have to disconnect it from the network, which means all the jobs are stopped too. Also, I don't know whether the GUI chain runner can be moved into the background. Please let me know if you have a solution for this. With the command line chain runner, however, I can simply use "nohup .... &". Everything then runs in the background, and I can safely shut down my laptop.

Yes, ADL runs much faster with compiler optimization. After all the optimization steps, the cloud mask algorithm now takes less than 6 minutes. That's amazing! It used to take about half an hour. Many thanks.
Last edited by wzchen on Thu May 30, 2013 9:04 am, edited 2 times in total.

Post by kbisanz »

Unfortunately, I don't have an easy solution for connecting to and disconnecting from a GUI application. If you are able to install software on your server, a piece of software named "Xpra" sounds interesting:
http://en.wikipedia.org/wiki/Xpra
http://xpra.org/

I have not used Xpra, so I don't know if it works or not.

If you need to disconnect from and reconnect to terminal sessions, you should investigate GNU screen if you're not already using it:
http://www.debian-administration.org/articles/34
http://linuxgazette.net/147/appaiah.html
Kevin Bisanz
Raytheon Company