Running Very Large Watershed VELMA Simulations in Piecewise Fashion

THIS IS A DRAFT DOCUMENT: ITS CONTENTS MAY CHANGE

Overview

Large watersheds that require substantial time to run under JVelma or the VelmaSimulatorCmdLine can be configured to run under VelmaParallelCmdLine. However, some watersheds are so large that they take too long to run even with VelmaParallelCmdLine.

When enough computing hardware is available, VelmaParallelCmdLine's multiple-map option can be used to run separate pieces of the overall watershed as independent VelmaParallelCmdLine processes. Ideally, all of these except the last (the final reach of the overall watershed) can be run independently and simultaneously. The last piece is run with the VelmaParallelCmdLine --relocationFile option, which provides the data necessary to incorporate all of the other, already completed results into the final, overall watershed result.

For this piecewise technique to work and reduce running time, you will need enough hardware to run multiple VelmaParallelCmdLine processes independently. All of that hardware must be able to read and write to a shared filesystem, or the results of the independent pieces must be placed in a location that the final piece can access when it is run.

Basic Piecewise Technique

Running a VELMA watershed in piecewise VelmaParallel fashion is a specific use-case of VelmaParallelCmdLine's multiple-map capability.

Preparing Piecewise Configurations

1. Start with a VELMA .xml file that provides a valid multiple-reach delineation of the full watershed, i.e., the .xml file you would run in a single VelmaParallelCmdLine process to simulate the full watershed (we'll refer to this as the "full .xml").
2. Determine the full .xml's iOutlet location (the icell value corresponding to the full .xml's outx and outy parameter values).
3. Find the subset of reach outlets in the full .xml's initialReachOutlets parameter value that are adjacent contributors to the final outlet's reach and are not themselves independent reaches (i.e., they have "upstream" contributor reaches of their own).
4. Create a new "piecewise .xml" file for the final reach by copying the full .xml. In the new file:
   a. Change the run_index parameter value and the file name. A handy naming scheme: prefix i<outlet#>_ to the original name, where <outlet#> is the iOutlet of the final reach.
   b. Change the initialReachOutlets parameter value to the final reach's own iOutlet value plus the outlets of its directly adjacent contributor reaches only.
5. Create a new "piecewise .xml" file for each of the non-independent reaches identified in step 3. As in step 4, copy the full .xml. In each of these non-final piecewise .xml files:
   a. Change the run_index parameter value and the file name to something unique for the reach.
   b. Change the outx and outy parameter values to the x and y coordinates corresponding to this reach's iOutlet value.
   c. Change the initialReachOutlets parameter value to this reach's own iOutlet value plus all the reach outlets that this reach contains (i.e., all outlets upstream of this reach that flow to it directly or indirectly).
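Steps 3 through 5 amount to a small graph computation over the outlets' flow relations. The Python sketch below illustrates that logic, assuming you already know which outlets flow directly into each outlet (the "depends_on" data reported by the PdemOutletsInfo utility described later in this document). The function names are hypothetical and not part of VELMA; in practice PdemOutletsInfo does this work for you.

```python
from typing import Dict, List, Set

# Hypothetical helpers that only illustrate the logic of steps 3-5.
DependsOn = Dict[int, List[int]]  # outlet -> outlets that flow directly into it


def contains(outlet: int, depends_on: DependsOn) -> Set[int]:
    """All outlets upstream of `outlet`, i.e. its direct and indirect contributors."""
    found: Set[int] = set()
    stack = list(depends_on.get(outlet, []))
    while stack:
        upstream = stack.pop()
        if upstream not in found:
            found.add(upstream)
            stack.extend(depends_on.get(upstream, []))
    return found


def plan_pieces(final_outlet: int, depends_on: DependsOn) -> Dict[int, List[int]]:
    """Map each piece's iOutlet to its initialReachOutlets list."""
    adjacent = depends_on.get(final_outlet, [])
    # Step 4: the final piece keeps its own outlet plus its adjacent contributors only.
    pieces = {final_outlet: [final_outlet] + list(adjacent)}
    for outlet in adjacent:
        upstream = contains(outlet, depends_on)
        if upstream:  # step 3: non-independent, so it becomes a separate piece
            # Step 5: its own outlet plus every outlet it contains.
            pieces[outlet] = [outlet] + sorted(upstream)
    return pieces


# depends_on relations from the small PdemOutletsInfo example (ws109).
ws109 = {1735: [2434, 2263], 2263: [3236], 3236: [3590], 2434: [], 3590: []}
print(plan_pieces(1735, ws109))
```

For the ws109 relations this gives a final piece with initialReachOutlets 1735 2434 2263 and one separate piece with 2263 3236 3590. Outlet 2434 has no upstream outlets, so it is simulated inside the final piece. The order of outlets within each list is arbitrary in this sketch.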

Running the Independent Piecewise Configurations

After configuration, all but the final piecewise .xml can be run simultaneously, provided there are enough hardware resources to run them. Launch each non-final piece as a completely independent VelmaParallelCmdLine simulation run; because of the preparatory steps, that is exactly what it is.

Running the Final Piecewise Configuration

The final piecewise .xml can only be run after all the other pieces have successfully completed. When it runs, it needs to know where to find the already-computed results from the other pieces. This information is provided in the relocation file passed to the final-piece VelmaParallelCmdLine run with the --relocationFile option.

The relocation file's format is rows of comma-separated values (.csv data). There should be one row for each independent piecewise .xml that you ran. The iReach and iUseAs field values refer to the destination (i.e., the final piece). The iSource and sourceLocationData field values refer to the source (i.e., an upstream contributor piece). Each row must contain these four values:

iReach, iUseAs,  iSource, sourceLocationData
|       |        |        |
|       |        |        Fully-qualified path to a source sub-map simulation results directory.
|       |        |        (Specifies the location of the source sub-map simulation run results.)
|       |        |
|       |        The iOutlet of a contributor reach 
|       |
|       The iOutlet of a contributor reach.
|       (Each row's iUseAs value should be the same as its iSource value).
|
The iOutlet of the final reach.

(Do not include whitespace after commas in your actual file.)

The sourceLocationData value on each row must be the fully-qualified path to the VELMA results of the independent contributor iOutlet (iUseAs, iSource) on that row. The path/location specified must be accessible and readable by the final-piece VelmaParallelCmdLine run.
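A malformed relocation file is easiest to catch before the final piece is launched. The Python sketch below checks relocation-file rows against the rules above: four fields, no whitespace around the commas, integer iReach, iUseAs and iSource values, iUseAs equal to iSource, a single final iReach, and (optionally) a sourceLocationData directory that exists as seen from the machine running the check. The function name is hypothetical, and passing these checks does not guarantee that VELMA will accept the file.

```python
import os
from typing import Iterable, List


def check_relocation_rows(lines: Iterable[str], check_paths: bool = True) -> List[str]:
    """Return a list of problems found in relocation-file rows (empty if none).

    Hypothetical helper: it mirrors the documented rules, not VELMA's own parser.
    """
    problems: List[str] = []
    final_reaches = set()
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(",")
        if len(fields) != 4:
            problems.append(f"row {n}: expected 4 fields, found {len(fields)}")
            continue
        if any(f != f.strip() for f in fields):
            problems.append(f"row {n}: whitespace next to a comma")
        i_reach, i_use_as, i_source, location = (f.strip() for f in fields)
        if not all(v.isdigit() for v in (i_reach, i_use_as, i_source)):
            problems.append(f"row {n}: iReach, iUseAs and iSource must be integers")
        if i_use_as != i_source:
            problems.append(f"row {n}: iUseAs {i_use_as} differs from iSource {i_source}")
        final_reaches.add(i_reach)
        if check_paths and not os.path.isdir(location):
            problems.append(f"row {n}: results directory not found: {location}")
    if len(final_reaches) > 1:
        problems.append(f"more than one iReach value: {sorted(final_reaches)}")
    return problems


# Example row from this document; path checking is off because this Windows
# path will not exist on most machines.
row = r"1735,2336,2336,C:\Users\Me\VelmaResults\MULTI_i2336_MD_ws109_10m_mulD\Reach_2336"
print(check_relocation_rows([row], check_paths=False))  # []
```

To check a real file, pass the open file object, e.g. `check_relocation_rows(open("relocation.csv"))`, and run it on the machine that will run the final piece so the path checks are meaningful.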

For example, suppose one of your independent contributor reach piecewise runs had iOutlet = 2336 and run_index value i2336_MD_ws109_10m_mulD, and the i2336 results were written to C:\Users\Me\VelmaResults\MULTI_i2336_MD_ws109_10m_mulD. Also suppose the final-piece reach's iOutlet = 1735. The row for the i2336 piecewise results in the relocation file for running the final i1735 piece would look like this:

1735,2336,2336,C:\Users\Me\VelmaResults\MULTI_i2336_MD_ws109_10m_mulD\Reach_2336

VelmaParallelCmdLine prefixes MULTI_ to the run_index value to form the name of the folder it writes simulation results to. The per-reach results of a VelmaParallelCmdLine run are written to subdirectories of that MULTI_-prefixed root folder. That is why the full path specified for sourceLocationData includes the MULTI_-prefixed folder and ends with Reach_2336.
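Given these naming conventions, relocation rows can be generated rather than typed by hand. The Python sketch below assumes every contributor piece's results live under one common results root as MULTI_<run_index>\Reach_<iOutlet>, matching the example above; the function name is hypothetical. Passing a `PureWindowsPath` or `PurePosixPath` root chooses the path separator for the machine that will run the final piece, regardless of where the script itself runs.

```python
import csv
import io
from pathlib import PurePath, PureWindowsPath
from typing import Dict


def relocation_csv(final_outlet: int, results_root: PurePath, contributors: Dict[int, str]) -> str:
    """Return relocation-file text with one row per contributor piece.

    contributors maps each contributor iOutlet to the run_index of its piece.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # writes no spaces after commas
    for outlet, run_index in contributors.items():
        # Results folder is MULTI_<run_index>, with a Reach_<iOutlet> subfolder per reach.
        location = results_root / f"MULTI_{run_index}" / f"Reach_{outlet}"
        # iReach is the final outlet; iUseAs and iSource are both the contributor.
        writer.writerow([final_outlet, outlet, outlet, str(location)])
    return buf.getvalue()


text = relocation_csv(1735, PureWindowsPath(r"C:\Users\Me\VelmaResults"),
                      {2336: "i2336_MD_ws109_10m_mulD"})
print(text, end="")
```

This reproduces the example row above. To save the result, write `text` to a file opened with `newline=""` so the line endings are not translated a second time.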

The PdemOutletsInfo Utility

It can be difficult to determine manually which of a base .xml's initialReachOutlets are direct, non-independent contributors to the final reach's outlet. Once those are found, it can also be difficult to determine, for each of those non-independent contributor outlets, the disjoint set of the base .xml's initialReachOutlets that it contains (i.e., that are upstream of it and flow into it). The PdemOutletsInfo utility performs both of these tasks automatically.

The PdemOutletsInfo utility "lives" in the JPDEM.jar file.

Here is an example run that uses it to report "depends_on" and "contains" data for every outlet in a small watershed:

In the base .xml:

 outx = 15
 outy = 20
 input_dem = ./m_1_DEM/elevation_10m_ws109_7-9-2014_nearest_aoi_std_flatProc.asc
 initialReachOutlets = 2434 3236 3590 1735 2263

The command line to run PdemOutletsInfo from Microsoft PowerShell for our data looks like this:

 PS C:\Users\me\Velma> 
 java -Xmx2g -cp '.\Velma_Jars\JPDEM.jar' gov.epa.jpdem.PdemOutletsInfo `
 C:\Users\me\Velma\MD_wS109\DataInputs\m_1_DEM\elevation_10m_ws109.asc `
 2434 3236 3590 1735 2263

NOTE: The multiple lines above are actually a single command line. A backtick at the end of a line is PowerShell's line-continuation character: it escapes the line break so that the lines are evaluated as a single command.

The output of the above command line looks like this:

INFO 2025-03-06 13:51:40 Outlets 1735 2263 2434 3236 3590
START 2025-03-06 13:51:40 LOAD file=C:\Users\me\Velma\MD_wS109\DataInputs\m_1_DEM\elevation_10m_ws109.asc
INFO 2025-03-06 13:51:40 Set computational domain from default border definition.
DONE 2025-03-06 13:51:40 LOAD File loaded with default border.
START 2025-03-06 13:51:40 FDIR Generating flow-direction data for map.
DONE 2025-03-06 13:51:40 FDIR Generating flow-direction data for map.
START 2025-03-06 13:51:40 DELN Generating relations data for 5 outlets.
INFO 2025-03-06 13:51:40 i=1735 x=15 y=20 depends_on 2 outlets 2434 2263
INFO 2025-03-06 13:51:40 i=1735 x=15 y=20 contains 4 outlets 2434 2263 3236 3590
INFO 2025-03-06 13:51:40 i=2263 x=27 y=26 depends_on 1 outlets 3236
INFO 2025-03-06 13:51:40 i=2263 x=27 y=26 contains 2 outlets 3236 3590
INFO 2025-03-06 13:51:40 i=2434 x=26 y=28 depends_on 0 outlets
INFO 2025-03-06 13:51:40 i=2434 x=26 y=28 contains 0 outlets
INFO 2025-03-06 13:51:40 i=3236 x=54 y=37 depends_on 1 outlets 3590
INFO 2025-03-06 13:51:40 i=3236 x=54 y=37 contains 1 outlets 3590
INFO 2025-03-06 13:51:40 i=3590 x=64 y=41 depends_on 0 outlets
INFO 2025-03-06 13:51:40 i=3590 x=64 y=41 contains 0 outlets
DONE 2025-03-06 13:51:40 DELN Generating relations data for 5 outlets.
PS C:\Users\me\Velma_GitHub>

PdemOutletsInfo reports "depends_on" and "contains" information for each outlet listed on the command line. The "depends_on" line lists the directly adjacent outlets that flow into the specified "i, x, y" outlet. The "contains" line lists all the outlets that flow into the specified "i, x, y" outlet, directly or indirectly. For example, outlet 1735 depends_on 2434 and 2263, but contains 2434, 2263, 3236, and 3590.

In the example above, the number of outlets is small, and it is possible to determine the final and contributor simulation configurations by examining the complete list of paired outlet-INFO lines.

When the number of outlets is large, the list of outlet-INFO lines can be difficult to review manually. In such cases, redirect the output of PdemOutletsInfo to a file. That file can then be searched for specific lines with a text editor or with command-line search tools (e.g., grep and sed in Linux, Select-String in Microsoft PowerShell, or Python in either).
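As one example of the Python route, the sketch below reads saved PdemOutletsInfo output, rebuilds the depends_on and contains lists for each outlet, and then picks out the final outlet's non-independent adjacent contributors (step 3 of Preparing Piecewise Configurations). The regular expression is written against the INFO lines shown above and is an assumption about the output format; the function names are hypothetical. If the output was redirected with Windows PowerShell's > operator, the file may be UTF-16 encoded, in which case open it with encoding="utf-16".

```python
import re
from typing import Dict, Iterable, List

# Matches lines such as:
# INFO 2025-03-06 13:51:40 i=1735 x=15 y=20 depends_on 2 outlets 2434 2263
INFO_RE = re.compile(
    r"^INFO \S+ \S+ i=(\d+) x=\d+ y=\d+ (depends_on|contains) \d+ outlets(.*)$"
)
Relations = Dict[int, Dict[str, List[int]]]


def parse_relations(lines: Iterable[str]) -> Relations:
    """Return {outlet: {"depends_on": [...], "contains": [...]}} from INFO lines."""
    relations: Relations = {}
    for line in lines:
        m = INFO_RE.match(line.strip())
        if m:
            outlet, kind, rest = int(m.group(1)), m.group(2), m.group(3)
            relations.setdefault(outlet, {})[kind] = [int(t) for t in rest.split()]
    return relations


def non_independent_contributors(final_outlet: int, relations: Relations) -> List[int]:
    """Adjacent contributors of final_outlet whose "contains" list is non-empty."""
    return [o for o in relations[final_outlet]["depends_on"]
            if relations.get(o, {}).get("contains")]


sample = """\
INFO 2025-03-06 13:51:40 i=1735 x=15 y=20 depends_on 2 outlets 2434 2263
INFO 2025-03-06 13:51:40 i=2263 x=27 y=26 contains 2 outlets 3236 3590
INFO 2025-03-06 13:51:40 i=2434 x=26 y=28 contains 0 outlets
"""
rels = parse_relations(sample.splitlines())
print(non_independent_contributors(1735, rels))  # [2263]
```

With real output, pass the open file instead of `sample.splitlines()`; lines that do not match the pattern (START, DONE, and so on) are ignored.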

When the number of outlets is very large and the final outlet has many directly adjacent contributors, editor- or tool-assisted searches may still be cumbersome. In this case, PdemOutletsInfo's optional --focus= and --focusOnly arguments may help.

Use the --focus= parameter to specify which one of the outlets provided on the command line PdemOutletsInfo should "focus" on. Use the --focusOnly parameter to limit PdemOutletsInfo output to only focus-related information.

PdemOutletsInfo computes additional information for the outlet specified by --focus= and outputs it in lines prefixed with FOCUS. The FOCUS lines report each non-independent outlet (one whose "contains" line has more than 0 outlets) directly adjacent to the focus-selected outlet, and provide the outx, outy, and initialReachOutlets parameter values required to run that adjacent outlet in piecewise fashion (i.e., the information required by step 5 of Preparing Piecewise Configurations).

Here is an example that uses --focus= and --focusOnly to find the piecewise configuration data for an overall watershed with a large number (300) of reach outlets. (In this example, the outlet lists have been elided with "[ . . . ]" to keep things readable; the actual output lists every outlet.)

PS C:\Users\me\Velma> 
java -Xmx2g -cp '.\Velma_Jars\JPDEM.jar' gov.epa.jpdem.PdemOutletsInfo `
C:\Users\me\Velma\WA_Skagit_30m\DataInputs\m_1_DEM\Skagit_DredgeMask_EEX.asc `
21560497 20947552 23291836 23146966 20104072 20033947 20288737 20921419 18566893 19684173 [ . . . ] `
--focus=17119788 `
--focusOnly `
>>
INFO 2025-03-06 15:09:05 300 outlet args 17119788 17224567 17264485 17324359 17434128 [ . . . ]
START 2025-03-06 15:09:05 LOAD file=C:\Users\me\Velma\WA_Skagit_30m\DataInputs\m_1_DEM\Skagit_DredgeMask_EEX.asc
INFO 2025-03-06 15:09:16 Set computational domain from default border definition.
DONE 2025-03-06 15:09:16 LOAD File loaded with default border.
START 2025-03-06 15:09:16 FDIR Generating flow-direction data for map.
DONE 2025-03-06 15:09:18 FDIR Generating flow-direction data for map.
START 2025-03-06 15:09:18 DELN Generating relations data for 300 outlets.
DONE 2025-03-06 15:09:20 DELN Generating relations data for 300 outlets.
FOCUS 2025-03-06 15:09:20 i17119788 Focused_Outlet depends_on 24 outlets delineates 83318 cells Xml_Params: outx=2529 outy=3431 initialReachOutlets=17119788 17224567 17264485 [ . . . ]
FOCUS 2025-03-06 15:09:20 i17119788 Adjacent_Contributor: i18731290 contains 140 outlets delineates 986759 cells Xml_Params: outx=2584 outy=3754 initialReachOutlets=18731290 18736285 18831091 [ . . . ]
FOCUS 2025-03-06 15:09:20 i17119788 Adjacent_Contributor: i18736278 contains 135 outlets delineates 994274 cells Xml_Params: outx=2583 outy=3755 initialReachOutlets=18736278 18771200 18821091 [ . . . ]

The information provided by the "Focused_Outlet" and "Adjacent_Contributor" lines above permits a decomposition of the original .xml into three pieces.
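When there are many adjacent contributors, the FOCUS lines can also be turned into a checklist of the edits each piecewise .xml needs. The Python sketch below parses FOCUS lines in the format shown above (an assumption based on this example, not a documented format) and prints a suggested file name with the outx and outy values for each piece. The i<outlet#>_ file-name prefix follows the naming scheme suggested in Preparing Piecewise Configurations; the function name is hypothetical.

```python
import re
from typing import Dict, Iterable, List, Union

FOCUS_RE = re.compile(
    r"^FOCUS \S+ \S+ i(?P<focus>\d+) "
    r"(?:Focused_Outlet|Adjacent_Contributor: i(?P<contrib>\d+)) .*?"
    r"Xml_Params: outx=(?P<outx>\d+) outy=(?P<outy>\d+) "
    r"initialReachOutlets=(?P<outlets>[\d ]+)$"
)


def parse_focus(lines: Iterable[str]) -> List[Dict[str, Union[int, bool, str]]]:
    """One dict per FOCUS line: the piece's outlet, whether it is the focused
    (final) outlet, and the outx/outy/initialReachOutlets values for its .xml."""
    pieces = []
    for line in lines:
        m = FOCUS_RE.match(line.strip())
        if m:
            contrib = m.group("contrib")
            pieces.append({
                "outlet": int(contrib or m.group("focus")),
                "final": contrib is None,
                "outx": int(m.group("outx")),
                "outy": int(m.group("outy")),
                "initialReachOutlets": m.group("outlets").strip(),
            })
    return pieces


# FOCUS lines from the Skagit example, with the outlet lists cut short.
sample = [
    "FOCUS 2025-03-06 15:09:20 i17119788 Focused_Outlet depends_on 24 outlets "
    "delineates 83318 cells Xml_Params: outx=2529 outy=3431 "
    "initialReachOutlets=17119788 17224567 17264485",
    "FOCUS 2025-03-06 15:09:20 i17119788 Adjacent_Contributor: i18731290 contains 140 outlets "
    "delineates 986759 cells Xml_Params: outx=2584 outy=3754 "
    "initialReachOutlets=18731290 18736285 18831091",
]
for p in parse_focus(sample):
    kind = "final" if p["final"] else "independent"
    print(f"i{p['outlet']}_WA_Skagit_30m_mulD.xml ({kind}): outx={p['outx']} outy={p['outy']}")
```

The "initialReachOutlets" entry holds the space-separated value exactly as printed, ready to paste into the corresponding .xml.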

Following the steps for preparing piecewise configurations:

1. The "full .xml" for the above example is "WA_Skagit_30m_mulD.xml". We take the input_dem (DEM .asc map file) and initialReachOutlets values from the full .xml to use as arguments for PdemOutletsInfo.
2. The full .xml file also provides the final/overall watershed outlet's outx and outy values. We want PdemOutletsInfo to --focus= on this final outlet, but we need its i-value. To determine the i-value, we can use JVelma or JPDEM.
   - In JVelma, load the full .xml file and look at the field labelled "OutletI" in the "Run Parameters" tab pane.
   - In JPDEM, load the map file specified by the full .xml's input_dem parameter, then enter the outx and outy values into JPDEM's "X=" and "Y=" fields. The "I=" field will then display the corresponding i-value.
3. Running PdemOutletsInfo as shown in the example above generates all the information required by this step.
4. Using the information generated for step 3, create the final piecewise .xml file by copying and renaming the full .xml and making these edits:
   - Filename: i17119788_WA_Skagit_30m_mulD.xml
   - Change the run_index parameter value to something unique (e.g., prefixed with i17119788_).
   - The outx and outy parameters remain unchanged.
   - Set initialReachOutlets to the "initialReachOutlets=" value from the "i17119788 Focused_Outlet" output line, which already begins with 17119788.
5. Using the information generated for step 3, create the two independent piecewise .xml files by copying and renaming the full .xml once for each "i17119788 Adjacent_Contributor" output line. In each copy:
   - Give the .xml file a unique name (e.g., i18731290_WA_Skagit_30m_mulD.xml) and a unique run_index value.
   - Set the outx and outy parameter values from the Adjacent_Contributor line's "outx=" and "outy=" values.
   - Set initialReachOutlets to the "initialReachOutlets=" value from that Adjacent_Contributor line, which already begins with the contributor's own iOutlet (e.g., 18731290).
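Step 2 relies on JVelma or JPDEM to translate outx and outy into an i-value. Every i/x/y triple that PdemOutletsInfo prints in this document (for example i=1735 x=15 y=20, and i17119788 with outx=2529 outy=3431) is consistent with row-major indexing, i = outy * ncols + outx, where ncols is the DEM's column count. The column counts used below (86 for ws109, 4989 for Skagit) are themselves inferred from those numbers, not read from the DEM files. The Python sketch encodes that inferred relationship; it is an assumption, not documented VELMA behavior, so confirm its result against JVelma's OutletI field or JPDEM's I= field before relying on it. The function names are hypothetical.

```python
import io
from typing import TextIO


def read_ncols(asc: TextIO) -> int:
    """Read the ncols value from an ESRI ASCII grid (.asc) header."""
    for _ in range(6):  # the header has at most six "keyword value" lines
        parts = asc.readline().split()
        if len(parts) == 2 and parts[0].lower() == "ncols":
            return int(parts[1])
    raise ValueError("ncols not found in .asc header")


def outlet_i(outx: int, outy: int, ncols: int) -> int:
    """ASSUMED row-major index i = outy * ncols + outx (inferred, not documented)."""
    return outy * ncols + outx


# Checks against i/x/y triples printed by PdemOutletsInfo in this document,
# using column counts inferred from those same numbers.
print(outlet_i(15, 20, 86))        # 1735 (ws109 example)
print(outlet_i(2529, 3431, 4989))  # 17119788 (Skagit example)
header = io.StringIO("ncols 86\nnrows 60\n")  # hypothetical two-line header
print(read_ncols(header))          # 86
```

With a real DEM, open the file named by input_dem and call `outlet_i(outx, outy, read_ncols(f))`.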

Upon completing the above, we have three copies of the original full .xml file:

i17119788_WA_Skagit_30m_mulD.xml (dependent final configuration)
i18731290_WA_Skagit_30m_mulD.xml (independent configuration)
i18736278_WA_Skagit_30m_mulD.xml (independent configuration)

The two independent configurations can be started and run immediately. The final configuration cannot be started until the independent configurations have successfully completed, and it must be run with a properly formatted relocation file passed via the --relocationFile VelmaParallelCmdLine argument.
