Observed Runoff Data May Include NaN Values
Overview
Unlike other input data, the observed runoff values presented to VELMA may contain NaN values. The code within VELMA that computes the Nash-Sutcliffe Efficiency ("NSE") and Root Mean Square Error ("RMSE") statistics for runoff treats NaN as a "no data" value. Any (simulated, observed) data-pair where one or both values equal NaN is "rejected" (i.e. ignored) by the NSE and RMSE calculators.
Like other driver data files, any observed runoff file presented to a VELMA simulation must have a data value for each Julian day between the simulation's forcing_start and forcing_end parameter year values, inclusive.
Example
Suppose a simulation configuration has forcing_start=1991 and forcing_end=1993.
Driver data files for that simulation must contain (1993 - 1991 + 1) * 365 = 1095 values.
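The arithmetic in this example can be sketched in Python. The function name expected_value_count is hypothetical (it is not part of VELMA), and the default of 365 values per year is an assumption taken directly from the example above.

```python
def expected_value_count(forcing_start: int, forcing_end: int,
                         days_per_year: int = 365) -> int:
    """Number of daily values needed for the inclusive year span.

    Assumption: every year contributes days_per_year values, as in the
    1991-1993 example above.
    """
    if forcing_end < forcing_start:
        raise ValueError("forcing_end must not precede forcing_start")
    return (forcing_end - forcing_start + 1) * days_per_year


print(expected_value_count(1991, 1993))  # 1095
```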
Unlike other driver data files, observed runoff files are allowed to have NaN as the value for a given step.
The option of explicitly specifying that an observed runoff data value is missing (via NaN) allows VELMA to calculate NSE and RMSE values for your simulation run's runoff even when you don't have observed data for every step of the [forcing_start, forcing_end] span. The NSE and RMSE values are calculated with however many data-pairs are available. VELMA reports how many observed values were used, but not their distribution across the simulated values. You are responsible for knowing and deciding whether their distribution is a problem or not.
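As a rough illustration of pair rejection, the sketch below computes NSE and RMSE from the textbook formulas while skipping any pair that contains NaN. It is not VELMA's code: the function name nse_rmse and its return layout are hypothetical, and returning NaN for RMSE when no pairs remain is an assumption.

```python
import math


def nse_rmse(simulated, observed):
    """Return (nse, rmse, used, rejected), ignoring pairs that contain NaN."""
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    # Keep only the pairs where both values are real numbers.
    pairs = [(s, o) for s, o in zip(simulated, observed)
             if not (math.isnan(s) or math.isnan(o))]
    used = len(pairs)
    rejected = len(simulated) - used
    if used == 0:
        # Assumption: with nothing to compare, both statistics are NaN.
        return math.nan, math.nan, used, rejected
    mean_obs = sum(o for _, o in pairs) / used
    squared_error = sum((s - o) ** 2 for s, o in pairs)
    obs_spread = sum((o - mean_obs) ** 2 for _, o in pairs)
    # NSE = 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)
    nse = 1.0 - squared_error / obs_spread if obs_spread > 0 else math.nan
    rmse = math.sqrt(squared_error / used)
    return nse, rmse, used, rejected


nse, rmse, used, rejected = nse_rmse([1.0, 2.0, 3.0, 4.0],
                                     [1.0, math.nan, 3.0, 5.0])
print(nse, used, rejected)  # 0.875 3 1
```

Returning the used and rejected counts alongside the statistics mirrors the reporting described above, so a caller can see how much data actually backs the score.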
Example
Suppose the observed runoff file for our forcing_start=1991 and forcing_end=1993 simulation configuration has NaN values for all of the steps (days) in year 1992.
Further suppose that the simulation is run for year 1992: the NSE for this run will be NaN, and VELMA results will report it as such along with the fact that 365 of 365 elements (obs,sim data-pairs) were rejected.
Now suppose the same simulation configuration is run again, but for 1991 through 1993, and that the NSE for this run computes to 0.75. You must be aware -- apart from anything VELMA includes in its results -- that the 730 of 1095 data-pairs used to compute the NSE (and RMSE) completely exclude the middle year of the simulation run.
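Because VELMA reports only the total number of values used, one way to check their distribution yourself is to count the usable values per year before running the simulation. The helper below is a hypothetical sketch, not a VELMA tool; it assumes 365 values per year and that the first value belongs to the first day of forcing_start.

```python
import math


def usable_counts_by_year(observed, forcing_start, days_per_year=365):
    """Map each year to the number of non-NaN observed values it contains."""
    counts = {}
    for index, value in enumerate(observed):
        # Values are assigned to years purely by position in the series.
        year = forcing_start + index // days_per_year
        counts.setdefault(year, 0)
        if not math.isnan(value):
            counts[year] += 1
    return counts


# The scenario from the example: all of 1992 is missing.
observed = [1.0] * 365 + [math.nan] * 365 + [1.0] * 365
print(usable_counts_by_year(observed, 1991))
# {1991: 365, 1992: 0, 1993: 365}
```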
Specific Notes
-
When you set a value in an observed runoff data file to NaN, it must have exactly that spelling and letter-case.
For example, none of the following are recognized: "nan", "NAN", "Nan". Each would cause the observed runoff file's load and initialization process -- and the simulation run itself -- to fail.
-
Each row of your observed runoff file can contain either 1 or 3 comma-separated fields.
No other combinations or formats are permitted.
Here are a few lines of a single-field file as an example:
11.5121
NaN
24.80
26.2279
When the file has a single field, that field contains the observed runoff value for one simulation step. Implicitly, the first value in the file is the value for the first day of the forcing_start year, the second value is for the second day, and so on.
Here are a few lines from a triple-field version of the same data:
1991,1,11.5121
1991,2,NaN
1991,3,24.80
1991,4,26.2279
When the file has three fields, the first two fields are interpreted as a year and jday.
However, the first value in the file is still the value for the first day of the forcing_start year!
The year and jday are permitted, but VELMA ignores them. They are permitted to make the file easier for humans to read and verify.
-
Here's a dark secret: if your observed runoff data file's first value is the value for the first day of the forcing_start year (and it will be used as such, regardless), you can have fewer data values in your file than the number of steps in the [forcing_start, forcing_end] span, and your simulation run should still produce a "usable" NSE value.
This is because VELMA fills in "missing" values at the end of your data as it loads from your file. It can do this because it can unambiguously assign NaN for each missing value.
Do not take advantage of this behavior!
If you have fewer actual observed values than the [forcing_start, forcing_end] span requires, fill the remainder of your file with NaN values explicitly. Do not rely on VELMA to do this for you. We only mention this behavior in case you accidentally use an observed runoff file that is "too short", and then wonder how the results are able to include an NSE value at all.
-
Having a header row for your observed runoff data file is allowed, but currently discouraged, because some command line tools that use observed runoff data files cannot process them correctly. Your observed data file will be read OK by the VELMA simulator, but not by other VELMA utility tools.
-
JVelma displays observed runoff NaN values in charts that include observed runoff -- e.g. "Calibration Hydrology" -- by "pegging the value to the top". That is, the value will appear at the very top of the chart's display array. This may be outside the borders of the chart's graph. Annual-values charts with a NaN at any point during the year will display that year as NaN (i.e. pegged to top).
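The file-format rules in the notes above (exact NaN spelling, one or three comma-separated fields per row, values assigned by position, and a file that may turn out "too short") can be checked before a run with a small script. The sketch below is a hypothetical pre-flight checker, not VELMA's loader: the function name check_observed_runoff is invented for illustration, it assumes the file has no header row, and skipping blank lines is an assumption.

```python
import math


def check_observed_runoff(lines, expected_count):
    """Parse observed runoff rows; return (values, missing_count).

    A positive missing_count means the file is shorter than the
    [forcing_start, forcing_end] span requires.
    """
    values = []
    for line_number, raw in enumerate(lines, start=1):
        row = raw.strip()
        if not row:
            continue  # assumption: blank lines are skipped
        fields = [field.strip() for field in row.split(",")]
        if len(fields) not in (1, 3):
            raise ValueError(
                f"line {line_number}: expected 1 or 3 fields, found {len(fields)}")
        token = fields[-1]  # year and jday, when present, are ignored
        if token == "NaN":
            values.append(math.nan)
            continue
        value = float(token)  # raises ValueError for non-numeric text
        if math.isnan(value):
            # float() accepts "nan", "NAN" and "Nan"; the notes say VELMA does not.
            raise ValueError(
                f"line {line_number}: use the exact spelling NaN, not {token!r}")
        values.append(value)
    return values, expected_count - len(values)


rows = ["1991,1,11.5121", "1991,2,NaN", "1991,3,24.80", "1991,4,26.2279"]
values, missing = check_observed_runoff(rows, 1095)
print(len(values), missing)  # 4 1091
```

A positive missing count is the cue to append that many explicit NaN rows to the file rather than relying on VELMA to fill them in.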