Calibration
HydroRaVENS calibration is built around run_and_score(),
which runs the model with a given parameter set, applies optional Nash-cascade
routing and a scoring window, and returns a CalibResult
containing all goodness-of-fit metrics. Integration with an external optimizer
(e.g. Dakota) is achieved by writing a thin
driver that reads parameter values from the optimizer’s output file and calls
run_and_score with those values.
run_and_score()
- hydroravens.run_and_score(cfg, t_efold=None, f_to_discharge=None, Hmax=None, pdm_H0=None, f_tile=None, tau_tile=None, melt_factor=None, fdd_threshold=None, snow_insulation_k=None, et_scale=None, et_alpha=None, wp_soil=None, wp_soil_sigma=None, recession_exponents=None, recession_exponents_calibrated=0, direct_runoff_fraction=None, baseflow_Q=None, modules=None, initial_states=None, start=None, end=None, spin_up_cycles=None, metric='KGE', routing_N=2, routing_K=None, enforce_water_balance='water-year')[source]
Run hydroRaVENS and return a CalibResult named tuple.
- Parameters:
cfg (str) – Path to a YAML configuration file. Should have spin_up_cycles: 0 so that spin-up is performed here with the calibrated parameters.
t_efold (list of float, optional) – E-folding residence times [days], one per reservoir (shallowest first). Overrides the values in cfg.
f_to_discharge (list of float, optional) – Exfiltration fractions to stream, one per reservoir except the deepest (which is always 1.0). Overrides the values in cfg.
Hmax (list of float, optional) – Maximum effective water depths [mm], one per reservoir. Overrides the values in cfg.
melt_factor (float, optional) – Degree-day snowmelt factor [mm SWE per degC per day]. Overrides the value in cfg.
direct_runoff_fraction (float or None, optional) – Fraction (0–1) of positive daily recharge that bypasses the reservoir cascade and exits directly as runoff. Conceptually inspired by Hortonian (infiltration-excess) overland flow, but at a daily timestep rainfall intensity is unavailable, so the fraction is not a rigorous physical representation – except in extreme events where intense rainfall dominates the daily total. In practice it acts as a calibrated fast-bypass fraction. None (default) leaves the value from the YAML config (itself defaulting to 0; off by default).
fdd_threshold (float or None, optional) – Frozen ground index threshold [°C·day]. The frozen ground index accumulates freezing degree-days and decays during warming (Molnau & Bissell 1983, https://westernsnowconference.org/bibliography/1983Molnau.pdf). When the index exceeds fdd_threshold, infiltration from the top reservoir to deeper layers is set to zero for that timestep (all drainage becomes direct runoff). None (default) disables the effect.
snow_insulation_k (float or None, optional) – Snow insulation decay constant [mm⁻¹ SWE]. Scales the effective air temperature reaching the soil as T_eff = T · exp(-k · SWE), reducing FGI accumulation under a deep snowpack. Applied to both freezing and thawing temperature forcing; excess degree-days from meltwater (excess_dd) are not scaled because meltwater delivers heat directly to the soil surface. None (default) leaves the value from cfg (itself defaulting to 0.0, i.e. no insulation). Literature values: LISFLOOD uses ~0.057 mm⁻¹; GSSHA default is 0.5 (units ambiguous in original source).
initial_states (dict, optional) –
Starting reservoir water depths, snowpack SWE, and frozen ground index, as returned by a previous call’s CalibResult.final_states. Format:
{'reservoirs': [H_shallow, H_deep, ...], 'snowpack': H_snow_SWE, 'fgi': frozen_ground_index}
When provided, these override the H0 values from cfg. Use with spin_up_cycles=0 when chaining consecutive decades so that water storage and frozen-ground state are physically continuous. When None (default), reservoirs are initialised to their analytical steady-state depths before spin-up (see
_steady_state_depths()).start (str or datetime-like, optional) – Start of the scoring window (inclusive). Score, AIC, BFI, and FDC are all computed within this window. Spin-up still uses the full record.
end (str or datetime-like, optional) – End of the scoring window (inclusive). Same as start.
spin_up_cycles (int or None, optional) – Number of times to loop the full record before the scored run.
None(default) auto-computes asceil(tau_max / record_length), wheretau_maxis the longest reservoir e-folding time after parameter overrides andrecord_lengthis the number of days in the input record. Because initial conditions are set to analytical steady-state depths, one e-folding time is sufficient to resolve the seasonal and inter-annual climate memory. Set to 0 when providing initial_states for chained decade runs.metric ({'KGE', 'NSE', 'logKGE', 'KGE_logKGE', 'KGE_logKGE_logFDC', 'KGE_logKGE_logFDC_BFI'}, optional) – Goodness-of-fit metric. Default is ‘KGE’.
'KGE_logKGE'returns 0.5*KGE + 0.5*logKGE, balancing peak and low-flow performance (Yilmaz et al. 2008).'KGE_logKGE_logFDC_BFI'adds a BFI bias-ratio score (1 - |BFI_mod/BFI_obs - 1|) as a fourth equal-weight component.routing_N (int, optional) – Number of identical linear reservoirs in the Nash cascade used for channel routing (shape parameter of the gamma IUH). Default is 2. Set routing_K to enable routing; routing_N is counted as a free parameter in k only when it is explicitly calibrated (the caller must add 1 to the k count via run_and_score if routing_N is varied).
routing_K (float or None, optional) – Storage time constant [days] of each Nash-cascade reservoir (scale parameter). Mean travel time through the cascade is routing_N * routing_K. When None (default), no routing is applied and the model output is compared directly to observed discharge. When provided, routing_K is counted as one free parameter.
modules (dict or None, optional) – Enable/disable optional process modules. Keys:
'snowpack','frozen_ground','rain_on_snow','direct_runoff'. Values: bool. Overrides themodulesblock in the YAML config. Absent keys leave the YAML/default value unchanged.enforce_water_balance (str, optional) – ‘water-year’ (default): scale ET by a per-water-year multiplier so that P - Q - ET = 0 over each water year. ‘global’: scale ET by a single multiplier computed from sum(P - Q_obs) / sum(ET_raw) over the full record — no per-year overfitting, does not add hidden degrees of freedom to AIC. ‘none’: use raw ET without correction (only for measured ET). Legacy boolean True/’False are silently mapped to ‘water-year’/’none’. Overrides the enforce_water_balance key in the YAML config.
- Returns:
Named tuple with fields: score : goodness-of-fit (higher is better) aic : AIC on log flows (lower is better; compare across
models with different reservoir counts)
bfi_obs : observed baseflow index (Eckhardt filter) bfi_mod : modelled baseflow index (Eckhardt filter) fdc_obs : pd.Series, observed FDC indexed by exceedance % fdc_mod : pd.Series, modelled FDC indexed by exceedance % final_states : dict suitable for use as initial_states next decade buckets : Buckets object after the final run
All scalar fields are np.nan if the scoring window is empty.
- Return type:
Notes
Spin-up runs the full record regardless of start/end so that the ET multiplier (computed per water year during initialize()) remains valid and the initial storage state reflects long-run climatology.
CalibResult
- class hydroravens.CalibResult(score, aic, bfi_obs, bfi_mod, kge_logfdc, fdc_obs, fdc_mod, final_states, buckets)
Named tuple returned by
run_and_score().- score
Goodness-of-fit score (higher is better). Metric is one of KGE, NSE, or logKGE as requested; see
run_and_score().- Type:
- aic
Akaike Information Criterion computed on log-transformed flows (lower is better). Useful for comparing models with different numbers of free parameters.
- Type:
- bfi_obs
Baseflow index of the observed discharge, computed with the Eckhardt (2005) recursive digital filter.
- Type:
- fdc_obs
Observed flow duration curve: discharge at 99 evenly-spaced exceedance probabilities (0.5–99.5 %), indexed by exceedance %.
- Type:
pd.Series
- fdc_mod
Modelled flow duration curve (same format as fdc_obs).
- Type:
pd.Series
- final_states
End-of-run reservoir states suitable for passing as
initial_statesto the next call:{'reservoirs': [H_shallow, H_deep, ...], # [mm] 'snowpack': H_snow_SWE, # [mm SWE] 'fgi': frozen_ground_index} # [°C·day]
- Type:
- buckets
The
Bucketsobject after the final run, including the fullhydrodataDataFrame with modelled discharge.- Type:
- All scalar fields are ``np.nan`` if the scoring window contains no
- valid overlapping data.
The fields of CalibResult are:
Field |
Type |
Description |
|---|---|---|
|
float |
Composite goodness-of-fit score from the metric chosen in
|
|
float |
Akaike Information Criterion computed on log-transformed discharge. Lower is better. See AIC and model comparison. |
|
float |
Observed baseflow index (Eckhardt recursive-filter method, α = 0.98, BFI_max = 0.80). |
|
float |
Modeled baseflow index, computed identically to |
|
float |
KGE on the log-transformed flow-duration curve. Sensitive to reservoir partitioning and long-term storage behaviour. |
|
pd.Series |
Observed FDC at 99 exceedance percentiles (log-space). |
|
pd.Series |
Modeled FDC at 99 exceedance percentiles (log-space). |
|
dict |
Reservoir water depths (mm) at the end of the run. Pass as
|
|
The fully initialised and run model object. Access
|
Goodness-of-Fit Metrics
The metric argument to run_and_score() selects the
composite score maximised during calibration and returned as
CalibResult.score. The AIC is always computed and returned separately
regardless of the chosen metric.
Available metrics
|
Score range |
Best suited for |
|---|---|---|
|
−∞ to 1 |
Overall hydrograph shape; balanced sensitivity to timing, magnitude, and variability. |
|
−∞ to 1 |
Peak-flow emphasis; dominated by high-flow events. |
|
−∞ to 1 |
Low-flow and baseflow behaviour; down-weights large peak events. |
|
−∞ to 1 |
Balanced calibration target: average of KGE and logKGE. Recommended for most temperate catchments. |
|
−∞ to 1 |
As above but also includes the flow-duration curve. Useful when the full range of flows is of interest. |
Recommendation: KGE_logKGE is the default and recommended metric for
calibrating multi-reservoir models in temperate climates. It weights peak and
low flow equally and does not over-emphasise the largest flood events the way
NSE or raw KGE do. When the flow-duration curve (groundwater contribution
to dry-season flow) is a specific focus, use KGE_logKGE_logFDC.
Baseflow index (BFI) is reported in all CalibResult objects as a
diagnostic regardless of the chosen calibration metric. Use it to verify
that the modeled groundwater contribution is physically plausible.
AIC and model comparison
The Akaike Information Criterion allows comparison of models with different numbers of free parameters:
where \(n\) is the number of scored days, \(\text{RSS}_{\ln Q}\) is the residual sum of squares on log-transformed discharge, and \(k\) is the count of free calibration parameters. Lower AIC indicates a better trade-off between fit quality and model complexity.
The AIC is evaluated on log-transformed discharge because the residuals of raw discharge are dominated by large flood events, which gives a misleading picture of overall model skill. Log-transformation gives roughly equal weight to all flow magnitudes.
Counting free parameters (k):
Only parameters that are actively calibrated (not fixed) contribute to \(k\).
In run_and_score, \(k\) is accumulated automatically for each parameter
that is passed as a non-trivial argument (e.g. each element of t_efold and
f_to_discharge that is free). Fixed structural choices (e.g. b_deep = 1
by convention, not optimization) do not add to \(k\).
The ET scaling factor added by enforce_water_balance='water-year' is a
hidden degree of freedom: one multiplier per water year is computed from the
data, effectively adding \(N_{\text{years}}\) free parameters.
enforce_water_balance='global' computes a single multiplier for the full
record, adding only 1 degree of freedom (not counted in \(k\) because it is
derived analytically from the data, not optimized). For multi-model AIC
comparisons, use enforce_water_balance='global' to avoid this hidden
per-year penalty.
Interpreting ΔAIC:
\(\Delta\text{AIC}\) |
Interpretation |
|---|---|
0–2 |
Substantial support for both models; the simpler one is preferred on parsimony grounds. |
2–4 |
Moderate evidence favouring the lower-AIC model. |
4–10 |
Considerably less support for the higher-AIC model. |
> 10 |
The higher-AIC model is effectively ruled out. |
Practical Calibration Workflow
Scoring window
By default, run_and_score scores over all time steps that have observed
discharge. Use start and end (YYYY-MM-DD strings or datetime
objects) to restrict the scoring window:
result = run_and_score(
'config.yml',
t_efold=[671, 22, 50881],
f_to_discharge=[0.553, 0.446, 1.0],
recession_exponents=[4.62, 2.203, 1.0],
start='1993-01-01', # exclude first two years of spin-up uncertainty
end='2011-09-30',
)
Leaving a 1–2 year buffer at the start of the record is good practice when the deep reservoir is slow (\(\tau\) ≫ record length) and spin-up may not fully equilibrate it.
Spin-up
spin_up_cycles=None (the default) triggers automatic calculation:
ceil(τ_max / record_length). This is adequate when initial conditions are
set to the analytical steady-state (the default behaviour in run_and_score).
Set spin_up_cycles=0 when chaining decade runs — the final_states from
the previous run supply well-initialised starting conditions, so additional
spin-up is not needed.
Nash-cascade routing
The model outputs point-source discharge from each reservoir. To simulate
channel travel time (appropriate for large basins), routing_N and
routing_K apply Nash-cascade attenuation after the reservoir cascade:
result = run_and_score(
'config.yml',
...,
routing_N=2, # number of routing reservoirs (integer ≥ 1)
routing_K=0.5, # routing timescale (days)
)
For small catchments (<500 km²), routing adds negligible improvement and should be omitted. For large basins, fit one routing step (N=1–3) with a timescale of order (basin_length / wave_speed). Routing parameters are calibration parameters and add to \(k\) if non-trivial.
Chaining decade runs
To simulate non-stationary behaviour (changing land use, long-term climate trends), chain runs across decades, passing end-state storage from one decade as the initial state of the next:
result_d1 = run_and_score('config_1991_2000.yml', ..., spin_up_cycles=1)
result_d2 = run_and_score(
'config_2001_2010.yml',
...,
initial_states=result_d1.final_states,
spin_up_cycles=0, # no spin-up — initial states already equilibrated
)
The final_states dict has keys matching reservoir_order in the driver
config (e.g. {'soil': H_soil, 'intermediate': H_int, 'deep': H_deep}).
Suggested Parameter Sets
The following configurations are illustrative starting points derived from calibration studies on temperate agricultural catchments in the upper Midwest (USA). They should be treated as starting points for a calibration, not as universal defaults — the correct parameter values depend on the specific basin.
Three-reservoir temperate agricultural basin
This structure (soil → intermediate → deep) is appropriate for glaciated lowland catchments with tile drainage, shallow Quaternary aquifers, and regional bedrock groundwater. It is the recommended starting structure for AIC-based exploration.
reservoirs:
e_folding_residence_times__days:
- 671 # soil: calibrate; log10 bounds [1.0, 4.5]
- 22 # intermediate: calibrate; log10 bounds [1.0, 4.5]
- 50000 # deep: calibrate; log10 bounds [3.5, 5.0]
exfiltration_fractions:
- 0.55 # soil → stream vs. percolating to intermediate
- 0.45 # intermediate → stream vs. recharging deep
- 1.0 # deep → all to stream
maximum_effective_depths__mm:
- .inf
- .inf
- .inf
recession_exponents:
- 4.62 # soil: calibrate [1.5, 6.0]; encodes tile-drain nonlinearity
- 2.203 # intermediate: fix at B–N lower-envelope value (basin-specific)
- 1.0 # deep: fix at 1.0 (linear; confined aquifer)
snowmelt:
PDD_melt_factor: 5.1 # mm SWE °C⁻¹ day⁻¹; calibrate [0.1, 10.0]
snow_insulation_k: 0.0
fdd_threshold: .inf # frozen ground disabled; b_soil absorbs the signal
modules:
snowpack: true
frozen_ground: false # b_soil > 1 absorbs spring-pulse signal
rain_on_snow: true
direct_runoff: false
dtr_fgi_decay: false
et_water_stress: false
et_reservoir_draw: true
general:
enforce_water_balance: global # single WB multiplier; no hidden per-year d.o.f.
et_alpha: 1.0
Key design choices and rationale:
frozen_ground: false— the calibrated soil nonlinearity (\(b \approx 3\)–5 for tile-drained basins) reproduces the same spring fast-flow signal as frozen infiltration blockage. Activating both simultaneously leads to equifinality. Use frozen ground only when independent soil temperature or frost-depth data warrant it; see Model Description for the equifinality argument.recession_b_intermediatefixed at the basin-specific Brutsaert–Nieber lower-envelope value — the B–N lower envelope represents the slowest observable recession, which corresponds to the intermediate Quaternary zone (MRT of days to weeks). See Recession Analysis for the derivation andBrutsaertNieberfor how to compute the estimate.recession_b_deep = 1.0fixed — the deep reservoir (MRT of decades to centuries) produces discharge too small and too slow to constrain \(b\) from streamflow. The linear approximation (Darcy flow in a confined or semi-confined matrix) is physically appropriate.enforce_water_balance: global— a single ET multiplier applied to the full record. Recommended for AIC-based model comparison because'water-year'mode adds one hidden degree of freedom per year.et_reservoir_draw: true— ET drawn from post-cascade soil storage, giving a temporal buffer equal to the soil MRT without any additional free parameter.
Free parameters for this structure (k = 7):
Parameter |
Suggested bounds |
Notes |
|---|---|---|
|
[0.1, 10.0] |
Sensitive to climate; 1–6 mm SWE °C⁻¹ d⁻¹ typical |
|
[1.0, 4.5] |
Unconstrained; MRT more informative than raw τ |
|
[1.5, 6.0] |
High b encodes tile drainage or surface nonlinearity |
|
[0.01, 0.99] |
— |
|
[1.0, 4.5] |
Unconstrained; allows model to self-discover timescale ordering |
|
[0.01, 0.99] |
— |
|
[3.5, 5.0] |
~1000–100,000 days; well-constrained range for bedrock GW |
Two-reservoir baseline
A simpler structure for data-sparse basins or as a structural null hypothesis before adding a third reservoir. AIC comparison against the three-reservoir structure indicates whether the additional reservoir is warranted.
reservoirs:
e_folding_residence_times__days:
- 30 # soil/shallow: calibrate
- 5000 # groundwater: calibrate
exfiltration_fractions:
- 0.6 # calibrate
- 1.0
maximum_effective_depths__mm:
- .inf
- .inf
recession_exponents:
- 2.0 # soil: starting point from B–N; calibrate
- 1.0 # groundwater: fix linear
general:
enforce_water_balance: global
This structure has k = 5 free parameters (τ × 2, f × 1, b_soil × 1, PDD_melt_factor × 1). If the three-reservoir structure gives ΔAIC < 2 compared to this baseline, the additional reservoir is not justified by the data.