Calibration

HydroRaVENS calibration is built around run_and_score(), which runs the model with a given parameter set, applies optional Nash-cascade routing and a scoring window, and returns a CalibResult containing all goodness-of-fit metrics. Integration with an external optimizer (e.g. Dakota) is achieved by writing a thin driver that reads parameter values from the optimizer’s output file and calls run_and_score with those values.

run_and_score()

hydroravens.run_and_score(cfg, t_efold=None, f_to_discharge=None, Hmax=None, pdm_H0=None, f_tile=None, tau_tile=None, melt_factor=None, fdd_threshold=None, snow_insulation_k=None, et_scale=None, et_alpha=None, wp_soil=None, wp_soil_sigma=None, recession_exponents=None, recession_exponents_calibrated=0, direct_runoff_fraction=None, baseflow_Q=None, modules=None, initial_states=None, start=None, end=None, spin_up_cycles=None, metric='KGE', routing_N=2, routing_K=None, enforce_water_balance='water-year')[source]

Run hydroRaVENS and return a CalibResult named tuple.

Parameters:

cfg (str) – Path to a YAML configuration file. Should have spin_up_cycles: 0 so that spin-up is performed here with the calibrated parameters.
t_efold (list of float, optional) – E-folding residence times [days], one per reservoir (shallowest first). Overrides the values in cfg.
f_to_discharge (list of float, optional) – Exfiltration fractions to stream, one per reservoir except the deepest (which is always 1.0). Overrides the values in cfg.
Hmax (list of float, optional) – Maximum effective water depths [mm], one per reservoir. Overrides the values in cfg.
melt_factor (float, optional) – Degree-day snowmelt factor [mm SWE per degC per day]. Overrides the value in cfg.
direct_runoff_fraction (float or None, optional) – Fraction (0–1) of positive daily recharge that bypasses the reservoir cascade and exits directly as runoff. Conceptually inspired by Hortonian (infiltration-excess) overland flow, but at a daily timestep rainfall intensity is unavailable, so the fraction is not a rigorous physical representation – except in extreme events where intense rainfall dominates the daily total. In practice it acts as a calibrated fast-bypass fraction. None (default) leaves the value from the YAML config (itself defaulting to 0; off by default).
fdd_threshold (float or None, optional) – Frozen ground index threshold [°C·day]. The frozen ground index accumulates freezing degree-days and decays during warming (Molnau & Bissell 1983, https://westernsnowconference.org/bibliography/1983Molnau.pdf). When the index exceeds fdd_threshold, infiltration from the top reservoir to deeper layers is set to zero for that timestep (all drainage becomes direct runoff). None (default) disables the effect.
snow_insulation_k (float or None, optional) – Snow insulation decay constant [mm⁻¹ SWE]. Scales the effective air temperature reaching the soil as T_eff = T · exp(-k · SWE), reducing FGI accumulation under a deep snowpack. Applied to both freezing and thawing temperature forcing; excess degree-days from meltwater (excess_dd) are not scaled because meltwater delivers heat directly to the soil surface. None (default) leaves the value from cfg (itself defaulting to 0.0, i.e. no insulation). Literature values: LISFLOOD uses ~0.057 mm⁻¹; GSSHA default is 0.5 (units ambiguous in original source).
initial_states (dict, optional) –
Starting reservoir water depths, snowpack SWE, and frozen ground index, as returned by a previous call’s CalibResult.final_states. Format:
```
{'reservoirs': [H_shallow, H_deep, ...],
 'snowpack':    H_snow_SWE,
 'fgi':         frozen_ground_index}
```
When provided, these override the H0 values from cfg. Use with spin_up_cycles=0 when chaining consecutive decades so that water storage and frozen-ground state are physically continuous. When None (default), reservoirs are initialised to their analytical steady-state depths before spin-up (see _steady_state_depths()).
start (str or datetime-like, optional) – Start of the scoring window (inclusive). Score, AIC, BFI, and FDC are all computed within this window. Spin-up still uses the full record.
end (str or datetime-like, optional) – End of the scoring window (inclusive). Same as start.
spin_up_cycles (int or None, optional) – Number of times to loop the full record before the scored run. None (default) auto-computes as ceil(tau_max / record_length), where tau_max is the longest reservoir e-folding time after parameter overrides and record_length is the number of days in the input record. Because initial conditions are set to analytical steady-state depths, one e-folding time is sufficient to resolve the seasonal and inter-annual climate memory. Set to 0 when providing initial_states for chained decade runs.
metric ({'KGE', 'NSE', 'logKGE', 'KGE_logKGE', 'KGE_logKGE_logFDC', 'KGE_logKGE_logFDC_BFI'}, optional) – Goodness-of-fit metric. Default is ‘KGE’. 'KGE_logKGE' returns 0.5*KGE + 0.5*logKGE, balancing peak and low-flow performance (Yilmaz et al. 2008). 'KGE_logKGE_logFDC_BFI' adds a BFI bias-ratio score (1 - |BFI_mod/BFI_obs - 1|) as a fourth equal-weight component.
routing_N (int, optional) – Number of identical linear reservoirs in the Nash cascade used for channel routing (shape parameter of the gamma IUH). Default is 2. Set routing_K to enable routing; routing_N is counted as a free parameter in k only when it is explicitly calibrated (the caller must add 1 to the k count via run_and_score if routing_N is varied).
routing_K (float or None, optional) – Storage time constant [days] of each Nash-cascade reservoir (scale parameter). Mean travel time through the cascade is routing_N * routing_K. When None (default), no routing is applied and the model output is compared directly to observed discharge. When provided, routing_K is counted as one free parameter.
modules (dict or None, optional) – Enable/disable optional process modules. Keys: 'snowpack', 'frozen_ground', 'rain_on_snow', 'direct_runoff'. Values: bool. Overrides the modules block in the YAML config. Absent keys leave the YAML/default value unchanged.
enforce_water_balance (str, optional) – ‘water-year’ (default): scale ET by a per-water-year multiplier so that P - Q - ET = 0 over each water year. ‘global’: scale ET by a single multiplier computed from sum(P - Q_obs) / sum(ET_raw) over the full record — no per-year overfitting, does not add hidden degrees of freedom to AIC. ‘none’: use raw ET without correction (only for measured ET). Legacy boolean True/’False are silently mapped to ‘water-year’/’none’. Overrides the enforce_water_balance key in the YAML config.

Returns:

Named tuple with fields: score : goodness-of-fit (higher is better) aic : AIC on log flows (lower is better; compare across

models with different reservoir counts)

bfi_obs : observed baseflow index (Eckhardt filter) bfi_mod : modelled baseflow index (Eckhardt filter) fdc_obs : pd.Series, observed FDC indexed by exceedance % fdc_mod : pd.Series, modelled FDC indexed by exceedance % final_states : dict suitable for use as initial_states next decade buckets : Buckets object after the final run

All scalar fields are np.nan if the scoring window is empty.

Return type:

CalibResult

Notes

Spin-up runs the full record regardless of start/end so that the ET multiplier (computed per water year during initialize()) remains valid and the initial storage state reflects long-run climatology.

CalibResult 

class hydroravens.CalibResult(score, aic, bfi_obs, bfi_mod, kge_logfdc, fdc_obs, fdc_mod, final_states, buckets)

Named tuple returned by run_and_score().

score

Goodness-of-fit score (higher is better). Metric is one of KGE, NSE, or logKGE as requested; see run_and_score().

Type:: float

aic

Akaike Information Criterion computed on log-transformed flows (lower is better). Useful for comparing models with different numbers of free parameters.

Type:: float

bfi_obs

Baseflow index of the observed discharge, computed with the Eckhardt (2005) recursive digital filter.

Type:: float

bfi_mod

Baseflow index of the modelled discharge.

Type:: float

fdc_obs

Observed flow duration curve: discharge at 99 evenly-spaced exceedance probabilities (0.5–99.5 %), indexed by exceedance %.

Type:: pd.Series

fdc_mod

Modelled flow duration curve (same format as fdc_obs).

Type:: pd.Series

final_states

End-of-run reservoir states suitable for passing as initial_states to the next call:

{'reservoirs': [H_shallow, H_deep, ...],  # [mm]
 'snowpack':    H_snow_SWE,               # [mm SWE]
 'fgi':         frozen_ground_index}       # [°C·day]

Type:: dict

buckets

The Buckets object after the final run, including the full hydrodata DataFrame with modelled discharge.

Type:: Buckets

All scalar fields are ``np.nan`` if the scoring window contains no

valid overlapping data.

The fields of CalibResult are:

Field	Type	Description
`score`	float	Composite goodness-of-fit score from the metric chosen in `metric=`. Higher is better (all metrics return higher values for better fits). `np.nan` if no observed discharge falls in the scoring window.
`aic`	float	Akaike Information Criterion computed on log-transformed discharge. Lower is better. See AIC and model comparison.
`bfi_obs`	float	Observed baseflow index (Eckhardt recursive-filter method, α = 0.98, BFI_max = 0.80).
`bfi_mod`	float	Modeled baseflow index, computed identically to `bfi_obs`.
`kge_logfdc`	float	KGE on the log-transformed flow-duration curve. Sensitive to reservoir partitioning and long-term storage behaviour.
`fdc_obs`	pd.Series	Observed FDC at 99 exceedance percentiles (log-space).
`fdc_mod`	pd.Series	Modeled FDC at 99 exceedance percentiles (log-space).
`final_states`	dict	Reservoir water depths (mm) at the end of the run. Pass as `initial_states=` in the next decade run to chain simulations.
`buckets`	`Buckets`	The fully initialised and run model object. Access `result.buckets.hydrodata` for the full simulated time series.

Goodness-of-Fit Metrics 

The metric argument to run_and_score() selects the composite score maximised during calibration and returned as CalibResult.score. The AIC is always computed and returned separately regardless of the chosen metric.

Available metrics 

`metric` key	Score range	Best suited for
`KGE`	−∞ to 1	Overall hydrograph shape; balanced sensitivity to timing, magnitude, and variability.
`NSE`	−∞ to 1	Peak-flow emphasis; dominated by high-flow events.
`logKGE`	−∞ to 1	Low-flow and baseflow behaviour; down-weights large peak events.
`KGE_logKGE`	−∞ to 1	Balanced calibration target: average of KGE and logKGE. Recommended for most temperate catchments.
`KGE_logKGE_logFDC`	−∞ to 1	As above but also includes the flow-duration curve. Useful when the full range of flows is of interest.

Recommendation: KGE_logKGE is the default and recommended metric for calibrating multi-reservoir models in temperate climates. It weights peak and low flow equally and does not over-emphasise the largest flood events the way NSE or raw KGE do. When the flow-duration curve (groundwater contribution to dry-season flow) is a specific focus, use KGE_logKGE_logFDC.

Baseflow index (BFI) is reported in all CalibResult objects as a diagnostic regardless of the chosen calibration metric. Use it to verify that the modeled groundwater contribution is physically plausible.

AIC and model comparison 

The Akaike Information Criterion allows comparison of models with different numbers of free parameters:

\[\text{AIC} = n \ln\!\left(\frac{\text{RSS}_{\ln Q}}{n}\right) + 2k\]

where \(n\) is the number of scored days, \(\text{RSS}_{\ln Q}\) is the residual sum of squares on log-transformed discharge, and \(k\) is the count of free calibration parameters. Lower AIC indicates a better trade-off between fit quality and model complexity.

The AIC is evaluated on log-transformed discharge because the residuals of raw discharge are dominated by large flood events, which gives a misleading picture of overall model skill. Log-transformation gives roughly equal weight to all flow magnitudes.

Counting free parameters (k): Only parameters that are actively calibrated (not fixed) contribute to \(k\). In run_and_score, \(k\) is accumulated automatically for each parameter that is passed as a non-trivial argument (e.g. each element of t_efold and f_to_discharge that is free). Fixed structural choices (e.g. b_deep = 1 by convention, not optimization) do not add to \(k\).

The ET scaling factor added by enforce_water_balance='water-year' is a hidden degree of freedom: one multiplier per water year is computed from the data, effectively adding \(N_{\text{years}}\) free parameters. enforce_water_balance='global' computes a single multiplier for the full record, adding only 1 degree of freedom (not counted in \(k\) because it is derived analytically from the data, not optimized). For multi-model AIC comparisons, use enforce_water_balance='global' to avoid this hidden per-year penalty.

Interpreting ΔAIC:

\(\Delta\text{AIC}\)	Interpretation
0–2	Substantial support for both models; the simpler one is preferred on parsimony grounds.
2–4	Moderate evidence favouring the lower-AIC model.
4–10	Considerably less support for the higher-AIC model.
> 10	The higher-AIC model is effectively ruled out.

Practical Calibration Workflow 

Scoring window 

By default, run_and_score scores over all time steps that have observed discharge. Use start and end (YYYY-MM-DD strings or datetime objects) to restrict the scoring window:

result = run_and_score(
    'config.yml',
    t_efold=[671, 22, 50881],
    f_to_discharge=[0.553, 0.446, 1.0],
    recession_exponents=[4.62, 2.203, 1.0],
    start='1993-01-01',   # exclude first two years of spin-up uncertainty
    end='2011-09-30',
)

Leaving a 1–2 year buffer at the start of the record is good practice when the deep reservoir is slow (\(\tau\) ≫ record length) and spin-up may not fully equilibrate it.

Spin-up 

spin_up_cycles=None (the default) triggers automatic calculation: ceil(τ_max / record_length). This is adequate when initial conditions are set to the analytical steady-state (the default behaviour in run_and_score). Set spin_up_cycles=0 when chaining decade runs — the final_states from the previous run supply well-initialised starting conditions, so additional spin-up is not needed.

Nash-cascade routing 

The model outputs point-source discharge from each reservoir. To simulate channel travel time (appropriate for large basins), routing_N and routing_K apply Nash-cascade attenuation after the reservoir cascade:

result = run_and_score(
    'config.yml',
    ...,
    routing_N=2,     # number of routing reservoirs (integer ≥ 1)
    routing_K=0.5,   # routing timescale (days)
)

For small catchments (<500 km²), routing adds negligible improvement and should be omitted. For large basins, fit one routing step (N=1–3) with a timescale of order (basin_length / wave_speed). Routing parameters are calibration parameters and add to \(k\) if non-trivial.

Chaining decade runs 

To simulate non-stationary behaviour (changing land use, long-term climate trends), chain runs across decades, passing end-state storage from one decade as the initial state of the next:

result_d1 = run_and_score('config_1991_2000.yml', ..., spin_up_cycles=1)
result_d2 = run_and_score(
    'config_2001_2010.yml',
    ...,
    initial_states=result_d1.final_states,
    spin_up_cycles=0,   # no spin-up — initial states already equilibrated
)

The final_states dict has keys matching reservoir_order in the driver config (e.g. {'soil': H_soil, 'intermediate': H_int, 'deep': H_deep}).

Suggested Parameter Sets 

The following configurations are illustrative starting points derived from calibration studies on temperate agricultural catchments in the upper Midwest (USA). They should be treated as starting points for a calibration, not as universal defaults — the correct parameter values depend on the specific basin.

Three-reservoir temperate agricultural basin 

This structure (soil → intermediate → deep) is appropriate for glaciated lowland catchments with tile drainage, shallow Quaternary aquifers, and regional bedrock groundwater. It is the recommended starting structure for AIC-based exploration.

reservoirs:
    e_folding_residence_times__days:
        - 671      # soil: calibrate; log10 bounds [1.0, 4.5]
        - 22       # intermediate: calibrate; log10 bounds [1.0, 4.5]
        - 50000    # deep: calibrate; log10 bounds [3.5, 5.0]
    exfiltration_fractions:
        - 0.55     # soil → stream vs. percolating to intermediate
        - 0.45     # intermediate → stream vs. recharging deep
        - 1.0      # deep → all to stream
    maximum_effective_depths__mm:
        - .inf
        - .inf
        - .inf
    recession_exponents:
        - 4.62     # soil: calibrate [1.5, 6.0]; encodes tile-drain nonlinearity
        - 2.203    # intermediate: fix at B–N lower-envelope value (basin-specific)
        - 1.0      # deep: fix at 1.0 (linear; confined aquifer)

snowmelt:
    PDD_melt_factor: 5.1   # mm SWE °C⁻¹ day⁻¹; calibrate [0.1, 10.0]
    snow_insulation_k: 0.0
    fdd_threshold: .inf    # frozen ground disabled; b_soil absorbs the signal

modules:
    snowpack:          true
    frozen_ground:     false   # b_soil > 1 absorbs spring-pulse signal
    rain_on_snow:      true
    direct_runoff:     false
    dtr_fgi_decay:     false
    et_water_stress:   false
    et_reservoir_draw: true

general:
    enforce_water_balance: global   # single WB multiplier; no hidden per-year d.o.f.
    et_alpha: 1.0

Key design choices and rationale:

frozen_ground: false — the calibrated soil nonlinearity (\(b \approx 3\)–5 for tile-drained basins) reproduces the same spring fast-flow signal as frozen infiltration blockage. Activating both simultaneously leads to equifinality. Use frozen ground only when independent soil temperature or frost-depth data warrant it; see Model Description for the equifinality argument.
recession_b_intermediate fixed at the basin-specific Brutsaert–Nieber lower-envelope value — the B–N lower envelope represents the slowest observable recession, which corresponds to the intermediate Quaternary zone (MRT of days to weeks). See Recession Analysis for the derivation and BrutsaertNieber for how to compute the estimate.
recession_b_deep = 1.0 fixed — the deep reservoir (MRT of decades to centuries) produces discharge too small and too slow to constrain \(b\) from streamflow. The linear approximation (Darcy flow in a confined or semi-confined matrix) is physically appropriate.
enforce_water_balance: global — a single ET multiplier applied to the full record. Recommended for AIC-based model comparison because 'water-year' mode adds one hidden degree of freedom per year.
et_reservoir_draw: true — ET drawn from post-cascade soil storage, giving a temporal buffer equal to the soil MRT without any additional free parameter.

Free parameters for this structure (k = 7):

Parameter	Suggested bounds	Notes
`PDD_melt_factor`	[0.1, 10.0]	Sensitive to climate; 1–6 mm SWE °C⁻¹ d⁻¹ typical
`log10(τ_soil)`	[1.0, 4.5]	Unconstrained; MRT more informative than raw τ
`recession_b_soil`	[1.5, 6.0]	High b encodes tile drainage or surface nonlinearity
`f_exfiltration_soil`	[0.01, 0.99]	—
`log10(τ_intermediate)`	[1.0, 4.5]	Unconstrained; allows model to self-discover timescale ordering
`f_exfiltration_intermediate`	[0.01, 0.99]	—
`log10(τ_deep)`	[3.5, 5.0]	~1000–100,000 days; well-constrained range for bedrock GW

Two-reservoir baseline 

A simpler structure for data-sparse basins or as a structural null hypothesis before adding a third reservoir. AIC comparison against the three-reservoir structure indicates whether the additional reservoir is warranted.

reservoirs:
    e_folding_residence_times__days:
        - 30       # soil/shallow: calibrate
        - 5000     # groundwater: calibrate
    exfiltration_fractions:
        - 0.6      # calibrate
        - 1.0
    maximum_effective_depths__mm:
        - .inf
        - .inf
    recession_exponents:
        - 2.0      # soil: starting point from B–N; calibrate
        - 1.0      # groundwater: fix linear

general:
    enforce_water_balance: global

This structure has k = 5 free parameters (τ × 2, f × 1, b_soil × 1, PDD_melt_factor × 1). If the three-reservoir structure gives ΔAIC < 2 compared to this baseline, the additional reservoir is not justified by the data.