Tutorial: From Data to Calibrated Model
This tutorial walks through a complete HydroRaVENS workflow, from raw daily discharge and precipitation data to a calibrated model with physically informed parameter starting points.
Overview
The recommended workflow has four stages:
Estimate priors — analyse the discharge record to get data-driven starting points for timescales, recession exponents, and initial storage depths (
suggest_priors()).Write a configuration file — populate a YAML file using the suggested priors.
Calibrate — run the optimizer to refine parameters.
Evaluate — inspect goodness-of-fit metrics and the best-fit hydrograph.
Input Data Format
HydroRaVENS reads a CSV file with at minimum these columns:
Date,Precipitation [mm/day],Discharge [m^3/s]
Optional columns activate additional model processes:
Evapotranspiration [mm/day] # measured ET; enables evapotranspiration_method: datafile
Mean Temperature [C] # activates snowpack and frozen-ground modules
Minimum Temperature [C] # activates DTR-based FGI decay and ThorntwaiteChang2019
Maximum Temperature [C] # activates DTR-based FGI decay and ThorntwaiteChang2019
Photoperiod [hr] # required for ThorntwaiteChang2019 ET method
All columns must be at a daily timestep. Missing values in the discharge column are allowed and are excluded from scoring.
Stage 1 — Estimate Priors
Before running or calibrating the model, use suggest_priors()
to extract data-driven parameter starting points directly from the
observed discharge record.
import pandas as pd
from hydroravens import suggest_priors
df = pd.read_csv('data/input.csv', parse_dates=['Date'])
Q = df['Specific Discharge [mm/day]'].values # or compute from m³/s
P = df['Precipitation [mm/day]'].values
pr = suggest_priors(Q, P=P, n_reservoirs=3)
pr.summary()
The output looks like:
============================================================
HydroRaVENS data-driven priors
============================================================
E-folding residence times (fastest → slowest):
soil : 28.4 days
karst : 847.3 days
deep : could not be estimated — use calibration default
Power-law recession exponents (fastest → slowest):
soil : 2.841 (B–N data-driven estimate — calibrate)
karst : 2.203 (theoretical B–N 2.203 — consider fixing)
deep : 1.000 (linear (b=1) — deep reservoir default)
Initial storage depths [mm] (fastest → slowest):
soil : 45.2 mm
karst : 312.8 mm
deep : 18.4 mm
Calibration bounds (log10 days) for params.yml:
log__t_efold_soil: initial=1.453, lower=0.953, upper=1.953
log__t_efold_karst: initial=2.928, lower=2.428, upper=3.428
Brutsaert–Nieber recession cloud:
slope n = 1.4817
coeff a = 0.0052
→ b_HR = 2.841 (used as b_soil prior)
============================================================
You can also visualise the recession cloud and get a YAML snippet:
pr.bn.plot() # log-log recession cloud with fit
print(pr.to_yaml_snippet()) # paste into your config file
The YAML snippet gives a complete reservoirs: and
initial_conditions: block ready to paste into your configuration
file.
What the priors mean:
Timescales come from the spectral decomposition of the hydrograph (
HydrographSeparation). Use them as starting points for calibration, not fixed values — the optimizer will refine them.b_soil (the B–N estimate) is a data-driven starting point. Calibrated values for agricultural catchments are typically b ≈ 3–4. Do not fix it.
b_karst = 2.203 is the theoretical Brutsaert & Nieber (1977) long-time baseflow value. Consider fixing it rather than calibrating — the data rarely constrain it independently. See Recession Analysis for the theory.
b_deep = 1.0 — the deep reservoir is effectively a slow linear store; the linear approximation is adequate and avoids adding a free parameter.
Stage 2 — Write a Configuration File
Create a YAML configuration file. Start from the snippet produced by
pr.to_yaml_snippet() and fill in the remaining sections:
timeseries:
datafile: data/input.csv
initial_conditions:
water_reservoir_effective_depths__mm:
- 45.2 # soil
- 312.8 # karst
- 18.4 # deep
snowpack__mm_SWE: 0
catchment:
drainage_basin_area__km2: 3800
evapotranspiration_method: ThorntwaiteChang2019
water_year_start_month: 10
general:
spin_up_cycles: null # auto: ceil(τ_max / record_length)
enforce_water_balance: global
reservoirs:
e_folding_residence_times__days:
- 28.4 # soil
- 847.3 # karst
- 10000 # deep — placeholder; calibrate or fix
exfiltration_fractions:
- 0.6 # soil — placeholder; calibrate
- 0.5 # karst — placeholder; calibrate
- 1.0 # deep — all to stream
maximum_effective_depths__mm:
- .inf
- .inf
- .inf
recession_exponents:
- 2.841 # soil — from B–N prior; calibrate
- 2.203 # karst — theoretical B–N; consider fixing
- 1.0 # deep — linear
snowmelt:
PDD_melt_factor: 2.0 # mm SWE / °C / day; starting point
fdd_threshold: .inf # °C·day; .inf disables frozen ground
snow_insulation_k: 0.0 # mm⁻¹ SWE; 0 disables insulation
modules:
snowpack: true
frozen_ground: true
rain_on_snow: true
direct_runoff: false
dtr_fgi_decay: false
et_water_stress: false
et_reservoir_draw: true
See Configuration Reference for a full reference of all available options.
Stage 3 — Quick Check Before Calibrating
Run the model once with the prior starting points to verify the configuration is valid and the fit is in the right ballpark:
from hydroravens import Buckets
model = Buckets()
model.initialize('config.yml')
model.run()
model.compute_NSE(verbose=True)
model.plot()
If the hydrograph shape is qualitatively reasonable (seasonal timing is correct, magnitudes are in the right range), proceed to calibration.
Diagnosing a poor Stage 3 fit:
Symptom |
Likely cause and fix |
|---|---|
Modeled peak timing shifted by weeks |
Melt factor too high or low; soil τ too short or long.
Adjust |
Baseflow too high in summer |
Intermediate or deep τ too short, or f_exfiltration too high for those layers. Extend timescales; reduce exfiltration fractions. |
Rising limb too slow, flashy peaks absent |
Soil τ too long; try shorter timescale or higher recession exponent. |
Annual water balance far off (model wet or dry) |
Check |
NSE near 0 or negative |
Check input data alignment — is precipitation spatially appropriate for this basin? Are dates in the correct format? |
Stage 4 — Calibrate
HydroRaVENS uses Dakota for calibration.
The calibration workflow is driven by a params.yml file that
specifies which parameters to calibrate and their bounds. Use the
bounds from pr.log_t_efold_bounds as the starting point for
log__t_efold_* parameters:
parameters:
log__t_efold_soil:
lower: 0.953 # from pr.log_t_efold_bounds
upper: 1.953
initial: 1.453
log__t_efold_karst:
lower: 2.428
upper: 3.428
initial: 2.928
recession_b_soil:
lower: 1.5
upper: 6.0
initial: 2.841 # from B–N prior
After calibration, compare results using AIC to evaluate model structure. The AIC is lower for better-fitting models and penalises additional free parameters. A ΔAIC > 2 relative to a simpler model is required before the more complex structure is considered justified. See Calibration for the full AIC guide and metric selection.
Stage 5 — Evaluate
After calibration, inspect the best-fit parameters and metrics:
from hydroravens import run_and_score
result = run_and_score(
'config.yml',
t_efold = [28.4, 847.3, 10000],
f_to_discharge = [0.65, 0.48],
recession_exponents = [3.4, 2.203, 1.0],
enforce_water_balance = 'global',
)
print(f"Score = {result.score:.3f}") # composite metric (KGE_logKGE here)
print(f"AIC = {result.aic:.1f}")
print(f"BFI = {result.bfi_mod:.3f} (obs: {result.bfi_obs:.3f})")
Key metrics to examine:
Score (KGE_logKGE) — overall and low-flow performance combined.
AIC — penalises extra parameters; use to compare model structures.
BFI — check that modeled baseflow index matches observations.
KGE_logFDC — flow duration curve match; sensitive to reservoir partitioning.
Reporting mean residence time (MRT):
Raw e-folding times (τ) are not physically comparable across reservoirs
or experiments when the recession exponent b differs. After obtaining
best-fit parameters, compute MRT for each reservoir using
mean_residence_time() with the long-term mean
flux attributed to that layer as Q_ref:
buckets = result.buckets
# Mean annual discharge attributed to each reservoir.
# Approximate as: total mean Q × exfiltration-weighted fraction.
# For a quick cross-experiment comparison, use mean total Q as Q_ref.
import numpy as np
Q_mean = buckets.hydrodata['Specific Discharge (modeled) [mm/day]'].mean()
for i, res in enumerate(buckets.reservoirs):
mrt = res.mean_residence_time(Q_ref=Q_mean)
print(f"Reservoir {i}: τ={res.t_efold:.1f}d, b={res.recession_exponent:.3f}, "
f"MRT={mrt:.1f}d (Q_ref={Q_mean:.3f} mm/d)")
MRT should be reported alongside τ and b whenever comparing calibrated models across experiments or basins. τ alone is misleading when b > 1, because the optimizer can trade τ against b while holding MRT nearly constant. See mean-residence-time in Model Description for the derivation and interpretation.
See Also
Configuration Reference — full YAML reference
Recession Analysis — Brutsaert & Nieber theory and API
Calibration — scoring functions, AIC, and metric details
suggest_priors— full API referenceHydrographSeparation— timescale decomposition