Tutorial: From Data to Calibrated Model ======================================== This tutorial walks through a complete HydroRaVENS workflow, from raw daily discharge and precipitation data to a calibrated model with physically informed parameter starting points. .. contents:: On this page :local: :depth: 2 Overview -------- The recommended workflow has four stages: 1. **Estimate priors** — analyse the discharge record to get data-driven starting points for timescales, recession exponents, and initial storage depths (:func:`~hydroravens.suggest_priors`). 2. **Write a configuration file** — populate a YAML file using the suggested priors. 3. **Calibrate** — run the optimizer to refine parameters. 4. **Evaluate** — inspect goodness-of-fit metrics and the best-fit hydrograph. Input Data Format ----------------- HydroRaVENS reads a CSV file with at minimum these columns: .. code-block:: text Date,Precipitation [mm/day],Discharge [m^3/s] Optional columns activate additional model processes: .. code-block:: text Evapotranspiration [mm/day] # measured ET; enables evapotranspiration_method: datafile Mean Temperature [C] # activates snowpack and frozen-ground modules Minimum Temperature [C] # activates DTR-based FGI decay and ThorntwaiteChang2019 Maximum Temperature [C] # activates DTR-based FGI decay and ThorntwaiteChang2019 Photoperiod [hr] # required for ThorntwaiteChang2019 ET method All columns must be at a daily timestep. Missing values in the discharge column are allowed and are excluded from scoring. Stage 1 — Estimate Priors -------------------------- Before running or calibrating the model, use :func:`~hydroravens.suggest_priors` to extract data-driven parameter starting points directly from the observed discharge record. .. code-block:: python import pandas as pd from hydroravens import suggest_priors df = pd.read_csv('data/input.csv', parse_dates=['Date']) Q = df['Specific Discharge [mm/day]'].values # or compute from m³/s P = df['Precipitation [mm/day]'].values pr = suggest_priors(Q, P=P, n_reservoirs=3) pr.summary() The output looks like:: ============================================================ HydroRaVENS data-driven priors ============================================================ E-folding residence times (fastest → slowest): soil : 28.4 days karst : 847.3 days deep : could not be estimated — use calibration default Power-law recession exponents (fastest → slowest): soil : 2.841 (B–N data-driven estimate — calibrate) karst : 2.203 (theoretical B–N 2.203 — consider fixing) deep : 1.000 (linear (b=1) — deep reservoir default) Initial storage depths [mm] (fastest → slowest): soil : 45.2 mm karst : 312.8 mm deep : 18.4 mm Calibration bounds (log10 days) for params.yml: log__t_efold_soil: initial=1.453, lower=0.953, upper=1.953 log__t_efold_karst: initial=2.928, lower=2.428, upper=3.428 Brutsaert–Nieber recession cloud: slope n = 1.4817 coeff a = 0.0052 → b_HR = 2.841 (used as b_soil prior) ============================================================ You can also visualise the recession cloud and get a YAML snippet: .. code-block:: python pr.bn.plot() # log-log recession cloud with fit print(pr.to_yaml_snippet()) # paste into your config file The YAML snippet gives a complete ``reservoirs:`` and ``initial_conditions:`` block ready to paste into your configuration file. **What the priors mean:** - **Timescales** come from the spectral decomposition of the hydrograph (:class:`~hydroravens.HydrographSeparation`). Use them as starting points for calibration, not fixed values — the optimizer will refine them. - **b_soil** (the B–N estimate) is a data-driven starting point. Calibrated values for agricultural catchments are typically *b* ≈ 3–4. Do not fix it. - **b_karst = 2.203** is the theoretical Brutsaert & Nieber (1977) long-time baseflow value. Consider fixing it rather than calibrating — the data rarely constrain it independently. See :doc:`recession_analysis` for the theory. - **b_deep = 1.0** — the deep reservoir is effectively a slow linear store; the linear approximation is adequate and avoids adding a free parameter. Stage 2 — Write a Configuration File -------------------------------------- Create a YAML configuration file. Start from the snippet produced by ``pr.to_yaml_snippet()`` and fill in the remaining sections: .. code-block:: yaml timeseries: datafile: data/input.csv initial_conditions: water_reservoir_effective_depths__mm: - 45.2 # soil - 312.8 # karst - 18.4 # deep snowpack__mm_SWE: 0 catchment: drainage_basin_area__km2: 3800 evapotranspiration_method: ThorntwaiteChang2019 water_year_start_month: 10 general: spin_up_cycles: null # auto: ceil(τ_max / record_length) enforce_water_balance: global reservoirs: e_folding_residence_times__days: - 28.4 # soil - 847.3 # karst - 10000 # deep — placeholder; calibrate or fix exfiltration_fractions: - 0.6 # soil — placeholder; calibrate - 0.5 # karst — placeholder; calibrate - 1.0 # deep — all to stream maximum_effective_depths__mm: - .inf - .inf - .inf recession_exponents: - 2.841 # soil — from B–N prior; calibrate - 2.203 # karst — theoretical B–N; consider fixing - 1.0 # deep — linear snowmelt: PDD_melt_factor: 2.0 # mm SWE / °C / day; starting point fdd_threshold: .inf # °C·day; .inf disables frozen ground snow_insulation_k: 0.0 # mm⁻¹ SWE; 0 disables insulation modules: snowpack: true frozen_ground: true rain_on_snow: true direct_runoff: false dtr_fgi_decay: false et_water_stress: false et_reservoir_draw: true See :doc:`configuration` for a full reference of all available options. Stage 3 — Quick Check Before Calibrating ----------------------------------------- Run the model once with the prior starting points to verify the configuration is valid and the fit is in the right ballpark: .. code-block:: python from hydroravens import Buckets model = Buckets() model.initialize('config.yml') model.run() model.compute_NSE(verbose=True) model.plot() If the hydrograph shape is qualitatively reasonable (seasonal timing is correct, magnitudes are in the right range), proceed to calibration. **Diagnosing a poor Stage 3 fit:** .. list-table:: :widths: 40 55 :header-rows: 1 * - Symptom - Likely cause and fix * - Modeled peak timing shifted by weeks - Melt factor too high or low; soil τ too short or long. Adjust ``PDD_melt_factor`` and ``log__t_efold_soil``. * - Baseflow too high in summer - Intermediate or deep τ too short, or f_exfiltration too high for those layers. Extend timescales; reduce exfiltration fractions. * - Rising limb too slow, flashy peaks absent - Soil τ too long; try shorter timescale or higher recession exponent. * - Annual water balance far off (model wet or dry) - Check ``drainage_basin_area__km2`` and discharge units. Verify ``enforce_water_balance`` is set; check ET data for unit errors. * - NSE near 0 or negative - Check input data alignment — is precipitation spatially appropriate for this basin? Are dates in the correct format? Stage 4 — Calibrate -------------------- HydroRaVENS uses `Dakota `_ for calibration. The calibration workflow is driven by a ``params.yml`` file that specifies which parameters to calibrate and their bounds. Use the bounds from ``pr.log_t_efold_bounds`` as the starting point for ``log__t_efold_*`` parameters: .. code-block:: yaml parameters: log__t_efold_soil: lower: 0.953 # from pr.log_t_efold_bounds upper: 1.953 initial: 1.453 log__t_efold_karst: lower: 2.428 upper: 3.428 initial: 2.928 recession_b_soil: lower: 1.5 upper: 6.0 initial: 2.841 # from B–N prior After calibration, compare results using AIC to evaluate model structure. The AIC is lower for better-fitting models and penalises additional free parameters. A ΔAIC > 2 relative to a simpler model is required before the more complex structure is considered justified. See :doc:`calibration` for the full AIC guide and metric selection. Stage 5 — Evaluate ------------------- After calibration, inspect the best-fit parameters and metrics: .. code-block:: python from hydroravens import run_and_score result = run_and_score( 'config.yml', t_efold = [28.4, 847.3, 10000], f_to_discharge = [0.65, 0.48], recession_exponents = [3.4, 2.203, 1.0], enforce_water_balance = 'global', ) print(f"Score = {result.score:.3f}") # composite metric (KGE_logKGE here) print(f"AIC = {result.aic:.1f}") print(f"BFI = {result.bfi_mod:.3f} (obs: {result.bfi_obs:.3f})") Key metrics to examine: - **Score (KGE_logKGE)** — overall and low-flow performance combined. - **AIC** — penalises extra parameters; use to compare model structures. - **BFI** — check that modeled baseflow index matches observations. - **KGE_logFDC** — flow duration curve match; sensitive to reservoir partitioning. **Reporting mean residence time (MRT):** Raw e-folding times (τ) are not physically comparable across reservoirs or experiments when the recession exponent *b* differs. After obtaining best-fit parameters, compute MRT for each reservoir using :meth:`~hydroravens.Reservoir.mean_residence_time` with the long-term mean flux attributed to that layer as ``Q_ref``: .. code-block:: python buckets = result.buckets # Mean annual discharge attributed to each reservoir. # Approximate as: total mean Q × exfiltration-weighted fraction. # For a quick cross-experiment comparison, use mean total Q as Q_ref. import numpy as np Q_mean = buckets.hydrodata['Specific Discharge (modeled) [mm/day]'].mean() for i, res in enumerate(buckets.reservoirs): mrt = res.mean_residence_time(Q_ref=Q_mean) print(f"Reservoir {i}: τ={res.t_efold:.1f}d, b={res.recession_exponent:.3f}, " f"MRT={mrt:.1f}d (Q_ref={Q_mean:.3f} mm/d)") MRT should be reported alongside τ and *b* whenever comparing calibrated models across experiments or basins. τ alone is misleading when *b* > 1, because the optimizer can trade τ against *b* while holding MRT nearly constant. See :ref:`mean-residence-time` in :doc:`model_description` for the derivation and interpretation. See Also -------- - :doc:`configuration` — full YAML reference - :doc:`recession_analysis` — Brutsaert & Nieber theory and API - :doc:`calibration` — scoring functions, AIC, and metric details - :class:`~hydroravens.suggest_priors` — full API reference - :class:`~hydroravens.HydrographSeparation` — timescale decomposition