
Scoring

Proper scoring rules for posterior predictive evaluation.

trade_study.score(metric, predictions, truth, *, alpha=None, level=0.95)

Compute a scalar scoring rule.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `metric` | `str` | One of `"crps"`, `"wis"`, `"interval"`, `"energy"`, `"rmse"`, `"mae"`, `"coverage"`, `"brier"`. | *required* |
| `predictions` | `NDArray[floating[Any]]` | Model predictions (ensemble members, quantiles, etc.). | *required* |
| `truth` | `NDArray[floating[Any]]` | Known ground-truth values. | *required* |
| `alpha` | `float \| NDArray[floating[Any]] \| None` | Significance level for the interval-based scores (`"wis"`, `"interval"`). | `None` |
| `level` | `float` | Nominal coverage level for the `"coverage"` metric. | `0.95` |

Returns:

| Type | Description |
| --- | --- |
| `float` | Scalar score value. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the metric name is not recognized. |

Source code in `src/trade_study/_scoring.py`

```python
def score(
    metric: str,
    predictions: NDArray[np.floating[Any]],
    truth: NDArray[np.floating[Any]],
    *,
    alpha: float | NDArray[np.floating[Any]] | None = None,
    level: float = 0.95,
) -> float:
    """Compute a scalar scoring rule.

    Args:
        metric: One of "crps", "wis", "interval", "energy",
            "rmse", "mae", "coverage", "brier".
        predictions: Model predictions (ensemble members, quantiles, etc.).
        truth: Known ground truth values.
        alpha: Significance level for interval-based scores.
        level: Nominal coverage level for coverage metric.

    Returns:
        Scalar score value.

    Raises:
        ValueError: If the metric name is not recognized.
    """
    simple = {
        "crps": _crps,
        "energy": _energy,
        "brier": _brier,
        "rmse": _rmse,
        "mae": _mae,
    }
    if metric in simple:
        return simple[metric](predictions, truth)
    if metric == "wis":
        return _wis(predictions, truth, alpha=alpha)
    if metric == "interval":
        return _interval(predictions, truth, alpha=alpha)
    if metric == "coverage":
        return _coverage(predictions, truth, level=level)
    msg = f"Unknown metric: {metric!r}"
    raise ValueError(msg)
```
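The `"crps"` branch dispatches to the private `_crps` helper, whose internals are not shown here. As a hedged illustration only, the following is a minimal NumPy sketch of the empirical (ensemble) form of CRPS, CRPS(F, y) = E|X - y| - 0.5 E|X - X'| for X, X' drawn from the predictive ensemble; the name `crps_ensemble` is hypothetical and not part of `trade_study`.

```python
import numpy as np

def crps_ensemble(samples: np.ndarray, truth: np.ndarray) -> float:
    """Empirical CRPS averaged over observations.

    samples: ensemble predictions, shape (n_obs, n_samples).
    truth: observed values, shape (n_obs,).
    """
    # E|X - y|: mean absolute deviation of the ensemble from the truth.
    term1 = np.abs(samples - truth[:, None]).mean(axis=1)
    # 0.5 * E|X - X'|: mean pairwise spread within the ensemble.
    term2 = 0.5 * np.abs(samples[:, :, None] - samples[:, None, :]).mean(axis=(1, 2))
    return float((term1 - term2).mean())

# A degenerate ensemble equal to the truth scores exactly zero.
print(crps_ensemble(np.array([[1.0, 1.0]]), np.array([1.0])))  # 0.0
```

Lower is better: the first term rewards accuracy, the second corrects for ensemble spread so that hedging toward an over-wide ensemble is penalized.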

trade_study.coverage_curve(posteriors, truth, levels=None)

Compute empirical coverage across nominal levels.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `posteriors` | `NDArray[floating[Any]]` | Posterior samples, shape `(n_obs, n_samples)`. | *required* |
| `truth` | `NDArray[floating[Any]]` | True values, shape `(n_obs,)`. | *required* |
| `levels` | `NDArray[floating[Any]] \| None` | Nominal coverage levels (default: 50 evenly spaced levels from 0.05 to 0.99). | `None` |

Returns:

| Type | Description |
| --- | --- |
| `tuple[NDArray[floating[Any]], NDArray[floating[Any]]]` | Tuple of `(nominal_levels, empirical_coverage)`. |

Source code in `src/trade_study/_scoring.py`

```python
def coverage_curve(
    posteriors: NDArray[np.floating[Any]],
    truth: NDArray[np.floating[Any]],
    levels: NDArray[np.floating[Any]] | None = None,
) -> tuple[NDArray[np.floating[Any]], NDArray[np.floating[Any]]]:
    """Compute empirical coverage across nominal levels.

    Args:
        posteriors: Posterior samples, shape (n_obs, n_samples).
        truth: True values, shape (n_obs,).
        levels: Nominal coverage levels (default: 0.05 to 0.99).

    Returns:
        Tuple of (nominal_levels, empirical_coverage).
    """
    if levels is None:
        levels = np.linspace(0.05, 0.99, 50)
    empirical = np.array([
        _coverage(posteriors, truth, level=float(lv)) for lv in levels
    ])
    return levels, empirical
```
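The curve is built from repeated calls to the private `_coverage` helper. As a sketch only, assuming `_coverage` forms a central credible interval from the empirical quantiles of each row of `posteriors` (the name `empirical_coverage` below is hypothetical, not the library's internals):

```python
import numpy as np

def empirical_coverage(posteriors: np.ndarray, truth: np.ndarray, level: float) -> float:
    """Fraction of truths inside the central `level` credible interval.

    posteriors: samples, shape (n_obs, n_samples); truth: shape (n_obs,).
    """
    tail = (1.0 - level) / 2.0
    lo = np.quantile(posteriors, tail, axis=1)        # lower bound per observation
    hi = np.quantile(posteriors, 1.0 - tail, axis=1)  # upper bound per observation
    return float(np.mean((truth >= lo) & (truth <= hi)))

# For well-calibrated samples, empirical coverage tracks the nominal level.
rng = np.random.default_rng(0)
samples = rng.normal(size=(2000, 500))
print(empirical_coverage(samples, rng.normal(size=2000), 0.9))
```

Plotting `empirical_coverage` against the nominal levels, as `coverage_curve` returns them, gives the usual calibration diagnostic: a well-calibrated model lies on the diagonal, points below it indicate overconfidence.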