Building a Bayesian Position Player WAR Model

Published

June 27, 2026

Setup code

suppressPackageStartupMessages({
  library(cmdstanr)
  library(posterior)
  library(scales)
  library(ggrepel)
  library(gt)
  library(here)
  library(tidyverse)

  devtools::load_all()
})

season <- params$season
rpw <- fg_woba_weights(season)$rpw

pos_adj_per_600pa <- c(
  C = 12.5, SS = 7.5,
  `2B` = 2.5, `3B` = 2.5, CF = 2.5,
  LF = -7.5, RF = -7.5,
  `1B` = -12.5, DH = -17.5,
  OF = -7.5
)

The Goal

Position player WAR estimates how many wins a player contributed above a replacement-level baseline. Replacement level represents the freely-available minor league call-up a team could use instead.

Like our Pitcher WAR model, this model is fully Bayesian. The key advantage is uncertainty quantification: every player gets a full posterior distribution over WAR, with credible intervals that widen for players with few plate appearances.

Position player WAR has five components:

\[ \text{WAR} = \frac{\text{Batting} + \text{Fielding} + \text{Baserunning} + \text{Positional Adjustment} + \text{Replacement Runs}}{\text{Runs per Win}} \]

Component	What it measures	How we estimate it
Batting	Offensive value above/below average	Bayesian wOBA model (per-game logs)
Fielding	Defensive value above/below average	Bayesian shrinkage of DRS (framing runs for catchers)
Baserunning	Value on the bases beyond batting	Seasonal UBR + wSB (observed)
Positional Adj.	Difficulty of playing the position	Fixed constants (FanGraphs standard)
Replacement Runs	Credit for being available at all	~20 runs per 600 PA below average

The Data

Batting: per-game logs

The unit of observation for batting is a single game appearance. Each game contributes six count outcomes to the likelihood:

Symbol	Meaning
\(y_{HR}\)	Home runs
\(y_{3B}\)	Triples
\(y_{2B}\)	Doubles
\(y_{1B}\)	Singles (direct `1B` column)
\(y_{BB}\)	Unintentional walks (\(BB - IBB\))
\(y_{HBP}\)	Hit by pitch

Each game also records plate appearances (PA) as the exposure, the batter identity, the game date, and park factors for HR, 3B, 2B, and 1B.

Load and preview batting data

d_logs <- read_rds(here(str_glue('data/batter_logs_{season}.rds')))
d_logs |>
  slice_sample(n = 8) |>
  select(name, team, date, pa, n_hr, n_3b, n_2b, n_1b, n_bb, n_hbp) |>
  arrange(date) |>
  gt() |>
  tab_header(title = 'Sample game-log rows') |>
  fmt_date(columns = date, date_style = 'yMMMd')

name	team	date	pa	n_hr	n_3b	n_2b	n_1b	n_bb	n_hbp
Sample game-log rows
Trea Turner	PHI	Mar 26, 2026	4	0	0	0	2	0	0
Kyle Schwarber	PHI	Mar 30, 2026	5	0	0	0	0	1	1
Ramon Urias	STL	Apr 7, 2026	1	0	0	0	0	1	0
Zach Neto	LAA	Apr 13, 2026	5	0	0	0	1	1	0
Steven Kwan	CLE	Apr 14, 2026	5	0	0	0	0	0	0
Oneil Cruz	PIT	Apr 17, 2026	5	1	0	0	0	1	0
J.P. Crawford	SEA	May 25, 2026	5	1	0	0	0	0	0
Cole Young	SEA	Jun 8, 2026	5	0	0	0	1	0	0

The data covers all batters with at least 1 PA in the season, roughly 700 players. The first season fetch makes around 650 API calls to FanGraphs (one per batter) and is cached as a compressed .rds file.

wOBA linear weights

Not all hits are equal. A home run is worth more than a single. wOBA (weighted on-base average) assigns run-value weights to each event:

\[ \text{wOBA} = \frac{w_{BB} \cdot BB + w_{HBP} \cdot HBP + w_{1B} \cdot 1B + w_{2B} \cdot 2B + w_{3B} \cdot 3B + w_{HR} \cdot HR}{\text{PA} - \text{IBB}} \]

The weights come from linear-weight run expectancy calculations and are re-estimated each season by FanGraphs.

Fetch and display wOBA weights

woba_wts <- fg_woba_weights(season)
tibble(
  Event = c('Walk (unint.)', 'Hit by Pitch', 'Single', 'Double', 'Triple', 'Home Run'),
  Symbol = c('BB', 'HBP', '1B', '2B', '3B', 'HR'),
  Weight = c(
    woba_wts$w_bb, woba_wts$w_hbp, woba_wts$w_1b,
    woba_wts$w_2b, woba_wts$w_3b, woba_wts$w_hr
  )
) |>
  gt() |>
  tab_header(
    title = str_glue('{season} wOBA linear weights'),
    subtitle = 'Source: FanGraphs guts page'
  ) |>
  fmt_number(columns = Weight, decimals = 3) |>
  tab_footnote(
    footnote = str_glue(
      'wOBA scale = {round(woba_wts$woba_scale, 3)}; ',
      'league wOBA = {round(woba_wts$lg_woba, 3)}; ',
      'runs per win = {round(woba_wts$rpw, 2)}'
    )
  )

Event	Symbol	Weight
2026 wOBA linear weights
Source: FanGraphs guts page
Walk (unint.)	BB	0.700
Hit by Pitch	HBP	0.731
Single	1B	0.892
Double	2B	1.263
Triple	3B	1.598
Home Run	HR	2.052
wOBA scale = 1.239; league wOBA = 0.317; runs per win = 9.84

Fielding: seasonal DRS

DRS (Defensive Runs Saved, Baseball Info Solutions) measures how many runs above or below average a fielder saved. DRS is only available as a seasonal total (not per-game), so the fielding sub-model uses a different approach than the batting model.

We use DRS rather than UZR or FRP because it has a consistent methodology and column name from ~2003 to present — essential for future cross-year random-walk smoothing where UZR (discontinued after 2024) and FRP (introduced in 2025) cannot form a consistent series.

For catchers, we use framing runs (CFraming) from the FanGraphs fielding leaderboard. This measures the run value of converting borderline pitches into called strikes.

Baserunning: seasonal UBR + wSB

UBR (Ultimate Base Running) measures extra-base-advancement value on non-stolen-base plays. wSB (weighted stolen base runs) captures stolen base and caught stealing. We use FanGraphs seasonal values directly as point estimates.

The Batting Model

The batting model is the statistical core of position player WAR. It extends the pitcher FIP model from three components to six, using the same two-level hierarchy.

League baseline

For each of the six event types \(c \in \{HR, 3B, 2B, 1B, BB, HBP\}\), the league-average rate on day \(d\) follows a B-spline on the logit scale:

\[ \text{logit}\bigl(\lambda^{(c)}_d\bigr) = \sum_{k=1}^{K} B_{dk} \cdot \theta^{(c)}_k \]

where \(B_{dk}\) is the spline basis matrix (knots every 14 days) and \(\theta^{(c)}_k\) are the knot values. A random-walk prior anchors the first knot near the observed league rate and allows slow drift:

\[ \theta^{(c)}_1 \sim \mathcal{N}(\text{logit}(\hat\lambda^{(c)}),\; 0.3), \quad \theta^{(c)}_k \sim \mathcal{N}(\theta^{(c)}_{k-1},\; 0.1) \quad k \geq 2 \]

This captures real seasonal trends (e.g., a home run spike in warm weather) without overfitting to week-to-week noise.

Batter offsets

Each batter \(p\) has a scalar offset for each event type, estimated with partial pooling (non-centred parametrisation):

\[ \alpha^{(c)}_p = \sigma_c \cdot z^{(c)}_p, \quad z^{(c)}_p \sim \mathcal{N}(0,1), \quad \sigma_c \sim \mathcal{N}^+(0, 0.25) \]

The shared scale \(\sigma_c\) controls how much batter-to-batter variation the model finds for that component. When \(\sigma_c\) is small, all batters look alike and everyone is shrunk toward the league mean. When \(\sigma_c\) is large, individual batters diverge substantially.

Likelihood

The expected count for event type \(c\) in game \(g\) is:

\[ \mu^{(c)}_g = \text{PA}_g \cdot \text{logit}^{-1}\!\Bigl(\lambda^{(c)}_{d_g} + \alpha^{(c)}_{p_g}\Bigr) \cdot \text{PF}^{(c)}_g \]

where \(\text{PF}^{(c)}_g = \exp(\log(\text{pf}^{(c)}_g / 100))\) is the park factor for component \(c\) in game \(g\). HR, 3B, 2B, and 1B are park-adjusted using FanGraphs component park factors. BB and HBP are not park-adjusted: walk rate is primarily a pitcher-batter interaction, and HBP has no reliable park factor.

Each count follows a Negative Binomial distribution to accommodate overdispersion beyond Poisson. Real batters have slumps and hot streaks that inflate variance beyond what Poisson predicts:

\[ y^{(c)}_g \sim \text{NegBin}\bigl(\mu^{(c)}_g,\; \phi_c\bigr), \quad \phi_c \sim \text{Gamma}(3, 0.1) \]

Full Stan model

Code

// Bayesian position player batting WAR model (fWAR-style, wOBA-based)
//
// Models per-game counts of six wOBA event types (HR, 3B, 2B, 1B, BB, HBP)
// as Negative Binomial with a per-PA rate.
//
// Two-level hierarchy (mirrors pitcher_fip.stan):
//   1. League-level time-varying baseline (B-spline with random-walk prior)
//   2. Batter scalar offsets (partial pooling / shrinkage toward league avg)
//
// Park adjustments: HR, 3B, 2B, 1B are adjusted for park effects on the log
// scale (log(pf/100) added to log expected count).  BB and HBP are not
// park-adjusted — walk rate is determined by pitcher/batter interaction, not
// park dimensions; HBP has no reliable park factor.
//
// Generated quantities: per-batter wOBA and wRAA per PA.  Scale by actual
// PA in R post-processing to get season batting runs and WAR.

data {
  int<lower=1> P;        // batters
  int<lower=1> N;        // game-batter observations
  int<lower=1> D;        // days in season
  int<lower=2> K;        // spline basis columns

  array[N] int<lower=1, upper=P> batter;  // batter index per game
  array[N] int<lower=1, upper=D> day;     // day of season per game

  // Observed counts per game
  array[N] int<lower=0> n_hr;
  array[N] int<lower=0> n_3b;
  array[N] int<lower=0> n_2b;
  array[N] int<lower=0> n_1b;
  array[N] int<lower=0> n_bb;   // unintentional BB
  array[N] int<lower=0> n_hbp;
  array[N] int<lower=0> pa;     // plate appearances (exposure)

  // Park factors on log scale: log(pf / 100) per game-observation.
  // HR, 3B, 2B, 1B are park-adjusted; BB and HBP are not.
  vector[N] log_park_hr;
  vector[N] log_park_1b;
  vector[N] log_park_2b;
  vector[N] log_park_3b;

  matrix[D, K] splines;          // pre-computed B-spline basis (D x K)

  // Prior centres (logit of observed league rates per PA)
  real mu_hr_prior;
  real mu_3b_prior;
  real mu_2b_prior;
  real mu_1b_prior;
  real mu_bb_prior;
  real mu_hbp_prior;

  // wOBA linear weights (season-specific, from FanGraphs guts page)
  real w_hr;
  real w_3b;
  real w_2b;
  real w_1b;
  real w_bb;
  real w_hbp;
  real<lower=0> woba_scale;    // converts linear weights to wOBA scale (~1.15-1.20)
  real<lower=0> lg_woba;       // league-average wOBA

  // WAR constants
  real<lower=0> rpw;   // runs per win (from FanGraphs guts R/W)
  // Note: replacement level (~20 runs / 600 PA) is applied in R post-processing,
  // not in the Stan model.
}

parameters {
  // League-level spline knots for each wOBA component (time-varying baseline)
  vector[K] lg_hr_knots;
  vector[K] lg_3b_knots;
  vector[K] lg_2b_knots;
  vector[K] lg_1b_knots;
  vector[K] lg_bb_knots;
  vector[K] lg_hbp_knots;

  // Batter-level deviations from league baseline (non-centred)
  vector[P] z_hr;    real<lower=0> sigma_hr;
  vector[P] z_3b;    real<lower=0> sigma_3b;
  vector[P] z_2b;    real<lower=0> sigma_2b;
  vector[P] z_1b;    real<lower=0> sigma_1b;
  vector[P] z_bb;    real<lower=0> sigma_bb;
  vector[P] z_hbp;   real<lower=0> sigma_hbp;

  // Negative Binomial overdispersion (one per component)
  real<lower=0> phi_hr;
  real<lower=0> phi_3b;
  real<lower=0> phi_2b;
  real<lower=0> phi_1b;
  real<lower=0> phi_bb;
  real<lower=0> phi_hbp;
}

transformed parameters {
  // League-average logit rates for each day (D-vectors)
  vector[D] lg_logit_hr  = splines * lg_hr_knots;
  vector[D] lg_logit_3b  = splines * lg_3b_knots;
  vector[D] lg_logit_2b  = splines * lg_2b_knots;
  vector[D] lg_logit_1b  = splines * lg_1b_knots;
  vector[D] lg_logit_bb  = splines * lg_bb_knots;
  vector[D] lg_logit_hbp = splines * lg_hbp_knots;

  // Log expected counts per game: league baseline + batter offset + exposure + park
  // HR, 3B, 2B, 1B are park-adjusted; BB and HBP are not.
  vector[N] log_mu_hr  = log_inv_logit(lg_logit_hr[day]  + sigma_hr  * z_hr[batter])
                         + log(to_vector(pa)) + log_park_hr;
  vector[N] log_mu_3b  = log_inv_logit(lg_logit_3b[day]  + sigma_3b  * z_3b[batter])
                         + log(to_vector(pa)) + log_park_3b;
  vector[N] log_mu_2b  = log_inv_logit(lg_logit_2b[day]  + sigma_2b  * z_2b[batter])
                         + log(to_vector(pa)) + log_park_2b;
  vector[N] log_mu_1b  = log_inv_logit(lg_logit_1b[day]  + sigma_1b  * z_1b[batter])
                         + log(to_vector(pa)) + log_park_1b;
  vector[N] log_mu_bb  = log_inv_logit(lg_logit_bb[day]  + sigma_bb  * z_bb[batter])
                         + log(to_vector(pa));
  vector[N] log_mu_hbp = log_inv_logit(lg_logit_hbp[day] + sigma_hbp * z_hbp[batter])
                         + log(to_vector(pa));
}

model {
  // League baseline: random-walk prior on spline knots
  // First knot anchored near observed league rate; subsequent knots drift slowly
  lg_hr_knots[1]  ~ normal(mu_hr_prior,  0.3);
  lg_3b_knots[1]  ~ normal(mu_3b_prior,  0.3);
  lg_2b_knots[1]  ~ normal(mu_2b_prior,  0.3);
  lg_1b_knots[1]  ~ normal(mu_1b_prior,  0.3);
  lg_bb_knots[1]  ~ normal(mu_bb_prior,  0.3);
  lg_hbp_knots[1] ~ normal(mu_hbp_prior, 0.3);
  for (k in 2:K) {
    lg_hr_knots[k]  ~ normal(lg_hr_knots[k-1],  0.1);
    lg_3b_knots[k]  ~ normal(lg_3b_knots[k-1],  0.1);
    lg_2b_knots[k]  ~ normal(lg_2b_knots[k-1],  0.1);
    lg_1b_knots[k]  ~ normal(lg_1b_knots[k-1],  0.1);
    lg_bb_knots[k]  ~ normal(lg_bb_knots[k-1],  0.1);
    lg_hbp_knots[k] ~ normal(lg_hbp_knots[k-1], 0.1);
  }

  // Batter offsets: partial pooling (non-centred parametrisation)
  z_hr  ~ std_normal();   sigma_hr  ~ normal(0, 0.25);
  z_3b  ~ std_normal();   sigma_3b  ~ normal(0, 0.25);
  z_2b  ~ std_normal();   sigma_2b  ~ normal(0, 0.25);
  z_1b  ~ std_normal();   sigma_1b  ~ normal(0, 0.25);
  z_bb  ~ std_normal();   sigma_bb  ~ normal(0, 0.25);
  z_hbp ~ std_normal();   sigma_hbp ~ normal(0, 0.25);

  // Overdispersion: weakly informative
  phi_hr  ~ gamma(3, 0.1);
  phi_3b  ~ gamma(3, 0.1);
  phi_2b  ~ gamma(3, 0.1);
  phi_1b  ~ gamma(3, 0.1);
  phi_bb  ~ gamma(3, 0.1);
  phi_hbp ~ gamma(3, 0.1);

  // Likelihoods
  n_hr  ~ neg_binomial_2_log(log_mu_hr,  phi_hr);
  n_3b  ~ neg_binomial_2_log(log_mu_3b,  phi_3b);
  n_2b  ~ neg_binomial_2_log(log_mu_2b,  phi_2b);
  n_1b  ~ neg_binomial_2_log(log_mu_1b,  phi_1b);
  n_bb  ~ neg_binomial_2_log(log_mu_bb,  phi_bb);
  n_hbp ~ neg_binomial_2_log(log_mu_hbp, phi_hbp);
}

generated quantities {
  // Season-average league logit rates (used as context for each batter)
  real avg_lg_hr  = mean(lg_logit_hr);
  real avg_lg_3b  = mean(lg_logit_3b);
  real avg_lg_2b  = mean(lg_logit_2b);
  real avg_lg_1b  = mean(lg_logit_1b);
  real avg_lg_bb  = mean(lg_logit_bb);
  real avg_lg_hbp = mean(lg_logit_hbp);

  // True per-PA rates for each batter (season-average, park-neutral context)
  // The park factors were absorbed into the likelihood; the posterior z_* values
  // already reflect park-adjusted talent.  inv_logit(league_avg_logit + offset)
  // gives the batter's park-neutral per-PA probability.
  vector[P] rate_hr  = inv_logit(avg_lg_hr  + sigma_hr  * z_hr);
  vector[P] rate_3b  = inv_logit(avg_lg_3b  + sigma_3b  * z_3b);
  vector[P] rate_2b  = inv_logit(avg_lg_2b  + sigma_2b  * z_2b);
  vector[P] rate_1b  = inv_logit(avg_lg_1b  + sigma_1b  * z_1b);
  vector[P] rate_bb  = inv_logit(avg_lg_bb  + sigma_bb  * z_bb);
  vector[P] rate_hbp = inv_logit(avg_lg_hbp + sigma_hbp * z_hbp);

  // wOBA per PA: sum of (FanGraphs weight × per-PA rate).
  // The FanGraphs w_* weights (from the guts page) are already on the wOBA
  // scale — they equal raw_linear_weight × woba_scale.  So this sum gives
  // wOBA directly; no further woba_scale division is needed here.
  vector[P] woba_per_pa = w_hr  * rate_hr  +
                          w_3b  * rate_3b  +
                          w_2b  * rate_2b  +
                          w_1b  * rate_1b  +
                          w_bb  * rate_bb  +
                          w_hbp * rate_hbp;

  // wRAA per PA: (wOBA − lgwOBA) / woba_scale  [standard FanGraphs formula]
  // woba_scale converts the wOBA gap from OBP-scale to run-value scale.
  // Final WAR = (wraa_per_pa × PA + repl_runs) / rpw  (done in R)
  vector[P] wraa_per_pa = (woba_per_pa - lg_woba) / woba_scale;
}

From wOBA to Batting WAR

After sampling, we compute per-batter wOBA and WAR in R from the posterior draws, following the same post-processing pattern as pitcher WAR.

wOBA per PA

In the generated quantities block, each posterior draw gives true per-PA rates \(r^{(c)}_p\) for all six event types:

\[ r^{(c)}_p = \text{logit}^{-1}\!\bigl(\bar\lambda^{(c)} + \alpha^{(c)}_p\bigr) \]

where \(\bar\lambda^{(c)}\) is the season-average league logit rate. Then:

\[ \text{wOBA}_p = w_{HR}\,r^{HR}_p + w_{3B}\,r^{3B}_p + w_{2B}\,r^{2B}_p + w_{1B}\,r^{1B}_p + w_{BB}\,r^{BB}_p + w_{HBP}\,r^{HBP}_p \]

The FanGraphs weights \(w_c\) are already on the wOBA scale (each equals the raw linear weight \(\times\) wOBA scale), so the sum gives wOBA directly.

wRAA per PA

Batting runs above average per plate appearance:

\[ \text{wRAA/PA}_p = \frac{\text{wOBA}_p - \overline{\text{wOBA}}_\text{lg}}{\text{wOBA scale}} \]

Batting WAR

Batting WAR post-processing

draws_bat <- read_rds(here(str_glue('data/fit_batting_{season}.rds')))
d_logs <- read_rds(here(str_glue('data/batter_logs_{season}.rds')))
woba_wts <- fg_woba_weights(season)

batting_stan <- prep_batting_stan(
  d_logs,
  tryCatch(fg_batting_park(season), error = \(e) NULL),
  woba_wts
)

d_pa <- d_logs |>
  group_by(player_id, name) |>
  summarise(pa = sum(pa), team = last(team), .groups = 'drop')

1wraa_mat <- as_draws_matrix(
  draws_bat[, , str_starts(variables(draws_bat), 'wraa_per_pa')]
)

2pa_vec <- d_pa$pa[match(batting_stan$batter_levels, d_pa$player_id)]
3batting_runs_mat <- sweep(wraa_mat, 2, pa_vec, '*')
4repl_mat <- outer(rep(1L, nrow(batting_runs_mat)), 20 * pa_vec / 600)
5batting_war_mat <- (batting_runs_mat + repl_mat) / woba_wts$rpw

1: Pull the wraa_per_pa draws: a matrix of [draws × batters].
2: Align PA totals to the Stan batter-index ordering.
3: Multiply each batter’s wRAA/PA by their actual PA to get season wRAA.
4: Add replacement-level credit: ~20 runs below average per 600 PA.
5: Divide by runs-per-win to convert to WAR scale.

The replacement level of 20 runs per 600 PA reflects that a bench player or call-up typically bats around 20 runs below the league average. Every PA from an above-replacement player is that much more valuable than what a team could freely get.

Defense: DRS and Bayesian Shrinkage

The DRS reliability problem

DRS is noisy at the single-season level. Its year-to-year correlation for a single season is roughly \(r \approx 0.2\)–\(0.3\), somewhat lower than UZR. A player who appears to save 15 runs in one year might truly be a 5-run saver, or even a −2-run saver.

The Bayesian solution is to treat each season’s DRS as a noisy measurement of the player’s true defensive talent, with noise that decreases as playing time increases.

The measurement error model

For non-catcher fielder \(p\) at position \(j\), the model uses a rate parametrisation: the prior is placed on the player’s true defensive rate per full season, and the actual run contribution scales with innings played.

\[ \tau_p \sim \mathcal{N}(0,\; \sigma^{(j)}_\text{talent}) \]

\[ \text{DRS}_p \sim \mathcal{N}\!\Bigl(\tau_p \cdot \frac{\text{inn}_p}{1350},\; \frac{\sigma^{(j)}_\text{meas}}{\sqrt{\text{inn}_p / 1350}}\Bigr) \]

where \(\tau_p\) is the true rate (runs per full season), \(\sigma^{(j)}_\text{talent}\) is the SD of talent at position \(j\), and \(\sigma^{(j)}_\text{meas}\) is the measurement noise at full exposure. A player with 200 innings is expected to contribute ~15% of a full season’s worth of runs; the model does not need to learn this from data — it is built into the mean of the likelihood.

Why sigma_meas cannot be estimated from a single season

There is a structural identification problem: the model has one latent parameter \(z_p\) per player. The sampler can always explain every player’s observed DRS by setting \(z_p = \text{obs\_drs}_p / (\sigma_\text{talent} \cdot v_p)\) — i.e., treating observed DRS as exact truth — and then driving \(\sigma_\text{meas} \to 0\). This degeneracy holds regardless of parametrisation or prior shape: with ~500 players all pulling \(\sigma_\text{meas}\) toward 0, even a strong lognormal prior is overwhelmed.

Resolving \(\sigma_\text{meas}\) from \(\sigma_\text{talent}\) requires repeated measurements of the same players — either across seasons or within a season via split samples. That is the natural setting for a multi-season random-walk model, deferred to a future version.

Fixing sigma_meas via known reliability (empirical Bayes)

Since \(\sigma_\text{meas}\) cannot be estimated from one season, we fix its ratio to \(\sigma_\text{talent}\) using the known DRS year-to-year reliability \(r\):

\[ r = \frac{\sigma^2_\text{talent}}{\sigma^2_\text{talent} + \sigma^2_\text{meas}} \quad\Longrightarrow\quad \frac{\sigma_\text{meas}}{\sigma_\text{talent}} = \sqrt{\frac{1-r}{r}} =: \rho \]

With \(r_\text{DRS} \approx 0.40\), \(\rho \approx 1.22\). Only \(\sigma^{(j)}_\text{talent}\) is estimated from the current season’s cross-sectional data; \(\sigma^{(j)}_\text{meas} = \rho \cdot \sigma^{(j)}_\text{talent}\) is derived. This is standard empirical Bayes: substituting external knowledge for a parameter the data cannot pin down.

The implied shrinkage for a player with \(v = \text{inn}/1350\) is:

\[ \mathbb{E}[\tau_p \mid \text{DRS}_p] = \text{DRS}_p \cdot \frac{r}{r + (1-r)/v} \]

Innings	\(v\)	Shrinkage factor	Interpretation
1350 (full)	1.00	\(r = 0.40\)	Full-season player: 40% of observed DRS is credited
675 (half)	0.50	0.25	Half-season: 25% credited
200	0.15	0.09	Part-season: 9% credited, strongly pooled toward 0

A player with 1350 innings gets noise \(\sigma^{(j)}_\text{meas}\); a player with 338 innings gets noise \(2 \times \sigma^{(j)}_\text{meas}\), reflecting four times less information.

Shrinkage in action

Fielding shrinkage plot

d_field <- read_rds(here(str_glue('data/fielding_{season}.rds')))
d_framing <- read_rds(here(str_glue('data/framing_{season}.rds')))
draws_fld <- read_rds(here(str_glue('data/fit_fielding_{season}.rds')))
fld_stan <- prep_fielding_stan(d_field, d_framing)

drs_post_med <- apply(
  as_draws_matrix(draws_fld[, , str_starts(variables(draws_fld), 'true_drs')]),
  2, median
)

tibble(
  player_id = fld_stan$fielder_levels,
  drs_post  = drs_post_med
) |>
  left_join(fld_stan$d_field_agg, by = 'player_id') |>
  ggplot(aes(drs, drs_post)) +
  geom_abline(slope = 1, intercept = 0, color = '#888888', linewidth = 0.5) +
  geom_hline(yintercept = 0, color = '#e6550d', linewidth = 0.4, lty = 'dotted') +
  geom_point(aes(size = inn), alpha = 0.4, color = '#31a354') +
  geom_text_repel(
    data = ~ filter(.x, inn < 400 | abs(drs - drs_post) > 8),
    aes(label = player_id),
    size = 2.6,
    max.overlaps = 12,
    seed = 42
  ) +
  scale_size_continuous(range = c(0.5, 4), name = 'Innings') +
  labs(
    x = 'Observed DRS (runs)',
    y = 'Posterior median true DRS (runs)',
    title = 'Fielding shrinkage: raw DRS vs. Bayesian estimate',
    subtitle = 'Dotted line: league average (0 runs). Points sized by innings played.'
  ) +
  theme_bw(base_size = 11)

Players with few innings are pulled strongly toward zero. Players with a full season of data stay close to their observed DRS. The model learns \(\sigma^{(j)}_\text{meas}\) and \(\sigma^{(j)}_\text{talent}\) from the entire league, so the shrinkage is calibrated rather than arbitrary.

Catcher Framing

Catchers have a unique defensive skill: influencing umpires to call borderline pitches as strikes. Framing runs (CFraming from FanGraphs) measure this directly as a run value.

Elite framers can save 15–20 runs per season, comparable to outstanding infield defense. The framing model mirrors the DRS model: observed framing runs equal true talent plus noise, shrinking toward zero for catchers with few games caught.

Catcher framing shrinkage plot

if (length(fld_stan$catcher_levels) > 0) {
  frame_post_med <- apply(
    as_draws_matrix(draws_fld[, , str_starts(variables(draws_fld), 'true_framing')]),
    2, median
  )

  tibble(
    player_id = fld_stan$catcher_levels,
    framing_post = frame_post_med
  ) |>
    left_join(d_framing, by = 'player_id') |>
    ggplot(aes(framing_runs, framing_post)) +
    geom_abline(slope = 1, intercept = 0, color = '#888888', linewidth = 0.5) +
    geom_hline(yintercept = 0, color = '#e6550d', linewidth = 0.4, lty = 'dotted') +
    geom_point(aes(size = games_caught), alpha = 0.5, color = '#756bb1') +
    geom_text_repel(
      data = ~ filter(.x, games_caught < 40 | abs(framing_runs - framing_post) > 5),
      aes(label = name),
      size = 2.8,
      max.overlaps = 12,
      seed = 42
    ) +
    scale_size_continuous(range = c(1, 4), name = 'Games caught') +
    labs(
      x = 'Observed framing runs',
      y = 'Posterior median true framing (runs)',
      title = 'Catcher framing shrinkage',
      subtitle = 'Dotted line: league average (0 runs)'
    ) +
    theme_bw(base_size = 11)
} else {
  cat('No catcher framing data available for this season.')
}

Baserunning

Total baserunning value is the sum of two components:

Component	Measures
UBR	Extra bases taken on hits, outs on the bases, other advancement decisions
wSB	Value of stolen base attempts: \(\text{wSB} = 0.2 \times SB - 0.4 \times CS\) (approximately)

We use the FanGraphs seasonal values directly as point estimates. These are already computed using run-expectancy linear weights, so they’re on the same scale as batting runs and DRS.

Baserunning values are smaller than batting or fielding, typically ranging from −5 to +5 runs, and their year-to-year reliability is moderate. An additional layer of Bayesian shrinkage is possible but low-priority for this model version.

Positional Adjustments

Not all positions require equal skill. A shortstop who bats identically to a first baseman is more valuable, because the shortstop provides defense the team could not easily replace.

We apply a fixed positional adjustment per 600 PA, reflecting the offensive opportunity cost of playing each position:

Positional adjustment table

tibble(
  Position = c(
    'Catcher', 'Shortstop', 'Second Base', 'Third Base', 'Center Field',
    'Left/Right Field', 'First Base', 'Designated Hitter'
  ),
  Abbrev = c('C', 'SS', '2B', '3B', 'CF', 'LF/RF', '1B', 'DH'),
  `Runs/600PA` = c(12.5, 7.5, 2.5, 2.5, 2.5, -7.5, -12.5, -17.5)
) |>
  gt() |>
  tab_header(
    title = 'Positional adjustments (FanGraphs standard)',
    subtitle = 'Runs above average per 600 plate appearances'
  ) |>
  fmt_number(columns = `Runs/600PA`, decimals = 1) |>
  data_color(
    columns = `Runs/600PA`,
    fn = col_numeric(c('#d73027', '#fee090', '#74add1'), domain = c(-18, 13))
  )

Position	Abbrev	Runs/600PA
Positional adjustments (FanGraphs standard)
Runs above average per 600 plate appearances
Catcher	C	12.5
Shortstop	SS	7.5
Second Base	2B	2.5
Third Base	3B	2.5
Center Field	CF	2.5
Left/Right Field	LF/RF	−7.5
First Base	1B	−12.5
Designated Hitter	DH	−17.5

These constants are well-established and would require multi-decade data to estimate reliably from scratch. The positional adjustment is scaled by actual PA / 600, so a catcher who played 400 PA receives \(12.5 \times (400/600) \approx 8.3\) runs of positional credit.

Putting It Together

WAR aggregation

All components are combined in R from posterior draws:

WAR aggregation (abridged)

# This is the same code as in position_war.qmd post-process chunk.
# Shown here for illustration with a minimal example.

# 1. Batting WAR from posterior wraa_per_pa draws
# war_mat[i, p] = posterior draw i, batter p
batting_war_mat_demo <- (sweep(wraa_mat, 2, pa_vec, '*') + # wRAA × PA
  outer(rep(1, nrow(wraa_mat)), 20 * pa_vec / 600)) / # + replacement
  woba_wts$rpw # ÷ RPW

# 2. Point estimates for fielding, baserunning, positional
# (computed similarly from fielding draws and seasonal data)

# 3. Total WAR combines all components; CI spans batting + fielding uncertainty
d_war <- tibble(
  player_id = batting_stan$batter_levels,
  batting_war_med = apply(batting_war_mat_demo, 2, median),
  batting_war_lo = apply(batting_war_mat_demo, 2, quantile, 0.10),
  batting_war_hi = apply(batting_war_mat_demo, 2, quantile, 0.90)
) |>
  left_join(d_pa, by = 'player_id') |>
  arrange(desc(batting_war_med))

cat('Top 5 by batting WAR contribution:\n')

Top 5 by batting WAR contribution:

WAR aggregation (abridged)

d_war |>
  slice_head(n = 5) |>
  select(name, pa, batting_war_med, batting_war_lo, batting_war_hi) |>
  gt() |>
  fmt_number(columns = c(batting_war_med, batting_war_lo, batting_war_hi), decimals = 1) |>
  cols_label(batting_war_med = 'Bat WAR', batting_war_lo = '10th', batting_war_hi = '90th')

name	pa	Bat WAR	10th	90th
Kyle Schwarber	343	3.2	2.2	4.3
Yordan Alvarez	358	3.1	2.1	4.2
Nick Kurtz	366	3.0	2.0	4.0
James Wood	382	2.4	1.6	3.5
Ben Rice	325	2.3	1.5	3.3

Comparison with FanGraphs fWAR

Our Bayesian WAR and FanGraphs fWAR answer the same question but with different methods. Key differences:

Aspect	This model	FanGraphs fWAR
Batting estimation	Hierarchical Bayesian + partial pooling	Direct wOBA calculation
Defense estimation	Gaussian shrinkage model on DRS (absolute-value parametrisation)	Regressed UZR/FRP (3-year average)
Uncertainty	Full posterior distributions	Point estimates
Low-PA players	Heavily shrunk toward mean	Minimum PA threshold applied
Park adjustment	HR, 1B, 2B, 3B component factors on log scale	Separate multi-factor park adjustment

Correlation with FanGraphs fWAR

# Retrieve FanGraphs fWAR from the batter leaders response
# (the WAR column is available in the same API call as batter IDs)
fg_war <- tryCatch(
  fg_batter_ids(season) |>
    transmute(player_id = playerid, fg_war = WAR),
  error = function(e) NULL
)

if (!is.null(fg_war)) {
  d_compare <- d_war |>
    left_join(fg_war, by = 'player_id') |>
    filter(!is.na(fg_war), pa >= 100)

  r_val <- round(cor(d_compare$batting_war_med, d_compare$fg_war, use = 'complete.obs'), 3)

  d_compare |>
    ggplot(aes(fg_war, batting_war_med)) +
    geom_abline(slope = 1, intercept = 0, color = '#888888', linewidth = 0.5, lty = 'dashed') +
    geom_point(aes(size = pa), alpha = 0.4, color = '#3182bd') +
    geom_smooth(method = 'lm', se = FALSE, color = '#e6550d', linewidth = 0.8) +
    geom_text_repel(
      data = ~ slice_max(.x, abs(fg_war - batting_war_med), n = 15),
      aes(label = name),
      size = 2.6,
      seed = 42
    ) +
    scale_size_continuous(range = c(0.5, 4), name = 'PA') +
    labs(
      x = 'FanGraphs fWAR',
      y = 'Bayesian Batting WAR (posterior median)',
      title = str_glue('{season} Bayesian vs. FanGraphs batting WAR'),
      subtitle = str_glue('r = {r_val} | PA \u2265 100 | Dashed: y = x')
    ) +
    theme_bw(base_size = 11)
} else {
  cat('FanGraphs fWAR data not available for comparison.')
}

The Bayesian estimates and fWAR should correlate strongly (target: \(r > 0.85\)). Large outliers are typically low-PA players where Bayesian shrinkage pulls the estimate toward zero, while fWAR uses the raw observed value.

Validation

Posterior uncertainty decreases with PA

A well-functioning model should have narrower credible intervals for players with more data.

Code

d_war |>
  filter(pa >= 50) |>
  mutate(ci_width = batting_war_hi - batting_war_lo) |>
  ggplot(aes(pa, ci_width)) +
  geom_point(alpha = 0.4, size = 1.2, color = '#3182bd') +
  geom_smooth(method = 'loess', se = FALSE, color = '#e6550d', linewidth = 0.8) +
  labs(
    x = 'PA',
    y = '80% CI width (batting WAR)',
    title = 'Posterior uncertainty decreases with plate appearances'
  ) +
  theme_bw(base_size = 11)

Total WAR sanity check

Position players account for roughly 55–60% of total WAR. With pitchers expected to generate 150–200 WAR per full season, position players should generate around 500–650 WAR.

Code

full_war <- read_rds(here(str_glue('data/fit_batting_{season}.rds')))
d_all_pa <- read_rds(here(str_glue('data/batter_logs_{season}.rds'))) |>
  group_by(player_id) |>
  summarise(pa = sum(pa), .groups = 'drop')

season_scale <- min(1, (as.integer(
  max(read_rds(here(str_glue('data/batter_logs_{season}.rds')))$date) -
    min(read_rds(here(str_glue('data/batter_logs_{season}.rds')))$date)
) + 1) / 186)

cat(str_glue(
  'Season scale factor: {round(season_scale, 3)}\n',
  'Expected batting + other WAR: ~{round(500 * season_scale)}\u2013{round(650 * season_scale)}\n'
))

Season scale factor: 0.5
Expected batting + other WAR: ~250–325

wOBA league average check

The league wOBA computed from game logs should match the FanGraphs published value within 0.002. A larger discrepancy indicates a data pipeline issue.

Code

woba_wts <- fg_woba_weights(season)
d_logs <- read_rds(here(str_glue('data/batter_logs_{season}.rds')))

lg_totals <- d_logs |>
  summarise(across(c(pa, n_hr, n_3b, n_2b, n_1b, n_bb, n_hbp), sum))

computed_woba <- with(
  lg_totals,
  (woba_wts$w_hr * n_hr + woba_wts$w_3b * n_3b + woba_wts$w_2b * n_2b +
    woba_wts$w_1b * n_1b + woba_wts$w_bb * n_bb + woba_wts$w_hbp * n_hbp) / pa
)

diff <- abs(computed_woba - woba_wts$lg_woba)
flag <- if (diff > 0.002) '\u26a0 exceeds tolerance' else '\u2713 within tolerance'
cat(str_glue(
  'Computed lg wOBA: {round(computed_woba, 4)}\n',
  'FanGraphs lg wOBA: {round(woba_wts$lg_woba, 4)}\n',
  'Difference: {round(diff, 4)} {flag}\n'
))

Computed lg wOBA: 0.3151
FanGraphs lg wOBA: 0.317
Difference: 0.0019 ✓ within tolerance