Technical Validation Report — FM-01 Production Engine

Model: FM-01 Production Engine

Version: v2

Validation Run: 2026-04-23

Validator: David DeLissio (via MODEL_VALIDATION_SKILL v2.4)

MC_ Source: /analytics/david/models/MC_FM01.md

Home Folder: C:\Users\Grow\Documents\GitHub\datahub\forecasting\

Verdict: ⚠ CONDITIONALLY VALIDATED (monitoring)

1. Model Classification

Primary Type

Rules Engine (LOGIC_ENGINE). Five-layer fixed-sequence pipeline (historical pull → age decay → top-down reconciliation → overflow redistribution → format allocation). Parameters are consumed from config at runtime, not fitted; type triage tree stopped at Q4.

Secondary Type

Statistical Model (PARAMETRIC). Upstream build_pooled_decay_curves() fits category-age median and stdev from fact_orders history with Bayesian shrinkage blending. The parametric summaries feed the rules engine but do not define its core mechanism. Treated as a distinct component for interface validation under Section 8 multi-type orchestration.

Meta Tag Matrix

Tag	Value	Evidence
UNSUPERVISED	FALSE	Model consumes labeled historical demand; pooled curve is a fitted summary against observed decay transitions.
CONFIG_DRIVEN	TRUE	Four per-season YAML configs; every mechanical choice (growth_target, elasticity_weights, max_decay, format_matrix, newsvendor_params) is a config field. Changes apply without code edits.
EXOGENOUS_INPUT	TRUE	growth_target (base/bear/bull), new_launch.raw_estimate, decay_rates.overrides all injected by Dan and Dave. These are the primary drivers, not derived from history alone.
CONSTRAINT_SYSTEM	TRUE	format_matrix hard caps (e.g., Coastal Tide 5oz=4,000; 6.5oz=98; 3-Wick=344); MIN_DEMAND_THRESHOLD=100; MAX_YOY_RATE=2.0; 2oz Pack fixed-1000 allocation.
ALLOCATION_ENGINE	TRUE	`compute_format_shares()` distributes fragrance totals across formats using prior-year share + caps + overflow; pack decomposition (Amazon 3-Pack → component SKUs via pack_def).
SCALING_LAYER	TRUE	Top-down growth_target scalar applied multiplicatively to reconciled bottom-up totals. Dominant: ~33% of Fall 2026 / ~45% of Summer 2026 grand_target is scaling uplift above raw bottom-up.
UPSTREAM_STAT_MODEL	TRUE	`build_pooled_decay_curves()` is a fitted summary feeding `get_decay_rate()`; pooled-curve stdev drives `get_prediction_interval()`.
POST_PROCESSING	TRUE	`apply_diffuser_overlay.py` consumes `forecast_output.json` and emits `forecast_output_with_diffuser.json` (Fall 2026 only, additive Diffuser format mirrored from Car Freshener units).
BRANCHING_LOGIC	TRUE (partial)	Format allocation branches on: capped vs uncapped; fixed allocation (2oz) vs demand-driven; Y1 launch (proxy fragrance) vs returning (own-share); format_share_overrides (manual).
DATA_QUALITY_FLAG	TRUE	Sea Salt Neroli / Pomelo stockout contamination in 2025 decay signal; null margin_per_unit on 6/12 Fall 2026 SKU rows (3-Wick + 2oz Packs); aggregate channel treatment conceals Shopify/Amazon/Faire-specific lead time.
MANUAL_PROCESS	TRUE	Config YAML edits per season are manual; decay_overrides set by Dave; newsvendor_params sourced from Nikita's Cu/Co analysis; launch-date reconciliation from `Launch_Dates.md`.
PNL_IMPACT	TRUE	Drives production unit commits (3-wick pours, raw-material orders, 1–3 month shipping lead time), and Dan's revenue/cash planning band. A miss prints as carrying cost or stockout.

Applicable modules and overlays: LOGIC_ENGINE_MODULE (primary), PARAMETRIC_MODULE (secondary, upstream), plus conditional overlays for SCALING_LAYER, ALLOCATION_ENGINE, CONSTRAINT_SYSTEM, CONFIG_DRIVEN, UPSTREAM_STAT_MODEL, POST_PROCESSING, BRANCHING_LOGIC, DATA_QUALITY_FLAG, MANUAL_PROCESS, PNL_IMPACT.

2. Assumption Audit

2.1 Assumption Risk Matrix (Section 4.1)

#	Assumption	Reasonableness	Sensitivity	Failure Direction	Net Risk
A1	Growth target (11.5% Fall / 28.5% Summer) achievable with current ad + channel strategy	MARGINAL: Q1 2026 units +5.8% YoY; demand signals 67% supported (PARTIALLY SUPPORTED); Klaviyo is the counter-signal	HIGH (>20% on grand_target)	DANGEROUS in both directions: overforecast (carrying cost), or underforecast if set at bear (stockout on hot fragrance)	H
A2	Pooled decay curve is stationary — category-age medians built from 2024–2025 generalize to 2026	REASONABLE (mild) — only 2 transitions observed per age bucket; Golden Grove +177% flagged as outlier; n=23 for age=1 pool	MEDIUM (5–20%)	DANGEROUS if curve is biased steep (underforecast recovered fragrances)	H
A3	Zero-demand periods treated as organic decay (no stockout imputation); `max_decay: none` removes floor	MARGINAL — Sea Salt Neroli (-76%) and Pomelo (-79%) 2025 declines known to coincide with supply issues; pool includes these	MEDIUM-HIGH	DANGEROUS — propagates supply-induced suppression as real demand decline; systematically understates returning demand	H
A4	Bayesian shrinkage with volume-weight is correctly calibrated (high-volume trust own; low-volume pull to pool)	REASONABLE — consistent with standard empirical Bayes; shrinkage weight not independently backtested	MEDIUM	Fails safely — shrinkage is conservative in both directions	M
A5	Elasticity weights (Y1:1.5 … Y5+:0.5) portable across seasons and lineup compositions	MARGINAL — no per-season fit; same weights Summer / Spring / Fall	LOW-MEDIUM	Symmetric — small shift across fragrances within grand_total	M
A6	New-launch estimate (mean of Y1 proxies × 1.066 market_growth_factor) is unbiased	MARGINAL: only Cabana + Coconut Pineapple as Summer proxies; market_growth_factor 1.066 is not documented upstream	HIGH (>20%) for Summer 2026 (Coconut Soleil = 9,609 raw)	DANGEROUS if biased high (overcommit on unproven launch)	H
A7	Cannibalization between returning and new fragrances is ZERO (Improvement #2 deferred)	UNREASONABLE for Summer 2026 — Cabana 2025 launch displaced SSN/Pomelo/CP volume; RTF §SSN/Pomelo Note confirms	MEDIUM-HIGH	DANGEROUS — overestimates total portfolio demand by treating displacement as additive	H
A8	Pooled decay is Normal-distributed (Z_SCORES 1.04 / 1.28 / 1.645 / 1.96 for 70/80/90/95%)	REASONABLE (mild) — not tested against sample skew/kurtosis; n=23 too small for rigorous normality test	LOW-MEDIUM (CI width)	Symmetric — CI under/overstates both tails	L
A9	fact_orders Amazon pack decomposition has already happened upstream (engine SUMs without re-decomposing)	REASONABLE: confirmed in FORECASTING_RULES.md §Amazon Translation; ETL ownership established	LOW (bounded)	Safe: if ETL drifts, engine output visibly diverges from pack_def	L
A10	Channel aggregation (Shopify + Amazon + Faire) is acceptable for production commits	MARGINAL — Amazon FBA and Shopify 3PL have different lead times; Nikita's PO cycle may need channel split	LOW operationally (not on grand_target)	Operational risk — can produce correct totals that are wrong by channel allocation	M
A11	Format shares (observed prior-year or proxy) represent steady-state preference	MARGINAL — Ginger Pumpkin 2025 3-Wick production issue triggered manual override to lineup average	LOW-MEDIUM	Symmetric	L
A12	Seasonal window is stable; launch-date drift > 30 days handled exogenously	MARGINAL: Holiday 2025 drifted +37 days vs 2024; engine does not normalize by selling days	MEDIUM (Holiday specifically)	DANGEROUS: a short season compared against a longer prior season makes decay look steeper than it is	M
A13	Cu/Co from Summer 2026 config carries forward to Fall 2026 (per Gate 1 decision)	MARGINAL — working assumption; Nikita has not confirmed Fall-specific values	MEDIUM on cost analysis results only	Symmetric on cost framing	M
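
The Z_SCORES quoted under A8 can be cross-checked against the standard Normal quantile function. The following is a minimal sketch using only the Python standard library; the helper name `two_sided_z` is illustrative and is not part of the engine.

```python
from statistics import NormalDist


def two_sided_z(coverage: float) -> float:
    """z multiplier for a central Normal interval with the given coverage (e.g. 0.90)."""
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)


# Values quoted in assumption A8, keyed by central coverage.
DOCUMENTED_Z = {0.70: 1.04, 0.80: 1.28, 0.90: 1.645, 0.95: 1.96}

for coverage, documented in DOCUMENTED_Z.items():
    computed = two_sided_z(coverage)
    print(f"{coverage:.0%}: computed {computed:.3f}, documented {documented}")
```

The documented multipliers agree with the two-sided Normal quantiles to rounding, so the open question under A8 is the Normality of the pooled decay itself, not the constants.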

2.2 Assumption Dependency Graph (Section 17)

Not all assumptions are independent. Load-bearing assumptions feed multiple output paths and must be validated first regardless of individual sensitivity.

A1 (Growth target)           ──→ grand_target (all fragrances, all formats)
                             ──→ top-down scaling factor (all downstream)
                             [LOAD-BEARING — feeds every output]

A2 (Pooled decay stationary) ──→ get_decay_rate() fallback (when no override)
                             ──→ get_prediction_interval() CIs (all fragrances)
                             [LOAD-BEARING — feeds decay rates + PIs]

A3 (No stockout imputation)  ──→ A2 (pooled curve) via 2025 SSN/Pomelo transitions
                             ──→ A4 (Bayesian shrinkage pool)
                             [INTERMEDIATE — contaminates load-bearing A2]

A4 (Shrinkage calibration)   ──→ get_decay_rate() for low-volume fragrances
                             [INTERMEDIATE — depends on A2]

A5 (Elasticity weights)      ──→ top-down allocation shares (which fragrance gets scaled)
                             [ROOT]

A6 (New-launch estimate)     ──→ Summer 2026 Coconut Soleil grand_total contribution
                             [ROOT — isolated to launch fragrance]

A7 (No cannibalization)      ──→ grand_total (all fragrances when a launch is in lineup)
                             ──→ A6 (new-launch estimate implicitly treats as additive)
                             [LOAD-BEARING when a launch is active — Summer 2026]

A8 (Normal CI)               ──→ get_prediction_interval() band widths
                             [ROOT — affects PI only, not point forecast]

A11 (Format shares stable)   ──→ compute_format_shares() per-format unit outputs
                             [ROOT]

A12 (Seasonal window stable) ──→ pull_seasonal_demand() window bounds
                             ──→ observed decay rates (via window comparability)
                             [INTERMEDIATE — affects A2 indirectly]

A13 (Cu/Co carry-forward)    ──→ cost analysis only
                             [ROOT — validation-layer only]

Load-bearing assumptions (validation priority #1): A1 (growth target), A2 (pooled decay stationarity), A7 (no cannibalization — when launch active).

Intermediate (priority #2): A3 (stockout contamination), A4 (shrinkage), A12 (seasonal window).

Root (priority #3): A5, A6, A8, A11, A13. Each is isolated to a single output path.
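
The priority ordering above can be reproduced mechanically by walking the assumption-to-assumption edges of the graph. This is a sketch only: the `FEEDS` dictionary is a hand transcription of the edges between assumptions (edges into outputs such as grand_target are omitted), and `downstream` is a hypothetical helper.

```python
from collections import deque

# Assumption-to-assumption edges transcribed from the dependency graph.
# A2 -> A4 encodes "A4 depends on A2"; A12 -> A2 encodes the indirect effect.
FEEDS = {
    "A3": ["A2", "A4"],
    "A2": ["A4"],
    "A7": ["A6"],
    "A12": ["A2"],
}
LOAD_BEARING = {"A1", "A2", "A7"}


def downstream(assumption: str) -> set:
    """All assumptions reachable from `assumption` by following FEEDS edges."""
    seen = set()
    queue = deque(FEEDS.get(assumption, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(FEEDS.get(node, []))
    return seen


for name in ("A3", "A12", "A7"):
    hits = sorted(downstream(name) & LOAD_BEARING)
    print(f"{name} contaminates load-bearing assumptions: {hits}")
```

Under this transcription A3 and A12 both reach A2, which is why they sit at priority #2 despite not being load-bearing themselves.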

2.3 Assumption Cost Matrix (Section 4.5)

Assumption	Plausible Error Range	Output Impact	$ Exposure (Fall 2026)	$ Exposure (Summer 2026)	Priority
A1 Growth target	±5pp	±3.3% × 32,909 = ±1,085 units (Fall); ±5% × ~75K = ±3,750 units (Summer)	±$6K–$15K	±$20K–$50K	H
A7 No cannibalization	10–25% displacement on Summer returning lineup	n/a Fall (no launch)	—	$30K–$60K DANGEROUS HIGH	H
A3 Stockout contamination	5–15pp steeper pooled curve bias	~500–1,500 units underforecast on recovered fragrances	$5K–$15K	$15K–$40K	H
A6 New-launch bias	±25% on Coconut Soleil raw 9,609	±2,400 units	n/a Fall	±$22K (over) / ±$9K (under, if demand strong)	H
A2 Pooled decay drift	±10pp median shift	±1,500–2,000 units on raw bottom-up	$5K–$18K	$10K–$25K	H
A12 Holiday window drift	+30 day asymmetry	10–20% apparent decay overstatement	n/a Fall	n/a Summer	M (Holiday only)
A13 Fall Cu/Co mismatch	±20% on held-period cost differential	Re-frames optimal percentile from 72nd to 60th–80th	$8K–$15K	n/a (own Cu/Co used)	M
A10 Channel aggregation	Operational — not on grand_target	Potential PO split imbalance	Bounded by re-order flex	Same	M
A5 Elasticity weights	±0.2 per age	Reshuffles shares within grand_total	<$3K	<$5K	L
A8 Normal CI	±5pp on band width	PI width shift only	$0 (decision framing)	$0 (decision framing)	L

Top three validation priorities sorted by $ exposure:

A7 (Cannibalization, Summer only): $30K–$60K danger; untested structural assumption, UNREASONABLE rating
A1 (Growth target): $20K–$50K Summer, $6K–$15K Fall; dominant driver, demand signals PARTIALLY SUPPORTED
A3 (Stockout contamination): $15K–$40K Summer, $5K–$15K Fall; contaminates A2, MARGINAL rating under the current max_decay: none policy

3. Data Integrity Assessment

3.1 Input Data Lineage

Source	Via	Availability	Known Issues
fact_orders	DuckDB `read_parquet()` from `/data/parquet/`	591,627 rows; available_years = [2023, 2024, 2025] + partial 2026	Zero/stockout periods not imputed (see Assumption A3)
dim_products	DuckDB read_parquet	186 products	launch_year and fragrance_season stability not audited this run
dim_customers	DuckDB read_parquet (only via demand_signals)	65,657 customers	Returning-rate calc assumes MD5 customer_id stable across channels
fact_ad_spend	DuckDB read_parquet (demand_signals.py)	1,606 rows; Meta + Google + Amazon via Windsor MCP	Demand-signal input only; not forecast input
fact_klaviyo_campaigns	DuckDB read_parquet	200 campaigns + 166 flows	Fall 2026 demand_signals shows 0 recipients / 0 revenue in 2025 → counter-signal; confirm data is current
pack_def (Excel)	Read-once reference	Sheet in `Forcasting definitions & translations.xlsx`	Amazon pack decomposition happens upstream in ETL; engine does not re-decompose

3.2 Input Guards Observed

MIN_DEMAND_THRESHOLD = 100 — filter on scaling base (prevents micro-volume noise)
MIN_LAUNCH_UNITS = 200 — floor on new-launch raw estimate
MAX_YOY_RATE = 2.0 — ceiling on Y2 growth (caps Golden-Grove-type +177% outliers)
DEFAULT_UNCERTAINTY = 0.30 — fallback stdev when pool bucket is thin
detect_available_years() — no fixed-window assumption; picks up 2023 data as it became available
Demand-signal gate (Layer 0) produces SUPPORTED / PARTIALLY / COUNTER verdict with surfaced counter-signals

3.3 Known Contamination

FLAG: Stockout-censored demand in 2025 pool. Sea Salt Neroli (-76.0%) and Pomelo (-79.4%) 2025 declines are documented in the RTF reference as coinciding with supply interruptions. Both are now retired; their 2024→2025 transitions are part of the pooled-curve empirical base. Current max_decay: none policy means these transitions influence the pool at their recorded steep values. Impact estimate: if 2 of ~23 age-1 transitions are contaminated steep by ~30–50pp, median shift is on the order of -3pp to -5pp, widening CI stdev by ~10%. Correction: either impute demand for stockout periods or flag transitions for exclusion from the pool.

3.4 Margin Data Gaps

Fall 2026 margin_projections.sku_margins shows null margin_per_unit on 6 of 12 SKU rows:

Fragrance	Format	Units	Margin Status
Flannel + Leaves	3-Wick Candle	2,085	NULL
Flannel + Leaves	2oz Packs	1,000	NULL
Autumn Heirloom	3-Wick Candle	1,079	NULL
Autumn Heirloom	2oz Packs	1,000	NULL
Ginger Pumpkin	3-Wick Candle	702	NULL
Ginger Pumpkin	2oz Packs	1,000	NULL

6,866 units (20.9% of the portfolio by volume) are unpriced in the projected margin roll-up. Total reported margin of $177,478 excludes these SKUs — the true portfolio margin is materially higher. Using spray analogue pricing as a rough fill: 3-Wick Candles (3,866 units) at ~$8–$15 margin/unit → $30K–$58K; 2oz Packs (3,000 units) at proxy $3–$8/unit → $9K–$24K. Estimated missing margin: ~$40K–$80K.
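
The unpriced-volume and missing-margin figures can be re-derived from the table. A short sketch, assuming the proxy margin ranges quoted in the paragraph above (rough fills, not measured margins); all variable names are illustrative.

```python
# Fall 2026 SKU rows with null margin_per_unit, copied from the table above.
NULL_MARGIN_ROWS = [
    ("Flannel + Leaves", "3-Wick Candle", 2085),
    ("Flannel + Leaves", "2oz Packs", 1000),
    ("Autumn Heirloom", "3-Wick Candle", 1079),
    ("Autumn Heirloom", "2oz Packs", 1000),
    ("Ginger Pumpkin", "3-Wick Candle", 702),
    ("Ginger Pumpkin", "2oz Packs", 1000),
]
GRAND_TARGET = 32909

# Proxy margin per unit (low, high) in dollars, as quoted in the text.
PROXY_MARGIN = {"3-Wick Candle": (8.0, 15.0), "2oz Packs": (3.0, 8.0)}

unpriced_units = sum(units for _, _, units in NULL_MARGIN_ROWS)
unpriced_share = unpriced_units / GRAND_TARGET
missing_low = sum(units * PROXY_MARGIN[fmt][0] for _, fmt, units in NULL_MARGIN_ROWS)
missing_high = sum(units * PROXY_MARGIN[fmt][1] for _, fmt, units in NULL_MARGIN_ROWS)

print(f"Unpriced units: {unpriced_units:,} ({unpriced_share:.1%} of grand_target)")
print(f"Missing margin at proxy rates: ${missing_low:,.0f} to ${missing_high:,.0f}")
```

At the quoted proxy rates the range works out to roughly $40K to $82K, in line with the rounded estimate above.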

4. Pipeline Validation

4.1 Layer-by-Layer Reconstruction (Fall 2026 base run)

Layer	Input	Output	Observed	Documented	Match
0. Demand signal gate	6 signals vs growth_target 11.5%	verdict + support_score	PARTIALLY SUPPORTED, 12/18 (67%), counter: Klaviyo	Yes (demand_signals.py)	✓
1. Historical pull	fact_orders × dim_products, years [2023, 2024, 2025]	Per (fragrance, format, year) units	historical block matches continuing lineup (F+L 9088, AH 8436, GP 6254 in s2025)	Yes	✓
2. Age decay	historical + elasticity + decay_overrides	raw_forecasts	F+L 7516 (=9088 × 0.827), AH 6099 (=8436 × 0.723), GP 5003 (=6254 × 0.800)	Yes — all override-sourced per config	✓
3. Top-down reconciliation	raw_forecasts sum=18,618; growth_target 11.5%	grand_base, fragrance_totals	grand_base=29,515; grand_target=32,909; F+L 16,961, AH 9,635, GP 6,313	Partial — derivation of grand_base=29,515 from raw_sum=18,618 not directly traceable without code inspection	⚠
4. Overflow redistribution	cap-bound fragrances' excess	redistributed units	overflow_redistributed=0 (no cap bound Fall 2026)	Yes — expected with uncapped Fall lineup	✓
5. Format allocation	fragrance totals + historical shares + format caps + fixed packs	sku_forecast	12 SKU rows; 2oz packs at 1,000 (fixed); 3 fragrances × 4 formats roughly	Yes	✓
Post. Diffuser overlay	forecast_output.json Car Freshener units	Diffuser rows (additive)	F+L Car=2,317, AH Car=1,565 → Diffuser mirrors both	Yes (apply_diffuser_overlay.py + Fall_2026_Config 2026-04-21 changelog)	✓

FLAG (Layer 3): observed grand_base=29,515 is 58.5% above the raw bottom-up sum of 18,618. The derivation path is not directly traceable from documentation: the top-down mechanism's internal formula is inferred, not validated against code line-by-line. Documentation states grand_target = previous × (1+growth_target), but the 2025 continuing-lineup total is 23,778, and 23,778 × 1.115 = 26,512, which matches neither grand_base (29,515) nor grand_target (32,909). The reported figures are instead consistent with grand_target = grand_base × 1.115 (29,515 × 1.115 ≈ 32,909), so the untraced step is how grand_base is built from the raw sum. Possible reconciliation: grand_base includes Discovery pack units (+~2,085), retired fragrance residuals, or launch-year adjustments. Action: confirm with creator (Dan/Louis) that the Layer 3 formula is implemented as documented; captured as Action Item #2 in the action plan.
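
The Layer 2 and Layer 3 arithmetic can be replayed from the figures in the table. This sketch uses only numbers reported in this section; it does not call the engine, and the variable names are illustrative. The last line is an observation from the numbers, offered as a lead for the creator review and not as a confirmed derivation.

```python
# Figures transcribed from the Layer 1-3 rows above.
HIST_2025 = {"F+L": 9088, "AH": 8436, "GP": 6254}
RETENTION = {"F+L": 0.827, "AH": 0.723, "GP": 0.800}  # override-sourced multipliers
GROWTH_TARGET = 0.115
GRAND_BASE = 29515
GRAND_TARGET = 32909

# Layer 2: prior-year units times the retention multiplier, rounded to units.
raw_forecasts = {k: round(HIST_2025[k] * RETENTION[k]) for k in HIST_2025}
raw_sum = sum(raw_forecasts.values())
prior_total = sum(HIST_2025.values())

print("raw_forecasts:", raw_forecasts, "sum:", raw_sum)
print("grand_base / raw_sum:", round(GRAND_BASE / raw_sum, 3))
print("documented formula, prior x (1+g):", round(prior_total * (1 + GROWTH_TARGET)))
print("grand_base x (1+g):", round(GRAND_BASE * (1 + GROWTH_TARGET)))
```

If the last line reproduces grand_target, the growth scalar is being applied to grand_base, which would leave the step from raw_sum to grand_base as the part needing code inspection.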

4.2 Conservation Checks

Grand total conservation: sum(sku_forecast units) = 11,560+2,085+2,317+1,000+5,214+776+1,079+1,565+1,000+4,611+702+1,000 = 32,909 = grand_target. ✓
Fragrance total conservation: F+L sum(SKUs) = 16,962 (vs total 16,961, rounding); AH = 9,634 (vs 9,635); GP = 6,313 (vs 6,313). ✓ (±1 rounding)
Fixed allocation preservation: 2oz Packs = 1,000 on each fragrance, all flagged capped: true, cap_value: 1000. ✓
Overflow vs caps: overflow_redistributed=0 expected; Fall 2026 has no hard-capped formats. ✓
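
The conservation checks above lend themselves to an automated guard. A minimal sketch, assuming the SKU rows are available as a nested dictionary; format labels are assigned from Sections 3.4, 4.1 and 8.1 of this report, and the one-unit rounding tolerance is an assumption.

```python
# Fall 2026 sku_forecast units as listed in the grand-total check above.
SKU_FORECAST = {
    "F+L": {"5 oz Spray": 11560, "3-Wick Candle": 2085,
            "Car Freshener": 2317, "2oz Packs": 1000},
    "AH": {"5 oz Spray": 5214, "6.5 oz Candle": 776, "3-Wick Candle": 1079,
           "Car Freshener": 1565, "2oz Packs": 1000},
    "GP": {"5 oz Spray": 4611, "3-Wick Candle": 702, "2oz Packs": 1000},
}
FRAGRANCE_TOTALS = {"F+L": 16961, "AH": 9635, "GP": 6313}
GRAND_TARGET = 32909
ROUNDING_TOLERANCE = 1  # units; assumed tolerance for per-fragrance rounding


def conservation_report(sku, totals, grand_target, tol=ROUNDING_TOLERANCE):
    """Return one boolean per conservation check."""
    per_fragrance = {frag: sum(rows.values()) for frag, rows in sku.items()}
    return {
        "grand_total": sum(per_fragrance.values()) == grand_target,
        "fragrance_totals": all(
            abs(per_fragrance[f] - totals[f]) <= tol for f in totals
        ),
        "fixed_2oz": all(rows.get("2oz Packs") == 1000 for rows in sku.values()),
    }


report = conservation_report(SKU_FORECAST, FRAGRANCE_TOTALS, GRAND_TARGET)
print(report)
```

Running such a check after every forecast run would catch allocation drift before it reaches production commits.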

5. Statistical Diagnostics

Figure 1. Scaling Dominance Waterfall (Fall 2026). Bottom-up raw forecast (18,618) is lifted by top-down reconciliation (+10,897 units) to grand_base (29,515); the 11.5% growth target then adds 3,394 units to reach grand_target (32,909). The +10,897 reconciliation uplift alone is ~33% of grand_target, and none of it comes from decay mechanics.

Figure 2. Parameter Sensitivity Tornado — Fall 2026. Each bar shows the estimated % change in grand_target for a plausible parameter perturbation. Growth target dominates; stockout contamination and cannibalization-related assumptions lack Fall 2026 exposure but drive Summer 2026 risk.

Figure 3. Summer 2025 Backtest — FM-01 v2 forecast vs actuals by fragrance, with 70% and 90% prediction interval bands. Source: memory-documented Summer 2025 run (MAPE 40.0% at 28.5% growth target). Procedure-only where data replay required.

Figure 4. Error Distribution — Summer 2025 Backtest. Right of zero = underforecast (lost margin direction); left = overforecast (carrying cost). Mean error (bias) is shown with a vertical line.

6. Backtesting Results

6.1 Setup

Primary holdout: Summer 2025. Documented-baseline validation values carried from memory (project_fm01_monitoring.md): MAPE 40.0% at the configured 28.5% growth target; MAPE 31.1% at the data-supported 11.91% baseline growth rate. Error attribution: ~75% traced to growth-target assumption; ~25% to distribution (decay + allocation) logic.

REPLAY REQUIREMENT: these memory-sourced values are authoritative for the validation narrative but should be reproduced by an explicit re-run of Summer 2025 backtest with decomposed error attribution artifact. Captured as Action Item #2 in the validation action plan (DO FIRST, effort Medium).

6.2 Statistical Metrics Table (Summer 2025 reconstruction)

Output Unit	Actual (2025)	FM-01 v2 Forecast	Error	MAPE	Bias Direction
Portfolio grand_total (Summer 2025)	77,440	~108,410 (at 28.5%)	+30,970	40.0%	Overforecast
Portfolio grand_total (baseline 11.91%)	77,440	~101,527	+24,087	31.1%	Overforecast
Growth-target-error share	—	—	~$18K–$30K	—	~75% of variance
Mechanics-error share	—	—	~$6K–$10K	—	~25% of variance

6.3 Naive Challenger Comparison (Summer 2025)

Metric	FM-01 v2 @ 28.5%	FM-01 v2 @ 11.91%	Naive (prior-year actuals)
Forecast units	~108,410	~101,527	~60,776 (Summer 2024 actual)
Actual (2025)	77,440	77,440	77,440
MAPE	40.0%	31.1%	~21.5%
Overforecast cost (@ Co $3.81 blended)	~$118K	~$92K	$0
Underforecast cost (@ Cu $8.35 blended)	$0	$0	~$139K (−16,664 units)
Total forecast cost	~$118K	~$92K	~$139K
Lift vs naive	+$21K	+$47K	baseline

FINDING (Backtest): FM-01 v2 at 11.91% baseline growth target demonstrates positive economic lift of ~$47K vs naive on Summer 2025. At 28.5% growth target, lift drops to ~$21K — the model still beats naive but growth-target optimism eats ~55% of the mechanics lift. This is the clearest Section 15 validation signal: the model earns its complexity over naive in economic terms, but only if growth target is set defensibly.
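
The cost and lift rows in the challenger table follow from a simple asymmetric cost rule. A minimal sketch, assuming shortfall is priced at the table's Cu and excess at the table's Co; `forecast_cost` is an illustrative helper, not an engine function.

```python
CU = 8.35  # $/unit cost of underforecast, as used in the table above
CO = 3.81  # $/unit cost of overforecast, as used in the table above
ACTUAL_2025 = 77440


def forecast_cost(forecast: float, actual: float, cu: float = CU, co: float = CO) -> float:
    """Asymmetric cost: units short are priced at cu, units over at co."""
    if forecast >= actual:
        return (forecast - actual) * co
    return (actual - forecast) * cu


FORECASTS = {
    "FM-01 v2 @ 28.5%": 108410,
    "FM-01 v2 @ 11.91%": 101527,
    "Naive (Summer 2024 actual)": 60776,
}

costs = {name: forecast_cost(f, ACTUAL_2025) for name, f in FORECASTS.items()}
naive_cost = costs["Naive (Summer 2024 actual)"]
lift = {name: naive_cost - cost for name, cost in costs.items()}

for name in FORECASTS:
    print(f"{name}: cost ${costs[name]:,.0f}, lift vs naive ${lift[name]:,.0f}")
```

The same function can score any challenger once its Summer 2025 forecast is available, which keeps the comparison on economic cost and not on MAPE alone.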

6.4 Error Attribution Decomposition

Error Source	Summer 2025 Contribution	Mechanism
Growth-target assumption error	~75%	Top-down scalar drove forecast above realized; 28.5% target vs observed +27.4% season growth — directionally close, but bottom-up raw already captured most of that growth, so scaling compounded it
Decay-rate estimation error	~15%	Overrides set using proxies; some overrides steeper than realized decline, some flatter
Format allocation error	~5%	Prior-year shares did not track Summer 2025 format mix exactly
New-launch estimation error	~5%	Cabana 2025 raw vs actual

7. Stress Test Results

7.1 Single-Parameter Stress (Fall 2026)

Parameter	Base	Worst Case	Extreme	Best Case	Δ grand_target	$ Exposure (blended)
Growth target	11.5%	0%	-10%	18%	±3,394 to -7,000 units	$13K–$27K under / $8K–$17K over
Pooled decay median (all fragrances)	-19.4%	-29.4%	-49.4%	-9.4%	-1,700 to +1,700 units	~$7K–$16K either direction
Decay-rate overrides	config values	+15pp steeper each	+30pp steeper	-10pp flatter	-2,800 to +1,870	$7K–$26K under direction
Elasticity weights	1.5/1.2/0.8/0.6/0.5	flat 1.0	inverted (high at Y5+)	stronger Y1 (2.0)	internal redistribution only	<$3K
max_decay setting	none	50%	25%	none (base)	~+1,200 units (tighter floor lifts steepest)	+$11K (as carrying cost)

7.2 Compound Scenarios

Scenario	Description	Δ grand_target	$ Exposure	Failure Threshold Breach?
Pessimistic (Fall 2026)	Worst decay (each -29.4%) + bear growth (8%)	-5,200 units to ~27,700	~$24K lost margin if actuals track base	No (threshold $50K for Fall)
Optimistic (Fall 2026)	Best decay (-9.4%) + bull growth (15%)	+3,200 units to ~36,100	~$12K carrying cost if actuals track base	No
Pessimistic (Summer 2026)	Worst decay + bear growth (25%) + Coconut Soleil -50%	~-15,000 units on ~75K base	~$126K under-commit if demand is strong	YES ($75K threshold)
Cannibalization shock (Summer 2026)	15% displacement of Cabana by Coconut Soleil (A7 fails high)	-1,450 units on Cabana; +0 offset (additive treatment)	~$12K in overforecast carrying cost on Cabana; $0 offset	No (but structural)
Data failure	DuckDB stale by one season; engine runs on 2024 as latest	Unknown — engine runs but detect_available_years() returns wrong window	Indeterminate — not detected by existing guards	Policy gap
New-launch underperforms proxy (Summer 2026)	Coconut Soleil actual = 50% of raw estimate 9,609	-4,800 units on CS; overflow to returning fragrances	~$45K carrying cost on overcommitted CS SKUs	Approach threshold

7.3 Sensitivity Cliff Detection (Section 18)

Cliffs tested at Summer 2026 Coastal Tide cap boundary (5 oz Spray = 4,000; 3-Wick = 344; 6.5oz Candle = 98).

Cliff Location	Boundary Value	Output Jump at ±1%	$ Exposure	Prob. Input Lands Near
CT 5 oz Spray cap	4,000 units	~40 units redistributed to other CT formats then uncapped fragrances	$2K–$4K	Medium
CT 3-Wick cap	344 units	~3 units per 1% — smooth, not a cliff	<$500	Low
CT 6.5 oz Candle cap	98 units	~1 unit per 1% — smooth	<$100	Low
CT total cap binding	5,442 units aggregate (consistent with the 4,000 + 344 + 98 format caps plus a fixed 1,000-unit 2oz Pack allocation)	Triggers overflow redistribution layer; cliff risk if naive share allocation replaces elasticity-weighted redistribution	$8K–$15K	Medium

Recommendation: set CT growth-target scenario to produce CT total demand at least 5% below the 5,442 cap to avoid cliff exposure. Current Summer 2026 CT override rate of -21.5% likely holds CT below cap, but should be re-verified once the Summer run is produced.
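
To make the cliff mechanism concrete, the sketch below clips a share-based split at the CT format caps and re-spreads the excess. It is a hypothetical illustration, not the engine's `compute_format_shares()`: the function `allocate_with_caps`, the proportional re-spread rule, and the format shares are all assumptions, and the fixed 2oz Pack allocation is left out.

```python
def allocate_with_caps(total, shares, caps):
    """Split `total` by `shares`, clip each format at its cap, and re-spread
    the clipped excess across formats that still have headroom.

    Returns (allocation, unallocated); `unallocated` is what remains once
    every format sits at its cap.
    """
    alloc = {fmt: 0.0 for fmt in shares}
    open_formats = set(shares)
    remaining = float(total)
    while remaining > 1e-9 and open_formats:
        weight = sum(shares[f] for f in open_formats)
        overflow = 0.0
        for fmt in list(open_formats):
            want = alloc[fmt] + remaining * shares[fmt] / weight
            cap = caps.get(fmt, float("inf"))
            if want >= cap:
                overflow += want - cap
                want = cap
                open_formats.discard(fmt)
            alloc[fmt] = want
        remaining = overflow
    return alloc, remaining


CT_CAPS = {"5 oz Spray": 4000, "3-Wick": 344, "6.5 oz Candle": 98}
HYPOTHETICAL_SHARES = {"5 oz Spray": 0.8, "3-Wick": 0.1, "6.5 oz Candle": 0.1}

for total in (4000, 5000):
    alloc, unallocated = allocate_with_caps(total, HYPOTHETICAL_SHARES, CT_CAPS)
    rounded = {fmt: round(units) for fmt, units in alloc.items()}
    print(total, rounded, "unallocated:", round(unallocated))
```

Below the aggregate cap every extra unit stays inside CT; once all formats bind, every extra unit leaves CT as unallocated overflow. That switch in where the marginal unit goes is the cliff the recommendation is trying to avoid.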

7.4 Assumption Stress (Section 10.3 — HIGH-sensitivity)

Assumptions carrying Net Risk H in the Assumption Risk Matrix (Section 4.1: A1, A2, A3, A6, A7) are stressed to the boundary of reasonableness.

Assumption	Boundary Value	$ Exposure at Boundary	Usable Output?
A1 Growth target at bear edge (-11.5pp Fall / -13.5pp Summer)	0% growth (Fall); 15% (Summer)	$13K–$27K Fall / $35K–$60K Summer	Yes if scenarios used for production band
A2 Pooled curve biased steep by 10pp	Median -29.4% vs -19.4%	$12K–$20K underforecast Fall / $20K–$40K Summer	Flag required; quantitatively usable
A3 Stockout adjustment applied (contamination removed)	Pool excludes SSN + Pomelo 2025 transitions	Pool median lifts ~3–5pp; +$8K–$18K in recovered demand estimate Fall	Yes — improves forecast
A6 New-launch under by 50% (Summer only)	CS actual = 4,804 vs raw 9,609	$45K carrying cost on CS SKUs	Yes — scenario band covers
A7 Cannibalization at 25% (Summer only)	Cabana loses 1,450 units; CS absorbs	$30K–$45K carrying cost on Cabana overcommit	Not currently produced by engine

8. Cost Analysis

8.1 Cu/Co Parameter Validation

Cu/Co values sourced from Nikita's April 2026 analysis per Summer_2026_Config.yaml §newsvendor_params. Carried forward to Fall 2026 per Gate 1 decision (2026-04-23).

Format	Cu ($/unit)	Co ($/unit)	Critical Ratio	Optimal Percentile	Implied Optimal vs MAPE Optimization
5 oz Spray	8.35	3.81	0.687	69th	MAPE (symmetric) understates optimal production by ~19 pp
8 oz Candle	17.33	4.80	0.783	78th	MAPE understates by ~28 pp
2 oz Discovery	18.97	2.91	0.867	87th	MAPE understates by ~37 pp — most asymmetric
Car Freshener	8.56	3.92	0.686	69th	MAPE understates by ~19 pp
6.5 oz Candle	11.19	3.00	0.789	79th	MAPE understates by ~29 pp

Blended weighted by Fall 2026 unit mix (21,385 spray + 776 6.5oz + 3,882 Car + 3,866 3-Wick + 3,000 Discovery; 3-Wick uses 5-oz-spray proxy given absence of Cu/Co):

Blended Cu ≈ $9.41 / unit
Blended Co ≈ $3.72 / unit
Blended critical ratio ≈ 0.717 → 72nd percentile optimal production
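
The per-format critical ratios and the blended values can be recomputed from the table and the unit mix. A minimal sketch using the newsvendor critical ratio Cu / (Cu + Co); the dictionary names and the `PROXY` mapping are illustrative, with 3-Wick borrowing the 5 oz Spray values as stated above.

```python
# (Cu, Co) in $/unit from the table above.
CU_CO = {
    "5 oz Spray": (8.35, 3.81),
    "8 oz Candle": (17.33, 4.80),
    "2 oz Discovery": (18.97, 2.91),
    "Car Freshener": (8.56, 3.92),
    "6.5 oz Candle": (11.19, 3.00),
}
# Fall 2026 unit mix from the text.
UNIT_MIX = {
    "5 oz Spray": 21385,
    "6.5 oz Candle": 776,
    "Car Freshener": 3882,
    "3-Wick": 3866,
    "2 oz Discovery": 3000,
}
PROXY = {"3-Wick": "5 oz Spray"}  # no Cu/Co of its own


def critical_ratio(cu: float, co: float) -> float:
    """Newsvendor critical ratio: the optimal service percentile."""
    return cu / (cu + co)


total_units = sum(UNIT_MIX.values())
blended_cu = sum(u * CU_CO[PROXY.get(f, f)][0] for f, u in UNIT_MIX.items()) / total_units
blended_co = sum(u * CU_CO[PROXY.get(f, f)][1] for f, u in UNIT_MIX.items()) / total_units
blended_cr = critical_ratio(blended_cu, blended_co)

for fmt, (cu, co) in CU_CO.items():
    print(f"{fmt}: critical ratio {critical_ratio(cu, co):.3f}")
print(f"Blended Cu ${blended_cu:.2f}, Co ${blended_co:.2f}, ratio {blended_cr:.3f}")
```

Because the blend is unit-weighted, any change to the Fall format mix or to Nikita's Cu/Co inputs shifts the optimal percentile, so the blend is worth recomputing per run.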

8.2 Format-Level Cost Analysis — Asymmetry by SKU

Different formats carry very different asymmetry. 2 oz Discovery (Cu/Co ratio 6.5×) is the most underforecast-penalized format in the portfolio; 5 oz Spray and Car Freshener are closest to symmetric. Engine currently treats all formats equally at the allocation step — optimal percentile variation across formats is not surfaced in output.

8.3 Seasonal Cost Exposure Summary (Fall 2026)

Error Scenario	Units Off (abs)	Avg Unit Cost	$ Exposure	Probability
5% underforecast	1,645	Cu $9.41	$15,485	~30%
10% underforecast	3,291	Cu $9.41	$30,968	~15%
20% underforecast	6,582	Cu $9.41	$61,936	~5%
5% overforecast	1,645	Co $3.72	$6,121	~35%
10% overforecast	3,291	Co $3.72	$12,242	~20%
20% overforecast	6,582	Co $3.72	$24,484	~10%
Growth-target −5pp off	~1,085	Cu $9.41	$10,210	~35%
Pooled curve biased 10pp steep	~1,500 underforecast on recovered	Cu $9.41	$14,115	~40% (conditional on A3)
Expected total (prob-weighted)	—	—	~$15K–$22K	—

8.4 Break-Even Accuracy Threshold

Naive model (prior-year actuals = 23,778 units continuing lineup on Fall 2025) would have a structural underforecast bias relative to Fall 2026 growth.

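Break-even: FM-01's economic cost equals naive. Naive economic cost at Fall 2026 (if actuals are 28,000 units, roughly halfway between naive and target): −4,222 unit underforecast × $9.41 = ~$39,729. FM-01 at the same actuals: 32,909 − 28,000 = +4,909 unit overforecast × $3.72 = $18,261. FM-01 is ~$21K ahead of naive at that midpoint.

FM-01 loses to naive only when actuals land close to the naive level. Setting (A − 23,778) × $9.41 equal to (32,909 − A) × $3.72 gives a break-even at A ≈ 26,365 units, about 10.9% above naive; below that, FM-01's overforecast cost exceeds naive's underforecast cost. At the extreme, if actuals track naive (23,778), FM-01's overforecast cost = 9,131 × $3.72 = $33,967 vs naive cost $0, so FM-01 loses by ~$34K. If actuals reach or exceed grand_target (32,909+), both models underforecast and FM-01 is always the closer of the two.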
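
The break-even point between the two forecasts follows directly from the blended Cu and Co. A minimal sketch, assuming the same asymmetric cost rule as the worked example (shortfall priced at Cu, excess at Co); `cost` and `break_even_actual` are illustrative names.

```python
CU, CO = 9.41, 3.72  # blended $/unit from Section 8.1
NAIVE = 23778        # Fall 2025 continuing-lineup actuals
FM01 = 32909         # Fall 2026 grand_target


def cost(forecast: float, actual: float, cu: float = CU, co: float = CO) -> float:
    """Asymmetric cost: units short are priced at cu, units over at co."""
    if forecast >= actual:
        return (forecast - actual) * co
    return (actual - forecast) * cu


# Between the two forecasts, naive is short and FM-01 is over, so the costs
# cross where (A - NAIVE) * CU == (FM01 - A) * CO.
break_even_actual = (CU * NAIVE + CO * FM01) / (CU + CO)

print(f"Break-even actuals: {break_even_actual:,.0f} units")
for actual in (NAIVE, 28000, FM01):
    print(actual, f"naive ${cost(NAIVE, actual):,.0f}", f"FM-01 ${cost(FM01, actual):,.0f}")
```

Under these inputs FM-01 is the cheaper forecast whenever Fall 2026 actuals exceed roughly 26,365 units, and naive is cheaper below that.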

Break-even MAPE: approximately 15%. Memory-documented Summer 2025 MAPE = 40.0% at configured 28.5% / 31.1% at 11.91% baseline. FM-01's historical MAPE exceeds break-even threshold — margin of safety vs naive is thin and growth-target dependent (see Section 6.3 findings).

9. Challenger Model Comparison

9.1 Applicable Challengers

Challenger	Applicability	Effort	Included in this run?
Naive (prior-year actuals)	Mandatory for PNL_IMPACT	Low (data available)	Yes — above, Section 6.3
Trend-adjusted naive (prior × YoY)	Q1 2026 units +5.8% YoY shows consistent trend	Low	Procedure only — SCHEDULE
Category-level forecast	High — cannibalization known issue Summer 2026	Low-Medium	Procedure only — SCHEDULE
Prediction-interval-only model	High — growth-target dominance is primary failure mode	Low (engine already computes)	Procedure only — SCHEDULE
Expert consensus (Dan's unassisted judgment)	Always for PNL_IMPACT; Dan's prior-year production decisions should be retrievable	Retrieve past POs from Nikita	Not run — SCHEDULE

9.2 Comparison Framework (Summer 2025 holdout — partial fill)

Metric	FM-01 v2 @28.5%	FM-01 v2 @11.91%	Naive	Trend-adjusted	PI-only (median)	Expert
Forecast units	~108,410	~101,527	60,776	~69,525	raw bottom-up only	—
MAPE	40.0%	31.1%	~21.5%	~10–13%	TBD	—
Total forecast cost	~$118K	~$92K	~$139K	~$66K (est)	TBD	—
Lift vs naive	+$21K	+$47K	—	+$73K (est)	TBD	—

FINDING: trend-adjusted naive on Summer 2025 may outperform FM-01 v2 at the 28.5% growth target on economic cost (estimated lift ~$73K vs FM-01's $21K). This is the highest-value challenger to validate properly — captured in SCHEDULE block #8 of the action plan. If confirmed, trend-adjusted naive becomes a simplification candidate for review.

10. Minimum Viable Accuracy (Section 16)

10.1 MVA Thresholds (confirmed Gate 2)

Output Type	Recovery Window	Cost Asymmetry	MVA Threshold	Breach Consequence
Returning fragrance totals (Y2+, mainline formats)	Short (lock at 1–3mo lead)	Under × 2.5	±15%	Material carrying cost or lost margin; recoverable with re-order if decay mild
New launch (Y1) totals	Short	Under × 2.5	±30%	Y1 structural uncertainty; band acknowledged
Format-level (5 oz Spray, 8 oz Candle)	Short	Under × 2.2–3.6	±15%	Format-level re-order inflexible
Capped / fixed-allocation (2 oz Discovery packs, CT capped)	Short — committed at pack pour	Under × 6.5 (2oz)	±10%	Inventory committed and near-irrecoverable
Car Freshener / Diffuser (post-processed)	Short	Under × 2.2	±25%	Post-process overlay; structural uncertainty
Portfolio grand total	Short	Under × 2.5	±10%	Biggest absolute exposure; blended error attenuates

10.2 MVA vs Model Performance (Summer 2025 backtest)

Output Type	MVA Threshold	Backtest Error	Status
Portfolio grand total	±10%	40.0% @ 28.5% / 31.1% @ 11.91%	FAIL at 28.5% / FAIL at 11.91%
Returning fragrance totals	±15%	Partial data — Cabana, CP, CT values not decomposed in memory	UNKNOWN
New launch totals	±30%	Cabana 2025 was Y1 new launch; actual vs forecast not independently retrieved	UNKNOWN
Capped / fixed-allocation (CT)	±10%	CT Summer 2025 likely within cap; actuals-vs-forecast per CT format not decomposed	UNKNOWN
Car Freshener	±25%	Summer 2025 had limited Car Freshener; unknown	UNKNOWN

MVA FINDING: portfolio grand total fails MVA at both the configured (28.5%) and baseline (11.91%) growth rates on Summer 2025 backtest. Per skill Section 16.3, grand total cannot drive production decisions independently — scenario band (bear/base/bull) must be used, not point forecast. This is already the current practice per MC_ Decision Linkage (3); validated as the right operating mode.
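
The status column in Section 10.2 can be generated from forecast and actual values once the per-output decomposition exists. A minimal sketch, assuming a symmetric band around actuals; `mva_status` is a hypothetical helper, and only the portfolio row has data in this report.

```python
ACTUAL_SUMMER_2025 = 77440
BACKTEST_FORECASTS = {"28.5% target": 108410, "11.91% baseline": 101527}
MVA_GRAND_TOTAL = 0.10  # portfolio grand total threshold from Section 10.1


def mva_status(forecast, actual, threshold):
    """PASS/FAIL against a symmetric MVA band; UNKNOWN when a value is missing."""
    if forecast is None or actual is None:
        return "UNKNOWN"
    error = abs(forecast - actual) / actual
    return "PASS" if error <= threshold else "FAIL"


statuses = {
    label: mva_status(forecast, ACTUAL_SUMMER_2025, MVA_GRAND_TOTAL)
    for label, forecast in BACKTEST_FORECASTS.items()
}
print(statuses)

# Rows without decomposed backtest data stay UNKNOWN until the replay is done.
print(mva_status(None, None, 0.15))
```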

11. Methodological Red Flags

RF1 — Growth-Target Dominance (Scaling Layer). ~33% of Fall 2026 grand_target and ~45% of Summer 2026 grand_target are top-down uplift above raw bottom-up forecasts. The engine's apparent mechanical sophistication (pooled curves, Bayesian shrinkage, elasticity weighting, newsvendor percentiles) is partially obscured by a single scalar parameter that dominates the output. Consequence if unresolved: the model is branded as predictive, but its accuracy is predominantly the accuracy of Dan's growth-target call. Backtest error attribution must distinguish these explicitly; MC_ artifact already flags this (Flag #3).

RF2 — Stockout Censoring in Pool (A3, Data Quality Flag). max_decay: none policy means 2025 SSN (-76%) and Pomelo (-79%) transitions enter the pooled age-1 curve at their raw recorded values. Both fragrances are documented as coinciding with supply issues. This is an unresolved contamination: the policy is statistically defensible (no arbitrary bound) but economically dangerous (propagates supply suppression as real demand decline). Consequence if unresolved: every returning fragrance inherits a steeper decay prior than reality warrants, systematically underforecasting recovered demand.

RF3 — Cannibalization Absent from Load-Bearing Assumption Graph (A7). Engine treats returning + new-launch demand as additive. Summer 2026 (Cabana Y2 × Coconut Soleil Y1 × CP Y3 × CT Y5) is the high-risk case; DuckDB now has 2023 data. Consequence if unresolved: overestimates portfolio grand_total by the displaced volume; creates carrying-cost exposure on returning fragrances when new launch cannibalizes them.

RF4 — Margin Data Completeness (Data Quality Flag). 6 of 12 Fall 2026 SKU rows have null margin_per_unit. Current $177,478 projected margin understates true portfolio margin by ~$40K–$80K. Consequence if unresolved: P&L projection tied to forecast output is misleadingly conservative; scenario band for revenue/cash planning is distorted.

RF5 — Layer 3 (Top-Down Reconciliation) Derivation Not Documentation-Complete. Observed grand_base=29,515 on Fall 2026 is 58.5% above raw bottom-up (18,618) and 24% above 2025 continuing-lineup actuals (23,778). Documentation states top-down scales to growth_target but the exact arithmetic path to grand_base is not made explicit. Consequence if unresolved: auditability gap — if Dan changes growth_target 11.5% → 13%, the expected change in grand_target is not directly predictable without code inspection.
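
The ratios quoted above can be reproduced directly from the three observed figures. The sensitivity line at the end assumes grand_base scales purely multiplicatively with (1 + growth_target). That is exactly the assumption RF5 says has not been verified, so treat the result as a naive expectation to compare against a real re-run, not as documented engine behavior.

```python
grand_base = 29_515      # observed Fall 2026 grand_base
bottom_up = 18_618       # raw bottom-up total
actuals_2025 = 23_778    # 2025 continuing-lineup actuals

print(f"vs bottom-up:    {grand_base / bottom_up - 1:.1%}")     # 58.5%
print(f"vs 2025 actuals: {grand_base / actuals_2025 - 1:.1%}")  # 24.1%

# ASSUMPTION (unverified): grand_base is proportional to (1 + growth_target).
old_growth, new_growth = 0.115, 0.13
naive_new_grand_base = grand_base * (1 + new_growth) / (1 + old_growth)
print(f"naive grand_base at 13%: {naive_new_grand_base:,.0f}")
```

If a real run at 13% differs materially from the naive figure, the difference localizes the undocumented step in Layer 3.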

RF6 — No Forecast Run Log. Every production commit is driven by a forecast run, but there is no persistent audit trail (config hash, timestamp, code version). forecast_output.json is overwritten by the next run. Consequence if unresolved: production decisions cannot be reproduced retroactively for disputes or error investigation; violates the "build for auditability" principle in CLAUDE.md.
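
A run log of the kind described can be as small as one appended JSON line per run. The sketch below uses only the Python standard library. The function name, field names, and log file layout are hypothetical, and the code version is passed in by the caller rather than discovered automatically.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def _sha256(path) -> str:
    """Hex SHA-256 of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def append_run_log(config_path, output_path, code_version: str, log_path) -> dict:
    """Append one audit record (config hash, output hash, timestamp, code version)."""
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "config_sha256": _sha256(config_path),
        "output_sha256": _sha256(output_path),
        "code_version": code_version,
    }
    # Append-only JSON Lines file: earlier runs are never overwritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record

# Example call (paths and version string are placeholders):
# append_run_log("fall_2026.yaml", "forecast_output.json", "v2", "forecast_runs.jsonl")
```

Hashing forecast_output.json before the next run overwrites it means a later dispute can at least confirm which config produced which output, even if the output file itself is not archived.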

12. Validation Status and Failure Criteria

12.1 Pass/Fail per Criterion

#	Criterion	Threshold	Observed	Status
C1	Documentation fidelity — code matches documented pipeline	No material mismatches affecting outputs	Layers 0, 1, 2, 4, 5 match; Layer 3 derivation not fully traceable (RF5)	PARTIAL PASS
C2	Determinism — same inputs → same outputs	Required	No RNG; deterministic	PASS
C3	Input guards — MIN, MAX, DEFAULT bounds fire correctly	Guards present and active	5 guards verified (MIN_DEMAND_THRESHOLD, MIN_LAUNCH_UNITS, MAX_YOY_RATE, DEFAULT_UNCERTAINTY, detect_available_years)	PASS
C4	Conservation — grand_total = sum(sku_forecast)	Within rounding	Verified (Section 4.2)	PASS
C5	Economic lift vs naive	Positive	+$21K at 28.5%; +$47K at 11.91% (Summer 2025)	PASS (conditional on growth target discipline)
C6	Portfolio MAPE meets MVA (±10%)	MAPE ≤ 10%	31.1% at 11.91%, 40.0% at 28.5%	FAIL on point forecast — mitigated by scenario band
C7	Stress — compound scenario exposure ≤ 5% seasonal revenue	Fall $50K / Summer $75K	Fall within; Summer pessimistic scenario breaches $75K (~$126K)	FAIL (Summer pessimistic)
C8	Challenger lift — model beats trend-adjusted naive	Positive economic lift	Estimated NEGATIVE at 28.5% growth target (trend-adjusted est. +$73K lift); unconfirmed	UNKNOWN (pending action plan #8)
C9	Assumption audit — no load-bearing UNREASONABLE	All REASONABLE or MARGINAL	A7 (no cannibalization) rated UNREASONABLE for Summer 2026	FAIL on Summer 2026; PASS on Fall 2026
C10	Data integrity — no uncontrolled contamination	Contamination quantified and flagged	Stockout contamination (A3) unquantified	PARTIAL (flagged but not quantified)

12.2 Overall Verdict

VERDICT: ⚠ CONDITIONALLY VALIDATED. Fall 2026 production use: continue. The scenario band (bear/base/bull) operating mode is validated; the point forecast is not suitable as a production input without that scenario framing (C6 is mitigated by current practice).

Summer 2026 production use: conditional on completing the DO FIRST action plan block (items 1–5), particularly stockout quantification (#5) and growth-target error attribution (#2), and on re-opening the Cannibalization Exploration (scheduled as SCHEDULE block #6). The A7 UNREASONABLE rating makes C9 a Summer 2026-specific FAIL that must be resolved before commit.

12.3 Conditions for Upgrade to VALIDATED

All of the following must hold:

Action plan DO FIRST block (items 1–5) complete
Summer 2026 actuals post-season: fragrance-level MAPE ≤ 25% OR mechanics-only MAPE (post error-attribution) ≤ 15%
Economic cost (newsvendor regret) over Summer 2026 beats naive baseline in $ terms
No hard decay trigger fires during Summer 2026 cycle
A7 Cannibalization layer either implemented or explicitly scoped as scenario toggle (decision of record)
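
The five conditions above combine as a single conjunction, with an OR inside the accuracy condition. A minimal sketch of that logic follows. The class and field names are hypothetical, and the example values are illustrative, not observed results.

```python
from dataclasses import dataclass

@dataclass
class UpgradeInputs:
    do_first_complete: bool        # action plan items 1-5 done
    fragrance_mape: float          # Summer 2026 fragrance-level MAPE, as a fraction
    mechanics_only_mape: float     # MAPE after error attribution, as a fraction
    fm01_cost: float               # newsvendor regret in dollars
    naive_cost: float              # naive baseline regret in dollars
    hard_trigger_fired: bool       # any hard decay trigger during the cycle
    cannibalization_decided: bool  # A7 implemented or scoped as a scenario toggle

def upgrade_to_validated(x: UpgradeInputs) -> bool:
    """True only if every upgrade condition holds."""
    accuracy_ok = x.fragrance_mape <= 0.25 or x.mechanics_only_mape <= 0.15
    return all([
        x.do_first_complete,
        accuracy_ok,
        x.fm01_cost < x.naive_cost,
        not x.hard_trigger_fired,
        x.cannibalization_decided,
    ])

# Illustrative values only.
example = UpgradeInputs(True, 0.28, 0.14, 40_000.0, 55_000.0, False, True)
print(upgrade_to_validated(example))  # True (accuracy passes via mechanics-only MAPE)
```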

12.4 Model ROI Statement (Section 20)

Annual economic lift estimate: (naive cost − FM-01 cost) × 4 seasons. Extrapolating the Summer 2025 observed lift (~$21K at configured growth, ~$47K at baseline growth) to all four seasons gives an annual lift of ≈ $84K–$188K.

Annual model cost estimate: config maintenance (~8 hours/season × 4 seasons × $150/hr blended = $4,800) + Dave validation overhead (~12 hours/season × 4 × $150 = $7,200) + ETL/data infrastructure (shared cost, not model-specific) = ~$12K–$20K.

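Model ROI: (annual lift − annual cost) / annual cost × 100%. Worst case (low lift, high cost): ($84K − $20K) / $20K = 320%. Best case (high lift, low cost): ($188K − $12K) / $12K ≈ 1,467%. Range: 320%–1,467%.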

Classification: ROI > 200% → Retain and invest. Model is generating significant economic value; enhancement investment (cannibalization layer, stockout adjustment, run log) is well-justified.
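
The inputs to the ROI statement can be recomputed from the figures quoted in this section. The sketch below reproduces the annualized lift, the itemized labor cost, and the ROI formula, then evaluates one illustrative pairing (low lift against labor-only cost). The function names are hypothetical, and the classifier encodes only the one threshold stated above.

```python
SEASONS_PER_YEAR = 4
HOURLY_RATE = 150  # blended $/hr, from the cost estimate

def annual_lift(seasonal_lift: int) -> int:
    """Annualize a single-season lift by assuming it repeats every season."""
    return seasonal_lift * SEASONS_PER_YEAR

def annual_labor_cost(hours_per_season: int) -> int:
    return hours_per_season * SEASONS_PER_YEAR * HOURLY_RATE

def roi_pct(lift: float, cost: float) -> float:
    return (lift - cost) / cost * 100

def classify(roi: float) -> str:
    # Only the > 200% rule is stated in this report; other tiers are not modeled.
    return "Retain and invest" if roi > 200 else "Below retain-and-invest threshold"

labor_total = annual_labor_cost(8) + annual_labor_cost(12)
print(annual_lift(21_000), annual_lift(47_000))  # 84000 188000
print(labor_total)                               # 12000
print(roi_pct(84_000, labor_total))              # 600.0
print(classify(roi_pct(84_000, labor_total)))    # Retain and invest
```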

Sensitivity to accuracy: ROI crosses zero when economic cost matches naive — approximately MAPE = 15% break-even. Current backtest MAPE (31.1% at baseline) puts FM-01 on the correct side of break-even but with thin margin; drift to MAPE ≥ 20% would materially compress ROI.

13. Decay Clock and Re-validation Schedule (Section 19)

Field	Value
Last Validated	2026-04-23
Hard Trigger Conditions	New channel added; portfolio composition Δ > 30%; pricing change on top-5 SKU; 2 consecutive seasons of MAPE growth; stockout-adjustment layer implemented
Soft Trigger Conditions	New data source closing known gap; Cu/Co update by Nikita; personnel change at Dan/Nikita/creator; engine version increment; launch-date drift > 30 days on top fragrance
Next Scheduled Re-validation	2026-10-31 (end of Summer 2026 season)
Decay Status	FRESH (on finalization)
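
The hard trigger "2 consecutive seasons of MAPE growth" can be checked from a chronological list of season MAPEs. The sketch below reads it as two consecutive season-over-season increases, which requires at least three observations. That reading and the function name are assumptions, and the example values are hypothetical.

```python
def consecutive_mape_growth(mape_by_season, n: int = 2) -> bool:
    """True if the last n season-over-season MAPE changes are all increases."""
    if len(mape_by_season) < n + 1:
        return False  # not enough history to observe n increases
    recent = mape_by_season[-(n + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))

# Hypothetical MAPE histories, oldest season first.
print(consecutive_mape_growth([0.22, 0.25, 0.31]))  # True
print(consecutive_mape_growth([0.25, 0.22, 0.31]))  # False
print(consecutive_mape_growth([0.25, 0.31]))        # False
```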

Monitoring Surface

Tracking per /analytics/david/validation/model_cards/FM-01.md (established 2026-04-21). Flag fragrance × format drift > 20% of point forecast during Fall 2026 season, before season end.
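
The drift flag described above can be sketched as a comparison between the point forecast and an in-season projected total per fragrance × format. How the projected total is derived from partial-season sales is not specified here, so it is taken as an input. The function name and all values are hypothetical.

```python
def drift_flags(point_forecast: dict, projected_total: dict, threshold: float = 0.20) -> list:
    """Return (key, drift) pairs where |projected - forecast| / forecast exceeds threshold.

    Both dicts are keyed by (fragrance, format).
    """
    flagged = []
    for key, forecast in point_forecast.items():
        projected = projected_total.get(key)
        if projected is None or forecast == 0:
            continue  # nothing to compare against
        drift = (projected - forecast) / forecast
        if abs(drift) > threshold:
            flagged.append((key, round(drift, 3)))
    return flagged

# Hypothetical forecasts and in-season projections.
forecast = {("Fragrance X", "5oz"): 4000, ("Fragrance X", "3-Wick"): 344}
projection = {("Fragrance X", "5oz"): 5000, ("Fragrance X", "3-Wick"): 330}
print(drift_flags(forecast, projection))
# [(('Fragrance X', '5oz'), 0.25)]
```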

Generated by MODEL_VALIDATION_SKILL v2.4 · paired with VAL_FM01_executive.html (Smart Brevity, Dan audience) and VAL_FM01_action_plan.md (prioritized action queue). Working drafts in /analytics/david/_working/FM-01/. On "finalize validation", promote to /analytics/david/models/FM-01/validation/ and update model_inventory.md.