Technical Validation Report — FM-01 Production Engine

Model: FM-01 Production Engine
Version: v2
Validation Run: 2026-04-23
Validator: David DeLissio (via MODEL_VALIDATION_SKILL v2.4)
MC_ Source: /analytics/david/models/MC_FM01.md
Home Folder: C:\Users\Grow\Documents\GitHub\datahub\forecasting\
Verdict: CONDITIONALLY VALIDATED (monitoring)

1. Model Classification

Primary Type

Rules Engine (LOGIC_ENGINE). Five-layer fixed-sequence pipeline (historical pull → age decay → top-down reconciliation → overflow redistribution → format allocation). Parameters are consumed from config at runtime, not fitted; type triage tree stopped at Q4.

Secondary Type

Statistical Model (PARAMETRIC). Upstream build_pooled_decay_curves() fits category-age median and stdev from fact_orders history with Bayesian shrinkage blending. The parametric summaries feed the rules engine but do not define its core mechanism. Treated as a distinct component for interface validation under Section 8 multi-type orchestration.
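
A minimal sketch of volume-weighted shrinkage blending, assuming a weight of the form volume / (volume + prior_strength). The function name, the prior_strength constant, and the example own-rates are hypothetical; this report does not state the formula build_pooled_decay_curves() actually uses.

```python
def shrink_decay_rate(own_rate: float, pooled_rate: float,
                      own_volume: float, prior_strength: float = 1000.0) -> float:
    """Blend a fragrance's own decay rate with the pooled category-age rate.

    ASSUMPTION: weight = volume / (volume + prior_strength). Both the form
    and the default prior_strength are illustrative, not taken from FM-01.
    """
    if own_volume < 0 or prior_strength <= 0:
        raise ValueError("own_volume must be >= 0 and prior_strength > 0")
    weight = own_volume / (own_volume + prior_strength)
    return weight * own_rate + (1.0 - weight) * pooled_rate


# Hypothetical inputs: the own rate (-30%) is made up; -19.4% is the pooled
# median quoted in the stress tests of this report.
high_volume = shrink_decay_rate(-0.30, -0.194, own_volume=9000)
low_volume = shrink_decay_rate(-0.30, -0.194, own_volume=100)
```

Under this form a high-volume fragrance stays close to its own rate and a low-volume fragrance is pulled toward the pool, which is the behavior the classification describes.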

Meta Tag Matrix

Tag | Value | Evidence
UNSUPERVISED | FALSE | Model consumes labeled historical demand; pooled curve is a fitted summary against observed decay transitions.
CONFIG_DRIVEN | TRUE | Four per-season YAML configs; every mechanical choice (growth_target, elasticity_weights, max_decay, format_matrix, newsvendor_params) is a config field. Changes apply without code edits.
EXOGENOUS_INPUT | TRUE | growth_target (base/bear/bull), new_launch.raw_estimate, decay_rates.overrides all injected by Dan and Dave. These are the primary drivers, not derived from history alone.
CONSTRAINT_SYSTEM | TRUE | format_matrix hard caps (e.g., Coastal Tide 5oz=4,000; 6.5oz=98; 3-Wick=344); MIN_DEMAND_THRESHOLD=100; MAX_YOY_RATE=2.0; 2oz Pack fixed-1000 allocation.
ALLOCATION_ENGINE | TRUE | compute_format_shares() distributes fragrance totals across formats using prior-year share + caps + overflow; pack decomposition (Amazon 3-Pack → component SKUs via pack_def).
SCALING_LAYER | TRUE | Top-down growth_target scalar applied multiplicatively to reconciled bottom-up totals. Dominant: ~33% of Fall 2026 / ~45% of Summer 2026 grand_target is scaling uplift above raw bottom-up.
UPSTREAM_STAT_MODEL | TRUE | build_pooled_decay_curves() is a fitted summary feeding get_decay_rate(); pooled-curve stdev drives get_prediction_interval().
POST_PROCESSING | TRUE | apply_diffuser_overlay.py consumes forecast_output.json and emits forecast_output_with_diffuser.json (Fall 2026 only, additive Diffuser format mirrored from Car Freshener units).
BRANCHING_LOGIC | TRUE (partial) | Format allocation branches on: capped vs uncapped; fixed allocation (2oz) vs demand-driven; Y1 launch (proxy fragrance) vs returning (own-share); format_share_overrides (manual).
DATA_QUALITY_FLAG | TRUE | Sea Salt Neroli / Pomelo stockout contamination in 2025 decay signal; null margin_per_unit on 6/12 Fall 2026 SKU rows (3-Wick + 2oz Packs); aggregate channel treatment conceals Shopify/Amazon/Faire-specific lead time.
MANUAL_PROCESS | TRUE | Config YAML edits per season are manual; decay_overrides set by Dave; newsvendor_params sourced from Nikita's Cu/Co analysis; launch-date reconciliation from Launch_Dates.md.
PNL_IMPACT | TRUE | Drives production unit commits (3-wick pours, raw-material orders, 1–3 month shipping lead time), and Dan's revenue/cash planning band. A miss prints as carrying cost or stockout.

Applicable modules and overlays: LOGIC_ENGINE_MODULE (primary), PARAMETRIC_MODULE (secondary, upstream), plus conditional overlays for SCALING_LAYER, ALLOCATION_ENGINE, CONSTRAINT_SYSTEM, CONFIG_DRIVEN, UPSTREAM_STAT_MODEL, POST_PROCESSING, BRANCHING_LOGIC, DATA_QUALITY_FLAG, MANUAL_PROCESS, PNL_IMPACT.

2. Assumption Audit

2.1 Assumption Risk Matrix (Section 4.1)

# | Assumption | Reasonableness | Sensitivity | Failure Direction | Net Risk
A1 | Growth target (11.5% Fall / 28.5% Summer) achievable with current ad + channel strategy | MARGINAL — Q1 2026 units +5.8%, demand signals 67% supported / PARTIALLY SUPPORTED; Klaviyo = counter | HIGH (>20% on grand_target) | DANGEROUS if overforecast (carrying cost) or UNDER if bear-set (stockout on hot fragrance) | H
A2 | Pooled decay curve is stationary — category-age medians built from 2024–2025 generalize to 2026 | REASONABLE (mild) — only 2 transitions observed per age bucket; Golden Grove +177% flagged as outlier; n=23 for age=1 pool | MEDIUM (5–20%) | DANGEROUS if curve is biased steep (underforecast recovered fragrances) | H
A3 | Zero-demand periods treated as organic decay (no stockout imputation); max_decay: none removes floor | MARGINAL — Sea Salt Neroli (-76%) and Pomelo (-79%) 2025 declines known to coincide with supply issues; pool includes these | MEDIUM-HIGH | DANGEROUS — propagates supply-induced suppression as real demand decline; systematically understates returning demand | H
A4 | Bayesian shrinkage with volume-weight is correctly calibrated (high-volume trust own; low-volume pull to pool) | REASONABLE — consistent with standard empirical Bayes; shrinkage weight not independently backtested | MEDIUM | Fails safely — shrinkage is conservative in both directions | M
A5 | Elasticity weights (Y1:1.5 … Y5+:0.5) portable across seasons and lineup compositions | MARGINAL — no per-season fit; same weights Summer / Spring / Fall | LOW-MEDIUM | Symmetric — small shift across fragrances within grand_total | M
A6 | New-launch estimate (mean of Y1 proxies × 1.066 market_growth_factor) is unbiased | MARGINAL — only Cabana + Coconut Pineapple as Summer proxies; market_growth_factor 1.066 is not documented upstream | HIGH (ρ > 20%) for Summer 2026 (Coconut Soleil = 9,609 raw) | DANGEROUS if biased high (overcommit on unproven launch) | H
A7 | Cannibalization between returning and new fragrances is ZERO (Improvement #2 deferred) | UNREASONABLE for Summer 2026 — Cabana 2025 launch displaced SSN/Pomelo/CP volume; RTF §SSN/Pomelo Note confirms | MEDIUM-HIGH | DANGEROUS — overestimates total portfolio demand by treating displacement as additive | H
A8 | Pooled decay is Normal-distributed (Z_SCORES 1.04 / 1.28 / 1.645 / 1.96 for 70/80/90/95%) | REASONABLE (mild) — not tested against sample skew/kurtosis; n=23 too small for rigorous normality test | LOW-MEDIUM (CI width) | Symmetric — CI under/overstates both tails | L
A9 | fact_orders Amazon pack decomposition has already happened upstream (engine SUMs without re-decomposing) | REASONABLE — confirmed in FORECASTING_RULES.md §Amazon Translation; ETL ownership established | LOW (bounded) | Safe — if ETL drifts, engine output visibly divorces from pack_def | L
A10 | Channel aggregation (Shopify + Amazon + Faire) is acceptable for production commits | MARGINAL — Amazon FBA and Shopify 3PL have different lead times; Nikita's PO cycle may need channel split | LOW operationally (not on grand_target) | Operational risk — can produce correct totals that are wrong by channel allocation | M
A11 | Format shares (observed prior-year or proxy) represent steady-state preference | MARGINAL — Ginger Pumpkin 2025 3-Wick production issue triggered manual override to lineup average | LOW-MEDIUM | Symmetric | L
A12 | Seasonal window is stable; launch-date drift > 30 days handled exogenously | MARGINAL — Holiday 2025 drifted +37 days vs 2024; engine does not normalize by selling days | MEDIUM (Holiday specifically) | Dangerous — short season compared to long prior makes decay look steep | M
A13 | Cu/Co from Summer 2026 config carries forward to Fall 2026 (per Gate 1 decision) | MARGINAL — working assumption; Nikita has not confirmed Fall-specific values | MEDIUM on cost analysis results only | Symmetric on cost framing | M
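
The Z_SCORES quoted under A8 can be cross-checked against the standard Normal quantile function. The sketch below assumes the four coverage levels are central (two-sided) intervals; the names QUOTED_Z and two_sided_z are illustrative and not taken from the engine.

```python
from statistics import NormalDist

# Central coverage levels and the Z_SCORES quoted under assumption A8
QUOTED_Z = {0.70: 1.04, 0.80: 1.28, 0.90: 1.645, 0.95: 1.96}


def two_sided_z(coverage: float) -> float:
    """z such that a standard Normal has `coverage` mass within +/- z."""
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)


computed = {level: round(two_sided_z(level), 3) for level in QUOTED_Z}
# Largest disagreement between the quoted and the computed z values
max_gap = max(abs(computed[level] - QUOTED_Z[level]) for level in QUOTED_Z)
```

The quoted values agree with the computed quantiles to within 0.01, so the band widths follow standard two-sided Normal intervals. Whether Normality itself holds for n=23 is a separate question that this check does not address.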

2.2 Assumption Dependency Graph (Section 17)

Not all assumptions are independent. Load-bearing assumptions feed multiple output paths and must be validated first regardless of individual sensitivity.

A1 (Growth target)           ──→ grand_target (all fragrances, all formats)
                             ──→ top-down scaling factor (all downstream)
                             [LOAD-BEARING — feeds every output]

A2 (Pooled decay stationary) ──→ get_decay_rate() fallback (when no override)
                             ──→ get_prediction_interval() CIs (all fragrances)
                             [LOAD-BEARING — feeds decay rates + PIs]

A3 (No stockout imputation)  ──→ A2 (pooled curve) via 2025 SSN/Pomelo transitions
                             ──→ A4 (Bayesian shrinkage pool)
                             [INTERMEDIATE — contaminates load-bearing A2]

A4 (Shrinkage calibration)   ──→ get_decay_rate() for low-volume fragrances
                             [INTERMEDIATE — depends on A2]

A5 (Elasticity weights)      ──→ top-down allocation shares (which fragrance gets scaled)
                             [ROOT]

A6 (New-launch estimate)     ──→ Summer 2026 Coconut Soleil grand_total contribution
                             [ROOT — isolated to launch fragrance]

A7 (No cannibalization)      ──→ grand_total (all fragrances when a launch is in lineup)
                             ──→ A6 (new-launch estimate implicitly treats as additive)
                             [LOAD-BEARING when a launch is active — Summer 2026]

A8 (Normal CI)               ──→ get_prediction_interval() band widths
                             [ROOT — affects PI only, not point forecast]

A11 (Format shares stable)   ──→ compute_format_shares() per-format unit outputs
                             [ROOT]

A12 (Seasonal window stable) ──→ pull_seasonal_demand() window bounds
                             ──→ observed decay rates (via window comparability)
                             [INTERMEDIATE — affects A2 indirectly]

A13 (Cu/Co carry-forward)    ──→ cost analysis only
                             [ROOT — validation-layer only]

Load-bearing assumptions (validation priority #1): A1 (growth target), A2 (pooled decay stationarity), A7 (no cannibalization — when launch active).

Intermediate (priority #2): A3 (stockout contamination), A4 (shrinkage), A12 (seasonal window).

Root (priority #3): A5, A6, A8, A11, A13. Each is isolated to a single output path.
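
One way to make the propagation paths explicit is to encode the assumption-to-assumption edges and compute what each assumption reaches. This is a sketch: the EDGES dictionary is a reading of the graph above (including the "A4 depends on A2" note), and the downstream helper is hypothetical.

```python
# Assumption-to-assumption edges transcribed from the dependency graph above.
# "A3": ["A2", "A4"] means an error in A3 propagates into A2 and A4.
EDGES = {
    "A3": ["A2", "A4"],
    "A2": ["A4"],   # A4 is marked "depends on A2"
    "A12": ["A2"],
    "A7": ["A6"],
}


def downstream(assumption: str) -> set[str]:
    """All assumptions reachable from `assumption` (transitive closure)."""
    seen: set[str] = set()
    stack = list(EDGES.get(assumption, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(EDGES.get(node, []))
    return seen


contaminated_by_a3 = downstream("A3")
affected_by_a12 = downstream("A12")
```

Under this encoding A3 and A12 both reach A2 and A4, which is why they are treated as intermediate rather than root even though neither feeds an output path directly.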

2.3 Assumption Cost Matrix (Section 4.5)

Assumption | Plausible Error Range | Output Impact | $ Exposure (Fall 2026) | $ Exposure (Summer 2026) | Priority
A1 Growth target | ±5pp | ±3.3% × 32,909 = ±1,085 units (Fall); ±5% × ~75K = ±3,750 units (Summer) | ±$6K–$15K | ±$20K–$50K | H
A7 No cannibalization | 10–25% displacement on Summer returning lineup | | n/a Fall (no launch) | $30K–$60K DANGEROUS HIGH | H
A3 Stockout contamination | 5–15pp steeper pooled curve bias | ~500–1,500 units underforecast on recovered fragrances | $5K–$15K | $15K–$40K | H
A6 New-launch bias | ±25% on Coconut Soleil raw 9,609 | ±2,400 units | n/a Fall | ±$22K (over) / ±$9K (under, if demand strong) | H
A2 Pooled decay drift | ±10pp median shift | ±1,500–2,000 units on raw bottom-up | $5K–$18K | $10K–$25K | H
A12 Holiday window drift | +30 day asymmetry | 10–20% apparent decay overstatement | n/a Fall | n/a Summer | M (Holiday only)
A13 Fall Cu/Co mismatch | ±20% on held-period cost differential | Re-frames optimal percentile from 72nd to 60th–80th | $8K–$15K | n/a (own Cu/Co used) | M
A10 Channel aggregation | Operational — not on grand_target | Potential PO split imbalance | Bounded by re-order flex | Same | M
A5 Elasticity weights | ±0.2 per age | Reshuffles shares within grand_total | <$3K | <$5K | L
A8 Normal CI | ±5pp on band width | PI width shift only | $0 (decision framing) | $0 (decision framing) | L

Top three validation priorities sorted by $ exposure:

  1. A7 (Cannibalization — Summer only): $30K–$60K danger — untested structural assumption, UNREASONABLE rating
  2. A1 (Growth target): $20K–$50K Summer, $6K–$15K Fall — dominant driver, demand signals PARTIALLY SUPPORTED
  3. A3 (Stockout contamination): $15K–$40K Summer, $5K–$15K Fall — contaminates A2, rated MARGINAL under the current max_decay: none policy

3. Data Integrity Assessment

3.1 Input Data Lineage

Source | Via | Availability | Known Issues
fact_orders | DuckDB read_parquet() from /data/parquet/ | 591,627 rows; available_years = [2023, 2024, 2025] + partial 2026 | Zero/stockout periods not imputed (see Assumption A3)
dim_products | DuckDB read_parquet | 186 products | launch_year and fragrance_season stability not audited this run
dim_customers | DuckDB read_parquet (only via demand_signals) | 65,657 customers | Returning-rate calc assumes MD5 customer_id stable across channels
fact_ad_spend | DuckDB read_parquet (demand_signals.py) | 1,606 rows; Meta + Google + Amazon via Windsor MCP | Demand-signal input only; not forecast input
fact_klaviyo_campaigns | DuckDB read_parquet | 200 campaigns + 166 flows | Fall 2026 demand_signals shows 0 recipients / 0 revenue in 2025 → counter-signal; confirm data is current
pack_def (Excel) | Read-once reference | Sheet in Forcasting definitions & translations.xlsx | Amazon pack decomposition happens upstream in ETL; engine does not re-decompose

3.2 Input Guards Observed

3.3 Known Contamination

FLAG: Stockout-censored demand in 2025 pool. Sea Salt Neroli (-76.0%) and Pomelo (-79.4%) 2025 declines are documented in the RTF reference as coinciding with supply interruptions. Both are now retired; their 2024→2025 transitions are part of the pooled-curve empirical base. Current max_decay: none policy means these transitions influence the pool at their recorded steep values. Impact estimate: if 2 of ~23 age-1 transitions are contaminated steep by ~30–50pp, median shift is on the order of -3pp to -5pp, widening CI stdev by ~10%. Correction: either impute demand for stockout periods or flag transitions for exclusion from the pool.

3.4 Margin Data Gaps

Fall 2026 margin_projections.sku_margins shows null margin_per_unit on 6 of 12 SKU rows:

Fragrance | Format | Units | Margin Status
Flannel + Leaves | 3-Wick Candle | 2,085 | NULL
Flannel + Leaves | 2oz Packs | 1,000 | NULL
Autumn Heirloom | 3-Wick Candle | 1,079 | NULL
Autumn Heirloom | 2oz Packs | 1,000 | NULL
Ginger Pumpkin | 3-Wick Candle | 702 | NULL
Ginger Pumpkin | 2oz Packs | 1,000 | NULL

6,866 units (20.9% of the portfolio by volume) are unpriced in the projected margin roll-up. Total reported margin of $177,478 excludes these SKUs — the true portfolio margin is materially higher. Using spray analogue pricing as a rough fill: 3-Wick Candles (3,866 units) at ~$8–$15 margin/unit → $30K–$58K; 2oz Packs (3,000 units) at proxy $3–$8/unit → $9K–$24K. Estimated missing margin: ~$40K–$80K.
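
The missing-margin estimate can be reproduced directly from the table and the proxy ranges quoted above. This is a sketch of the arithmetic only; the proxy $/unit ranges are the report's rough fills, not measured margins, and the variable names are illustrative.

```python
# Units with null margin_per_unit, from the table above
NULL_MARGIN_UNITS = {
    "3-Wick Candle": 2085 + 1079 + 702,  # Flannel + Leaves, Autumn Heirloom, Ginger Pumpkin
    "2oz Packs": 1000 + 1000 + 1000,
}
# Proxy margin ranges ($/unit) quoted in the text: rough fills, not measured
PROXY_MARGIN = {"3-Wick Candle": (8, 15), "2oz Packs": (3, 8)}
GRAND_TARGET = 32909  # Fall 2026 grand_target

unpriced_units = sum(NULL_MARGIN_UNITS.values())
unpriced_share = unpriced_units / GRAND_TARGET
low = sum(NULL_MARGIN_UNITS[f] * PROXY_MARGIN[f][0] for f in NULL_MARGIN_UNITS)
high = sum(NULL_MARGIN_UNITS[f] * PROXY_MARGIN[f][1] for f in NULL_MARGIN_UNITS)

print(unpriced_units, f"{unpriced_share:.1%}", low, high)
# 6866 20.9% 39928 81990
```

The unrounded range of $39,928 to $81,990 is what the text rounds to ~$40K–$80K.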

4. Pipeline Validation

4.1 Layer-by-Layer Reconstruction (Fall 2026 base run)

Layer | Input | Output | Observed | Documented Match
0. Demand signal gate | 6 signals vs growth_target 11.5% | verdict + support_score | PARTIALLY SUPPORTED, 12/18 (67%), counter: Klaviyo | Yes (demand_signals.py)
1. Historical pull | fact_orders × dim_products, years [2023, 2024, 2025] | Per (fragrance, format, year) units | historical block matches continuing lineup (F+L 9088, AH 8436, GP 6254 in s2025) | Yes
2. Age decay | historical + elasticity + decay_overrides | raw_forecasts | F+L 7516 (=9088 × 0.827), AH 6099 (=8436 × 0.723), GP 5003 (=6254 × 0.800) | Yes — all override-sourced per config
3. Top-down reconciliation | raw_forecasts sum=18,618; growth_target 11.5% | grand_base, fragrance_totals | grand_base=29,515; grand_target=32,909; F+L 16,961, AH 9,635, GP 6,313 | Partial — derivation of grand_base=29,515 from raw_sum=18,618 not directly traceable without code inspection
4. Overflow redistribution | cap-bound fragrances' excess | redistributed units | overflow_redistributed=0 (no cap bound Fall 2026) | Yes — expected with uncapped Fall lineup
5. Format allocation | fragrance totals + historical shares + format caps + fixed packs | sku_forecast | 12 SKU rows; 2oz packs at 1,000 (fixed); 3 fragrances × 4 formats roughly | Yes
Post. Diffuser overlay | forecast_output.json Car Freshener units | Diffuser rows (additive) | F+L Car=2,317, AH Car=1,565 → Diffuser mirrors both | Yes (apply_diffuser_overlay.py + Fall_2026_Config 2026-04-21 changelog)
FLAG (Layer 3): observed grand_base=29,515 is 58.5% above raw bottom-up sum of 18,618. Derivation path is not directly traceable from documentation — the top-down mechanism's internal formula is inferred but not validated against code line-by-line. Documentation states grand_target = previous × (1+growth_target) but 2025 continuing-lineup total = 23,778, and 23,778 × 1.115 = 26,512, not 29,515. Possible reconciliation: grand_base includes Discovery pack units (+~2,085), retired fragrance residuals, or launch-year adjustments. Action: confirm with creator (Dan/Louis) that the Layer 3 formula is implemented as documented — captured as Action Item #2 in the action plan.
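
The Layer 2 and Layer 3 figures can be re-derived from the values in the table, which also shows where the documented formula stops matching. This is a sketch of the arithmetic only; it does not call the engine, and the variable names are illustrative.

```python
# Fall 2026 base-run values transcribed from the layer table above
S2025_UNITS = {"F+L": 9088, "AH": 8436, "GP": 6254}
RETENTION = {"F+L": 0.827, "AH": 0.723, "GP": 0.800}  # multipliers shown in the Layer 2 row
GROWTH_TARGET = 0.115
GRAND_BASE = 29515
GRAND_TARGET = 32909

# Layer 2: raw forecast = prior-season units x retention multiplier
raw = {k: round(S2025_UNITS[k] * RETENTION[k]) for k in S2025_UNITS}
raw_sum = sum(raw.values())

# Layer 3 as observed: grand_target = grand_base x (1 + growth_target)
observed_target = round(GRAND_BASE * (1 + GROWTH_TARGET))

# Layer 3 as documented: previous continuing-lineup total x (1 + growth_target)
prior_total = sum(S2025_UNITS.values())
documented_target = round(prior_total * (1 + GROWTH_TARGET))

uplift_over_raw = GRAND_BASE / raw_sum - 1
print(raw_sum, observed_target, documented_target, f"{uplift_over_raw:.1%}")
```

Layer 2 reproduces exactly (18,618), and grand_base × 1.115 reproduces grand_target (32,909). The documented formula applied to the 2025 continuing-lineup total lands roughly 3,000 units below grand_base, which is the untraced gap the flag refers to.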

4.2 Conservation Checks

5. Statistical Diagnostics

Figure 1. Scaling Dominance Waterfall — Fall 2026. Bottom-up raw forecast (18,618) is lifted by top-down reconciliation (+10,897 units) to grand_base (29,515); the 11.5% growth target then adds 3,394 units to reach grand_target (32,909). The +10,897 reconciliation step alone is ~33% of grand_target: pure top-down uplift, not decay mechanics.
Figure 2. Parameter Sensitivity Tornado — Fall 2026. Each bar shows the estimated % change in grand_target for a plausible parameter perturbation. Growth target dominates; stockout contamination and cannibalization-related assumptions lack Fall 2026 exposure but drive Summer 2026 risk.
Figure 3. Summer 2025 Backtest — FM-01 v2 forecast vs actuals by fragrance, with 70% and 90% prediction interval bands. Source: memory-documented Summer 2025 run (MAPE 40.0% at 28.5% growth target). Procedure-only where data replay required.
Figure 4. Error Distribution — Summer 2025 Backtest. Right of zero = underforecast (lost margin direction); left = overforecast (carrying cost). Mean error (bias) is shown with a vertical line.

6. Backtesting Results

6.1 Setup

Primary holdout: Summer 2025. Documented-baseline validation values carried from memory (project_fm01_monitoring.md): MAPE 40.0% at the configured 28.5% growth target; MAPE 31.1% at the data-supported 11.91% baseline growth rate. Error attribution: ~75% traced to growth-target assumption; ~25% to distribution (decay + allocation) logic.

REPLAY REQUIREMENT: these memory-sourced values are authoritative for the validation narrative but should be reproduced by an explicit re-run of Summer 2025 backtest with decomposed error attribution artifact. Captured as Action Item #2 in the validation action plan (DO FIRST, effort Medium).

6.2 Statistical Metrics Table (Summer 2025 reconstruction)

Output Unit | Actual (2025) | FM-01 v2 Forecast | Error | MAPE | Bias Direction
Portfolio grand_total (Summer 2025) | 77,440 | ~108,410 (at 28.5%) | +30,970 | 40.0% | Overforecast
Portfolio grand_total (baseline 11.91%) | 77,440 | ~101,527 | +24,087 | 31.1% | Overforecast
Growth-target-error share | ~$18K–$30K | ~75% of variance
Mechanics-error share | ~$6K–$10K | ~25% of variance

6.3 Naive Challenger Comparison (Summer 2025)

Metric | FM-01 v2 @ 28.5% | FM-01 v2 @ 11.91% | Naive (prior-year actuals)
Forecast units | ~108,410 | ~101,527 | ~60,776 (Summer 2024 actual)
Actual (2025) | 77,440 | 77,440 | 77,440
MAPE | 40.0% | 31.1% | ~21.5%
Overforecast cost (@ Co $3.81 blended) | ~$118K | ~$92K | $0
Underforecast cost (@ Cu $8.35 blended) | $0 | $0 | ~$139K (−16,664 units)
Total forecast cost | ~$118K | ~$92K | ~$139K
Lift vs naive | +$21K | +$47K | baseline
FINDING (Backtest): FM-01 v2 at 11.91% baseline growth target demonstrates positive economic lift of ~$47K vs naive on Summer 2025. At 28.5% growth target, lift drops to ~$21K — the model still beats naive but growth-target optimism eats ~55% of the mechanics lift. This is the clearest Section 15 validation signal: the model earns its complexity over naive in economic terms, but only if growth target is set defensibly.
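
The comparison above rests on an asymmetric cost function: overforecast units are charged at Co and underforecast units at Cu. A minimal sketch that reproduces the table from its own inputs follows; the function and dictionary names are illustrative and not part of the engine.

```python
ACTUAL = 77440        # Summer 2025 actual units
CU, CO = 8.35, 3.81   # blended $/unit used in the table above


def forecast_cost(forecast: float, actual: float = ACTUAL) -> float:
    """Asymmetric cost: Co per unit overforecast, Cu per unit underforecast."""
    error = forecast - actual
    return error * CO if error >= 0 else -error * CU


def ape(forecast: float, actual: float = ACTUAL) -> float:
    """Absolute percentage error of a single portfolio total."""
    return abs(forecast - actual) / actual


FORECASTS = {"fm01_28.5": 108410, "fm01_11.91": 101527, "naive": 60776}
costs = {name: forecast_cost(f) for name, f in FORECASTS.items()}
# Lift is the cost saved relative to the naive challenger
lift = {name: costs["naive"] - c for name, c in costs.items()}

for name in FORECASTS:
    print(name, f"{ape(FORECASTS[name]):.1%}", round(costs[name]), round(lift[name]))
```

Naive has the lowest percentage error of the three yet the highest cost, because its miss falls entirely on the underforecast side where each unit costs $8.35 rather than $3.81.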

6.4 Error Attribution Decomposition

Error Source | Summer 2025 Contribution | Mechanism
Growth-target assumption error | ~75% | Top-down scalar drove forecast above realized; 28.5% target vs observed +27.4% season growth — directionally close, but bottom-up raw already captured most of that growth, so scaling compounded it
Decay-rate estimation error | ~15% | Overrides set using proxies; some overrides steeper than realized decline, some flatter
Format allocation error | ~5% | Prior-year shares did not track Summer 2025 format mix exactly
New-launch estimation error | ~5% | Cabana 2025 raw vs actual

7. Stress Test Results

7.1 Single-Parameter Stress (Fall 2026)

Parameter | Base | Worst Case | Extreme | Best Case | Δ grand_target | $ Exposure (blended)
Growth target | 11.5% | 0% | -10% | 18% | ±3,394 to -7,000 units | $13K–$27K under / $8K–$17K over
Pooled decay median (all fragrances) | -19.4% | -29.4% | -49.4% | -9.4% | -1,700 to +1,700 units | ~$7K–$16K either direction
Decay-rate overrides | config values | +15pp steeper each | +30pp steeper | -10pp flatter | -2,800 to +1,870 | $7K–$26K under direction
Elasticity weights | 1.5/1.2/0.8/0.6/0.5 | flat 1.0 | inverted (high at Y5+) | stronger Y1 (2.0) | internal redistribution only | <$3K
max_decay setting | none | 50% | 25% | none (base) | ~+1,200 units (tighter floor lifts steepest) | +$11K (as carrying cost)

7.2 Compound Scenarios

Scenario | Description | Δ grand_target | $ Exposure | Failure Threshold Breach?
Pessimistic (Fall 2026) | Worst decay (each -29.4%) + bear growth (8%) | -5,200 units to ~27,700 | ~$24K lost margin if actuals track base | No (threshold $50K for Fall)
Optimistic (Fall 2026) | Best decay (-9.4%) + bull growth (15%) | +3,200 units to ~36,100 | ~$12K carrying cost if actuals track base | No
Pessimistic (Summer 2026) | Worst decay + bear growth (25%) + Coconut Soleil -50% | ~-15,000 units on ~75K base | ~$126K under-commit if demand is strong | YES ($75K threshold)
Cannibalization shock (Summer 2026) | 15% displacement of Cabana by Coconut Soleil (A7 fails high) | -1,450 units on Cabana; +0 offset (additive treatment) | ~$12K in overforecast carrying cost on Cabana; $0 offset | No (but structural)
Data failure | DuckDB stale by one season; engine runs on 2024 as latest | Unknown — engine runs but detect_available_years() returns wrong window | Indeterminate — not detected by existing guards | Policy gap
New-launch underperforms proxy (Summer 2026) | Coconut Soleil actual = 50% of raw estimate 9,609 | -4,800 units on CS; overflow to returning fragrances | ~$45K carrying cost on overcommitted CS SKUs | Approach threshold

7.3 Sensitivity Cliff Detection (Section 18)

Cliffs tested at Summer 2026 Coastal Tide cap boundary (5 oz Spray = 4,000; 3-Wick = 344; 6.5oz Candle = 98).

Cliff Location | Boundary Value | Output Jump at ±1% | $ Exposure | Prob. Input Lands Near
CT 5 oz Spray cap | 4,000 units | ~40 units redistributed to other CT formats then uncapped fragrances | $2K–$4K | Medium
CT 3-Wick cap | 344 units | ~3 units per 1% — smooth, not a cliff | <$500 | Low
CT 6.5 oz Candle cap | 98 units | ~1 unit per 1% — smooth | <$100 | Low
CT total cap binding | 5,442 units aggregate | Triggers overflow redistribution layer — cliff risk if naive share allocation replaces elasticity-weighted redistribution | $8K–$15K | Medium

Recommendation: set CT growth-target scenario to produce CT total demand at least 5% below the 5,442 cap to avoid cliff exposure. Current Summer 2026 CT override rate of -21.5% likely holds CT below cap, but should be re-verified once the Summer run is produced.
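
The recommended headroom can be expressed as a simple pre-run check. This is a sketch only; ct_within_headroom is a hypothetical helper, not an existing engine guard.

```python
CT_TOTAL_CAP = 5442   # aggregate Coastal Tide cap from the cliff table
HEADROOM = 0.05       # recommended minimum margin below the cap

# Highest CT total demand that still respects the headroom (about 5,170 units)
ceiling = CT_TOTAL_CAP * (1 - HEADROOM)


def ct_within_headroom(ct_total_demand: float) -> bool:
    """True if CT demand sits at least HEADROOM below the aggregate cap."""
    return ct_total_demand <= ceiling
```

A Summer run whose CT total exceeds that ceiling would be the trigger for re-verifying the -21.5% override before production commits.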

7.4 Assumption Stress (Section 10.3 — HIGH-sensitivity)

Assumptions rated HIGH in Section 4.1 (A1, A2, A3, A6, A7) stressed to the boundary of reasonableness.

Assumption | Boundary Value | $ Exposure at Boundary | Usable Output?
A1 Growth target off by -10pp (bear edge) | 0% growth (Fall); 15% (Summer) | $13K–$27K Fall / $35K–$60K Summer | Yes if scenarios used for production band
A2 Pooled curve biased steep by 10pp | Median -29.4% vs -19.4% | $12K–$20K underforecast Fall / $20K–$40K Summer | Flag required; quantitatively usable
A3 Stockout adjustment applied (contamination removed) | Pool excludes SSN + Pomelo 2025 transitions | Pool median lifts ~3–5pp; +$8K–$18K in recovered demand estimate Fall | Yes — improves forecast
A6 New-launch under by 50% (Summer only) | CS actual = 4,804 vs raw 9,609 | $45K carrying cost on CS SKUs | Yes — scenario band covers
A7 Cannibalization at 25% (Summer only) | Cabana loses 1,450 units; CS absorbs | $30K–$45K carrying cost on Cabana overcommit | Not currently produced by engine

8. Cost Analysis

8.1 Cu/Co Parameter Validation

Cu/Co values sourced from Nikita's April 2026 analysis per Summer_2026_Config.yaml §newsvendor_params. Carried forward to Fall 2026 per Gate 1 decision (2026-04-23).

Format | Cu ($/unit) | Co ($/unit) | Critical Ratio | Optimal Percentile | Implied Optimal vs MAPE Optimization
5 oz Spray | 8.35 | 3.81 | 0.687 | 69th | MAPE (symmetric) understates optimal production by ~19 pp
8 oz Candle | 17.33 | 4.80 | 0.783 | 78th | MAPE understates by ~28 pp
2 oz Discovery | 18.97 | 2.91 | 0.867 | 87th | MAPE understates by ~37 pp — most asymmetric
Car Freshener | 8.56 | 3.92 | 0.686 | 69th | MAPE understates by ~19 pp
6.5 oz Candle | 11.19 | 3.00 | 0.789 | 79th | MAPE understates by ~29 pp
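
The critical ratios in the table follow the standard newsvendor form Cu / (Cu + Co). The sketch below recomputes them and, on the assumption that the "understates by" column measures the gap to the 50th percentile that a symmetric error metric targets, recomputes that gap as well. Names are illustrative.

```python
# (Cu, Co) in $/unit, from the table above
CU_CO = {
    "5 oz Spray": (8.35, 3.81),
    "8 oz Candle": (17.33, 4.80),
    "2 oz Discovery": (18.97, 2.91),
    "Car Freshener": (8.56, 3.92),
    "6.5 oz Candle": (11.19, 3.00),
}


def critical_ratio(cu: float, co: float) -> float:
    """Newsvendor critical ratio Cu / (Cu + Co), read as the optimal percentile."""
    return cu / (cu + co)


ratios = {fmt: round(critical_ratio(cu, co), 3) for fmt, (cu, co) in CU_CO.items()}
# ASSUMPTION: the table's pp figure is the distance from the median (0.5)
gap_vs_median_pp = {fmt: round((r - 0.5) * 100) for fmt, r in ratios.items()}
```

All five ratios and percentage-point gaps match the table to the precision shown.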

Blended weighted by Fall 2026 unit mix (21,385 spray + 776 6.5oz + 3,882 Car + 3,866 3-Wick + 3,000 Discovery; 3-Wick uses 5-oz-spray proxy given absence of Cu/Co):

Blended Cu ≈ $9.41 / unit
Blended Co ≈ $3.72 / unit
Blended critical ratio ≈ 0.717 → 72nd percentile optimal production
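
The blended figures are unit-weighted averages over the Fall 2026 mix. A sketch of that calculation, using only the values listed above, with 3-Wick reusing the 5 oz Spray Cu/Co as stated; variable names are illustrative.

```python
# Fall 2026 unit mix and per-format (units, Cu, Co), as listed above.
# 3-Wick reuses the 5 oz Spray values because no 3-Wick Cu/Co is available.
MIX = {
    "5 oz Spray": (21385, 8.35, 3.81),
    "6.5 oz Candle": (776, 11.19, 3.00),
    "Car Freshener": (3882, 8.56, 3.92),
    "3-Wick (spray proxy)": (3866, 8.35, 3.81),
    "2 oz Discovery": (3000, 18.97, 2.91),
}

total_units = sum(u for u, _, _ in MIX.values())
blended_cu = sum(u * cu for u, cu, _ in MIX.values()) / total_units
blended_co = sum(u * co for u, _, co in MIX.values()) / total_units
blended_ratio = blended_cu / (blended_cu + blended_co)

print(total_units, round(blended_cu, 2), round(blended_co, 2), round(blended_ratio, 3))
# 32909 9.41 3.72 0.717
```

Because spray and the spray-proxied 3-Wick make up about three quarters of the mix, the blend sits close to the spray ratio; a real 3-Wick Cu/Co would be the first input to replace.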

8.2 Format-Level Cost Analysis — Asymmetry by SKU

Different formats carry very different asymmetry. 2 oz Discovery (Cu/Co ratio 6.5×) is the most underforecast-penalized format in the portfolio; 5 oz Spray and Car Freshener are closest to symmetric. Engine currently treats all formats equally at the allocation step — optimal percentile variation across formats is not surfaced in output.

8.3 Seasonal Cost Exposure Summary (Fall 2026)

Error Scenario | Units Off (abs) | Avg Unit Cost | $ Exposure | Probability
5% underforecast | 1,645 | Cu $9.41 | $15,485 | ~30%
10% underforecast | 3,291 | Cu $9.41 | $30,968 | ~15%
20% underforecast | 6,582 | Cu $9.41 | $61,936 | ~5%
5% overforecast | 1,645 | Co $3.72 | $6,121 | ~35%
10% overforecast | 3,291 | Co $3.72 | $12,242 | ~20%
20% overforecast | 6,582 | Co $3.72 | $24,484 | ~10%
Growth-target −5pp off | ~1,085 | Cu $9.41 | $10,210 | ~35%
Pooled curve biased 10pp steep | ~1,500 underforecast on recovered | Cu $9.41 | $14,115 | ~40% (conditional on A3)
Expected total (prob-weighted) | ~$15K–$22K

8.4 Break-Even Accuracy Threshold

Naive model (prior-year actuals = 23,778 units continuing lineup on Fall 2025) would have a structural underforecast bias relative to Fall 2026 growth.

Break-even: the point where FM-01's economic cost equals naive's. Naive economic cost at Fall 2026, if actuals are 28,000 units (roughly halfway between naive and target): 4,222-unit underforecast × $9.41 = ~$39,729. FM-01 at the same actuals: 32,909 − 28,000 = 4,909-unit overforecast × $3.72 = $18,261. FM-01 is ~$21K ahead of naive at that midpoint.

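For FM-01 to lose vs naive, actuals would need to land below roughly 26,400 units, the point where (32,909 − actuals) × $3.72 equals (actuals − 23,778) × $9.41. At or above grand_target (32,909+), FM-01 always wins, because naive's underforecast is larger than FM-01's by the full 9,131-unit gap. At the other extreme, if actuals track naive (23,778), FM-01's overforecast cost = 9,131 × $3.72 = $33,967 vs naive cost $0 → FM-01 loses by ~$34K.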

Break-even MAPE: approximately 15%. Memory-documented Summer 2025 MAPE = 40.0% at configured 28.5% / 31.1% at 11.91% baseline. FM-01's historical MAPE exceeds break-even threshold — margin of safety vs naive is thin and growth-target dependent (see Section 6.3 findings).
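
The crossover in actuals at which FM-01 and naive cost the same can be solved in closed form from the figures in this section. A sketch, assuming the blended Cu/Co from Section 8.1 apply to both forecasts; the names are illustrative.

```python
NAIVE = 23778         # Fall 2025 continuing-lineup actuals
FM01 = 32909          # Fall 2026 grand_target
CU, CO = 9.41, 3.72   # blended $/unit from Section 8.1


def cost(forecast: float, actual: float) -> float:
    """Co per unit overforecast, Cu per unit underforecast."""
    gap = forecast - actual
    return gap * CO if gap >= 0 else -gap * CU


# For NAIVE <= actual <= FM01, naive underforecasts and FM-01 overforecasts.
# Setting (FM01 - a) * CO == (a - NAIVE) * CU and solving for a:
break_even_actual = (CO * FM01 + CU * NAIVE) / (CU + CO)

# FM-01's advantage at the 28,000-unit midpoint scenario
midpoint_advantage = cost(NAIVE, 28000) - cost(FM01, 28000)
```

Under these inputs the crossover lands near 26,400 units: FM-01 is the cheaper forecast above it and naive is cheaper below it. At 28,000 units the advantage is about $21K in FM-01's favor.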

9. Challenger Model Comparison

9.1 Applicable Challengers

Challenger | Applicability | Effort | Included in this run?
Naive (prior-year actuals) | Mandatory for PNL_IMPACT | Low (data available) | Yes — above, Section 6.3
Trend-adjusted naive (prior × YoY) | Q1 2026 units +5.8% YoY shows consistent trend | Low | Procedure only — SCHEDULE
Category-level forecast | High — cannibalization known issue Summer 2026 | Low-Medium | Procedure only — SCHEDULE
Prediction-interval-only model | High — growth-target dominance is primary failure mode | Low (engine already computes) | Procedure only — SCHEDULE
Expert consensus (Dan's unassisted judgment) | Always for PNL_IMPACT; Dan's prior-year production decisions should be retrievable | Retrieve past POs from Nikita | Not run — SCHEDULE

9.2 Comparison Framework (Summer 2025 holdout — partial fill)

MetricFM-01 v2 @28.5%FM-01 v2 @11.91%NaiveTrend-adjustedPI-only (median)Expert
Forecast units~108,410~101,52760,776~69,525raw bottom-up only
MAPE40.0%31.1%~21.5%~10–13%TBD
Total forecast cost~$118K~$92K~$139K~$66K (est)TBD
Lift vs naive+$21K+$47K+$73K (est)TBD
FINDING: trend-adjusted naive on Summer 2025 may outperform FM-01 v2 at the 28.5% growth target on economic cost (estimated lift ~$73K vs FM-01's $21K). This is the highest-value challenger to validate properly — captured in SCHEDULE block #8 of the action plan. If confirmed, trend-adjusted naive becomes a simplification candidate for review.

10. Minimum Viable Accuracy (Section 16)

10.1 MVA Thresholds (confirmed Gate 2)

Output Type | Recovery Window | Cost Asymmetry | MVA Threshold | Breach Consequence
Returning fragrance totals (Y2+, mainline formats) | Short (lock at 1–3mo lead) | Under × 2.5 | ±15% | Material carrying cost or lost margin; recoverable with re-order if decay mild
New launch (Y1) totals | Short | Under × 2.5 | ±30% | Y1 structural uncertainty; band acknowledged
Format-level (5 oz Spray, 8 oz Candle) | Short | Under × 2.2–2.6 | ±15% | Format-level re-order inflexible
Capped / fixed-allocation (2 oz Discovery packs, CT capped) | Short — committed at pack pour | Under × 6.5 (2oz) | ±10% | Inventory committed and near-irrecoverable
Car Freshener / Diffuser (post-processed) | Short | Under × 2.2 | ±25% | Post-process overlay; structural uncertainty
Portfolio grand total | Short | Under × 2.5 | ±10% | Biggest absolute exposure; blended error attenuates

10.2 MVA vs Model Performance (Summer 2025 backtest)

Output Type | MVA Threshold | Backtest Error | Status
Portfolio grand total | ±10% | 40.0% @ 28.5% / 31.1% @ 11.91% | FAIL at 28.5% / FAIL at 11.91%
Returning fragrance totals | ±15% | Partial data — Cabana, CP, CT values not decomposed in memory | UNKNOWN
New launch totals | ±30% | Cabana 2025 was Y1 new launch; actual vs forecast not independently retrieved | UNKNOWN
Capped / fixed-allocation (CT) | ±10% | CT Summer 2025 likely within cap; actuals-vs-forecast per CT format not decomposed | UNKNOWN
Car Freshener | ±25% | Summer 2025 had limited Car Freshener; unknown | UNKNOWN
MVA FINDING: portfolio grand total fails MVA at both the configured (28.5%) and baseline (11.91%) growth rates on Summer 2025 backtest. Per skill Section 16.3, grand total cannot drive production decisions independently — scenario band (bear/base/bull) must be used, not point forecast. This is already the current practice per MC_ Decision Linkage (3); validated as the right operating mode.

11. Methodological Red Flags

RF1 — Growth-Target Dominance (Scaling Layer). ~33% of Fall 2026 grand_target and ~45% of Summer 2026 grand_target are top-down uplift above raw bottom-up forecasts. The engine's apparent mechanical sophistication (pooled curves, Bayesian shrinkage, elasticity weighting, newsvendor percentiles) is partially obscured by a single scalar parameter that dominates the output. Consequence if unresolved: the model is branded as predictive, but its accuracy is predominantly the accuracy of Dan's growth-target call. Backtest error attribution must distinguish these explicitly; MC_ artifact already flags this (Flag #3).
RF2 — Stockout Censoring in Pool (A3, Data Quality Flag). max_decay: none policy means 2025 SSN (-76%) and Pomelo (-79%) transitions enter the pooled age-1 curve at their raw recorded values. Both fragrances are documented as coinciding with supply issues. This is an unresolved contamination: the policy is statistically defensible (no arbitrary bound) but economically dangerous (propagates supply suppression as real demand decline). Consequence if unresolved: every returning fragrance inherits a steeper decay prior than reality warrants, systematically underforecasting recovered demand.
RF3 — Cannibalization Absent from Load-Bearing Assumption Graph (A7). Engine treats returning + new-launch demand as additive. Summer 2026 (Cabana Y2 × Coconut Soleil Y1 × CP Y3 × CT Y5) is the high-risk case; DuckDB now has 2023 data. Consequence if unresolved: overestimates portfolio grand_total by the displaced volume; creates carrying-cost exposure on returning fragrances when new launch cannibalizes them.
RF4 — Margin Data Completeness (Data Quality Flag). 6 of 12 Fall 2026 SKU rows have null margin_per_unit. Current $177,478 projected margin understates true portfolio margin by ~$40K–$80K. Consequence if unresolved: P&L projection tied to forecast output is misleadingly conservative; scenario band for revenue/cash planning is distorted.
RF5 — Layer 3 (Top-Down Reconciliation) Derivation Not Documentation-Complete. Observed grand_base=29,515 on Fall 2026 is 58.5% above raw bottom-up (18,618) and 24% above 2025 continuing-lineup actuals (23,778). Documentation states top-down scales to growth_target but the exact arithmetic path to grand_base is not made explicit. Consequence if unresolved: auditability gap — if Dan changes growth_target 11.5% → 13%, the expected change in grand_target is not directly predictable without code inspection.
RF6 — No Forecast Run Log. Every production commit is driven by a forecast run, but there is no persistent audit trail (config hash, timestamp, code version). forecast_output.json is overwritten by the next run. Consequence if unresolved: production decisions cannot be reproduced retroactively for disputes or error investigation; violates the "build for auditability" principle in CLAUDE.md.
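One way to close the RF6 gap is an append-only run log written alongside each forecast run. The following is a minimal sketch, not the engine's actual code: the function names, the log path, the record fields, and the example config values are all hypothetical. It assumes the season config and the forecast output are both available as JSON-serializable dicts.

```python
import hashlib
import json
from datetime import datetime, timezone


def stable_hash(payload: dict) -> str:
    """SHA-256 of a dict, serialized with sorted keys so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_run_record(config: dict, code_version: str, output: dict) -> dict:
    """Assemble one audit record: when the run happened, with what code, config, and output."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,  # e.g. a git commit SHA (assumption)
        "config_sha256": stable_hash(config),
        "output_sha256": stable_hash(output),
    }


def append_run_record(log_path: str, record: dict) -> None:
    """Append the record as one JSON line; earlier runs are never overwritten."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


# Illustrative values only; not taken from a real run.
example_config = {"season": "fall_2026", "growth_target": 0.115}
example_output = {"grand_total": 12345}
example_record = build_run_record(example_config, "abc1234", example_output)
```

Appending one JSON line per run keeps earlier records intact even though forecast_output.json itself is overwritten, and the config hash lets a later reviewer confirm which config produced a given commit.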

12. Validation Status and Failure Criteria

12.1 Pass/Fail per Criterion

# | Criterion | Threshold | Observed | Status
C1 | Documentation fidelity: code matches documented pipeline | No material mismatches affecting outputs | Layers 0, 1, 2, 4, 5 match; Layer 3 derivation not fully traceable (RF5) | PARTIAL PASS
C2 | Determinism: same inputs → same outputs | Required | No RNG; deterministic | PASS
C3 | Input guards: MIN, MAX, DEFAULT bounds fire correctly | Guards present and active | 5 guards verified (MIN_DEMAND_THRESHOLD, MIN_LAUNCH_UNITS, MAX_YOY_RATE, DEFAULT_UNCERTAINTY, detect_available_years) | PASS
C4 | Conservation: grand_total = sum(sku_forecast) | Within rounding | Verified (Section 4.2) | PASS
C5 | Economic lift vs naive | Positive | +$21K at 28.5%; +$47K at 11.91% (Summer 2025) | PASS (conditional on growth target discipline)
C6 | Portfolio MAPE meets MVA (±10%) | MAPE ≤ 10% | 31.1% at 11.91%, 40.0% at 28.5% | FAIL on point forecast (mitigated by scenario band)
C7 | Stress: compound scenario exposure ≤ 5% seasonal revenue | Fall $50K / Summer $75K | Fall within; Summer pessimistic scenario breaches $75K (~$126K) | FAIL (Summer pessimistic)
C8 | Challenger lift: model beats trend-adjusted naive | Positive economic lift | Estimated NEGATIVE at 28.5% growth target (trend-adjusted est. +$73K lift); unconfirmed | UNKNOWN (pending action plan #8)
C9 | Assumption audit: no load-bearing UNREASONABLE | All REASONABLE or MARGINAL | A7 (no cannibalization) rated UNREASONABLE for Summer 2026 | FAIL on Summer 2026; PASS on Fall 2026
C10 | Data integrity: no uncontrolled contamination | Contamination quantified and flagged | Stockout contamination (A3) unquantified | PARTIAL (flagged but not quantified)
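The C4 and C6 checks in the table above can be sketched as two small functions. This is an illustration under stated assumptions, not the validation skill's actual code: the function names are hypothetical, "within rounding" is taken to mean a tolerance of one unit, and MAPE is taken to be the unweighted mean of absolute percentage errors over items with non-zero actuals (the report does not state the exact weighting). The sample numbers are invented.

```python
def conservation_ok(grand_total: float, sku_forecast: dict, tol: float = 1.0) -> bool:
    """C4-style check: grand total equals the sum of SKU forecasts within a tolerance."""
    return abs(grand_total - sum(sku_forecast.values())) <= tol


def mape_pct(actuals: dict, forecasts: dict) -> float:
    """Unweighted mean absolute percentage error, in percent, over non-zero actuals."""
    errors = [
        abs(forecasts[key] - actual) / abs(actual)
        for key, actual in actuals.items()
        if actual != 0 and key in forecasts
    ]
    if not errors:
        raise ValueError("no comparable items with non-zero actuals")
    return 100.0 * sum(errors) / len(errors)


# Illustrative numbers only (not taken from the backtest).
sku_forecast = {"sku_a": 1200, "sku_b": 800, "sku_c": 500}
actuals = {"sku_a": 1000, "sku_b": 1000, "sku_c": 500}

c4_pass = conservation_ok(2500, sku_forecast)          # sum of SKUs is 2500
c6_pass = mape_pct(actuals, sku_forecast) <= 10.0      # threshold from the C6 row
```

With these sample inputs the conservation check passes and the MAPE check fails, mirroring the pattern of the C4 and C6 rows without reproducing their actual values.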

12.2 Overall Verdict

VERDICT: ⚠ CONDITIONALLY VALIDATED. Fall 2026 production use: continue. Scenario band (bear/base/bull) operating mode is validated; point forecast is not suitable as sole production input without the scenario framing (C6 mitigated by current practice).

Summer 2026 production use: conditional on completing the DO FIRST action plan block (items 1–5), particularly stockout quantification (#5) and growth-target error attribution (#2), and on re-opening the Cannibalization Exploration (scheduled under SCHEDULE block #6). The UNREASONABLE rating on A7 makes C9 a Summer 2026-specific FAIL that must be resolved before commit.

12.3 Conditions for Upgrade to VALIDATED

All of the following must hold:

  1. Action plan DO FIRST block (items 1–5) complete
  2. Summer 2026 actuals post-season: fragrance-level MAPE ≤ 25% OR mechanics-only MAPE (post error-attribution) ≤ 15%
  3. Economic cost (newsvendor regret) over Summer 2026 beats naive baseline in $ terms
  4. No hard decay trigger fires during Summer 2026 cycle
  5. A7 Cannibalization layer either implemented or explicitly scoped as scenario toggle (decision of record)
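The five conditions above combine with AND, while condition 2 contains an internal OR. A minimal sketch of that logic follows; the class, field, and function names are hypothetical and the evidence values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpgradeEvidence:
    do_first_complete: bool          # condition 1
    fragrance_mape_pct: float        # condition 2, first branch
    mechanics_only_mape_pct: float   # condition 2, second branch
    fm01_cost_usd: float             # condition 3: newsvendor regret of FM-01
    naive_cost_usd: float            # condition 3: newsvendor regret of naive baseline
    hard_trigger_fired: bool         # condition 4
    a7_decision_recorded: bool       # condition 5: implemented or scoped as toggle


def unmet_conditions(e: UpgradeEvidence) -> list:
    """Return the numbers of the upgrade conditions that do not hold."""
    checks = {
        1: e.do_first_complete,
        2: e.fragrance_mape_pct <= 25.0 or e.mechanics_only_mape_pct <= 15.0,
        3: e.fm01_cost_usd < e.naive_cost_usd,
        4: not e.hard_trigger_fired,
        5: e.a7_decision_recorded,
    }
    return [number for number, holds in checks.items() if not holds]


# Illustrative evidence only; not real Summer 2026 results.
example = UpgradeEvidence(
    do_first_complete=True,
    fragrance_mape_pct=28.0,
    mechanics_only_mape_pct=14.0,
    fm01_cost_usd=40_000.0,
    naive_cost_usd=55_000.0,
    hard_trigger_fired=False,
    a7_decision_recorded=False,
)
remaining = unmet_conditions(example)  # only condition 5 is unmet here
```

Returning the list of unmet condition numbers, rather than a single boolean, keeps the upgrade decision traceable to the specific condition that blocks it.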

12.4 Model ROI Statement (Section 20)

Annual economic lift estimate: (naive cost − FM-01 cost) × 4 seasons. Extrapolating the Summer 2025 observed lift of ~$21K at the configured growth target (28.5%) and ~$47K at baseline growth (11.91%) to four seasons gives an annual lift of ≈ $84K–$188K.

Annual model cost estimate: config maintenance (~8 hours/season × 4 seasons × $150/hr blended = $4,800) + Dave validation overhead (~12 hours/season × 4 × $150 = $7,200) + ETL/data infrastructure (shared cost, not model-specific) = ~$12K–$20K.

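Model ROI: (annual lift − annual cost) / annual cost × 100%. Worst case (lowest lift, highest cost): ($84K − $20K) / $20K = 320%. Best case (highest lift, lowest cost): ($188K − $12K) / $12K ≈ 1,470%. Range: ≈ 320%–1,470%.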
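The ROI arithmetic can be checked by evaluating the formula at the interval endpoints. The sketch below uses only the figures stated in this section (seasonal lift of ~$21K and ~$47K, four seasons, annual cost of $12K–$20K); the function and variable names are hypothetical. It pairs the lowest lift with the highest cost for the lower bound and the highest lift with the lowest cost for the upper bound.

```python
def roi_pct(annual_lift: float, annual_cost: float) -> float:
    """ROI in percent: net benefit divided by cost."""
    return (annual_lift - annual_cost) / annual_cost * 100.0


SEASONS_PER_YEAR = 4
lift_low = 21_000 * SEASONS_PER_YEAR    # $84K, lift at configured growth
lift_high = 47_000 * SEASONS_PER_YEAR   # $188K, lift at baseline growth
cost_low, cost_high = 12_000, 20_000    # annual model cost range

roi_low = roi_pct(lift_low, cost_high)    # worst case
roi_high = roi_pct(lift_high, cost_low)   # best case

print(round(roi_low, 1))   # 320.0
print(round(roi_high, 1))  # 1466.7
```

Both endpoints sit above the 200% threshold used in the classification that follows.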

Classification: ROI > 200% → Retain and invest. Model is generating significant economic value; enhancement investment (cannibalization layer, stockout adjustment, run log) is well-justified.

Sensitivity to accuracy: ROI crosses zero when economic cost matches naive — approximately MAPE = 15% break-even. Current backtest MAPE (31.1% at baseline) puts FM-01 on the correct side of break-even but with thin margin; drift to MAPE ≥ 20% would materially compress ROI.

13. Decay Clock and Re-validation Schedule (Section 19)

Field | Value
Last Validated | 2026-04-23
Hard Trigger Conditions | New channel added; portfolio composition Δ > 30%; pricing change on top-5 SKU; 2 consecutive seasons of MAPE growth; stockout-adjustment layer implemented
Soft Trigger Conditions | New data source closing known gap; Cu/Co update by Nikita; personnel change at Dan/Nikita/creator; engine version increment; launch-date drift > 30 days on top fragrance
Next Scheduled Re-validation | 2026-10-31 (end of Summer 2026 season)
Decay Status | FRESH (on finalization)

Monitoring Surface

Tracking per /analytics/david/validation/model_cards/FM-01.md (established 2026-04-21). Flag fragrance × format drift > 20% of point forecast during Fall 2026 season, before season end.
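The drift flag described above can be sketched as follows. This is a hypothetical illustration: the report does not define how in-season drift is measured, so the sketch assumes a projected season total is available for each fragrance × format pair and measures drift as the absolute gap relative to the point forecast. All names and numbers are invented.

```python
def flag_drift(point_forecast: dict, projected: dict, threshold: float = 0.20) -> list:
    """Return (key, drift) pairs where |projected - forecast| / forecast exceeds threshold."""
    flagged = []
    for key, forecast in point_forecast.items():
        if forecast <= 0 or key not in projected:
            continue  # skip pairs with no usable forecast or no projection yet
        drift = abs(projected[key] - forecast) / forecast
        if drift > threshold:
            flagged.append((key, drift))
    return sorted(flagged)


# Keys are (fragrance, format) pairs; values are units. Illustrative only.
point_forecast = {
    ("Fragrance A", "5oz"): 1000,
    ("Fragrance A", "3-Wick"): 300,
    ("Fragrance B", "5oz"): 500,
}
projected = {
    ("Fragrance A", "5oz"): 1150,    # 15% drift, within tolerance
    ("Fragrance A", "3-Wick"): 210,  # 30% below forecast
    ("Fragrance B", "5oz"): 650,     # 30% above forecast
}

flagged = flag_drift(point_forecast, projected)
```

The comparison is strict, so a pair sitting exactly at 20% drift is not flagged; whether the boundary should be inclusive is a choice the model card would need to state.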