Forecast Accuracy Metrics: 2026 Practitioner Guide

WMAPE, Bias, MASE, CRPS - which forecast accuracy metrics reflect real business impact? Formulas, benchmarks, and dashboard mistakes to avoid.

Prospeo Team · 8 min read

Forecast Accuracy Metrics: Which Ones Matter and Which Ones Lie

Your VP sees 82% forecast accuracy on the dashboard and calls it a win. Meanwhile, stockouts rose 15% last quarter and the warehouse is drowning in safety stock for products nobody ordered. The number isn't wrong - it's measuring the wrong thing at the wrong level.

Most teams pick their forecast accuracy metrics once, never revisit them, and spend years optimizing a score that doesn't reflect operational reality. That's expensive.

What You Need (Quick Version)

Track WMAPE and Bias. WMAPE gives you volume-weighted error that reflects actual business impact - a miss on your top SKU matters more than a miss on a product you sell three units of per month. Bias tells you whether you're systematically over- or under-forecasting, the kind of error that compounds into real inventory and revenue problems. Everything else below is a refinement for specific contexts.

What Is Forecast Accuracy?

Forecast accuracy measures how close predictions land to what actually happened. Error is the raw gap: e = y - ŷ. Accuracy is typically 1 minus some error percentage. Bias is the directional tendency - consistently too high or too low?

Always evaluate on out-of-sample data. A model that fits history perfectly but can't predict next month is useless. Out-of-sample evaluation tells you how well a model actually forecasts; in-sample fit tells you how well it memorized the past.
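A rolling-origin (walk-forward) backtest is the simplest way to keep evaluation out-of-sample: at each step, the model sees only data before the period it predicts. A minimal sketch with made-up numbers and a naive model - swap naive_forecast for your own fit/predict step:

```python
import numpy as np

series = np.array([10.0, 12, 11, 13, 12, 14, 13, 15])

def naive_forecast(history):
    return history[-1]          # "repeat last period"

errors = []
for t in range(4, len(series)):         # start once there is some history
    pred = naive_forecast(series[:t])   # fitted only on data before t
    errors.append(abs(series[t] - pred))

out_of_sample_mae = np.mean(errors)     # honest estimate of forecasting skill
```

In-sample MAE for the same model would be computed on the periods it already saw, which is exactly the "memorized the past" number to distrust.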

Core Formulas and When to Use Each

MAE and RMSE

MAE (Mean Absolute Error) = mean(|e|). It's the average size of your misses, ignoring direction. RMSE = √mean(e²), which penalizes large errors more heavily because it squares them first.

Visual overview of core forecast accuracy metric formulas

Minimizing MAE pushes your forecast toward the median outcome. Minimizing RMSE pushes toward the mean. If occasional big misses are catastrophic - perishable goods, capacity-constrained manufacturing - RMSE is the better guardrail. For general tracking, MAE is simpler and more interpretable.
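The difference shows up immediately on a series with one spike. A minimal sketch with invented numbers:

```python
import numpy as np

actual   = np.array([100, 110, 95, 105, 300])   # one demand spike at the end
forecast = np.array([100, 105, 100, 100, 120])

errors = actual - forecast
mae  = np.mean(np.abs(errors))        # average miss size: the spike is one of five
rmse = np.sqrt(np.mean(errors ** 2))  # squares first, so the 180-unit miss dominates
```

Here MAE lands at 39 while RMSE is roughly double that - same data, very different verdicts on how bad the spike was.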

MAPE - And Why It Misleads

MAPE = mean(|actual - forecast| / |actual|) × 100%.

When actual demand is zero, MAPE is undefined. When demand is low, a small absolute error produces an absurd percentage: a product selling 2 units with a forecast of 4 registers 100% error for a 2-unit miss, while a product selling 1,000 units with a 50-unit miss registers 5%. That's not a rounding issue. That's a metric lying to you about where the real problems are.

Hyndman's textbook flags another problem: MAPE penalizes negative errors more than positive ones, creating asymmetric distortion most practitioners never notice. If your demand includes any low-volume items, MAPE will mislead you.
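The worked example above translates directly to code - the low-volume item dominates the average:

```python
import numpy as np

actual   = np.array([2, 1000])   # low-volume item, high-volume item
forecast = np.array([4, 950])    # a 2-unit miss and a 50-unit miss

ape = np.abs(actual - forecast) / np.abs(actual) * 100
# ape -> [100., 5.]: the trivial 2-unit miss registers 100% error
mape = ape.mean()                # 52.5% - driven by the item that barely sells
```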

sMAPE - Skip This One

sMAPE was designed to address MAPE's issues but introduced worse problems. Hyndman's textbook (citing Hyndman & Koehler, 2006) notes it's still unstable near zero and can even produce negative values - nonsensical for an error metric. Any tool defaulting to sMAPE is using a broken metric. Move on.

WMAPE - The Practitioner Standard

WMAPE = Σ|actual - predicted| / Σ|actual| × 100%

MAPE vs WMAPE divergence on same dataset

This is the metric that reflects business impact. Instead of averaging percentage errors equally across all items, WMAPE weights by actual volume. High-volume products drive the score.

The divergence is dramatic. On the same dataset, MAPE can show 36.7% error while WMAPE shows just 5.9% - a single low-volume day with 100% error inflates MAPE while barely moving the volume-weighted number.

52% of demand planning professionals use WMAPE as their primary metric. It's the standard in CPG, retail, and multi-SKU environments. If you're picking one metric, this is it.

Quick Python calculation (assuming actual and forecast are NumPy arrays of the same length):

import numpy as np

wmape = np.sum(np.abs(actual - forecast)) / np.sum(np.abs(actual)) * 100

One refinement worth considering: standard WMAPE weights by volume, but you can weight by revenue or margin instead. A 20% miss on a high-margin SKU costs more than a 20% miss on a commodity item at the same volume.
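That refinement is a one-line change to the formula. A sketch with hypothetical numbers - wmape_weighted is our own helper name, not a standard function:

```python
import numpy as np

def wmape_weighted(actual, forecast, weights):
    """WMAPE with per-item weights - pass unit price or margin instead of 1s."""
    weighted_err = np.abs(actual - forecast) * weights
    return weighted_err.sum() / (np.abs(actual) * weights).sum() * 100

actual   = np.array([100.0, 100.0])
forecast = np.array([80.0, 100.0])   # 20-unit miss on the first item only
price    = np.array([50.0, 1.0])     # and the missed item is the high-value one

volume_wmape  = wmape_weighted(actual, forecast, np.ones(2))  # 10.0
revenue_wmape = wmape_weighted(actual, forecast, price)       # ~19.6
```

Same forecasts, same volumes - but revenue weighting nearly doubles the reported error because the miss sits on the item that actually costs money.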

WMAPE is symmetric - it doesn't distinguish over-forecasting (excess inventory) from under-forecasting (lost sales). That's why you pair it with Bias.

Bias / Mean Forecast Error

Bias = mean(forecast - actual).

Positive bias means consistent over-forecasting. Negative means under-forecasting. Here's the thing: even a small persistent bias can cost more than larger random error because bias compounds. Random errors cancel out over time. Bias doesn't - it systematically pushes inventory in the wrong direction, period after period.

Track bias alongside WMAPE using a tracking signal (cumulative forecast error ÷ MAE). When the signal exceeds your threshold, something structural has changed.
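A minimal tracking-signal implementation, using the definition above (cumulative forecast error ÷ MAE) on invented numbers:

```python
import numpy as np

def tracking_signal(actual, forecast):
    """Cumulative signed forecast error divided by MAE, per period."""
    err = forecast - actual
    cfe = np.cumsum(err)             # running signed error - bias accumulates here
    mae = np.mean(np.abs(err))
    return cfe / mae

actual   = np.array([100, 102, 98, 101, 99, 100])
forecast = np.array([105, 106, 103, 105, 104, 105])  # persistent ~5-unit over-forecast

ts = tracking_signal(actual, forecast)
# The signal climbs period after period; a commonly cited alarm threshold is |TS| > 4
```

Random errors would keep the cumulative term near zero; the steady over-forecast pushes it past the threshold within a few periods.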

MASE - For Cross-Series Comparison

MASE divides your forecast's MAE by the in-sample MAE of a naive benchmark. Below 1.0 means you're beating the naive approach. Above 1.0 means a simple "repeat last period" forecast would've been better.

MASE solves the cross-series comparison problem that plagues percentage metrics. You can meaningfully compare a MASE score for a product selling 50 units/month against one selling 50,000 - something MAPE can't do. Hyndman recommends MASE as the primary alternative to MAPE for this reason.
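A sketch of MASE against the one-step naive benchmark, with made-up series (the m parameter generalizes to a seasonal naive):

```python
import numpy as np

def mase(train, actual, forecast, m=1):
    """Forecast MAE scaled by the in-sample MAE of the naive lag-m forecast."""
    naive_mae = np.mean(np.abs(train[m:] - train[:-m]))
    return np.mean(np.abs(actual - forecast)) / naive_mae

train    = np.array([50.0, 52, 51, 53, 52, 54])  # history the naive is scored on
actual   = np.array([55.0, 56])
forecast = np.array([54.0, 55])

score = mase(train, actual, forecast)   # < 1.0 means beating "repeat last period"
```

Because both numerator and denominator are in the series' own units, the ratio is scale-free - which is what makes the 50-units/month vs 50,000-units/month comparison legitimate.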

Metric Comparison Table

Metric | Formula | Best For | Avoid When | Notes
MAE | mean(|e|) | General tracking | Need % context | Scale-dependent
RMSE | √mean(e²) | Penalizing big misses | Outlier-prone data | Sensitive to spikes
MAPE | mean(|e/y|)×100 | Stable, high-vol items | Low/zero demand | Distorts at low vol
sMAPE | 200×mean(|e|/(y+ŷ)) | Nothing - avoid | Always | Broken near zero
WMAPE | Σ|e|/Σ|y|×100 | Multi-SKU portfolios | Single-item tracking | 52% adoption
Bias | mean(ŷ-y) | Directional error | Need magnitude only | Pair with WMAPE
MASE | MAE / naive MAE | Cross-series comparison | No historical data | <1.0 = beats naive

Probabilistic Metrics: CRPS and Pinball Loss

Point forecasts give you a single number. Probabilistic forecasts give you a distribution - and point metrics can't evaluate distributions properly. We've seen teams where MAE actually rewarded conservative, variance-collapsing forecasts even when the tails were what mattered for inventory decisions.

CRPS is a proper scoring rule for probabilistic forecasts. It compares the forecast distribution (via its CDF) to what actually happened, and lower is better. When your forecast collapses to a point estimate, CRPS reduces to MAE.

Pinball loss evaluates specific quantiles. At α=0.95, it penalizes under-prediction more than over-prediction - useful when you're setting safety stock at the 95th percentile. The asymmetric weighting lets you tune the penalty to match your actual cost structure.
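Pinball loss is a few lines. A sketch showing the asymmetry at α=0.95 with invented numbers:

```python
import numpy as np

def pinball_loss(actual, forecast_q, alpha):
    """Average pinball (quantile) loss for a forecast of the alpha-quantile."""
    diff = actual - forecast_q
    # Under-prediction (diff > 0) is weighted by alpha; over-prediction by 1 - alpha
    return np.mean(np.maximum(alpha * diff, (alpha - 1) * diff))

actual = np.array([100.0, 100.0])

under = pinball_loss(actual, np.array([90.0, 90.0]), 0.95)    # 10-unit under-forecast
over  = pinball_loss(actual, np.array([110.0, 110.0]), 0.95)  # 10-unit over-forecast
```

The same 10-unit miss costs 9.5 when you under-predict but only 0.5 when you over-predict - exactly the penalty structure you want when stockouts hurt 19× more than excess stock at your chosen service level.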

Intermittent Demand

Spare parts, long-tail retail, specialty chemicals - these share a pattern: long stretches of zero demand punctuated by occasional spikes. MAPE is undefined when actuals are zero. MAE can reward an always-zero forecast, because minimizing MAE targets the median and the median demand is often zero. Statistically optimal, operationally useless.

MAAPE uses an arctangent transformation to damp extreme percentage errors and handles zeroes gracefully. For probabilistic intermittent models, CRPS works well. But for intermittent demand, always pair any forecast metric with inventory outcome metrics - cycle service level, fill rate, stockout frequency. The forecast metric tells you how good the prediction is. The inventory metric tells you whether it matters.
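A minimal sketch of the arctangent transformation described above, on an intermittent series with invented numbers. How to score a zero actual met by a zero forecast is a convention; here we count it as a perfect hit (0):

```python
import numpy as np

def maape(actual, forecast):
    """Mean Arctangent Absolute Percentage Error.

    arctan damps the error ratio, so a zero-demand period contributes a
    finite pi/2 at worst instead of the undefined value MAPE produces.
    """
    err = np.abs(actual - forecast)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = err / np.abs(actual)
    # Zero demand: a zero forecast scores 0; any miss maps to arctan(inf) = pi/2
    ratio = np.where(actual == 0, np.where(err == 0, 0.0, np.inf), ratio)
    return float(np.mean(np.arctan(ratio)))

actual   = np.array([0.0, 0.0, 5.0, 0.0, 12.0])   # intermittent pattern
forecast = np.array([1.0, 0.0, 4.0, 2.0, 10.0])

score = maape(actual, forecast)   # finite, unlike MAPE on this data
```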

Prospeo

Forecast accuracy starts upstream. If your pipeline is built on unverified contacts and stale data, every revenue forecast inherits that error. Prospeo's 98% email accuracy and 7-day data refresh cycle eliminate the dirty-data bias that compounds into missed forecasts quarter after quarter.

Clean inputs, accurate forecasts. Start with data you can trust.

The Aggregation Trap

Your exec dashboard shows 88% accuracy at the portfolio level. Green across the board. But at the SKU-location level, half your products are over-forecast by 30% and the other half are under-forecast by 25%. The overs and unders cancel out, producing a number that looks great while warehouses are simultaneously overstocked and out of stock.

Aggregation trap showing hidden errors behind green dashboards

We've watched this exact scenario play out at three different companies - the dashboard was green, the warehouse was on fire.

This is a common complaint on r/supplychain about forecast accuracy reporting. The fix: measure accuracy at the level where you make decisions. If you replenish at the SKU-DC level, measure there. Roll up for reporting, but never let the rollup replace the operational metric.

Benchmarks by Context

A single accuracy target like "85%" is meaningless without context.

Forecast accuracy benchmarks by product context and industry
Context | Error Range | Notes
Stable staples | 5-15% WMAPE | Predictable, high-vol
Promotional items | 25-40% WMAPE | Timing/lift uncertainty
Seasonal products | 20-35% WMAPE | Pattern + amplitude
Spare parts | 40-60%+ WMAPE | Intermittent; use CSL
CPG (survey avg) | ~39% error | Cross-category
Chemicals (survey avg) | ~36% error | Less promo lift

For B2B SaaS revenue forecasting, Fullcast's benchmarks frame it differently: 80-85% is acceptable, 85-95% is good, 95%+ is world-class. Their data also shows that even after quotas dropped 13.3%, nearly 77% of sellers still missed quota.

Let's be honest: if your forecast accuracy looks great across the board, you're almost certainly measuring at the wrong level of aggregation. Segment before setting targets - a blended target across stable and volatile products guarantees you'll over-invest where it's easy and under-invest where it matters.

Choosing the Right Metric

If you track only two metrics: WMAPE + Bias. That covers magnitude and direction for the vast majority of contexts.

Decision tree for choosing the right forecast accuracy metric

Layer from there based on your situation. For multi-SKU portfolios with mixed volumes, WMAPE is non-negotiable. When you need to compare accuracy across different-scale series, add MASE. If your models output probability distributions, CRPS is the right evaluation tool. For teams setting safety stock at specific service levels, pinball loss at your target quantile gives you the asymmetric penalty structure you need. Intermittent demand? Drop MAPE entirely and use MAAPE or pair MAE with service metrics. Revenue and sales forecasting teams should track accuracy at the deal or segment level alongside bias.

Three well-understood metrics beat seven that nobody on the team can explain to the VP.

Common Mistakes

Forecasting sales instead of unconstrained demand. This is the single worst forecasting practice. If you had stockouts, sales data understates true demand. Forecast demand signals, not constrained shipments.

Chasing accuracy points instead of fixing bias. A team reducing WMAPE from 22% to 20% while ignoring a persistent 8% over-forecast bias is optimizing the wrong thing. Bias reduction delivers more operational value per hour of analyst time.

Building forecasts on stale data. Your forecast is only as good as your inputs. In revenue forecasting, that means CRM data - contact records, deal stages, pipeline values - needs to reflect reality. If your CRM records haven't been verified in months, pipeline forecasts are wrong before you pick a metric. Tools like Prospeo that refresh contact data on a 7-day cycle help keep those inputs current so forecast models aren't working with garbage from day one. If you're evaluating platforms, start with sales forecasting solutions and compare against your pipeline health metrics.

Replanning to hit fixed targets. Constantly adjusting forecasts to match a fixed accuracy target amplifies variability and feeds the bullwhip effect. Plan with ranges, not point targets.

Improving Quarterly Results

Quarterly forecast accuracy deserves its own focus because it's the cadence most leadership teams use for planning and performance reviews. The challenge is that quarterly windows are long enough for multiple demand shifts to occur but short enough that a single bad month can tank the number.

To improve quarterly results, decompose the quarter into monthly or weekly buckets, identify which periods drove the miss, and correct at that granularity. Rolling quarterly reviews - where you evaluate the trailing 13 weeks every month rather than waiting for the quarter to close - give you earlier signals and more chances to course-correct before the number is locked.

Tracking Accuracy in Practice

Archive every forecast version - statistical baseline, planner-adjusted, and sales override. This lets you measure where human adjustments add value versus where they introduce noise. Forecast Pro's implementation guide calls this the foundation of any serious accuracy program.

Build waterfall reports showing actuals alongside archived forecasts at different lead times. A forecast made 12 weeks out should be less accurate than one made 4 weeks out - if it isn't, your short-range process is broken. Align your forecast horizon to your actual lead time. Measuring 1-week-ahead accuracy when procurement lead time is 12 weeks is measuring the wrong thing.
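A waterfall comparison reduces to computing the same metric per archived lead time. A sketch with invented forecasts for the same target weeks:

```python
import numpy as np

actual = np.array([100.0, 120, 90, 110])   # what actually happened

# Archived forecasts for those same weeks, keyed by lead time in weeks (hypothetical)
forecasts_by_lag = {
    12: np.array([70.0, 150, 120, 80]),    # made 12 weeks out
    4:  np.array([95.0, 125, 85, 105]),    # made 4 weeks out
}

def wmape(a, f):
    return np.sum(np.abs(a - f)) / np.sum(np.abs(a)) * 100

waterfall = {lag: wmape(actual, f) for lag, f in forecasts_by_lag.items()}
# Expect waterfall[4] < waterfall[12]; if not, the short-range process is broken
```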

Evaluate out-of-sample, always. And treat data quality as a prerequisite, not an afterthought - the best forecast accuracy metrics in the world can't fix garbage inputs. If your process depends on CRM hygiene, use a data enrichment workflow and keep a clean contact management layer so your pipeline inputs stay current.

Prospeo

You wouldn't track WMAPE on a dataset full of duplicates and dead contacts. Prospeo's 5-step verification removes spam traps, honeypots, and catch-all guesses - so the pipeline data feeding your forecast models reflects real buyers, not noise. Teams using Prospeo book 26% more meetings than ZoomInfo users.

Your forecast is only as accurate as your contact data. Fix the source.

FAQ

What's a good forecast accuracy percentage?

For stable staples, target 90%+ (under 10% WMAPE). Promotional or seasonal items typically land at 60-75%. Intermittent demand at 50% can be excellent. B2B SaaS revenue forecasts target 85-95% as "good" per Fullcast's 2026 benchmarks. Always segment by product type and demand volatility before setting targets.

What's the difference between MAPE and WMAPE?

MAPE averages percentage errors equally, so one low-volume SKU with 200% error dominates the result. WMAPE weights by actual volume - on the same dataset, MAPE can show 36.7% while WMAPE shows 5.9%. For multi-SKU portfolios, WMAPE reflects real business impact; MAPE almost never does.

How do you measure forecast accuracy for intermittent demand?

Standard metrics like MAPE fail because demand is mostly zero. Use MAAPE for a percentage-like measure that handles zeroes, or pair MAE with inventory outcomes like cycle service level and fill rate. Never use a metric that rewards an always-zero forecast - that's statistically optimal but operationally useless.

How does data quality affect forecast accuracy?

Pipeline and revenue forecasts built on CRM records with bounced emails and outdated job titles produce unreliable outputs regardless of model sophistication. Regular data enrichment - refreshing contact records, verifying emails, updating job titles - is a prerequisite, not an afterthought. If your inputs are wrong, no metric or model can save you.

B2B Data Platform

Verified data. Real conversations. Predictable pipeline.

Build targeted lead lists, find verified emails & direct dials, and export to your outreach tools. Self-serve, no contracts.

  • Build targeted lists with 30+ search filters
  • Find verified emails & mobile numbers instantly
  • Export straight to your CRM or outreach tool
  • Free trial — 100 credits/mo, no credit card