Bayesian A/B Test Calculator

Find out the probability that your variant beats control — no p-values needed

What Is Bayesian A/B Testing?

Bayesian A/B testing uses probability to answer a direct question: “What is the chance that Variant B is better than Control A?” Instead of binary pass/fail (like traditional p-values), you get a probability and a range of plausible effect sizes.

The approach starts with a neutral assumption (called a prior) and updates it with your experimental data to produce a posterior distribution — a full picture of what each variant’s true conversion rate might be. By comparing thousands of samples from these distributions, we calculate the probability that one variant outperforms the other.
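With a uniform Beta(1,1) prior and conversion-count data, each variant's posterior is Beta(1 + conversions, 1 + non-conversions), so the comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual code, and the visitor counts are made up for illustration:

```python
import random

random.seed(42)  # fixed seed for reproducible sampling

# Hypothetical experiment counts (illustrative, not real data)
visitors_a, conversions_a = 1000, 50   # control: 5.0% observed
visitors_b, conversions_b = 1000, 65   # variant: 6.5% observed

def posterior_sample(conversions, visitors):
    # Beta(1,1) prior + binomial data -> Beta(1 + successes, 1 + failures)
    return random.betavariate(1 + conversions, 1 + visitors - conversions)

# Monte Carlo: draw paired samples and count how often B beats A
n_samples = 100_000
wins_b = sum(
    posterior_sample(conversions_b, visitors_b)
    > posterior_sample(conversions_a, visitors_a)
    for _ in range(n_samples)
)
prob_b_wins = wins_b / n_samples
print(f"P(B > A) ≈ {prob_b_wins:.1%}")
```

Each paired draw is one simulated "possible world"; the fraction of worlds in which B's sampled rate exceeds A's is the probability to win.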

🎯 Probability to Win

The chance that B’s true conversion rate is higher than A’s. Above 95% is strong evidence; 90-95% is suggestive; below 90% means you need more data.

📈 Expected Lift

The most likely relative improvement of B over A. A lift of +15% means B is expected to convert 15% better than A in the long run.

Risk (Expected Loss)

If you pick the wrong variant, how many percentage points of conversion rate would you lose on average? Lower risk means a safer decision. Under 0.1 percentage points is typically acceptable.

📊 Credible Interval

The 95% range for the true lift. If the interval is [+5%, +25%], you can be 95% confident the real improvement falls within that range.
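All four metrics above can be read off a single set of paired posterior draws. A minimal sketch under the same Beta(1,1)-prior assumption, again with made-up counts:

```python
import random
import statistics

random.seed(7)  # reproducible sampling

# Hypothetical counts (illustrative only)
sa, na = 50, 1000   # control: conversions, visitors
sb, nb = 65, 1000   # variant: conversions, visitors

n = 100_000
draws_a = [random.betavariate(1 + sa, 1 + na - sa) for _ in range(n)]
draws_b = [random.betavariate(1 + sb, 1 + nb - sb) for _ in range(n)]

# Probability to win: how often B's sampled rate exceeds A's
prob_win = sum(b > a for a, b in zip(draws_a, draws_b)) / n

# Expected lift: mean relative improvement of B over A
lifts = sorted((b - a) / a for a, b in zip(draws_a, draws_b))
expected_lift = statistics.mean(lifts)

# Risk (expected loss) of choosing B: average shortfall in the worlds
# where A is actually better, zero elsewhere
expected_loss_b = statistics.mean(
    max(a - b, 0.0) for a, b in zip(draws_a, draws_b)
)

# 95% credible interval for the lift: central 2.5th-97.5th percentiles
ci_low, ci_high = lifts[int(0.025 * n)], lifts[int(0.975 * n)]

print(f"P(B wins): {prob_win:.1%}")
print(f"Expected lift: {expected_lift:+.1%}")
print(f"Risk of choosing B: {expected_loss_b:.3%} points")
print(f"95% CI for lift: [{ci_low:+.1%}, {ci_high:+.1%}]")
```

Note that the risk is in absolute percentage points of conversion rate, while the lift and its credible interval are relative to the control's rate.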

When to Use Bayesian vs Frequentist Testing

Both approaches are valid — but they answer different questions and work best in different situations.

🔢 Frequentist (Classical)

  • Answers: “Is there a statistically significant difference?”
  • Requires fixed sample size calculated upfront
  • Cannot peek at results early without inflating error rate
  • Gives p-values and confidence intervals
  • Best for: regulated industries, strict testing protocols

🎲 Bayesian

  • Answers: “What is the probability B is better?”
  • Works with any sample size — updates continuously
  • Safe to check results at any time (no peeking problem)
  • Gives probability of winning and expected loss
  • Best for: marketing, product teams, iterative testing

| Aspect | Frequentist | Bayesian |
| --- | --- | --- |
| Main output | p-value (reject / fail to reject) | Probability of winning (0-100%) |
| Sample size | Must be pre-determined | Flexible — update as data arrives |
| Early stopping | Inflates false positive rate | Safe to stop anytime |
| Interpretation | “Results unlikely under null hypothesis” | “95% chance B is better than A” |
| Decision-making | Significant or not | Probability + expected loss |

Frequently Asked Questions

What probability-to-win threshold should I use?

Most teams use 95% as their threshold, which is analogous to the 5% significance level in frequentist testing. For lower-risk decisions (like button color changes), 90% can be sufficient. For high-stakes changes (pricing, checkout flow), consider waiting for 99%.

How many visitors do I need per variant?

It depends on your baseline conversion rate and the minimum lift you want to detect. As a rough guide: to detect a 10% relative lift on a 5% conversion rate, you need roughly 5,000-10,000 visitors per variant. Smaller lifts or lower conversion rates require larger samples.
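One way to sanity-check this rough guide: plug the expected counts at a given sample size into the posterior comparison and see what probability-to-win you would get if the observed rates exactly matched the true ones. A sketch under a Beta(1,1) prior (the function name and numbers here are mine, for illustration):

```python
import random

random.seed(3)  # reproducible sampling

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=100_000):
    """Monte Carlo estimate of P(B > A) under Beta(1,1) priors."""
    wins = sum(
        random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        > random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(samples)
    )
    return wins / samples

# 5% baseline with a +10% relative lift (5.5%), at 8,000 visitors
# per variant: the expected counts are 400 vs 440 conversions.
p = prob_b_beats_a(400, 8_000, 440, 8_000)
print(f"P(B > A) at the expected counts: {p:.1%}")
```

With these numbers the probability typically lands in the low 90s — close to, but not past, a 95% bar. Real experiments fluctuate around the expected counts, which is why the guide is a range rather than a single number.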

Can I check the results before the test is finished?

Yes — this is a key advantage of Bayesian testing. Unlike frequentist methods, where early peeking inflates your false positive rate, Bayesian probabilities are valid at any point. Just keep in mind that early results with small samples will be less stable.

What does the risk (expected loss) number mean?

Expected loss (risk) is the average conversion rate you would sacrifice if you pick the wrong variant. For example, a risk of 0.05% means that even in the worst case, choosing this variant would only cost you 0.05 percentage points in conversion rate. Most teams accept risk under 0.1%.

What prior does this calculator use?

This calculator uses a uniform Beta(1,1) prior, which treats all conversion rates between 0% and 100% as equally likely before seeing data. This is called an “uninformative” prior. With even moderate sample sizes (100+ per variant), the prior has negligible effect on results.
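You can see the prior wash out directly from the posterior mean, which for a Beta(α, β) prior is (α + conversions) / (α + β + visitors) — with Beta(1,1) that is (1 + conversions) / (2 + visitors). A quick sketch with an assumed 5-10% observed rate:

```python
# Posterior mean under a Beta(1,1) prior: (1 + conversions) / (2 + visitors).
# With no data it sits at 50%; as data accumulates it converges to the
# observed rate. Counts below are hypothetical.
rows = []
for visitors, conversions in [(10, 1), (100, 5), (1000, 50), (10000, 500)]:
    observed = conversions / visitors
    post_mean = (1 + conversions) / (2 + visitors)
    rows.append((visitors, observed, post_mean))
    print(f"n={visitors:>5}: observed {observed:.2%}, posterior mean {post_mean:.2%}")
```

At 10 visitors the prior still pulls the estimate noticeably toward 50%; by 1,000 visitors the posterior mean and the observed rate agree to within a tenth of a percentage point.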

Is my experiment data sent to a server?

No. All calculations run entirely in your browser using Monte Carlo simulation with 100,000 samples. Your experiment data never leaves your device. You can verify this by using the tool with your network disconnected.