Free A/B test calculator

How to Calculate A/B Test Sample Size

How many visitors per variant do you need to detect a real lift? The two-proportion z-test below gives you the exact number — and how many days to run the test at your current traffic.

✓ Two-proportion z-test (same as Optimizely / VWO)

✓ Days to significance at your traffic

✓ Inputs never leave your browser

✓ Shareable URL of every test plan

95 %: standard confidence level

80 %: standard statistical power

1.96: z-score at 95 % confidence

How A/B test sample size is calculated

To plan an A/B test you need five inputs: baseline conversion rate, minimum detectable lift, confidence level, statistical power, and your daily visitor count per variant. The first four determine the sample size; daily traffic only converts that sample size into a runtime in days. The calculator uses the same two-proportion z-test that Optimizely, VWO and Convert use under the hood.

The formula

p1   = baseline                              # current conversion rate
p2   = baseline × (1 + minimum_lift)         # what you want to detect
p̄   = (p1 + p2) / 2                          # pooled probability
n    = (zα × √(2p̄(1-p̄)) + zβ × √(p1(1-p1) + p2(1-p2)))² / (p2 - p1)²

Where n is the required number of visitors per variant, zα is the two-sided z-score for your confidence level (1.96 for 95 %), and zβ is the z-score for your statistical power (0.842 for 80 % power). Both are precomputed for the common pairs.
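
As a rough sketch, the formula above can be written in Python with only the standard library. The function name `sample_size_per_variant` and its parameter names are illustrative and not taken from the calculator's source; the sketch assumes a two-sided test and a relative lift.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, confidence=0.95, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline                        # current conversion rate
    p2 = baseline * (1 + relative_lift)  # rate you want to be able to detect
    p_bar = (p1 + p2) / 2                # pooled probability

    # z-scores: about 1.96 for 95 % confidence, about 0.842 for 80 % power
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Example: 3 % baseline, 10 % relative lift, 95 % confidence, 80 % power
print(sample_size_per_variant(0.03, 0.10))
```

For a 3 % baseline and a 10 % relative lift at 95 % / 80 %, this comes out at roughly 53,000 visitors per variant.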

What each input means in plain English

Baseline conversion rate — current rate the variant is fighting against. Use the rate from the last 30 days, not all-time.
Minimum detectable lift — the smallest relative change worth caring about. 10 % is typical for marketing tests; 5 % for revenue tests. Setting it too small (1–2 %) makes sample size balloon.
Confidence — 1 minus the chance of a false positive. 95 % is the default; 99 % is for high-stakes decisions; 90 % is acceptable for low-risk marketing optimizations.
Power — chance of detecting a real lift if it exists. 80 % is standard; 90 % for sensitive tests where missing the lift is costly.
Daily traffic per variant — used to compute how many days to run the test. Half of total daily visitors if you split 50/50.

Common mistakes the calculator avoids

Stopping early — looking at significance before reaching sample size inflates false positives. The calculator shows the exact threshold.
Setting MDE too small — a 1 % relative lift on a 3 % baseline needs roughly 5 million visitors per variant at 95 % / 80 %. Most teams cannot afford that runtime.
Forgetting power — 95 % confidence with 50 % power means half your real wins will look like noise.
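
To see how power depends on the sample you actually collect, the sample size formula can be solved for zβ instead of n. The sketch below does that; `achieved_power` is a hypothetical helper name, and it ignores the negligible chance of a significant result in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(baseline, relative_lift, n_per_variant, confidence=0.95):
    """Approximate power of a two-sided two-proportion z-test at a given sample size."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    pooled_sd = sqrt(2 * p_bar * (1 - p_bar))
    unpooled_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))

    # Rearranged sample size formula: solve for z_beta given n
    z_beta = (abs(p2 - p1) * sqrt(n_per_variant) - z_alpha * pooled_sd) / unpooled_sd
    return NormalDist().cdf(z_beta)


# Power for a 10 % relative lift on a 3 % baseline at three sample sizes
for n in (10_000, 26_000, 53_000):
    print(n, round(achieved_power(0.03, 0.10, n), 2))
```

Under these assumptions, about 26,000 visitors per variant gives close to 50 % power for that lift, while about 53,000 gives close to 80 %.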

How to plan an A/B test with this calculator

Plan the sample size before you ship the test, not after.

Measure baseline conversion

Use the rate from the last 30 days for the surface you are testing. Do not use 90-day or all-time — it does not reflect current behavior.

Decide minimum detectable lift

Pick the smallest lift that would change your decision. 10 % relative is typical for marketing; 5 % for revenue; do not go below 5 % without 100k+ visitors.

Pick confidence and power

95 % / 80 % is standard. Go to 99 % / 90 % for high-stakes tests. Anything looser is fine for low-risk optimizations but document the decision.

Enter daily traffic per variant

Total daily visitors to the test surface, divided by number of variants. The calculator outputs days to significance.

Lock the runtime, do not stop early

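Run the test for the full duration. Checking significance along the way and stopping at the first significant result inflates false positives beyond the nominal 5 %: a single look at the halfway point raises the rate to roughly 8 %, and every additional look pushes it higher. Trust the calculator.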

Frequently asked questions about A/B test sample size

What confidence level should I use for A/B tests?

95 % is the industry standard and the right default for most marketing tests. 99 % for high-stakes tests (pricing, checkout flow) where a wrong call costs real money. 90 % is acceptable for low-risk optimizations like headline tweaks.

How long should an A/B test run?

Run until you hit the sample size the calculator shows — never less. Minimum runtime should also span at least one full business cycle (7 days for B2B, 14 days for SaaS with weekly buyers) so you capture day-of-week effects.
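
A minimal sketch of that runtime rule, assuming an even traffic split. The function name `planned_runtime_days` and its parameters are hypothetical, and rounding up to whole weeks is an extra assumption added here so that every weekday is sampled equally often.

```python
from math import ceil


def planned_runtime_days(sample_size_per_variant, total_daily_visitors,
                         variants=2, min_cycle_days=7):
    """Days to run: enough for the sample size, at least one business cycle."""
    if total_daily_visitors <= 0 or variants < 2:
        raise ValueError("need positive traffic and at least two variants")

    daily_per_variant = total_daily_visitors / variants
    days_for_sample = ceil(sample_size_per_variant / daily_per_variant)

    # Never run shorter than one full business cycle
    days = max(days_for_sample, min_cycle_days)

    # Assumption: round up to whole weeks to balance day-of-week effects
    return ceil(days / 7) * 7


# 53,000 visitors per variant at 4,000 total visitors per day, split 50/50
print(planned_runtime_days(53_000, 4_000))
```

With 53,000 visitors needed per variant and 2,000 arriving per variant per day, the sample alone takes 27 days, which this sketch rounds up to 28.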

What is the difference between confidence and power?

Confidence is 1 minus the false-positive rate: at 95 %, a test with no real difference shows a significant result only 5 % of the time. Power is the chance you will detect a real lift if it exists (typically 80 %). You need both: a high-confidence, low-power test misses real wins.

Why does sample size explode for small lifts?

Sample size scales inversely with the square of the lift, so halving the detectable lift roughly quadruples the required sample size. At 95 % / 80 %, a 10 % relative lift on a 3 % baseline needs roughly 53,000 visitors per variant, while a 2 % lift needs roughly 1.3 million. There is no shortcut.
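
The scaling is easy to check numerically. The sketch below re-implements the two-proportion formula (the helper name `sample_size_per_variant` is illustrative) and prints the requirement for several lifts on a 3 % baseline.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, confidence=0.95, power=0.80):
    """Two-sided two-proportion z-test sample size per variant."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


baseline = 0.03
for lift in (0.20, 0.10, 0.05, 0.02):
    n = sample_size_per_variant(baseline, lift)
    print(f"{lift:.0%} relative lift: {n:,} visitors per variant")

# Halving the lift from 10 % to 5 % multiplies the requirement by close to 4
ratio = sample_size_per_variant(baseline, 0.05) / sample_size_per_variant(baseline, 0.10)
print(f"ratio: {ratio:.2f}")
```

The ratio is slightly below 4 rather than exactly 4 because the variance terms also shift a little as p2 moves closer to p1.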

Can I peek at the results before reaching sample size?

No. Stopping at the first significant peek inflates the false-positive rate from 5 % to as much as 25–30 % when you peek repeatedly. If you must monitor, use sequential testing methods (mSPRT, group sequential). This simple calculator assumes a fixed-horizon test.
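
The effect of peeking can be illustrated with a small A/A simulation, where there is no real difference and any "significant" result is a false positive. This is a sketch under a normal approximation with equally spaced looks; the function name, the number of looks and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt

Z_CRIT = 1.96  # two-sided 95 % threshold


def false_positive_rates(looks=20, simulations=20_000, seed=1):
    """Return (rate when stopping at any significant look, rate at the final look only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(simulations):
        running_sum = 0.0
        crossed_any_look = False
        z = 0.0
        for k in range(1, looks + 1):
            # Each batch of traffic adds one standard-normal increment under the null
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / sqrt(k)  # z-statistic after k batches
            if abs(z) > Z_CRIT:
                crossed_any_look = True
        peeking_hits += int(crossed_any_look)
        fixed_hits += int(abs(z) > Z_CRIT)  # only the planned final analysis
    return peeking_hits / simulations, fixed_hits / simulations


peeking, fixed = false_positive_rates()
print(f"stop at first significant look: {peeking:.1%}")
print(f"single analysis at the end:     {fixed:.1%}")
```

The fixed-horizon rate stays near the nominal 5 %, while the peeking rate lands several times higher and keeps growing as more looks are added.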

Does the calculator handle more than 2 variants?

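Not directly. This version is a two-proportion z-test (A vs B only). For multi-arm tests, apply a Bonferroni correction: divide the significance level (1 minus confidence) by the number of comparisons against control and recompute. At 95 % / 80 % this raises the per-variant sample size by roughly 20 % for two comparisons, about a third for three, and about 50 % for six.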
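
A minimal sketch of that adjustment, assuming each variant is compared against control only. Both function names are hypothetical; the calculator itself does not expose them.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Two-sided two-proportion z-test sample size per variant."""
    p1, p2 = baseline, baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def bonferroni_sample_size(baseline, relative_lift, comparisons, alpha=0.05, power=0.80):
    """Sample size per variant with alpha split evenly across comparisons."""
    return sample_size_per_variant(baseline, relative_lift, alpha / comparisons, power)


base = sample_size_per_variant(0.03, 0.10)
for comparisons in (1, 2, 3, 6):
    n = bonferroni_sample_size(0.03, 0.10, comparisons)
    print(f"{comparisons} comparison(s): {n:,} per variant ({n / base:.2f}x)")
```

Bonferroni is conservative, so the multipliers printed here are an upper bound on what a less strict correction would require.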
