Conversion Optimization

How Many Visitors Do You Need for an A/B Test?

Lena Kovácová · 11 min read

Key Takeaways

  • Most A/B tests need between roughly 2,000 and 40,000 visitors per variant — the exact number depends on your baseline conversion rate and how small a lift you want to detect.
  • Three variables drive sample size: baseline conversion rate, minimum detectable effect (MDE), and statistical confidence (usually 95%).
  • Lower baseline rates and smaller MDEs dramatically increase the visitors required. A 1% baseline needs roughly 11x the traffic of a 10% baseline at the same MDE.
  • If your traffic is too low for the lift you want to detect, either test bigger changes (larger MDE), test higher-traffic pages, or accept that some tests simply are not viable.
  • Copysplit calculates required sample size before you launch, so you know whether a test is feasible on day one instead of finding out three weeks in.

Most A/B tests need between roughly 2,000 and 40,000 visitors per variant to produce a trustworthy result, and the exact number depends on three inputs: your current conversion rate, the size of the lift you want to detect, and how confident you want to be in the answer. A page converting at 5% that wants to detect a 20% relative lift needs roughly 8,000 visitors per variant. A page converting at 1% with the same 20% lift target needs closer to 42,000 per variant. In our experience at Copysplit, the single biggest mistake teams make is launching tests without checking whether their traffic can support the experiment at all. This guide gives you the numbers, the formula, and a reference table you can use before every test.

The three variables that determine sample size

Sample size for an A/B test is not a fixed number. It is the output of three inputs, and changing any one of them moves the answer significantly. The first is your baseline conversion rate — the rate your current page or control variant converts at. Lower baseline rates require more traffic because rare events are harder to measure precisely. Doubling 1% requires far more evidence than doubling 10%, because a 1% rate is already statistically noisy.

The second variable is your Minimum Detectable Effect (MDE) — the smallest lift you care about catching. If you only want to know about huge wins (a 50% relative lift), you need comparatively little traffic. If you want to catch subtle 5% improvements, you need enormous samples. Most teams settle on a 10-20% relative MDE as a practical middle ground. The third variable is confidence level, typically 95% (alpha = 0.05), which controls your false-positive rate. Raise it to 99% and your required sample grows by roughly half. Together, these three inputs plug into the standard power-analysis formula for two proportions, and out comes your number.
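If you want to check the arithmetic yourself, here is a minimal Python sketch of that power calculation — the standard normal-approximation formula for comparing two proportions, not Copysplit's internal code. The name visitors_per_variant is ours, made up for illustration:

```python
import math
from scipy.stats import norm

def visitors_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided test on a binary
    conversion metric, using the normal approximation."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)        # rate the variant must reach
    z_alpha = norm.ppf(1 - alpha / 2)         # 1.96 at 95% confidence
    z_power = norm.ppf(power)                 # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)  # combined binomial variance
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

print(visitors_per_variant(0.05, 0.20))  # 8155 — ~8,000 per variant at a 5% baseline, 20% MDE
```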

Sample size reference table

Use this table as a quick reference. All numbers assume 95% confidence, 80% statistical power, and a two-sided test on a binary conversion metric (converted vs. not converted). MDE refers to relative lift — so a 20% MDE on a 5% baseline means detecting a move to 6%.

  • Baseline 1%, MDE 10%: ~163,000 visitors per variant (~326,000 total)
  • Baseline 1%, MDE 20%: ~42,000 visitors per variant (~84,000 total)
  • Baseline 1%, MDE 50%: ~7,700 visitors per variant (~15,400 total)
  • Baseline 2%, MDE 10%: ~81,000 visitors per variant (~162,000 total)
  • Baseline 2%, MDE 20%: ~21,000 visitors per variant (~42,000 total)
  • Baseline 2%, MDE 50%: ~3,800 visitors per variant (~7,600 total)
  • Baseline 5%, MDE 10%: ~31,000 visitors per variant (~62,000 total)
  • Baseline 5%, MDE 20%: ~8,000 visitors per variant (~16,000 total)
  • Baseline 5%, MDE 50%: ~1,500 visitors per variant (~3,000 total)
  • Baseline 10%, MDE 10%: ~14,700 visitors per variant (~29,400 total)
  • Baseline 10%, MDE 20%: ~3,800 visitors per variant (~7,600 total)
  • Baseline 10%, MDE 50%: ~700 visitors per variant (~1,400 total)
  • Baseline 20%, MDE 10%: ~6,500 visitors per variant (~13,000 total)
  • Baseline 20%, MDE 20%: ~1,700 visitors per variant (~3,400 total)

Notice how brutally the numbers scale at low baselines and small MDEs. If your landing page converts at 1% and you want to catch a 10% relative lift, you need over 300,000 total visitors across both variants. For a team with 300 daily visitors, that is roughly three years of testing — which is plainly not practical.
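If you want to sanity-check the table yourself, a short loop over the same assumptions — reusing the hypothetical visitors_per_variant helper sketched earlier — regenerates it (plus one extra row) and converts each cell into calendar time at an assumed traffic level:

```python
DAILY_VISITORS = 300  # assumption: the 300-visitors-per-day team from above

for baseline in (0.01, 0.02, 0.05, 0.10, 0.20):
    for mde in (0.10, 0.20, 0.50):
        n = visitors_per_variant(baseline, mde)
        days = 2 * n / DAILY_VISITORS  # both variants share the traffic
        print(f"baseline {baseline:.0%}, MDE {mde:.0%}: "
              f"{n:,} per variant, ~{days:,.0f} days at {DAILY_VISITORS}/day")
```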

Sample size tells you if a test is feasible. Duration tells you when to stop. Read the companion guide to make sure you also run your test for the right amount of calendar time.

Read how long to run an A/B test →

What Minimum Detectable Effect really means

MDE is one of the most misunderstood concepts in A/B testing. It does not mean "the effect my test found." It means "the smallest effect my test was designed to find." If you planned for a 20% MDE and your test reaches significance showing a 25% lift, great — your design worked. If your test finishes showing a 5% lift that is not significant, that does not mean there is no effect. It means your test lacked the power to detect effects that small.

This distinction matters because underpowered tests produce a lot of false negatives — real wins that look like ties. In our experience reviewing customer experiments, roughly a third of inconclusive tests were actually detecting real but small lifts that the sample size was never going to resolve. Setting your MDE honestly before the test starts is how you avoid wasting months chasing effects too small for your traffic to reveal.
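You can also turn the question around: fix the sample your traffic can realistically deliver and solve for the smallest lift that sample can resolve. A sketch using simple bisection over the hypothetical helper from earlier (smallest_detectable_mde is another made-up name):

```python
def smallest_detectable_mde(baseline, n_per_variant, alpha=0.05, power=0.80):
    """Smallest relative lift a test of this size is powered to detect,
    found by bisection (required n shrinks as the MDE grows)."""
    lo, hi = 1e-4, 10.0
    for _ in range(60):  # 60 halvings pin the answer down tightly
        mid = (lo + hi) / 2
        if visitors_per_variant(baseline, mid, alpha, power) > n_per_variant:
            lo = mid  # still underpowered: need a bigger lift
        else:
            hi = mid  # powered: try a smaller lift
    return hi

print(f"{smallest_detectable_mde(0.05, 2000):.0%}")
# ≈ 42% — the smallest lift a 2,000-visitor-per-variant test on a 5% baseline can resolve
```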

Why statistical power (80%) matters

Statistical power (1 minus beta) is the probability your test will correctly detect a real effect of your stated MDE. The industry default is 80%, which means that if the true effect really is at or above your MDE, you have an 80% chance of catching it and a 20% chance of missing it (a false negative, or Type II error). Raising power to 90% is safer but requires roughly 35% more traffic. Dropping to 70% saves traffic but raises your false-negative rate unacceptably.

Most sample size calculators — including ours — assume 80% power by default. Teams sometimes confuse power with confidence, but they are two sides of the same coin. Confidence (95%) controls false positives: saying something won when it did not. Power (80%) controls false negatives: saying nothing happened when something did. You need both to trust your result, and both feed into the required visitor count.
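Both knobs are easy to see directly by varying them in the earlier sketch (same hypothetical helper; a 5% baseline and 20% MDE assumed):

```python
base, mde = 0.05, 0.20

for power in (0.70, 0.80, 0.90):
    print(f"power {power:.0%}: {visitors_per_variant(base, mde, power=power):,} per variant")
# power 70%: ~6,400 · power 80%: ~8,200 · power 90%: ~10,900 (about 35% more than 80%)

for alpha in (0.05, 0.01):
    print(f"confidence {1 - alpha:.0%}: {visitors_per_variant(base, mde, alpha=alpha):,} per variant")
# 95%: ~8,200 · 99%: ~12,100 (roughly half again as much traffic)
```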

What to do when your traffic is too low

Many teams discover their traffic simply cannot support the tests they want to run. We worked with a B2B SaaS team with about 300 daily visitors to their pricing page trying to detect a 5% relative lift. At their 3% baseline conversion rate, that test would need roughly 200,000 visitors per variant — several years of testing. The test was not feasible, full stop. Pretending otherwise would just waste their time.

You have three honest options when traffic is tight. First, test bigger changes. Swapping a single word tends to produce an effect too tiny to detect on modest traffic, but rewriting the entire headline and value proposition can produce 20%+ lifts on strong variants — which needs far less traffic to resolve. Second, prioritize your highest-traffic pages (usually the homepage or a top-of-funnel landing page) rather than deep-funnel pages with few visitors. Third, accept that some pages are not testable and move to qualitative research, session recordings, or best-practice-driven copy decisions instead.
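A pre-launch gate can make that third call mechanical rather than painful. A sketch reusing the hypothetical helper from earlier (is_feasible is a made-up name, and the six-week default cutoff is an arbitrary assumption):

```python
def is_feasible(baseline, relative_mde, daily_visitors, max_days=42):
    """Can this test reach its required sample within max_days?"""
    total_needed = 2 * visitors_per_variant(baseline, relative_mde)
    return total_needed / daily_visitors <= max_days

# The pricing-page example: 3% baseline, 5% relative MDE, 300 visitors/day
print(is_feasible(0.03, 0.05, 300))      # False — ~200,000 per variant means years of traffic
print(is_feasible(0.03, 0.30, 300, 60))  # True — a 30% MDE resolves in about six weeks
```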

Honest limitation: Bayesian and sequential testing methods can sometimes reach decisions with smaller samples than traditional fixed-horizon frequentist tests, especially when effects are large. Copysplit uses frequentist statistics (95% confidence), which is the industry standard and what most calculators assume. If you run a very low-traffic site (under 1,000 monthly visitors per key page), no methodology will rescue tests designed to catch small effects — the math simply does not bend that far.

Know exactly when your test has enough data to call a winner — not too early, not too late.

Read when to call a winner →

How Copysplit calculates sample size automatically

Every Copysplit experiment includes an automatic sample size estimate before you launch. You enter your current conversion rate and the minimum lift you want to catch, and we calculate the required visitors per variant at 95% confidence and 80% power. If your historical traffic cannot hit that number in a reasonable window, we flag it — so you know to either increase your MDE, pick a higher-traffic page, or skip the test entirely. No surprises three weeks in.

During the test, we track progress toward the required sample in real time. You see exactly how many more visitors you need and the estimated completion date based on current traffic pace. When you reach the target and results are significant, we surface the winner with a clear confidence interval. When the target is reached and results are not significant, we tell you honestly that the test was inconclusive rather than letting you chase noise. This is the same discipline used by professional experimentation teams, built into the tool.

Stop launching tests that cannot possibly reach significance. Copysplit calculates sample size before you run, so every experiment is set up to actually work.

Start free with Copysplit →

Common sample size mistakes

The most common mistake is picking a number that feels big enough — running until you hit 1,000 visitors because that sounds like a lot, without checking whether 1,000 actually resolves your MDE. For a 2% baseline and 20% MDE, the required sample is roughly 21,000 per variant, so 1,000 visitors gets you nowhere near significance no matter what the results look like. The second mistake is stopping the test the moment significance appears. Peeking at results and stopping early inflates your false-positive rate dramatically — under repeated peeking, what looks like a 95% confidence winner can carry a true confidence closer to 80%.
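You can watch the peeking problem happen in a simulation: run many A/A tests — two identical variants, so any "winner" is pure noise — check significance at ten evenly spaced interim looks, and count how often a test stops on a phantom effect. A rough sketch, assuming NumPy and the 5% baseline used elsewhere in this guide:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
p, n_per_variant, looks, trials = 0.05, 8000, 10, 2000
z_crit = 1.96  # nominal 95% confidence
false_positives = 0

for _ in range(trials):
    a = rng.random(n_per_variant) < p  # variant A: true rate 5%
    b = rng.random(n_per_variant) < p  # variant B: identical, so any "winner" is noise
    for k in range(1, looks + 1):
        m = n_per_variant * k // looks  # interim look at k/10 of the sample
        pooled = (a[:m].sum() + b[:m].sum()) / (2 * m)
        se = np.sqrt(2 * pooled * (1 - pooled) / m)
        if se > 0 and abs(a[:m].mean() - b[:m].mean()) / se > z_crit:
            false_positives += 1  # stopped early on a phantom effect
            break

print(false_positives / trials)  # typically ~0.15-0.20 — three to four times the nominal 0.05
```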

The third mistake is recalculating your MDE after the test ends based on what you saw. If you planned for 20% MDE, caught a 7% lift, and then retroactively say "well, 7% is still valuable" — that test was never powered to detect 7%, and the result may not replicate. Set your MDE honestly before launch, pick your sample size, run the full test, and accept the answer you get. That is the only way A/B testing produces results you can trust at scale.

See the full list of testing mistakes that quietly ruin results — and how to avoid each one.

Read common A/B testing mistakes →

Frequently asked questions

Can I run an A/B test with only 1,000 visitors per variant?
Only if your baseline conversion rate is high (10%+) and you are looking for a large MDE (roughly 40% or more). At a 10% baseline with a 50% MDE, roughly 700 visitors per variant is enough at 95% confidence; at a 40% MDE you need right around 1,000. For lower baselines or smaller MDEs, 1,000 visitors is not enough and the test will almost certainly return inconclusive.
What is the minimum traffic needed to run any A/B test?
Practically, you want at least 1,000 monthly visitors to the page you are testing — and ideally 5,000+. Below that, even large MDEs like 50% require months of testing to resolve. Very low-traffic sites usually get better ROI from qualitative research (session recordings, user interviews) than from quantitative A/B tests.
Does sample size change if I am testing more than two variants?
Yes. Each additional variant splits traffic further and requires a multiple-comparisons correction to maintain the false positive rate. A three-variant test (control plus two challengers) typically needs roughly 80% more total traffic than a two-variant test to detect the same MDE with the same confidence — traffic splits three ways instead of two, and each comparison needs a stricter significance threshold. Two-variant tests are usually the most efficient.
Should I use relative or absolute MDE?
Relative MDE (percentage lift over baseline) is more intuitive and what most calculators use. A 20% relative MDE on a 5% baseline means detecting a move to 6% — absolute lift of 1 percentage point. Both give the same sample size mathematically; just be consistent within a single test and do not mix them up when interpreting results.
Can I reduce sample size by using Bayesian methods?
Sometimes, especially for large effects or when you have strong prior beliefs. Bayesian methods and sequential testing can reach decisions faster in some scenarios. However, they require careful setup and are harder to interpret for teams new to testing. Copysplit uses frequentist statistics (95% confidence), which is the most widely understood and audited approach.
Why do my sample size calculations differ between tools?
Most differences come from default assumptions — power (80% vs 90%), one-sided vs two-sided tests, or how MDE is defined (relative vs absolute). Always check what assumptions a calculator makes. For two-sided tests at 95% confidence and 80% power with relative MDE, results across reputable calculators should match within a few percent.

Sample size is the part of A/B testing that feels tedious but quietly decides whether your entire experimentation program is worth running. Underpowered tests waste weeks, produce inconclusive results, and erode your team's trust in data. Properly sized tests — even if they take longer — give you answers you can actually act on. Use the reference table above as a sanity check before every experiment, set your MDE honestly, and respect the math even when it tells you a test is not feasible. The teams that win at CRO are not the ones running the most tests; they are the ones running tests that were set up to succeed from the start.

Copysplit calculates sample size, tracks progress, and tells you honestly when a test is ready to call. See how it works.

See how Copysplit works →

Ready to test your copy?

Stop guessing which headlines convert. Start testing with Copysplit today.

Start Free Trial →