A/B Test Designer

Name: A/B Test Designer
Rating: 4.5 (7800 reviews)
Author: experiment-tools

Design A/B tests with proper sample sizes and statistical power

Intermediate30 min

data/ExperimentationUpdated Feb 16, 2026

Data ScientistProduct Manager

terminal

claude install community/ab-test-designer

This install command is a placeholder. The real source repo for this skill is still being verified.

by experiment-tools

About this skill

Experiment design assistant: calculate required sample sizes, estimate test duration, define primary and guardrail metrics, and produce experiment specifications.

Capabilities

data analysis
file editing

Use cases

A/B test design
Sample sizing
Experiment planning

Skill Definition

This is the actual SKILL.md file that powers this skill. Copy it to install.

SKILL.md

---
name: ab-test-designer
description: |
  Design statistically valid A/B tests: calculate sample size, estimate
  duration, pick primary and guardrail metrics, and output a ready-to-run
  experiment spec. Trigger on "A/B test", "experiment design", "sample
  size", "statistical power", "ship an experiment".
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash(python3 *)
---

# A/B Test Designer

Produce experiment specifications that survive statistical review. The
output: a document that states the hypothesis, the primary metric, the
minimum detectable effect, required sample size, duration, and guardrails.

## Prerequisites

- Baseline conversion rate or mean for the primary metric
- Traffic volume estimate (daily users or events per arm)
- Team alignment on what a meaningful lift looks like

## Steps

1. **Write the hypothesis in one sentence.** Form: "Changing X will
   increase Y by at least Z percent because of mechanism M." Vague
   hypotheses produce vague experiments.

2. **Pick a single primary metric.** Conversion rate, revenue per
   visitor, or retention at day 7. Everything else is secondary or a
   guardrail.

3. **Define guardrails.** Metrics that must NOT regress: page load
   time, error rate, unsubscribe rate. Set non-inferiority margins.

4. **Calculate sample size.** For a two-proportion test at 80 percent
   power and alpha 0.05:
   ```python
   from statsmodels.stats.power import NormalIndPower
   from statsmodels.stats.proportion import proportion_effectsize
   effect = proportion_effectsize(baseline + mde, baseline)
   n = NormalIndPower().solve_power(effect, power=0.8, alpha=0.05)
   ```
   Round up. This is per arm.

5. **Estimate duration.** Sample size divided by daily traffic per arm,
   rounded up to full weeks to avoid day-of-week bias. Minimum one week.

6. **Define the analysis plan before launch.** Exact statistical test,
   segmentation cuts, and stopping rules. Lock it. Peeking without
   pre-registration inflates false positive rates.

7. **Write the experiment spec.** Hypothesis, metric, MDE, sample size,
   duration, guardrails, analysis plan, owner, end date. Share for
   review before flipping the flag.

## Anti-patterns

- Running until the result looks good (peeking)
- Measuring every metric and cherry-picking the significant one
- Ending early because one arm has a hot first day
- Ignoring novelty effects on changes visible to existing users

## Output

- One-page experiment spec in Markdown
- Sample size and duration calculation (reproducible script)
- Primary metric + guardrails defined before launch
- Locked analysis plan with explicit stopping rules

How to install

Create the skill directory and file

terminal

mkdir -p ~/.claude/skills/ab-test-designer

Copy the SKILL.md content above

file path

~/.claude/skills/ab-test-designer/SKILL.md

Resulting file structure:

~/.claude/
  skills/
    ab-test-designer/
      SKILL.md    <-- skill definition

Skills are loaded automatically by Claude Code when you start a new session. The skill name and description in the frontmatter determine when Claude triggers it.

Trace your skill's performance

Local observability dashboard for AI tool usage

Related skills

Recommended from shared domain, career, and tool overlap with A/B Test Designer

Browse more →

53% match

Claude CodeUniversal