
Adaptive A/B Testing with Customer Features

When treatment effects depend on customer features, the choice of whom to test strongly affects how quickly you learn. Sequential design treats each assignment as a decision: pick the next customer \(x^*\) (and ad assignment) to shrink uncertainty about the A vs B decision boundary as fast as possible, so that you reach a reliable targeting policy with fewer samples. In many settings this also eases the classic explore/exploit tension: customers who are confidently A (or B) carry little information, so the design naturally concentrates experiments near the boundary, where the two ads perform similarly and the opportunity cost of exploration is small.

What you are seeing

Three views of the same state: where to test next (expected information gain), where the decision is most fragile (decision uncertainty), and what we would do right now (best choice now).
We model conversion probability as a function of customer features \((x_1, x_2)\) and of which ad is shown.
Points show past experiments; the star is the recommended next experiment \(x^*\).

Why this matters

Standard A/B testing assumes a single average effect, but real ads affect different people differently.
Uniform sampling wastes budget on profiles that teach you little.
Adaptive design concentrates tests where the model is uncertain and where the A/B decision is sensitive.

Model details (for technical viewers)

We use a logistic interaction model: \(\eta(x,z) = x^\top\beta + z\,x^\top\gamma\), \(y \sim \mathrm{Bernoulli}(\sigma(\eta(x,z)))\). Here \(x\) denotes the feature vector \(\phi(x_1,x_2)=[1,\,x_1,\,x_2,\,x_1x_2]\). The heterogeneous A/B difference is \(\Delta(x)=\eta(x,1)-\eta(x,0)=x^\top\gamma\).
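The model above can be written out in a few lines. This is a minimal sketch using only the Python standard library; the function names and the numeric values of `beta` and `gamma` are hypothetical, chosen for illustration, and are not the demo's "true" parameters.

```python
import math
import random

def features(x1, x2):
    """Feature map phi(x1, x2) = [1, x1, x2, x1*x2]."""
    return [1.0, x1, x2, x1 * x2]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def eta(x1, x2, z, beta, gamma):
    """Linear predictor phi^T beta + z * phi^T gamma, with z in {0, 1}."""
    phi = features(x1, x2)
    return dot(phi, beta) + z * dot(phi, gamma)

def delta(x1, x2, gamma):
    """Heterogeneous A/B difference on the logit scale: phi^T gamma."""
    return dot(features(x1, x2), gamma)

def simulate_outcome(x1, x2, z, beta, gamma, rng):
    """Draw y ~ Bernoulli(sigmoid(eta(x, z)))."""
    return 1 if rng.random() < sigmoid(eta(x1, x2, z, beta, gamma)) else 0

# Hypothetical parameter values, for illustration only.
beta = [-1.0, 0.2, 0.1, 0.0]
gamma = [0.6, -0.3, 0.1, 0.05]

rng = random.Random(0)
y = simulate_outcome(2.0, 4.0, 1, beta, gamma, rng)
```

Note that `delta` depends only on `gamma`: the baseline `beta` cancels in the difference \(\eta(x,1)-\eta(x,0)\).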

In this demo, the expected information gain view is concerned with learning the interaction weights \(\gamma\) (i.e., the feature-dependent A/B difference), not the baseline \(\beta\). Any other random quantity computable from the model can be targeted depending on our goals (e.g., a policy boundary, a threshold, or a ranking), each leading to different computed optimal designs. The decision uncertainty view focuses on uncertainty in the sign of \(\Delta(x)\) ("is A better than B here?") rather than uncertainty in the absolute conversion rate.
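One way to score both views from posterior draws is sketched below. It assumes you already have paired posterior samples `beta_draws` and `gamma_draws` (hypothetical names; how the demo obtains its posterior is not stated here). The information gain shown is the mutual information between the outcome and all parameters jointly, which is a simplification: targeting \(\gamma\) alone, as the demo describes, would additionally require marginalizing over \(\beta\) given \(\gamma\).

```python
import math

def features(x1, x2):
    return [1.0, x1, x2, x1 * x2]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p); taken as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def prob_delta_positive(x1, x2, gamma_draws):
    """Fraction of posterior draws with Delta(x) = phi(x)^T gamma > 0."""
    phi = features(x1, x2)
    return sum(dot(phi, g) > 0 for g in gamma_draws) / len(gamma_draws)

def decision_uncertainty(x1, x2, gamma_draws):
    """Entropy of the sign of Delta(x): 0 when certain, log(2) at 50/50."""
    return binary_entropy(prob_delta_positive(x1, x2, gamma_draws))

def joint_information_gain(x1, x2, z, beta_draws, gamma_draws):
    """Monte Carlo estimate of I(y; beta, gamma) at design (x, z).

    Entropy of the averaged prediction minus the average entropy of
    the per-draw predictions.
    """
    phi = features(x1, x2)
    probs = [sigmoid(dot(phi, b) + z * dot(phi, g))
             for b, g in zip(beta_draws, gamma_draws)]
    mean_p = sum(probs) / len(probs)
    mean_entropy = sum(binary_entropy(p) for p in probs) / len(probs)
    return binary_entropy(mean_p) - mean_entropy
```

A point where the draws disagree about the sign of \(\Delta(x)\) scores high on decision uncertainty even if its absolute conversion rate is well known, which is the distinction the paragraph above draws.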

Simulation setup for these precomputed frames: customer features lie in \((x_1, x_2)\in[-3,7]^2\). The proposal distribution samples each coordinate independently from \(\mathcal{N}(\mu=3,\sigma=1)\), rejecting draws outside \([-3,7]\); after 10,000 rejected draws the implementation falls back to clamping a fresh normal draw into \([-3,7]\). We start with \(N_0=10\) customers sampled from this proposal. Each step then draws a candidate pool of 200 customers from the same proposal, and the optimal experimental design (OED) policy chooses \(x^*\) from the pool to maximize expected information gain. Ad assignment at \(x^*\) is randomized with \(P(z=1)=0.5\); outcomes are simulated from fixed “true” parameters.
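The sampling loop described above might look roughly like this. It is a sketch under stated assumptions, not the demo's implementation: the function names are hypothetical, and the `score` passed to `choose_next` is a placeholder standing in for the expected information gain.

```python
import random

LOW, HIGH = -3.0, 7.0  # feature box [-3, 7] from the setup above

def sample_coordinate(rng, mu=3.0, sigma=1.0, max_rejections=10_000):
    """One coordinate from N(mu, sigma), rejecting draws outside the box."""
    for _ in range(max_rejections):
        value = rng.gauss(mu, sigma)
        if LOW <= value <= HIGH:
            return value
    # Fallback described in the text: clamp a fresh normal draw into the box.
    return min(HIGH, max(LOW, rng.gauss(mu, sigma)))

def sample_pool(rng, size=200):
    """Candidate pool of (x1, x2) pairs with independent coordinates."""
    return [(sample_coordinate(rng), sample_coordinate(rng))
            for _ in range(size)]

def choose_next(pool, score):
    """Return the candidate with the highest score (first one on ties)."""
    return max(pool, key=lambda c: score(*c))

def assign_ad(rng, p_treat=0.5):
    """Randomized ad assignment with P(z = 1) = p_treat."""
    return 1 if rng.random() < p_treat else 0

rng = random.Random(0)
pool = sample_pool(rng)
# Placeholder score, NOT an information gain: it only exercises the selection.
x_star = choose_next(pool, score=lambda x1, x2: -abs(x1 - x2))
z = assign_ad(rng)
```

With \(\mu=3\) and \(\sigma=1\) the box edges sit four or more standard deviations from the mean, so rejections are rare and the clamping fallback is essentially a safety net.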

Browse the sequence

Plots are precomputed for a sequential design run (recommended next experiment at each step). Use the slider to move through the 41-frame run, and toggle between views.

Shortcuts: ←/→ step, 1/2/3 switch view.

Where to test next: expected information gain · Decision uncertainty: where A vs B is fragile · Best choice now: the predicted winner

Outcomes observed (cumulative)

Frames show 10 to 50 observed outcomes (41 frames).

Next customer (x*)

From the precomputed OED policy.

Ad to show at x*

From the precomputed OED policy.

Random sampling (no OED)

If you sample customers from the same fixed proposal distribution but without adaptive design (no OED), the resulting A/B recommendation can remain wrong for much longer. Random sampling often misses the few profiles that would resolve the decision quickly.

Show snapshot (N samples)

Snapshots are precomputed at N ∈ {50, 60, 70, 80, 90, 100}.

The comparison uses random sampling under the same budget and the same evaluation criteria.

Key takeaway

Even in a simplified model, the optimal next test moves around. Intuition is not a reliable sampling strategy.

Contact

If you'd like to apply sequential design to your experiments, pricing tests, surveys, or labeling pipelines, email me.
