Adaptive A/B Testing with Customer Features
When treatment effects depend on customer features, the choice of who to test can strongly affect how quickly you learn. Sequential design treats each assignment as a decision: pick the next customer \(x^*\) (and ad assignment) to reduce uncertainty about the A vs B decision boundary as fast as possible, so you reach a reliable targeting policy with fewer samples. In many settings this eases the classic explore/exploit tension: customers who are confidently A (or B) contribute little information, so the design naturally concentrates experiments near the boundary, where the opportunity cost of exploration is small.
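In code, the loop looks roughly like the sketch below. The helper bodies here are deliberate stand-ins (a random score and a random outcome) so the skeleton runs on its own; the demo's actual model, scoring criterion, and simulation are described in the sections that follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_candidates(n=200):
    # Candidate customer profiles from a fixed proposal over (x1, x2);
    # the demo's actual proposal is described in the setup section below.
    return np.clip(rng.normal(3.0, 1.0, size=(n, 2)), -3.0, 7.0)

def information_score(x, data):
    # Stand-in score so the skeleton runs; the demo scores each candidate by
    # expected information gain about the interaction weights gamma.
    return rng.random()

def run_experiment(x, z):
    # Stand-in outcome; the demo simulates conversions from fixed "true" parameters.
    return int(rng.integers(0, 2))

data = []
for step in range(5):
    pool = propose_candidates()
    scores = [information_score(x, data) for x in pool]
    x_star = pool[int(np.argmax(scores))]   # most informative customer profile
    z = int(rng.integers(0, 2))             # ad assignment randomized 50/50
    y = run_experiment(x_star, z)
    data.append((x_star, z, y))             # refit / update the posterior here
```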
What you are seeing
- Three views of the same state: where to test next (expected information gain), where the decision is most fragile (decision uncertainty), and what we would do right now (best choice now).
- We model conversion probability as a function of customer features (x1, x2) and which ad is shown.
- Points show past experiments; the star is the recommended next experiment \(x^*\).
Why this matters
- Standard A/B assumes one average effect. Real ads have different effects for different people.
- Uniform sampling wastes budget on profiles that teach you little.
- Adaptive design concentrates tests where the model is uncertain and where the A/B decision is sensitive.
Model details (for technical viewers)
We use a logistic interaction model: \(\eta(x,z) = x^\top\beta + z\,x^\top\gamma\), \(y \sim \mathrm{Bernoulli}(\sigma(\eta(x,z)))\). Here \(x\) denotes the feature vector \(\phi(x_1,x_2)=[1,\,x_1,\,x_2,\,x_1x_2]\). The heterogeneous A/B difference is \(\Delta(x)=\eta(x,1)-\eta(x,0)=x^\top\gamma\).
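A minimal numpy sketch of this model follows; the parameter values are illustrative placeholders, not the values behind the demo frames.

```python
import numpy as np

def phi(x1, x2):
    """Feature map phi(x1, x2) = [1, x1, x2, x1*x2]."""
    return np.array([1.0, x1, x2, x1 * x2])

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def eta(x, z, beta, gamma):
    """Linear predictor eta(x, z) = x'beta + z * x'gamma."""
    return x @ beta + z * (x @ gamma)

def delta(x, gamma):
    """Heterogeneous A/B difference Delta(x) = x'gamma."""
    return x @ gamma

# Illustrative parameters (placeholders, not the demo's "true" values).
beta = np.array([-1.0, 0.3, 0.2, -0.05])
gamma = np.array([0.5, -0.4, 0.3, 0.1])

rng = np.random.default_rng(0)
x = phi(1.5, 2.0)
z = 1                                  # 1 = show ad B, 0 = show ad A
p = sigmoid(eta(x, z, beta, gamma))    # conversion probability
y = rng.binomial(1, p)                 # simulated conversion outcome
```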
In this demo, the expected information gain view targets the interaction weights \(\gamma\) (i.e., the feature-dependent A/B difference), not the baseline \(\beta\). Any other quantity computable from the model could be targeted instead, depending on the goal (e.g., a policy boundary, a threshold, or a ranking), and each choice leads to a different optimal design. The decision uncertainty view focuses on uncertainty in the sign of \(\Delta(x)\) ("is A better than B here?") rather than uncertainty in the absolute conversion rate.
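A sketch of the decision-uncertainty quantity: the entropy of \(P(\Delta(x) > 0)\) under the posterior. The posterior draws of \(\gamma\) below are stand-ins; in the demo they would come from the fitted logistic model.

```python
import numpy as np

def decision_uncertainty(x_feat, gamma_samples):
    """Entropy (in bits) of P(Delta(x) > 0) under posterior draws of gamma.

    x_feat: feature vector phi(x1, x2), shape (4,)
    gamma_samples: posterior draws of gamma, shape (S, 4)
    """
    delta_samples = gamma_samples @ x_feat           # Delta(x) = x'gamma, one per draw
    p = np.clip(np.mean(delta_samples > 0.0), 1e-12, 1 - 1e-12)
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

# Stand-in posterior draws; the demo's come from the fitted model.
rng = np.random.default_rng(1)
gamma_samples = rng.normal(loc=[0.5, -0.4, 0.3, 0.1], scale=0.3, size=(2000, 4))

x_feat = np.array([1.0, 2.0, 1.0, 2.0])             # phi(2.0, 1.0)
print(decision_uncertainty(x_feat, gamma_samples))  # ~1 bit means the A/B call is fragile
```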
Simulation setup for these precomputed frames: customer features \((x_1, x_2)\in[-3,7]^2\). We start with \(N_0=10\) customers sampled from the proposal distribution described next. Each step then draws a candidate pool of 200 customers from that proposal: each coordinate is sampled independently as \(\mathcal{N}(\mu=3,\sigma=1)\), rejecting draws that fall outside \([-3,7]\); after 10,000 rejections the implementation falls back to clamping a fresh normal draw into \([-3,7]\). The OED policy chooses \(x^*\) from the candidate pool to maximize expected information gain. The ad assignment at \(x^*\) is randomized with \(P(z=1)=0.5\), and outcomes are simulated from fixed “true” parameters.
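A sketch of that candidate proposal; counting rejections per draw (rather than per coordinate) is an assumption of mine, not something stated by the demo.

```python
import numpy as np

LOW, HIGH = -3.0, 7.0

def sample_customer(rng, mu=3.0, sigma=1.0, max_rejects=10_000):
    """One (x1, x2) from the truncated-normal proposal via rejection sampling.

    Each coordinate is N(mu, sigma); draws falling outside [LOW, HIGH] are
    rejected, and after max_rejects rejections a fresh draw is clamped instead.
    """
    for _ in range(max_rejects):
        x = rng.normal(mu, sigma, size=2)
        if np.all((x >= LOW) & (x <= HIGH)):
            return x
    return np.clip(rng.normal(mu, sigma, size=2), LOW, HIGH)  # fallback: clamp

rng = np.random.default_rng(2)
pool = np.array([sample_customer(rng) for _ in range(200)])  # candidate pool for one step
z = int(rng.integers(0, 2))                                  # randomized ad, P(z=1)=0.5
```

In the full loop, the pool is scored by expected information gain about \(\gamma\) and the argmax is taken as \(x^*\); the random-sampling baseline discussed below simply draws the next customer from this proposal directly, with no scoring step.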
Browse the sequence
Plots are precomputed for a sequential design run (recommended next experiment at each step). Use the slider to move through the 41-frame run, and toggle between views.
Shortcuts: ←/→ step, 1/2/3 switch view.
Views: where to test next (expected information gain) · decision uncertainty (where the A vs B call is fragile) · best choice now (the predicted winner). Frames cover observed outcomes N = 10..50 (41 frames), all from the precomputed OED policy.
Random sampling (no OED)
If you sample customers from the same fixed proposal distribution but without adaptive design (no OED), the resulting A/B recommendation can remain wrong for much longer. Random sampling often misses the few profiles that would resolve the decision quickly.
Snapshots are precomputed at N ∈ {50, 60, 70, 80, 90, 100}.
*Compared against random sampling under the same budget and the same evaluation criteria.
Key takeaway
Even in a simplified model, the optimal next test moves around. Intuition is not a reliable sampling strategy.
Contact
If you'd like to apply sequential design to your experiments, pricing tests, surveys, or labeling pipelines, email me.