Adaptive A/B Testing with Customer Features
When treatment effects depend on customer features, who you test matters. This demo walks through a single sequential design run: at each step, choose the next customer \(x^*\) and ad assignment to learn about the A/B decision boundary. Customers for whom A or B is clearly better teach little, so the design tends to spend tests near the boundary.
What you are seeing
- Three views of the same state: where the next test would be useful (expected information gain), where the A/B choice is least settled (decision uncertainty), and what the model would choose right now.
- We model conversion probability as a function of customer features (x1, x2) and which ad is shown.
- Points show past experiments; the star is the next experiment \(x^*\) chosen by the design rule.
Why this matters
- A single average treatment effect can hide the cases where A works better for one group and B works better for another.
- Random sampling may spend many tests on customer profiles where the decision is already obvious.
- Adaptive design puts more tests where the model is uncertain and the A/B decision is sensitive.
Model details (for technical viewers)
We use a logistic interaction model: \(\eta(x,z) = x^\top\beta + z\,x^\top\gamma\), \(y \sim \mathrm{Bernoulli}(\sigma(\eta(x,z)))\). Here \(x\) denotes the feature vector \(\phi(x_1,x_2)=[1,\,x_1,\,x_2,\,x_1x_2]\). The heterogeneous A/B difference is \(\Delta(x)=\eta(x,1)-\eta(x,0)=x^\top\gamma\).
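For concreteness, here is a minimal NumPy sketch of this model. The parameter values and function names are illustrative, not taken from the demo.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def phi(x1, x2):
    """Feature map phi(x1, x2) = [1, x1, x2, x1*x2]."""
    return np.array([1.0, x1, x2, x1 * x2])

def eta(x, z, beta, gamma):
    """Linear predictor eta(x, z) = x^T beta + z * x^T gamma; z in {0, 1} is the ad shown."""
    return x @ beta + z * (x @ gamma)

def conversion_prob(x, z, beta, gamma):
    """P(y = 1 | x, z) = sigma(eta(x, z))."""
    return sigmoid(eta(x, z, beta, gamma))

def delta(x, gamma):
    """Heterogeneous A/B difference on the log-odds scale: Delta(x) = x^T gamma."""
    return x @ gamma

# Example at one customer (parameter values are illustrative, not the demo's "true" values).
beta = np.array([-1.0, 0.3, 0.2, 0.05])
gamma = np.array([0.5, -0.4, 0.3, 0.1])
x = phi(1.5, 2.0)
p_z0, p_z1 = conversion_prob(x, 0, beta, gamma), conversion_prob(x, 1, beta, gamma)
```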
In this demo, the expected information gain view targets the interaction weights \(\gamma\) (the feature-dependent A/B difference), not the baseline weights \(\beta\). Any other random quantity computable from the model could be targeted instead, depending on the goal (e.g., a policy boundary, a threshold, or a ranking), and each choice leads to a different optimal design. The decision uncertainty view focuses on uncertainty in the sign of \(\Delta(x)\) ("is A better than B here?") rather than uncertainty in the absolute conversion rate.
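A sketch of one way the decision uncertainty view could be computed: given posterior samples of \(\gamma\) (from whatever posterior approximation is used; the names, shapes, and the specific 1 − 2|p − 0.5| score below are assumptions), the map measures how close \(P(\Delta(x) > 0)\) is to 0.5.

```python
import numpy as np

def phi(x1, x2):
    """Feature map [1, x1, x2, x1*x2]."""
    return np.array([1.0, x1, x2, x1 * x2])

def p_delta_positive(x, gamma_samples):
    """Posterior probability that Delta(x) = x^T gamma > 0, i.e. the z=1 ad wins at x."""
    return np.mean(gamma_samples @ x > 0.0)

def decision_uncertainty(x, gamma_samples):
    """1 where P(Delta(x) > 0) = 0.5 (the A/B call is least settled), 0 where it is 0 or 1."""
    p = p_delta_positive(x, gamma_samples)
    return 1.0 - 2.0 * abs(p - 0.5)

# Example with dummy posterior samples of gamma (illustrative only).
rng = np.random.default_rng(0)
gamma_samples = rng.normal(size=(1000, 4))
print(decision_uncertainty(phi(1.0, 2.0), gamma_samples))
```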
Simulation setup for these precomputed frames: customer features \((x_1, x_2)\in[-3,7]^2\). We start with \(N_0=10\) customers drawn from the proposal distribution described next. At each step, a candidate pool of 200 customers is drawn from this proposal: each coordinate is sampled independently as \(\mathcal{N}(\mu=3,\sigma=1)\), rejecting draws outside \([-3,7]\); after 10,000 rejected draws the implementation falls back to clamping a fresh normal draw into \([-3,7]\). The OED policy chooses \(x^*\) from the candidate pool to maximize expected information gain. The ad assignment at \(x^*\) is randomized with \(P(z=1)=0.5\), and outcomes are simulated from fixed “true” parameters.
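A minimal sketch of one design step under this setup, assuming posterior samples of \((\beta,\gamma)\) are available. For brevity the EIG estimator below scores information about all parameters jointly, whereas the demo targets \(\gamma\) specifically, so treat it as a simplified stand-in rather than the actual policy.

```python
import numpy as np

rng = np.random.default_rng(0)
LOW, HIGH = -3.0, 7.0

def draw_coordinate(mu=3.0, sigma=1.0, max_rejects=10_000):
    """One coordinate from N(mu, sigma) restricted to [LOW, HIGH] by rejection sampling;
    after max_rejects failed draws, fall back to clamping a fresh normal draw."""
    for _ in range(max_rejects):
        v = rng.normal(mu, sigma)
        if LOW <= v <= HIGH:
            return v
    return float(np.clip(rng.normal(mu, sigma), LOW, HIGH))

def draw_candidate_pool(n=200):
    """Candidate pool of n customers; coordinates are sampled independently."""
    return np.array([[draw_coordinate(), draw_coordinate()] for _ in range(n)])

def phi(x1, x2):
    return np.array([1.0, x1, x2, x1 * x2])

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def eig_bernoulli(p_samples, eps=1e-12):
    """EIG for a Bernoulli outcome: average KL between each sample's outcome
    distribution and the posterior-predictive (marginal) outcome distribution."""
    p = np.clip(p_samples, eps, 1.0 - eps)
    p_bar = p.mean()
    return np.mean(p * np.log(p / p_bar) + (1 - p) * np.log((1 - p) / (1 - p_bar)))

def choose_next(beta_samples, gamma_samples, pool):
    """Pick x* maximizing EIG, averaged over the randomized assignment P(z=1)=0.5."""
    scores = []
    for x1, x2 in pool:
        x = phi(x1, x2)
        score = sum(0.5 * eig_bernoulli(sigmoid(beta_samples @ x + z * (gamma_samples @ x)))
                    for z in (0, 1))
        scores.append(score)
    return pool[int(np.argmax(scores))]

# Example with dummy posterior samples (illustrative only).
beta_samples = rng.normal(size=(500, 4))
gamma_samples = rng.normal(size=(500, 4))
x_star = choose_next(beta_samples, gamma_samples, draw_candidate_pool())
```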
Browse the sequence
The plots are precomputed from one sequential design run. Use the slider to move through the 41 frames, and switch views to understand why points were chosen.
Shortcuts: ←/→ step, 1/2/3 switch view.
Three views of the same state: where to test next, where the A/B choice is unsettled, and what the model would choose now.
Where to test next: expected information gain · Decision uncertainty: where A vs B is least settled · Best choice now: the predicted winner
Frames show 10 to 50 observed outcomes (41 frames).
From the precomputed OED policy.
Random sampling (no OED)
If you sample customers from the same fixed proposal distribution but without adaptive design, the resulting A/B recommendation can stay wrong for longer than it does under OED. In this run, random sampling often misses the customer profiles that would clarify the boundary.
Snapshots are precomputed at N ∈ {50, 60, 70, 80, 90, 100}.
*Compared against random sampling under the same budget and the same evaluation criteria.
Key takeaway
Even in this simplified model, the useful next test moves around. I would not want to pick these samples by intuition alone.
Contact
If you're thinking about sequential design for experiments, pricing tests, surveys, or labeling pipelines, email me.