Optimal experiment design

Optimal Experiment Design Demos

Adaptive data collection that picks the next most-informative sample so product, ML, and research teams can learn with fewer measurements.

What this is

Optimal experiment design provides a way to fit a statistical model with less data: start with some initial data, then iteratively choose the next measurement to reduce uncertainty about whatever quantity is most relevant for the decision.

Example: in deduplication, rather than labeling random record pairs, which are almost all non-matches, we label the pairs that lie near the decision boundary and that correlate strongly with the model parameters.

Typical applications include A/B testing with covariates, record linkage, pricing experiments, adaptive surveys, and ML data labeling.

Demos

Precomputed plots and demos.

Method

A high-level view of sequential design.

Sequential design makes data collection active: choose the next measurement to maximize expected information about a target quantity of interest, accounting for sampling uncertainty and uncertainty in the model.
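One common way to write "expected information about a target quantity" is as a mutual information. The notation here is an assumption for illustration, not taken from the demos: $\psi$ is the target quantity, $\mathcal{D}$ the data collected so far, $x$ a candidate measurement, $y$ its not-yet-observed outcome, and $H$ the entropy.

```latex
\mathrm{EIG}(x)
  = \mathbb{E}_{y \sim p(y \mid x, \mathcal{D})}
    \Big[ H\big[p(\psi \mid \mathcal{D})\big]
        - H\big[p(\psi \mid \mathcal{D}, x, y)\big] \Big]
  = I(\psi ;\, y \mid x, \mathcal{D})
```

Under this reading, the next measurement would be $x^\ast = \arg\max_x \mathrm{EIG}(x)$. Averaging over $y$ accounts for sampling uncertainty, and the posterior over $\psi$ carries the model uncertainty.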

In practice, we approximate Bayesian optimal experiment design: score each candidate measurement with a data acquisition objective (e.g., expected information gain about a decision boundary or treatment choice), collect the highest-scoring measurement, and repeat until the remaining decision uncertainty falls below a chosen tolerance.
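The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the implementation behind the demos: the model is a one-dimensional decision boundary `theta` with a logistic response, the posterior is an exact grid posterior rather than an approximation, and the stopping rule uses the posterior standard deviation of `theta`. All names (`run_sequential_design`, `SLOPE`, and so on) are hypothetical.

```python
import math
import random

SLOPE = 20.0  # assumed steepness of the response curve (illustrative only)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def prob_positive(x, theta):
    """Assumed model: P(y = 1 | x, theta) rises as x passes the boundary theta."""
    return sigmoid(SLOPE * (x - theta))


def expected_information_gain(x, grid, posterior):
    """Mutual information between the outcome at x and theta."""
    probs = [prob_positive(x, t) for t in grid]
    marginal = sum(w * p for w, p in zip(posterior, probs))
    conditional = sum(w * binary_entropy(p) for w, p in zip(posterior, probs))
    return binary_entropy(marginal) - conditional


def update(grid, posterior, x, y):
    """Bayes update of the grid posterior after observing y at x."""
    likes = [prob_positive(x, t) if y == 1 else 1.0 - prob_positive(x, t)
             for t in grid]
    unnorm = [w * l for w, l in zip(posterior, likes)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def summarize(grid, posterior):
    """Posterior mean and standard deviation of theta."""
    mean = sum(w * t for w, t in zip(posterior, grid))
    var = sum(w * (t - mean) ** 2 for w, t in zip(posterior, grid))
    return mean, math.sqrt(max(var, 0.0))


def run_sequential_design(true_theta=0.37, tolerance=0.02, max_steps=200, seed=0):
    rng = random.Random(seed)
    grid = [i / 100 for i in range(101)]           # candidate values of theta
    posterior = [1.0 / len(grid)] * len(grid)      # flat prior (an assumption)
    candidates = [i / 50 for i in range(51)]       # measurements we may take
    history = []
    mean, sd = summarize(grid, posterior)
    while sd > tolerance and len(history) < max_steps:
        # Score every candidate and take the most informative one.
        x = max(candidates,
                key=lambda c: expected_information_gain(c, grid, posterior))
        # Simulated measurement; in a real study this is the experiment itself.
        y = 1 if rng.random() < prob_positive(x, true_theta) else 0
        posterior = update(grid, posterior, x, y)
        history.append((x, y))
        mean, sd = summarize(grid, posterior)
    return mean, sd, history


if __name__ == "__main__":
    mean, sd, history = run_sequential_design()
    print(f"estimate={mean:.3f} sd={sd:.3f} measurements={len(history)}")
```

In this sketch the chosen measurements tend to cluster around the current estimate of the boundary, where outcomes are least predictable. With a richer model the grid posterior would be replaced by whatever approximation the model supports.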

Unlike standard A/B testing, bandits, or generic active learning, the objective here is approximately Bayesian and can be tied explicitly to the decision-relevant quantity of interest: we don't reward uncertainty reduction in parts of the model that do not change the recommended action. The approach also accommodates more complex models, likely ones you already use, and evaluating the acquisition function takes about as long as fitting the model itself.
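To make the contrast concrete, here is a small sketch comparing information gain about a model parameter with information gain about the recommended action. It is an illustration under assumptions, not the demos' acquisition function: the parameter `theta` lives on a grid, the outcome is binary with a logistic response, and the hypothetical decision rule recommends "ship" when `theta > 0.5` and "hold" otherwise.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def outcome_prob(x, theta, slope=20.0):
    """Assumed model: P(y = 1 | x, theta)."""
    return sigmoid(slope * (x - theta))


def parameter_eig(x, grid, posterior):
    """Information the outcome at x carries about theta itself."""
    probs = [outcome_prob(x, t) for t in grid]
    marginal = sum(w * p for w, p in zip(posterior, probs))
    conditional = sum(w * binary_entropy(p) for w, p in zip(posterior, probs))
    return binary_entropy(marginal) - conditional


def decision_eig(x, grid, posterior, action):
    """Information the outcome at x carries about the recommended action only."""
    mass, success = {}, {}
    for t, w in zip(grid, posterior):
        a = action(t)
        mass[a] = mass.get(a, 0.0) + w                        # P(action = a)
        success[a] = success.get(a, 0.0) + w * outcome_prob(x, t)
    marginal = sum(success.values())                           # P(y = 1)
    conditional = sum(m * binary_entropy(success[a] / m)
                      for a, m in mass.items() if m > 0.0)
    return binary_entropy(marginal) - conditional


def recommend(theta):
    """Hypothetical decision rule used only for this illustration."""
    return "ship" if theta > 0.5 else "hold"


grid = [i / 100 for i in range(101)]
candidates = [i / 50 for i in range(51)]

# Case 1: theta is still uncertain, but only within 0.60..0.90, so the
# recommended action is already settled.
settled = [1.0 / 31 if 60 <= i <= 90 else 0.0 for i in range(101)]

# Case 2: a flat posterior, so the action itself is still in doubt.
flat = [1.0 / len(grid)] * len(grid)
best_for_decision = max(
    candidates, key=lambda c: decision_eig(c, grid, flat, recommend))
```

In the first case `parameter_eig(0.75, grid, settled)` is clearly positive while `decision_eig(0.75, grid, settled, recommend)` is essentially zero: another measurement would refine `theta` without changing the action. In the second case the decision-targeted score points at measurements near the 0.5 cutoff.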

About

Background and contact.

These demos are maintained by Joseph S. Miller. For publications and technical explanations, see josephsmiller.com.

Contact

Questions, collaborations, or pilots: email me.

If you prefer a short call, include a few times that work for you.