Optimal experiment design

Optimal Experiment Design Demos

Adaptive data collection that picks the next most-informative sample, so product, ML, and research teams can learn with fewer measurements.

Some demos are fully static (precomputed plots). Others are interactive and require a backend service; each demo indicates its status.

Featured demo: Adaptive A/B testing with customer features

What this is

Optimal experiment design provides a way to fit a statistical model with less data: start with some initial data, then iteratively choose the next measurement to reduce uncertainty about the quantity that matters for a decision.

The emphasis is not exploration for its own sake, but learning the specific quantity that drives a decision as quickly as possible.

Example: in deduplication, instead of labeling random record pairs, we label the pairs that most clarify cluster membership.

Typical applications include A/B testing with covariates, record linkage, pricing experiments, adaptive surveys, and ML data labeling.

Demos

Precomputed plots and interactive demos.

Method

A high-level view of sequential design.

Sequential design makes data collection active: choose the next measurement to maximize expected information about a target quantity of interest, accounting for sampling uncertainty and uncertainty in the model.
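In its standard textbook form, this criterion picks the candidate that maximizes the expected reduction in entropy of the decision-relevant quantity. In the formula below, q is that quantity (for example, which treatment is better), D is the data collected so far, x is a candidate measurement, and y is its not-yet-observed outcome. The notation is ours and is only illustrative. The demos use an approximation rather than necessarily evaluating this expression exactly.

```latex
x^{\star} = \arg\max_{x} \; \mathbb{E}_{y \sim p(y \mid x, \mathcal{D})}
  \Big[ \mathrm{H}\big[p(q \mid \mathcal{D})\big]
      - \mathrm{H}\big[p(q \mid \mathcal{D} \cup \{(x, y)\})\big] \Big]
  = \arg\max_{x} \; I(q; y \mid x, \mathcal{D})
```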

In practice, this is implemented as an approximation to Bayesian optimal experiment design. Each candidate measurement is scored with a data acquisition objective (e.g., expected information gain about a decision boundary or treatment choice), the best-scoring measurement is collected, and the loop repeats until the remaining decision uncertainty falls below a chosen tolerance.
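Below is a minimal sketch of that loop for the simplest case. It assumes a two-arm A/B test with binary conversions, independent Beta(1, 1) priors, no customer features, and a decision of "which arm converts better." The acquisition objective is the expected information gain, in bits, about the binary decision variable q = [p_A > p_B]. Sampling stops when the posterior probability of picking the wrong arm drops below a tolerance. All names (`prob_a_better`, `expected_info_gain`, `run`) and the grid approximation are hypothetical illustrations, not the code behind the demos.

```python
import math
import random

# Midpoint grid on (0, 1) used to approximate Beta posteriors numerically.
GRID = 400
XS = [(k + 0.5) / GRID for k in range(GRID)]
LOG_X = [math.log(x) for x in XS]
LOG_1MX = [math.log1p(-x) for x in XS]


def beta_weights(a, b):
    """Beta(a, b) density evaluated on the grid and normalized to sum to 1."""
    logs = [(a - 1) * lx + (b - 1) * l1 for lx, l1 in zip(LOG_X, LOG_1MX)]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [v / total for v in w]


def prob_a_better(post):
    """Approximate P(p_A > p_B) for independent Beta posteriors post["A"], post["B"]."""
    wa, wb = beta_weights(*post["A"]), beta_weights(*post["B"])
    prob, cum_b = 0.0, 0.0
    for pa, pb in zip(wa, wb):
        prob += pa * (cum_b + 0.5 * pb)  # ties within a grid cell count as half
        cum_b += pb
    return prob


def entropy_bits(p):
    """Entropy of a binary variable with P(1) = p, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_info_gain(post, arm):
    """Expected information gain (bits) about q = [p_A > p_B] from one more observation on `arm`."""
    a, b = post[arm]
    p_success = a / (a + b)  # posterior predictive probability of a conversion
    gain = entropy_bits(prob_a_better(post))
    for y, p_y in ((1, p_success), (0, 1 - p_success)):
        updated = {**post, arm: (a + y, b + 1 - y)}
        gain -= p_y * entropy_bits(prob_a_better(updated))
    return gain


def run(true_rates, tol=0.05, max_steps=500, seed=0):
    """Sample the most informative arm until P(wrong decision) < tol or the budget runs out."""
    rng = random.Random(seed)
    post = {"A": (1, 1), "B": (1, 1)}  # Beta(1, 1) priors as (alpha, beta) per arm
    for _ in range(max_steps):
        p = prob_a_better(post)
        if min(p, 1 - p) < tol:
            break
        arm = max(post, key=lambda k: expected_info_gain(post, k))
        y = 1 if rng.random() < true_rates[arm] else 0  # simulated conversion
        a, b = post[arm]
        post[arm] = (a + y, b + 1 - y)
    p = prob_a_better(post)
    return ("A" if p > 0.5 else "B"), p, post


# Usage with simulated "true" conversion rates (hypothetical values).
decision, p_a_better, post = run({"A": 0.6, "B": 0.1})
print(decision, round(p_a_better, 3), post)
```

Because q is binary, its entropy is just the binary entropy of P(p_A > p_B). This makes the information gain cheap to compute exactly for one extra observation. A myopic "expected reduction in error probability" objective, by contrast, is often zero when no single outcome would flip the decision.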

Unlike standard A/B testing, bandits, or generic active learning, the objective here can be tied explicitly to the decision-relevant quantity of interest, and it is approximately Bayesian. We don't reward uncertainty reduction in parts of the model that do not change the recommended action. We can also fit more complex models (likely ones you already use), and we evaluate the acquisition function in about the time it takes to fit the model itself.
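To illustrate the difference from generic active learning, here is a toy version of the deduplication example. It assumes, hypothetically, that clusters are the connected components of record pairs with match probability above 0.5. The "decision-focused" score is the probability that labeling a pair would change that clustering. A generic uncertainty-sampling baseline picks the pair whose match probability is closest to 0.5 instead. This is a simplified heuristic with made-up probabilities and helper names (`clusters`, `decision_score`, `uncertainty_score`), not the Bayesian acquisition objective the demos use.

```python
def clusters(n, match_prob, threshold=0.5):
    """Partition records 0..n-1 into connected components of pairs with P(match) > threshold."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), p in match_prob.items():
        if p > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def decision_score(n, match_prob, pair):
    """Probability that labeling `pair` changes the current clustering (0 if no label would)."""
    current = clusters(n, match_prob)
    p = match_prob[pair]
    score = 0.0
    for label, prob in ((1.0, p), (0.0, 1 - p)):
        updated = dict(match_prob)
        updated[pair] = label  # pretend a human labeled this pair
        if clusters(n, updated) != current:
            score += prob
    return score


def uncertainty_score(match_prob, pair):
    """Generic active-learning baseline: how close P(match) is to 0.5."""
    return 1 - 2 * abs(match_prob[pair] - 0.5)


# Records 0, 1, 2 are tightly linked; record 3 is a borderline candidate.
match_prob = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.52, (2, 3): 0.45}
n = 4
by_uncertainty = max(match_prob, key=lambda pr: uncertainty_score(match_prob, pr))
by_decision = max(match_prob, key=lambda pr: decision_score(n, match_prob, pr))
print(by_uncertainty, by_decision)  # (1, 2) (2, 3)
```

Uncertainty sampling chooses pair (1, 2) because its probability is nearest 0.5. Either label leaves records 0, 1, and 2 in the same cluster, because they stay connected through record 0. The decision-focused score is zero for that pair and instead selects (2, 3), the only label that could change cluster membership.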

Hosted vs local demos

The A/B demo is fully static (precomputed plots). The record linkage demo runs against a hosted backend API so visitors can upload data directly.

If you need the system to run locally (e.g., for privacy or infrastructure constraints), that is typically handled as part of an engagement. Email joe@josephsmiller.com and I can suggest an appropriate deployment option.

About

Background and contact.

These demos are maintained by Joseph S. Miller. For publications and technical notes, see josephsmiller.com.

Privacy note: when you run the demos locally, your files stay on your machine.

Contact

Questions, collaborations, or pilots — email me.

If you prefer a short call, include a few times that work for you.