
Commit 734b50d

Balandat authored and facebook-github-bot committed
Docs on t/q batch mode decorators (#70)
Summary: Adds some explanation for these (we should still consider getting rid of the q-batch one)

Pull Request resolved: #70

Reviewed By: eytan

Differential Revision: D14993807

Pulled By: Balandat

fbshipit-source-id: 3f0de22c16a5a889877049fde8ca6eff9ff4cfd3
1 parent d78c11f commit 734b50d

File tree: 6 files changed (+179, −102 lines)


docs/acquisition.md

Lines changed: 2 additions & 2 deletions
@@ -59,8 +59,8 @@ stochastic optimization methods.
 ![resampling_fixed](assets/EI_resampling_fixed.png)

 If the base samples are fixed, the problem of optimizing the acquisition function
-is deterministic, allowing for conventional quasi-second order methods to be used
-(e.g., `L-BFGS` or sequential least-squares programming `SLSQP`). These have
+is deterministic, allowing for conventional quasi-second order methods such as
+L-BFGS or sequential least-squares programming (SLSQP) to be used. These have
 faster convergence rates than first-order methods and can speed up acquisition
 function optimization significantly.

docs/batching.md

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
---
id: batching
title: Batching
---

botorch makes frequent use of "batching", both in the sense of batch acquisition
functions for multiple candidates as well as in the sense of parallel or batch
computation (neither of these should be confused with mini-batch training).
Here we explain some of the common patterns you will see in botorch for
exploiting parallelism, including common shapes and decorators for more
conveniently handling these shapes.

## Batch Acquisition Functions

botorch supports batch acquisition functions that assign a joint utility to a
set of $q$ design points in the parameter space. These are, for obvious reasons,
referred to as q-Acquisition Functions. For instance, botorch ships with support
for q-EI, q-UCB, and a few others.

As discussed in the
[design philosophy](design_philosophy#batching-batching-batching),
botorch has adopted the convention of referring to batches in the
batch-acquisition sense as "q-batches", and to batches in the torch
batch-evaluation sense as "t-batches".

Internally, q-batch acquisition functions operate on input tensors of shape
$b \times q \times d$, where $b$ is the number of t-batches, $q$ is the number
of design points to be considered concurrently, and $d$ is the dimension of the
parameter space. Their output is a one-dimensional tensor with $b$ elements,
with the $i$-th element corresponding to the $i$-th t-batch. Always requiring an
explicit t-batch dimension makes it much easier and less ambiguous to work with
samples from the posterior in a consistent fashion.
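
The following minimal sketch illustrates these shapes. The toy data, the use of
`SingleTaskGP`, and the exact `qExpectedImprovement` signature are assumptions
based on the botorch API and may differ slightly across versions; the point is
only that a $b \times q \times d$ input yields $b$ acquisition values.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.acquisition import qExpectedImprovement

# Toy single-output model: n=20 training points in d=3 dimensions.
train_X = torch.rand(20, 3)
train_Y = train_X.sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)

# b=5 t-batches, each containing q=2 candidate points in d=3 dimensions.
X = torch.rand(5, 2, 3)

qEI = qExpectedImprovement(model, best_f=train_Y.max())
values = qEI(X)  # one joint utility value per t-batch
print(values.shape)  # expected: torch.Size([5])
```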

#### Batch-mode decorators

In order to simplify the user-facing API for evaluating acquisition functions,
botorch implements the
[`@t_batch_mode_transform`](../api/utils.html#botorch.utils.transforms.t_batch_mode_transform)
and
[`@q_batch_mode_transform`](../api/utils.html#botorch.utils.transforms.q_batch_mode_transform)
decorators.

##### `@t_batch_mode_transform`

This decorator simplifies evaluating MC-based acquisition functions using
inputs in non-batch mode. If applied to an instance method with a single `Tensor`
argument, an input tensor to that method without a t-batch dimension (i.e.
tensors of shape $q \times d$) will automatically be converted to a t-batch of
size 1 (i.e. of `batch_shape` `torch.Size([1])`). This is typically used on the
`forward` method of an `MCAcquisitionFunction`.
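
Conceptually, the decorator only needs to add a leading singleton t-batch
dimension before delegating to the wrapped method. The following is a simplified
stand-in for illustration, not the actual botorch implementation:

```python
from functools import wraps

from torch import Tensor


def toy_t_batch_mode_transform(method):
    """Simplified stand-in for `@t_batch_mode_transform` (illustration only)."""

    @wraps(method)
    def decorated(acqf, X: Tensor) -> Tensor:
        # If X has no t-batch dimension (shape `q x d`), add one of size 1.
        X = X if X.dim() > 2 else X.unsqueeze(0)
        return method(acqf, X)

    return decorated
```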

##### `@q_batch_mode_transform`

This decorator simplifies evaluating analytic acquisition functions with input
tensors that do not have a q-batch dimension. If applied to an instance method
with a single `Tensor` argument, an input tensor to that method will
automatically receive an additional singleton dimension at the second-to-last
dimension. This is typically used on the `forward` method of an
`AnalyticAcquisitionFunction`.
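
The effect on shapes is simply an `unsqueeze` in the second-to-last position; a
minimal illustration (plain tensor manipulation, not the botorch decorator
itself):

```python
import torch

X = torch.rand(5, 3)      # b=5 points in d=3 dimensions, no q-batch dimension
X_aug = X.unsqueeze(-2)   # insert a singleton q-dimension second-to-last
print(X_aug.shape)        # torch.Size([5, 1, 3])
```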


## Batching Sample Shapes

botorch evaluates Monte-Carlo acquisition functions using (quasi-) Monte-Carlo
sampling from the posterior at the input features $X$. Hence, on top of the
existing q-batch and t-batch dimensions, we also end up with another batch
dimension corresponding to the MC samples we draw. We use the PyTorch notions of
`sample_shape` and `event_shape`.

`event_shape` is the shape of a single sample drawn from the underlying
distribution. For instance,
- evaluating a single-output model at a $1 \times n \times d$ tensor,
  representing $n$ data points in $d$ dimensions each, yields a posterior with
  `event_shape` $1 \times n \times 1$. Evaluating the same model at a
  $\textit{batch_shape} \times n \times d$ tensor (representing a t-batch-shape
  of $\textit{batch_shape}$, with $n$ $d$-dimensional data points in each batch)
  yields a posterior with `event_shape` $\textit{batch_shape} \times n \times 1$.
- evaluating a multi-output model with $t$ outputs at a $\textit{batch_shape}
  \times n \times d$ tensor yields a posterior with `event_shape`
  $\textit{batch_shape} \times n \times t$.
- recall from the previous section that internally, all acquisition functions
  are evaluated using a single t-batch dimension.
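
For instance, checking the `event_shape` of a posterior (a sketch with toy data;
the model setup and the `Posterior.event_shape` property are assumed to behave
as described above):

```python
import torch
from botorch.models import SingleTaskGP

# Toy single-output model on n=20 training points in d=3 dimensions.
model = SingleTaskGP(torch.rand(20, 3), torch.rand(20, 1))

# Evaluate at a t-batch of shape 2 x 4, with n=5 test points in d=3 dimensions.
posterior = model.posterior(torch.rand(2, 4, 5, 3))
print(posterior.event_shape)  # expected: torch.Size([2, 4, 5, 1])
```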

`sample_shape` is the shape (possibly multi-dimensional) of the samples drawn
*independently* from the distribution with `event_shape`, resulting in a tensor
of samples of shape `sample_shape` + `event_shape`. For instance,
- drawing a sample of shape $s_1 \times s_2$ from a posterior with `event_shape`
  $\textit{batch_shape} \times n \times t$ results in a tensor of shape
  $s_1 \times s_2 \times \textit{batch_shape} \times n \times t$, where each of
  the $s_1 s_2$ tensors of shape $\textit{batch_shape} \times n \times t$ is an
  independent draw.
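
Continuing the sketch above (again assuming the standard `Posterior.rsample`
interface):

```python
# Draw a sample_shape of 3 x 2, i.e. six independent draws from the posterior.
samples = posterior.rsample(sample_shape=torch.Size([3, 2]))
print(samples.shape)  # expected: torch.Size([3, 2, 2, 4, 5, 1]), i.e. sample_shape + event_shape
```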


## Batched Evaluation of Models and Acquisition Functions

The GPyTorch models implemented in botorch support t-batched evaluation with
arbitrary t-batch shapes.

##### Non-batched Models

In the simplest case, a model is fit to non-batched training points with shape
$n \times d$.
- *Non-batched evaluation* on a set of test points with shape $m \times d$
  yields a joint posterior over the $m$ points.
- *Batched evaluation* on a set of test points with shape
  $\textit{batch_shape} \times m \times d$ yields $\textit{batch_shape}$
  joint posteriors over the $m$ points in each respective batch.
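
In code (reusing the toy $n \times d$ model from the sketches above; the shapes
in the comments are the expected results):

```python
# Non-batched evaluation: one joint posterior over m=4 test points.
single = model.posterior(torch.rand(4, 3))
print(single.mean.shape)   # expected: torch.Size([4, 1])

# Batched evaluation: 10 independent joint posteriors over m=4 points each.
batched = model.posterior(torch.rand(10, 4, 3))
print(batched.mean.shape)  # expected: torch.Size([10, 4, 1])
```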

##### Batched Models

The GPyTorch models can also be fit on batched training points with shape
$\textit{input_batch_shape} \times n \times d$. Here, each batch is modeled
independently (each batch has its own hyperparameters).
For example, if the training points have shape $b_1 \times b_2 \times n \times d$
(two batch dimensions), the batched GPyTorch model is effectively $b_1 \times b_2$
independent models. More generally, suppose we fit a model to training points
with shape $\textit{input_batch_shape} \times n \times d$.
Then, the test points must support broadcasting to the $\textit{input_batch_shape}$.

* *Non-batched evaluation* on a set of test points with shape
  $\textit{input_batch_shape}^* \times m \times d$, where each dimension of
  $\textit{input_batch_shape}^*$ either matches the corresponding dimension of
  $\textit{input_batch_shape}$ or is 1 to support broadcasting, yields
  $\textit{input_batch_shape}$ joint posteriors over the $m$ points (one per
  batch, with the test points broadcast as necessary).

* *Batched evaluation* on a set of test points with shape
  $\textit{new_batch_shape} \times \textit{input_batch_shape}^* \times m \times d$,
  where $\textit{new_batch_shape}$ is the new batch shape for batched evaluation,
  yields $\textit{new_batch_shape} \times \textit{input_batch_shape}$ joint
  posteriors over the $m$ points in each respective batch (broadcasting as
  necessary over $\textit{input_batch_shape}$); see the sketch below.
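
A sketch of both cases (assuming, as above, that `SingleTaskGP` accepts batched
training data; the shapes in the comments are the expected results):

```python
import torch
from botorch.models import SingleTaskGP

# input_batch_shape = 3: three independent models, each on n=20 points in d=2.
batched_model = SingleTaskGP(torch.rand(3, 20, 2), torch.rand(3, 20, 1))

# Non-batched evaluation: test points broadcast over input_batch_shape.
post = batched_model.posterior(torch.rand(1, 5, 2))
print(post.mean.shape)  # expected: torch.Size([3, 5, 1])

# Batched evaluation with new_batch_shape = 4.
post = batched_model.posterior(torch.rand(4, 3, 5, 2))
print(post.mean.shape)  # expected: torch.Size([4, 3, 5, 1])
```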

#### Batched Multi-Output Models

The [`BatchedMultiOutputGPyTorchModel`](../api/models.html#batchedmultioutputgpytorchmodel)
class implements a fast multi-output model (assuming conditional independence of
the outputs given the input) by batching over the outputs.

##### Training inputs/targets

Given training inputs with shape $\textit{input_batch_shape} \times n \times d$
and training outputs with shape $\textit{input_batch_shape} \times n \times o$,
the `BatchedMultiOutputGPyTorchModel` permutes the training outputs to make the
output $o$-dimension a batch dimension, such that the augmented training targets
have shape $o \times \textit{input_batch_shape} \times n$. The training inputs
(which are required to be the same for all outputs) are expanded to
$o \times \textit{input_batch_shape} \times n \times d$.

##### Evaluation

When evaluating test points with shape
$\textit{new_batch_shape} \times \textit{input_batch_shape} \times m \times d$
via the `posterior` method, the test points are broadcast to the model(s) for
each output. This results in a batched posterior whose mean has shape
$\textit{new_batch_shape} \times o \times \textit{input_batch_shape} \times m$,
which is then permuted back to the original multi-output shape
$\textit{new_batch_shape} \times \textit{input_batch_shape} \times m \times o$.
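
From the user's perspective this output batching is transparent; a short sketch
(assuming `SingleTaskGP` uses this machinery when given multi-output training
targets, as described above):

```python
import torch
from botorch.models import SingleTaskGP

# n=20 training points in d=3 dimensions, o=2 outputs.
mo_model = SingleTaskGP(torch.rand(20, 3), torch.rand(20, 2))

posterior = mo_model.posterior(torch.rand(5, 3))
print(posterior.mean.shape)  # expected: torch.Size([5, 2]), i.e. m x o
```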

#### Batched Optimization of Random Restarts

botorch uses random restarts to optimize an acquisition function from multiple
starting points. To efficiently optimize an acquisition function for a $q$-batch
of candidate points using $r$ random restarts, botorch uses batched
evaluation on an $r \times q \times d$ set of candidate points to independently
evaluate and optimize each random restart in parallel.
In order to optimize the $r$ acquisition functions using gradient information,
the acquisition values of the $r$ random restarts are summed before
back-propagating.
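
Summing works because restart $i$'s acquisition value depends only on restart
$i$'s candidates, so the gradient of the sum decomposes into per-restart
gradients. A toy illustration of the pattern (using a stand-in quadratic
objective rather than a real acquisition function):

```python
import torch

def toy_acqf(X):
    # Stand-in "acquisition function": one value per restart (X has shape r x q x d).
    return -(X ** 2).sum(dim=(-2, -1))

r, q, d = 8, 2, 3
X = torch.rand(r, q, d, requires_grad=True)
optimizer = torch.optim.Adam([X], lr=0.05)

for _ in range(100):
    optimizer.zero_grad()
    loss = -toy_acqf(X).sum()  # sum over restarts; each restart keeps its own gradient
    loss.backward()
    optimizer.step()
```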

#### Batched Cross Validation

See the
[Using batch evaluation for fast cross validation](../tutorials/batch_mode_cross_validation)
tutorial for details on using batching for fast cross validation.

docs/design_philosophy.md

Lines changed: 2 additions & 2 deletions
@@ -19,6 +19,7 @@ botorch adheres to the following main design tenets:
 - Extend the applicability of Bayesian Optimization to very large problems by
   harnessing scalable modeling frameworks such as [GPyTorch](https://gpytorch.ai/).

+
 ## Parallelism through batched computation

 Batching (as in batching data, batching computations) is a central component to
@@ -44,8 +45,7 @@ stochastic gradient descent using mini-batch training, botorch itself abstracts
 away from this.

 For more detail on the different batch notions in botorch, take a look at the
-[More on Batching](#more_on_batching) section.
-
+[Batching in botorch](#batching) section.


 ## Optimizing Acquisition Functions

docs/more_on_batching.md

Lines changed: 0 additions & 93 deletions
This file was deleted.

docs/optimization.md

Lines changed: 3 additions & 4 deletions
@@ -20,7 +20,7 @@ The default method used by botorch to optimize acquisition functions is
 [`gen_candidates_scipy()`](../api/gen.html#botorch.gen.gen_candidates_scipy).
 Given a set of starting points (for multiple restarts) and an acquisition
 function, this optimizer makes use of `scipy.optimize.minimize()` for
-optimization, via either the `L-BFGS-B` or `SLSQP` routines.
+optimization, via either the L-BFGS-B or SLSQP routines.
 `gen_candidates_scipy()` automatically handles conversion between `torch` and
 `numpy` types, and utilizes PyTorch's autograd capabilities to compute the
 gradient of the acquisition function.
@@ -30,9 +30,8 @@ gradient of the acquisition function.
 A `torch` optimizer such as `torch.optim.Adam` or `torch.optim.SGD` can also be
 used directly, without the need to perform `numpy` conversion. These first-order
 gradient-based optimizers are particularly useful for the case when the
-acquisition function is stochastic, where algorithms `L-BFGS` or Sequential
-Least-Squares Programming designed for deterministic functions should not be
-applied. The function
+acquisition function is stochastic, where algorithms like L-BFGS or SLSQP that
+are designed for deterministic functions should not be applied. The function
 [`gen_candidates_torch()`](../api/gen.html#botorch.gen.gen_candidates_torch)
 provides an interface for `torch` optimizers and handles bounding.
 See the example notebooks

website/sidebars.json

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,6 @@
     "About": ["introduction", "design_philosophy", "botorch_and_ax"],
     "General": ["getting_started"],
     "Basic Concepts": ["overview", "models", "posteriors", "acquisition", "optimization"],
-    "Advanced Topics": ["more_on_batching", "objectives", "samplers", "mtmo_models"]
+    "Advanced Topics": ["batching", "objectives", "samplers"]
   }
 }
