---
id: batching
title: Batching
---

botorch makes frequent use of "batching", both in the sense of batch acquisition
functions that evaluate multiple candidates jointly and in the sense of parallel
or batch computation (neither of these should be confused with mini-batch
training). Here we explain some of the common patterns you will see in botorch
for exploiting parallelism, including common shapes and decorators for
conveniently handling these shapes.

## Batch Acquisition Functions

botorch supports batch acquisition functions that assign a joint utility to a
set of $q$ design points in the parameter space. These are, for obvious reasons,
referred to as q-Acquisition Functions. For instance, botorch ships with support
for q-EI, q-UCB, and a few others.

As discussed in the
[design philosophy](design_philosophy#batching-batching-batching),
botorch has adopted the convention of referring to batches in the
batch-acquisition sense as "q-batches", and to batches in the torch
batch-evaluation sense as "t-batches".

Internally, q-batch acquisition functions operate on input tensors of shape
$b \times q \times d$, where $b$ is the number of t-batches, $q$ is the number
of design points to be considered concurrently, and $d$ is the dimension of the
parameter space. Their output is a one-dimensional tensor with $b$ elements,
with the $i$-th element corresponding to the $i$-th t-batch. Always requiring an
explicit t-batch dimension makes it much easier and less ambiguous to work with
samples from the posterior in a consistent fashion.
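
For example, the following sketch (toy data and model; hyperparameters are left
untrained for brevity) evaluates q-EI on a $b \times q \times d$ input and gets
back one value per t-batch:

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP

# a toy single-output model on n=8 training points in d=2 dimensions
train_X = torch.rand(8, 2)
train_Y = torch.rand(8, 1)
model = SingleTaskGP(train_X, train_Y)

qEI = qExpectedImprovement(model, best_f=train_Y.max())

# a t-batch of b=3 candidate sets, each with q=4 points in d=2 dimensions
X = torch.rand(3, 4, 2)
print(qEI(X).shape)  # torch.Size([3]) -- one value per t-batch
```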

#### Batch-mode decorators

In order to simplify the user-facing API for evaluating acquisition functions,
botorch implements the
[`@t_batch_mode_transform`](../api/utils.html#botorch.utils.transforms.t_batch_mode_transform)
and
[`@q_batch_mode_transform`](../api/utils.html#botorch.utils.transforms.q_batch_mode_transform)
decorators.

##### `@t_batch_mode_transform`

This decorator simplifies evaluating MC-based acquisition functions with
inputs in non-batch mode. If applied to an instance method with a single `Tensor`
argument, an input tensor to that method without a t-batch dimension (i.e. a
tensor of shape $q \times d$) will automatically be converted to a t-batch of
size 1 (i.e. of `batch_shape` `torch.Size([1])`). This is typically used on the
`forward` method of an `MCAcquisitionFunction`.
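
The core behavior can be sketched as follows. This is an illustrative
re-implementation, not botorch's actual source (see `botorch.utils.transforms`
for the real decorator):

```python
import torch
from functools import wraps


def t_batch_mode_sketch(method):
    """Illustrative stand-in: add a t-batch dimension of size 1 if the
    input tensor has no batch dimension (i.e. is of shape q x d)."""
    @wraps(method)
    def decorated(self, X: torch.Tensor) -> torch.Tensor:
        X = X if X.dim() > 2 else X.unsqueeze(0)  # q x d -> 1 x q x d
        return method(self, X)
    return decorated
```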

##### `@q_batch_mode_transform`

This decorator simplifies evaluating analytic acquisition functions with input
tensors that do not have a q-batch dimension. If applied to an instance method
with a single `Tensor` argument, an input tensor to that method will
automatically receive an additional singleton dimension in the second-to-last
position. This is typically used on the `forward` method of an
`AnalyticAcquisitionFunction`.
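
Analogously, a minimal sketch of the behavior described above (again an
illustration, not botorch's actual source):

```python
import torch
from functools import wraps


def q_batch_mode_sketch(method):
    """Illustrative stand-in: insert a singleton q-batch dimension in the
    second-to-last position before calling the method."""
    @wraps(method)
    def decorated(self, X: torch.Tensor) -> torch.Tensor:
        return method(self, X.unsqueeze(-2))  # ... x d -> ... x 1 x d
    return decorated
```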


## Batching Sample Shapes

botorch evaluates Monte-Carlo acquisition functions using (quasi-) Monte-Carlo
sampling from the posterior at the input features $X$. Hence, on top of the
existing q-batch and t-batch dimensions, we also end up with another batch
dimension corresponding to the MC samples we draw. We use the PyTorch notions of
`sample_shape` and `event_shape`.

`event_shape` is the shape of a single sample drawn from the underlying
distribution. For instance,
- evaluating a single-output model at a $1 \times n \times d$ tensor,
  representing $n$ data points in $d$ dimensions each, yields a posterior with
  `event_shape` $1 \times n \times 1$. Evaluating the same model at a
  $\textit{batch\_shape} \times n \times d$ tensor (representing a t-batch-shape
  of $\textit{batch\_shape}$, with $n$ $d$-dimensional data points in each batch)
  yields a posterior with `event_shape` $\textit{batch\_shape} \times n \times 1$.
- evaluating a multi-output model with $t$ outputs at a $\textit{batch\_shape}
  \times n \times d$ tensor yields a posterior with `event_shape`
  $\textit{batch\_shape} \times n \times t$.
- recall from the previous section that internally, all acquisition functions
  are evaluated using a single t-batch dimension.

`sample_shape` is the shape (possibly multi-dimensional) of the samples drawn
*independently* from the distribution with `event_shape`, resulting in a tensor
of samples of shape `sample_shape` + `event_shape`. For instance,
- drawing a sample of shape $s_1 \times s_2$ from a posterior with `event_shape`
  $b \times n \times t$ results in a tensor of shape
  $s_1 \times s_2 \times b \times n \times t$, where each of the $s_1 s_2$
  tensors of shape $b \times n \times t$ is an independent draw.
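
The following sketch (toy, untrained model) shows these shapes in code, using
the posterior's `rsample` method:

```python
import torch
from botorch.models import SingleTaskGP

model = SingleTaskGP(torch.rand(8, 2), torch.rand(8, 1))

# posterior at a t-batch of b=3 sets of n=5 test points: event_shape 3 x 5 x 1
posterior = model.posterior(torch.rand(3, 5, 2))

# draw s_1 x s_2 = 2 x 4 independent samples from that posterior
samples = posterior.rsample(sample_shape=torch.Size([2, 4]))
print(samples.shape)  # torch.Size([2, 4, 3, 5, 1]) = sample_shape + event_shape
```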


## Batched Evaluation of Models and Acquisition Functions

The GPyTorch models implemented in botorch support t-batched evaluation with
arbitrary t-batch shapes.

##### Non-batched Models

In the simplest case, a model is fit to non-batched training points with shape
$n \times d$.
- *Non-batched evaluation* on a set of test points with shape $m \times d$
  yields a joint posterior over the $m$ points.
- *Batched evaluation* on a set of test points with shape
  $\textit{batch\_shape} \times m \times d$ yields $\textit{batch\_shape}$
  joint posteriors over the $m$ points in each respective batch.
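
A minimal example (toy, untrained model):

```python
import torch
from botorch.models import SingleTaskGP

# model on non-batched training data: n=8 points in d=2 dimensions
model = SingleTaskGP(torch.rand(8, 2), torch.rand(8, 1))

# non-batched evaluation: one joint posterior over m=4 test points
print(model.posterior(torch.rand(4, 2)).mean.shape)     # torch.Size([4, 1])

# batched evaluation: 3 independent joint posteriors over m=4 points each
print(model.posterior(torch.rand(3, 4, 2)).mean.shape)  # torch.Size([3, 4, 1])
```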

##### Batched Models

The GPyTorch models can also be fit on batched training points with shape
$\textit{input\_batch\_shape} \times n \times d$. Here, each batch is modeled
independently (each batch has its own hyperparameters). For example, if the
training points have shape $b_1 \times b_2 \times n \times d$ (two batch
dimensions), the batched GPyTorch model is effectively $b_1 \times b_2$
independent models. More generally, suppose we fit a model to training points
with shape $\textit{input\_batch\_shape} \times n \times d$. Then, the test
points must broadcast to $\textit{input\_batch\_shape}$; see the sketch after
this list.

- *Non-batched evaluation* on a set of test points with shape
  $\textit{input\_batch\_shape}^* \times m \times d$, where each dimension of
  $\textit{input\_batch\_shape}^*$ either matches the corresponding dimension of
  $\textit{input\_batch\_shape}$ or is 1 (to support broadcasting), yields
  $\textit{input\_batch\_shape}$ joint posteriors over the $m$ points, one for
  each batch of the model.

- *Batched evaluation* on a set of test points with shape
  $\textit{new\_batch\_shape} \times \textit{input\_batch\_shape}^* \times m \times d$,
  where $\textit{new\_batch\_shape}$ is the new batch shape for batched evaluation,
  yields $\textit{new\_batch\_shape} \times \textit{input\_batch\_shape}$ joint
  posteriors over the $m$ points in each respective batch (broadcasting as
  necessary over $\textit{input\_batch\_shape}$).
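
The promised sketch (toy, untrained model; shapes per the rules above):

```python
import torch
from botorch.models import SingleTaskGP

# batched model: input_batch_shape = (2,), i.e. 2 independent GPs
model = SingleTaskGP(torch.rand(2, 8, 3), torch.rand(2, 8, 1))

# non-batched evaluation: a single set of m=4 test points is broadcast
# across both model batches
print(model.posterior(torch.rand(1, 4, 3)).mean.shape)     # torch.Size([2, 4, 1])

# batched evaluation with new_batch_shape = (5,)
print(model.posterior(torch.rand(5, 1, 4, 3)).mean.shape)  # torch.Size([5, 2, 4, 1])
```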

#### Batched Multi-Output Models

The [`BatchedMultiOutputGPyTorchModel`](../api/models.html#batchedmultioutputgpytorchmodel)
class implements a fast multi-output model (assuming conditional independence of
the outputs given the input) by batching over the outputs.

##### Training inputs/targets

Given training inputs with shape $\textit{input\_batch\_shape} \times n \times d$
and training outputs with shape $\textit{input\_batch\_shape} \times n \times o$,
the `BatchedMultiOutputGPyTorchModel` permutes the training outputs to make the
output dimension $o$ a batch dimension, so that the augmented training targets
have shape $o \times \textit{input\_batch\_shape} \times n$. The training inputs
(which are required to be the same for all outputs) are expanded to shape
$o \times \textit{input\_batch\_shape} \times n \times d$.
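
For instance, since `SingleTaskGP` subclasses `BatchedMultiOutputGPyTorchModel`,
fitting it to $o = 3$ outputs stores the augmented shapes described above (toy,
untrained model):

```python
import torch
from botorch.models import SingleTaskGP

# n=8 training points in d=2 dimensions with o=3 outputs
model = SingleTaskGP(torch.rand(8, 2), torch.rand(8, 3))
print(model.train_inputs[0].shape)  # torch.Size([3, 8, 2])
print(model.train_targets.shape)    # torch.Size([3, 8])
```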

##### Evaluation

When evaluating test points with shape
$\textit{new\_batch\_shape} \times \textit{input\_batch\_shape} \times m \times d$
via the `posterior` method, the test points are broadcast to the model(s) for
each output. This results in a batched posterior whose mean has shape
$\textit{new\_batch\_shape} \times o \times \textit{input\_batch\_shape} \times m$,
which is then permuted back to the original multi-output shape
$\textit{new\_batch\_shape} \times \textit{input\_batch\_shape} \times m \times o$.
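
A sketch of the resulting shapes (toy, untrained 3-output model as in the
previous snippet):

```python
import torch
from botorch.models import SingleTaskGP

# 3-output model: n=8 training points in d=2 dimensions
model = SingleTaskGP(torch.rand(8, 2), torch.rand(8, 3))

# non-batched evaluation: posterior mean has the multi-output shape m x o
print(model.posterior(torch.rand(4, 2)).mean.shape)     # torch.Size([4, 3])

# batched evaluation with new_batch_shape = (5,)
print(model.posterior(torch.rand(5, 4, 2)).mean.shape)  # torch.Size([5, 4, 3])
```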

#### Batched Optimization of Random Restarts

botorch uses random restarts to optimize an acquisition function from multiple
starting points. To efficiently optimize an acquisition function for a $q$-batch
of candidate points using $r$ random restarts, botorch uses batched evaluation
on an $r \times q \times d$ set of candidate points to independently evaluate
and optimize each random restart in parallel. In order to optimize the $r$
restarts using gradient information, the acquisition values of the $r$ random
restarts are summed before back-propagating; since each value depends only on
its own restart, a single backward pass yields per-restart gradients.
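
A sketch of this summation trick (toy, untrained model; an optimizer such as
`torch.optim` or scipy would normally drive the actual candidate updates):

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP

model = SingleTaskGP(torch.rand(8, 2), torch.rand(8, 1))
qEI = qExpectedImprovement(model, best_f=0.5)

# r=16 random restarts, each a q=4 batch of candidates in d=2 dimensions
X = torch.rand(16, 4, 2, requires_grad=True)

# one acquisition value per restart; each value depends only on its own
# restart, so a single backward pass on the sum yields per-restart gradients
acq_values = qEI(X)  # shape: torch.Size([16])
acq_values.sum().backward()
print(X.grad.shape)  # torch.Size([16, 4, 2])
```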

#### Batched Cross Validation

See the
[Using batch evaluation for fast cross validation](../tutorials/batch_mode_cross_validation)
tutorial for details on using batching for fast cross validation.