
Commit 780dbab

Accelerate the method Tensor::getElementPtr
The getElementPtr method could be written using std::inner_product. Unfortunately, std::inner_product does not optimize very well, and loops that use this method don't get vectorized. Don't change this loop without benchmarking the program on a few compilers.
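Both forms in the message compute the same dot product of the indices with the per-dimension strides. Below is a minimal, self-contained sketch that assumes `sizeIntegral` holds row-major strides. The function names are hypothetical stand-ins, not Glow's API. The sketch only shows that the two forms agree. It does not reproduce the vectorization difference the commit reports, which still needs benchmarking, as the message says. One related detail is certain: std::inner_product returns the type of its `init` argument. The removed code passed a literal `0`, so its running sum was carried in `int` rather than `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Handle::getElementPtr. `strides` plays the role
// of sizeIntegral (assumed to hold row-major strides); `indices` may be
// shorter than `strides`, as in the original method.

// std::inner_product returns the type of its init argument, so a literal 0
// means the running sum is carried in int rather than size_t.
static_assert(std::is_same<decltype(std::inner_product(
                               std::declval<const std::size_t *>(),
                               std::declval<const std::size_t *>(),
                               std::declval<const std::size_t *>(), 0)),
                           int>::value,
              "the init argument decides the accumulator type");

// The form removed by the commit, here with an explicit size_t init value.
std::size_t offsetInnerProduct(const std::vector<std::size_t> &indices,
                               const std::vector<std::size_t> &strides) {
  assert(indices.size() <= strides.size() && "Invalid number of indices");
  return std::inner_product(indices.begin(), indices.end(), strides.begin(),
                            std::size_t(0));
}

// The form introduced by the commit: a plain multiply-accumulate loop.
std::size_t offsetLoop(const std::vector<std::size_t> &indices,
                       const std::vector<std::size_t> &strides) {
  assert(indices.size() <= strides.size() && "Invalid number of indices");
  std::size_t index = 0;
  for (std::size_t i = 0, e = indices.size(); i < e; i++) {
    index += strides[i] * indices[i];
  }
  return index;
}
```

Take a shape {2, 3, 4}, whose row-major strides are {12, 4, 1}. With indices {1, 2, 3}, both functions return 12 + 8 + 3 = 23.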
1 parent 3a05dc8 commit 780dbab


1 file changed, +10 -2 lines changed

include/glow/Base/Tensor.h

Lines changed: 10 additions & 2 deletions
```diff
@@ -272,8 +272,16 @@ template <class ElemTy> class Handle final {
   /// the list of indices may be incomplete.
   size_t getElementPtr(llvm::ArrayRef<size_t> indices) const {
     assert(indices.size() <= numDims && "Invalid number of indices");
-    return std::inner_product(indices.begin(), indices.end(),
-                              std::begin(sizeIntegral), 0);
+    // The loop below can be rewritten using std::inner_product. Unfortunately
+    // std::inner_product does not optimize very well and loops that use this
+    // method don't get vectorized. Don't change this loop without benchmarking
+    // the program on a few compilers.
+    size_t index = 0;
+    for (int i = 0, e = indices.size(); i < e; i++) {
+      index += size_t(sizeIntegral[i]) * size_t(indices[i]);
+    }
+
+    return index;
   }
 
   /// \returns the value of the n'th dimension \p dim, for the raw index \p idx.
```
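The doc comment above getElementPtr notes that the list of indices may be incomplete. This sketch assumes `sizeIntegral` holds row-major strides, where each entry is the product of the trailing dimensions. Under that assumption, a shorter index list simply drops the trailing terms of the sum. The result is the flat offset of the first element of a sub-tensor. Both helper names, `computeStrides` and `flatOffset`, are hypothetical and not part of Glow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major strides for a shape, i.e. what this sketch
// assumes sizeIntegral holds. strides[i] is the product of dims[i+1..n-1].
std::vector<std::size_t> computeStrides(const std::vector<std::size_t> &dims) {
  std::vector<std::size_t> strides(dims.size());
  std::size_t running = 1;
  for (std::size_t i = dims.size(); i-- > 0;) {
    strides[i] = running;
    running *= dims[i];
  }
  return strides;
}

// Same loop shape as the commit: only the supplied indices contribute, so
// omitted trailing indices behave as if they were 0.
std::size_t flatOffset(const std::vector<std::size_t> &indices,
                       const std::vector<std::size_t> &strides) {
  assert(indices.size() <= strides.size() && "Invalid number of indices");
  std::size_t index = 0;
  for (std::size_t i = 0, e = indices.size(); i < e; i++) {
    index += strides[i] * indices[i];
  }
  return index;
}
```

For a shape {2, 3, 4}, the strides come out as {12, 4, 1}. The indices {1, 2, 3} give offset 23. The partial index {1} gives 12, which is the start of the second 3x4 slice.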
