Inline Tensor Convolution Using Template Expansion

Last year I faced a problem which required me to compute products (convolutions) between vectors, matrices, and vectors of matrices (rank-3 tensors).

I wanted to do this in C++, and the best option I could find seemed to be FTensor. Although it probably would have worked for my particular case, FTensor seems to be written in an older style of C++ that predates variadic templates, and I wanted something a bit more generic.

In a quest for further flexibility I began to work on my own tensor library, Weyl.

Weyl is header only and contains just one core class, tensor, whose entire source code is very short (at the moment, 367 lines including comments). Thanks to fancy-pants template expansion, large tensor convolutions are (usually!) compiled into a list of simple add and multiply operations.

Here’s the sort of thing I aim to do.

weyl::tensor<float, 3, 3, 5> a({ ... });
weyl::tensor<float, 2, 5, 4> b({ ... });
weyl::tensor<float, 3, 3, 2, 4> c = weyl::sum<2, 1>(a, b);

This ought to be equivalent to the tensor convolution given by summing over the one shared index.

\[\begin{align*} C_{ijlm} = \sum_{k=1}^5 A_{ijk}B_{lkm} \end{align*}\]

In order to produce this result, the template function sum needs to expand to something equivalent to the following set of nested loops (assuming the values in the resulting tensor are initialized to zero).

for (i = 0; i < 3; ++i)         // i and j are a's contribution to c's rank indexes
  for (j = 0; j < 3; ++j)
    for (l = 0; l < 2; ++l)     // l and m are b's contribution to c's rank indexes
      for (m = 0; m < 4; ++m)
        for (k = 0; k < 5; ++k) // k is the index of the dimension to sum over
          c[i][j][l][m] += a[i][j][k] * b[l][k][m];

This is essentially what weyl::tensor produces, and since the bounds of each loop are known at compile time, the loops are fully unrolled. You can see the sort of assembly it produces on Godbolt. For brevity, the full source is not included in this example.

I even made this fancy logo thing.