What does this PR do?
This PR improves the performance of the backwarp class.
The backwarp class creates backwarping objects. Its constructor calls numpy.meshgrid() and torch.tensor() to create a grid made up of two tensor objects.
According to my profiling script, the equivalent API in the torch module performs far better: torch.meshgrid() gives a 25X speedup on a single NVIDIA 3090 GPU.
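As a rough illustration of the change, the sketch below builds the same grid both ways. The function names, the W and H parameters, and the example sizes are hypothetical and not taken from the PR diff; the torch variant assumes PyTorch 1.10 or newer, where torch.meshgrid() accepts the indexing argument.

```python
import numpy as np
import torch


def make_grid_numpy(W, H, device):
    # Build the index grid on the CPU with NumPy, then copy each array to `device`.
    grid_x, grid_y = np.meshgrid(np.arange(W), np.arange(H))
    return (
        torch.tensor(grid_x, requires_grad=False, device=device),
        torch.tensor(grid_y, requires_grad=False, device=device),
    )


def make_grid_torch(W, H, device):
    # Build the same grid directly on `device`, so no host-to-device copy is needed.
    # indexing="xy" matches the default layout of np.meshgrid().
    grid_x, grid_y = torch.meshgrid(
        torch.arange(W, device=device),
        torch.arange(H, device=device),
        indexing="xy",
    )
    return grid_x, grid_y


# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gx_np, gy_np = make_grid_numpy(4, 3, device)
gx_t, gy_t = make_grid_torch(4, 3, device)
```

Both functions return two tensors of shape (H, W). Without indexing="xy", torch.meshgrid() uses "ij" ordering by default and would produce transposed grids.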
Analysis
I compared the traces of the two meshgrid() implementations, from the torch and numpy modules. The likely reasons for the performance difference are:
- np.meshgrid() generates the numpy objects on the CPU and then copies them to the GPU. The copy incurs an extra call to aten::to, which takes 0.115ms.
- torch.meshgrid() takes 0.037ms, whereas numpy.meshgrid() takes 0.099ms, a difference of about 3X.
- The total time for generating the grid with torch.meshgrid() is 0.113ms. The total time for generating the grid with np.meshgrid() is 0.370ms, which is about a 3X speedup.
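A trace like the one analysed above could be collected roughly as follows. This is a minimal sketch using torch.profiler, not the profiling script from this PR; the grid size and helper names are hypothetical, and the timings it reports will differ by machine.

```python
import numpy as np
import torch
from torch.profiler import ProfilerActivity, profile

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
activities = [ProfilerActivity.CPU]
if device.type == "cuda":
    activities.append(ProfilerActivity.CUDA)

W, H = 640, 352  # hypothetical frame size, not taken from the PR


def numpy_grid():
    # NumPy path: build on the CPU, then copy to `device` via torch.tensor().
    gx, gy = np.meshgrid(np.arange(W), np.arange(H))
    return torch.tensor(gx, device=device), torch.tensor(gy, device=device)


def torch_grid():
    # Torch path: build directly on `device`.
    return torch.meshgrid(
        torch.arange(W, device=device),
        torch.arange(H, device=device),
        indexing="xy",
    )


def profile_once(fn):
    fn()  # warm-up call so one-time initialisation is not measured
    with profile(activities=activities) as prof:
        fn()
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work to finish
    return prof.key_averages()


numpy_events = profile_once(numpy_grid)
torch_events = profile_once(torch_grid)
numpy_table = numpy_events.table(sort_by="cpu_time_total", row_limit=10)
torch_table = torch_events.table(sort_by="cpu_time_total", row_limit=10)
print(numpy_table)
print(torch_table)
```

On a GPU machine, the table for the NumPy path is where a copy operator such as aten::to would be expected to show up. The np.meshgrid() call itself runs outside PyTorch, so it does not appear as an operator in this trace and has to be timed separately.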