Maps a multi-dimensional local index and thread layout into a reshaped global index.
The user specifies the dimensions of the logical index (local indices plus thread IDs) and the dimensions of the target tensor; the logical index is then mapped to the corresponding position in the multi-dimensional target tensor.
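As a rough illustration of this kind of mapping, the sketch below flattens a logical index in row-major order and then unravels it into the target dimensions. The function name `remap_index`, its parameters, and the row-major ordering are assumptions made for this example only; the actual mapping may order or combine the dimensions differently.

```python
from math import prod


def remap_index(logical_index, logical_shape, target_shape):
    """Hypothetical helper: map a logical index onto target_shape.

    Assumes row-major (last dimension fastest) ordering on both sides.
    """
    if len(logical_index) != len(logical_shape):
        raise ValueError("logical_index and logical_shape must have the same rank")
    if prod(logical_shape) != prod(target_shape):
        raise ValueError("logical and target shapes must hold the same number of elements")

    # Flatten the logical index into a single linear offset.
    flat = 0
    for idx, extent in zip(logical_index, logical_shape):
        if not 0 <= idx < extent:
            raise IndexError("logical index out of range")
        flat = flat * extent + idx

    # Unravel the linear offset into target coordinates, last dimension first.
    coords = []
    for extent in reversed(target_shape):
        coords.append(flat % extent)
        flat //= extent
    return tuple(reversed(coords))
```

For instance, with a logical shape of `(4, 2)` (read here as 4 threads with 2 local elements each) and a target shape of `(2, 4)`, the logical index `(3, 1)` has linear offset 7 and lands at `(1, 3)` in the target tensor.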
We have: