Tensor Slicing and Optimization for Multicore NPUs (Full Report)