Why does my code run in a different order with the GCC and MSVC compilers?

I have a CUDA program that uses two CUDA streams, each driven by its own CPU thread. I don't want the execution of the two CUDA streams to overlap; I only want to hide the data copies.
I get this behavior with MSVC on Windows, but with GCC on Linux it doesn't always work the way I want.
Below are the timelines of the two builds as captured with Nsight Systems. I've added stream synchronization, but it's still not right. Can someone tell me why the behavior differs? (The first image is Linux/GCC.)
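Here is a minimal CUDA sketch of one way to express the pattern described above. Two host threads each own a non-blocking stream and a pinned buffer. Their copies are free to overlap the other stream's kernel, but every kernel launch first waits on a shared event that is recorded after the previously issued kernel, so no two kernels run at the same time. All names (`scaleKernel`, `worker`, `runPipeline`, `launchMutex`, `lastKernelDone`) are hypothetical and not from the original program. Treating "not overlapping" as "kernels chained through one event" is also an assumption. In this sketch, the order in which the two threads' kernels run still depends on which thread acquires the mutex first, and the OS thread scheduler decides that.

```cuda
// Minimal sketch (hypothetical names): two host threads, one stream each.
// Copies may overlap with the other stream's kernel; kernels are chained
// through one shared event so no two kernels run at the same time.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <thread>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the real kernel: doubles every element.
__global__ void scaleKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

static std::mutex launchMutex;      // serializes "wait + launch + record"
static cudaEvent_t lastKernelDone;  // completion of the last kernel issued

// Each host thread runs this loop on its own stream and pinned buffer.
static void worker(float* hostBuf, int n, int iterations) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaStream_t stream;
    float* devBuf = nullptr;
    CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
    CHECK(cudaMalloc(&devBuf, bytes));

    for (int it = 0; it < iterations; ++it) {
        // Not gated by the event, so it can overlap the other stream's kernel.
        CHECK(cudaMemcpyAsync(devBuf, hostBuf, bytes,
                              cudaMemcpyHostToDevice, stream));
        {
            std::lock_guard<std::mutex> lock(launchMutex);
            // Wait for the most recently issued kernel, from either stream.
            CHECK(cudaStreamWaitEvent(stream, lastKernelDone, 0));
            scaleKernel<<<(n + 255) / 256, 256, 0, stream>>>(devBuf, n);
            CHECK(cudaGetLastError());
            CHECK(cudaEventRecord(lastKernelDone, stream));
        }
        CHECK(cudaMemcpyAsync(hostBuf, devBuf, bytes,
                              cudaMemcpyDeviceToHost, stream));
    }
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaFree(devBuf));
    CHECK(cudaStreamDestroy(stream));
}

// Returns true if every element of both buffers equals 2^iterations.
bool runPipeline(int n, int iterations) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    CHECK(cudaEventCreateWithFlags(&lastKernelDone, cudaEventDisableTiming));
    CHECK(cudaEventRecord(lastKernelDone, 0));  // start in a completed state

    // Pinned host memory so cudaMemcpyAsync can overlap with kernels.
    float* bufA = nullptr;
    float* bufB = nullptr;
    CHECK(cudaMallocHost(&bufA, bytes));
    CHECK(cudaMallocHost(&bufB, bytes));
    for (int i = 0; i < n; ++i) bufA[i] = bufB[i] = 1.0f;

    std::thread tA(worker, bufA, n, iterations);
    std::thread tB(worker, bufB, n, iterations);
    tA.join();
    tB.join();

    float expected = 1.0f;
    for (int k = 0; k < iterations; ++k) expected *= 2.0f;
    bool ok = true;
    for (int i = 0; i < n; ++i)
        ok = ok && bufA[i] == expected && bufB[i] == expected;

    CHECK(cudaFreeHost(bufA));
    CHECK(cudaFreeHost(bufB));
    CHECK(cudaEventDestroy(lastKernelDone));
    return ok;
}

int main() {
    bool ok = runPipeline(1 << 20, 3);
    std::puts(ok ? "results OK" : "results WRONG");
    return ok ? 0 : 1;
}
```

Here the kernel-to-kernel ordering is enforced on the GPU by `cudaStreamWaitEvent`, so no `cudaStreamSynchronize` is needed between iterations. The host threads only contend for the mutex while issuing the wait, the launch and the record.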

Hi @Simple_Liu,