There seem to be quite a few differences between this implementation and the one described in the paper, namely the weight initializers for the DeConv, Conv, and BatchNorm layers, how the noise vector is sampled, and whether the 3D DeConv layer uses a bias term. May I ask the reason for these differences?