MKL's compact BLAS routines can be faster than the standard ones for batches of small matrices, so we may want to support compact GEMM (`mkl_?gemm_compact`) in the future. Reference: https://software.intel.com/en-us/mkl-developer-reference-c-mkl-gemm-compact
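
For context, here is a rough plain-C sketch of the idea behind a compact layout, so it is clear where the speedup would come from. It does not call MKL and does not claim to reproduce MKL's actual memory format: the pack width `PACK` and the helper names `pack_compact` and `gemm_compact_sketch` are made up for illustration. The real entry points (`mkl_?gepack_compact`, `mkl_?gemm_compact`, and friends) are described in the linked reference. The point is only that matrices are interleaved so the innermost loop runs over several matrices with unit stride.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pack width: how many matrices are interleaved together.
   MKL picks its own width per instruction set; 4 is for illustration only. */
enum { PACK = 4 };

/* Interleave nm column-major rows x cols matrices (leading dimension = rows).
   Element (i, j) of matrix q in pack p is stored at
   ap[((p * cols + j) * rows + i) * PACK + q].
   Assumes nm is a multiple of PACK to keep the sketch short. */
void pack_compact(size_t rows, size_t cols,
                  const float *const *a, float *ap, size_t nm)
{
    assert(nm % PACK == 0);
    for (size_t mat = 0; mat < nm; ++mat) {
        size_t p = mat / PACK, q = mat % PACK;
        for (size_t j = 0; j < cols; ++j)
            for (size_t i = 0; i < rows; ++i)
                ap[((p * cols + j) * rows + i) * PACK + q] = a[mat][j * rows + i];
    }
}

/* C = A * B for every matrix in the batch (A is m x k, B is k x n),
   all three operands in the interleaved layout above. The loops over q
   touch PACK consecutive floats, which is what allows vectorizing
   across matrices instead of within one small matrix. */
void gemm_compact_sketch(size_t m, size_t n, size_t k,
                         const float *ap, const float *bp, float *cp,
                         size_t nm)
{
    assert(nm % PACK == 0);
    for (size_t p = 0; p < nm / PACK; ++p) {
        const float *A = ap + p * m * k * PACK;
        const float *B = bp + p * k * n * PACK;
        float *C = cp + p * m * n * PACK;
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i) {
                float acc[PACK] = {0};
                for (size_t l = 0; l < k; ++l)
                    for (size_t q = 0; q < PACK; ++q)
                        acc[q] += A[(l * m + i) * PACK + q]
                                * B[(j * k + l) * PACK + q];
                for (size_t q = 0; q < PACK; ++q)
                    C[(j * m + i) * PACK + q] = acc[q];
            }
    }
}
```

If we do add support, the pack and unpack steps would presumably have to be amortized over several operations on the same batch, since converting to and from the compact layout is itself a full pass over the data.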