The JIT compiler was slower than in the MSVC-compiled binary. Up to 0.1% speedup on rx/wow on Linux.
Also reduced the code size of the Blake2b SSE4.1 implementation to avoid code cache pollution.