GCC 12 enables autovectorization at -O2

The compiler geek inside me is excited about this change in GCC.


It takes a ton of work to pull off something like this and make it the default. ARM chips with NEON SIMD benefit, and so do other modern chips with SIMD units. I will do some benchmarking with GCC 12 and see how it compares with GCC 11 at -O2.
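For anyone who wants to try this at home, here is a minimal sketch of the kind of loop I would start with. The function name, file name, and array size are my own picks for illustration, and whether the loop actually gets vectorized depends on the target and the exact compiler build. I am using a fixed trip count and `restrict` on the assumption that the conservative cost model used at -O2 prefers loops that need no runtime alias checks or scalar leftover iterations.

```c
#include <stddef.h>

/* Hypothetical fixed size, chosen so the trip count is known at compile time. */
#define N 1024

/*
 * y[i] += a * x[i] for all N elements.
 * `restrict` promises the compiler that x and y do not overlap,
 * so it does not need a runtime alias check before vectorizing.
 *
 * Possible way to compare (compiler binary names are assumptions):
 *   gcc-11 -O2 -c saxpy.c
 *   gcc-12 -O2 -fopt-info-vec -c saxpy.c
 *   gcc-12 -O2 -fno-tree-vectorize -c saxpy.c
 * -fopt-info-vec asks GCC to report the loops it vectorized.
 */
void saxpy(float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < N; i++)
        y[i] += a * x[i];
}
```

Building the same file with -fno-tree-vectorize gives a scalar baseline from the same compiler, which should make the comparison fairer than only comparing GCC 11 against GCC 12.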

