In my project selection blog I mentioned that I wanted to look at vectorization options the compiler can do for us if we make some small changes to the code. However before looking into the code I went to take a look at the makefile to see if there are any vectorization flags currently turned on.
This is what the compiler options line looks like on both betty and my own x86 machine:
options= $(CFLAGS) -DNOFUNCDEF -DUTIME_H $(LDFLAGS) $(CFLAGS) $(CPPFLAGS) -DDIRENT=1 -DUSERMEM=800000 -DREGISTERS=3What I found out is that CFLAGS, CPPFLAGS and LDFLAGS are all undefined in this case. What I also found strange is that these options define a set memory for the program to use and a number of registers the program can use.
The most glaring lack here is that there is not even a base level of optimization applied, so I set out to test that first. To verify that the code has not been changed in a way that it now has a different meaning, we will be using md5 checksum to see if our compressed and uncompressed files remain the same. After slowly ramping up the optimization level, it turns out that the program works on O3, without it causing any problems.
During benchmarks I first encountered a couple of runs where it looked like uncompressing the files was slower than before applying the optimization, however after running some more benchmarks, it seems to me that it was probably background noise from other processes running on my system.
After the -O3 optimization level had been applied we see the following results in the benchmarks:
aarch64 testserver Betty:
my own x86_64 Fedora 23 installation:
If we compare that with last week's results we find that in all of the cases the compression was faster with the optimization turned on. However oddly enough on my local machine the uncompress was notably slower! This could also be explained by the amount of possible disturbances in the background processes of my machine since I am running a Virtual machine to test with. However I have tested extensively and have yet to see the optimized version of uncompress not do worse.
To go about implementing this optimization in the actual program I will have to add it to the Makefile recipe, for the testing I have just been editing the Makefile itself. I have also been looking for a reason as to why compression on the aarch64 server betty is so terribly slow, while uncompress is very fast, but I have yet to find the answer.