Sunday, April 10, 2016

Project Phase 2

After last week, where our project selection ended with a somewhat hasty switch to another project, I am happy to finally dig into this package a little. Sadly, I have been sick for half of this week (and still am), so progress is slower than I had hoped.

In my project selection post I mentioned that I wanted to look at the vectorization the compiler can do for us if we make some small changes to the code. Before looking into the code itself, however, I took a look at the Makefile to see whether any vectorization flags are currently turned on.

This is what the compiler options line looks like on both betty and my own x86 machine:
options= $(CFLAGS) -DNOFUNCDEF -DUTIME_H $(LDFLAGS) $(CFLAGS) $(CPPFLAGS) -DDIRENT=1 -DUSERMEM=800000 -DREGISTERS=3
It turns out that CFLAGS, CPPFLAGS and LDFLAGS are all undefined in this case, so they expand to nothing. What I also found strange is that these options hard-code a fixed amount of memory for the program to use (-DUSERMEM) and a number of registers it may use (-DREGISTERS).
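A quick way to double-check this (a sketch, assuming GNU make is being used) is to do a dry run and to grep the Makefile for the flag variables:

    # Print the commands make would run, without executing them,
    # to see what the options line actually expands to.
    make -n | head

    # Check whether the flag variables are defined anywhere.
    grep -n -E 'CFLAGS|CPPFLAGS|LDFLAGS' Makefile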

The most glaring omission here is that not even a base level of optimization is applied, so I set out to test that first. To verify that the optimized build does not change the program's behaviour, we use md5 checksums to confirm that our compressed and uncompressed files remain identical. After slowly ramping up the optimization level, it turns out that the program works at -O3 without causing any problems.
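The verification step looks roughly like this (a sketch: testfile.txt is a placeholder name, and ./compress is assumed to be the freshly built binary):

    # Record the checksum, round-trip the file through the
    # optimized build, and verify the result is bit-identical.
    md5sum testfile.txt > before.md5
    ./compress -f testfile.txt        # produces testfile.txt.Z
    ./compress -d testfile.txt.Z      # restores testfile.txt
    md5sum -c before.md5              # should print: testfile.txt: OK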

During benchmarking I first encountered a couple of runs where uncompressing the files looked slower than before applying the optimization. After running some more benchmarks, however, it seems that this was probably noise from other processes running on my system.
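To smooth out that noise I simply repeat the measurement several times in a row, along these lines (bigfile.Z is a placeholder for one of the test files):

    # Time five decompression runs; a fresh copy is made each
    # iteration since compress -d consumes the .Z input file.
    for i in 1 2 3 4 5; do
        cp bigfile.Z run.Z
        time ./compress -d run.Z
        rm -f run
    done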

With the -O3 optimization level applied, the benchmarks show the following results:
aarch64 test server Betty:

compress:
2545934560 (~2.4 GB) text file:
real:  64.467s
user:  63.280s
sys:    1.180s

1073741824 (1 GB) integer stream:
real:  9.413s
user:  8.980s
sys:   0.430s

uncompress:
2545934560 (~2.4 GB) text file:
real:  10.882s
user:   9.510s
sys:    1.370s

1073741824 (1 GB) integer stream:
real:  4.734s
user:  3.920s
sys:   0.700s

my own x86_64 Fedora 23 installation:

compress:
2545934560 (~2.4 GB) text file:
real:  34.812s
user:  18.821s
sys:    4.715s

1073741824 (1 GB) integer stream:
real:  13.274s
user:   3.789s
sys:    1.651s

uncompress:
2545934560 (~2.4 GB) text file:
real:  45.669s
user:   5.779s
sys:    2.339s

1073741824 (1 GB) integer stream:
real:  17.713s
user:   2.814s
sys:    1.107s


Comparing these with last week's results, compression was faster in every case with the optimization turned on. Oddly enough, however, on my local machine uncompress was notably slower! This could be explained by interference from background processes, since I am testing inside a virtual machine, but I have tested extensively and have yet to see the optimized version of uncompress not do worse.

To implement this optimization in the actual program I will have to add it to the Makefile recipe properly; for testing I have just been editing the Makefile itself. I have also been looking for a reason why compression on the aarch64 server betty is so terribly slow while uncompress is very fast, but I have yet to find the answer.
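In the meantime, since the options line already expands $(CFLAGS), the flag can also be passed on the make command line without editing the file at all. This is a sketch, assuming the build recipes actually use that options line:

    # Pass the optimization flag through the existing CFLAGS hook
    # rather than hard-coding it in the Makefile.
    make clean
    make CFLAGS="-O3"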
