So in my previous blog I found an optimization, but I was not too happy about it because it seemed like such a simple step to take. I guess that is sort of the point with compiler optimizations though. The more efficiently you can get things done, the better. The compiler assists with this very handily.
I tried to get the compiler to tell me some possible optimizations as well, using -fopt-info-vec-missed to see what vectorization possibilities it missed and why. The compiler came up with about 500 lines of missed vectorization, so I wasn't sure where to start with that. I then used -fopt-info-vec to see that some things did actually get vectorized, but that got me no further either.
I tried profiling the program using gprof, it resulted in me finding out that the program spends its entire time in a single large function, whether compressing or uncompressing. This brought me no closer to finding any other optimization.
I have fiddled with the DUSERMEM and DREGISTERS to see if it makes a difference, but I think there is a set maximum to these values somewhere, although I do not know why. To me they are sort of magic numbers right now.
All in all the search and constant waiting for benchmarks was frustrating to me, especially because I found next to nothing besides the obvious compiler flag. If I was to attempt a project like this again. I would have an extra phase and my phase 2 would be write an automatic benchmarking suite where you press a button and a reliable benchmark rolls out the other end.
I also wouldn't use my own laptop to benchmark on, because for some reason the Virtual Machine gives very varying performances and I can never tell if it is an accurate representation of change or not. Luckily betty had my sanity covered there.
Geen opmerkingen:
Een reactie posten