Tuesday, March 29, 2016

Project selection part 2

Alright, after the disappointing failure I experienced with slimdata, I am ready for a new try! I tried to install several of the leftover projects (most of them have been selected already). Some had fairly complicated build instructions for their latest version, and for some of the packages listed I was not sure they were even still around. Notably, the latest version of coreutils has a bunch of prerequisites that would not install correctly even though I followed the readme's instructions to the letter. The sha2 and sha3 listings were also ambiguous: it was hard to find the actual source code they referred to. After exploring these packages I settled on ncompress.

ncompress is a compression utility that uses the Lempel-Ziv-Welch (LZW) approach, a well-known compression algorithm. The basic idea is that it builds a lookup table of common sequences in the file, reducing the number of bits required to represent each occurrence of those sequences. This means there has to be some sort of repetition in the file for the compression to gain anything, so we cannot use a randomly generated file. Luckily, I wrote a piece of code in my Project selection 1 blog that writes a repeating int pattern either directly to a file or as text to a file. We can use the output of that program as input for the ncompress program.
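To make that idea concrete, below is a minimal sketch of LZW in C. To be clear, this is not the actual ncompress implementation (that one packs variable-width codes into a bit stream and can reset the table when compression stops paying off); this sketch just prints the codes as decimal numbers so you can watch the lookup table grow as sequences repeat.

/* A minimal sketch of LZW compression, assuming a 12-bit code limit.
   Codes 0-255 stand for the single bytes themselves; every new code
   represents a previously seen code plus one extra byte. */
#include <stdio.h>

#define MAX_CODES 4096            /* cap the table at 12-bit codes for this sketch */

static int prefix[MAX_CODES];     /* code of the prefix sequence */
static int append[MAX_CODES];     /* byte appended to that prefix */
static int next_code = 256;       /* first free code after the single bytes */

/* Return the code for (prefix w + byte c), or -1 if we have not seen it yet. */
static int find(int w, int c)
{
    for (int i = 256; i < next_code; i++)
        if (prefix[i] == w && append[i] == c)
            return i;
    return -1;
}

int main(void)
{
    int w = getchar();            /* current match, starts as the first byte */
    int c;

    if (w == EOF)
        return 0;

    while ((c = getchar()) != EOF) {
        int code = find(w, c);
        if (code != -1) {
            w = code;             /* known sequence: try to extend the match */
        } else {
            printf("%d ", w);     /* emit the longest known match... */
            if (next_code < MAX_CODES) {
                prefix[next_code] = w;   /* ...and remember the new, longer one */
                append[next_code] = c;
                next_code++;
            }
            w = c;                /* restart matching at the current byte */
        }
    }
    printf("%d\n", w);            /* flush the final match */
    return 0;
}

Running it as ./lzwsketch < somefile shows long repeated runs collapsing into a handful of codes, which is exactly why a repeating int pattern compresses so well.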

The ncompress package is fairly old, with its last release being in 2006. As such I think there should be an optimization opportunity in a multi-processing or SIMD approach, which is what I will focus my attention on. This is also because I am most interested in these types of optimizations.

ncompress is easy to install; just run the git command:
git clone https://github.com/vapier/ncompress.git
This clones the repo into a directory called ncompress; you can then navigate into it and make and install the application.
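The exact make targets are an assumption on my part, since they may differ between versions of the repo, but the build and install boil down to something like this (the install step needs root privileges):

cd ncompress
make
sudo make install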

I faced some difficulties copying my files to our aarch64 test servers aarchie and betty: for some reason my scp and sftp transfers kept getting closed by the remote host at a certain percentage. After more than 20 tries, never getting past 20% of a 1 GB file, I gave up and looked for an alternative. It turns out rsync is a very flexible command that lets you sync directories across servers. To reach aarchie and betty you do have to pass the ssh port number through the -e option, but some googling on the rsync command gets you there very quickly.
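As an illustration, the command ends up looking something like the line below; the port number, username, hostname and paths are placeholders, since the real values depend on how the servers are set up:

rsync -av --progress -e "ssh -p 2222" bigtestfile user@aarchie.example.com:~/testfiles/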

I then set about obtaining benchmarks for the package, which you can view below:

Some notes about the benchmarks:
- They were taken when cache was "hot" (I disregarded first run).
- They are averages of 3 runs.
- All runs used copies of the same files.
- I used time to measure the time spent by the entire command (see the example invocation after this list).
- There was a minimal amount of interference caused by other processes running at the same time, so keep that in mind. I did check that no large CPU-consuming applications were running.
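As an example, a single measured run looks like this (testfile is a placeholder name here); compress replaces the input file with a .Z file, and the -v flag makes it report the compression percentage:

time compress -v testfile
time uncompress -v testfile.Z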

aarch64 test server Betty:

compress:
2545934560 bytes (~2.4 GB) textfile:
real: 83.652s
user: 82.590s
sys: 1.060s
resulting size: 49091023 (~1.93%)

1073741824 bytes (1 GB) integerstream:
real: 15.811s
user: 15.323s
sys: 0.480s
resulting size: 21505241 (~2.00%)

uncompress:
2545934560 bytes (~2.4 GB) textfile:
real: 14.104s
user: 12.530s
sys: 1.570s

1073741824 bytes (1 GB) integerstream:
real: 5.920s
user: 5.403s
sys: 0.510s

my own x86_64 Fedora 23 installation:

compress:
2545934560 bytes (~2.4 GB) textfile:
real: 39.341s
user: 28.147s
sys: 3.484s
resulting size: 49091023 (~1.93%)

1073741824 bytes (1 GB) integerstream:
real: 15.626s
user: 7.479s
sys: 2.328s
resulting size: 21505241 (~2.00%)

uncompress:
2545934560 bytes (~2.4 GB) textfile:
real: 38.687s
user: 6.326s
sys: 2.160s

1073741824 bytes (1 GB) integerstream:
real: 14.575s
user: 2.500s
sys: 0.856s

Something interesting to look into immediately is that on Betty uncompress performed much better relative to compress than on my own Fedora installation, where it ran at pretty much the same speed as the compression. I would like to do some more benchmarking later on real written text files and maybe some installers, but as a basic benchmark I think this does alright. I was under some time pressure because I had spent too long trying to benchmark the previous package before having to select a new one.
