I knew it was not going to be easy to find documentation, because there was only one developer and he last worked on it 3 years ago. Undeterred, I installed the package on both aarchie and my own Fedora installation. The package can be found here: https://sourceforge.net/projects/slimdata/
The installation was very straightforward and just required a tar -xvf, a ./configure with some parameters found in the included README, and a make install. I could now slim and unslim files. I tried it on a few text files and it worked. However, the program clearly states that it is made for large groups of integers with repetitive patterns, so I set out to write a program that would generate such a file for me.
I have tried using fwrite, to write the binary values of the integers to a file. I have tried fprintf, to write the text values of the integers to a file. I have tried random repetition amounts and swapping around some integers to create a more or less diverse file. I have tried smaller and larger files. The generated text files would not even slim: the attempt literally broke the original file into an unrecoverable state. The integer files would slim, but the maximum compression gain was 1.5% of the file, or about 20MB on a 1GB file.
After much experimentation I finished with this:
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
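int main(void) {
    srand((unsigned) time(NULL));
    // number of 32-bit ints in 1 GB
    int64_t size = 1073741824 / 4;
    int repetition = 20000;
    int32_t *data = malloc(sizeof(int32_t) * size);
    if (data == NULL) {
        fprintf(stderr, "could not allocate the buffer\n");
        return 1;
    }
    int64_t i = 0;
    while (i < size) {
        int32_t number = rand();
        // repeat the same number up to `repetition` times
        for (int j = 0; j < repetition && i < size; j++) {
            data[i] = number;
            i++;
        }
    }
    // "wb": write the raw bytes of the ints
    FILE *fp = fopen("data", "wb");
    if (fp == NULL) {
        fprintf(stderr, "could not open the output file\n");
        free(data);
        return 1;
    }
    fwrite(data, sizeof(int32_t), size, fp);
    // text variant: writes each int as decimal digits, with no separator
    //for (int64_t x = 0; x < size; x++) {
    //    fprintf(fp, "%d", data[x]);
    //}
    fclose(fp);
    free(data);
    printf("success\n");
    return 0;
}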
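For the variation with random repetition amounts, a minimal sketch could look like the function below. It is not the exact code I ran: the name fill_random_runs and the max_run parameter are made up for illustration, and the run lengths are only roughly uniform because of the modulo on rand().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill data[0..size) with runs of one repeated random value.
 * Each run length is picked from 1..max_run (hypothetical parameter,
 * effectively capped by RAND_MAX). Returns the number of runs written. */
size_t fill_random_runs(int32_t *data, size_t size, size_t max_run)
{
    assert(data != NULL || size == 0);
    if (max_run == 0) {
        max_run = 1; /* avoid a modulo by zero */
    }
    size_t i = 0;
    size_t runs = 0;
    while (i < size) {
        int32_t value = (int32_t) rand();
        size_t run = 1 + (size_t) rand() % max_run;
        if (run > size - i) {
            run = size - i; /* the last run is cut off at the end of the buffer */
        }
        for (size_t j = 0; j < run; j++) {
            data[i++] = value;
        }
        runs++;
    }
    return runs;
}
```

Calling fill_random_runs(data, size, 20000) in place of the while loop above would give runs of varying length instead of a fixed 20000.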
I think I can safely assume that a gain of 1.5% is not what is intended for a file of this or any size, as gzip would save about 50% of the file size in about the same amount of time.
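To make the comparison concrete, the gain can be written as the percentage of the original size that was saved. A small sketch, where percent_saved is a name I made up for illustration and the byte counts are whatever the original and compressed files turn out to be:

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of the original size that compression saved.
 * Returns 0 for an empty original to avoid dividing by zero. */
double percent_saved(int64_t original_bytes, int64_t compressed_bytes)
{
    assert(original_bytes >= 0 && compressed_bytes >= 0);
    if (original_bytes == 0) {
        return 0.0;
    }
    return 100.0 * (double) (original_bytes - compressed_bytes)
                 / (double) original_bytes;
}
```

For example, a file that goes from 1000 bytes to 985 bytes has a gain of 1.5%, while one that goes from 1000 to 500 bytes has a gain of 50%.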
It saddens me to say that I will have to admit defeat here and pick a different project. I will post an update on that soon.