Tuesday, 29 March 2016

Compiler optimizations part 2

Last week we each chose two compiler optimizations. I explained my first in my previous blog post; my second compiler optimization flag is -free. On some 64-bit architectures, loading a word into the lower half of a register implicitly zeroes the upper half of that register. This means that if you need to lengthen an unsigned integer from 32 to 64 bits, you can do so just by loading the word.
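To make the lengthening concrete, here is a minimal sketch in C of the two kinds of widening. The function names widen_unsigned and widen_signed are made up for this illustration. The unsigned case is the one that a zeroing load can cover for free; the signed case needs the sign bit copied into the upper half.

```c
#include <stdint.h>

/* Unsigned widening: the upper 32 bits of the result are zero
   (zero extension). */
uint64_t widen_unsigned(uint32_t x) {
    return (uint64_t)x;
}

/* Signed widening: the upper 32 bits are copies of the sign bit
   (sign extension), so zeroing the upper half is not enough. */
int64_t widen_signed(int32_t x) {
    return (int64_t)x;
}
```

Calling widen_unsigned(0xFFFFFFFF) keeps the upper half at zero, while widen_signed(-1) fills the upper half with ones.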

According to the GCC manual, -free is a flag that attempts to remove redundant extension instructions. This is also one of the optimizations that specifically works well on AArch64, which is why I wanted to take a closer look at it. It is enabled at -O2, -O3 and -Os on Alpha, AArch64 and x86.

Let me start by saying that even though I've tried multiple programs that lengthen int16s and int32s to int64s, by either multiplication or addition, I have yet to find a case where -free kicks in. So far there has been no difference in the generated assembly. I even tried converting 32-bit floats to 64-bit integers to see if that would produce a lengthening instruction, but it never did. gcc did do something else interesting, though, which I would like to highlight for a second. This was the program I used to try to generate a lengthening instruction.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>

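/* Sums both arrays element by element into a 64-bit total. */
uint64_t sumArray(uint32_t *arr, uint32_t *arr2, int length){
    uint64_t result = 0;
    for(int i=0; i<length; i++){
        /* arr[i] + arr2[i] is added in 32 bits (and may wrap);
           the sum is then widened to 64 bits for the accumulation. */
        result += arr[i] + arr2[i];
    }
    return result;
}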

int main(){
    srand(time(NULL));
    const int length=256;
    uint32_t arr[length];
    uint32_t arr2[length];
    for(int i=0; i<length; i++){
        arr[i] = rand();
        arr2[i] = rand();
    }
    /* %d expects an int; print the 64-bit result with a matching format. */
    printf("%llu\n", (unsigned long long)sumArray(arr, arr2, length));
    return 0;
}
The funny thing is that this program did not produce a dedicated lengthening instruction. Instead it contained the following instruction:
mov %eax, %eax
On the surface this looks very strange, because we are moving a 32-bit word into the same register. Looking slightly deeper, though, on x86-64 a mov into a 32-bit register also zeroes the upper 32 bits of the full 64-bit register, so it implicitly lengthens the 32-bit unsigned integer to a 64-bit unsigned integer.
I've tried several versions of this program that multiply or add 32-bit and 64-bit unsigned integers, but I have yet to generate a single explicit lengthening instruction. Sadly, this means that -free does not do anything for the programs that I have written.
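As a small sketch of why a zero extension is needed at that point at all: in sumArray the two uint32_t values are added in 32 bits first, and only the (possibly wrapped) sum is widened. The two helper functions below are hypothetical and only illustrate the difference, assuming the usual case where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Same expression shape as in sumArray: add in 32 bits, then widen.
   Assumes unsigned int is 32 bits, so the sum wraps modulo 2^32. */
uint64_t add_then_widen(uint32_t a, uint32_t b) {
    return a + b;
}

/* Widen first, then add in 64 bits: no wrap-around at 2^32. */
uint64_t widen_then_add(uint32_t a, uint32_t b) {
    return (uint64_t)a + b;
}
```

With a = 0xFFFFFFFF and b = 1, the first version yields 0 and the second yields 0x100000000. In the first version the compiler has to zero-extend the 32-bit sum, which is what mov %eax, %eax does.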

What I can tell you is that if a mov instruction were followed by a lengthening instruction for the same unsigned 32-bit int, that would be one of the cases where -free kicks in and removes the lengthening instruction. Sadly, I was not able to generate one of those cases.
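One more thing that could be tried is a signed variant, since widening a signed value needs a real sign extension rather than just zeroed upper bits. The function below is a hypothetical test case, not something from my experiments, and I have not verified that -free changes its output on any target. Because the pass is already on at -O2, one way to isolate it is to compare -O2 against -O2 -fno-ree; the file names in the comment are placeholders.

```c
#include <stdint.h>

/* Hypothetical candidate for the -free pass (untested assumption).
   To compare the generated assembly with and without the pass:
     gcc -O2 -S candidate.c -o with_ree.s
     gcc -O2 -fno-ree -S candidate.c -o without_ree.s
     diff with_ree.s without_ree.s                                  */
int64_t sum_signed(const int32_t *arr, int length) {
    int64_t result = 0;
    for (int i = 0; i < length; i++) {
        result += arr[i];  /* each int32_t is sign-extended to 64 bits */
    }
    return result;
}
```

Whether the two assembly files differ depends on the target and the gcc version, so an empty diff here would not prove much either way.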
