SPO600 Berwout: SPO600 Lab 5

For this lab we will be looking at 2 different approaches to scaling 16 bit signed integer sound samples with a float to simulate different volumes. We will also be testing these approaches on 2 different machine architectures, one aarch64 and one x86_64 machine.

The first approach will be to naively scale the value and return the result. When scaling a sample size of n this will result in n multiplication instructions.

The second approach is to create a look up table where we store the result of every possible sample at it's index. We populate the look up table whenever the user changes the scaling factor, but during consecutive calls to the function with no change we can just look up the value in the table. With a 16 bit signed integer this results in 65536 multiplication instructions per volume change. Because the cache will probably be too small to hold the entire array, so this approach will be more memory intensive.

To achieve a clear difference we will be attempting to scale 500 million sound samples, this is roughly equal to 1 hour and 35 minutes of samples in 44 kHz stereo sound.

To estimate the time used by the scaling functions we will be using a control program that does the same memory allocation and basic operations as the program would do without the scaling functions. We will be timing these three programs to see which one does best.

The control program looks like this:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>

void fillSampleArray(int16_t* arr, int numOfElems);

int main()
{
    //creating an array of samples
    srand((int)time(NULL));
    int numberOfElements = 500000000;
    int16_t* sampleArray = (int16_t*)malloc(sizeof(int16_t) * numberOfElements);
    fillSampleArray(sampleArray, numberOfElements);

    int result = INT16_MIN;

    for (int i = 0; i < numberOfElements; i++) {
        result = result ^ sampleArray[i];
    }

    printf("%d", result);
    free(sampleArray);
    return 0;
}

void fillSampleArray(int16_t* arr, int numOfElems)
{
    for (int i = 0; i < numOfElems; i++)
    {
        arr[i] = ((rand() * 2) % (UINT16_MAX + 1)) + (INT16_MIN);
    }
}

some things to note about this code:
The reason we are using the ^ operator and printing a result is to make sure the compiler does not strip our code thinking that it is useless. We force it to use the code in some form of output so it can't get rid of it.

The reason we are multiplying rand() by 2 is that rand() has a maximum value of approximately 32000, we need to represent around 65000 numbers so this is our way of artificially doing so. Note that this creates only even numbers and it still only creates around 32000 distinct values. For our scaling function this does not matter as we just care about having a lot of samples.

For the 2 implementations we add this method signature above:

int16_t volumeScale(int16_t sample, float scale);

We have a small change to the main program to use the function, the result is now calculated as follows:

result = result ^ volumeScale(sampleArray[i], 0.750);

Our naive solution is fairly simple and looks as you would expect:

int16_t volumeScale(int16_t sample, float scale)
{
return (int16_t)sample*scale;
}

The lookup function is a bit more complicated due to the need to save state across multiple iterations and allocation of the array to store the intermediate results.

int16_t lookUpVolumeScale(int16_t sample, float scale)
{
   //previous volume
   static float pVolume = -1;

   //lookup table
   static int16_t* luTable=NULL;

   if(luTable==NULL){
       luTable = (int16_t*)malloc(sizeof(int16_t) * (UINT16_MAX + 1));
   }

   if(pVolume != scale)
   {
       for(int i=0; i<UINT16_MAX+1; i++)
       {
           //INT16_MIN is the offset
           luTable[i] = scale*(i+INT16_MIN);
       }
       pVolume = scale;
   }
   return luTable[sample + INT16_MAX + 1];
}

Funny thing here, I am a C# / Java programmer who has never programmed anything in C besides a flashing led, so I did not actually know how to save state inside a function across multiple iterations. It took me 2 hours of googling to figure out that the static keyword does all this for you. Sometimes when you are searching for something considered to be very simple for a C programmer, it can actually be very hard to find a result that explains it.

Lastly we run a comparison running the three programs on both our platforms, we will be using the time command and the real duration in seconds for this comparison:

Platform	Control	Naive	Lookup
x86_64(Xerxes)	10.728	12.354	19.238
aarch64(Aarchie)	181.366	286.152	223.343

Some things to note here:

We have not changed the volume, this means that we are assuming our user is going to listen to 1,5 hours of music without ever changing the volume.
The aarch64 system is compiling with a slightly older version of gcc.
The time function is probably not the best approach to accurately time these programs.

With these things in mind I think we can still say a few things about the results, due to the size of the gap between the solutions. Even without changing the volume, the x86_64 version performs better when naively scaling the sound samples. This probably has to do with memory access speed being the bottleneck in this case. On the aarch64 system however the lookup-table performs much better than the naive scaling. There is a full minute difference, even if you account for us not changing the volume this is a significant gap. This also means that the processing power is the bottleneck in this system.
In both cases we could easily scale the amount of sound we had in a fraction of the time required to listen to it, so there would be no problem with either approach, however if you are planning to scale a huge amount of sound samples this test was definitely worth the effort.

Next lab we will be looking at parallelising some of these operations to see if we can achieve an even faster result!

SPO600 Berwout

maandag 22 februari 2016

SPO600 Lab 5

Geen opmerkingen:

Een reactie posten

Blogarchief