Home    General Stuff    General Chat
#1

Spending Your Cache Wisely

Archive: 4 posts


MediaMolecule.com:
http://www.mediamolecule.com/lab/article/spending_your_cache_wisely/



Back in the day, computers were much simpler things. They had a CPU and they had some memory. Getting data to/from memory took a fairly reliable amount of time, and when optimising you’d just try to make sure that memory speed wasn’t the block in the pipeline that caused your application to slow down. Today however, things have changed, primarily due to the fact that CPUs have gained speed at a much faster rate than memory has. Where a CPU might go up in speed by 10x over a few years, the speed of the memory it’s paired with might go up by just 3x, making it become a much larger culprit for slowing games down…
The hardware solution to this is the use of caches – a small amount of super fast (and pricey) memory that sits in between the main CPU and RAM. A cache is made up of a set of small cache lines (usually 128 bytes) that at any point in time represent some 128 byte block of main memory. Cache lines can be flushed back to RAM (writing back any changes to the memory they represent), and read from RAM (refreshing or loading a new cache line from memory). It’s the hardware’s responsibility to keep the memory and the cache in sync, and make sure the bits of memory of you want to fiddle with are in the cache.
Nowadays when you read/write memory you’re really reading/writing what is in the cache. This is exceedingly quick (generally on the same scale as the cost of a few instructions). However, if you attempt to read/write data that isn’t in the cache, the CPU must stall, flush a bit of the cache that’s not been used for a while back to memory, and replace it with the memory you want to read/write. It’s not uncommon for the cost of accessing memory that isn’t in the cache to be 30x the cost of your average instruction!
So lets jump straight into some code…


struct Person
{
char Name[58];
int FavouritePrimeNumber;

Person() : FavouritePrimeNumber(0) { Name[0] = 0; }
};
int NumPeople = 0;
Person GPeople[1000];

Person* AddPerson(const char* name, int favourite_prime_number)
{
strcpy(GPeople[NumPeople].Name, name);
GPeople[NumPeople].FavouritePrimeNumber = favourite_prime_number;
NumPeople++;
}

This type of code should be fairly familiar to any games programmer. We’ve got a nice big buffer to store people in (in this case their name and favourite prime number). The AddPerson function simply inserts a new entry and increments the NumPeople counter. In this case I’ve conveniently made the Person structure be 64 bytes, so we can fit 2 people in a 128 byte cache line.
Now suppose our game wants to inspect a property of the people – what if we want to know the biggest favourite prime number…



int GetBiggestFavouritePrimeNumber()
{
int biggest = 0;
for(int i = 0; i < NumPeople; i++)
if(GPeople.FavouritePrimeNumber > biggest)
biggest = GPeople.FavouritePrimeNumber;
return biggest;
}


That all works fine. We iterate over all the added people and track the largest chosen number. However, this article is about memory access, so let’s look at how much data has to be read from memory. FavouritePrimeNumber is a 4 byte integer, so you might be forgiven for assuming that the total memory read is:
Total read (in bytes) = NumPeople * sizeof(int) = NumPeople * 4

Wrong! Think back to how a cache works. On a modern platform there is no such thing as reading a few bytes – if the data you want isn’t in the cache, you need to load the entire cache line. In hardware, the loop will actually do something like:
1. Read GPeople[0].FavouritePrimeNumber – load the whole 128 byte cache line containing GPeople[0] (costs about 30 instructions worth of time)
2. Read GPeople[1].FavouritePrimeNumber – it’ll already be in the cache as it’s in the same line as GPeople[0] (pretty fast)
3. Read GPeople[2].FavouritePrimeNumber – load the whole 128 byte cache line containing GPeople[2] (costs about 30 instructions worth of time)
4. Etc etc ....
When we want to read that 4 byte property of a person, we’re having to actually get the entire person from memory into the cache before doing so. As a result, from the hardware’s point of view GetBiggestFavouritePrimeNumber has to read 64 bytes just to get at that 4 byte integer!
So how to fix this? Well, it all depends on what you want to do regularly with your data. If in my game I very regularly needed to get the biggest favourite prime number from a large set of people then I might re-organise data as follows:


struct Person
{
char Name[58];
Person() { Name[0] = 0; }
};
int NumPeople = 0;
int GFavouritePrimeNumbers[1000];
Person GPeople[1000];

Person* AddPerson(const char* name, int favourite_prime_number)
{
strcpy(GPeople[NumPeople].Name, name);
GFavouritePrimeNumbers[NumPeople] = favourite_prime_number;
NumPeople++;
}

int GetBiggestFavouritePrimeNumber()
{
int biggest = 0;
for(int i = 0; i < NumPeople; i++)
if(GFavouritePrimeNumbers > biggest)
biggest = GFavouritePrimeNumbers;
return biggest;
}

I’ve updated the code so the favourite prime numbers are stored in a separate array that I can iterate over nice and quickly (the technical name for this is SOA – structure-of-arrays, as opposed to AOS – array-of-structures). An int is only 4 bytes, which means the GetBiggestFavouritePrimeNumber now only has to load a new cache line for every 32nd person! Given memory was almost certainly the blocker in the pipeline here, my code now runs at roughly 16x its original speed!
The moral of the story - avoid loading data into the cache that you don’t need. Or to put it another way, spend your cache wisely
2011-08-30 16:36:00

Author:
Cronos Dage
Posts: 396


Just posted the following reply...


It'd be even more efficient just to cache the biggest prime in its own variable. I mean, you'd have to reiterate the array if you removed a person from it, but assuming that occurs less frequently than having to find the biggest prime, it's still a win from a performance POV. Something like...



struct Person
{
char Name[58];
int FavouritePrimeNumber;

Person(): FavouritePrimeNumber(0) { *Name = 0; }
};

int NumPeople = 0;
Person GPeople[1000];
int BiggestFavouritePrimeNumber = 0;

Person* AddPerson(const char* name, int favourite_prime_number)
{
strcpy(GPeople[NumPeople].Name, name);
GPeople[NumPeople].FavouritePrimeNumber = favourite_prime_number;
if (favourite_prime_number > BiggestFavouritePrimeNumber)
{
BiggestFavouritePrimeNumber = favourite_prime_number;
}
++NumPeople;
}

int GetBiggestFavouritePrimeNumber()
{
return BiggestFavouritePrimeNumber;
}


The Person() constructor is also redundant, since the AddPerson() function ensures the memory is initialized, and the NumPeople variable ensures uninitialized array entries are never accessed.

2011-08-30 18:51:00

Author:
Aya042
Posts: 2870


Too confusing! * Head Asplotion* I suppose this is useful to game devs though?2011-08-30 19:06:00

Author:
craigmond
Posts: 2426


you know, every single thing MM puts up, doesn't necessarily have to be reposted here. This is not LBP news, this is more like programmer's how to's and last time I checked, we don't have a "programmer's how to's" section. Sure it's interesting and what not but it's not news. LBP news is something like: "hey guess what guys, the patch is coming out tomorrow" or "Holy Crap!, they're going to make a LOTR (Lord of the Rings) DLC kit!" File this with "how to Defrag your pc".

Sorry folks, guess Grouchy McGrouch decided to show up today, or it could do with the fact that it's 109 degrees and the a/c isn't working right.
Also, Wouldn't a LOTR DLC kit be awesome! I would love to have an ENT costume or hobbits, or orcs or, or, or....
Maybe this should be moved to general chat
2011-08-30 20:19:00

Author:
biorogue
Posts: 8424


LBPCentral Archive Statistics
Posts: 1077139    Threads: 69970    Members: 9661    Archive-Date: 2019-01-19

Datenschutz
Aus dem Archiv wurden alle persönlichen Daten wie Name, Anschrift, Email etc. - aber auch sämtliche Inhalte wie z.B. persönliche Nachrichten - entfernt.
Die Nutzung dieser Webseite erfolgt ohne Speicherung personenbezogener Daten. Es werden keinerlei Cookies, Logs, 3rd-Party-Plugins etc. verwendet.