Cleverer cache management could boost computer chips’ performance, shrink energy consumption
Computer chips keep getting faster because transistors keep getting smaller. But the chips themselves are as big as ever, so data moving around a chip, and between chips and main memory, has to travel just as far. As transistors get faster, the cost of moving data becomes, proportionally, a more severe limitation.
So far, chip designers have circumvented that limitation through the use of “caches”: small memory banks close to processors that store frequently used data. But the number of processors, or “cores,” per chip is also increasing, which makes cache management more difficult.
Moreover, as cores proliferate, they have to share data more frequently, so the communication network connecting the cores becomes the site of more frequent logjams, as well.
In a pair of recent papers, researchers at MIT and the University of Connecticut have developed a set of new caching strategies specifically for multicore chips that, in simulations, significantly improved chip performance while actually reducing energy consumption.
The first paper, presented at the most recent ACM/IEEE International Symposium on Computer Architecture, reported average gains of 15 percent in execution time and energy savings of 25 percent. The second paper, which describes a complementary set of caching strategies and will be presented at the IEEE International Symposium on High Performance Computer Architecture, reports gains of 6 percent and 13 percent, respectively.
The caches on multicore chips are typically arranged in a hierarchy. Each core has its own private cache, which may itself have several levels, while all the cores share the so-called last-level cache, or LLC.
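That hierarchy can be sketched in a few lines of code. This is a simplified illustration, not any chip’s actual protocol; the names `private_cache`, `llc`, and `main_memory`, and the use of plain dictionaries, are assumptions made for clarity. A lookup simply checks each level in turn, falling back to the next, slower one on a miss:

```python
def lookup(address, private_cache, llc, main_memory):
    """Return the data at `address`, searching the hierarchy top-down."""
    if address in private_cache:        # fastest: hit in the core's own cache
        return private_cache[address]
    if address in llc:                  # slower: hit in the shared last-level cache
        data = llc[address]
        private_cache[address] = data   # copy up into the private cache on a hit
        return data
    data = main_memory[address]         # slowest: go all the way to main memory
    llc[address] = data                 # fill both cache levels on the way back
    private_cache[address] = data
    return data

# Toy backing store: address -> value
main_memory = {addr: addr * 2 for addr in range(16)}
private, llc = {}, {}

lookup(3, private, llc, main_memory)    # misses everywhere, fills both caches
```

After that first miss, a repeated request for address 3 is served from the private cache without touching the LLC or main memory, which is exactly the latency advantage the hierarchy exists to provide.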
Chips’ caching protocols usually adhere to the simple but surprisingly effective principle of “spatiotemporal locality.” Temporal locality means that if a core requests a particular piece of data, it will probably request it again. Spatial locality means that if a core requests a particular piece of data, it will probably request other data stored near it in main memory.
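Spatial locality is why hardware fetches memory in fixed-size “cache lines” rather than one word at a time: a miss on any address pulls in that address’s whole line, so its neighbors arrive for free. A toy sketch (the line size of 4 words is purely illustrative; real chips typically use 64-byte lines):

```python
LINE_SIZE = 4  # words per cache line; illustrative, not a real chip's geometry

def line_of(address):
    """Return every address in the cache line containing `address`.

    A miss on any one of these fetches all of them at once, so a core
    scanning through memory in order hits in cache on 3 of every 4 accesses.
    """
    base = (address // LINE_SIZE) * LINE_SIZE   # round down to the line boundary
    return list(range(base, base + LINE_SIZE))

print(line_of(6))   # requesting address 6 brings in its whole line: [4, 5, 6, 7]
```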
So every requested data item gets stored in the private cache, along with those immediately adjacent to it in main memory. If it goes unused, it will eventually be squeezed out by more recently requested data, falling down through the hierarchy, from the private cache to the LLC to main memory, until it’s requested again.