Communication Issues in “The Challenge of the Multicores”

The Talk

A few weeks ago I listened in on a lecture given by Fran Allen, a Turing Award winner and a language/compiler/optimization researcher at IBM.

The talk was on “The Challenge of the Multicores”. It meandered a bit through the history of 60’s-era IBM machines, a few reflections, and a basic explanation of the problem at hand, with an emphasis on scientific computing. I think, anyway- the lecture was rather unfocused, and it was a few weeks ago, so my memory of it is hazy. Part of the lecture was a list of solutions to the multicore challenge:

  1. Cache needs to die a bloody and well deserved death
  2. We should move away from low level general purpose languages (C/C++/Java/Python/etc)

(Humorous tilt added by me, but otherwise this is roughly what she said)

I’m not sure if there were more, but these two stuck in my head because of what ensued. At the suggestion of eliminating cache, you could hear almost every engineer in the audience groan in horror, and a roughly similar reaction greeted the death of low-level languages. The questions at the end of the seminar were roughly 90% “How do we handle the discrepancy between CPU speed and memory speed without cache?”, “How could any low-level general-purpose language be completely displaced by a high-level language?”, and so on. This was a classic case of miscommunication.

The Death of Cache

With the first point, Fran isn’t saying “get rid of the cache and have the CPU talk directly to memory” (or she might be, I’m not sure how well she understands hardware concerns). What she is saying is that the way cache is implemented today is a large source of performance problems. The cache operates largely without instruction from the program: it makes educated guesses about what to load and what to keep. When it guesses wrong, a cache miss occurs and a penalty of hundreds of cycles is paid while the data is fetched from main memory.

Here is the problem: the cache’s guesses are based largely on the instruction stream and recent access patterns. A program can’t tell the CPU much more than a bare ‘I’m going to use this data soon’ prefetch hint; a compiler can’t provide metadata to the CPU of the form ‘by the time this method finishes, we’re going to need data from one of these three locations’. Lack of certainty about what the cache will do impedes automated optimization. That is what she was getting at. If instead we were able to treat the cache as 4MB worth of really slow registers, with its own lesser instruction set and controller, we could see massive performance gains: an intelligent cache controller can get much closer to a perfect cache than one that only guesses based on current and previous state.
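
As an aside, today’s compilers do expose a very limited form of program-directed cache hinting. The sketch below uses GCC/Clang’s __builtin_prefetch to say “I’ll read this address soon”; the loop, array, and prefetch distance are invented for illustration, and the hardware is free to ignore the hint entirely. What Fran is describing is a much richer version of this, where the program or compiler actually controls what stays resident.

    /* A rough sketch of the limited "tell the cache what's coming" hint that
     * exists today: GCC/Clang's __builtin_prefetch. The array and prefetch
     * distance here are made up for illustration. */
    #include <stddef.h>

    double sum_with_prefetch(const double *data, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* Hint: we'll read data[i + 64] shortly (read-only, low temporal
             * locality). The hardware may ignore this entirely. */
            if (i + 64 < n)
                __builtin_prefetch(&data[i + 64], 0, 1);
            sum += data[i];
        }
        return sum;
    }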

When phrased that way, cache death doesn’t seem like such an incredibly terrible idea. The cache is still there in some form, it’s just not the cache of today. There are still all manner of issues that would need to be worked out, but it’s a lot easier to say “yes, you may have a point there”.

The Death of C

Again, she isn’t saying “C should be eradicated from everywhere and everything”; she’s saying that using a low-level language strips away the semantics of what you’re trying to do. The best example I can think of is the difference between using SQL and hand-coding a query over a data structure. (Ignore small issues like ethics, responsibility, pain, etc.)

If you code a query yourself, you are responsible for parallelizing it, and for doing that correctly. That means you need to keep in mind data dependencies, lock acquisition orders, etc. You may need to account for niggling little details of the machine architecture you’re working on if you want massive performance. If you have a sufficiently complex query, it might be worthwhile to precompute certain parts of the query and reuse them. And so on.
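
To make that concrete, here is a minimal, hypothetical sketch of hand-coding the equivalent of a one-line SQL count over a plain array in C, parallelized by hand with pthreads. The struct, field names, threshold, and thread count are all invented; the point is how much machinery you end up owning yourself.

    /* Hand-coded version of roughly:
     *   SELECT COUNT(*) FROM readings WHERE temp > 30
     * over a plain array, parallelized manually with pthreads. */
    #include <pthread.h>
    #include <stddef.h>

    #define NTHREADS 4

    struct reading { double temp; };

    struct slice {
        const struct reading *rows;
        size_t begin, end;
        size_t count;          /* per-thread result: no locks needed */
    };

    static void *count_hot(void *arg)
    {
        struct slice *s = arg;
        size_t c = 0;
        for (size_t i = s->begin; i < s->end; i++)
            if (s->rows[i].temp > 30.0)
                c++;
        s->count = c;
        return NULL;
    }

    size_t hot_readings(const struct reading *rows, size_t n)
    {
        pthread_t tids[NTHREADS];
        struct slice slices[NTHREADS];
        size_t chunk = n / NTHREADS, total = 0;

        for (int t = 0; t < NTHREADS; t++) {
            slices[t].rows  = rows;
            slices[t].begin = t * chunk;
            slices[t].end   = (t == NTHREADS - 1) ? n : (t + 1) * chunk;
            slices[t].count = 0;
            pthread_create(&tids[t], NULL, count_hot, &slices[t]);
        }
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(tids[t], NULL);
            total += slices[t].count;
        }
        return total;
    }

And this is the easy case: no shared writes, so no locks. A query with joins or aggregation by key drags in all of the locking and ordering concerns above.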

With SQL, instead of specifying the steps to get to the state you want, you specify “I want everything that looks like x, cross-referenced with everything that feels like y, ordered by z”. Your database server is then responsible for building a query execution plan. It asks “What can I precompute to make this faster?”, “Do I need to do a linear scan through this entire data set, or do I have metadata somewhere such that I can do a binary search?”, etc.

SQL constrains what you can ask for, but in return it exploits all manner of optimizations and parallelizations without your lifting a finger. Fran’s focus is on high-performance scientific computing. If the people writing a weather simulation can say “I need the solution to Ax=b” rather than coding a matrix inverse or an LU decomposition by hand, the compiler can provide a very efficient implementation, one that may exploit optimizations the programmers could not see. She isn’t saying “Throw out C#, write your web applications in MATLAB”, but she is saying that current practice makes it hard for optimization researchers to help your code run faster.
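
As a rough illustration of the “say what you want” style, here is a minimal sketch that hands Ax=b to LAPACK through its LAPACKE C interface rather than hand-rolling an LU decomposition. The 3x3 system is invented, and the library is free to choose whatever factorization and blocking it considers fastest.

    /* A minimal sketch: ask a library to solve Ax = b instead of hand-coding
     * the factorization. Uses the LAPACKE C interface to LAPACK's dgesv; the
     * 3x3 system is invented for illustration. Link with -llapacke -llapack. */
    #include <stdio.h>
    #include <lapacke.h>

    int main(void)
    {
        /* Column-major 3x3 matrix A and right-hand side b. */
        double a[9] = { 2, 1, 1,   /* first column  */
                        1, 3, 2,   /* second column */
                        1, 2, 4 }; /* third column  */
        double b[3] = { 5, 10, 15 };
        lapack_int ipiv[3];

        /* "I need the solution to Ax = b" -- dgesv decides how. */
        lapack_int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, 3, 1, a, 3, ipiv, b, 3);
        if (info != 0) {
            fprintf(stderr, "solve failed: info = %d\n", (int)info);
            return 1;
        }
        printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);
        return 0;
    }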

Conclusion

Communicate better. It’s better for you, your message, and us.
