Migration to AWS

Took me long enough. In 2009 this blog was originally hosted with a shared hosting provider. In 2014 (maybe) I moved it to a Linux server on Azure for the purposes of play (and to remove the reliance on a hosting provider). After a slight employment mix-up I decided to move from Azure (registered with my FS credentials) to AWS (registered with personal creds).

In the process of doing that I’ve turned on HTTPS via Let’s Encrypt, switched from a “magic Apache config copied from the internets” to an nginx-based server with php-fpm doing the hosting, updated WordPress (though not yet the styling), and made a few minor DNS improvements. This took months to do. MONTHS! Months of “I should muck about with the server” in the back of my mind. But, it’s finished now, and that’s what’s important.

In celebration of newfound listlessness, I’m re-publishing some of my older stuff. It’s a mix of contest-related problem solving, very early electronics projects, and some misc rambling that at one point I thought didn’t really belong here, but, ehh.

My Calling: To Grok

For Rick Schonfeld of the American Red Cross (and other random readers):

We were having a discussion about how we, the students in CRASAR, got into computer science and robotics. I said my reason was I had found my calling in CS in high school, and you asked what my calling was, specifically.

I said that it was system design: starting with a blank slate and a desired functionality and producing a working system. That’s a bit of a lie.

Designing systems is one of the things I enjoy, but developing a working system (something from nothing) isn’t quite why I like it. The actual reason is the understanding that comes with being a systems developer. I have a friend (and former coworker) who, when asked how well he understood computers (part of a “what should be a student’s first language” debate), replied “I understand it down to the level of electrons moving… I stopped at why the electrons are moving, but above that I understand.”

Being a computer scientist teaches you how to think; it teaches you how to analyze a problem and how to come to a solution. Once you have a solution in mind, you learn to implement it. When you implement a solution, if you are worth your salt, you don’t just know how it works in vague and abstract terms, but rather you understand it. You grok it. You can watch it run and envision each individual component operating, running along at 2 billion operations per second (or merely 1 million depending on your platform). You can envision it flowing as one fluid entity, existing within its own reality.

Grokking how something works is an amazing experience (similar to an ‘Aha!’ moment, but more awesome and less fleeting). Learning, understanding, and grokking new and complex systems is why I am glad to be in Computer Science, and being exposed to new physical and electronic systems is why I am glad I am a Systems Developer.

It can also be quite terrifying. I designed and developed the software for a frac pump control system from scratch. I raised it from a wee tyke that could barely blink LEDs to a system now deployed on million-dollar industrial vehicles. I know that it works, and I grok it. But every once in a while a problem comes up that doesn’t leave you wondering what happened, but rather exclaiming “That’s impossible… that’s not how this works”. So far when I’ve run into these situations, it seems I’ve been right (odd mechanical or deployment issues), but each time your heart skips a few beats as you’re forced to question yourself, your understanding, and the basis of the reality your system is built around.

But, it is worth it.

Put a different way: a common joke is that real programmers use a magnetic pen to write data onto a hard disk. Somewhere out there are people who, given sufficiently steady hands and sufficiently small points to their pens, know enough about how a hard drive works that they could write data that way. There are people out there who, given the raw electrical signals off your Ethernet cable pairs, can tell you what website you are surfing. These are those that grok.

Thanks to Rick for asking a question I probably should have known the answer to already.

Communication Issues in “The Challenge of the Multicores”

The Talk

A few weeks ago I listened in on a lecture given by Fran Allen, a Turing Award winner and a language/compiler/optimization researcher at IBM.

The talk was on “The Challenge of the Multicores”. It meandered a bit into the history of ’60s-era IBM machines, a few reflections, and a basic explanation of the problem being encountered, with an emphasis on scientific computing. I think, anyway; the lecture was rather unfocused, and it was a few weeks ago, so memory = dead. Part of the lecture was a list of solutions to the multicore challenge:

  1. Cache needs to die a bloody and well deserved death
  2. We should move away from low level general purpose languages (C/C++/Java/Python/etc)

(Humorous tilt added by me, but otherwise this is roughly what she said)

I’m not sure if there were more, but these two stuck in my head because of what ensued. At the elimination of cache, you could hear almost every engineer in the audience groan in horror, and something roughly similar happened at the death of low-level languages. The questions at the end of the seminar were roughly 90% “How do we handle the discrepancy between CPU speed and memory speed without cache?”, “How could any low-level general-purpose language be completely displaced by a high-level language?”, etc. This was a classic issue of miscommunication.

The Death of Cache

With the first point, Fran isn’t saying “Get rid of the cache and have the CPU talk directly to memory” (or she might be, I’m not sure how well she understands hardware concerns). What she is saying is that how cache is implemented today is a large source of performance problems. Cache operates largely without instruction from the program: it makes educated guesses about what to load, and what to keep. When it guesses wrong, a cache miss occurs and a massive overhead, on the order of hundreds of CPU cycles, is paid while memory is fetched.

Here is the problem: the cache’s guessing is based largely on the executing assembly code and current use. A program can’t tell the CPU ‘I’m going to use this data soon’; a compiler can’t provide metadata to the CPU of the form ‘by the time this method finishes, we’re going to need data from one of these three locations’. Lack of certainty about what the cache will do impedes automated optimization. That is what she was getting at. If instead we were able to treat the cache as 4MB worth of really slow registers, with its own lesser instruction set and controller, we could see massive performance gains: an intelligent cache controller can get much closer to a perfect cache than a controller that only guesses based on current and previous state.
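To make the idea a little more concrete, here is a minimal C sketch (my own illustration, not something from the talk) using GCC/Clang’s __builtin_prefetch, which is about the closest a program can currently get to telling the cache what it will need. Note that it is still only a hint fed to the same guessing hardware, not the programmable cache controller described above.

```c
#include <stddef.h>

/* Sketch only: __builtin_prefetch is a hint to the existing cache
 * hardware, which remains free to ignore it entirely. */
double sum_with_hints(const double *data, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Hint: we expect to read data[i + 16] shortly
         * (read access, moderate temporal locality). */
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0, 1);
        total += data[i];
    }
    return total;
}
```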

When phrased that way, cache death doesn’t seem like such an incredibly terrible idea. The cache is still there in some form, it’s just not the cache of today. There are still all manner of issues that would need to be worked out, but it’s a lot clearer that “yes, you may have a point there”.

The Death of C

Again, she isn’t saying “C should be eradicated from everywhere and everything”; she’s saying that using a low-level language removes the semantics of what you’re trying to do. The best example I can think of would be the difference between using SQL and hand-coding a query over a data structure. (Ignore small issues like ethics, responsibility, pain, etc.)

If you code a query yourself, you are responsible for parallelizing it, and for doing that correctly. That means you need to keep in mind data dependencies, lock acquisition orders, etc. You may need to take into account niggling little details about the machine architecture you’re working on if you want massive performance. If you have a sufficiently complex query, it might be worthwhile to pre-compute certain parts of the query and reuse them. And so on.

With SQL, instead of specifying the steps to get to the state you want, you specify “I want everything that looks like x, cross-referenced with everything that feels like y, ordered by z”. Your database server is then responsible for building a query execution plan. It asks “what can I precompute to make this faster?”, “do I need to do a linear scan through this entire data set, or do I have metadata somewhere such that I can do a binary search?”, etc.
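As a rough illustration (mine, with made-up record and field names), here is the hand-coded side of a query that SQL would express in one declarative line; every decision about scanning, indexing, ordering, and parallelism in the C version belongs to the programmer rather than to an optimizer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type, for illustration only. */
struct person {
    char name[64];
    int  age;
    char city[32];
};

/* Declarative version (handled entirely by the database):
 *   SELECT name FROM people WHERE age > 30 AND city = 'Austin' ORDER BY name;
 *
 * Imperative version: we choose the scan strategy, the ordering, and any
 * parallelism ourselves. A linear scan is the obvious choice here, but
 * nothing stops us from picking a much worse one. */
void query_by_hand(const struct person *people, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (people[i].age > 30 && strcmp(people[i].city, "Austin") == 0)
            printf("%s\n", people[i].name);
    }
    /* Sorting the results by name is left as yet another thing we now own. */
}
```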

SQL constrains what you can ask for, but in return it exploits all manner of optimizations and parallelizations without your lifting a finger. Fran’s focus is on high-performance scientific computing. If the people doing a weather simulation say “I need the solution to Ax=b” rather than coding a matrix inverse or an LU decomposition by hand, the compiler can provide a very efficient implementation, which may be able to exploit optimizations that the programmers could not see. She isn’t saying “Throw out C#, write your web applications in MATLAB”, but she is saying that current practice makes it hard for optimization researchers to help your code run faster.
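In the same spirit, here is a hedged sketch of the Ax=b point: state the problem and hand it to a tuned library (LAPACK’s dgesv routine in this example), rather than hand-rolling an inverse, so the implementation is free to pick whatever factorization, pivoting, and blocking it likes. The data and setup are invented for illustration.

```c
#include <lapacke.h>

/* Solve A x = b for a small dense system by stating the problem and letting
 * LAPACK decide how to do it. Sketch only: assumes a LAPACKE installation
 * and a row-major 3x3 system. */
int solve_axb(void)
{
    double a[9] = { 4, 1, 2,
                    1, 5, 1,
                    2, 1, 6 };     /* A, row-major */
    double b[3] = { 7, 8, 9 };     /* right-hand side; overwritten with x */
    lapack_int ipiv[3];

    /* dgesv = LU factorization with partial pivoting, then solve. */
    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1, a, 3, ipiv, b, 1);
    return (int)info;              /* 0 on success; b now holds x */
}
```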

Conclusion

Communicate better. It’s better for you, your message, and us.

Howdy

This is written well after the fact, in the balmy year of 2016.

I’m Matt Moss. At the time I started writing this blog, I was a student and software developer at Texas A&M for the Energy Systems Lab (ESL). I graduated in 2011 and continued in software development for a local consultancy, as well as dabbling with oilfield electronics on the side.

This blog started out as a way to generate an excuse to practice for the ACM ICPC contest. To that end I’d look at past ACM problems, Project Euler problems, or anything that struck my fancy, and try to solve them. The goal was to improve both implementation time and general problem-solving/recognition skills. Eventually I also started learning electronics, and some of my early projects from that ended up here, as well as a few philosophical diatribes. Now that I’m out of college, I’m trying to maintain the blog mostly as a project repository. And/or a place to write when I feel like writing.

You might be curious about the name. Coder Tao is not a fascinating name based on the balance that should come from being able to solve complex problems semi-elegantly while also operating within the confines of project scope. Nor is it an endorsement of Asian religions. Nor is it a statement involving Torque and the coding folk. Tao is the name of a gnome sorcerer. A machine of unimaginable destruction and chaos (but mostly destruction). And now it is a name.