Visual Studio… Why?

This post is just a rant venting frustration with Visual Studio. Feel free to ignore.

Last year I switched employment from a somewhat .NET shop to a predominantly Python shop with a bit of Node sprinkled in where it won't be noticed. As such, the only times I need to use Visual Studio now are when I'm doing work for my brother, or whenever I dust off old projects. That is probably the absolute worst way to use Visual Studio.

So, I'm not entirely sure how I got into this situation, but my main 'pack up and go' laptop has a single version of Visual Studio on it: 2015 Community Edition (I'm really unclear on this, as it should have Professional Edition). And I use it once every few months.

So. I go to a job site. In the middle of nowhere. I open up a project in Visual Studio, what do I get?

A prompt.

A prompt asking me to sign in to my gods damned useless Microsoft Account because it wants me to sign in every 90 days.

No internet access? Don't want to sign in? No development for you. Visual Studio shuts down. All for what? So Microsoft can advertise their hosted TFS service that I wouldn't touch with a 10-foot pole? So they can gather anonymous usage data that will help them… do something?

I will grant, they're providing a (very large) tool for free and are more than welcome to impose any terms they'd like. I just wish they'd chosen terms that weren't making me reconsider my off-hours tech stack.

Unrelated but equally annoying: I had ReSharper on this VS2015 install, and somehow the two got into a really bad conflict. I'd get IntelliSense from both tools, neither would work, and various keystrokes would be ignored in favor of closing the useless IntelliSense pop-ups. Coding when keystrokes are randomly swallowed is painful.

Further annoyance: package management. I come back to my Tachyon Model Viewer project after (too long) a hiatus, and I can't build the damn thing. The packages the project depends on are listed as installed, but for whatever reason they're not referenced. And unless browsing to the installation location and manually choosing each assembly reference is the actual intended workflow, there's no obvious way of referencing them.

How much of this is my fault? Almost certainly all of it. I'm sure I screwed up my installation two years ago by, say, removing an older version of VS or something. Don't know why I would have done that, but I could see it happening. And the package management issue could just be me using it in exactly the wrong way. But I don't really care. Same way I'm sure there's a VS2017 or a VS Code that I should be using if I want to be up to date, which probably does improve package management. But, as I said, I don't really care: I'm on the internet, mildly annoyed, and basically talking to myself. I hate being stuck in the JavaScript ecosystem using 30 different tools to build a web page, but I hate the Microsoft ecosystem more, where you have one tool that only usually works.

Oh well. To the Rust Rabbit Hole!

Aside: you may have noticed I don't keep this blog active. Realistically that's not going to change. This is a 'when I feel like it' thing, and sadly that doesn't happen as often as I'd like. But I do have a few things on my to-do list. One odd thing I'd like to figure out is getting this site and a few others into a Docker container. I'd like to get to the point where I'm not spinning up new EC2 instances to run new side websites. Nothing major in the scheme of things, but it is some housekeeping I'd like to do. Especially if it means I can continue ignoring actual housekeeping.

Nicking assets from Tachyon: The Fringe, Part 2: The PFF Archive

The first file we'll look at on our foray into pixel piracy is the PFF archive. This is an archive file containing all of the game assets for Tachyon. There's just the one, Tachyon.pff, and it's nearly 400MB in size (no compression, but when extracted the contents take ~2GB on disk, due to the sheer number of small files; this was back in the day when 2GB was a lot). The format has been used by several NovaLogic games, most prominently the Delta Force series. Later versions of the format seem to support compression, but the version we're working with is basically straight storage.

Technically the PFF format isn't vital for our model extraction project: extractors have been around for the past 20 years, and we could just as easily work off an already-extracted archive. However, that makes moving between computers a bit annoying, as you then need to find the extractor, extract the directory, and point the app at the directory. Too much work; it's easier to point the app at a single file (especially as we aren't looking to modify the underlying files). This also serves as a starting point for the blog series, to play with formatting and such before the content is important.

Thankfully, at least one of the PFF extractors (from Devil's Claw) also came with source code, so no reverse engineering is required, just a minor port. Thank you, Devil's Claw.

On with the show.

The general format of the file is a Header, a set of File Entries, a Footer, and a massive blob of file data (in Tachyon.pff the entry list and footer actually sit near the end of the file, after the data).

Header

| Size | Type    | Name              | Value      | Comments                                                             |
|------|---------|-------------------|------------|----------------------------------------------------------------------|
| 4    | int     | Header Size       | 20         |                                                                      |
| 4    | char[4] | Version String    | PFF3       | PFF3, PFF4, etc.                                                     |
| 4    | int     | File Count        | 8134       | Number of File Entries                                               |
| 4    | int     | File Segment Size | 36         | Size of an individual record; 32 to 40 bytes                         |
| 4    | int     | File List Offset  | 0x16C6A7F1 | Start of the File Entry array; near the end of the file in this case |

One interesting bit about the header is that the version string lies (according to the old code). File Segment Size varies by PFF version: 32 bytes for "V2", 36 for V3, 40 for V4. But "V2" and V3 have the same version string: PFF3.
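To make the layout concrete, here's a minimal C# sketch of reading the header. The PffHeader type and its field names are mine; the sizes and meanings come from the table above, and I'm assuming the usual little-endian layout (which BinaryReader reads natively).

```csharp
using System.IO;

class PffHeader
{
    public int HeaderSize;       // always 20
    public string Version;       // "PFF3", "PFF4", ...
    public int FileCount;        // number of File Entries
    public int FileSegmentSize;  // 32 ("V2"), 36 (V3), 40 (V4)
    public int FileListOffset;   // where the File Entry array starts

    // Reads the five header fields, in order, from the start of the archive.
    public static PffHeader Read(BinaryReader r) => new PffHeader
    {
        HeaderSize      = r.ReadInt32(),
        Version         = new string(r.ReadChars(4)),
        FileCount       = r.ReadInt32(),
        FileSegmentSize = r.ReadInt32(),
        FileListOffset  = r.ReadInt32(),
    };
}
```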

File Segment Entry

| Size | Type     | Name              | Comments                                                                                           |
|------|----------|-------------------|----------------------------------------------------------------------------------------------------|
| 4    | int      | Deleted           | 0 for "not deleted"                                                                                |
| 4    | int      | File Position     | Offset into the PFF where the file content starts                                                  |
| 4    | int      | File Length       |                                                                                                    |
| 4    | int      | Creation Date     | Guessing a unix-like timestamp; all around 2000 if true                                            |
| 16   | char[16] | File Name         | Null-terminated string                                                                             |
| 4    | int      | Modified Date     | Optional based on Segment Size; present in our data set, with some weird timestamp values (2020s+) |
| 4    | int      | Compression Level | Optional based on Segment Size                                                                     |
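A matching sketch for a single entry (again, names are mine). The two trailing fields only exist when the header's File Segment Size says so, which is how the 32/36/40-byte variants differ:

```csharp
using System;
using System.IO;
using System.Text;

class PffEntry
{
    public bool Deleted;
    public int FilePosition;       // offset in the PFF where the content starts
    public int FileLength;
    public int CreationDate;       // presumably a unix-like timestamp
    public string FileName;
    public int? ModifiedDate;      // present when segment size >= 36
    public int? CompressionLevel;  // present when segment size >= 40

    public static PffEntry Read(BinaryReader r, int segmentSize)
    {
        var e = new PffEntry
        {
            Deleted      = r.ReadInt32() != 0,  // 0 means "not deleted"
            FilePosition = r.ReadInt32(),
            FileLength   = r.ReadInt32(),
            CreationDate = r.ReadInt32(),
        };
        var raw = r.ReadBytes(16);              // fixed 16 bytes, null-terminated
        int nul = Array.IndexOf(raw, (byte)0);
        e.FileName = Encoding.ASCII.GetString(raw, 0, nul < 0 ? raw.Length : nul);
        if (segmentSize >= 36) e.ModifiedDate = r.ReadInt32();
        if (segmentSize >= 40) e.CompressionLevel = r.ReadInt32();
        return e;
    }
}
```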

Footer

| Size | Type    | Name             | Comments                                   |
|------|---------|------------------|--------------------------------------------|
| 4    | int     | System IP        | Who packed the file; a 192.168.*.* address |
| 4    | int     | Reserved/Unknown |                                            |
| 4    | char[4] | KING Tag         | Value = "KING"                             |

The footer appears after all of the File Entries. At a guess, the KING tag is intended as a magic value to check that the file is otherwise valid.
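And the footer, plus a driver tying the pieces together. This is a sketch against the layout above, not a port of the Devil's Claw code; to actually pull a file out, you'd then seek to an entry's File Position and read File Length bytes.

```csharp
using System.Collections.Generic;
using System.IO;

static class PffReader
{
    // Footer: three 4-byte fields after the entry array.
    static bool CheckFooter(BinaryReader r)
    {
        int systemIp = r.ReadInt32();                 // packer's 192.168.*.* address
        int reserved = r.ReadInt32();                 // unknown/reserved
        return new string(r.ReadChars(4)) == "KING";  // magic value, at a guess
    }

    // Read the header, seek to the entry list, read the entries, and
    // sanity-check the footer. Uses the types sketched above.
    public static List<PffEntry> ReadArchive(string path)
    {
        using var r = new BinaryReader(File.OpenRead(path));
        var header = PffHeader.Read(r);
        r.BaseStream.Seek(header.FileListOffset, SeekOrigin.Begin);
        var entries = new List<PffEntry>(header.FileCount);
        for (int i = 0; i < header.FileCount; i++)
            entries.Add(PffEntry.Read(r, header.FileSegmentSize));
        if (!CheckFooter(r))
            throw new InvalidDataException("No KING tag; not a PFF we understand.");
        return entries;
    }
}
```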

Thoughts

It's a straightforward archive format with some support for extensions built in. It's not bad, and kind of elegant for the goal of 'single-file archive'. It's a bit wasteful, though: what I usually see, when size matters, is that entries have a bit field, with some bits defined and others reserved for future use. I.e., Deleted and Compression Level don't need 8 bytes all to themselves (see the sketch below). Though I'll grant, for this use you're talking about 8 bytes across ~8000 entries: 64KB of semi-wasted space in a 400MB file.
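Something like this hypothetical repacking is what I have in mind; the bit assignments are entirely invented, not part of any PFF version:

```csharp
using System;

// Hypothetical: one 4-byte flags field replacing the two 4-byte ints.
[Flags]
enum EntryFlags : uint
{
    None            = 0,
    Deleted         = 1 << 0,      // bit 0: the old Deleted int
    // bits 1-7: reserved for future use
    CompressionMask = 0xFu << 8,   // bits 8-11: compression level, packed
}
```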

Source code for the C# reader can be found here, and the source for the Devil's Claw extractor is here.

Other Notes: I wrote this a long time ago and never got around to publishing it because I wanted to get formatting for the tables right. Giving up on that for now, and just publishing to get the process started again. This’ll be a bigger problem in some of the next file formats, unfortunately, but we’ll see what happens. I also may use this as an opportunity to learn Rust, as Visual Studio is getting on my nerves.

Migration to AWS

Took me long enough. In 2009 this blog was originally hosted with a shared hosting provider. In 2014 (maybe) I moved it to a Linux server on Azure, for the purposes of play (and to remove the reliance on the hosting provider). After a slight employment mixup, I decided to move from Azure (registered with my FS credentials) to AWS (registered with personal creds).

In the process of doing that I've turned on HTTPS via Let's Encrypt, switched from a "magic Apache config copied from the internets" to an nginx-based server with php-fpm doing the hosting, updated WordPress (though not yet the styling), and made a few minor DNS improvements. This took months to do. MONTHS! Months of "I should muck about with the server" in the back of my mind. But it's finished now, and that's what's important.

In celebration of newfound listlessness, I'm re-publishing some of my older stuff. It's a mix of contest-related problem solving, very early electronics projects, and some misc rambling that at one point I thought didn't really belong here, but, ehh.

My Calling- To Grok

For Rick Schonfeld of the American Red Cross (and other random readers):

We were having a discussion about how we, the students in CRASAR, got into computer science and robotics. I said my reason was I had found my calling in CS in high school, and you asked what my calling was, specifically.

I said that it was system design: starting with a blank slate and a desired functionality, and producing a working system. That's a bit of a lie.

Designing systems is one of the things I enjoy, but developing a working system (something from nothing) isn't quite why I like it. The actual reason is the understanding that comes with being a systems developer. I have a friend (and former coworker) who, when asked how well he understood computers (part of a "what should be a student's first language" debate), replied "I understand it down to the level of electrons moving… I stopped at why the electrons are moving, but above that I understand."

Being a computer scientist teaches you how to think; it teaches you how to analyze a problem and how to come to a solution. Once you have a solution in mind, you learn to implement it. When you implement a solution, if you are worth your salt, you don’t just know how it works in vague and abstract terms, but rather you understand it. You grok it. You can watch it run and envision each individual component operating, running along at 2 billion operations per second (or merely 1 million depending on your platform). You can envision it flowing as one fluid entity, existing within its own reality.

Grokking how something works is an amazing experience (similar to an 'Ah ha!' moment, but more awesome and less fleeting). Learning, understanding, and grokking new and complex systems is why I am glad to be in Computer Science; and being exposed to new physical and electronic systems is why I am glad I am a Systems Developer.

It can also be quite terrifying. I designed and developed the software for a frac pump control system from scratch. I raised it from a wee tyke that could barely blink LEDs to a system now deployed on million-dollar industrial vehicles. I know that it works, and I grok it. But every once in a while a problem comes up that doesn't leave you wondering what happened, but rather exclaiming "That's impossible… that's not how this works". So far, when I've run into these situations, it seems I've been right (odd mechanical or deployment issues), but each time your heart skips a few beats as you're forced to question yourself, your understanding, and the basis of the reality your system is built around.

But, it is worth it.

Put a different way: a common joke is that real programmers use a magnetic pen to write data onto a hard disk. Somewhere out there are people who, given sufficiently steady hands and sufficiently small points to their pens, know enough about how a hard drive works that they could write data that way. There are people out there who, given the raw electrical signals off your Ethernet cable pairs, can tell you what website you are surfing. These are those that grok.

Thanks to Rick for asking a question I probably should have known the answer to already.

Communication Issues in “The Challenge of the Multicores”

The Talk

A few weeks ago I listened in on a lecture given by Fran Allen, a Turing Award winner and a language/compiler/optimization researcher at IBM.

The talk was on "The Challenge of the Multicores". It meandered a bit into the history of '60s-era IBM machines, a few reflections, and a basic explanation of the problem being encountered, with an emphasis on scientific computing. I think; the lecture was rather unfocused, and a few weeks back, so memory = dead. Part of the lecture was a list of solutions to the multicore challenge:

  1. Cache needs to die a bloody and well deserved death
  2. We should move away from low level general purpose languages (C/C++/Java/Python/etc)

(Humorous tilt added by me, but otherwise this is roughly what she said)

I'm not sure if there were more, but these two stuck in my head because of what ensued. At the elimination of cache, you could hear almost every engineer in the audience groan in horror, and something similar occurred at the death of low-level languages. The questions at the end of the seminar were roughly 90% "How do we handle the discrepancy between CPU speed and memory speed without cache?", "How could any low-level general-purpose language be completely displaced by a high-level language?", etc. This was a classic case of miscommunication.

The Death of Cache

With the first point, Fran isn't saying "Get rid of the cache and have the CPU talk directly to memory" (or she might be; I'm not sure how well she understands hardware concerns). What she is saying is that how cache is implemented today is a large source of performance problems. Cache operates largely without instruction from the program: it makes educated guesses about what to load and what to keep. When it guesses wrong, a cache miss occurs, and a massive microsecond or so of overhead is incurred while memory is fetched.
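The cost is easy to demonstrate. Here's a minimal, self-contained C# sketch (array size arbitrary; exact numbers vary by machine) that does the same summation twice: once walking memory sequentially, where the cache's guesses work out, and once striding across it, where they don't.

```csharp
using System;
using System.Diagnostics;

class CacheDemo
{
    static void Main()
    {
        const int N = 4096;
        var data = new int[N, N];   // ~64MB; .NET stores this row-major

        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < N; i++)       // row-major: sequential, cache-friendly
            for (int j = 0; j < N; j++)
                sum += data[i, j];
        Console.WriteLine($"Row-major:    {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int j = 0; j < N; j++)       // column-major: strided, misses constantly
            for (int i = 0; i < N; i++)
                sum += data[i, j];
        Console.WriteLine($"Column-major: {sw.ElapsedMilliseconds} ms (sum={sum})");
    }
}
```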

Here is the problem: cache guessing is based largely on assembly code and current use. A program can't tell the CPU "I'm going to use this data soon"; a compiler can't provide metadata to the CPU of the form "by the time this method finishes, we're going to need data from one of these three locations". Lack of certainty about what the cache will do impedes automated optimization. That is what she was getting at. If instead we were able to treat the cache as 4MB worth of really slow registers, with its own lesser instruction set and controller, we could see massive performance gains, as we can get much closer to a perfect cache with an intelligent cache controller than we could with a controller that only guesses based on current and previous state.

When phrased that way, cache death doesn't seem like such an incredibly terrible idea. The cache is still there in some form; it's just not the cache of today. There are still all manner of issues that would need to be worked out, but it's a lot clearer that "yes, you may have a point there".

The Death of C

Again, she isn't saying "C should be eradicated from everywhere and everything"; she's saying that using a low-level language strips out the semantics of what you're trying to do. The best example I can think of is the difference between using SQL and hand-coding a query over a data structure. (Ignore small issues like ethics, responsibility, pain, etc.)

If you encode a query yourself, you are responsible for parallelizing it, and for doing that correctly. That means you need to keep in mind data dependencies, lock acquisition orders, etc. You may need to take into account niggly little details about the machine architecture you're working on if you want massive performance. If you have a sufficiently complex query, it might be worthwhile to pre-compute certain parts of it and reuse them. And so on.

With SQL, instead of specifying the steps to get to the state you want, you specify "I want everything that looks like x, cross-referenced with everything that feels like y, ordered by z". Your database server is then responsible for building a query execution plan. It asks "what can I precompute to make this faster?", "do I need to do a linear scan through this entire data set, or do I have metadata somewhere such that I can do a binary search?", etc.
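To make the contrast concrete, here's a toy version of both styles; the Order and Customer types are invented for illustration. The imperative loop has silently committed to a strategy (a naive nested-loop join), while the SQL in the trailing comment just states the result and leaves strategy to the planner.

```csharp
using System.Collections.Generic;

record Order(int CustomerId, decimal Total);
record Customer(int Id, string Name);

static class QueryDemo
{
    public static List<(Order, Customer)> BigSpenders(
        List<Order> orders, List<Customer> customers)
    {
        // Hand-coded: the join strategy, and any indexing or parallelism,
        // is ours to design and get right.
        var matches = new List<(Order, Customer)>();
        foreach (var o in orders)
            foreach (var c in customers)
                if (o.CustomerId == c.Id && o.Total > 100)
                    matches.Add((o, c));
        return matches;
    }
}

// Declarative equivalent; the engine picks indexes, join order, parallelism:
//   SELECT * FROM orders o
//   JOIN customers c ON o.customer_id = c.id
//   WHERE o.total > 100;
```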

SQL constrains what you can ask for, but in return it exploits all manner of optimizations and parallelizations without your lifting a finger. Fran's focus is on high-performance scientific computing. If the people doing a weather simulation say "I need the solution to Ax=b" rather than coding a matrix inverse or an LU decomposition by hand, the compiler can provide a very efficient implementation, which may exploit optimizations that the programmers could not see. She isn't saying "Throw out C#, write your web applications in MATLAB"; but she is saying that current practice makes it hard for optimization researchers to help your code run faster.
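In that spirit, here's what the "state the problem" style looks like for Ax=b, using Math.NET Numerics (the library choice is mine, purely for illustration):

```csharp
using MathNet.Numerics.LinearAlgebra;

class SolveDemo
{
    static void Main()
    {
        // "I need the solution to Ax = b": one call, and the library picks
        // the factorization, instead of us hand-rolling an LU decomposition.
        var A = Matrix<double>.Build.DenseOfArray(new double[,]
        {
            { 4, 1 },
            { 1, 3 },
        });
        var b = Vector<double>.Build.Dense(new[] { 1.0, 2.0 });
        var x = A.Solve(b);   // x such that A*x = b

        System.Console.WriteLine(x);
    }
}
```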

Conclusion

Communicate better. It’s better for you, your message, and us.

Howdy

This is written far after the fact, in the balmy year of 2016.

I'm Matt Moss. At the time I started writing this blog, I was a student and software developer at Texas A&M for the Energy Systems Lab (ESL). I graduated in 2011 and continued in software development for a local consultancy, as well as dabbling in oilfield electronics on the side.

This blog started out as a way to generate an excuse to practice for the ACM ICPC contest. To that effect I'd look at past ACM problems, Project Euler problems, anything that struck my fancy, and try to solve them. The goal was to improve both implementation time and general problem-solving/recognition skills. Eventually I also started learning electronics, and some of my early projects from that ended up here, as well as a few philosophical diatribes. Now that I'm out of college, I'm trying to maintain the blog mostly as a project repository. And/or a place to write when I feel like writing.

You might be curious about the name. Coder Tao is not a fascinating name based on the balance that should come from being able to solve complex problems semi-elegantly while also operating within the confines of project scope. Nor is it an endorsement of Asian religions. Nor is it a statement involving torque and the coding folk. Tao is the name of a gnome sorcerer. A machine of unimaginable destruction and chaos (but mostly destruction). And now it is a name.