ACM 2008 SCUSA Wormholes- 2nd Edition

Last time we left our time travelling hero’s, they were using a repeated applications of the Bellman-Ford algorithm (at time complexity O(|V| * |E|), where |E| is on par with |V|, and number of requests is probably also on the order of |V|, giving a total of O(|V|^3)). Our valiant programmer desperately seeks to apply an the Floyd-Warshall algorithm to the problem, just so that the time spent reading the analysis of algorithms book isn’t wasted.

The secret to successfully applying the Floyd-Warshall algorithm to this problem is the same secret to getting reasonable performance out of the previous solution. You can’t look at each individual year, you instead must look at time spans between meaningful events. That brings the number of vertices down from many thousand to at most 301 (starting year + 200 distinct wormhole years + 100 mission years). It has the downside of making the process of finding weights a fair bit more complex:

public int[,] BuildInitialWeights(List<int> distinctVerticies)
{
    int[,] weights = new int[distinctVerticies.Count, distinctVerticies.Count];
    for(int i=0; i<distinctVerticies.Count; i++)
        for(int j=0; j<distinctVerticies.Count; j++)
        {
            if (i == j) weights[i,j] = 0;
            else weights[i,j] = Int32.MaxValue;
        }
 
    //assumes vertices are in order, from first year to last
    for (int i = 1; i < distinctVerticies.Count; i++)
    {
        int start = distinctVerticies[i - 1];
        int end = distinctVerticies[i];
        weights[i-1, i] = end - start;
    }
    foreach (int[] years in wormholes)
    {
        int i = distinctVerticies.IndexOf(years[0]);
        int j = distinctVerticies.IndexOf(years[1]);
        int weight;
        if (years[1] > years[0]) weight = (years[1] - years[0]) / 2;
        else weight = (years[1] - years[0]) / 4;
        weights[i, j] = weight;
    }
    return weights;
}

Here we’re building a 2D matrix of weights. Informally we’re mapping each year in the distinct vertices list to its index, then using that index as coordinates in the weight matrix. Weights are constructed by first setting all weights to their default values (0 to get to current node, max value for other nodes). Then default weights for moving forward with time are assigned. Finally, wormholes are accounted for. Wormhole accounting is done last as it will be, by definition, the shortest path between two points. This has the advantage that only one path is ever considered between two points, where as the previous solution allowed for multiple paths.

public List<int> BuildDistinctVerticies(List<int> addedYears)
{
    List<int> years = new List<int>(addedYears);
    foreach (EndPoint wormhole in wormholes)
    {
        years.Add(wormhole.Start);
        years.Add(wormhole.End);
    }
    years.Sort();
    for (int i = 1; i < years.Count; i++)
        if (years[i - 1] == years[i])
            years.RemoveAt(i--);
    return years;
}

This is the simple method for building all distinct vertices. It starts by taking all mission and starting years, adds each end point from a wormhole to the list, then sorts it and removes duplicates. Again we get into ‘it would be nice to have a Sorted Set’ (Should appear in .NET 4; or, alternatively, has always been in Java… accursed Java with its nicer Collections libraries…).

public int[,] AllPairs(List<int> missionYears, out List<int> distinctVerticies)
{
    distinctVerticies = BuildDistinctVerticies(missionYears);
    int[,] currentWeights = BuildInitialWeights(distinctVerticies);
    int[,] nextWeights = new int[distinctVerticies.Count, distinctVerticies.Count];
    for (int i = 0; i < distinctVerticies.Count; i++)
    {
        Extend(currentWeights, nextWeights, i);
        int[,] temp = currentWeights;
        currentWeights = nextWeights;
        nextWeights = temp;
    }
    return currentWeights;
}
 
public void Extend(int[,] previosLevel, int[,] thisLevel, int k)
{
    for(int i=0; i<previosLevel.GetLength(0); i++)
        for(int j=0; j<previosLevel.GetLength(1); j++)
        {
            int iToJ = previosLevel[i,j];
            int iToK = previosLevel[i,k];
            int kToJ = previosLevel[k,j];
            if (iToK != Int32.MaxValue && kToJ != Int32.MaxValue)
                thisLevel[i,j] = Math.Min(iToJ, iToK + kToJ);
            else thisLevel[i,j] = iToJ;
        }
}

I’m about 90-100% certain I misspelled previous. Ignoring that, here we can see Floyd-Warshall algorithm implemented. I don’t fully understand how it works. I’ve read that the ‘extension’ element is simulating whether or not the shortest path from i to j will go through k; but at this hour of the morning I still can’t quite wrap my head around how it works. Something to learn for the new school year.

The distinct verticies are passed as an out parameter to the caller, so that they have the map from index to year available to them.

List<int> years = new List<int>(missions);
years.Add(startingYear);
int[,] distances = timeline.AllPairs(years, out years);
 
foreach(int destination in missions)
{
    int i = years.IndexOf(startingYear);
    int j = years.IndexOf(destination);
    int toDistance = distances[i, j];
    int fromDistance = distances[j, i];
    if(toDistance==Int32.MaxValue || fromDistance==Int32.MaxValue)
        Console.WriteLine("IMPOSSIBLE");
    else
        Console.WriteLine(toDistance+fromDistance);
}

Finally, the relevant changes to the problem solving method. The down side to using this approach is that there isn’t a direct mapping from year to index; so a linear search is performed to get that mapping. It’s not really an issue, considering what else is being done, but it is a bit unfortunate.

Analysis
Psuedo-analysis might be more accurate. Comparing this to the previous solution, I like the fact that the algorithm seems more straight forward, and that in the end we end up with a massive table of possible paths, should our time travelling agents ever need to do strange other things. It also has a slight performance advantage over the straight version from last time. (It’s about on par with not continuing the Bellman Ford algorithm if no relaxations occur, though that might be in part due to poor implementation of that algorithm).

That said, in its current form, it adds about 30 lines of code, and the indirect mapping from index to representative year feels ungood. In the end, the question for ACM problems is ‘which could you code fastest with the lowest probability of error’, and looking at the two, I think the Bellman Ford would be a bit easier, specifically because it uses a series of edges, rather then a weight table. I could be wrong though, and the performance level does seem to be in favor of Floyd Warshall.

Side Rant
I seem to be rather poor at a very specific bit of contestifying. That is seeing points where you can discard unimportant information. In this problem, it was my attempt to treat every year as a vertex. In a problem about a disk defragmenter, I was tempted to treat the disk as a series of sectors rather then a series of filled and unfilled ranges (would have massively increased computation time to simulate, and probably over complicated the code; didn’t get to find out due to contest ending). I find that this seems like a common theme to me: I constantly attempt the simplest solution that could possibly work, and these problems are more often then not tuned to make that ‘the wrong way’. I suspect this comes from my many years of employment, where you’re not necessarily seeking the best solution, but rather just something that works well enough to get by. That is, of course, part of why I started this crazy blog: to get better at recognizing when I’m going down an incredibly wrong path.

Full Source Code for this solution.

ACM 2008 SCUSA Wormholes

South Central USA has to be one of the odder acronyms in common use… a single character away from SCUBA no less.

The last time I went to ACM, this problem was delegated to one of my team mates, and for one reason or another we never got it solved. I was reading through an Analysis of Algorithms book and came across a section on All Pairs Shortest Path and thought ‘This would apply perfectly and elegantly to this situation’. (I was wrong, but we’ll get to that soon). (Also, judge data can be had at this location, and I have been cheating miserably).

Synopsis
You work at Time Travel Inc. in the payroll department. Sick of keeping up with all the math related to time travel, you contracted yourself (while in college) to develop an application to determine how much a person has aged during their time travel, and whether or not their last journey was even physically possible. More specifically, time travel is acheived by portals which appear at specific moments in time, and exit at other specific moments, either aging or youthifying the human in the process, according to a specific formula; and ignoring the concept that you could die, or be negative years old.

Analysis
This is a graph traversal problem with negative weight directed edges. The goal is to find the minimum weight path that takes you from a start, to a destination, and back to a start; or equivelantly, two graph traversals.

As I said, I had read a bit on All Pairs Shortests Paths and was interested to see how it would handle this. Specifically, I was thinking ‘if you treat each year as a vertex, then each vertex has a weight one edge to the next year; and if a wormhole is present, an edge to that destination year. At this point in time, I thought the input was constricted to a few hundred years near the 2000’s (where else would you time travel to?). The actual range of years is from 1-10,000. 10,000 verticies, 2 10,000 x 10,000 matricies for storing path weights… no. Not gonna work. (As we’ll see in a bit, this is actually a bit of a lie).

Still curious to further my learning, I read up on the Bellman-Ford algorithm, which is a single source shortest paths algorithm. It works on the assumption that the absolute longest path will go through every single vertex in V, and therefore you can determine the minimum cost to get anywhere by going through each edge, relaxing the edge’s destination weight if appropriate, and then repeating that |V| times. Complexity of |V|*|E|. That actually sort of worked, in the sense that it produced an answer for most problems in under a minute. But, under a minute isn’t quite ideal.

To get reasonable response times, you can’t simulate every single meaningless year. You have to focus on only the meaningful ones, and declare edges between them. If you’re in 1950 and want to get to 2050, yes, you can look at it as 100 1 year edges, but it would be less pain to look at it as 1 100 year edge… or, if a wormhole happens to open up in 2012, a 62 year edge followed by a 38 year edge. The downside to ignoring meaningless years is this adds a fair bit of complex non intuitive code, just for the purpose of filling in the ‘time moves forward at a constant rate’ bits.

With that said, you could apply the same reasoning to all pairs shortest path, so long as you only kept track of the vertexes you actually cared about.

Source Code
As this one is a tad bit long, and far from being the greatest, I’m only going to post bits and peices of it.

public static void ProcessProblem(int id, TextReader input)
{
	Console.WriteLine("DATA SET #" + id);
	Timeline timeline = new Timeline();
	int wormHoleCount = ReadInt(input);
	for (int i = 0; i < wormHoleCount; i++)
	timeline.AddWormhole(ReadInts(input));
	int startingYear = ReadInt(input);
	int missionCount = ReadInt(input);
	for (int m = 0; m < missionCount; m++)
	{
		int destination = ReadInt(input);
		int distance = timeline.ShortestPath(startingYear, destination);
		int returnDistance = timeline.ShortestPath(destination, startingYear);
		if(distance==Int32.MaxValue || returnDistance==Int32.MaxValue)
			Console.WriteLine("IMPOSSIBLE");
		else
			Console.WriteLine(distance+returnDistance);
	}
}

Above is the primary problem solving bit. We create a timeline, register wormholes, then go through each mission and determine if the destination can be reached from the starting point, and if the starting point can be reached from the destination. Note that we’re using the max value of an int as a ‘no path’ return code.

class Timeline
{
	private List<int[]> wormholes=new List<int[]>();
 
	public void AddWormhole(int[] wormhole)
	{
		wormholes.Add(wormhole);
	}
 
	public int ShortestPath(int sourceYear, int destYear)
	{
		List<Edge> edges = new List<Edge>();
		List<int> vertexes=new List<int>();
		vertexes.Add(sourceYear); vertexes.Add(destYear);
		foreach (int[] wormhole in wormholes)
		{
			vertexes.Add(wormhole[0]);
			vertexes.Add(wormhole[1]);
			edges.Add(new Edge(wormhole[0], wormhole[1], true));
		}
		vertexes.Sort();
		for (int i = 1; i < vertexes.Count; i++)
			if (vertexes[i] == vertexes[i - 1])
				vertexes.RemoveAt(i--);
		for (int i = 1; i < vertexes.Count; i++)
			edges.Add(new Edge(vertexes[i - 1], vertexes[i], false));
 
		Dictionary<int, int> distances = new Dictionary<int, int>();
		distances[sourceYear] = 0;
 
		for (int i = 0; i < vertexes.Count; i++ )
		foreach (Edge edge in edges)
		{
			if (!distances.ContainsKey(edge.start)) continue;
 
			int distance = distances[edge.start] + edge.weight;
			if (!distances.ContainsKey(edge.end) || distances[edge.end] > distance)
				distances[edge.end] = distance;
		}
		if (distances.ContainsKey(destYear))
			return distances[destYear];
		return Int32.MaxValue;
	}
}

And here we have the meat of the problem. The graph traversal algorithm/timeline class. The first portion of the ShortestPath method is soley responsible for building a list of edges (note, third parameter to edge constructor is ‘isWormhole’). Once that is constructed, the Bellman Ford algorithm is repeatedly applied to get the distance from the start point to every other point. Bellman Ford performance can actually be improved slightly by not continuing to iterate if no relaxations were needed on a given iteration. On my machine, execution time went from about 3s to .5s. (Also, as a fun aside, note the sad lack of a .NET Set class).

That’s all for now. I think I’ll reimplement this with an all pairs shortest path just to see if it would meet time constraints on execution; and make a post about that later. Also, I’m planning to try and deploy/install Wordpress, with some form of auto syntax highlighting magic included, to make this blog thing a bit easier. We’ll see how that goes. And, for a third thought, I finally got started on LC3D, after mere months. I’m working on the GUI now, getting camera to work right, everything rendering, etc. Then I can start in on the fun that is integrating EVERYTHING. Yay.

Full Source Code for First Solution.

Developer Training

Let’s chat about an idea I had for a training project; something to bring someone from an academic environment into a mildly professional environment. When I first started at the ESL, I was given a relatively simple project… after I was given a fair amount of grunt work you’d expect of a high school intern. I spent more time then I should have on that project, but in retrospect I realize that, if adapted slightly, it’s perfect for bringing someone up to speed on things like databases, ORM, .NET, how the web actually works, etc.

Background
Part of what our lab used to do is maintain a database of hourly weather readings at sites across the state. In the past we received emails from the Meteorology department with a fixed width file that contained the relevant data; and we would parse these emails with a cron job on an ancient UNIX machine; the data going into an Informix database. For one reason or another this system needed to be shutdown (Meteorology stopped sending emails; UNIX machine got hacked and destroyed; desire to make the data more accessible; etc).

The original goal of this project was to replace the old system with a more accessible system; in our case using Microsoft technologies (SQL Server for data storage; and a .NET language for data retrieval). The new goal is to do this while iterating through technologies that you’ll encounter in your everyday life.

Data
I figure this is worth spelling out first. Previously we received data in the form of an email. Where precisely are we going to get weather data now? We can’t just go around installing internet enabled weather stations… this is only a training project, not a mass system deployment (sadly).

As it happens, NOAA provides the (current) weather readings in a freely available XML format. As an example: http://www.weather.gov/xml/current_obs/KCLL.xml is for Easterwood Airport, College Station, TX. They provide these for many locales, listed here: http://www.weather.gov/xml/current_obs/seek.php?state=tx&Find;=Find. Most of the XML files will update only once an hour, but there are a few that’ll update more often.

Goal
The goal is to, given a list of weather stations (either URL’s or just station names), write a service that periodically pulls down the weather data for each station, parses it, and puts it into some form of long term data storage. Duplicate readings (ie the weather station hasn’t updated) shouldn’t be stored. Also, the XML files contain a fair bit of extraneous data that can be done without; we can ignore anything extra.

Advised Process

  1. Start by getting source control setup.
    • If getting an svn server is too much overhead, look into using Bazaar, git, Mercurial, just some form of source control.
    • Setup a project / solution structure. In our lab we would have a root project folder containing overhead files (solution file, build files, etc), and then a Source folder and a Database folder under the root.
    • Throughout this I’ll mention that ‘everything should be in source control’ at the end of an iteration. This is not to imply that that is the one specific point to check in all your changes to date. Source control should be an integrated part of development: once you hit a milestone, it’s time to do a commit. Remember, ’commit early, commit often’. Just don’t commit broken things.
  2. For a first iteration, download a single file, parse it, and display it to the screen
    • Interjecting a bit of personal thought; when you first attempt this, try it using a Raw Socket, and the HTTP protocol. Not because it’s advisable (it’s not), but because it shows you the foundation on which the web is built. It’s good to understand this most basic building block.
    • Once you’ve had fun with raw sockets, look into what your language provides by default for making web requests. C# provides a WebRequest object with a Create method that can be used; Java, python, etc probably provide something similar.
    • For Parsing. DO NOT USE ANY METHODS ON THE STRING CLASS. Parsing XML by hand is something that you will be shot for: it’s a death penalty offense. Look into what your language provides for XML Parsing.
    • For storage; for this first iteration, just store to a flat file; one entry per line.

    And that’s it for the first iteration. Make sure everything is in source control, and have a senior fellow review what you’ve done. At this point the senior is looking for basic faults: did you name a public property ‘theTemperatureRightNow’ or ‘Temp’; as well as basic class structure.

  3. The next element is to add configuration, multiple stations, and the ’service’ aspect to it.
    • .NET configuration is usually done through a file called app.config (later renamed to AppName.<extension>.config), and is accessible through System.Configuration.ConfigurationManager ; I’m not sufficiently familiar with other languages to say what’s available; but writing a Config Manager isn’t terribly difficult if your language doesn’t provide it.
    • When I say service; I don’t necessarily mean a Windows Service (or unix dameon process); just a program that contains something akin to for(a very long time). At the simplest level, this can be done with a while(true) loop, with a 15 minute thread sleep. Your language may also provide a Timer construct specifically for firing off threads at specific intervals; using that would be perfectly valid too.
      NOTE: this isn’t something that would be repeated once every second, or once every minute. The data updates at best once every fifteen minutes; common ettiquete demands that ~fifteen minutes be your shortest pause.
    • Logging would also be good to play around with at this point. Most logging packages also come with a form of file configuration; look into playing with that a bit.

    At this point we have a configurable service that will pull down data from any number of weather stations, and parse out the bits we want. Put everything in source control, get a senior to do a quick review of what you’ve done, and listen to any comments they have.

  4. Next, we’re going to add database storage.
    • Determine an appropriate database schema. I would suggest two tables, one carrying weather station information, the other carrying reading information. A reading should have a link back to its weather station; and each table a primary key.
    • For this version, use the raw API’s provided by your platform/language; learn the joys of SqlConnection, SqlDataReader, and parametrized queries.

    So, again, everything in source control; have a senior look at it. At this point, database design and class structure are probably most important; the data storage layer should have a well defined, simple interface that doesn’t cause a headache on viewing.

    Also at this point, you should have a starting idea for what SQL is like. The project should be far enough along that it can collect data. Let it run for a day, and play around with getting interesting weather stats in SQL.

  5. An ORM layer
    • If the data access component was coded to both save and load Weather Readings, then it’s probably possible to extrapolate the fact that writing this transform code sucks. A lot. If left with only SQL, a developer can spend a few days working out data access for a mildly complex object; potentially weeks with things more complex.

      ORM fixes this by implementing a wrapper layer for the whole of SQL Transformation; your only job in turn is to define what’s relevant to the transformation.

    • Implement another Data Storage class using an ORM technology. Use the same interface as the SQL storage class above.

    That’s it for this one. Call a senior over to see what you’ve done; they should be checking to see that everything makes sense, looking for possible points of confusion, etc.

And that’s it for now. It could be extended to include things like Dependency Injection for choosing the output method; a web application fed off the database to get summary data, which can in turn evolve into an AJAX web app with web service back end. You could also go the route of a desktop summary viewer if that’s relevant to what you do. You can attempt more random and off the wall ideas- try to build a temperature map off the data and an overlay of the state map; AI algorithms for predicting the next hours readings; statistical analysis (linear regression to predict temperature at a station that didn’t report); feeding the data into a weather simulation in an MMO. All manner of things.

There’s also a bit of unit testing / TDD possibility, though I’m not certain it’s quite complicated enough to warrant it, and a TDD intro is probably better done by pairing at first, which goes against the goal of a (mostly) solo training project.

Conclusion
I have presented a poorly worded and untested training project. This project is suitable for use as a first project by interns and student developers. This project’s primary goal is to bring less experienced developers up to speed on shop practices, while also providing a reasonably interesting and non-daunting problem for the new developer to solve.

The primary problem with this approach is that it isn’t immediately relevant to what’s being done in your group. Once finished, you’ll still have to go through and bring the dev up to speed on your project. Hopefully the active training will take slightly less time as you don’t have to go through and explain the major concepts. And hopefully they’ll be less likely to commit major sins, but it’s still not ideal. The ideal would probably be to bring the dev up to speed through pair programming, and slowly let them start working on their own / take control.

All that said; it might be just me that’s interested in this stuff. I like being able to go back and say ‘yes, the high for last week was 103 F; with a max wind speed of 20 mph’. I can foresee some developers being less enthusiastic about developing this system. However, for those actually interested, I think this would be a good learning experiment.