Developer Training
Let’s chat about an idea I had for a training project: something to bring someone from an academic environment into a mildly professional one. When I first started at the ESL, after a fair amount of the grunt work you’d expect of a high school intern, I was given a relatively simple project. I spent more time than I should have on it, but in retrospect I realize that, if adapted slightly, it’s perfect for bringing someone up to speed on things like databases, ORM, .NET, how the web actually works, etc.
Background
Part of what our lab used to do was maintain a database of hourly weather readings at sites across the state. In the past we received emails from the Meteorology department containing a fixed-width file with the relevant data; a cron job on an ancient UNIX machine parsed these emails and loaded the data into an Informix database. For one reason or another this system needed to be shut down (Meteorology stopped sending emails; the UNIX machine got hacked and destroyed; we wanted to make the data more accessible; etc.).
The original goal of this project was to replace the old system with a more accessible one, in our case using Microsoft technologies (SQL Server for data storage and a .NET language for data retrieval). The new goal is to do this while iterating through technologies that you’ll encounter in your everyday working life.
Data
I figure this is worth spelling out first. Previously we received data by email. Where precisely are we going to get weather data now? We can’t just go around installing internet-enabled weather stations… this is only a training project, not a mass system deployment (sadly).
As it happens, NOAA provides the (current) weather readings in a freely available XML format. As an example: http://www.weather.gov/xml/current_obs/KCLL.xml is for Easterwood Airport, College Station, TX. They provide these for many locales, listed here: http://www.weather.gov/xml/current_obs/seek.php?state=tx&Find=Find. Most of the XML files will update only once an hour, but there are a few that’ll update more often.
Goal
The goal is to write a service that, given a list of weather stations (either URLs or just station names), periodically pulls down the weather data for each station, parses it, and puts it into some form of long-term data storage. Duplicate readings (i.e. the weather station hasn’t updated) shouldn’t be stored. Also, the XML files contain a fair bit of extraneous data; we can ignore anything we don’t need.
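To make the goal concrete, here is a minimal Python sketch of two of its pieces: accepting either a feed URL or a bare station ID, and skipping a reading when a station hasn’t updated. It assumes every station’s feed follows the same URL pattern as the KCLL example above, and it treats an unchanged observation time as “no update”. WeatherReading, feed_url and DuplicateFilter are hypothetical names, and the fields kept are only an illustrative subset.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: every feed follows the pattern of the KCLL example.
FEED_URL_TEMPLATE = "http://www.weather.gov/xml/current_obs/{station_id}.xml"


@dataclass(frozen=True)
class WeatherReading:
    """Hypothetical subset of the feed that we keep; everything else is ignored."""
    station_id: str
    observation_time: str  # timestamp text as reported by the feed
    temperature_f: Optional[float]
    wind_mph: Optional[float]


def feed_url(station: str) -> str:
    """Accept either a full feed URL or a bare station ID such as 'KCLL'."""
    station = station.strip()
    if station.lower().startswith(("http://", "https://")):
        return station
    return FEED_URL_TEMPLATE.format(station_id=station.upper())


class DuplicateFilter:
    """Remembers the last observation time seen for each station."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, str] = {}

    def is_new(self, reading: WeatherReading) -> bool:
        if self._last_seen.get(reading.station_id) == reading.observation_time:
            return False  # the station hasn't updated since the last poll
        self._last_seen[reading.station_id] = reading.observation_time
        return True
```

The duplicate check lives in memory here, so it forgets everything on restart; once readings go into a database, the same rule is better enforced there with a unique constraint.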
Advised Process
- Start by getting source control set up.
- If getting an svn server is too much overhead, look into Bazaar, Git, or Mercurial; just use some form of source control.
- Set up a project/solution structure. In our lab we had a root project folder containing overhead files (solution file, build files, etc.), with a Source folder and a Database folder under the root.
- Throughout this I’ll mention that ‘everything should be in source control’ at the end of an iteration. That doesn’t mean the end of an iteration is the one point at which to check in all your changes to date. Source control should be an integrated part of development: once you hit a milestone, it’s time to commit. Remember, ‘commit early, commit often’. Just don’t commit broken things.
- For a first iteration, download a single file, parse it, and display it on the screen.
- A bit of personal opinion: when you first attempt this, try it using a raw socket and the HTTP protocol. Not because it’s advisable (it’s not), but because it shows you the foundation on which the web is built. It’s good to understand this most basic building block.
- Once you’ve had fun with raw sockets, look into what your language provides by default for making web requests. In .NET, the WebRequest class has a static Create method that can be used; Java, Python, etc. provide something similar.
- For parsing: DO NOT USE ANY METHODS ON THE STRING CLASS. Parsing XML by hand is something you will be shot for: it’s a death penalty offense. Look into what your language provides for XML parsing.
- For storage in this first iteration, just write to a flat file, one entry per line.
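For the raw-socket exercise, the Python sketch below speaks just enough HTTP by hand to fetch one feed. It deliberately sends an HTTP/1.0 request: the server then closes the connection after the response and won’t use chunked transfer encoding, so “read until the socket closes” is a valid way to find the end of the body. The function names and the User-Agent string are hypothetical; there is no redirect handling, no HTTPS, and no error recovery. Many sites now redirect plain HTTP to HTTPS, so a 301 response instead of XML is a real possibility.

```python
import socket
from typing import Dict, Tuple
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Compose a minimal HTTP/1.0 GET request for a plain http:// URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {parts.hostname}",
        "User-Agent: weather-training-sketch/0.1",  # hypothetical agent string
        "",
        "",  # the two empty entries produce the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def raw_http_get(url: str, timeout: float = 30.0) -> Tuple[int, Dict[str, str], bytes]:
    """Send the request over a TCP socket and read until the server closes it."""
    parts = urlsplit(url)
    port = parts.port or 80
    with socket.create_connection((parts.hostname, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(url))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: the server closes the connection when done
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

Calling raw_http_get("http://www.weather.gov/xml/current_obs/KCLL.xml") returns the status code, a dictionary keyed by lower-cased header names, and the raw body bytes, which would then go to an XML parser.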
And that’s it for the first iteration. Make sure everything is in source control, and have a senior fellow review what you’ve done. At this point the senior is looking for basic faults (did you name a public property ‘theTemperatureRightNow’ or ‘Temp’?) as well as basic class structure.
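After the raw-socket exercise, the library route for the rest of the first iteration (fetch with the standard library, parse with a real XML parser, append one line per reading to a flat file) might look like the Python sketch below. The element names in FIELDS are an assumption about the feed’s layout, so check them against a real file; the helper names and the tab-separated line format are likewise hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumed element names; verify against an actual current_obs file.
FIELDS = ("station_id", "observation_time_rfc822", "temp_f", "wind_mph")


def fetch_feed(url: str, timeout: float = 30.0) -> bytes:
    """Download one feed using the standard library instead of a raw socket."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_reading(xml_bytes: bytes) -> Dict[str, Optional[str]]:
    """Pull out only the fields we care about; missing elements become None."""
    root = ET.fromstring(xml_bytes)
    return {name: root.findtext(name) for name in FIELDS}


def append_to_flat_file(path: str, reading: Dict[str, Optional[str]]) -> None:
    """Write one reading per line, tab-separated, in FIELDS order."""
    line = "\t".join(reading[name] or "" for name in FIELDS)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

Printing parse_reading(fetch_feed("http://www.weather.gov/xml/current_obs/KCLL.xml")) covers the “display it on the screen” step, and append_to_flat_file covers storage. Note that no string methods touch the XML itself; ElementTree does all of the parsing.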
- The next step is to add configuration, multiple stations, and the ‘service’ aspect.
- .NET configuration is usually done through a file called app.config (copied at build time to AppName.<extension>.config, e.g. AppName.exe.config), and is accessible through System.Configuration.ConfigurationManager. I’m not familiar enough with other languages to say what’s available, but writing a config manager isn’t terribly difficult if your language doesn’t provide one.
- When I say service, I don’t necessarily mean a Windows Service (or Unix daemon process), just a program that contains something akin to for(a very long time). At the simplest level, this can be a while(true) loop with a 15-minute thread sleep. Your language may also provide a Timer construct specifically for running work at fixed intervals; using that would be perfectly valid too.
NOTE: this isn’t something that should run once every second, or even once every minute. The data updates at best once every fifteen minutes; common etiquette demands that ~fifteen minutes be your shortest pause.
- Logging would also be good to play around with at this point. Most logging packages also come with a form of file-based configuration; look into playing with that a bit.
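Putting configuration, multiple stations, the polling loop, and logging together, a Python sketch might look like the following. It stands in for app.config with an INI file read by the standard configparser module; the section and key names ([service], stations, interval_minutes) are hypothetical, and KIAH and KDFW are only example station IDs. Following the note above, it refuses intervals shorter than fifteen minutes, and it waits on a threading.Event rather than sleeping so the loop can be stopped cleanly.

```python
import configparser
import logging
import threading
from typing import Callable, List, Tuple

log = logging.getLogger("weather_service")

MIN_INTERVAL_MINUTES = 15  # etiquette floor: never poll more often than this

EXAMPLE_CONFIG = """
[service]
stations = KCLL, KIAH, KDFW
interval_minutes = 15
"""


def load_config(text: str) -> Tuple[List[str], int]:
    """Return (station IDs, polling interval in seconds) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["service"]
    stations = [s.strip() for s in section["stations"].split(",") if s.strip()]
    minutes = section.getint("interval_minutes", fallback=MIN_INTERVAL_MINUTES)
    if minutes < MIN_INTERVAL_MINUTES:
        log.warning("interval %d min is too aggressive; using %d min",
                    minutes, MIN_INTERVAL_MINUTES)
        minutes = MIN_INTERVAL_MINUTES
    return stations, minutes * 60


def run_service(stations: List[str], poll_station: Callable[[str], None],
                interval_seconds: float, stop: threading.Event) -> None:
    """Poll every station, then wait; one failing station doesn't stop the rest."""
    while not stop.is_set():
        for station in stations:
            try:
                poll_station(station)
            except Exception:
                log.exception("polling %s failed", station)
        stop.wait(interval_seconds)  # returns early if stop is set
```

At startup you would configure logging once (Python’s logging.config.fileConfig reads logger settings from a file, the analogue of the file-based configuration mentioned above), then call run_service(stations, poll_one, seconds, threading.Event()), where poll_one is a hypothetical function that fetches, parses, and stores a single station.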
At this point we have a configurable service that will pull down data from any number of weather stations, and parse out the bits we want. Put everything in source control, get a senior to do a quick review of what you’ve done, and listen to any comments they have.
- Next, we’re going to add database storage.
- Determine an appropriate database schema. I would suggest two tables, one holding weather station information and the other holding reading information. A reading should link back to its weather station (a foreign key), and each table should have a primary key.
- For this version, use the raw APIs provided by your platform/language; learn the joys of SqlConnection, SqlDataReader, and parameterized queries.
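A sketch of the two-table schema and a parameterized insert follows. It swaps SQL Server for SQLite through Python’s built-in sqlite3 module so it runs without a server; the idea carries over to SqlConnection and SqlCommand, but the DDL would need adjusting (INSERT OR IGNORE, for instance, is SQLite-specific). Table and column names are hypothetical. The UNIQUE constraint on (station, observation time) moves the “don’t store duplicates” rule into the database itself.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS station (
    station_id  INTEGER PRIMARY KEY,
    code        TEXT NOT NULL UNIQUE,        -- e.g. 'KCLL'
    location    TEXT
);
CREATE TABLE IF NOT EXISTS reading (
    reading_id     INTEGER PRIMARY KEY,
    station_id     INTEGER NOT NULL REFERENCES station(station_id),
    observed_at    TEXT NOT NULL,            -- ISO 8601 timestamp
    temperature_f  REAL,
    wind_mph       REAL,
    UNIQUE (station_id, observed_at)         -- duplicate readings are rejected
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def station_key(conn: sqlite3.Connection, code: str,
                location: Optional[str] = None) -> int:
    """Return the primary key for a station, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO station (code, location) VALUES (?, ?)",
                 (code, location))
    row = conn.execute("SELECT station_id FROM station WHERE code = ?",
                       (code,)).fetchone()
    return row[0]


def save_reading(conn: sqlite3.Connection, code: str, observed_at: str,
                 temperature_f: Optional[float], wind_mph: Optional[float]) -> bool:
    """Insert one reading with bound parameters; return False for a duplicate."""
    with conn:  # commit on success, roll back on error
        key = station_key(conn, code)
        cursor = conn.execute(
            "INSERT OR IGNORE INTO reading "
            "(station_id, observed_at, temperature_f, wind_mph) VALUES (?, ?, ?, ?)",
            (key, observed_at, temperature_f, wind_mph))
        return cursor.rowcount == 1
```

Every value travels as a bound ? parameter rather than being pasted into the SQL string, the same discipline SqlCommand parameters enforce in .NET. The sketch assumes the feed’s RFC 822 timestamp has already been converted to ISO 8601 text (email.utils.parsedate_to_datetime can handle the parsing).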
So, again: everything in source control, and have a senior look at it. At this point, database design and class structure are probably most important; the data storage layer should have a well-defined, simple interface that doesn’t cause a headache on viewing.
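One way to keep the storage layer’s interface small is to define it as an abstract base class that the flat-file and database implementations both satisfy, so the service code never knows which one it is talking to. The Python sketch below (all names hypothetical) shows such an interface plus an in-memory implementation that is handy for tests; a SQL-backed class, and later an ORM-backed one, would implement the same two methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class WeatherReading:
    station_code: str
    observed_at: str  # ISO 8601 text, so string order matches time order
    temperature_f: Optional[float]
    wind_mph: Optional[float]


class ReadingStore(ABC):
    """Everything the rest of the service needs to know about storage."""

    @abstractmethod
    def save(self, reading: WeatherReading) -> bool:
        """Persist a reading; return False if it was a duplicate."""

    @abstractmethod
    def readings_for(self, station_code: str) -> List[WeatherReading]:
        """Return one station's readings, oldest first."""


class InMemoryStore(ReadingStore):
    """Test double: keeps readings in a dict keyed by (station, time)."""

    def __init__(self) -> None:
        self._readings: Dict[Tuple[str, str], WeatherReading] = {}

    def save(self, reading: WeatherReading) -> bool:
        key = (reading.station_code, reading.observed_at)
        if key in self._readings:
            return False
        self._readings[key] = reading
        return True

    def readings_for(self, station_code: str) -> List[WeatherReading]:
        matches = [r for r in self._readings.values()
                   if r.station_code == station_code]
        return sorted(matches, key=lambda r: r.observed_at)
```

Two methods is deliberately spare: anything more (summaries, date ranges) can be added when a caller actually needs it, and each addition is one more thing every implementation has to support.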
Also at this point, you should have a basic idea of what SQL is like. The project should be far enough along that it can collect data. Let it run for a day, and play around with getting interesting weather stats out of SQL.
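As an example of the kind of weather stats meant here, the query below finds each station’s daily high temperature and peak wind. It runs against SQLite through Python’s sqlite3 module so it is self-contained; the single-table layout is a simplified, hypothetical one, and the sample rows are made up purely so the query has something to work on. On SQL Server, date(observed_at) would become something like CAST(observed_at AS date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reading (
    station_code   TEXT NOT NULL,
    observed_at    TEXT NOT NULL,   -- ISO 8601 text
    temperature_f  REAL,
    wind_mph       REAL
);
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?, ?)",
    [
        ("KCLL", "2024-07-01T13:53:00", 101.0, 12.0),
        ("KCLL", "2024-07-01T16:53:00", 103.0, 20.0),
        ("KCLL", "2024-07-02T15:53:00", 99.0, 8.0),
    ],
)

# One row per station per calendar day: high temperature, peak wind, sample count.
DAILY_SUMMARY = """
SELECT station_code,
       date(observed_at)  AS day,
       MAX(temperature_f) AS high_f,
       MAX(wind_mph)      AS peak_wind_mph,
       COUNT(*)           AS readings
FROM reading
GROUP BY station_code, date(observed_at)
ORDER BY station_code, day
"""

rows = conn.execute(DAILY_SUMMARY).fetchall()
for row in rows:
    print(row)
```

Swapping MAX for MIN or AVG, or grouping by week instead of day, gets you most of the other obvious questions.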
- An ORM layer
- If you coded the data access component to both save and load weather readings, you’ve probably worked out that writing this transformation code sucks. A lot. Left with only raw SQL, a developer can spend a few days working out data access for a mildly complex object, and potentially weeks for more complex ones.
An ORM (object-relational mapper) fixes this by providing a wrapper layer that handles the whole object-to-SQL transformation; your only job in turn is to define what’s relevant to it, i.e. how your classes and properties map to tables and columns.
- Implement another Data Storage class using an ORM technology. Use the same interface as the SQL storage class above.
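To see what an ORM does on your behalf, here is a deliberately tiny, hypothetical mapper in Python: you describe the object (a dataclass) and name the table, and it derives the CREATE, INSERT, and SELECT statements and the row-to-object conversion from the field list. It is a toy for illustration only, with no relationships, change tracking, or migrations; for the actual exercise you would reach for a real ORM such as NHibernate or Entity Framework on .NET, or SQLAlchemy in Python.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import Generic, List, Type, TypeVar

T = TypeVar("T")


class Mapper(Generic[T]):
    """Derives SQL from a dataclass's fields so callers never write it by hand."""

    def __init__(self, conn: sqlite3.Connection, cls: Type[T], table: str) -> None:
        self.conn, self.cls, self.table = conn, cls, table
        self.columns = [f.name for f in fields(cls)]

    def create_table(self) -> None:
        cols = ", ".join(self.columns)
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {self.table} ({cols})")

    def save(self, obj: T) -> None:
        placeholders = ", ".join("?" for _ in self.columns)
        sql = (f"INSERT INTO {self.table} ({', '.join(self.columns)}) "
               f"VALUES ({placeholders})")
        with self.conn:
            self.conn.execute(sql, astuple(obj))

    def find_by(self, **criteria) -> List[T]:
        # Only known column names are interpolated; values are always bound.
        unknown = set(criteria) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        where = " AND ".join(f"{name} = ?" for name in criteria) or "1 = 1"
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE {where}"
        rows = self.conn.execute(sql, tuple(criteria.values())).fetchall()
        return [self.cls(*row) for row in rows]


@dataclass
class WeatherReading:
    station_code: str
    observed_at: str
    temperature_f: float
    wind_mph: float
```

Adding a field to WeatherReading updates every generated statement automatically, which is exactly the hand-written transformation code the raw-SQL version made you maintain yourself.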
That’s it for this one. Call a senior over to see what you’ve done; they should be checking to see that everything makes sense, looking for possible points of confusion, etc.
And that’s it for now. The project could be extended with things like Dependency Injection for choosing the output method, or a web application fed off the database to show summary data, which could in turn evolve into an AJAX web app with a web service back end. You could also go the route of a desktop summary viewer if that’s relevant to what you do. You can attempt more random and off-the-wall ideas: a temperature map built from the data and overlaid on a state map; AI algorithms for predicting the next hour’s readings; statistical analysis (linear regression to predict the temperature at a station that didn’t report); feeding the data into a weather simulation in an MMO. All manner of things.
There’s also some room for unit testing / TDD, though I’m not certain the project is complicated enough to warrant it; besides, a TDD intro is probably better done by pairing at first, which goes against the goal of a (mostly) solo training project.
Conclusion
I have presented a poorly worded and untested training project. It should be suitable as a first project for interns and student developers. Its primary goal is to bring less experienced developers up to speed on shop practices, while also giving the new developer a reasonably interesting and non-daunting problem to solve.
The primary problem with this approach is that it isn’t immediately relevant to what’s being done in your group. Once they’re finished, you’ll still have to bring the dev up to speed on your project. Hopefully that active training will take less time, since you won’t have to explain the major concepts, and hopefully they’ll be less likely to commit major sins, but it’s still not ideal. The ideal would probably be to bring the dev up to speed through pair programming, and slowly let them start working on their own / take control.
All that said, it might be just me who’s interested in this stuff. I like being able to go back and say ‘yes, the high for last week was 103 °F, with a max wind speed of 20 mph’. I can foresee some developers being less enthusiastic about building this system. However, for those who are interested, I think this would be a good learning experiment.
Tags: side projects, training
