
Lots of updates

posted Oct 21, 2008, 9:19 AM by Brian Tanner   [ updated Oct 21, 2008, 9:26 AM ]


Introduction

I just wanted to make a post here to let people know that the project is still active and I'm working on it :)

I've been working hard on cleaning up the code and making it more robust to network failures and more user friendly.  I've also sped up the aggregator drastically: status messages now go through SQS instead of being uploaded as S3 files, provided they fit into a single SQS message once compressed.
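To make the "compress it and see if it fits" check concrete, here is a minimal sketch in Java. It is not the project's actual code: the class and method names (StatusMessagePacker, packForQueue) are made up for illustration, and the size limit is passed in as a parameter rather than hard-coded, since the real limit depends on the queue service.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the project described in this post.
class StatusMessagePacker {

    /** Gzips the status text and Base64-encodes it so it can travel as a text body. */
    static String compressToText(String status) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(status.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /**
     * Returns the encoded body if it fits within maxMessageBytes.
     * An empty result means the caller should fall back to uploading a file.
     */
    static Optional<String> packForQueue(String status, int maxMessageBytes) {
        String body = compressToText(status);
        // Base64 output is plain ASCII, so the character count equals the byte count.
        return body.length() <= maxMessageBytes ? Optional.of(body) : Optional.empty();
    }
}
```

A caller would send the returned body as the queue message when it is present, and otherwise keep using the slower file upload path.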

Being robust to failures when connecting to the signature provider or SQS has hugely improved the uptime of the nodes.  We're also managing the status of the nodes by having them report a heartbeat to Amazon's SimpleDB, which has made the commander much more reliable and has reduced the complexity and computation on all the nodes.
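The heartbeat idea can be sketched roughly like this. This is only an illustration under assumptions: an in-memory map stands in for SimpleDB, and the names (HeartbeatTable, reportHeartbeat, isAlive) and the timeout are hypothetical, not taken from the project.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the shared heartbeat store; the real project uses SimpleDB.
class HeartbeatTable {
    // Maps a node id to the time (in milliseconds) of its most recent heartbeat.
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatTable(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by a node to record that it was running at nowMillis. */
    void reportHeartbeat(String nodeId, long nowMillis) {
        lastSeenMillis.put(nodeId, nowMillis);
    }

    /** Called by the commander: a node is alive if its last heartbeat is recent enough. */
    boolean isAlive(String nodeId, long nowMillis) {
        Long lastSeen = lastSeenMillis.get(nodeId);
        return lastSeen != null && nowMillis - lastSeen <= timeoutMillis;
    }
}
```

Note that in a scheme like this sketch, a heartbeat only shows that the reporting code is still running. It says nothing about whether the node is making progress on its work, and a timeout that is too short will flag slow but healthy nodes as dead.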

However, we're still running into problems where some nodes are being marked as dead when they are not, or are marked as alive when they are clearly not doing work anymore.  I've been taking a shotgun approach and running 60 nodes at a time.  This was good for catching which failures were common and for quickly weeding out issues.  However, I think we need to take a more careful approach now: running only a few nodes at a time, evaluating them carefully, and seeing what goes wrong.

Logs

I've embraced the Apache logging project and Java's logging facilities, and we're now not just dumping errors to the console; we're actually logging messages at all the different levels to files.
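For anyone curious what file-based, leveled logging looks like, here is a small sketch using only the standard java.util.logging package. It is not the project's logging setup (which also involves the Apache logging project); the class name, method name, and parameters are invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper showing file logging with java.util.logging.
class FileLoggingSetup {

    /** Creates a logger that appends messages at or above minimumLevel to logFilePath. */
    static Logger createFileLogger(String loggerName, String logFilePath, Level minimumLevel) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setUseParentHandlers(false); // do not also echo to the console
        logger.setLevel(minimumLevel);
        try {
            FileHandler handler = new FileHandler(logFilePath, true); // true = append
            handler.setFormatter(new SimpleFormatter());              // plain text, not XML
            handler.setLevel(minimumLevel);
            logger.addHandler(handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logger;
    }
}
```

With a logger created at Level.INFO, calls such as logger.warning("lost connection to queue") and logger.severe("node shutting down") end up in the file, while logger.fine(...) messages are dropped.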

Passwords And Such

There is now a configuration system where you can pass parameters at the command line like paramName=paramValue.  There is also a parameter called settingsfile, which you can point at a file that has a name=value pair on each line.  This is a good way to save passwords and similar settings.  I'm going to improve it to use a default file in .recordbook in your home directory.
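Here is a rough sketch of how name=value parsing with a settingsfile parameter could work. It is an illustration only: the class and method names are hypothetical, and the rule that command-line values override values from the file is an assumption, not something the project is confirmed to do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical settings loader, not the project's actual configuration code.
class Settings {

    /** Splits "name=value" at the first '=' and stores it; text without a name is ignored. */
    static void putPair(Map<String, String> target, String pair) {
        int eq = pair.indexOf('=');
        if (eq > 0) {
            target.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
    }

    /** Builds the settings from command-line arguments plus an optional settingsfile. */
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> commandLine = new LinkedHashMap<>();
        for (String arg : args) {
            putPair(commandLine, arg);
        }
        Map<String, String> result = new LinkedHashMap<>();
        String file = commandLine.get("settingsfile");
        if (file != null) {
            try {
                for (String line : Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8)) {
                    putPair(result, line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Assumption: values given on the command line win over values from the file.
        result.putAll(commandLine);
        return result;
    }
}
```

Called with arguments like nodes=3 settingsfile=/path/to/settings.txt, this returns a map holding every pair from the file plus nodes=3, so passwords can stay in the file and off the command line.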
