06 6 / 2008

Re-testing the cloud (or building better redundancy) [QA/cloud/observations]

Once more, the Twitter world is all aflutter over an unfortunate outage, and because of it some people are making an exodus to FriendFeed, Pownce (both personal favorites of mine), and whatever else is out there. The same kind of downtime has (once more) reared its ugly head at Amazon Web Services, which I so revere despite the ups and downs. I guess a new business service should spring into action to provide early alerts and reporting on these unfortunate events, but before that happens, maybe the respective QA teams can step up to the plate and report these potential failures early.

In fact, in a previous post I discussed performance monitoring services, some of which may be reasonably priced enough for start-ups to jump on board. But I digress from my focus on QA, which I believe sits at the root of these issues. Perhaps the test teams at both Twitter and Amazon can improve their risk assessments of unplanned server shutdowns, and at least address the impact these events have on businesses that rely on the cloud's 24/7 uptime.

There are a number of performance tests that can help a service stay reliable during its production run. These include proper failover tests, configuration tests, system failure checks, and system resource checks, but all of these cover the physical side of the infrastructure. Other tests can check that new or updated code is thoroughly tested in environments identical to production, ensuring that the applications keep running despite the changes. That includes using realistic test data drawn from production, so that every change is exercised against reusable, redundant data. These types of tests are textbook, which leads me to believe that these companies are not taking serious note of what may jeopardize their future. All of these tests may primarily suit Amazon Web Services and may not offer much for Twitter. But if you run a blog that monetizes Twitter feeds for news and information, then this too deserves careful consideration, or at the very least thorough testing.
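To make the failover idea a little more concrete, here is a minimal sketch of a failover test in Python. Everything in it is hypothetical: the `Node` and `Cluster` classes stand in for real servers and a load balancer, and a server shutdown is simulated by flipping a flag rather than terminating a real instance. The test shuts the primary down halfway through a run and collects the responses so you can check that every request was still served by the replica.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a server; 'up' is toggled to mimic a shutdown."""
    name: str
    up: bool = True

    def handle(self, request: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}:{request}"


@dataclass
class Cluster:
    """Hypothetical router: sends each request to the first healthy node, primary first."""
    nodes: list[Node] = field(default_factory=list)

    def handle(self, request: str) -> str:
        for node in self.nodes:
            try:
                return node.handle(request)
            except ConnectionError:
                continue  # this node failed, so try the next replica
        raise ConnectionError("all nodes are down")


def failover_test(cluster: Cluster, requests: list[str]) -> list[str]:
    """Shut down the primary mid-run and return who served each request."""
    served = []
    for i, req in enumerate(requests):
        if i == len(requests) // 2:
            cluster.nodes[0].up = False  # simulate an unplanned server shutdown
        served.append(cluster.handle(req))
    return served


if __name__ == "__main__":
    cluster = Cluster([Node("primary"), Node("replica")])
    print(failover_test(cluster, ["a", "b", "c", "d"]))
```

In a real environment the shape stays the same, but the shutdown would be an actual instance termination in a production-like setup, and the check would cover response times as well as whether requests succeed.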

Changing perspectives, I need to put on an architect's hat and assume that their redundancy is "off-tilt" and perhaps should be redesigned. Maybe the cloud service companies should build a backup for their backup. With both companies' systems falling into limbo way too often, more and more customers may seek out other resources, or take the time to build their cloud needs in-house. All of this is in its infancy, and as the cloud matures, we may see backup services (hint hint) that maintain (and fully support) failover for cloud-based services.
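The case for a backup for the backup can be put in rough numbers. As a sketch, assume each copy of a service is independently up a fraction a of the time. The service is then down only when every copy is down, so n copies give an availability of 1 - (1 - a)^n. The 99% figure below is an illustrative assumption, not a measured number for Twitter or Amazon. Real outages are often correlated (shared data centers, shared code pushes), which makes this estimate optimistic.

```python
def combined_availability(availabilities: list[float]) -> float:
    """Availability of a service that stays up while ANY redundant copy is up,
    assuming (optimistically) that the copies fail independently."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # probability this copy is down too
    return 1.0 - p_all_down


HOURS_PER_YEAR = 24 * 365

if __name__ == "__main__":
    # Illustrative assumption: each copy is available 99% of the time.
    for copies in ([0.99], [0.99, 0.99], [0.99, 0.99, 0.99]):
        a = combined_availability(copies)
        downtime = (1.0 - a) * HOURS_PER_YEAR
        print(f"{len(copies)} copies: {a:.6f} available, ~{downtime:.2f} h down/year")
```

Under these assumptions a single 99% copy is down roughly 87.6 hours a year, while a second, independent copy cuts that to under an hour, which is the whole argument for redundancy in one line of arithmetic.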

I’m playing with this idea now.