Skype Outage: Too many holes in the official explanation for my liking

Having blamed Microsoft Windows updates for the collapse of the Skype network, the beleaguered p2p VoIP company has spun another yarn, now ‘clarifying’ that it’s not really Microsoft’s fault after all.

Their second explanatory post contains more hot air than a dodgy datacenter with a broken air conditioner.

I would urge you to read both posts, as they contain a couple of contradictions and curious points – including:

Post 1:
“The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.”

Post 2:
3. How come previous Microsoft update patches didn’t cause disruption?
That’s because the update patches were not the cause of the disruption.

This seems very odd, given that Microsoft updates routinely require a restart. There was nothing different about this latest Windows update on that front. Thus to say that the reboots caused the outage makes no logical sense without the addition of a further factor (which they don’t appear to be disclosing).

As GigaOm rightly questions – how come this happened on a Thursday, when MS patches are released on a Tuesday? There appears to be no answer to this.

Skype have attempted to reassure us that this won’t happen again:

“Yes, the bug has been squashed. The fix means that we’ve tuned Skype’s P2P core so that it can cope with simultaneous P2P network load and core size changes similar to those that occurred on August 16.”

However, the lack of transparency as to where this fix lies (and thus where and what the actual problem is) makes this less convincing. If the fault and fix lie with the clients (the Skype softphones) then this fix is only good if everyone updates – which seems unlikely given some less than compelling new features. If people don’t update then the problem remains.

If the fix is in their server architecture (that is under their control), then that says a lot about just how ‘distributed’ and ‘p2p-like’ Skype is.

For me, further explanation is needed before I can feel that Skype can be relied on as a robust business tool (of course, Skype is not a replacement for emergency calls and I am not assuming that degree of reliability and resilience).

I wrote back in early 2006 that the Skype network relied too much on so-called ‘supernodes’ and I personally believe that the main element of the ‘perfect storm’ (Skype’s description) that they’re not disclosing is that the Skype network is running too low on supernodes.

Normal Skype clients that happen to be running on un-firewalled connections are candidates to become supernodes. In order to relay Skype calls across the internet, Skype uses these unwitting users’ computers and bandwidth, degrading their memory, CPU and network throughput performance. There is no benefit or pay-off for being a supernode and there’s no opt-out or indication that your Skype client is being used as one.

I can’t think of any reason why anyone would want to leave their computer in a position where it could become a Skype supernode, and it is for this reason that I believe their availability has dwindled and thus the integrity of the Skype network could be on a knife-edge.
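To illustrate why a thin supernode pool would matter, here is a rough toy model in Python. It is only a sketch of my argument, not Skype's actual algorithm: the function name, the per-supernode capacity, the "a tenth drop out each round" rule and all the numbers are assumptions made up for illustration. It shows that the same wave of reboots can be harmless when there is headroom and fatal when there is not.

```python
def surviving_supernodes(supernodes, clients, capacity, rebooting_fraction):
    """Toy cascade model (hypothetical, not Skype's real behaviour).

    supernodes         -- supernodes available before the reboot wave
    clients            -- clients that need a supernode to relay for them
    capacity           -- assumed max clients one supernode can serve
    rebooting_fraction -- share of supernodes that restart at the same time
    """
    # Supernodes left once the rebooting machines have gone offline.
    alive = round(supernodes * (1 - rebooting_fraction))

    # While the average load per supernode exceeds capacity, assume a tenth
    # of the remaining supernodes (at least one) give up each round.
    while alive > 0 and clients / alive > capacity:
        alive -= max(1, alive // 10)

    return alive


# Same reboot wave (30% of supernodes), two different starting pools.
healthy = surviving_supernodes(1000, 50000, 100, 0.3)
thin = surviving_supernodes(600, 50000, 100, 0.3)
print(healthy, thin)
```

In this model, once the load per supernode tips over capacity, every drop-out pushes more load onto the survivors, so the pool never recovers on its own. That is the kind of "further factor" a routine round of reboots would need in order to take a network down.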

Far more transparency and technical explanation is needed from parent company eBay to allay my concerns. Given it relies on the same network architecture, anyone looking to use Joost as a broadcast platform would be wise to keep across these developments too.

4 Comments

Joe Cartoon

Wow, this smells like the largest load of BS since Microsoft said it wasn’t worried about Linux.

Let’s think here….

Reboots caused the outage? Am I the only person running Windows that has to boot occasionally? Are there a number of Skype users that shut down, log off, or make their PCs unavailable at any given point in the day?

As stated previously, the outage didn’t happen on Tuesday when MS issued the updates.

Why is Skype being so vague about the cause of the outage? Is it because if we know what REALLY happened that we’ll be scared SH!TLE$$ and cancel our service right away?

That’s starting to sound like a good idea to me.

August 21, 2007

Flotsam

The Tuesday/Thursday thing is quite a valid reason. Smart people wait until they see how many others end up with broken systems caused by patches before diving in willy-nilly. And corporates that run their Windows networks intelligently (yeah, oxymoron time) use tools like WSUS to roll out updates to production machines only after spending a day or two of usage on test machines.

This, of course, doesn’t mean that Skype isn’t another PoS from the people that brought Kazaa and accompanying infestations to the world at large 🙂

August 21, 2007

Ian Nock

Skype needs to clear the air. Reasoned outages are forgivable (as long as they take public steps to remove the failure mode) but creating uncertainty and doubt is the first step to destroying confidence in the system and then the brand.

Many people who use Skype have now lined up a backup, and in some instances some will be taking steps to make Skype their backup and not their primary voice and text communications service. Bad news for Skype.

August 22, 2007
» Skype, i servizi in Beta, e come _non_ si fa customer care - TechRadar - Skype, Radar, O'Reilly, ipv6, Blue Rey, Jabber, Security, SKY

[…] Skype Outage: Too many holes in the official explanation for my liking, post of August 21 on Ben’s blog […]

December 18, 2007
