Written By: lprent
Date published: 11:08 am, April 1st, 2011 - 9 comments
Categories: admin, The Standard
Tags:
We have had some routing problems this morning with several networks used by our provider in San Diego. The problems have been ongoing, with the network links appearing and disappearing from the perspective of NZ and Aussie connections (there is less of an issue from other overseas networks). They’re working on it. It started just before 7am and was starting to look stable at about 11am.
Ironically, I’m just waiting for some due diligence to be done on a local supplier for the election-year server – ie that they won’t just fold if a malicious complaint is made. The current warm backup system can’t handle the load (it was having problems handling the comments being fed to it), so it is currently off. The new server was due to get configured this weekend, but that is looking less likely because it is unlikely to be provisioned before Monday.
And no, this isn’t an April Fools’ joke… But since we last had a failure almost exactly a year ago, at the end of March running over April 1, I’m starting to get superstitious.
The server will be getting hardware changes this evening starting at 10pm NZDT.
The site will be offline for some hours.
Access seems fine right now.
Also, I use DreamHost. Cheap, reliable, good customer service, and a reasonable AUP. I’m not sure how robust they are to takedown complaints, but their policies seem quite firm (http://abuse.dreamhost.com/libel/, for instance).
Also, if you need a backup site, I may be able to provide something for free. Email me?
Will do.
These days the primary server is running on a reasonably recent dedicated dual-core Xeon that started out running well under 20% CPU at normal peak and 40% when spiking during the day last March, and now runs 40% at normal peak and 80% when spiking. The spiking typically happens when posts are being made and the SEO notifications kick off demand, while background processes like the search indexing run and multiple comments come in at the same time. Unfortunately I have to do capacity planning based on the spikes rather than the normal load, and it is difficult to prevent spiking in software (ie it is an OR queuing problem). So I need more cores and more CPU cycles.
The idea is to move to a much higher-spec system in NZ and leave our existing US system running in the warm backup role. The latter should have enough grunt in an emergency to carry us through the election-year growth (probably about another 50% in page views) if I turn off some of the background processes.
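For what it’s worth, the back-of-envelope sizing looks roughly like this – plan for the spikes and scale for the growth. The 60% spike-utilisation target is just a figure I’ve picked for illustration; the other numbers are the ones above.

    # Rough sizing sketch, not a formal capacity plan: size for the spikes,
    # not the averages. Only target_util is invented for illustration.
    current_cores = 2      # dual-core Xeon in the current box
    spike_util = 0.80      # CPU fraction during the worst spikes now
    growth = 1.5           # roughly another 50% of page views over election year
    target_util = 0.60     # hypothetical: how hard to run the new box in a spike

    needed = current_cores * spike_util * growth / target_util
    print("roughly %.1f cores needed" % needed)   # roughly 4.0 cores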
If we can’t find something suitable in NZ, then it is pretty easy to get offshore servers with more cores and more CPU cycles and where we don’t have the potential malicious legal issues we have in NZ.
FYI from the hosting company
Not quite as bad as the hosting companies I have had for different operations.
One was in Florida on the day they had a really good hurricane come through. They had multiple failures among their connection providers, resulting in only a single connection staying active. The volume of traffic was such that it kept overloading. After that we always kept the servers separated in different sites, and the sites were selected for their lack of expected natural disasters.
But of course we had a different host where a contractor working for a company down the street rammed a metal girder straight through their high-voltage underground power line. The power surge was sufficient to fry the switches that were meant to divert them to the battery backup system (and the generators that were meant to feed that). Consequently our servers in that site were out for a day while they rewired everything.
Disaster planning is a bit of a pain when traffic keeps rising. You usually find out it doesn’t work when the systems fail. Of course you could do what I did last month: I forced a test failure and found out that the warm backup couldn’t even get close to handling normal loads.
Anyway, I think I’d better order that server today. They may be able to provision it tomorrow. Otherwise it will be a week before I have time to do the setup.
Use Linode. Their VPS service is amazing and cheap! Steer clear of DreamHost; as a programmer I would expect nothing less than self-configured servers 😛
We can’t survive on VPSes. If you look at the ‘Online’ tab at the upper right you will see why. That is a snapshot of the number of readers (including the spiders) on the site within a couple of minutes. During our day that usually sits between 40-60, but it goes as high as 150 when comments spike. Overnight it drops down to a base load of 10-15, because that is when many of the spiders pick up their data, steadily reading their way through the 7000-odd posts and 250,000-odd comments that are online.
Remember that these are unique IPs. There are many, many connections behind that count for all of those little graphics, CSS, and JS.
We have been booted off three VPSes so far for having too much traffic. The last one was running purely as a warm backup, storing comment and post updates. Right now we’re on the verge of having to start spreading across several servers to handle the traffic, which is what I intend to set up when the new server comes online: NZ readers will read from the NZ server, and overseas users (mostly spiders) will read from the US server.
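Purely as an illustration of that split (not the actual config – in practice it will be done with geo-aware DNS or front-end proxy rules rather than application code), the routing decision amounts to something like this, with made-up hostnames:

    # Toy sketch of the intended split: known crawlers go to the US box,
    # everyone else to the NZ box. Hostnames are invented for illustration.
    SPIDER_TOKENS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandex")

    NZ_BACKEND = "http://nz-server.example"   # hypothetical new NZ primary
    US_BACKEND = "http://us-server.example"   # existing US box in the backup role

    def pick_backend(user_agent: str) -> str:
        """Send known crawlers to the US server, human readers to the NZ one."""
        ua = (user_agent or "").lower()
        if any(token in ua for token in SPIDER_TOKENS):
            return US_BACKEND
        return NZ_BACKEND

    print(pick_backend("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # US box
    print(pick_backend("Mozilla/5.0 (Windows NT 6.1; rv:2.0)"))     # NZ box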
Ahh, yeah that makes sense, EC2?
“Overnight it drops down to a base load of 10-15, because that is when many of the spiders pick up their data, steadily reading their way through the 7000-odd posts and 250,000-odd comments that are online.”
Isn’t there some way to force the spiders not to re-cache this stuff all the time? Most of those pages and comments won’t be changing very frequently (if ever), so they should be able to index it once and then not need to come back for a couple of months, and only then just to check that the pages are still valid.
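That last bit – checking that the pages are still valid without re-downloading them – is what HTTP conditional requests are for: if the server sends a Last-Modified header and answers If-Modified-Since with a 304, the spider can keep its cached copy. A minimal sketch of the idea (the timestamp, page body and port are stand-ins; this isn’t how the site actually serves pages):

    # Minimal sketch of Last-Modified / 304 handling so crawlers can revalidate
    # old pages cheaply instead of re-downloading them. Stand-in content only.
    from datetime import datetime, timezone
    from email.utils import format_datetime, parsedate_to_datetime
    from http.server import BaseHTTPRequestHandler, HTTPServer

    LAST_MODIFIED = datetime(2011, 3, 1, tzinfo=timezone.utc)  # stand-in timestamp
    BODY = b"<html><body>an old post that never changes</body></html>"

    class RevalidatingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            since = self.headers.get("If-Modified-Since")
            if since:
                try:
                    if parsedate_to_datetime(since) >= LAST_MODIFIED:
                        self.send_response(304)   # unchanged: spider keeps its copy
                        self.end_headers()
                        return
                except (TypeError, ValueError):
                    pass   # unparseable header: fall through to a full response
            self.send_response(200)
            self.send_header("Last-Modified", format_datetime(LAST_MODIFIED, usegmt=True))
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()
            self.wfile.write(BODY)

    if __name__ == "__main__":
        HTTPServer(("", 8080), RevalidatingHandler).serve_forever()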
What’s that cdn.topsy thing that can take f o r e v e r to load?
Another thing to do this weekend; I was meant to do it last week but I was helping to do the list.