cheating at hardware fixes

Somewhere around Wednesday evening, about 72 hours after I fixed the remote SSH problem by swapping the Plusnet-supplied Sagemcom router for a Netgear router, all port 80 and port 22 connections to server c1 started being dropped.

There was nothing I could do, because I was down in Bristol and server c1 needed an onsite visit back at the Nottingham datacentre.

Frustrating!

Eventually the weekend rolled around and I tottered off my sickbed and into the datacentre to begin explorations.

Server c1 is an HP DL380/G5.

It had just one (500 GB) disk, which contained all the CentOS 6.6 goodies that had been rolled out so far.

Which wasn’t much cop, because server c1 wouldn’t stay alive.

When I walked up to the cabinet, c1 was definitely receiving power, but was switched off.

I pushed the button and it whirred and whined, noisily, to life.

The console showed me the usual boot sequence.

Then server c1 just powered itself down.

I tried again; it booted up. This time it got as far as the CentOS login prompt.

And then powered down again.

Long story short, I removed the PSU from server c1, cleaned all the PSU and server-side contacts, and refitted it.

The server booted up and stayed up.

I logged in as root and performed some basic functions.

Everything looked fine.

Rather than leave things like that for the week, I decided I’d like to add some extra resilience to the situation.

I removed the PSU from server c2 (another HP DL380/G5) and slotted it into the spare PSU bay in server c1 (the HP DL380/G5 can run two independent PSUs at the same time).

So server c1 is now running two PSUs, and I’ll keep an eye on the server logs to see if the original PSU drops out, or if there are any more power-down problems.
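As a rough aid for that log-watching, something like the sketch below could do the job. It is only a minimal example: it assumes power-supply events end up in /var/log/messages (the CentOS 6 default) and that the relevant lines mention "power supply" or "PSU" somewhere, which are my guesses rather than anything c1 is known to log.

#!/usr/bin/env python
# Sketch: scan the syslog for power-supply chatter.
# Assumptions (not verified against c1's actual logs):
#   * events land in /var/log/messages
#   * the lines mention "power supply" or "psu"
import re
import sys

LOG_FILE = "/var/log/messages"   # assumed location
PATTERN = re.compile(r"power\s*supply|psu", re.IGNORECASE)

def scan(path=LOG_FILE):
    hits = []
    try:
        with open(path) as f:
            for line in f:
                if PATTERN.search(line):
                    hits.append(line.rstrip())
    except IOError as exc:
        sys.exit("could not read %s: %s" % (path, exc))
    return hits

if __name__ == "__main__":
    for hit in scan():
        print(hit)

Run from cron once a day (or eyeball it after the next visit) and any PSU grumbles should show up without having to trawl the whole log by hand.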
