cheating at hardware fixes

Somewhere around Wednesday evening, about 72 hours after I fixed the remote SSH problem by swapping the Plusnet-supplied Sagemcom router for a Netgear router, all port 80 and port 22 traffic to server c1 started being dropped.

There was nothing I could do, because I was down in Bristol and server c1 needed an onsite visit back at the Nottingham datacentre.

Frustrating!

Eventually the weekend rolled around and I tottered off my sickbed into the datacentre to begin explorations.

Server c1 is an HP DL380/G5.

It had just one (500GB) disk, which contained all the CentOS 6.6 goodies that had been rolled out so far.

Which wasn’t much cop, because server c1 wouldn’t stay alive.

When I walked up to the cabinet, c1 was definitely receiving power, but was switched off.

I pushed the button and it whirred and whined, noisily, to life.

The console showed me the usual boot sequence.

Then server c1 just powered itself down.

I tried again; it booted up. This time it got as far as the CentOS login prompt.

And then powered down again.

Long story short, I removed the PSU from server c1, cleaned all the PSU and server-side contacts, and replaced it.

The server booted up and stayed up.

I logged in as root and performed some basic functions.

Everything looked fine.
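For the record, ‘basic functions’ means nothing more exotic than the usual quick health checks, along these lines:

    uptime                        # load averages, and how long the box has stayed up
    df -h                         # disk space on the lone 500GB disk
    tail -n 50 /var/log/messages  # anything unpleasant in the system log?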

Rather than leave things like that for the week, I decided I’d like to add some extra resilience to the situation.

I removed the PSU from server c2 (another HP DL380/G5), and slotted that into the spare PSU bay in server c1 (the HP DL380/G5 servers can run two independent PSUs at the same time).

So server c1 is now running two PSUs, and I’ll keep an eye on the server logs to see if the original PSU drops out, or if there are any more power-down problems.
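For the log-watching, the OS logs will show any unexpected shutdowns, and HP’s hp-health package (assuming I install it; it provides the hpasmcli tool) can report PSU state directly:

    grep -i power /var/log/messages   # power-related events seen by the OS
    hpasmcli -s "show powersupply"    # per-PSU presence, condition and redundancy state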

remote SSH problems

I spent last weekend working through a real pain in the arse problem: couldn’t get remote SSH access configured on server c1.

Local access via console worked brilliantly.

And I could attach another device to the internal network and run SSH sessions to the internal IP that server c1 had been configured with.

But I couldn’t get SSH consistently working, in a stable, always-up, kind of way, via remote.

The best I could get was for remote SSH to stay up and running for around 20 minutes.

In the end, frustrated beyond belief, I binned the router that Plusnet had supplied (a neat-looking Sagemcom device).

Then I looked out a spare Netgear router that I had at home.

I copied the config details from my home-hosted NAS into the spare Netgear router, and installed that in the datacentre.

Changed the account credentials to match those of the datacentre, obv.

And lo and behold, I had remote SSH.

But for how long?

An hour later it was still working.

This was a new record.

Three hours later it was still up and running.

Twenty-one hours later, we were still golden.

So it seems, to me at least, that the Plusnet router had a ‘go to sleep’ rule in its firmware.

If it hadn’t seen any port 22 traffic for around 20 minutes, it shut down port 22.

And wouldn’t wake up when port 22 traffic came knocking on the door.

Marvellous.

Except not, obv.
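One footnote for anyone fighting a similar idle-timeout without a spare router to hand: SSH keepalives, which generate a trickle of traffic on an otherwise quiet connection, are the usual software workaround. Either end can do it (the ‘c1’ host alias below is just for illustration):

    # server-side, in /etc/ssh/sshd_config: check in with the client every 60 seconds
    ClientAliveInterval 60
    ClientAliveCountMax 3

    # or client-side, in ~/.ssh/config
    Host c1
        ServerAliveInterval 60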

But the Netgear router fixed that and I had remote SSH and public port 80 access to server c1.

For 72 hours.

ups downs and ups in the datacentre

It’s been a mixed bag on the datacentre project, this weekend.

I feel that I’m about half a dozen steps further forward, and have only taken one or two steps back.

But it has been a weekend of problems.

With the biggest obstacle to making progress, it took me a while to realise exactly what the problem was.

I had downloaded CentOS 7, as this was to be my operating system and virtualisation agent of choice for the hosting servers.

I’d set aside Saturday as the main day of installation.

I inserted the DVD media containing CentOS 7, and booted up the first server.

The system went through its normal start-up/boot sequences, and I took this opportunity to set the iLO2 config.

Then I set the system language, keyboard language, timezone and country settings.

And then the OS wouldn’t let me go any further because it said I had no storage space.

Except the server had half a terabyte of storage, live and flashing a green light at me.

I ran through the boot cycle five times, and each time the OS said I had no storage, while the flashing green light continued to contradict it.

I stepped through the server boot sequence, and sure enough the array controller said there was plenty of storage space.

So I did a google, and you know what?

It turns out that CentOS 7 has a compatibility problem with the HP DL380/G5 array controller (CentOS 7 dropped the cciss driver that the G5-era Smart Array controllers rely on).

So I downloaded the CentOS 6 ISO, and burned that to DVD.

Some hours later I put the CentOS 6 media in the server DVD drive and booted up.

Success!

After the installation, I ran the yum update command, except it wouldn’t run.

I tried several commands that needed network access, and none of them worked.

A bit more googling told me that, by default, CentOS 6.5 produces a closed server – unlike CentOS 7, which is the product I’d been doing all my reading on.

CentOS 6.5 ships with its network interfaces (eth0 and eth1) disabled; the root administrator has to enable them.

I did this, then ran yum update, and downloaded and installed 79MB of update packages.
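For anyone hitting the same wall, the ‘enabling’ amounts to flipping one flag per interface and restarting networking:

    # in /etc/sysconfig/network-scripts/ifcfg-eth0 (and ifcfg-eth1), change:
    ONBOOT=yes          # CentOS 6 minimal installs ship with ONBOOT=no

    service network restart
    yum update          # now able to reach the repositories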

I rebooted the system and then successfully pinged an FQDN or two.

Then I shut down the server and called it a day.

I had intended to get as far as enabling remote access via SSH, but I haven’t even got into firewall rules and security hardening.

And I know that’s another solid half-day of effort.

I’m guessing another 10 hours to bring just the first server in the cluster, to where I want it.

So that’s next weekend then.

freelancer required (CentOS/RHEL)

I’m looking for a very experienced, remotely-located freelancer for some ad hoc work on a small datacentre.

The skills required are:

  • CentOS
  • DNS Server
  • LAMP admin
  • Postfix
  • MariaDB/MySQL
  • PHP
  • Perl
  • Virtualisation
  • VPN

The work is in two areas:

  1. Project delivery (to assist with consultancy and advice, and, if things go wrong, to take a hands-on role in installation, setup, config)
  2. Ad Hoc support (on an ‘as required’ basis)

The pay will be an agreed hourly rate, paid by whatever means you prefer (PayPal, etc.).

I’m not too fussy where in the world you’re based – timezone parity isn’t a big deal for me.

If you’re interested in the role, please drop me a line in the comments box, and I’ll email you back, and we can take the conversation forward.

is it possible to have too many firewalls?

On my current setup, I have two firewalls.

The router has a firewall function.

And the NAS operating system has a firewall function (the native Linux-based feature).

The new setup will have these too.

But I’m considering putting one of these in, to sit between the router and the Linux firewall.

Is this too much?

Is too much even possible?

I think having a third layer of firewall protection is reasonable.

But do you?

servers

The HP DL380 G5 server needs two NICs (Network Interface Controllers), but has capacity for three.

There are two standard NIC ports, as you would expect from a heavy-duty internet server.

The third is an iLO2 port.

The reason servers have two NICs is to give the server high availability and redundancy, and to allow load balancing.

But the iLO2 port is a non-public, management-only route on to the server.

So while NIC port 1 and NIC port 2 are processing inbound/outbound packets, the iLO2 port just sits there, in a semi-dormant state.

iLO2 stands for ‘Integrated Lights-Out’ (v2).

The iLO2 function enables root admins to remotely access a failed/stalled server and perform all manner of remote management functions via a dedicated out-of-band channel, regardless of whether the machine is powered on, and regardless of whether an operating system is installed or even functional.

Neat, huh?
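To give a flavour: the iLO2 port runs its own SSH command line, so power-cycling a wedged server remotely looks roughly like this (the address is illustrative, and the exact commands vary a little between iLO firmware versions):

    ssh Administrator@192.168.0.50    # the iLO2 port's own IP, not the server's
    </>hpiLO-> power                  # show the current power state
    </>hpiLO-> power on               # power the server up, OS or no OS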

The downside is that one has to provide the HP DL380 with three network connections.

So that’s six network connections for two servers.

Nine network connections for three servers.

Etc.

Good job switches and patch panels exist, eh?

serving up new stuff!

Today I bought a pair of used HP DL380 (Gen 5) rackable servers.

The bottom line is that by buying them as a pair, I could get them for a very good, heavily discounted price.

The servers have been refurbed back to the core.

Each server comes with 2x quad-core 2.33GHz Xeon processors.

And each server also comes with 16GB of RAM.

So they are RAIDed, VMware-ready, and sufficiently powered to do the kind of thing I want them to do.

Well.

Overpowered might be more accurate.

But there’s no such thing as having too much processing power, eh?

The downside is that although each server has 8 bays, neither has any disks, so I’ll have to buy those separately.

I’ve decided that I’ll buy two pairs of disks, a pair for each server.

When I’ve installed the disks, I’ll install CentOS on each server, and virtualise the platforms.

Then, as I add more disks to each server, over time, I just need to resize the virtualised environments to take advantage of the increased disk capacity.

Within the virtualised environments on each physical server, I can create a number of virtual servers (VMs).

So each physical server could host, for example, ten VMs.

A total of 20 virtual servers.
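Assuming I go with the CentOS-native KVM stack for the virtualisation layer, spinning up each VM would be something along these lines (every name, size and path here is a placeholder):

    # create a 20GB, 1GB-RAM guest called vm01 from the CentOS install media
    virt-install \
      --name vm01 \
      --ram 1024 \
      --vcpus 1 \
      --disk path=/var/lib/libvirt/images/vm01.img,size=20 \
      --cdrom /isos/CentOS-6.6-x86_64-bin-DVD1.iso \
      --network bridge=br0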

I’m really looking forward to some first-class geekage.

I’m also very excited because this is the first step down the path of a new business venture that I want to explore.

serving up new stuff?

The price of mid-range servers has taken a big dip in recent months.

I’m seriously considering taking advantage of this.

The idea that I’m playing with is to grab a used server, installing ESX Server, and building virtualised machines on it.

The plan is to replicate everything the NAS currently does, but in a more scalable, faster environment.

The NAS has 256MB of RAM.

A decent used server comes with 16GB of RAM.

And that’s a huge difference.

Cost of a used server?

About £150.

Cost of ESX Server?

Free.

Cost of operating system?

Free.

Cost of software (DNS Server, Mailserver, Firewall, etc)?

Free.

Cost of database products?

Free.

It would be an interesting project.

And it fits nicely with my business idea.

things i have learned so far this year (2015)

  1. If I delete directories in the NAS that the NAS backup routine is set to back up on a scheduled job, the scheduled job will fail – even if 99.9% of the other directories are still in place/intact
  2. I can only set one NAS backup job to a specific external HDD. Therefore if I want to set two external backup jobs (even if they are configured to run days apart), I need to point each at a separate external device


updating and backing up

I had an interesting discussion yesterday, with a senior technologist, about the recent Drupal security update (which some members of the tech community are unkindly calling ‘a fiasco’).

Oh, I understand the bitterness around the short notice of the security update.

And I understand the raised hackles around the ‘do this within six hours or consider your systems to be compromised’ message.

But there’s a wider point about how we update our critical systems (and make no mistake, for a very large number of organisations, Drupal is a critical system).

And that’s where I’d like to go with these thoughts.

Mature (and that’s a key word) organisations require their suppliers to produce a release schedule.

Each release in that schedule will generate, for the users/customers, a full set of release notes, published prior to the release itself.

A good set of release notes will contain full details of all functional (and non-functional) changes.

But that’s the release schedule model for routine updates.

When urgent fixes are released (let’s assume we’re talking about infrastructure fundamentals here; operating systems and core application products such as databases, mailservers etc), they too should contain a good set of information about the change(s).

Whether we, as decision-making technologists, choose to have our systems automatically process these updates (i.e. before we have read the release notes), or not, is up to us.

It’s about how much control we choose to retain over our infrastructure.

Do we want to read the release notes before we install the updates?

It’s a personal thing.

The critical point, though, is that we ensure we take a whole-system backup, before we apply the updates.

I’m slightly OCD about my infrastructure.

I like to read the release notes before I apply a fix, even if it’s an urgent fix; I like to know what I’m applying, and what changes it will make to my systems.

And I’ll take an extra whole-system backup; everything from the operating system (but including the operating system) upwards, will get backed up and copied to a contingency location.

My core infrastructure is RAIDed.

So I’ll keep one compressed whole-system backup on the RAIDed infra.

And I replicate that backup to an off-system location.
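Mechanically, that’s nothing fancier than a tar of the root filesystem plus an rsync off-box (the paths and the offsite host here are placeholders):

    # whole-system backup, skipping pseudo-filesystems, kept on the RAIDed volume
    tar --exclude=/proc --exclude=/sys --exclude=/dev \
        --exclude=/backups -czpf /backups/full-$(date +%F).tar.gz /

    # replicate the compressed backup to the off-system location
    rsync -av /backups/ backup@offsite.example.com:/backups/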

But lately I’m wondering if I should introduce another layer.

Maybe I should consider having my system back itself up to an external device?

2TB.

Hmm.