failover

Last night/this morning I conducted a total brownout failover exercise.

What if everything in the world went down (or there was a major national grid outage, or similar, or Openreach forgot how to run a fibre network)?

What if someone dropped an aircraft on a regional power distribution centre?

What would happen to my extensive audio and video media library?

What would happen to my data?

Well, there’s only one way to find out (in a simulated kind of way, obv).

But first, before I did anything drastic, I plugged in my nice new shiny UPS unit.

And then I went to the MCB and flipped the switch.

And went to bed.

After six hours’ sleep I made tea, inspected the devastation and began reading the various server/device logs.

The router was dead, obv, but the NAS (in its capacity as DHCP router) was still up and showing full signs of alertness.

I could plug a monitor and keyboard directly into HP Server A (I really need to name these things) and HP Server B, and access the various media and data stores that are spread across these devices.

I could plug a laptop directly into the NAS and access the operating system as an administrator. I could also access a partitioned area of the volume and pick up any/all of the weekly contingency backup that was tucked away there.

And if I had integrated the router into this setup, my WiFi and local area network would have been up and running for me to do all of these things from the comfort of my own bed.

But I deliberately left the router out of the protected environment, because it wouldn’t have been a real test, would it?

I took the quiet time as an opportunity to deploy a WordPress upgrade across the WP estate, and I triggered a non-standard backup on to my emergency external device.

Then I restored power.

The router came back up (logs showed no issues), and the local area network popped back into place around the infrastructure.

And the UPS stood itself down and resumed charge mode.

So yeah, that was a pretty good test.

All server and system logs show no problems, and the router log merely records a hard shutdown.

Brilliant!


wordpress points of security

I may be preaching to the converted on this but…

I like WordPress (in fact I like most .php-based products).

I’ve tried Ghost and thought it was meh. I’ve worked with Drupal and thought it was alright (if slightly over-engineered).

But I *like* WordPress.

However, the more time I spend with it, the more I realise that WordPress has shortcomings, here and there.

It’s a pros and cons argument.

Yes, there is a very large user community developing features and facilities for the (functional and non-functional portions of the) application.

And this is good. This huge, hardcore team of developers is continually turning the WordPress product into a much more sophisticated tool.

But there are also some naughty people out there, attempting to bugger up some people’s WordPress installations.

Just for a laugh.

When I was in LA three years ago, one of my WordPress-based websites was hacked.

It was a relatively straightforward task to get into the back-end and fix the website. It was just frustrating that it had happened.

Though, interestingly, I believe I would fix the problem in a much simpler way these days.

But here are a couple of simple golden rules that everyone should undertake to protect their WordPress environment:

  • log in as administrator
  • create a new user (with a non-obvious name)
  • promote that user to administrator
  • log out as administrator
  • log in as the new administrator user you have just created
  • delete the old administrator account

And while I’m on the patronising subject of the blindingly obvious:

  • never publish content from an admin account – use an author/editor role
  • change your passwords frequently (and use a random password generator for security)

But don’t worry about deleting your admin user if you have been posting content from it – you can just assign your new author/contributor user as owner of the legacy content, and then you don’t lose anything.
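On the random-password point: if you have Python to hand, the standard library’s secrets module does the job in a few lines. A minimal sketch (the length and character set are my choices, adjust to taste):

```python
import secrets
import string


def generate_password(length=20):
    """Return a cryptographically random password."""
    # Letters, digits and punctuation; the secrets module is designed for
    # security-sensitive randomness, unlike the plain random module.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Call `generate_password()` (or `generate_password(32)` for something longer) and paste the result straight into your WordPress profile.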

Maybe you already know these things.

But we’re never too old to learn, are we?

not so odd probes

It’s a shame, but I can’t set a ‘deny by IP address’ rule in my router.

Last week I did manage (don’t ask) to create a port-based rule in my router. The rule automatically rejects any request to access a specific list of ports.

So the only requests that the NAS (in its role as secondary firewall) has to process now are port 80 (and other common http-based) requests.

For the last week the CPU utilisation on the NAS has remained steady in the 7-15% band.

(resourceMon graph: NAS CPU utilisation)

So is that it?

Have I made hosting into a trouble-free zone?

Interesting.

odd probes 2

What are Tech Mahindra playing at?

[LAN access from remote] from 199.68.218.129:47420 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:59
[LAN access from remote] from 199.68.218.129:34282 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:58
[LAN access from remote] from 199.68.218.129:39887 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:57
[LAN access from remote] from 199.68.218.129:39785 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:56
[LAN access from remote] from 199.68.218.129:50558 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:55
[LAN access from remote] from 199.68.218.129:51750 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:55
[LAN access from remote] from 199.68.218.129:49432 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:55
[LAN access from remote] from 199.68.218.129:13656 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:54
[LAN access from remote] from 199.68.218.129:13046 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:54
[LAN access from remote] from 199.68.218.129:32097 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:54
[LAN access from remote] from 199.68.218.129:12733 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:54
[LAN access from remote] from 199.68.218.129:54673 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:54
[LAN access from remote] from 199.68.218.129:43122 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:54
[LAN access from remote] from 199.68.218.129:37606 to 192.168.1.9:80 Saturday, Jan 11,2014 16:21:51
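Log excerpts like the one above are easy to summarise with a short script. Here’s a sketch in Python (the line format is assumed from the excerpt; adjust the regex for your own router’s log) that counts probes per source IP:

```python
import re
from collections import Counter

# Matches lines like:
# [LAN access from remote] from 199.68.218.129:47420 to 192.168.1.9:80 ...
LINE = re.compile(r"\[LAN access from remote\] from (\d+\.\d+\.\d+\.\d+):\d+")


def probes_by_source(log_lines):
    """Count remote-access log entries per source IP address."""
    counts = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run over the excerpt above, it would show every one of those fourteen probes coming from 199.68.218.129.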

odd probes

The memory test results came back from Synology. The results were both conclusive and inconclusive.

They proved that the environment is stable and not under any strain. But they were unable to demonstrate why CPU utilisation was close to 100%.

There were no rogue (or non-rogue) services or processes spinning (and thus eating up CPU).

There were no jobs ticking along in either TSR or background.

And there were no previously unknown index-based utilities doing indexy-related things.

So it was a bit of a mystery as to why CPU should be almost maxed out (yet delivery performance was unaffected by the high-90%s utilisation – because, remember, RAM utilisation was in the normal band).

I spent a couple of days wondering what could eat up CPU yet leave RAM relatively untroubled.

I was travelling to work, a few days ago, when I remembered a conversation I had with my friend Avril, many years ago – about processing architecture and IP packets.

Avril was a network genius. She holds a number of networking patents, and has worked at BT’s R&D division in Suffolk in a senior capacity. She also lectured, part time, at UWE on Real Time Information Systems.

We had a conversation, years ago, about processor architecture and how some types of processors are designed to dump processes into RAM, whilst others will handle simple positive/negative steps themselves.

It’s as clear as crystal, if you remember the big RISC v Intel debate.

I looked up the architecture specs for the processors in the NAS and discovered that they are designed on the RISC SPARC model.

Light dawned.

So it was entirely possible that the processors were performing simple RAM-type processes, without actually dumping them into RAM to be dealt with there.

On the way home I decided to check the router logs.

[later]

A look at the router logs confirmed what I’d thought: I was under a DDoS attack.

The router was stopping 95% of the probes, but the 5% that were getting through were enough to eat up almost all of the CPU. The NAS was rejecting the 5%, but that yes/no process was where the CPU utilisation was going.

At the peak of the attack the router logs indicated that my infrastructure was receiving c. 500 probes per minute – so many that the router log couldn’t record them all; it was filling up and over-writing itself every few minutes.

Unfortunately, effective though the router’s firewall is, it isn’t configurable, so I have to rely on its defaults, and beef up the second layer of security on the NAS (which I’m using as secondary firewall and DHCP server/router).

So I figure that I just have to sit tight and wait out the DDoS attack.

But here’s a thing.

Do you know how much traffic your internet router is receiving?

I doubt many people bother to check.

But crack open the logs and have a look. I’m betting you’ll be surprised at what’s going on behind the scenes.

memory testing

This is exciting!

In an attempt to get to the source of the 100% CPU utilisation, I had a word with Synology support.

They got back to me with a suggestion of a memory diagnostic which I ran earlier this evening.

I’ve just mailed off two memory tests for analysis.

Watch this space for further updates!

(ps: ironically, the 100% CPU utilisation issue has not reared its ugly head since)

whittling away at it

I’ve updated the ‘pending’ list over there.

I have been less than impressed with Ghost (the platform, not Drift Ghost HD, my new video camera, which is excellent!). It has left me with a strong aftertaste of mediocrity. I understand the concept, but if you strip away the aspirations, really it has no significant advantages over Drupal or WordPress (and has some fairly hefty disadvantages).

Subdomains (both leading and trailing) were actually much easier to get a grip of than I thought they would be. The simpler method, where there is a need, is to deploy trailing subdomains, but I do understand why the geeky amongst us would prefer leading subdomains.

I had initially set up one of the HP servers, but both are now up and running. I have some test MySQL databases on one, a couple of test domains on the other and I’m experimenting with the split of data (between ‘A’ and ‘B’ boxes).

There is, though, a periodic routing problem that I want to get to grips with. The internet router seems to be doing its job, the NAS (acting as DHCP router) seems to be doing its job, and the servers seem to be doing their jobs.

But every now and then latency creeps up and can run up from an operating level of +90ms to as much as +3s.

I need to sit down and figure out where this is springing from and why it’s a periodic issue. No packets are being lost, so it’s not as if the routing is being dropped, just slowed down. I have also noticed that sometimes the CPU on the NAS rises to 99% utilisation from the customary 15-25%. But when it does peak that high, the RAM utilisation scarcely gets over 35%.
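If you’re chasing periodic spikes like these, one low-tech approach is to sample round-trip times at a fixed interval and flag the outliers for later inspection. A toy sketch of the flagging step (the baseline and multiplier here are mine, purely illustrative):

```python
def find_spikes(samples_ms, baseline_ms=90, factor=5):
    """Return (index, latency) pairs where latency exceeds factor * baseline.

    samples_ms: round-trip times in milliseconds, taken at regular intervals,
    so an index maps straight back to a time of day.
    """
    threshold = baseline_ms * factor
    return [(i, ms) for i, ms in enumerate(samples_ms) if ms > threshold]
```

Feed it an hour of one-second ping samples and the indices of the flagged entries tell you exactly when the latency ran away, which at least narrows down what else was happening at the time.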

Anyway, I’m sure the routing thing is fixable and probably needs nothing more than some fine-tuning. I’d like to sit down and check my IP addressing. But as ever, this all needs time and there isn’t too much spare time knocking about, at the moment.

Not on the ‘pending’ list, but I have spent a lot of time (almost all of this weekend) consolidating data from a multitude of backups.

Over the last six years I’ve burned out three laptops (or hard disks in laptops).

Before I bought the NAS I undertook a weekly global backup of my laptop(s), all the way back to 2008. To accommodate these backups, I bought three external HDDs (one 500GB, the other two 1TB each).

So this weekend, while the weather has been absolutely pants, I’ve been consolidating the data from these individual devices on to the ‘A’ HP server.

This consolidation hasn’t really increased the data footprint on the ‘A’ server very much, but it has freed up three fairly large external HDDs. I have no idea what I’m going to do with them, but I’m sure I’ll think of something.

(a nice side effect of the data consolidation is that my iTunes library has now risen to 7,500 songs)

puzzling it

Earlier this evening I had a look at the router log.

It’s showing *a lot* of IP-related activity.

I don’t know enough about IP traffic-types to be able to read the log and determine what it all means.

But if you do, could you drop me a line?

Here’s an excerpt. If you look at the timings, you’ll see what I mean by *a lot*!

[LAN access from remote] from 165.88.254.215:36379 to 192.168.1.9:80 Friday, Jan 03,2014 21:06:50
[LAN access from remote] from 122.55.79.221:39718 to 192.168.1.9:80 Friday, Jan 03,2014 21:06:41
[LAN access from remote] from 173.192.238.41:50686 to 192.168.1.9:80 Friday, Jan 03,2014 21:06:14
[LAN access from remote] from 173.192.238.41:50549 to 192.168.1.9:80 Friday, Jan 03,2014 21:06:14

geeking in .php

I was awake until gone 4 this morning, trying to crack an apparent permissions problem on a MySQL database.

The database runs a content management system (indexing/categorising/tracking metadata/grouping and, of course, reporting).

It was brought to my attention at about 7pm last night that the content management system wasn’t letting registered users (or administrators) add new content/update metadata.

This is something of a problem, for a content management system.

Obv.

I logged on to the database as senior user and experienced the problem first hand.

My first thought was that the permissions had somehow become unstuck, and the user hierarchy had assumed a permission value of ‘null’.

I checked the user permissions in the admin panel and they were all as they should have been.

I flipped in to the database and looked at things from there, and saw nowt wrong there either.

Hmm. Puzzling.

I then spent several hours running through logical, systematic checks (eg: creating a new user, assigning standard permissions to a section, validating those permissions with the database, writing to the database from the backend with those permissions – etc, etc, etc) – up and down the structural, content and permissions-based hierarchies.

I spent several hours achieving some things and not others.

By 4.15am I was too tired to think straight, but switching off the light didn’t help; I couldn’t sleep for ages.

I woke at 9am and had an idea.

What if the .php front-end was somehow fudging the permissions? Concealing functionality? Somehow?

Unlikely, but I was running out of places to look.

I checked the read/write permissions on all of the trigger .php files.

Nope, nothing off-key to be found.

And then I had another thought.

What if it was the design template?

Supposing that there was a not-permissions problem, a something else kind of a problem with the design?

Something missing? Something garbled/corrupted? Something somewhere, that was occluding (word?) a fundamental line (or block) of executable code?

I am not a design expert.

I don’t do flashy front ends and high quality UX audits. I understand function point analysis, yes. I understand integration/interfaces, yes. To a (limited) degree I understand databases of various flavours.

But I’m not a front end developer.

And the person who did the front end development on this application has long since vanished back to Sydney.

I surrounded myself with the few debugging tools I have and gave myself two hours to unpick the front end code, looking for a bug, or a garble, or a thing that didn’t make any sense.

At just under an hour I thought I was on the track of it.

At an hour and ten minutes I knew I had the problem in my sights.

I changed some syntax in a section of triggers, got out of bed, showered, brushed my teeth, got dressed and took the bike out for a ride.

When I got back, a couple of hours later, I had lungfuls aplenty of fresh air, and a clear head.

I went straight back to the area I had been working, completed the tidying up, applied the new code and blow me down, it only went and worked.

Registered users now have read/write access to the appropriate areas of the system (as designated by their admin settings).

And I tidied up a very minor presentational thing that has been annoying me ever since I started looking after the product.

What caused it?

What was the terrible event that triggered this near-apocalyptic devastation?

In a nutshell, I upgraded the database.

It was a legacy product which I had just dragged up to MySQL 5.x (MySQL 5.6.15) from an unstable MySQL 4.1 – and what’s that, something like nine years old?

The front end developer had hard-coded certain (but unnecessary) database calls into the front end .php.

Upgrading the database (a thing I did on Boxing Day) disconnected the hard-coding and the system had a paddy.

It was that simple.

Yes, very easy to believe that the problem was permission-based. But actually it was a disconnect in the front end that was producing symptoms of a permission-based failure.

I got there.

A more highly-skilled front end wiz would probably have got there much quicker than I did.

But I’m not a front end specialist.

I’m just a geek who does stuff here and there.
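If there’s a transferable lesson in all that, it’s the old one: keep engine-specific database calls behind a single access layer, so an upgrade means editing one file rather than combing the whole front end for hard-coded queries. A hedged sketch of the shape of the thing, in Python with sqlite3 purely for illustration (the original system was PHP/MySQL):

```python
import sqlite3


class DataStore:
    """Single choke point for database access.

    If the engine or its SQL dialect changes on upgrade, only this class
    needs editing - the front end never issues raw queries itself.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS content (id INTEGER PRIMARY KEY, body TEXT)"
        )

    def add_content(self, body):
        """Insert a content item and return its new id."""
        cur = self.conn.execute("INSERT INTO content (body) VALUES (?)", (body,))
        self.conn.commit()
        return cur.lastrowid

    def get_content(self, content_id):
        """Fetch a content item's body, or None if it doesn't exist."""
        row = self.conn.execute(
            "SELECT body FROM content WHERE id = ?", (content_id,)
        ).fetchone()
        return row[0] if row else None
```

Had the front end of that CMS gone through something like this instead of calling the database directly, the MySQL 4.1 to 5.6 jump would have been a one-file fix.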

rendered completely useless

I’ve been playing with the video I shot yesterday.

I used Sony Vegas Pro as the video/audio editor; here’s what I did first of all:

  • migrated the raw footage to the studio laptop
  • viewed the raw footage in VLC
  • fired up SVP
  • created a blank project using a slightly modified template I designed a while ago
  • imported the raw video to the project
  • split the raw video to the start and end points I wanted (I didn’t want to edit the timeline)
  • created fade in and fade out points for the edited video
  • created a title graphic and dropped it at the start of the edited video
  • created a new video channel within the project
  • dropped the ‘comment’ graphics into the new video channel
  • removed the original audio track from the edited video
  • imported the chosen musical track into the project, at the point where I wanted it to start/end
  • saved the project as a .veg file
  • rendered the project as .wmv format

When I played the finished project back in the master viewer in SVP, it looked fine. The start and end points were good, the fades were good, the text graphics were good.

But when I viewed the file in .wmv format there was nothing to like:

  • the title graphic was bitty; dropped, for no reason, and then came back, and flickered briefly to negative before returning to normal
  • the video quality was rubbish, not so much HD (which it should have been), but VFPQ (work it out)

But, oddly, the audio remained consistent.

I re-rendered the finished project as .wmv three times, and each time the same quality (lack of quality!) issues were apparent.

Then I rendered the project as .mpg and viewed it.

Brilliant! No quality issues.

The weird thing is that .wmv presents itself as the default format for rendered video.

Well, not any more, obv. I’ve changed that.

But the questions running around in my head now are: what the hell use is .wmv? And why does rendering a finished project into .wmv present so many quality issues?