In just a couple weeks we’re going to schedule some maintenance windows in order to migrate client accounts around to facilitate some server replacements. Now, I know what you’re thinking:
“Didn’t we just do this four years ago? Wasn’t moving to the cloud supposed to do away with these hardware refresh cycles?”
It’s true that virtualization and “the cloud” have empowered services such as ours in ways never dreamed of in the days of “a physical server for every need”, but there are a couple of caveats to watch out for, including two forces that have come into play in our situation:
The Cloud Is Still Built On Actual Hardware
While it’s true that we no longer have to directly touch hardware, our infrastructure is still ultimately tied to physical hardware. That hardware exists somewhere, and someone has to feed it, care for it, and eventually replace it. The continual commoditization of PC server level hardware means that the newer stuff is generally faster and cheaper than the stuff from a few years ago. This leads our providers into interesting situations where they end up wanting to encourage people towards the newer hardware, so they can decommission the older systems.
This is currently happening with us. A couple months ago I took a phone call that started something like this:
“How’d you like 33% more RAM, 66% more Storage space, and faster, newer generation CPUs, at the very same prices you pay today?”
Now, I’m no fool, so I asked what the catch was. And I learned: we’d have to migrate our existing servers to their new hardware infrastructure. The new hardware is in the same data-centers and has all the same connectivity as our existing hardware, but we’d have to migrate over in order to enjoy the additional resources. Thankfully they offer a “single click” migration that would take care of everything for us. We just hit the button for a given server, it goes offline, transfers to the new hardware, and spins back up… about 3 hours later.
Okay, not really the best option in the world, but something for us to consider. After all, more resources are always a good thing, and getting them for the same price means we get to inject more resources into everyone’s hosting plans! But... a roughly 3 hour downtime for each server? That’s a big chunk for us to commit to on the strength of a resource increase alone, even a significant one.
But there is another aspect to consider…
The Lingering Operating System
Believe it or not, in the last 8 months I have personally laid eyes on a production Red Hat Enterprise Linux 3 server, being used in a very critical and production oriented way at a customer site. For anyone who doesn’t know, RHEL3 was released in 2004, the last released update was in 2007, and the entire release was announced as End of Life in late 2010, over eight years ago. The machine in question was stood up circa 2006. It’s twelve years old, and while it’s a security vulnerability nightmare, it lives on today in all of its 32bit glory.
Why? Because it’s never needed to be rebuilt. The company in question was an early adopter of server virtualization. This rickety old machine was one of their first virtual systems, and it has persisted, and run proudly, on numerous stacks of underlying hardware over its 12 year lifespan. Virtualization and the flexibility it has brought us have minimized the number of situations that used to lead to a server getting rebuilt from the ground up. While this is great for uptime and SysAdmin sanity, the dark lining is that it sometimes allows old machines to persist longer than they probably should.
This story isn’t that unique. We’ve seen countless instances of “It’s still running, so we left it be” over the years, and we’re even guilty of it ourselves. While we moved to CentOS 7 as our platform of choice shortly after the release of CentOS 7.1, we still have a fair amount of CentOS6 running in our environment today. CentOS6 is not scheduled for a full “End of Life” until the end of 2020, but we want to get ahead of the curve.
So with these two datapoints lodged in our minds, we started thinking about the benefits of ‘refreshing’ our existing servers. We built a list of possible benefits:
- More resources, same cost.
- Move everything to a newer Operating System.
- Additionally, we want to move from the basic CentOS platform over to CloudLinux. CloudLinux adds in a bunch of features and abilities that will benefit us in terms of server stability and management.
- Look at retooling our systems to utilize PHP-FPM instead of SuPHP. (Again, increasing performance for clients!)
And then we looked at the downsides of performing such a “full refresh” and weighed out the options before us:
Do Nothing
- --- No resource upgrade
- --- CentOS6 continues to live on, with a necessary replacement in the next 24 months.
- +++ Zero work on our part.
Migrate, but don’t “refresh”
- +++ Resource Upgrade!
- --- CentOS6 continues to live on, replacement still necessary within 24 months.
- --- 3 hour downtime per server
- -+- Schedule downtime, “push one button” migration process
Build New Servers and Migrate Clients
- +++ Resource Upgrade!
- +++ Operating System Upgrade!
- +++ Much less downtime per client!
- --- Most work required on our part.
So, looking over the three possibilities, it became clear that, well, it makes sense to invest the work now and do things right. (Sorry team!) Our plan is rather simple:
1. Build out a new server, in the new hardware environment.
2. Install everything, get it configured the way we like, test everything out.
3. Schedule a window to migrate all customers from one “old” server over to the new one.
4. Repeat steps 1-3 for each server that is getting a refresh.
Now, the only part of this that is impactful to our clients is step (3). We’ll do all our normal tricks to minimize the downtime (lowering DNS TTLs, etc), but we can’t make the downtime go away entirely. What we can (and will) do is transfer accounts one at a time, so instead of your website being offline for 3 hours, it will be down for a period of time measured in minutes, based on the size of your individual account. (We usually lower DNS TTLs to 10 minutes, and most accounts transfer within that period of time).
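For the curious, here is a rough sketch of the timing behind the TTL trick. A lowered TTL only helps once resolvers that cached the old, longer TTL have let it expire, so the change has to go in at least one old-TTL period before the window opens. The function name, the 4 hour starting TTL, the one hour safety margin, and the window time below are all illustrative assumptions, not our actual settings.

```python
from datetime import datetime, timedelta


def latest_ttl_change(window_start: datetime,
                      old_ttl_seconds: int,
                      safety_margin: timedelta = timedelta(hours=1)) -> datetime:
    """Latest moment the TTL can be lowered so that caches still holding
    the old, longer TTL have expired before the maintenance window opens."""
    return window_start - timedelta(seconds=old_ttl_seconds) - safety_margin


# Hypothetical values: a window opening at 22:00 and records that
# currently carry a 4 hour (14400 second) TTL.
window = datetime(2030, 1, 15, 22, 0)
deadline = latest_ttl_change(window, old_ttl_seconds=14400)
print(f"Lower TTLs no later than {deadline:%Y-%m-%d %H:%M}")
```

With a 10 minute TTL in place by that deadline, a resolver that cached your record just before your account moved should pick up the new address roughly 10 minutes later at most.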
A few points of interest:
- We’ll be emailing all the clients on each server in advance of their maintenance window. Generally we aim for a 5-7 day heads up for something like this.
- Server names and IPs will be changing. For cPanel clients who host their DNS with us (your domain is pointed to ns3.purenrg.com and ns4.purenrg.com), no action will be required on your part in order for your site to perform normally after the migration. If however you have something out there hard-coded to a specific server IP address, you will need to adjust some things.
- New server names and IPs will be included in the announcement emails that go out to clients on a server before it is relocated.
- When your account is relocated, your account information in our portal will be updated at the same time. So if in doubt, you can always use the links within our client portal to access your cPanel interface.
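If you are not sure whether anything of yours is hard-coded to a server IP, a quick search of your configuration files for the old address is a reasonable first check. The snippet below is only a minimal sketch: the helper name is made up for illustration, and `192.0.2.10` is a documentation-only placeholder, not one of our server addresses.

```python
import re


def find_hardcoded_ip(text: str, old_ip: str) -> list[int]:
    """Return the 1-based line numbers in `text` that mention `old_ip`."""
    # The lookarounds keep 192.0.2.10 from matching inside 192.0.2.100
    # while still allowing a trailing full stop at the end of a sentence.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)")
    return [number
            for number, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)]


# Hypothetical configuration contents, for illustration only.
sample = """\
define('DB_HOST', '192.0.2.10');
$mail_host = 'mail.example.com';
allow from 192.0.2.100
"""

hits = find_hardcoded_ip(sample, "192.0.2.10")
print(hits)
```

Any line it reports is a candidate for switching to a hostname, or to the new IP listed in your announcement email.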
Our aim is to begin the migrations in the next 10-14 days, and to have the entire project wrapped up before the end of the year.