The ongoing saga of Healthcare.gov continued this past weekend, when the website was taken offline for several hours on both Friday and Saturday nights so the team behind it could perform maintenance and bug fixes that would hopefully bring the site’s performance a little closer to where they wanted it when it opened to the public two months ago. Server maintenance is a mysterious event to non-IT folks, since about all you really know is that the network needs to be taken down to do it. As a result, a lot of people believe that “maintenance” is really just a euphemism for “it’s broken and we need to take the whole thing down to fix it,” sort of like we’ve seen with the Healthcare.gov debacle.
It’s understandable that many casual observers would get the wrong idea and assume that, because their administrator regularly sends out emails announcing that the system will be down for maintenance, their company’s systems are fraught with problems and constantly in need of repairs. While that may actually be true in some cases, most companies have regularly scheduled maintenance periods, usually every month or so, when they do routine upkeep similar to what you might do to keep your car running. In that sense, server maintenance is kind of like an oil change for computers.
What is done during server maintenance isn’t actually all that different, in many cases, from the kinds of things you should be doing to keep your own computer in good working order. It usually consists of things like applying security patches, installing new hardware, rebooting the servers (their performance can suffer if they never reboot, just like your own computer’s), and performing other upgrades that might be necessary to support new software or hardware your company plans to start using. Much like road work on highways, it’s generally done at times when nobody is working (such as late at night or on weekends), and there are a couple of reasons for that.
The first is that just about everything people do in an office setting depends on one server or another, and since those servers are going to be unavailable while the maintenance work is being done on them, there’s no sense in doing the maintenance while people are trying to work. Another, even more important reason is that your administrators want to make sure that, if one of the patches or some other change they make causes one of the servers they’re working on to melt down, the whole office won’t be left unable to work for however long it takes to fix the problem.
Now, there are going to be times when emergency maintenance is required outside the normal schedule, but these instances are for very specific reasons that cannot wait, such as a fix for some newly discovered security hole that hackers could exploit or, as in the case of Healthcare.gov, system performance so unacceptable that the problems need to be fixed immediately. Even these emergencies don’t necessarily mean someone isn’t doing their job. To go back to the car analogy, you don’t always see the bump in the road that’s going to crack your axle. While there are going to be problems that someone should have caught before they became emergencies, you just don’t know when a hard drive is going to keel over and need to be replaced.
Hopefully that takes some of the mystery out of system maintenance, but if you have any other questions or comments, feel free to email me at firstname.lastname@example.org or leave a comment below!