There was a discussion on a forum about how to save on IT costs, and the question of consolidating servers came up. So, I had a few little somethings to say, and some of that saying I cleaned up and presented here.
To the argument that servers gain additional power and capabilities every year, so you can replace a number of yesteryear's servers with just one or a few new ones, I had this to say:
Consolidating different functions on the same machine, the same OS instance, greatly increases the ROI for break-ins, the fragility of the corporation, and the risk of downtime and weird interactions. It makes it harder to evolve and adapt software, too. “Keep it simple” directly contradicts “consolidate your servers”.
Consolidate (if you will) the hardware, but virtualize the instances. Ideally, don’t let one server (OS instance) have more than one distinct function. That makes it much easier to secure, replace, move, control, etc.
This is the way I design applications, so they can run on, let’s call it, a “redundant array of inexpensive servers”. If you’re into designing and building line-of-business applications with Internet components in them and requirements for security and resilience, it ought to become very obvious after a while that this is the way to go.
Consolidation of OS instances saves you nothing but license fees, but greatly increases the interdependency of processes. Which is a major enemy of security. It not only increases the risk of break-ins, but also greatly increases the damage they cause and the difficulty of fixing problems. You’re *much* better off with three cheap servers doing the same thing than one big and indispensable machine doing it all. (Except if that one big machine is running virtualized instances that can be moved around. But still, make it at least two big machines.)
If all you have is three webservers, you could consolidate. You’ll greatly diminish resilience, while still having to upgrade that one remaining server. But what’s the profit in this? If a substantial increase in downtime costs you nothing, there could conceivably be some kind of advantage, but even then. I don’t doubt there are situations that make OS instance consolidation a good idea economically and security-wise. I do think, however, that they are few and far between.
In short, to me, the best return for the money is in *increasing* the number of machines or virtualized instances, while reducing their individual importance and horsepower. Use cheap sh.. erm, hardware, if you want. Preferably cheap software, too.
Some people claim that OS instance licenses are too expensive, and that running OS’s on top of other OS’s (the hosting OS) means there will be higher bills to pay and (even) more rebooting going on at patch time.
But Windows, contrary to popular belief, is not the only operating system in existence. I’ve heard vague rumours of others. It’s counterintuitive, but there you are.
Now, if all you have is one honking big server which only needs to be patched once, but fails to boot after that, what happens to the enterprise? Or if one of your 27 server apps has an allergic reaction to the patch or to another app, you’re up exactly which creek?
Finally, you don’t necessarily need to boot the host system after a patch, since it doesn’t have to be Windows. And I doubt you actually need to patch the host system every time there is a patch available, either, since the host system isn’t normally exposed to applications or the Internet.
Different apps often place very contradictory demands on the setup of major supporting apps like IIS and diverse runtimes. DLL hell and all that. And many of these contradictory demands aren’t even known, and only become obvious once things stop working, or (worst case) become unstable and unpredictable. This all leads to the “frozen configuration syndrome”. Who, in their right mind, dares touch any fundamental config on a machine running the enterprise’s crown jewels?
Which also leads to defensive deployment and defensive development. You simply don’t dare to utilize the latest and greatest on a machine that also has to run the oldest and moldiest.
When refreshing the inventory, I wouldn’t replace 486s with dual Xeons. I’d replace them with entry level Pentiums of some kind or (as others are wont to say) Salvation Army Specials. The only thing stopping me from suggesting filling the hall with all the old junked desktops is that they don’t fit in racks and would give the cleaning crew the fits. This kind of machine is great for, for instance, clustered Internet front-ends, if your apps are designed to work session-less (I’m talking app protocols, not HTTP here). If you need reliability and don’t go for the “old junk” look, entry level rack servers will usually do the trick.
Recall the classic dilemma: that old app that simply can’t stand a particular service pack. What do you do? Do you leave the entire giga-humongous server vulnerable just because the dearly beloved statistical app from hell doesn’t run on the new SP? Or do you just leave the machine (OS instance) that piece of cr** is running on vulnerable? Or do you take it down in the name of security, and polish your resume in the name of freedom of speech?
As an example, let’s look at a LAMP (Linux/Apache/MySQL/Perl-Python) setup: I wouldn’t even run the “A” and the “M” on the same machine. And I’d try to design the app (and the business process!) so that the “M” wouldn’t have to run on a single machine, either. The “M” would be on a different subnet from the “A”s, too. And the different “A”s on different subnets, unreachable from each other, if I could manage that with the available routers and switches.
If you have MySQL and Apache on different machines, with only the MySQL ports open between the two, and the MySQL machine not on the Internet, you make it very hard to get at the MySQL machine even after achieving root on one of the Apache machines.
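Assuming that kind of layout, a quick way to sanity-check the segmentation is a tiny port probe run from each tier. This is only a sketch; the host address and port list below are made up for illustration:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical check, run from one of the Apache front-ends:
# the MySQL box (address invented here) should answer on 3306 and nothing else.
# for port in (22, 80, 443, 3306):
#     print(port, port_open("10.0.2.10", port))
```

Run from an Apache machine, the MySQL box should answer on 3306 and nothing else; run from the Internet side, it should answer on nothing at all.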
If you’ve got a number of Apache machines doing the same work, with no open ports between them, you make the web server farm resilient. To take down all your web servers, an attacker is forced to attack each machine anew from the Internet side. Barring a class break, he will have a lot more work cut out for him. If you also put in some kind of IDS, even a homemade one, you can detect and act before he gets to the last machine and before services go down entirely. You’ll also be free to start patching and updating from one end of the server row while the hacker is eating away at the other end.
A nice side effect: it’s easier to detect that something is going wrong with a server when you’ve got several other identical servers to compare with. If the MySQL traffic, for instance, is substantially different on one of them, something is probably wrong. The way I did it last time was to measure the number of messages handled (on front-ends) or processed (on crypto machines). This number varies a lot with load, but all machines should vary in parallel. Very easy to monitor.
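That comparison can be sketched in a few lines. The server names, counts, and threshold here are invented for illustration: flag any machine whose per-interval message count strays too far from the median of its identical peers.

```python
from statistics import median

def odd_ones_out(counts: dict[str, int], tolerance: float = 0.5) -> list[str]:
    """Return the servers whose message count deviates from the peer median
    by more than `tolerance`, expressed as a fraction of the median.

    The absolute numbers swing with load, but identical machines behind
    the same load balancer should swing together."""
    m = median(counts.values())
    if m == 0:
        # Peers are idle; any machine showing traffic is the odd one out.
        return [name for name, c in counts.items() if c > 0]
    return [name for name, c in counts.items() if abs(c - m) / m > tolerance]

# Example: web3 handling far fewer requests than its siblings
# is worth a closer look.
print(odd_ones_out({"web1": 980, "web2": 1015, "web3": 120}))  # ['web3']
```

The threshold is a judgment call: too tight and ordinary load-balancer jitter pages you at night, too loose and a half-dead machine hides in the noise.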
In conclusion, having a redundant array of cheap servers buys you:
1. Major availability advantages
2. Easier development and upgrades
3. Maybe a little extra confidentiality and integrity, courtesy of the subnet separation