How to solve problems: isolation goes before compensation

Some time ago I was invited on a guided tour of a large computer operations center, and yes, having a technical background, I like such things. It also reminded me of how to solve problems. One of the topics during the tour was bringing a broken server back into service. I had never given it a thought before, but this is a challenging task in more than one way.

First, the technical side: there are three ways to make sure that you can put a server back into service:

  1. Double the complete server including its infrastructure.
  2. Create system snapshots periodically and store them on disk, not on tape.
  3. Create complete bare-metal backups, which include everything: hardware drivers, configuration files, and the operating system.

Of course, each of these three methods has its own advantages and disadvantages. The first one, doubling the infrastructure, is fast and stable because service is always guaranteed. However, it is expensive: you need more than twice the investment. The second one shrinks the gap between the last working moment and the moment of breakdown, and thus reduces the amount of information lost. Storing snapshots on disk is expensive, though not as expensive as the first option. The third one lets you simply restore the data onto the repaired hardware without going through a new installation and configuration cycle. That is how a large computer operations center keeps the time between a failure and being back online as short as possible.
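The benefit of the second option can be put into numbers with simple arithmetic. This is a minimal sketch, assuming snapshots are taken at a fixed interval and that a failure is equally likely at any moment in between: at worst you lose one full interval of changes, on average half of one. The function name `data_loss_window` is hypothetical.

```python
from datetime import timedelta


def data_loss_window(snapshot_interval: timedelta) -> tuple[timedelta, timedelta]:
    """Return (worst_case, average) span of changes lost when a failure strikes.

    Assumes snapshots are taken at a fixed interval and that a failure is
    equally likely at any moment between two snapshots.
    """
    worst_case = snapshot_interval    # failure just before the next snapshot
    average = snapshot_interval / 2   # expected position of a uniformly timed failure
    return worst_case, average


for minutes in (15, 60, 24 * 60):
    worst, avg = data_loss_window(timedelta(minutes=minutes))
    print(f"snapshot every {minutes} min: worst case {worst}, average {avg}")
```

With the 15-minute interval suggested in this article, at most 15 minutes and on average 7.5 minutes of changes are lost, compared with up to a full day for nightly snapshots.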

Now the second observation, about the challenging aspect. The best approach, always and therefore also in computer operations, is this: isolation goes before compensation. Applied to backups, this means: keep the content to be backed up as small as possible by backing up real changes only, and back up as often as feasible, say every 15 minutes.

To keep a backup as small as possible, do not merely inspect which files changed; look at which information changed. When you back up an operating system for the second time, 99% of it is already in your backup and does not need to be backed up again, so the result is a very small backup. This enables the second point: such a small change can be backed up much more often than a large one. Intelligent backup software of this kind has existed for only a short time, and it is already used on a large scale. It works on any kind of information, so you can back up your laptop, a USB stick, or whatever you want.

For more information, contact Hans Lodder.