For as long as I can recall, the humble backup has been rather underestimated, a tragic oversight on the part of far too many. We've seen it all, haven't we? From techniques leakier than a sieve to the rather vexing "Schrödinger's backups": those curious entities that, having never been tested, are simultaneously both valid and utterly useless. And let's not even get started on the conceptual conundrums, the most egregious of which is the notion that RAID, bless its resilient heart, somehow constitutes a backup. Oh, the data lost, the digital tears shed, all due to such elementary deficiencies!
Nowadays, backup often seems to be an afterthought, a mere footnote in the grand digital narrative. Many simply fling their precious data into "the cloud" with nary a thought as to how (or indeed, if) it's truly protected. What many overlook is that even the behemoth cloud providers operate on a shared responsibility model. Their terms, if one bothers to peruse them, often clarify that while they jolly well secure the infrastructure, the ultimate onus of protecting and backing up your data rests squarely on your shoulders.
One might think that by consigning everything to "the cloud," or to clusters owned by other chaps, or even to those rather elaborate distributed Kubernetes systems, backup becomes entirely redundant. When I gingerly broach the subject with developers or colleagues, asking how they manage backups for all this digital wizardry, they often stare at me as if I'm speaking in an archaic, unknown, and utterly indecipherable tongue. The thought, it seems, has simply never graced their cognitive pathways. Yet, data, dear reader, is decidedly not ephemeral; it must be preserved, in every conceivable fashion.
I've always held firm to a particular philosophy: data must always be restorable (and with the alacrity of a well-oiled machine, if you please!), in an open format (meaning one shouldn't have to mortgage the house to restore or even glance at it), and, crucially, consistent. These points may strike one as blindingly obvious, but alas, they often aren't.
I've had the dubious pleasure of encountering a veritable smorgasbord of data loss scenarios:
The stakes, one might observe, escalate rather dramatically for servers connected to the wild west of the internet, such as e-commerce and email servers. Here, not only is data integrity paramount, but so too is the uninterrupted operation of services. This series of posts, much like a seasoned raconteur revisiting old tales, will dust off some of my earlier articles to elucidate my core ideas on this subject and, at least in part, describe my primary techniques.
Many consider a backup to be little more than a simple copy. Oh, the countless times I've heard chaps proudly declare they have backups because they "copy the data." This, my friends, is often quite wrong and, frankly, extremely dangerous, providing a false sense of security that would make a seasoned illusionist blush. Copying the files of a live database, for instance, is an almost entirely useless exercise: the database keeps writing while the copy runs, so the files are captured at slightly different moments, and the resulting mess will nearly always be impossible to restore. It is utterly essential to at least perform a proper dump with the database's own tooling and then transfer that file. Yet, many still blithely proceed, only to face the chilling reality of their mistake when an emergency strikes and a restoration is desperately needed.
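To make the difference between a copy and a dump concrete, here is a minimal sketch in Python. It assumes SQLite purely for illustration, since it ships with Python's standard library; the function names `dump_database` and `verify_dump` are hypothetical, and a server database would use its own dump tool instead. The idea is to ask the engine for a consistent snapshot rather than copying its files, write that snapshot as plain SQL text, and then prove the dump restores.

```python
import sqlite3
from pathlib import Path


def dump_database(db_path: Path, dump_path: Path) -> None:
    """Write a plain-text SQL dump of a (possibly live) SQLite database."""
    source = sqlite3.connect(db_path)
    # Assumption: the database is small enough to snapshot in memory.
    snapshot = sqlite3.connect(":memory:")
    try:
        # backup() uses SQLite's online backup API, so the snapshot is
        # consistent even if other connections are writing to the source.
        source.backup(snapshot)
        with open(dump_path, "w", encoding="utf-8") as out:
            # iterdump() yields the SQL statements that rebuild the database.
            for statement in snapshot.iterdump():
                out.write(statement + "\n")
    finally:
        snapshot.close()
        source.close()


def verify_dump(dump_path: Path) -> int:
    """Restore the dump into a throwaway database and return its table count."""
    scratch = sqlite3.connect(":memory:")
    try:
        scratch.executescript(dump_path.read_text(encoding="utf-8"))
        (count,) = scratch.execute(
            "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
        ).fetchone()
        return count
    finally:
        scratch.close()
```

The resulting file is ordinary SQL text, readable with any editor, and `verify_dump` is a deliberately crude restore test: a real check would also compare row counts or checksums against the source before the dump is transferred elsewhere.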
Before one so much as contemplates touching a single file, a plan must be meticulously crafted, and that plan, much like a good British mystery, begins with asking the right questions:
"How much risk am I willing to take? What data do I need to protect? What downtime can I tolerate in case of data loss? What type and amount of storage space do I have available?"
The first question, in particular, is absolutely critical. A common, though rather risky, approach is to tuck a backup away on the very same machine that requires backing up. While undeniably convenient, this method fails rather spectacularly when the machine itself malfunctions, since the backup is lost along with the original. Even relying on a classic USB drive for daily backups is not entirely foolproof, as these devices are as susceptible to the whims of fate as any other hardware component. And, contrary to popular belief, even those high-end uninterruptible power supplies (UPS) are not entirely immune to catastrophic failures.
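As a small guard against the same-machine trap, a backup script can at least refuse to write onto the filesystem it is meant to protect. The sketch below assumes a Unix-like system, and the function name `shares_device` is hypothetical. It only compares filesystem device identifiers, so a `False` result does not prove the destination is on separate hardware (two partitions of one disk will also differ); a `True` result, however, is a clear warning sign.

```python
import os
from pathlib import Path


def shares_device(source: Path, destination: Path) -> bool:
    """Return True when both paths live on the same filesystem device."""
    # st_dev identifies the device (filesystem) that holds each path.
    return os.stat(source).st_dev == os.stat(destination).st_dev
```

A script might call `shares_device(data_dir, backup_dir)` before starting and abort, or at least complain loudly, when it returns `True`.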