Storage and backups

When you do video production of any kind, you quickly run into two problems: keeping the files organized so that you can actually find them, and managing the insane volumes of data. I sort of knew that movie studios had massive data volumes, but I didn’t realize that even a moderate operation like ours would produce so much data.

An off-the-cuff sum of the data used goes something like this. For every hour of video, the camera files are around 25 GB, the ScreenFlow raw files another 20 GB, and the output video files 4-5 GB. Once you import the camera files to the ScreenFlow project, it grows proportionally and ends up at around 50 GB per hour. So if you keep the camera files, the ScreenFlow files, and the output files, you have roughly 75 GB per hour of video. If you skip the camera files, arguing that there’s a copy in the ScreenFlow project (and hoping you’ll be able to open that forever), you’re “only” producing around 55 GB per hour.

All that is just one copy, though, and I wouldn’t trust anything less than two copies, which gives us 110-150 GB per hour of video. Oh, and a Udemy course could be like 30-50 hours of material, giving us 3.3-7.5 TB per course. Let’s average this to 5 TB per course. And once we get going, let’s count on 3 courses per year, giving us 15 TB per year for two copies. Dear lord, what have we done…
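The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the text; the constant and function names are made up for illustration, and 1 TB is taken as 1000 GB.

```python
# Rough data-volume estimate using the figures quoted in the text.
GB_PER_HOUR_LOW = 55    # ScreenFlow project + output, camera files skipped
GB_PER_HOUR_HIGH = 75   # camera files kept as well
COPIES = 2              # never trust fewer than two copies
HOURS_LOW, HOURS_HIGH = 30, 50   # assumed length of one course, in hours
COURSES_PER_YEAR = 3


def course_tb(gb_per_hour, hours, copies=COPIES):
    """Storage needed for one course, in TB (decimal, 1 TB = 1000 GB)."""
    return gb_per_hour * hours * copies / 1000


low_tb = course_tb(GB_PER_HOUR_LOW, HOURS_LOW)
high_tb = course_tb(GB_PER_HOUR_HIGH, HOURS_HIGH)
typical_tb = 5  # the rounded per-course figure the text settles on
yearly_tb = typical_tb * COURSES_PER_YEAR

print(f"Per course: {low_tb:.1f} to {high_tb:.1f} TB")
print(f"Per year, two copies: {yearly_tb} TB")
```

Changing the hours per course or the number of copies shows quickly how sensitive the yearly total is to those two assumptions.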

If we back up that material using some kind of incremental backup system like Retrospect (which I’m using for everything else), then every time these files change, they’d be added to the backup again, so the actual size could become pretty unpredictable: at least 15 TB per year and maybe double that. The best way of avoiding this is probably to archive the projects instead of backing them up incrementally. That way, every copy is under my own control and is kept in its original form, increasing the chances of being able to use it in the far future.

So where do we put these 15 TB per year? Let’s talk about the options, namely: spinning disks, disks at rest, cloud, or tape.

Spinning disks

Right now, I keep all these files mainly on spinning disks in two Synology RAID arrays, but that has some drawbacks:

  • It uses electricity all the time. Say 20W for a 12 TB drive, with a duty cycle of 50%; that’s around 87 kWh per year, or 7.3 kWh per year per TB. At 15 cents per kWh (Swedish electricity is relatively cheap), that is around a US dollar per TB per year. Since we’re adding 15 TB per year, that grows from $15 the first year to $30 the second, etc.[1]
  • The price for the hard disks is around $25 per TB, so with 15 TB per year, that’s $375 per year.
  • Limited lifetime. One can expect an attrition rate of 5% per year[2].
  • Vulnerable to ransomware if kept as mounted drives. In the same vein, it’s easy to make a mistake as a user and wipe out needed files[3].
  • If the RAID controller dies, possibility of total loss of the array contents.

A typical 8-bay Synology

Spinning disks in arrays do have the advantage of convenience, though.
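The electricity estimate in the first bullet above can be checked with a short sketch. The wattage, duty cycle, and price are the ones from the text; the function and variable names are illustrative only. Note that the text rounds the result down to one dollar per TB per year, so the unrounded figures come out slightly higher.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def yearly_kwh(watts, duty_cycle=1.0):
    """Energy used in one year by a device drawing `watts` for a fraction of the time."""
    return watts * duty_cycle * HOURS_PER_YEAR / 1000


drive_kwh = yearly_kwh(20, duty_cycle=0.5)   # one 12 TB drive, about 87.6 kWh
kwh_per_tb = drive_kwh / 12                  # about 7.3 kWh per TB per year
usd_per_tb_year = kwh_per_tb * 0.15          # at 15 cents per kWh, a bit over $1

# The stored volume grows by 15 TB every year, so the bill grows with it.
for year in (1, 2, 3):
    stored_tb = 15 * year
    print(f"Year {year}: {stored_tb} TB, about ${stored_tb * usd_per_tb_year:.0f} in electricity")
```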

Disks at rest

You could also copy the files to external hard disks and use that as offline storage. Advantages:

  • No ongoing electricity use.
  • Not vulnerable to ransomware[4].
  • Low up-front investment, except for the actual drives.

An OWC dual disk dock

But it’s not all roses:

  • Actually pretty expensive. It’s around $200 for an 8 TB drive currently, which is $25 per TB.
  • Disks don’t last that long. What’s worse, they’re likely to die sooner if they’re only spun up rarely[5].
  • Disks that are moved around manually also run a larger risk of bumps, and you really don’t need much of a bump to screw things up[6].
  • It’s hard to organize this right. There are tools, and I’m in the process of testing out one of them, YoYotta, and it promises to solve that, but I don’t know for sure yet. If it works out, I’ll probably write it up in a later post.

Cloud storage

There are a number of cloud storage vendors. The one I’m using is Wasabi, and I think it’s the cheapest (there’s a price calculator on their site). I’m using it from Retrospect as a secondary storage tier for the most important files from my backups, but you have to be careful with the volumes. Generally speaking (there are lots of details), you pay $6 per month per TB stored, so if you load up 7.5 TB[7] evenly over 12 months, you’ll pay $6 the first month, but $45 (7.5 TB x $6) the last month of that year, say an average of roughly $25 per month for the first year. The second year then averages around $68 per month, and so on.

You can find the interactive comparison at Wasabisys.com. This one is for 500 TB.

Advantages:

  • It’s out and up there, so it’s off-site.
  • Unless you mount the cloud store as a volume, it’s unreachable for ransomware.
  • No up-front investment.
  • For the extra careful, you can define storage buckets as write only, so you can’t erase or change anything by mistake (or through malicious software).

Disadvantages:

  • The monthly expenses are considerable and gradually increasing.
  • Uploading and downloading is relatively slow, even if you have a seriously fast internet pipe. Any disaster recovery you do will take ages, unless you ask them to mail you hard disks.

I’d like to keep the video files I have for a long time, and I probably won’t need to access them often. This means that if I have, say, 4 years of courses in the cloud, that’s 30 TB of storage used, which will cost me a nerve-racking $180 per month even when I’m not doing anything with it. It’s just sitting there serving no other purpose than “just in case”. They’re not that important, after all.

Tape

Yes, tape. Who would have thought?

Oh boy, does this bring back memories…

Tape has come a long way since I last used it. That was dual reel 60 MB cartridges on Windows, DOS, and Netware, and DAT tape for Netware, back in the 90s. Actually, I liked it. I find the sound of tape whirring in the evening very soothing.

The problem with tape back then was that every generation tended to be incompatible with any previous generation of hardware and tapes, so it was an expensive and futile exercise in trying to safeguard your data. Also, there was no disk-like file system for tapes, so you had to go through a backup software system to save and restore files. That has its advantages, but it is also cumbersome enough to discourage you from locating a file you don’t absolutely need to restore. There’s nothing spontaneous or enjoyable about digging through the backups.

An HPE LTO-6 external drive. The LTO-8 looks exactly the same.

Meanwhile…

The world invented LTO. The premise of LTO is that it is a predictable progression of increased tape capabilities with some significant guarantees about backwards compatibility. If you look at the history and the roadmaps, you can see that old backups can usefully be recovered some 4-10 years into the future without jumping through hoops. And if you’re careful with your old hardware, it can probably be extended a couple of years longer than that.

The expected evolution of LTO. As I’m writing this, the latest and greatest is LTO-8, with LTO-9 expected shortly

These drives come in SAS and FC interface variants, but I’m basing this discussion on the SAS interface variant, since it’s simpler, cheaper, and more flexible. To connect this drive to your Mac over Thunderbolt 3, you need both an interface card and a box to put that card in.

Advantages:

  • The long-term support and predictable technical roadmap I mentioned.
  • The low media costs. They’re somewhere around $10 per TB[8].
  • Very long media life. A tape cartridge, if stored right, should be readable 20-30 years into the future.
  • The media is small, light, easy to transport, and not as fragile as hard disks.
  • With LTFS[9], there’s any number of software systems and operating systems that can read and write the files.

Disadvantages:

  • The upfront investment is significant. An LTO-8 drive with interface card and Thunderbolt box comes to around $4500-$5000.
  • That’s about it.

Risk comparisons

Let’s try to compare safety and security of the different options.

Spinning disks

Having everything on spinning disks alone is high risk. Not least for hardware failure, user mistakes, break-ins, fire, and ransomware. If you have it on spinning disks, you need to at least have it on disk arrays with redundancy, like RAID 5 or better, which mitigates the implications of hardware failures at least. But this also means that the price per TB goes up.

Disks at rest

This is probably the highest risk for hardware failure of all the alternatives[10]. It does have the advantage of being off-line, though, making it less likely to fall victim to ransomware. However, since it’s so hard to catalog what you have, it’s more likely to fall victim to thoughtless deletions[11].

Cloud storage

From a risk perspective, this is probably the best. It’s off-site and off-line (if you don’t mount it as a drive). The only real risk is if the company behind it goes belly-up or decides they don’t want to do this anymore[12]. Or if you forget to keep paying the exorbitant monthly bills. So, yes, it’s pretty safe. But for that price, it really should be.

There is a risk though, namely the very long recovery times. The internet simply isn’t fast enough for these kinds of volumes.

Tape storage

If you only keep tapes on site, there’s a risk in that. Think burglaries and fire. But that’s easy to work around by saving a second copy of the tapes elsewhere. Since tapes can be encrypted, there’s no additional risk in having the tapes elsewhere. Except for the risk of them disappearing, of course. Otherwise, I don’t see any significant risks with tape, assuming it’s some LTO format.

Cost comparisons

So let’s try to calculate stuff here. I’m assuming an 8 year horizon, because it comes out conveniently when calculating LTO costs[13]. I’m also assuming we’ll produce 3 courses per year ongoing.

Spinning disks

If you use only spinning disks to store your media files, you’ll need two RAID-based NAS systems. That’s what I have today. My workflow involves having the working files on my desktop (it has an internal 8 TB SSD), then copying the files over to a NAS when I’m done, then from there to the other NAS or disks at rest (I can’t really make up my mind, which is one of the reasons I’m writing this).

Each of these NASs will need to handle 7.5 TB more data per year. In my case, I have one Synology with 5 x 8 TB disks, and one with 4 x 12 TB disks. This gives me 28 TB and 33 TB of available space, respectively. The reason it’s so much less than the nominal capacity is that what HD vendors call “12 TB” (decimal) shows up as 10.9 TB in the binary units the NAS reports, and what they call “8 TB” is more accurately 7.3 TB. The other factor is that these are RAID type arrays, so one drive’s worth of capacity goes to redundancy. And then you lose a little overhead here and there, so in the end, it works out right.
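To see where the nominal capacity goes, here is a minimal sketch of the conversion. It assumes a single-redundancy array (one drive’s worth of capacity lost) and ignores file system overhead, so it only lands in the neighborhood of the 28 TB and 33 TB quoted above; the function name is made up for illustration.

```python
BYTES_PER_TIB = 2 ** 40  # the binary "terabyte" most systems report


def usable_tib(drive_count, drive_tb, redundant_drives=1):
    """Approximate usable space in binary TB, before file system overhead."""
    drive_tib = drive_tb * 10 ** 12 / BYTES_PER_TIB  # vendor TB are decimal
    return (drive_count - redundant_drives) * drive_tib


print(f"5 x 8 TB:  {usable_tib(5, 8):.1f}")
print(f"4 x 12 TB: {usable_tib(4, 12):.1f}")
```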

I do use the same NASs for Retrospect backups, and if you allow for a little slack, you could probably use 20 TB on each of these NASs for the media files without cramping your style.

So, what does a NAS like this cost? The latest one I bought was a DS1817 for around $1000 empty. The drives cost around $300 each. So say that one NAS with 4 drives costs you $2200 up front, and another 4 x $300 after 3-4 years (the DS1817 has 8 slots). That ought to carry you over 8 years total.

You need two of these, so say $4400 over 8 years. Unless, of course, drives fail. I think it’s reasonable to count on two failed drives over 8 years, giving us a total of $5000 for two NASs.

There is one additional cost that could arguably be included here, namely 10 Gb ethernet. A MikroTik switch with a couple of 10 Gb SFPs would run you $500 or so. You can argue that having everything on NASs kind of makes this a requirement, else you’ll die of boredom looking at the file transfers. Disks at rest and tape don’t require 10 Gb networks in themselves.

So, we’re at $5500 for 8 years for two NASs, or $3000 for one for 8 years[14].

And then the electricity costs:

Each NAS consumes around 80W with 4 drives, maybe 140W with 8 drives, so let’s count 80W for the first 4 years and 140W for the second 4 years. That gives us 700 kWh per year in the first four years, and 1.2 MWh per year in the second four. That is $105/year and $180/year, respectively, for a total of 4 x 105 + 4 x 180 = $1140 in electricity over 8 years. And that is for just one NAS. For two, it’s $2280 for 8 years.
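The same calculation as a small sketch, assuming the NAS runs around the clock at the stated wattage and electricity costs 15 cents per kWh. Without the rounding to 700 kWh and 1.2 MWh, the total comes out a little above the $1140 in the text. The names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365
USD_PER_KWH = 0.15


def yearly_cost(watts):
    """Electricity cost in USD for one year of continuous operation."""
    kwh = watts * HOURS_PER_YEAR / 1000
    return kwh * USD_PER_KWH


first_four_years = 4 * yearly_cost(80)    # NAS with 4 drives
second_four_years = 4 * yearly_cost(140)  # NAS with 8 drives
one_nas = first_four_years + second_four_years
two_nas = 2 * one_nas

print(f"One NAS, 8 years:  ${one_nas:,.0f}")
print(f"Two NASs, 8 years: ${two_nas:,.0f}")
```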

Disks at rest

You need a good external disk dock for this. I’m using a Thunderbolt 3 dock from OWC with dual slots and it cost me almost $100.

Then you need disks, and this can be a bit difficult. First of all, you can’t just go out and get any old disks. You can’t use 4k native disks, for instance, only 512 or 512e disks[15]. Also, the disk dock people don’t publish compatibility tests or even promise which disk sizes work. Disks up to 4 TB almost certainly work, maybe even 6 TB. The most I’ve tested is 4 TB.

4 TB disks cost around $120 each, that is around $30 per TB. If you count on a single set of backups in a mixed scenario, you need 7.5 TB per year, i.e. two of these disks per year, that is $240 per year. This comes to $1920 for 8 years. I’d estimate a larger failure rate with these as compared to the NAS drives, say 3 drives per set over 8 years, giving us replacement costs of 3 x $120 = $360 for the full 8 years. So a single backup set of disks at rest would cost around $1920 + $360 = $2280 for the full 8 years. Two sets would come to $4560 for 8 years.
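A sketch of that sum, for anyone who wants to plug in other disk sizes or prices. The three replacement drives per set is the assumption behind the 3 x $120 above; all names are illustrative.

```python
import math

DISK_TB = 4
DISK_USD = 120
TB_PER_YEAR = 7.5          # one copy of the yearly production
YEARS = 8
REPLACEMENTS_PER_SET = 3   # assumed failed drives per set over 8 years

disks_per_year = math.ceil(TB_PER_YEAR / DISK_TB)   # whole disks only
purchase_usd = disks_per_year * DISK_USD * YEARS
replacement_usd = REPLACEMENTS_PER_SET * DISK_USD
one_set_usd = purchase_usd + replacement_usd
two_sets_usd = 2 * one_set_usd

print(f"One set:  ${one_set_usd:,}")
print(f"Two sets: ${two_sets_usd:,}")
```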

Cloud storage

We could use cloud storage for one of the two backup copies. Let’s calculate with $6 per TB per month. If we grow the volume by 7.5 TB per year, we’ll have an average over 8 years of 30 TB (the end volume is 60 TB), so we can calculate on $6 x 30 TB x 96 months = $17,280. OMG…
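The averaging shortcut above can be compared with a month-by-month sum. This sketch assumes the data is uploaded evenly, that each month is billed on the volume stored at the end of that month, and that the price stays at a flat $6 per TB with no minimums or fees; those billing details are assumptions, not the vendor’s actual terms.

```python
USD_PER_TB_MONTH = 6
TB_PER_MONTH = 7.5 / 12    # even upload rate, 0.625 TB per month
MONTHS = 8 * 12

# Shortcut used in the text: average volume times price times months.
average_tb = TB_PER_MONTH * MONTHS / 2
shortcut_usd = USD_PER_TB_MONTH * average_tb * MONTHS

# Month-by-month: bill each month for what is stored at the end of it.
monthly_usd = sum(
    USD_PER_TB_MONTH * TB_PER_MONTH * month for month in range(1, MONTHS + 1)
)

print(f"Shortcut:       ${shortcut_usd:,.0f}")
print(f"Month by month: ${monthly_usd:,.0f}")
```

The two figures differ by only about one percent, so the shortcut is good enough for this comparison.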

Tape

There are three scenarios here. One is to get the latest and greatest LTO-8 drive and tapes and use it for 8 years. It’s a pretty safe assumption that you will be able to read those LTO-8 tapes in 8 years. If we go back 8 years to 2012, we see that LTO-6 was released that year, and it’s still current. If you go look for new drives, HPE is still selling new Ultrium drives from LTO-4 through LTO-8 today, and LTO-4 came out in 2007. LTO-8 came out in late 2017, so it ought to be on the market up to at least 2030 if the future evolution looks like the past. And LTO-10, which can be expected around 2023, ought to even read the LTO-8 tapes without problems.

The second scenario is to get an LTO-8 drive and then update to an LTO-10 drive in 4 years, while transferring the data then to LTO-10 tapes.

The third scenario is to get an LTO-6 drive and tapes now, then upgrade to LTO-8 in four years, assuming the prices have come down somewhat. I’m counting on a 30% drop in prices for the LTO-8 system in four years.

Scenario 1 – LTO-8 and keep it for 8 years

An external HPE LTO-8 SAS drive costs around $4000 right now in Sweden. You also need a PCI SAS card like the Atto H1280 for $500. (You can get the H680 for $100 less, but then you’ll probably need a different cable from the one that comes with the HPE drive, costing you $125.) Then you need a Thunderbolt 3 external PCIe box to put the card in like the Sonnet Echo Express SEL for $220. I’m assuming the cable that comes with the HPE drive works for this combo.

If you use tape, I assume both backup sets are on tape, so I’ll count on a total of 15 TB per year. Each tape holds 12 TB uncompressed so we’ll need 10 tapes for the first 4 years, then another 10 for the next four years, i.e. 20 in total. Each tape costs around $100 currently, and probably around $80 in four years, so that’s $1800 for 8 years.

You also need decent software to keep all of this under control, so I’m including an LTFS copy of YoYotta for $500.

The price for 4 years (which we’ll need in the next scenario) is $4000 + $500 + $220 + $1000 + $500 = $6220.

| Item | Cost |
|---|---|
| LTO-8 drive | $4,000 |
| Atto H1280 | $500 |
| Sonnet SEL | $220 |
| Tapes for 8 years ($1,000 + $800) | $1,800 |
| YoYotta | $500 |
| Total | $7,020 |

Total for an LTO-8 drive used for 8 years.

Scenario 2 – LTO-8, updated to LTO-10 in 4 years

This one is harder to calculate, but if you replace the drive and tapes and keep the rest unchanged, and assuming (big assumption!) that LTO-10 will cost in four years what LTO-8 costs today and that the tapes will hold 48 TB uncompressed according to the roadmap, you’ll get for the first four years $6220 (see above), and for the next four years:

| Item | Cost |
|---|---|
| LTO-10 drive | $4,000 |
| 4 tapes for the old data | $400 |
| 4 tapes for future data | $400 |
| Sale of second-hand LTO-8 drive | -$750 |
| Total | $4,050 |

Costs for the LTO-10 replacement for years 5 through 8.

| Period | Cost |
|---|---|
| The first four years (LTO-8) | $6,220 |
| The second four years (LTO-10) | $4,050 |
| Total 8 years | $10,270 |

Total for scenario 2 for 8 years.

Scenario 3 – LTO-6, updated to LTO-8 in 4 years

An LTO-6 drive today in Sweden costs around $2200. The tapes are only around $40 each but only hold 2.5 TB uncompressed, so you need 6 per year if you count on 15 TB per year.

| Item | Cost |
|---|---|
| LTO-6 drive | $2,200 |
| 24 tapes for first 4 years | $960 |
| Atto H1280 card | $500 |
| Sonnet Echo Express SEL box | $220 |
| YoYotta software | $500 |
| Total | $4,380 |

Total for first four years.
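The tape count and the total for these first four years follow directly from the numbers above. A small sketch, with illustrative names, assuming 15 TB per year across both copies and whole tapes only:

```python
import math

TB_PER_YEAR = 15       # both copies together
LTO6_TAPE_TB = 2.5     # uncompressed capacity
LTO6_TAPE_USD = 40

tapes_per_year = math.ceil(TB_PER_YEAR / LTO6_TAPE_TB)
tapes_four_years = 4 * tapes_per_year
tape_cost_usd = tapes_four_years * LTO6_TAPE_USD

first_four_years = {
    "LTO-6 drive": 2200,
    "Tapes": tape_cost_usd,
    "Atto H1280 card": 500,
    "Sonnet Echo Express SEL box": 220,
    "YoYotta software": 500,
}
total_usd = sum(first_four_years.values())

print(f"{tapes_four_years} tapes, total ${total_usd:,}")
```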

For the second four years, we only need a drive and tapes. Let’s assume the LTO-8 drive and tapes have become 30% cheaper in the meantime. I don’t count on any resale value for the old LTO-6 at this point.

| Item | Cost |
|---|---|
| LTO-8 drive | $2,800 |
| LTO-8 tapes for the first 4 years, 5 tapes | $350 |
| LTO-8 tapes for the second 4 years, 5 tapes | $350 |
| Total | $3,500 |

Total for the second four years.

For all the years, then:

| Period | Cost |
|---|---|
| First 4 years | $4,380 |
| Second 4 years | $3,500 |
| Total | $7,880 |

Total for scenario 3 for 8 years.

Comparing tape scenarios

Of the three tape scenarios, the first is not only the cheapest, but also the simplest, namely to get an LTO-8 drive to start with and keep using it for the full 8 years for a total of $7,020.
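For a quick cross-check, the three scenario totals can be rebuilt from their parts. The figures are the ones from the tables above; the variable names are illustrative, and the LTO-10 and discounted LTO-8 prices remain the guesses stated in the text.

```python
# Parts shared by all scenarios: Atto card, Sonnet box, YoYotta.
SHARED_USD = 500 + 220 + 500

scenarios = {
    # LTO-8 drive, shared parts, tapes for years 1-4 and 5-8
    "1: LTO-8 for 8 years": 4000 + SHARED_USD + 1000 + 800,
    # LTO-8 for 4 years, then LTO-10 drive and tapes minus the LTO-8 resale
    "2: LTO-8 then LTO-10": (4000 + SHARED_USD + 1000) + (4000 + 400 + 400 - 750),
    # LTO-6 for 4 years, then a 30% cheaper LTO-8 drive and tapes
    "3: LTO-6 then LTO-8": (2200 + SHARED_USD + 960) + (2800 + 350 + 350),
}

cheapest = min(scenarios, key=scenarios.get)
for name, usd in scenarios.items():
    print(f"Scenario {name}: ${usd:,}")
print(f"Cheapest: scenario {cheapest}")
```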

Let’s compare them

Let’s compare the prices of the following reasonable scenarios:

  • One NAS and disks at rest as second copy.
  • One NAS and cloud storage as second copy.
  • Two sets of disks at rest.
  • Two sets of tapes on LTO-8 for all the 8 years.

| Item | Cost |
|---|---|
| One NAS for 8 years | $3,000 |
| Electricity | $1,140 |
| Disks at rest, one set, 8 years | $2,280 |
| Total | $6,420 |

One NAS and one set of disks at rest

| Item | Cost |
|---|---|
| One NAS for 8 years | $3,000 |
| Electricity | $1,140 |
| Cloud storage 8 years | $17,280 |
| Total | $21,420 |

I guess that one is out…

| Item | Cost |
|---|---|
| Two sets of disks at rest, 8 years | $4,560 |
| Total | $4,560 |

That’s simple, just a lot of hard disks in a drawer

| Item | Cost |
|---|---|
| LTO-8 complete for 8 years | $7,020 |
| Total | $7,020 |

The cheapest and simplest LTO version
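The four totals can also be assembled from the component costs worked out earlier, which makes it easy to swap in your own numbers. This is just a sketch with illustrative names; the component figures are the 8-year estimates from the sections above.

```python
# 8-year cost components in USD, as estimated earlier in the text.
NAS_HARDWARE = 3000        # one NAS, including the 10 Gb switch
NAS_ELECTRICITY = 1140     # one NAS
DISKS_AT_REST_SET = 2280   # one set of disks at rest
CLOUD = 17280              # one copy in the cloud
LTO8_COMPLETE = 7020       # drive, card, box, software, two sets of tapes

options = {
    "One NAS plus disks at rest": NAS_HARDWARE + NAS_ELECTRICITY + DISKS_AT_REST_SET,
    "One NAS and cloud": NAS_HARDWARE + NAS_ELECTRICITY + CLOUD,
    "Two sets of disks at rest": 2 * DISKS_AT_REST_SET,
    "LTO-8 for 8 years": LTO8_COMPLETE,
}

# List the options from cheapest to most expensive.
for name, usd in sorted(options.items(), key=lambda item: item[1]):
    print(f"{name:<28} ${usd:,}")
```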

So all of these in one table. I’ve added a “Trust” column, indicating how reliable the storage is long term, and a “Vuln” column indicating how exposed the backups are to malware, in particular ransomware. I’m scoring Trust and Vuln 1-4, where 1 is best, 4 is worst.

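| Option | Cost (8 years) | Trust | Vuln |
|---|---|---|---|
| One NAS plus disks at rest | $6,420 | 3 | 4 |
| One NAS and cloud | $21,420 | 1 | 2 |
| Two sets of disks at rest | $4,560 | 4 | 3 |
| LTO-8 simplest version | $7,020 | 2 | 1 |

And the winner is…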

From the summary table, I do think the LTO-8 alternative is the way to go. The greatly enhanced security (trust and vulnerability) makes it worth the relatively minor extra cost.

  1. Actually it’s more, since I can’t add disks all that gradually, but will start with typically 4 disks right off the bat.
  2. In large studies like the ones Backblaze publishes, it’s lower, but that’s under perfect data center conditions.
  3. If you enable the trash folder on the NAS, it may save you. Unless you get into the habit of emptying the trash, of course. Or only discover what you have done after it emptied automatically.
  4. If you take care not to mount the drives while you have ransomware. If you even know.
  5. Weird, I know, but it has something to do with oil leaks and the heads sticking to the platters or something.
  6. As I’m typing this, DiskWarrior is trying to rescue an external 500 GB drive I was going to use for some tests. What did I just say…
  7. I only count on a single copy. The other copy needs to go somewhere else, like on spinning disks.
  8. I’m counting raw capacity, which is 12 TB for LTO-8. You often see 30 TB, but that’s if your data compresses well, which video doesn’t.
  9. Linear Tape File System
  10. Remember I mentioned a bad disk above? Well, DiskWarrior is still running and is at 53 disk malfunctions and counting.
  11. Yup. Really. I’ve done it.
  12. Sound unlikely? Well, look up “CrashPlan” and you’ll see…
  13. LTO generations are around 2-2.5 years apart.
  14. I’ll use one NAS in one of the mixed scenarios later.
  15. Yep, a whole other thing we have to think about now.
