Backups part 1: Strategy
There are two kinds of people in this world: people who back up their files and people who will back up their files.
If you don't do backups, check the World Backup Day website. It lists a number of reasons why backups are important and provides some advice on how to start backing up your files.
However, their advice is rather simplistic. After following it and setting up some backup, one may still end up with no usable backups and a false sense of security. And people who are more paranoid (or experienced?) may still have some anxiety, because many questions remain open. Do I back up the right files? Did I forget anything? Will these backups protect me from all data loss situations? Do they cost me too much?
To reduce this anxiety for myself, I am starting a series of posts where I will document how I do backups. I hope these will be useful for other people as well.
Now, before I start implementing any backups, I need to understand what to back up. So let's start with...
Data Catalogue
Or what data do I have and how important is it?
Description | Importance | Comment |
---|---|---|
Files | My spouse and I own several laptops, a desktop and a "home server", which is just an old laptop. All of them mostly run Linux. | |
/home | High | Spread over several laptops & desktops. A lot of files are duplicated between machines |
/etc | Low | Can be restored after some documentation reading |
/var | Low | I try not to put anything important there. Need to double-check though. |
/root | Low | Nothing there |
/media and /mnt | Medium | Some additional devices may be mounted there. |
/mnt/windows | Low | Special case: dual-boot Windows installation for games. Nothing there except saves, which should be synced online |
/bin, /usr, /lib, etc. | Low | Easy enough to reinstall/repair from the distribution |
Devices | Non-computer devices | |
Phones | Low | We are on Android and almost everything is backed up to a Google account. There is nothing really valuable there anyway. |
Virgin router | Low | I get my internet from Virgin Media and they insist on their own router. I use close to default settings there, so losing the config is not a big deal |
UniFi Dream Machine | Medium | Most of the home routing is done by UniFi Dream Machine, so its config is nice to preserve |
Chromecast / Google TV | Low | Doesn't hold any state |
Online Services | ||
Medium | ||
Feedly | Medium | It would be annoying to lose my list of subscriptions |
GitHub | Medium | It's important but by the nature of git I'll have many copies naturally |
Gmail | High | |
Google Calendar | High | |
Google Drive | High | A lot of important stuff there for archival |
Google Keep | Medium | Some small notes there |
Netatmo | Medium | |
Password manager | High | |
ProtonMail | Medium | |
Medium | ||
Low | ||
YouTube | Medium | No videos but playlists and subscriptions |
YouTube Music | High | My music collection is there. It would be bad to lose it along with my playlists |
That's everything I could remember at the moment.
You may have noticed I've also included online services in the list of data I care about. I believe it's a mistake to leave them out. I expect companies to take good care of my data (and I know first-hand that this is true in the case of Google). However, I may get locked out of an account for some reason, or the service may be unavailable at some critical time, or it may even shut down.
Next, I need to understand what I need to protect against.
Data loss scenarios
Scenario | Probability | Comment |
---|---|---|
Storage device (HDD or SSD) failure | Medium | Modern hardware is quite robust but can still fail |
Human error | High | My fat fingers are by far the biggest risk to my data. |
Theft or loss | High for laptops | |
Natural disaster (fire, etc) | ??? Hopefully low | |
Malice | Low | I believe I am protected enough from random malicious attacks and not valuable enough to be targeted |
Other considerations
- I use syncthing extensively to sync files between my machines, so a lot of files are duplicates.
- I don't want to spend a fortune on these backups.
- I dislike and try to avoid subscription services and would prefer a pay-for-what-I-use approach.
- A lot of my machines have an intermittent connection to the Internet, and often do not have a public IP address.
- I have a weak Internet uplink at home (upload is capped at 20 Mbps).
- A lot of my data does not change too frequently.
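To get a feel for what the 20 Mbps upload cap means for off-site backups, here is a rough back-of-the-envelope sketch. The backup sizes are hypothetical (the post does not state how much data there is), and the calculation assumes the uplink is fully saturated with no protocol overhead, so real uploads will take longer.

```python
def upload_time_hours(size_gb: float, uplink_mbps: float = 20.0) -> float:
    """Best-case hours to upload size_gb (decimal gigabytes) at uplink_mbps megabits/s."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    seconds = size_megabits / uplink_mbps
    return seconds / 3600


# Hypothetical sizes, for illustration only.
for size_gb in (50, 500, 1000):
    hours = upload_time_hours(size_gb)
    print(f"{size_gb:>5} GB: {hours:6.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions, a full upload of 1000 GB takes more than four days, which is one reason why slowly changing data and incremental uploads matter so much with this kind of uplink.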
Solution requirements
Taking all of the above into consideration, here's what I would like from my backup solution:
Must-have:
- Off-site and ideally offline backups.
- Ability to work with clients without public IP.
- Ability to do incremental backups.
- I did not mention this, but all data must be encrypted in transit and at rest.
Nice to have:
- De-duplication across different hosts.
- Pay as you go instead of subscription.
- Use of open-source software so I can hack and improve my backup solution.
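To illustrate why de-duplication across hosts is attractive when many files are synced between machines, here is a minimal sketch of content-hash de-duplication. It is not how any particular backup tool works: the host names and file contents are made up, and it de-duplicates whole files, whereas real tools typically work on smaller chunks.

```python
import hashlib


def dedup_stats(files_by_host: dict[str, dict[str, bytes]]) -> tuple[int, int]:
    """Return (total_bytes, unique_bytes) over all files on all hosts.

    Files with identical content are counted once in unique_bytes,
    no matter which host or path they live on.
    """
    total = 0
    unique_sizes: dict[str, int] = {}
    for files in files_by_host.values():
        for content in files.values():
            total += len(content)
            digest = hashlib.sha256(content).hexdigest()
            unique_sizes[digest] = len(content)
    return total, sum(unique_sizes.values())


# Hypothetical hosts: notes.txt is synced to both machines.
hosts = {
    "laptop": {"notes.txt": b"a" * 1000, "draft.txt": b"b" * 200},
    "desktop": {"notes.txt": b"a" * 1000},
}
total, unique = dedup_stats(hosts)
print(f"stored without dedup: {total} bytes, with dedup: {unique} bytes")
```

In this toy example the synced file is stored only once, so the de-duplicated size is 1200 bytes instead of 2200.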
Next up
That's all for this post. Check part 2, where I describe my solution for file backups.