BorgBackup: (TODO: Insert Borg Joke Here)
I was really fighting with my inner self to have a straightforward title for this one, but alas, resistance is futile.
BorgBackup, or, for short, just “Borg”, is a relatively fast (more on that later), efficient, secure, and authenticated way of backing up multiple devices either on a single network, or even across networks (you’ll see, again, later). This is currently what I use for backing up my stuff, and, well, it’s just cool, and definitely something you should take a look at.
What Borg is Not
Borg is not an enterprise-level backup system. Not that it technically can't be used as such, just that it doesn't seem to be built for that kind of workload.
Anyways, let’s run down some of the basics.
Borg Architecture
Transport
Borg uses SSH as its main transport mechanism, meaning that, in theory, any time you can SSH into your server, you can create backups. Different devices, different networks, halfway across the globe… as long as you can reach it. Also note that talking to a borg process directly over SSH is faster than backing up to a remote network filesystem like NFS or SSHFS. Of course, this also means that the main form of security and authentication relies on your SSH server setup and the security of your credentials, be it a password, public key, or what have you. There is another layer though, so keep that in mind.
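Just as a sketch (the user, host, and repo path here are placeholders, nothing special), a remote repo is addressed pretty much like any other SSH target:
# scp-style syntax, connecting as "user" to "backup-host":
$ borg list user@backup-host:backups
# or the explicit ssh:// form, handy if the server listens on a non-standard port
# (a path starting with /./ is relative to the login user's home directory):
$ borg list ssh://user@backup-host:2222/./backups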
Deduplicating Storage
All backups in a ‘repository’ (think Git repository: just a folder with a specific file structure inside) are split into chunks, which are cryptographically hashed. Only chunks with hashes the repo hasn't seen before actually get written; the rest are simply noted and referenced. In effect, the data is deduplicated: Borg will never store the same data twice on disk, it stores it once and just points to it again later if it needs to.
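If you want to see this in action, an easy (if slightly contrived) test is to back up the same data twice with --stats and compare the “Deduplicated size” numbers; the repo path and archive names here are just examples:
$ borg create --stats user@backup-host:backups::first ~/Documents
$ borg create --stats user@backup-host:backups::second ~/Documents
# The second run's "Deduplicated size" should be close to zero: nearly every chunk
# already exists in the repo, so it just gets referenced instead of stored again.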
Encryption
All data in a repository is AES-256 encrypted, and verified with HMAC-SHA256. Data is encrypted client-side, meaning the server doesn’t need the resources for it, and you aren’t even transmitting unencrypted data over the network.
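For scripted use, Borg will take the passphrase from the BORG_PASSPHRASE environment variable instead of prompting, and newer versions can also export a spare copy of the repository key. A rough sketch, assuming a passphrase-protected repo (more on the modes below) and placeholder paths:
# Non-interactive passphrase, handy for scripts (keep it out of shell history and world-readable files):
$ export BORG_PASSPHRASE='correct horse battery staple'
# Export a backup copy of the repository key, worth stashing somewhere safe:
$ borg key export user@backup-host:backups ~/borg-key-backup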
Compression
On top of deduplication, data in a repository can also be compressed with lz4, zstd, zlib, or lzma, potentially cutting storage usage even further.
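The algorithm (and level) is picked per borg create run with --compression; the combinations below are just reasonable examples, not gospel:
# lz4 is fast and cheap; zstd trades a little CPU for a better ratio:
$ borg create --compression zstd,10 user@backup-host:backups::docs ~/Documents
# or let "auto" check whether a chunk even looks compressible before applying the chosen algorithm:
$ borg create --compression auto,zstd,10 user@backup-host:backups::docs ~/Documents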
Ease of Use
Borg is just a single-file CLI binary that's available on most platforms.
Additionally, the command itself isn't that hard to use either; the --help flag is, well, for once, actually useful.
There’s even an experimental web interface.
Repository Specifics
Many parts of a Borg repo are set at create time (borg init [OPTIONS] PATH), the main one being -e or --encryption, which is not only required, but also specifies a few things.
You can specify none… which is just a fancy rsync, if you like, but most people are going to use some form of encryption or authentication. Seriously, there's no point. Don't do that.
Of the real options, there's a few possibilities: authenticated just runs all the hash checks to make sure nobody has been messing with things, but provides no encryption. The repokey and keyfile modes provide encryption. repokey is, effectively, a password-protected repo. keyfile has a password and requires that you, the client, have the matching half of the repository key before you're allowed access. In SSH terms, repokey is password auth, and keyfile is public key auth and then password auth.
Note: There's also -blake2 versions of all the modes (minus none), like repokey-blake2, which use BLAKE2b-256 as the HMAC hash, not SHA-256. On processors without SHA hardware acceleration, BLAKE2b is a faster hash than SHA.
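As a quick sketch (repo location is a placeholder, and you'd pick exactly one of these, since the mode is fixed when the repo is created):
$ borg init -e repokey user@backup-host:backups          # passphrase, key stored inside the repo
$ borg init -e keyfile user@backup-host:backups          # key lives on the client, plus a passphrase
$ borg init -e repokey-blake2 user@backup-host:backups   # same as repokey, but using BLAKE2b
$ borg init -e authenticated user@backup-host:backups    # integrity checks only, no encryption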
Besides encryption, you can also set a storage quota, meaning Borg will refuse to commit backups that would cause the repo to be larger than its quota.
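If your Borg version supports it, the quota is another init-time option; the 100G figure here is completely arbitrary:
$ borg init -e repokey --storage-quota 100G user@backup-host:backups
# Once the repo hits 100 GB, further backups get refused instead of silently filling the disk.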
Backing Up
To actually run a backup, you just need to borg create one, specifying the repo path, the archive (backup) name, and the paths to include. For example:
$ borg create user@backup-host:backups::Monday ~/Pictures ~/Documents
This would back up Pictures and Documents to an archive named Monday in the backups path on backup-host, over SSH.
You can use variables in the archive name (the part after the ::), like {now} for the current date and time; borg help placeholders has the full list.
And just as an example, let’s look at the final output of a backup job:
------------------------------------------------------------------------------
Archive name: Monday
Archive fingerprint: bd31004d58f51ea06ff735d2e5ac49376901b21d58035f8fb05dbf866566e3c2
Time (start): Tue, 2016-02-16 18:15:11
Time (end): Tue, 2016-02-16 18:15:11
Duration: 0.19 seconds
Number of files: 127
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 4.16 MB 4.17 MB 26.78 kB
All archives: 8.33 MB 8.34 MB 4.19 MB
Unique chunks Total chunks
Chunk index: 132 261
------------------------------------------------------------------------------
Here, we can see, in order, the final name that was submitted, the fingerprint (hash) of this archive, the timing information, and the number of files processed. When listing this one later, we can also see things like the exact command line used.
It's also possible to see the sizes: original (i.e. source), compressed, and deduplicated (the final total on disk), which in this case is remarkably small compared to the source size. Judging by the "Deduplicated size" for "All archives" and the "Chunk index" numbers, this repo is about 50% of the size it would be without data deduplication.
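To pull that information back up later, borg list and borg info are the commands you want (same example repo and archive as before):
$ borg list user@backup-host:backups            # every archive in the repo
$ borg info user@backup-host:backups::Monday    # the stats above, plus the exact command line used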
And yes, it’s possible to automate this process.
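One minimal sketch, assuming a cron-based setup and the same example repo (a systemd timer, or a wrapper like borgmatic, works just as well):
# /etc/cron.d/borg-backup (example only): nightly backup at 02:30, run as the "backup" user.
# In practice you'd keep BORG_PASSPHRASE in a root-only wrapper script rather than inline here.
30 2 * * * backup borg create --compression zstd,10 user@backup-host:backups::'{hostname}-{now}' /home /etc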
Additional commands like borg prune can be used to roll off old backups to make space for newer ones. Borg is smart about this, and won't delete chunks that are referenced in another archive. So don't worry about losing any data to deduplication.
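A typical retention policy might look something like this; the numbers are just one sensible choice, not a recommendation carved in stone:
$ borg prune --list --keep-daily 7 --keep-weekly 4 --keep-monthly 6 user@backup-host:backups
# Add --dry-run first if you want to see what would go away without actually deleting anything.
# On newer Borg versions, a follow-up "borg compact" is what actually frees the disk space.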
Conclusion
It's just cool, and it's designed to actually be usable without spending multiple hours reading the documentation, unlike some other tools. Though if you do want to read it, the documentation is actually very well done and easy to follow, so have fun with this one, nerds.