# Synchronising and backing up files securely

## Dropbox for Dropbox haters

Related: backing up data.

Purely cloud-based network drives just aren’t as awesome at fast or distributed work. Realising this is why the Dropbox founders are now rich, when they invented a thing which keeps the files locally and syncs them with your co-workers online. Well done them.

However, Dropbox’s solution, as groundbreaking as it was, is also unsatisfactory, being hamstrung by technical and legal shortcomings. Same goes for the Google and Microsoft options. How can I get something like Dropbox without all the security holes and creepy behaviour?

Peer to peer synchronising (i.e. no special server at all) is one robust option and I do a lot of this. I then have no 3rd party helping me, which is a plus and a minus. Plus: I can feel safer. Minus: I do not synchronise if my computers are not simultaneously online.

Taking it further, how about everything be sneakernets?

Or I can use some hacks to make Dropbox less awful, by e.g. encrypting my files to inhibit some Dropbox data mining, and by using alternative clients that are less suspect and so on.

Various options follow.

## Syncthing

Choose this if… You have a collection of various folders that you need shared to various different machines, and you would like many of the different machines to be able to edit them. You don’t need a server and thus you are happy for syncing to happen if and when the peers are online. And you don’t care about iOS. e.g. I use this for synchronising my music production files across my studio machines, studio backup machines and gig machines.

Syncthing has an elegant decentralised peer-to-peer design. It is mostly simple and friendly to use, although I spent too long reading the manual and being intimidated to dive in and discover that.

Granularity is per-folder-per-machine — each shared folder (and all sub folders) is a separate share. It doesn’t support iOS. It doesn’t support archiving stuff to USB keys or semi-offline stores, or multiple copies of the same folder on one machine.

Stated design criteria:

• Private. None of your data is ever stored anywhere else than on your computers. There is no central server that might be compromised, legally or illegally.

[Editorial note: Note that if any of your machines are compromised, your attacker still gets the data on that machine. It’s not magic. It’s simply that you don’t have to worry about a whole other copy of your data being on a server owned by some faceless third party who may or may not have acceptable confidentiality practices.]

• Encrypted. All communication is secured using TLS. The encryption used includes perfect forward secrecy to prevent any eavesdropper from ever gaining access to your data.

• Authenticated. Every node is identified by a strong cryptographic certificate. Only nodes you have explicitly allowed can connect to your cluster.

You probably want to set the following files to be ignored in your .stignore file if you don’t want it to synchronise a little too aggressively.

// From Windows
$RECYCLE.BIN
System Volume Information
$WINDOWS.~BT
pagefile.sys
desktop.ini

// From OS X
Icon?
.Spotlight-V100
.Trashes
(?d).DS_Store
.fseventsd
(?d)._*

// From Linux
lost+found
.gvfs
.local/share/trash
.Trash-*
.fuse_hidden*
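Since the .stignore file itself is not synchronised between machines, one way to avoid maintaining the list by hand everywhere is to keep the patterns in a file that does get synced and pull it in with Syncthing’s `#include` directive. A minimal sketch, assuming the patterns live in a file I have chosen to call `.stglobalignore` at the top of the shared folder (both that file name and the helper name `add_shared_ignores` are mine, not anything Syncthing mandates; the included file needs to exist before Syncthing reads the ignore list):

```shell
# add_shared_ignores is a hypothetical helper; ".stglobalignore" is an assumed file name.
add_shared_ignores() {
    folder=$1
    include_line='#include .stglobalignore'
    # Make sure there is a .stignore to append to.
    touch "$folder/.stignore"
    # Append the include directive only if it is not already there.
    grep -qxF -- "$include_line" "$folder/.stignore" ||
        printf '%s\n' "$include_line" >> "$folder/.stignore"
}
```

I would run something like `add_shared_ignores ~/Music/Projects` once per shared folder on each machine, and after that only edit the synced pattern file.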

There is a CLI syncthing manager for your remote cloud instances, named syncthingmanager. It has a macOS client.

Syncthing also has file versioning and such, but cryptographic signing of versions and guaranteeing consistent snapshots and so on is not a front-and-centre feature.

The major gotcha is that syncing between case insensitive and case sensitive file systems has historically been broken and could delete data. That’s right, this app works beautifully, smoothly and easily, except that the moment you use it to sync between Linux and macOS or Linux and Windows (which have different case sensitivity per default, although it’s a long story) stuff goes horribly wrong. This works as of version 1.9.0, sort of. It could be better documented. The solution is clunky. For me at least, if one machine renames a file changing only the case, then other machines no longer delete the data, which is a wonderful improvement. Instead, some machines become confused about the rename and demand that you manually “fix” it, by renaming the file yourself rather than having the rename sync across the machines. It seems you need to do this on each of the machines. I’m sure that there is some sensible reason that this happens, to do with the intrinsic difficulties of distributed something-or-other, but it does lead to much manual labour to do something that looks automatic and trivial.
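To find out ahead of time whether a folder contains names that differ only by case, something like the following works for me. It is a rough sketch and not anything Syncthing provides: `case_collisions` is a made-up helper name, and it assumes file names without embedded newlines.

```shell
# case_collisions is a hypothetical helper name.
# Reads one path per line on stdin and prints every lower-cased path
# that occurs more than once, i.e. paths that would clash on a
# case-insensitive file system.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Usage, from the top of a shared folder:
#   find . -print | case_collisions
```

No output means no clashes. Anything it prints is worth renaming on the case-sensitive machine before sharing the folder with a macOS or Windows peer.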

A lesser problem is that syncthing uses a non-trivial amount of disk space in a central cache folder (i.e. outside where the files of interest are). This is something like 1% of the size of the data being backed up, which if you are syncing many terabytes does add up.

## Seafile

Choose this if… You have a collection of various folders that you need to synchronise between macOS, Windows, Linux, Android and iOS. You don’t mind installing or paying for a central server to coordinate all this.

Seafile is an open-source file sync service with a premium enterprise server with more features available for a fee. It has clients for browsers and desktop and mobile. Semi-host-blind encryption is available but optional and doesn’t work from browsers or encrypt metadata. FWIW it looks simpler than Nextcloud to install manually, although in either case it’s easiest to run pre-made docker images if you trust the creator. You can build your own; this starts to feel like a looot of work. You would use this if you wanted a convenient iOS client, which is otherwise tricky to get. I would happily pay for a hosted version of this if there was one available, but there isn’t.

## Peergos

Peergos is a file sync based on IPFS.

## dat

Choose this if… You have a large data set that you wish to share amongst many strangers, and if there is a single source of truth. I would use this for sharing predictably updated research, if I wished to have the flexibility of updating my data at the cost of keeping the software running.

Dat is similar to syncthing, with a different emphasis — sharing data to strangers rather than friends, with a special focus on datasets. You could also use it for backups or other sharing. See scientific data sharing.

NB it’s one-writer-many-readers, so don’t get excited about multiple data sources, or inter-lab collaboration. For this price, though, you get data versioning and robust verifiability, which is a bargain. Useless in my current “lone madman”-style workflow though.

People have built more collaborative applications on top of the Dat tools, such as beaker browser, which is a decentralised web browser. Good on them.

## OrbitDB

Choose this if… Not sure yet.

## Mega

Choose this if… You want to share files, chats, data and whatever else, with people who can’t or won’t install their own software and so must use a browser to download stuff, and if you don’t care that the company behind it is dicey. e.g. I used this for some temporary file sharing music collaboration projects, but now that they are over, I’ve deleted it.

Mega is easy to run. Public source, but not open source. (Long story.) It is a host-blind encryption business from New Zealand.

Anyway it’s relatively easy to use because it works in the browser, so it won’t terrify your non-geek friends. Ok, maybe a little. Much cheaper than Dropbox, as well as being probably less creepy. The UI is occasionally freaky but it’s reasonably functional, especially for its bargain-basement price. A… unique?… tradeoff of respectability, privacy and affordability.

## Rclone

Choose this if… You want to infrequently clone some files somewhere, collaborate with people who use different sync solutions, or because you want a Swiss army knife fallback solution.

Rclone is a command line program to copy files and directories to and from various cloud sync solutions: Google Drive, Amazon S3, Memset Memstore, Dropbox etc. For some of these services it is the only Linux client. It is not actually a sync client per se, in that it does not mirror the files to your drive, but accesses remote storage that happens to usually be used by sync clients.

I have this around because it lets me plug into pretty much anything, e.g. accessing my Microsoft OneDrive from the campus Linux cluster. Also because my colleagues don’t agree on which provider to use, and I don’t have enough money, time, or trust to run Dropbox, Google Drive, OneDrive and Mega.

Pro-tip: the encryption module turns any unencrypted file storage into an encrypted one for your personal use. Sharing the data from such a drive is tricky, but it is much safer. It is thus a convenient UI for encrypting things, even local files.

Pro-tip: It can also mount remote cloud storage on your local machine, which is handy although not recommended for daily use as it’s ungainly and slow, although you can set up caching but oh my this is getting complicated now isn’t it?

Features:

• MD5/SHA1 hashes checked at all times for file integrity
• Timestamps preserved on files
• Partial syncs supported on a whole file basis
• Copy mode to just copy new/changed files
• Sync (one way) mode to make a directory identical
• Check mode to check for file hash equality
• Can sync to and from network, eg two different cloud accounts
• Optional encryption (Crypt)
• Optional FUSE mount (rclone mount)

Cons: manual synchronising is to be avoided because every extra thing to remember is another thing you will forget at the worst possible time, so you probably still want an actual sync client of some kind as well. Or, you could script it into your build tool, which is what I do.

You could read the manual, but everything seems to work great for me if I simply run

rclone config

See below for a concrete application of this.

## Upspin

Choose this if… you prize elegance above comprehensibility but intermittently descend from the heights of your research program in abstract category theory to quotidian normalcy in order to share files.

Upspin is hard to explain, specifically because I don’t understand it. It’s not really a sync service (I think) but it fills some of the same niches. Rclone on steroids, with a server process, something like that?

When did you last…

• make a file public just to share it with one person?
• accidentally make something visible to the wrong people?

Upspin is an attempt to address problems like these, and many more.

Upspin is in its early days, but the plan is for you to manage all your data—even data you’ve stored in commercial web services—in a safe, secure, and sharable way that makes it easy to discover what you’ve got and who you’ve shared it with.

If you’d like to help us make that vision a reality, we’d love to have you try out Upspin.

Upspin is an open-source project that comprises two main design elements:

1. a set of protocols enabling secure, federated sharing using a global naming system; and
2. reference implementations of tools and services that demonstrate the capabilities.

Summary: It backloads encrypted, permissioned (?), storage into arbitrary backends including cloud providers. Maybe.

## Owncloud/Nextcloud

Choose this if… your campus runs a giant free Owncloud service so you may as well. Owncloud and Nextcloud are two forks of the same codebase. Nextcloud seems to be hipper. They both look similar to me. Since I encountered Owncloud first I mention it here; the differences seem largely philosophical.

For clarity where I am referring to both I will describe the resulting superposition as Nowtcloud.

Nowtcloud is dubiously secure; they have security advisories all the time. Also the server doesn’t store files encrypted, so you get an increased ease of sharing files (no extra password needed) but decreased confidentiality in case the server is compromised. Lawks! That’s barely better than Dropbox!

OTOH, it’s possible to run your own Nowtcloud server, e.g. using docker, so it’s useful for sharing something public such as open research etc for only the cost of hosting, which is low. However, if you want to do this, Seafile seems to be better software for the file sharing use case and is no harder to set up, so why not try that?

The real reason would be that someone else has gifted you a server already set up. Australian academics get a generous serving of storage from AARNET, a terabyte I think, so we may as well.

However, there are various quirks to survive.

For one, native command-line usage is not obvious. How do you access your stored data files from your campus cluster?

First, you can access it as a WebDAV share, which is unwieldy but probably works. WebDAV is effectively making things available on a webserver, which sounds like it should be simple, but in practice you need a special client to do it because there are lots of fiddly details. e.g. you want to browse the folders to find the file, or you want to handle authentication etc. Boring.

Nowtcloud notionally has a command-line client, but the CLI documentation is hidden deeply, possibly because it’s not very good. (the documentation or the CLI) Tony Maro gives a walk-through. It’s sensitively version dependent. Beware.

There are also version clashes between different versions of Nowtcloud, and when you sync a folder with some other service, occasional silent data loss. I’m not a fan of this whole project.

I recommend that if you need to get data out of Nowtcloud from the command line you should extract it using rclone’s WebDAV mode and ignore Nowtcloud itself. It’s convenient that someone runs Nowtcloud code on some server somewhere, and I do in fact store a bunch of non-confidential data sets there. But the less of the Nowtcloud code I myself run, the better this has worked for me thus far.
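A minimal sketch of that extraction, assuming I have already created a WebDAV remote with `rclone config` and called it `nowtcloud` (the remote name, the paths and the helper name `fetch_from_nowtcloud` are all placeholders of mine). The `RCLONE` variable is only there so the helper can be pointed at a stand-in command for a dry test.

```shell
# fetch_from_nowtcloud is a hypothetical helper; it assumes a WebDAV remote
# has already been set up interactively with `rclone config`.
fetch_from_nowtcloud() {
    remote_path=$1   # e.g. nowtcloud:datasets/
    local_path=$2    # e.g. ./datasets/
    # Copy new or changed files down, then compare the two sides.
    "${RCLONE:-rclone}" copy "$remote_path" "$local_path" &&
        "${RCLONE:-rclone}" check "$remote_path" "$local_path"
}
```

On the cluster I would call it as `fetch_from_nowtcloud nowtcloud:datasets/ ./datasets/`, which never touches any Nowtcloud client code.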

## git-annex

Choose this if… you are a giant nerd with harrowing restrictions on your data transfer and it’s worth your while to leverage this sophisticated and yet confusing bit of software to work around these challenges. E.g. you are integrating sneakernets and various online options. Which I am not.

git-annex supports explicit and customisable folder-tree synchronisation, merging and sneakernets and as such I am well disposed toward it. You can choose to have things in various stores, and to copy files to and from servers or disks as they become available. It doesn’t support iOS. Windows support is experimental. Granularity is per-file. It has a weird symlink-based file access protocol which might be inconvenient for many uses. (I’m imagining this is trouble for Microsoft Word or whatever.)

Also, do you want to invoke various disk-online-disk-offline-how-sync-when options from the command line, or do you want stuff to magically replicate itself across some machines without requiring you to remember the correct incantation on a regular basis?

The documentation is nerdy and unclear, but I think my needs are nerdy and unclear by modern standards. However, the combinatorial explosion of options and excessive hands-on-ness is a serious problem which I will not realistically get around to addressing due to my to-do list already being too long.

## Sparkleshare

Sparkleshare is designed for file syncing for designers, by using git as a backend. Haven’t explored it yet.

## Turtl

Faintly left-field, Turtl is a notes syncing app. It aims to be a competitor to Evernote, but without their insipid privacy and faintly spammy attitude. No iOS support. Doesn’t synchronise arbitrary files like most of the other entrants here, but does rich UI for the notes.

## Dropbox if you must

Choose this if… you don’t mind giving access to your data to dubious strangers with little regard for your security and some of your colleagues are totally hooked on it.

I talk about dropbox here, but also GDrive and whatever the windows thing is called do it too.

Bonus tip: using dropbox without the dropbox client.

### Use rclone to get dropbox data on demand

The upshot of rclone is that I can pull changes from dropbox into my git repository thusly

rclone sync --exclude=".git/" --update dropbox:ProjectForGitHaters/ ./ProjectForGitHaters/

and push changes into dropbox from the git repo like so

rclone sync --exclude=".git/" --update ./ProjectForGitHaters/ dropbox:ProjectForGitHaters/

My colleagues need never know that I am using modern version control, change-tracking, merging, diffing and so on.

In practice, to exclude a lot of files at once, I recycle a standard exclude list from syncthing and replace --exclude=".git/" with --exclude-from=".stignore" in those commands. And to make sure I am not accidentally syncing git repo stuff I use the --dry-run option to verify that the expected files are getting copied/deleted/whatever.
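Because I will forget the dry run at the worst possible time, the two steps can be wrapped into one. This is a sketch only: `sync_preview_then_run` is a name I made up, it keeps the plain `--exclude=".git/"` filter from the commands above, and the `RCLONE` variable exists purely so a stand-in command can be substituted for testing.

```shell
# sync_preview_then_run is a hypothetical helper.
# It shows what rclone would change, then asks before doing it for real.
sync_preview_then_run() {
    src=$1
    dest=$2
    # First pass: report what would be copied or deleted, changing nothing.
    "${RCLONE:-rclone}" sync --dry-run --exclude=".git/" --update "$src" "$dest" || return 1
    printf 'Apply these changes? [y/N] '
    read -r answer
    # Anything other than a plain "y" leaves both sides untouched.
    [ "$answer" = "y" ] || return 0
    "${RCLONE:-rclone}" sync --exclude=".git/" --update "$src" "$dest"
}
```

Pulling is then `sync_preview_then_run dropbox:ProjectForGitHaters/ ./ProjectForGitHaters/`, and pushing is the same call with the arguments swapped.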

### Run dropbox proper on a spare computer and then sync using syncthing

This works pretty well. I was running dropbox and syncthing on a spare computer I had lying around campus to automatically synchronise the stuff I need off dropbox. One minus was that occasionally I got logged out of that machine when I was away, causing syncing to break. These days I use rclone to communicate with dropboxers.

### Use a FUSE virtual mount

dbxfs or ff3d or rclone (above) allow you to mount the remote dropbox file system without installing Dropbox’s suspect client software. This seems slow and clunky; I think you would only do this if you needed to coordinate on some dropbox thing in realtime but mistrusted the client. This sounds like hell to me. Can I not just use git and handle coordination in my own sweet time? For my offline collaboration style, manual syncing with rclone is better.

### Run a sandbox

If I must use Dropbox, I could perhaps sandbox it so at least I don’t need to run their stupid software on a real machine that I actually use. Dropbox itself doesn’t seem to ship for any of the sandbox systems natively (aside: why not? Does some part of their business model depend on intrusive access to everything you do?). I did try running it containerized, using docker. However in practice for me it was fragile and RAM-heavy, difficult to debug and overall not recommended. Possibly other sandboxes would be better? But meh.

## Others

• SpiderOak was the most popular encrypted service last time I checked. It is based in the USA, which, like Russia and China, is more of a secret service browsing library than a secure document store where you would keep actual private stuff, which creates certain difficulties for their credibility.

• sparkleshare is a friendly git front-end for non-specialists:

creates a special folder on your computer. You can add remotely hosted folders (or “projects”) to this folder. These projects will be automatically kept in sync with both the host and all of your peers when someone adds, removes or edits a file.

SparkleShare uses the version control system Git under the hood, so setting up a host yourself is relatively easy.

FWIW this seems to me to be less of a good sync client, and more of a good git GUI.

• Academic cred: “Ori is a distributed file system built for offline operation and empowers the user with control over synchronization operations and conflict resolution. We provide history through light weight snapshots and allow users to verify the history has not been tampered with. Through the use of replication instances can be resilient and recover damaged data from other nodes.”

• Tresorit is a Swiss Spideroak competitor, which capitalises on stronger Swiss privacy laws, (YMMV) as well as trendy encryption technology. Closed-source though, so there is still a degree of blind faith.

## Syncing dotfiles

Want to synchronise text-based settings between computers? You might try mackup to sync settings for linux and osx machines alike to some folder somewhere. It’s a database of which settings of various apps are actually syncable. On second thoughts, this is a fragile approach. And it freaks out if you have non-ascii characters in your filenames. Do something different.

Revised recommendation:

git init --bare $HOME/.dotfiles
alias dotfiles='git --git-dir=$HOME/.dotfiles/ --work-tree=$HOME'
dotfiles config --local status.showUntrackedFiles no
echo "alias dotfiles='git --git-dir=$HOME/.dotfiles/ --work-tree=$HOME'" \
    >>$HOME/.bashrc

Yes, much less freaky.
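For the record, here is roughly what day-to-day use of that setup looks like. It is a sketch under a few assumptions of mine: it uses a shell function instead of the alias (aliases are not expanded in scripts), it runs in a scratch directory via a made-up `DOTFILES_HOME` variable so it cannot touch a real home directory, and `.demo_rc` is an invented settings file.

```shell
# Function form of the alias, since aliases are not expanded in non-interactive scripts.
# DOTFILES_HOME is a hypothetical override so the sketch can run in a scratch directory.
dotfiles() {
    git --git-dir="${DOTFILES_HOME:-$HOME}/.dotfiles/" \
        --work-tree="${DOTFILES_HOME:-$HOME}" "$@"
}

DOTFILES_HOME=$(mktemp -d)          # stand-in for the real home directory
git init --quiet --bare "$DOTFILES_HOME/.dotfiles"
dotfiles config --local status.showUntrackedFiles no

# Track one (made-up) settings file and commit it.
echo "set -o vi" > "$DOTFILES_HOME/.demo_rc"
(
    cd "$DOTFILES_HOME" &&
    dotfiles add .demo_rc &&
    dotfiles -c user.name=demo -c user.email=demo@example.invalid \
        commit --quiet -m "Track .demo_rc"
)
dotfiles ls-files                   # lists the files under version control
```

With `DOTFILES_HOME` unset, the same function operates on `$HOME`, and only files that were explicitly added ever show up in `dotfiles status`.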

Actually, do you know what is even easier? Just make a git repo in your home dir. No more overthinking. Re-revised recommendation:

git init $HOME
git config --local status.showUntrackedFiles no

Now go forth and steal other people’s dotfile tricks.

## Bonus trick: host-proof your sync with encryption

There are tools to turn even your awful unencrypted untrustworthy system into an encrypted one. Cryptomator is one cuddly friendly option. So is the more austere rclone, as mentioned earlier. These encrypt everything inside a certain sync, in particular stopping your snooping corporate sync provider from reading it. Even a crappy spying provider such as Google, Microsoft or Dropbox can be made safer. Both those options are free and simple.

The drawbacks that immediately occur to me are

• this does not help with sharing files with peers, who still need to decrypt stuff somehow (although that’s a challenge for any encrypted service)
• you still have to run the provider’s sync software on your computer, which means trusting their client code if not their server code.
• files are encrypted individually so you are still leaking some information about what kind of files they are in their size and usage patterns.
• there are several solutions to do this, but AFAICT they use incompatible encryption, so you can’t benefit from sharing files across many apps securely

NB you could do this anyway by manually encrypting everything, but would you? No, because it’s slow and tedious. You want a nice GUI like this.

If you don’t mind whether the files are local or not, you could use rclone’s encryption mode, which talks directly to the remote file store and also encrypts the content. Rclone can do everything.