Taking my email archive offline
2020-12-29 — 2026-03-20
Wherein a 20GB correspondence trove is drawn down by mbsync into an encrypted Maildir, is replicated by Syncthing, and the server is left with a 30–90‑day window for phone use.
Many people I know keep a long record of their email correspondence on, e.g., Gmail. I’m getting increasingly uneasy about having a complete social graph and archive of my correspondence sitting on someone else’s server — basically a browsable library for foreign actors and AI training data farms. I wonder if I can do better?
My goal here is to keep redundant, encrypted, local copies of my ~20GB email archive and still read them with a nice modern email client — while keeping my provider as a thin relay rather than the permanent custodian of two decades of correspondence.
A quick orientation for email plumbing: email involves two distinct protocols.
- SMTP handles sending and routing — getting a message from my machine to its recipient’s server. I will never run my own SMTP server; it’s a thankless fight against spam filters, SPF, DKIM, DMARC, and the general hostility of the modern internet to small mail senders who, to be fair, are often AI slop scams. I’m happy to let some email provider handle that.
- IMAP handles reading — it’s the protocol a mail client uses to access a mailbox on a server. The key thing I care about here is that IMAP can also be used to sync a remote mailbox to a local copy. Tools like mbsync pull mail down into a local Maildir (one plain file per message, stored in ordinary directories).
My plan is to keep using a provider for SMTP and for the live IMAP mailbox where new mail lands, but regularly sync that mailbox down to encrypted local storage and — once I’m confident the local copy is complete — delete old mail from the server. My day-to-day would look something like this:
- I open my laptop. A cron job or systemd timer runs mbsync in the background, pulling any new messages from my provider into a local Maildir on an encrypted disk.
- I open my mail client — Apple Mail, Thunderbird, mu4e, or even a local webmail client — pointed at the local Maildir (directly, or via a local-only Dovecot IMAP server). Reading, searching, and tagging all happen locally.
- When I compose and send, the client talks SMTP to my provider as usual. Sent mail lands in the provider’s Sent folder, which mbsync will pull down on its next run.
- Between my Linux and Mac machines, the local Maildir stays in sync via Syncthing or a similar file sync tool.
- Periodically I prune old mail from the provider’s server, keeping only the local encrypted copies.
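As a concrete sketch, the background sync in the first step could be as small as a crontab entry (the binary path, interval, and log location here are illustrative, not prescriptive):

```
# Pull every configured mbsync channel every 10 minutes; append output to a log.
*/10 * * * * /usr/bin/mbsync -a >> ~/.local/state/mbsync.log 2>&1
```

A systemd timer would do the same job with better failure reporting, at the cost of two unit files instead of one line.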
The provider never disappears entirely — I still depend on it for sending and for receiving new mail — but it stops being the long-term archive. If the provider gets breached, or served with a national security letter, or simply shuts down, my archive is safe(r) on my own disks.
My original plan was to store ~20GB of email archives on an encrypted offline disk, reducing at-rest data exposure. I didn’t do it the last time I looked into it (I freak out about this every couple of years or so; the last time was 2024). Last time I gave up mainly because I couldn’t figure out a clean way to replicate a local mail archive between my Linux and Mac machines. Instead I stored my email on Gandi via IMAP, reasoning that a French host under EU jurisdiction was at least marginally better than a US-based provider. With the Gandi contract up for renewal, I’m revisiting the whole question.
1 What has changed since 2024
The regulatory landscape has gotten worse, not better. The UK’s Investigatory Powers Act was used in early 2025 to serve Apple with a Technical Capability Notice demanding a backdoor into iCloud encrypted backups. Apple’s response was to disable Advanced Data Protection for UK users rather than comply, but the UK tried again in late 2025, this time asking for access limited to British users only. Meanwhile the US CLOUD Act continues to allow US law enforcement to compel American companies to disclose data stored abroad, regardless of local data protection law, and these requests can come with non-disclosure orders that prevent the provider from telling me. In early 2025, the Trump administration gutted the Privacy and Civil Liberties Oversight Board, removing the body that was supposed to oversee intelligence community commitments around data protection. The upshot: the case for keeping my correspondence out of easy reach of state surveillance has only gotten stronger.
I don’t want to oversell email sovereignty as a panacea.
The broader trend is worth noting: On one hand, governments and regulators worldwide are increasingly insisting that data about their citizens stays within their borders — what policy people call “data sovereignty,” the replacement of the old assumption that data flows freely across borders.
On the other hand, those same governments are simultaneously asserting more power to compel disclosure from providers within their jurisdiction. The net effect for individuals like me: If my email sits on someone else’s server, some government can probably get at it, and I may never know. Keeping the archive on my own disks doesn’t make me invulnerable, but it does take me out of the easy-pickings category.
I’m in Australia, which is part of the Five Eyes alliance, so my email is potentially subject to surveillance by multiple governments, many of which have poor and/or decaying respect for citizen confidentiality and the rule of law.
2 Costs and benefits
2.1 What offline email actually mitigates
As always, if we don’t have a real threat model, our security attempts are just security theatre. So I’ll state what I think offline email storage actually helps with, and what it doesn’t.
It does help with:
- At-rest bulk surveillance. If my email sits on a provider’s server, it’s an easy target for mass data requests, provider-side breaches, and insider threats. A local encrypted archive isn’t accessible unless someone can get physical access to my machines.
- Provider compromise. Cloud providers get breached. A local Maildir on LUKS/FileVault is not exposed by a provider-side hack.
- Jurisdiction shopping by state actors. With IMAP on Gandi, my email is subject to French law (good) but also to whatever international agreements France has (variable). Offline email is subject only to the laws of whatever jurisdiction my physical disks are in, and only if someone knows to look for them.
- Reducing long-term accumulation risk. The longer an archive sits on a server, the more valuable a target it becomes. Offline storage lets me keep the archive without painting a bigger target.
It does NOT help with:
- Metadata. My email provider still sees who I email, when, and with what subject lines. This metadata is often more revealing than content and is much easier for law enforcement to obtain (often without a warrant).
- In-transit interception. SMTP is still SMTP. Unless we’re using PGP or S/MIME end-to-end (and we almost certainly aren’t for most correspondence), the content is readable by every mail server that handles it — hop-to-hop TLS protects the wire, not the servers.
- Sender-side copies. My correspondents’ providers still have copies of everything. Google has the other side of most of my conversations.
- Active targeted surveillance. If a state actor specifically targets me, offline email is a speed bump, not a wall. They can compel my provider to intercept future mail, serve me with a warrant for the physical disks, or compromise my endpoints.
- Sending mail. I still need an SMTP relay. This is a hard dependency on some provider somewhere.
Summary: offline email is a meaningful improvement against bulk/passive threats and a modest improvement against targeted threats. It mostly buys me sovereignty over my archive — the historical record — rather than over my ongoing communications.
2.2 What this costs me
Trade-offs:
- No Gmail-tier client. I lose the polish of modern hosted email: Gmail’s search, smart categorization, snoozing, the integrated calendar/contacts/tasks ecosystem. Local clients like Thunderbird or mu4e are capable but undeniably rougher. This is the biggest day-to-day cost and the reason most people don’t bother.
- Maintenance overhead. Someone (me) has to keep mbsync running, monitor that syncs complete, handle the occasional conflict, and ensure backups of the local archive itself. This is a small but non-zero ongoing tax.
- AI and MCP integration gets harder. In the current era, we might want to pipe email through AI agents — summarization, triage, draft replies — via something like MCP. With a cloud provider, this is easy: the AI talks to the provider’s API. With a local Maildir, it’s still feasible (an MCP server can read Maildir files or talk to a local Dovecot over IMAP), but I’d have to set that up myself. The upside is that my email never leaves my machine, which is arguably a feature rather than a bug if I’m worried about data sovereignty.
- Compelled decryption. In Australia, section 3LA of the Crimes Act 1914 allows a magistrate to order anyone with knowledge of a computer system to provide passwords or decryption assistance. Refusal carries up to 5 years imprisonment (10 for serious offences). The subject of the order doesn’t even need to be suspected of a crime — they just need to own the device. The broader Assistance and Access Act 2018 can additionally compel providers to give technical assistance, though it cannot require them to build new decryption capabilities. So local encrypted storage isn’t immune to Australian law — it just moves the pressure point from a provider (who might comply silently) to me personally (where at least I’d know about it).
2.3 Why not just use Proton Mail?
Let’s compare and contrast with encrypted mail providers. Proton Mail, for example, offers end-to-end encryption, zero-access storage (Proton themselves can’t read my mail), Swiss jurisdiction, and a polished web/mobile client. Why am I contemplating this DIY nonsense instead?
Proton solves a different problem — and introduces its own trade-offs.
Where Proton wins: Zero admin overhead. Attractive clients on every platform. End-to-end encryption between Proton users is genuinely strong. Swiss jurisdiction is about as good as it gets for resisting foreign data requests. Pricing is reasonable: the Mail Plus plan is ~$4/month, Unlimited ~$10/month, and they support custom domains on paid plans. For most people who just want better-than-Gmail privacy, Proton is the right answer and I wouldn’t talk them out of it.
Where Proton doesn’t solve my problem:
- Encryption only works within the Proton ecosystem. Mail sent to or received from non-Proton addresses is not end-to-end encrypted unless I pre-arrange a shared password with the recipient — which in practice means almost none of my correspondence is actually E2E encrypted. My archive of 20 years of mail from non-Proton senders would sit on Proton’s servers encrypted at rest with their keys, not mine.
- Vendor lock-in. Proton doesn’t offer standard IMAP; the only way to use a desktop client is via their Bridge application, which is paid-plan-only and has documented issues with mbsync — UID instability, custom header stripping, and IMAP specification violations that cause desynchronization. Exporting my archive out of Proton is possible but painful; there’s no good incremental backup plan AFAICT.
- I’m still trusting a provider. Swiss law is favourable, but Proton is still a company that can be compelled, compromised, or acquired. Their zero-access encryption means they currently can’t read my mail, but that’s a property of their software, not a mathematical guarantee I can verify. If Swiss law changes, or if Proton’s infrastructure is breached, I have no local fallback.
- Cost adds up. For a 20GB archive with a custom domain, I’d need at least the Mail Plus plan ($48/year) or Unlimited ($120/year). My DIY approach costs whatever I’m already paying for a cheap IMAP relay (comparable or cheaper) plus my time.
The DIY offline approach trades Proton’s polish and convenience for something Proton can’t offer: the archive lives on my disks, encrypted with my keys, in a standard format (Maildir) that any tool can read, and I can switch providers trivially because the provider is just a transient relay. The ongoing encryption of the at-rest archive doesn’t depend on any company’s continued good behaviour.
If I were starting fresh with no existing archive, Proton would be tempting. But I’m migrating 20 years of mail from non-Proton senders, I want to use standard tools, and I want the archive to survive any single provider disappearing. For that, DIY wins.
3 In practice
The offline mail problem breaks down into two separate questions: (1) how do I get mail off the remote server and into a usable local format? and (2) how do I keep multiple local copies in sync across my Linux and Mac machines without running a dedicated server?
These are independent choices. I pick one ingest tool and one sync strategy, and they compose.
3.1 Problem 1: Ingest — getting mail from the remote into local storage
This is always the same basic operation: pull mail from an IMAP server into a local Maildir. The tool I’d use is mbsync (the binary from the isync project). It syncs IMAP↔Maildir bidirectionally, tracking sync state via IMAP UIDs so messages keep a stable identity across runs. It’s available on both Linux (every distro packages it) and macOS (brew install isync), and its config lives in ~/.mbsyncrc.
offlineimap3 does the same job and still works, but is less actively maintained. mbsync is the safer bet for a new setup.
The result of either tool is a Maildir — a directory tree with one plain file per message. This is the local storage format everything else builds on. It’s encrypted at rest by whatever full-disk encryption the machine already uses (LUKS on Linux, FileVault on macOS).
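A minimal ~/.mbsyncrc sketch, using isync 1.4+ syntax (Far/Near replaced the old Master/Slave keywords); the host, user, and PassCmd are placeholders for whatever the real provider and password manager are:

```
IMAPAccount provider
Host imap.example.com
User me@example.com
PassCmd "pass show mail/provider"   # any command that prints the password
SSLType IMAPS

IMAPStore provider-remote
Account provider

MaildirStore local
Path ~/Mail/
Inbox ~/Mail/INBOX
SubFolders Verbatim

Channel mail
Far :provider-remote:
Near :local:
Patterns *
Create Near      # create missing folders locally, never on the server
Expunge Near     # only expunge the local side
SyncState *      # keep per-folder sync state alongside the Maildir
```

With this in place, `mbsync mail` (or `mbsync -a`) does the whole pull.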
References for mbsync:
- isync: free IMAP and MailDir mailbox synchronizer
- mbsync manpage
- isync — ArchWiki
- Fastmail + Emacs + mu4e + mbsync on macOS — good worked example
- Setting up isync on Linux
3.2 Problem 2: Sync — keeping multiple machines in agreement
This is the harder question. AFAICS there are three approaches.
3.2.1 Option A: Both machines independently sync from the remote
The simplest option: both my Linux and Mac machines run mbsync against the same IMAP server. Each gets its own local Maildir. Changes (flags, deletes) propagate via the server: I mark something read on one machine, mbsync pushes that flag to the server, the other machine picks it up on its next sync.
This works well, but the remote server stays as the coordination point — which partially defeats the purpose. I can’t delete old mail from the server if both machines still need it as their sync source. It’s a reasonable transitional step (get everything local and encrypted, stop worrying about the provider as sole custodian), but it doesn’t fully achieve the “thin relay” goal.
3.2.2 Option B: File-level replication with Syncthing
One machine is the “primary” that runs mbsync against the remote. Syncthing (or rsync, or any file sync tool) replicates the resulting Maildir to the other machine. Maildir is well-suited to file-level sync: one file per message, flag changes are atomic filename renames, and messages are never modified after delivery.
The constraint is that only the primary machine should run mbsync — running it on both would produce conflicting UID state files. In practice this means I’d designate my Linux box as the ingest machine and let Syncthing push to the Mac. Syncthing doesn’t understand mail semantics, so if I somehow managed to create a genuine conflict (e.g. renaming the same file on both machines simultaneously), it would create a .sync-conflict file rather than merge. In practice this almost never happens with Maildir.
This is my current leading candidate. It’s simple, the server can be pruned once I trust the local copies, and the tools are all mature.
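The reason Maildir is so friendly to file-level sync is visible in the on-disk model itself. A short illustration using Python’s stdlib mailbox module (the directory and message here are throwaway examples):

```python
import mailbox
import os
import tempfile

# Build a throwaway Maildir (cur/ new/ tmp/ subdirectories) to show the
# on-disk model that makes file-level sync safe.
root = os.path.join(tempfile.mkdtemp(), "Mail")
md = mailbox.Maildir(root, create=True)

# Delivery: one plain file per message, landing in new/.
msg = mailbox.MaildirMessage()
msg["From"] = "alice@example.com"
msg["Subject"] = "hello"
msg.set_payload("message body\n")
key = md.add(msg)

# "Mark as read": the message moves to cur/ and gains the Seen flag,
# encoded in the *filename* suffix (":2,S") — the file contents never change.
m = md[key]
m.set_subdir("cur")
m.set_flags("S")
md[key] = m

print(os.listdir(os.path.join(root, "cur")))  # one file, name ending in ':2,S'
```

Because a flag change is just a filename change and message bodies are write-once, a dumb file replicator like Syncthing can move the whole mailbox around without understanding mail at all.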
3.2.3 Option C: Dovecot dsync master-master replication
Run Dovecot as a local-only IMAP server on each machine, using dsync for two-way replication between them. Dovecot listens on localhost only (no network exposure). One or both instances also pull from the remote provider (either via mbsync feeding the Dovecot Maildir, or Dovecot’s own dsync against the remote IMAP).
The advantage is that dsync replication is master-master: both replicas end up identical, and no changes are lost even with simultaneous modifications on both machines. It understands mail semantics, so conflicts are resolved correctly rather than producing duplicate files. Any standard mail client talks to the local Dovecot over IMAP — Thunderbird, Apple Mail, mu4e, webmail, anything.
The cost is admin: I’d be running Dovecot on both machines and need occasional network connectivity between them (e.g. over Tailscale/WireGuard when both are on the same LAN). It’s the most “correct” solution and also the most work to maintain.
3.2.4 Sneakernet variant: encrypted external drive
I could also skip machine-to-machine sync entirely and keep the archive on a LUKS-encrypted USB drive (readable natively on Linux; macOS has no native LUKS support, so it would need third-party FUSE-based tooling, which is patchy), or an encrypted APFS sparse bundle (macOS-native; Linux can read it via apfs-fuse). Mount, run mbsync, unmount, carry it between machines. This is the most offline option — the archive isn’t even on a running filesystem most of the time — but I have to physically carry the drive.
3.3 The mobile problem
None of the above helps with phones. An iPhone or Android device can’t run mbsync, can’t mount a local Maildir, and can’t talk to a localhost Dovecot. Mobile mail clients need a real IMAP server on the internet, which means the provider can’t be fully pruned — at least not of recent mail.
This is the single biggest source of UX friction in the whole scheme, and it forces a compromise: If I want email on the road, I need to keep a rolling window of recent mail (say, 30–90 days) on the provider’s server for mobile access, and archive everything older to local storage. My phone sees recent mail normally; my laptops see everything. The archive isn’t accessible from my phone, but in practice I almost never need to search 2019 email from my pocket — and if I do, I can SSH into one of my machines.
In practice the workflow needs to look like this: mbsync runs on the primary laptop, pulling all mail (new and old) into the local Maildir. Then a separate pruning step deletes messages older than N days from the remote server. mbsync’s MaxMessages option can cap how many messages the remote side retains, though it works by count rather than age — for true date-based pruning I’d need a small script (e.g. a cron job using notmuch or mu to find old message IDs and an IMAP STORE+EXPUNGE to remove them from the server).
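Such a pruning script could be quite small using Python’s stdlib imaplib. This is a sketch under assumptions: the host, user, and mailbox name are placeholders, and it should only run after mbsync has confirmed the local copy is complete:

```python
import imaplib
from datetime import date, timedelta

# Placeholders — substitute a real provider, account, and password source.
HOST = "imap.example.com"
USER = "me@example.com"
WINDOW_DAYS = 90

# IMAP SEARCH dates must use English month abbreviations regardless of
# locale, so build the string by hand rather than relying on strftime.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def imap_date(d: date) -> str:
    """Format a date for IMAP SEARCH, e.g. 01-Jan-2025."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"

def prune_older_than(host: str, user: str, password: str, days: int) -> int:
    """Delete and expunge INBOX messages whose internal (arrival) date is
    more than `days` old. Returns the number of messages removed."""
    cutoff = imap_date(date.today() - timedelta(days=days))
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")
        # BEFORE matches the server's internal date, not the Date: header —
        # the right semantics for a retention window.
        _, data = conn.uid("SEARCH", None, f"(BEFORE {cutoff})")
        uids = data[0].split()
        for uid in uids:
            conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
        conn.expunge()
        return len(uids)

print(imap_date(date(2025, 3, 7)))  # → 07-Mar-2025
```

The same loop would need to run per folder for anything beyond INBOX, and a dry-run mode (print the UIDs instead of storing \Deleted) is worth having before trusting it.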
Security trade-off: during that 30–90 day window, recent mail is sitting on the provider’s server, readable by whoever can compel the provider. But the bulk of the archive — the 20 years of accumulated correspondence that’s the real prize — is offline. This feels like a reasonable line to draw for some risk profiles: I get mobile convenience for the mail I’m actively dealing with (airline bookings etc), and sovereignty over the historical record.
This also interacts with the sync strategy. Option A (both machines sync independently from the remote) is the most phone-friendly, since the server already has everything — but it’s the worst for the “thin relay” goal. Options B and C work fine with the windowed approach: the primary laptop ingests everything, the server gets pruned to the recent window, and the phone just sees what’s left.
4 Which host?
I’ve been using Gandi. Their email hosting costs keep going up. Since I’m re-evaluating anyway, it’s worth noting that alternatives in favourable jurisdictions include Infomaniak (Swiss, strong privacy stance) and OVHcloud (French, includes email with domain registration). Migadu, Posteo, and Mailbox.org are also options with good privacy reputations.
For the offline-email approach, though, the provider choice matters less — I only need IMAP access for the initial sync and for ongoing new-mail retrieval. The provider becomes a transient relay rather than a long-term archive host.
4.1 SQL-backed mail stores
I considered and rejected the idea of storing mail in SQL. Mailpiler provides a MySQL-backed web frontend for email archives; libremail does IMAP→SQL syncing with a Kanban-style client. Without checking, I bet they use idiosyncratic schemas. Running a database server to index files that already work fine as files feels like the wrong trade-off for a personal archive.
5 Local search and indexing
Once mail is in a local Maildir, I need to search and read it. These are the main options; what we actually need is one of the first two (an indexer/search tool) paired with a client that talks to it.
notmuch — Fast, tag-based search over Maildir. The philosophy is that mail is immutable data and organization comes from tags rather than folders. Integrates with Emacs (notmuch-emacs), neomutt, and various other frontends. Cross-platform (Linux, macOS). I’m not yet sure how much value this adds for me, mostly because I haven’t worked out how I’d actually invoke it day to day.
mu / mu4e — Similar to notmuch: indexes Maildir, provides fast search with a rich query language (sender, date range, flags, attachments, etc.), and includes mu4e, a full Emacs-based mail client built on top. Supports encrypted messages, accent normalization, and scripting via Guile/Scheme. If I were an Emacs user, this would be the obvious choice. See the mu cheatsheet.
Jaro Mail — An integrated suite targeting local-first email on macOS and Linux. Bundles sync, search, filtering, whitelisting, and address book management. Opinionated: it really wants to be the whole stack, including a console-based reading interface (Classic dynebolic!) I could probably use just its Maildir management and filtering pieces, ignoring the rest.
For GUI clients: Thunderbird has a one-file-per-message “maildir” storage option, but it isn’t a drop-in reader for an arbitrary external Maildir tree, so even there the path of least resistance is an IMAP server in front of the archive. Apple Mail cannot read Maildir at all — it uses its own proprietary storage format — but it speaks IMAP just fine, so running a local Dovecot (Option C above) makes Apple Mail work transparently. In fact, with a localhost Dovecot in front of the Maildir, essentially any IMAP client works, including local webmail.
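If I do go the localhost-Dovecot route, the core of the configuration is small. A minimal sketch (paths are illustrative; the authentication backend — e.g. a passwd-file — still needs configuring and is omitted here):

```
# dovecot.conf — local-only IMAP in front of the mbsync Maildir
listen = 127.0.0.1              # loopback only; no network exposure
protocols = imap
mail_location = maildir:~/Mail
ssl = no                        # tolerable only because nothing leaves localhost
```

Any client then connects to 127.0.0.1:143 and sees the full archive as an ordinary IMAP mailbox.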
