Community sovereign AI compute
On the geopolitical risk of renting your thinking from abroad, and what a small collective can do about it
2026-03-22 — 2026-05-24
Wherein the Joint Ownership of AI Inference Hardware by Small Mutual-Aid Collectives Is Examined as a Hedge Against Geopolitical Disruption to Submarine Cables and Commercial API Restriction.
It turns out I’m writing a series of posts about practical community infrastructure building.
This is the second one. In a previous post I sketched out a case for neo-friendly societies—small mutual aid groups that hedge against state decay by pooling resources and investing counter-cyclically. I wrote that because it seems unwise to bank on the Australian state taking a proactive approach to the oncoming global risks. What can we, as small collectives, do to prepare for the future if we don’t want to wait on the government to do it for us?
Here I talk about a specific asset that a small collective might want to own: a computer that can do intellectual work for us.
By which I mean: a machine that can run something like the class of AI models that currently power the tools many of us are starting to depend on for work—coding assistants, research tools, document drafters, agentic workflows. The kind of thing that, right now, we rent by the token from a company in San Francisco, or increasingly, from a company in Hangzhou.
A self-hosted open-weight model, at least at the level of hardware and scale that I have costed here, is not going to match the quality of the best commercial systems. Claude, GPT-5+, Gemini—these are the products of billions of dollars of training compute and proprietary post-training. The best open-weight models are good—impressively good for many tasks—but they’re not that good, not yet. The case for sovereign compute isn’t “this is better than what we pay for.” It’s “this is good enough that we’d be glad to have it if the commercial options became unavailable, unaffordable, or untrustworthy.” It’s a hedge more than a replacement.
I used AI heavily to scope out some plans (see Australian Sovereign compute technical addendum). It’s there if we want to get deep into the weeds. Here I’m thinking about the big picture: why would a small collective want to own its own AI compute, what would it look like, and how does it fit into the broader friendly society model? And is it affordable right now?
This is the scenario that sovereign compute hedges against. I do not promise it is the most likely future, but nor does it seem absurd.
1 The problem with renting our cognition
If we use Claude, ChatGPT, Qwen or DeepSeek, we are renting inference from someone else’s data centre.
1.1 They want to extract more rent
The process we refer to as enshittification has happened to many digital services; the social media version is the canonical shape:
- A new service offers a great product, often for free, and rapidly gains users by providing value at a competitive price.
- Once we have built habits and workflows around it, our switching costs go up. The platform tilts toward its other customers — advertisers, integrators, governments — at our expense.
- Once those parties are locked in too, the platform extracts from them as well. Quality drops, prices rise, terms narrow. Everyone but the shareholders ends up worse off than at step 1.
Today’s generous AI pricing is step 1.
We should not assume the labs will always be content to let us use their infrastructure for purposes that threaten their interests — organising labour, challenging their market power, building competing products, or doing work they’d rather we paid them more for. Three regulatory blocs — Washington, Brussels, Beijing — decide which AI capabilities reach us, and Australia has little leverage over any of them. Canberra has no credible AI sovereignty strategy of its own, so whichever way those blocs jump seriously constrains our options.
1.2 State pressure on providers
In early 2026 the Pentagon demanded that Anthropic grant the military unrestricted access to Claude for “all lawful uses”. When Anthropic’s CEO refused — insisting on two conditions, no autonomous weapons and no mass domestic surveillance — the Trump administration designated Anthropic a “supply chain risk to national security”, a label previously reserved for firms subject to adversarial foreign influence, like Huawei. The company’s $200M Pentagon contract was terminated; Anthropic filed federal lawsuits alleging illegal retaliation.
The Chinese open-weight models — Qwen, DeepSeek — have their own version of this. They ship with CCP-aligned guardrails baked in at the alignment stage: hard refusals on Taiwan, Tiananmen, Xinjiang, plus subtler framing biases that shift depending on the language we prompt in. (These are fixable if we own the hardware — more on that below.)
There are many structural reasons to expect that governments will lean on providers to restrict frontier access. For the most capable tier, there may be no API at all, especially for non-US or non-Chinese nationals. Open weights on hardware we own aren’t subject to that rationing logic.
1.3 The infrastructure is a target
The war in Iran has demonstrated that AI data centres are now military targets. Iran struck Amazon-owned data centres in the UAE and Bahrain, citing their role in supporting US military operations — operations that themselves relied on AI systems including Claude to plan airstrikes at a pace no human planning process could match. The infrastructure we depend on for coding assistants and document drafters is the same infrastructure being used to prosecute wars.
That makes it a great target in wars. Connectivity is fragile. Australia’s international internet runs through roughly 18 submarine cables landing at a handful of points in Perth and Sydney — cables that are increasingly being sabotaged in the Baltic, the Red Sea, and the Taiwan Strait. ASPI has called this Australia’s digital Achilles’ heel. Satellite internet is not a fallback we control: Starlink is a US private company with deep military ties and no accountability to foreign governments — during the Ukraine war, corporate decisions about Starlink access altered the tactical balance on an active battlefield. Australia has no sovereign satellite internet capability, and no particular claim on American satellite bandwidth in a crisis.
Cloud API queries cross all of this; a machine in our city does not.
The hardware supply chain is similarly concentrated. AI chips are mostly made by NVIDIA (American), fabricated by TSMC (Taiwanese). If the Taiwan Strait becomes contested, the global AI compute supply chain goes through a chokepoint — the reason the US is spending tens of billions on domestic fabrication under the CHIPS Act. Buying hardware now, while it’s available, is itself a hedge.
2 What “sovereign compute” looks like at a small scale
When governments talk about sovereign compute, they mean billion-dollar data centres and national AI strategies. I want to talk about something much smaller: what does it look like for a group of 25–50 people to own and operate enough compute to run a frontier-class AI model?
It turns out this is newly, nearly feasible, because of a convergence of two trends: open-weight models have gotten very good, and the hardware to run them has gotten (relatively) affordable.
There are a lot of free variables (which model? which hardware? what trade-offs?), so I’ll pick some baseline examples — slightly arbitrary, but workable for a small collective.
2.1 Hardware
The NVIDIA DGX Station — the desktop version of the machines that power most AI data centres — now comes with a GB300 Grace Blackwell chip. It sits under a desk, draws 1600 watts, and can run models with up to a trillion parameters. The relevant configuration for our purposes:
- 252 GB of fast GPU memory (HBM3e), plus 496 GB of slower CPU memory (LPDDR5X)
- About 20 petaFLOPs of AI compute
- Price: roughly $85,000–$125,000 USD (the MSI XpertStation WS300 lists at $85,000 USD), or around $135,000–$195,000 AUD landed with GST.
That sounds like a lot for a desktop computer. It’s not a lot split between 50 people. At $160,000 AUD for a mid-range configuration, that’s $3,200 per member — comparable to a serious API habit for a year, but we own the hardware outright.
2.2 Model
Alibaba’s Qwen3–235B-A22B is a “mixture of experts” model — it has 235 billion parameters in total, but only activates 22 billion of them for any given query, which is to say, it needs lots of RAM but not unattainable amounts. It’s not Claude or GPT-5.x — expect noticeably weaker performance on the hardest reasoning and coding tasks — but it’s solidly capable for everyday use: drafting, summarising, code assistance, research support, agentic tool-calling. Good enough to use; good enough that we’d miss it if it were gone.
At 4-bit quantization (a compression technique that reduces memory usage with mild quality loss), the whole model fits in the DGX Station’s GPU memory with room to spare. The remaining memory is used for tracking conversation context, which determines how many people can use it at once.
A single DGX Station running Qwen3–235B at 4-bit quantization can serve 20–80 concurrent conversations depending on how long each conversation’s context is (see the technical companion for the detailed maths). For a collective of 50 people, not all of whom will be using it simultaneously, this is … manageable? Maybe. We’ll hit friction because we all want to use it during the day and leave it idle overnight, but with some scheduling and patience, it could work.
2.3 The cost of operation
Running the machine 24/7 costs about $350 AUD/month in electricity at Australian rates. Amortising the hardware over three years adds about $4,500/month. Total: roughly $5,000 AUD/month, or $100 per member per month in a 50-person collective.
For comparison, a serious user of Claude or ChatGPT’s pro tiers pays $20–$200 USD/month for rate-limited access. A developer using API access for agentic workflows can easily burn through $100–$500 USD/month in tokens. At reasonable utilization — say the machine is processing tokens 50–80% of the time, across a mix of fast prompt ingestion and slower token generation — we might push 500M–2B tokens per month through the system. That works out to roughly $2.50–$10 AUD per million tokens, depending on how busy we keep it. For comparison, commercial API pricing for models of comparable capability runs $3–$15 per million input tokens and $10–$75 per million output tokens depending on the provider and model tier (e.g. Claude Sonnet at $3/$15 per million tokens, GPT-4o at $2.50/$10). Self-hosted inference comes out roughly on par with commercial APIs of similar quality — modestly cheaper if we use it heavily.
The arithmetic above prices sysadmin labour at zero. Someone in the collective has to install vLLM, monitor uptime, rotate models, deal with OOMs at 2am, run the network, handle access management. At Australian engineering rates, billing any of that labour turns “comparable to a Claude habit” into “we should just pay for Claude”. In money terms this is roughly a wash with commercial APIs, plus an in-kind labour subsidy from whoever in the collective enjoys this sort of thing — dead money on most days, and the only thing that works on the day the cables go down.
On top of the wash we get no per-query metering, no rate limits, no surprise bills, and the hardware free and clear at the end of three years.
2.4 Sparse attention
A recent favourable development.
Everything above treats the memory used to track conversations as a hard ceiling: once the machine is holding enough simultaneous conversations, the rest of us wait. That ceiling holds for the model architectures I costed. But in April 2026 DeepSeek released V4 (weights here), and the interesting part isn’t the parameter count — it’s the attention design. V4 compresses each conversation’s context before storing it, so a sprawling million-token agentic session costs the machine under 10 GB of working memory instead of the tens of GB a conventional model of the same class would need.
Feed that into the same arithmetic and the result reverses. The worst case — several members each running long, context-heavy agentic sessions at the same time — eases by roughly an order of magnitude. The binding constraint shifts from “memory fills up after one power user” to “how fast can the machine generate tokens” — a throughput question we can measure and budget for. The technical companion works the arithmetic.
Two caveats, though. This is one model family from one Chinese lab, so it carries the same dependency-versus-alignment split as the Qwen path: open weights settle whether we’re allowed to use it, not what guardrails were baked in at the alignment stage (the de-censoring discussion below covers this). And a memory ceiling moving outward is not a throughput guarantee — sparse attention also trims the compute per token, but the constraint moves rather than vanishes.
The architectural direction is the signal I care about more than this particular model. Hardware we buy now doesn’t become obsolete when models get more memory-efficient; it gets more capable per dollar, which is the counter-cyclical property the whole argument rests on.
3 Removing CCP guardrails
There’s a catch with using Chinese open-weight models: they come with censorship built in. Research from Shisa.AI has documented the specific patterns: Qwen models will hard-refuse certain prompts (anything touching Taiwan sovereignty, Tiananmen Square, various topics in Xinjiang), and increasingly, newer versions have shifted from outright refusal to controlled compliance — they’ll answer the question, but steer the framing toward CCP-aligned positions. The behaviour is language-dependent: the Shisa.AI analysis found significantly fewer refusals in Chinese than in English on the same questions (>80% fewer), suggesting the censorship is calibrated for different audiences.
For an Australian collective, this is a solvable problem. The open-source community has developed several techniques for removing these guardrails, ranging from essentially free to moderately expensive:
Abliteration (cost: ~$100–$200 AUD for models of our size) is a technique from representation engineering where we identify the “refusal direction” in the model’s internal representations and remove it through a linear algebra operation on the weights. No training required — it’s a post-processing step that takes hours, not days. Over 4,000 community-modified models have been published using this method on HuggingFace alone. It’s effective at removing hard refusals, though it doesn’t fully address the subtler framing biases.
Preference fine-tuning via Direct Preference Optimization (cost: ~$1,500–$4,000 AUD including dataset creation) goes deeper. We create a dataset of question-answer pairs where the “preferred” answer is neutral/factual and the “dispreferred” answer is CCP-aligned, then train the model to prefer the neutral framing. This addresses both hard refusals and soft steering. The training runs on rented cloud GPUs — a few hundred to a couple of thousand dollars’ worth — and the resulting model can be deployed on our own hardware permanently.
Either way, the total cost of removing CCP guardrails is a rounding error compared to the hardware. And once it’s done, it’s done — the modified weights are on our machine, and no one can remotely re-censor them.
4 Audition before we buy
We don’t have to commit $160k on faith. Cloud GPU rental is now cheap enough that a collective can test-drive the full setup before buying hardware.
Rent a couple of H100 GPUs from a provider like Lambda Labs or RunPod for $2–$3 USD/hour per GPU. Deploy the model with the same inference software (vLLM or SGLang) that we’d use on the DGX Station. Run our actual workloads for a week or two. See if the throughput, latency, and model quality meet our needs.
Total cost for a two-week test: $500–$1,000 AUD. If the collective decides to proceed, that’s money well spent on due diligence. If it decides not to, we’ve lost the cost of a nice dinner, not a house deposit.
NVIDIA also offers DGX Cloud, which runs the exact same software stack as the physical DGX Station — same NIM inference microservices, same NGC containers, same management tooling. If workflow portability matters, this is the smoothest way to audition.
5 The NBN still sucks but not so badly that we can’t work around it
There’s a mundane infrastructure challenge that’s easy to overlook: Australian residential internet is not great. If the machine is in someone’s house on an NBN connection — particularly one of the older HFC or FTTN links — we’re looking at multiple brief outages per day, asymmetric upload speeds that make remote access sluggish, and no SLA to speak of. For a token server, this is mostly an annoyance (our request fails, we retry), but for longer agentic workflows that run over minutes or hours, connection drops mid-session are annoying.
There’s a spectrum of options here. At one end: someone’s spare room, cheap, community-feeling, unreliable. At the other: a quarter-rack at a local colo facility with redundant fibre, expensive, corporate-feeling, rock-solid. In between: a business-grade NBN plan with a static IP and better SLA, or a 5G failover link for redundancy. The right answer probably depends on how many members are remote versus local, and how tolerant the collective is of occasional downtime.
Being as reliable as Anthropic’s data centres in peacetime would be nice, but the baseline that matters is the day the cables are cut — and even the NBN can probably beat that.
6 The member-side stack
The arithmetic above sizes the server — one DGX Station, one model, shared. Each member also needs client-side software on their laptop pointing at the shared endpoint, and the choices there are largely independent of the server choices.
It helps to be precise about the layers between a member’s keyboard and the model running in the colo, because most of the operational decisions live at different layers. The deep version of this taxonomy is in the companion notebook on AI agents, applied; the short version for the collective case:
- Server / daemon (collective-owned) — the vLLM or SGLang process on the DGX Station, exposing an OpenAI-compatible HTTP endpoint. One of these, shared.
- Harness / agent loop (per-member) — the orchestration layer over the endpoint. Manages conversation state, tool calls, system prompts, multi-turn agent loops. Examples:
pi(Mario Zechner’s npm-installable coding-agent CLI, cross-platform), Aider, OpenCode, Continue. - Frontend / chat client (per-member, optional) — Osaurus, Jan, LM Studio, a web UI, or any editor plugin (Cursor, Zed AI) configured for a custom OpenAI provider.
The collective owns and operates the server; the members each pick their own harness and frontend pointing at it. Two members might use the same backend with completely different above-the-server setups — Cursor configured for a custom OpenAI provider for one, pi-in-tmux + Helix for another — without interfering with each other.
pi is the harness that fits this model particularly well. It is cross-platform (Node, runs on macOS, Linux, Windows, Android), provider-agnostic by design (15+ providers plus arbitrary OpenAI- or Anthropic-compatible endpoints via a small config file), and harness-only — it does not bundle a chat GUI or tie itself to one model. A member adds the collective endpoint to ~/.pi/agent/models.json alongside their Anthropic key and their local Ollama; pi switches between them with /model or Ctrl+L. That makes failover graceful at the per-member level — when the colo endpoint is up, the harness uses it; when it is down (cables cut, building on fire, sysadmin asleep), the same harness still works against Claude or a smaller local model. The collective backend is an upgrade, not a single point of failure.
The harness layer also explains where the operational labour lands once the server is humming. Members customise their own tooling — keybindings, extensions, skills, prompt templates — without coordination. The collective only has to keep one stack running (vLLM + a chosen model); each member runs whatever shape of agent loop over it suits their work.
7 Other community infrastructure
In the previous post, I argued that neo-friendly societies should invest in counter-cyclical assets — things that retain value precisely when the state and conventional institutions are under stress.
Sovereign compute infrastructure fits this criterion pretty well:
- It becomes more valuable if geopolitical tensions restrict access to foreign AI services
- It becomes more valuable if commercial API providers raise prices or restrict capabilities
- It becomes more valuable if regulatory changes create compliance barriers to using foreign-hosted AI
- Unlike financial assets, it has direct use value — it does useful work for members every day
It’s also a natural complement to the other things a friendly society might do. The same AI infrastructure that serves our members’ professional needs can also help run the society itself — generating regulatory filings, processing claims, managing communications. This is the AI-assisted administration angle from the previous post, but with infrastructure we own rather than rent.
8 Replicating this model
As with the friendly society model itself, the most valuable output from a societal perspective isn’t a single collective with a single machine, but the documented, replicable process that other groups can fork.
The hardware purchase process, the model selection and de-censoring procedure, the inference server configuration, the access management for members, the cost-sharing model — all of this can be packaged as a how-to guide. Publish the playbook, let others spin up their own nodes.
A network of small collectives, each with their own sovereign compute and running their own models, would be meaningfully more resilient than any individual group relying on a single commercial provider. And unlike a data centre, a DGX Station fits under a desk and plugs into a standard power circuit (a standard Australian 10A outlet can do that, and a 20A circuit is a routine job for an electrician). The barrier to entry could be financial, not technical.
9 Legal structure
A collective that owns a $160k asset needs a legal entity. The technical companion explores the options in detail, but the short version: an incorporated association is the simplest starting point (~$200 to set up, ~$57/year), and a cooperative under the Co-operatives National Law is the better long-term fit if the model proves viable — it’s the legal form designed for groups of people who jointly own infrastructure they all use. ACNC registration as a charity is probably not applicable unless the collective has an explicit community education or digital inclusion mission. Either way, budget $2,000–$5,000 for a solicitor to review the constitution before committing members’ money — cheap insurance on a six-figure purchase.
10 Open questions
- How do we handle the operational side — who has physical access to the machine, who administers it, what happens if it breaks?
- Where does it physically live? Someone’s house? A shared office? A colo? (See broadband discussion above.)
- Is there appetite for a network of these collectives, sharing infrastructure knowledge and potentially load-balancing across nodes?
- What’s the right model governance framework? Who decides which models to run, what guardrails to keep or remove, what use policies to enforce?
- How do we handle ongoing model updates? New versions of Qwen and other open models are released regularly; someone needs to evaluate, de-censor, quantize, and deploy them.
As with the friendly society post: if you know about any of this, or if you’re interested in being part of a pilot group, I’d love to hear from you.
The hardware is available for now. The models are available. The software stack seems mature enough. The missing piece is the institutional form — the small, trust-based collective that can actually make it go.
That’s the part we need to build.
