If you’ve played with ChatGPT, Claude or Copilot for more than five minutes, you’ve probably had the same uneasy thought: I’m pouring my life into someone else’s black box.

Every query, draft contract, medical worry, marital gripe, trade secret and half-baked business idea goes up to data centres run by a handful of US (and Chinese) tech giants. They promise to protect it. But the basic power imbalance is obvious: they own the servers, so they set the terms.

Over the next decade, that imbalance is going to be challenged – not just by regulation, but by something more basic: commodity hardware. We are heading for a world where it becomes perfectly normal to run serious AI language models on machines you own, in your study or in your home server cabinet. It won’t happen tomorrow, or even next year, but it will happen – and I plan to be among the first to do it (wallet willing).

Right now, building your own serious AI server is still eye-wateringly expensive.

At the extreme end, Nvidia’s hardware is priced beyond the reach of most individuals. Industry guides suggest a fully configured H100 server with eight H100 GPUs costs well north of US$300 000 all-in. New Blackwell-based systems – the kind of kit hyperscalers like Microsoft, Google and Meta Platforms buy in bulk – are reported to be in the region of $3-million per rack. They run hot and they guzzle electricity.

But Nvidia has started to talk about “personal AI supercomputers”. Its new DGX Spark is pitched exactly at that niche. Reports put its price somewhere around $3 000 to $4 000, depending on configuration and vendor. That’s a huge step down from data centre hardware, but in South African rand terms, you’re still looking at R60 000 to R80 000 or more.

Still, that’s cheaper than renting cloud GPUs indefinitely. One recent analysis put a single H100 instance at up to $65 000/year via the cloud, versus about $30 000 to $35 000 to own equivalent hardware over three to five years. But that’s still enterprise-scale economics and it’s not something you casually buy or build in your study at home.
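To make that comparison concrete, here is a rough back-of-envelope sketch in Python using only the figures quoted above. It assumes the cloud instance is rented continuously at the upper-end price and that the ownership figure is an all-in total; both are simplifications, and the function names are purely illustrative.

```python
# Figures quoted in the article, in US dollars.
CLOUD_COST_PER_YEAR = 65_000       # upper-end cloud price for one H100 instance
OWN_COST_RANGE = (30_000, 35_000)  # quoted total cost of owning equivalent hardware


def cumulative_cloud_cost(years: float, per_year: float = CLOUD_COST_PER_YEAR) -> float:
    """Total cloud spend after `years` of continuous rental (assumed flat pricing)."""
    return years * per_year


def break_even_months(own_cost: float, per_year: float = CLOUD_COST_PER_YEAR) -> float:
    """Months of continuous cloud rental that cost as much as owning outright."""
    return own_cost / per_year * 12


if __name__ == "__main__":
    for own_cost in OWN_COST_RANGE:
        months = break_even_months(own_cost)
        print(f"Owning at ${own_cost:,}: matched by {months:.1f} months of cloud rental")
    for years in (3, 5):
        print(f"Cloud rental over {years} years: ${cumulative_cloud_cost(years):,.0f}")
```

On these assumptions, owning pays for itself within roughly six months of round-the-clock use. Anyone who needs the GPU for only a few hours a week would see a very different picture, which is part of why this remains enterprise arithmetic.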

Middle ground

There is a middle ground, and it’s where many early adopters are already playing: high-end workstations and gaming rigs.

Consider Apple’s Mac Studio. The current M3 Ultra option can be specced with up to 512GB of unified memory (shared by the CPU and GPU) and up to an 80-core GPU, easily pushing the machine well into the six-digit rand range depending on storage and CPU/GPU configuration. It’s an incredible little machine for its size – and capable of running substantial local models – but it’s still “very serious hobbyist” money and completely out of the reach of most people.

On the PC (non-Mac) side, the picture is slightly better. You can run respectable seven- to 13-billion parameter models on consumer GPUs like Nvidia’s RTX 5080 and 5090 graphics cards. A brand-new RTX 5090-class card with 32GB of VRAM is still in the “luxury toy” bracket but older (and second-hand) 24GB 4090 and 3090 cards offer a lot of VRAM at lower prices than Nvidia’s new halo products.

By the time you’ve added 64GB (or, better, 128GB) of system RAM, fast flash storage and a decent CPU, you’re still staring at a machine in the upper five digits in rand terms. That’s okay for a small business running its own, fine-tuned models in-house; it’s overkill (and wildly over-budget) for a typical household.

So, yes, local LLMs are possible today. They are even practical for some workloads on less-demanding hardware (I run some smaller models, like Mistral and GLM-4, using Ollama on my ageing M1 Max MacBook Pro). But the machine croaks on larger models, limited by the available unified memory (32GB in my case) and the lack of GPU grunt in the now-four-year-old Apple chip.
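Why memory is the bottleneck comes down to simple arithmetic: a model's weights need roughly its parameter count multiplied by the bytes used per parameter. The Python sketch below illustrates this. The 20% overhead allowance for context cache and runtime buffers is my own rough assumption, not a measured figure, and the function names are purely illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int, memory_gb: float,
         overhead: float = 1.2) -> bool:
    """Rough check that weights plus an ASSUMED 20% overhead fit in memory_gb."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    # Model sizes and memory budgets mentioned in the article, plus a larger 70B case.
    for params, bits, budget in [(7, 16, 24), (13, 16, 24), (13, 4, 24), (70, 4, 32)]:
        need = weight_memory_gb(params, bits)
        verdict = "fits" if fits(params, bits, budget) else "does not fit"
        print(f"{params}B model at {bits}-bit: ~{need:.1f}GB of weights, "
              f"{verdict} in {budget}GB")
```

By this estimate, a 13-billion parameter model at full 16-bit precision overflows a 24GB card, while the same model compressed to four bits per weight fits with room to spare. That is why quantised models are so popular with people running local hardware.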

Given the current costs, why should we even care about local LLMs? Because cloud AI is a privacy nightmare waiting to happen.

Even assuming perfect behaviour by the big platforms – no training on your private data, for example (yeah, right) – the architecture itself centralises risk. Your prompts, outputs and sometimes your underlying data all leave your environment. That’s before we even get to the business model. The same companies selling you “AI productivity” are also in the business of ad targeting, behavioural profiling and squeezing every possible useful morsel out of user data, including your private and sensitive information.

Running models locally flips that. Your raw data never leaves your machine. There is no provider log to subpoena in court, no system quietly learning that you’re considering leaving your employer or buying a competitor’s product. The attack surface shrinks to: “Can someone break into my hardware?”

For journalists, lawyers, doctors or anyone else dealing with sensitive data, that’s not a nice-to-have. It’s quickly going to become essential.

The hopeful bit is that the economics are moving in our favour. The CPU-centric Moore’s Law has undeniably slowed, but AI price-performance is still improving at breakneck speed. Each GPU generation brings more performance per watt, more memory bandwidth and (sometimes) more VRAM. At the same time, the software stack is advancing at a rapid pace, helping LLMs use available hardware more efficiently.

Combine these trends and something interesting happens: the line where “good enough local AI” intersects with “ordinary household budget” is moving inexorably closer.

Road map

Here’s a speculative road map (disclosure: provided with the assistance of ChatGPT – yes, I see the irony):

By 2027/2028: High-end gaming PCs and creative workstations in the R40 000 to R60 000 range will routinely ship with 32-48GB of VRAM and 64GB or even 128GB of system RAM. That’s enough to run genuinely capable assistant-class LLMs locally.
By 2030: The “upper-midrange” desktop – what a serious gamer might buy – will comfortably host 64GB VRAM GPUs and 128GB or 256GB of system RAM. Think of this as the point where buying a local LLM-capable machine no longer feels like choosing between PC hardware and a small car. In rand terms, that means perhaps R30 000 or R40 000 for a box that can handle the bulk of everyday AI workloads at home, assuming the rand holds steady and the current surge in memory prices is temporary.
Early 2030s: Expect AI appliances, shoebox-sized boxes (or smaller), perhaps sold by the same brands that make your home router, bundling an efficient AI accelerator, plenty of memory and a slick user interface. Price bracket: high-end smartphone, perhaps. They’ll sit next to your fibre wall box, quietly running your family’s chatbot assistants, summarising mail, indexing your documents and photos, and answering general questions – all without touching the public cloud.

Even if those date ranges are on the optimistic side, the direction of travel is clear, barring some catastrophic slowdown.

This is not an argument to abandon cloud AI. Hyperscale models will always be ahead on raw capability, training data and cutting-edge research. But we should absolutely be planning for a hybrid world where routine, private workloads run on devices and servers we own. In this world, cloud AI will be used more selectively, for tasks that genuinely justify it.

Local AI models do run on high-end consumer GPU hardware, like Nvidia’s RTX 5090 (pictured)

This shift will start with all of us asking this question: do I really want to send this request to someone else’s server? If the answer is increasingly “no”, then building your own AI server stops being a geek fantasy and starts looking like a rational act of digital self-defence.