The Great Offline Divide


One might recall, with a suitable amount of dramatic flair, a recent dispatch from the estimable MIT Technology Review. Their piece, "How to run an LLM on your laptop," opened with the rather tantalising hypothetical of an impending societal unravelling. Picture it: you, a USB stick, and the profound burden of rebooting civilisation. Our esteemed colleague, Simon Willison, posited that an offline LLM is "like having a weird, condensed, faulty version of Wikipedia."

This, naturally, led my inquisitive mind down a rather specific rabbit hole: precisely how do the purportedly compact downloads of local LLMs stack up against the venerable, albeit decidedly less chatty, offline Wikipedia archives in terms of sheer digital acreage? It's a question for the ages, or at least for a moderately bored Friday afternoon.

To address this burning query (and by "burning," I mean a gentle flicker of curiosity), I embarked upon a rudimentary comparative analysis. I perused the Ollama library for models amenable to the humble consumer-grade silicon, and then turned to Kiwix for Wikipedia bundles, sans the bandwidth-guzzling imagery, to ensure a somewhat fairer, if still inherently flawed, comparison. The findings, presented here in ascending order of digital heft, are, shall we say, illuminating:

Name                                                 Download size
Best of Wikipedia (best 50K articles, no details)    356.9MB
Simple English Wikipedia (no details)                417.5MB
Qwen 3 0.6B                                          523MB
Simple English Wikipedia                             915.1MB
Deepseek-R1 1.5B                                     1.1GB
Llama 3.2 1B                                         1.3GB
Qwen 3 1.7B                                          1.4GB
Best of Wikipedia (best 50K articles)                1.93GB
Llama 3.2 3B                                         2.0GB
Qwen 3 4B                                            2.6GB
Deepseek-R1 8B                                       5.2GB
Qwen 3 8B                                            5.2GB
Gemma 3n e2B                                         5.6GB
Gemma 3n e4B                                         7.5GB
Deepseek-R1 14B                                      9GB
Qwen 3 14B                                           9.3GB
Wikipedia (no details)                               13.82GB
Mistral Small 3.2 24B                                15GB
Qwen 3 30B                                           19GB
Deepseek-R1 32B                                      20GB
Qwen 3 32B                                           20GB
Wikipedia: top 1 million articles                    48.64GB
Wikipedia                                            57.18GB
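(For the pedants: the ranking above is nothing more than a sort on human-readable sizes. A minimal sketch of how one might produce it, with a unit-parsing helper of my own invention and an abbreviated set of entries:)

```python
# Convert human-readable sizes (as listed by Ollama and Kiwix) to bytes,
# so disparate entries can be ranked on a single scale.
UNITS = {"MB": 10**6, "GB": 10**9}

def to_bytes(size: str) -> float:
    """Parse a size like '523MB' or '1.93GB' into bytes."""
    for suffix, factor in UNITS.items():
        if size.endswith(suffix):
            return float(size[: -len(suffix)]) * factor
    raise ValueError(f"unrecognised size: {size}")

entries = [
    ("Qwen 3 0.6B", "523MB"),
    ("Best of Wikipedia (best 50K articles, no details)", "356.9MB"),
    ("Llama 3.2 3B", "2.0GB"),
    ("Wikipedia", "57.18GB"),
]

# Print in ascending order of digital heft, as in the table above.
for name, size in sorted(entries, key=lambda e: to_bytes(e[1])):
    print(f"{name:50} {size}")
```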

Methodological Confessions (and Other Disclaimers)

Now, a true scholar must, of course, append a list of salient caveats. And here they are, presented with the gravitas they so richly deserve:

  • The Inevitable Apples-to-Oranges Problem: Let us not equivocate. We are comparing fundamentally disparate technologies. An encyclopaedia is a curated repository of knowledge; an LLM is, to put it mildly, a stochastic parrot of the highest order. Their raisons d'être could not be more divergent.
  • The Resource Hog Factor: While we focus on download size, one must remember that LLMs, even in their "local" guise, possess a voracious appetite for RAM and CPU cycles. Offline Wikipedia, conversely, will happily purr along on my antique, low-power laptop, presumably while sipping a cup of artisanal tea.
  • The Specificity Conundrum: This comparison is generalised. One could, for instance, download a highly specialised Wikipedia chemistry bundle or an LLM meticulously fine-tuned for a particular hardware configuration. (And let us not forget Kiwix's treasure trove, including the entirety of Stack Overflow, a veritable digital life raft for any future programmer apocalypse.)
  • The "Vibes" Heuristic: My selection criteria for these particular entries? Entirely subjective. Rigour was, frankly, secondary to intuition. One might even say I chose them... based on a hunch.
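On that Resource Hog point, a crude rule of thumb (my own back-of-envelope arithmetic, not anything measured for this post): a quantized model must sit resident in memory, so its download size roughly approximates its weight footprint, plus some working overhead for the KV cache and runtime. The overhead figure below is an assumption for illustration only.

```python
# Back-of-envelope RAM estimate for running a quantized local LLM:
# the weights (approximately the download size) must fit in memory,
# plus an assumed fixed overhead for KV cache and runtime machinery.
def estimated_ram_gb(download_gb: float, overhead_gb: float = 1.5) -> float:
    return download_gb + overhead_gb

# e.g. the 5.2GB Qwen 3 8B download would want several GB beyond its
# file size free; offline Wikipedia, by contrast, is read from disk.
print(f"{estimated_ram_gb(5.2):.1f}GB")
```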

Despite these charmingly amateurish methodological choices, I confess to finding it rather intriguing that a curated "Best of Wikipedia" (50,000 articles strong, no less) is, roughly speaking, within the same digital ballpark as a Llama 3.2 3B model. Or, indeed, that a Wikipedia bundle can come in smaller than the smallest local LLM I tested, while the full archive swells to surpass the largest. The digital universe, it seems, contains multitudes, and also, occasionally, a healthy dose of irony.

Perhaps, in the spirit of preparedness, one should simply download both. You know, just in case the internet takes an unexpected holiday.
