January 06 - January 12 | Are bigger models better for AI?
Good day everyone, and may I wish you a happy 2025.
I'm back with the first proper article of 2025 after a well-deserved break from several activities. I kept working over the holiday period, but I took a few days off to be with family who had returned to the island for a short stay. I was very happy to take that time off.
You'll have noticed that I snuck in a quick note during that period. I had the idea to lay down some thoughts I've been having about the state of the (tech) community in the West Indies. It was kindly re-published on the platform I was talking about, CIVIC. I've since discussed it with the forum admin, who revealed the name in his post, so I'm fine with naming it here now; I thank Yacine Khelladi for that. It didn't generate as much discussion as I was hoping, which sadly proves my point! One company did reach out, and I'll be responding this week.
Anyway, this one is about a paper I read in November (yes, I live an exciting life) about an aspect of the AI hype cycle that is often misunderstood or forgotten. And, as a bonus, it invokes Betteridge's law.
AI hype has done at least three things aside from putting the marketing term into the general public's lexicon. One, it has hijacked discussions on more serious matters by installing, like a brain worm, in the minds of decision-makers the idea that AI will (should?) fix all the problems. "Got healthcare problems in your country? AI'll fix it. Just give us more money." Two, it has been allowed to talk over the debate about whether more compute is going to solve the very real problems current generative AI has. The third, and most important for me, is that it has wholly obfuscated sensible discussion of what AI is. Almost anything that calculates is AI now, according to some. This is stupidly untrue, and the differences between generative AI (GenAI), machine learning (ML), deep learning (DL), and many other types of AI have been completely lost.
We in the Caribbean have been largely shielded from this locally, affected only when consuming products and services from the US and other countries. However, that will change massively throughout 2025. Everything is going to be AI.
I'm here to ask you to watch out for this intellectual sleight of hand and to question what is actually meant by "AI" when a company is trying to push its wares on you. When your local electricity company says you need to replace your meter with one that has AI, what does that actually mean? In that instance, "AI" has merely replaced the instrumentation field's previous marketing buzzword, "Smart". Smart Meters, Smart Controls, etc. It was never "smart" and won't be "AI" either.
The paper reviewed here is from Gaël Varoquaux, Alexandra Sasha Luccioni, and Meredith Whittaker. It is called Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI. You can find a PDF here: Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI.
It discusses the attention on, and growing investment in, Large Language Models (LLMs) and the narrative that bigger is better; that is, that more compute and more data are the key to better AI. The paper challenges this view by asking a couple of simple questions: what is this assumption based on, and what are its collateral consequences?
The paper looks at one of the most influential studies about AI scale and performance, the AlexNet paper, which discussed this:
"all of our experiments suggest that our results can be improved simply by waiting for faster GPUs and bigger datasets to become available"
This set the tone for later studies which, mindful of it, reached similar conclusions, and those conclusions hardened into an assumption, much as Moore's Law has anchored itself in discussions of processor performance since it was first uttered. Like Moore's Law, the reality is much more nuanced and complex. For example, modern processors are actually many processors on a silicon die, more akin to a "system on a chip" than a simple processor. It could be argued that individual processors hit a limit quite some time ago, and that a workaround, and a redefinition of what a "processor" is, was needed to keep the dream alive.
So it is with LLMs. They follow a law of diminishing returns rather than a "doubling of transistors (and thus performance) every 18 months". After a certain point, performance saturates on many tasks performed by LLMs and other AI models. Several studies have shown this, as described in the paper.
This is precisely why we're starting to see models that can be run on personal computers. Nvidia, one of the most influential companies in the AI hardware space, has announced its own AI workstations, marketed to individuals and institutions that have hit a wall with cloud-based AI solutions that are both costly and often less secure.
Not only has performance plateaued in many circumstances, but the demands on resources have increased exponentially for little gain. Yet it appears that many applications do not need scale to be efficient and useful. In a medical-imaging test on "organ segmentation" (an ML application), models plateaued in performance at around 1 GB in size, despite the medical images themselves often being much larger, and performance tanked as the models grew bigger still. Other applications seem to corroborate this too. In computer vision, performance rises quickly and then falls away as models pass an "optimal" size. This has also been shown to be the case for LLMs: one test showed that LLM performance started to decline from around 100 GB in size.
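To make the shape of that pattern concrete, here is a toy sketch in Python. The curve and every number in it are made up for illustration, not taken from the paper: the score grows logarithmically with model size but pays a small linear penalty, so it peaks at an "optimal" size and declines beyond it.

```python
import numpy as np

# Toy model of the plateau-then-decline pattern described above.
# All numbers are invented for illustration; they are not the paper's data.
sizes_gb = np.linspace(0.1, 200, 400)  # hypothetical model sizes in GB

# Hypothetical score: logarithmic gains from scale, eroded by a small
# linear penalty (overfitting, optimisation difficulty, etc.).
score = np.log(sizes_gb + 1) - 0.01 * sizes_gb

optimal = sizes_gb[np.argmax(score)]
print(f"Toy 'optimal' size: ~{optimal:.0f} GB; beyond it, the score only falls.")
```

The point is the shape, not the values: gains flatten out quickly, and past the peak, every extra gigabyte costs you.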
A number of researchers seem to be converging on the conclusion that smaller, more focused models are better, for accuracy and performance reasons, but also because of another factor that I hinted at in the introduction.
Some assume that feeding the machine with more will automatically render its results "better". We have seen in a couple of examples that this assumption is simply not true in many circumstances. But it also ignores another simple constraint: resources are not infinite.
Compute is constrained by physics and by the capacity to manufacture reliably and sustainably. Energy is constrained by multiple factors, such as production, delivery, and cost, to name a few. Code itself is constrained by developers' capability, productivity, and the very real issue of time. Have a read of The Mythical Man-Month, the book that gave us Brooks's Law.
Its central theme is that adding manpower to a software project that is behind schedule delays it even further.
Another aspect often ignored or glossed over is the cost associated with the alleged performance gains. The compute required to create and deploy AI models grows faster than the cost of compute falls. As these models become accessible to ever more of the population, there is wishful thinking that efficiency improvements will solve this. However, the paper points out an economic effect called the Jevons Paradox:
It is a well-known phenomenon in economics that when the efficiency of a general-use technology increases, the falling cost leads to an increase in demand, resulting in an overall increase in resource usage.
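In other words, efficiency gains can be swallowed, and then some, by the demand they induce. Here is a minimal sketch of that arithmetic; the halving of per-inference energy and the tripling of demand are assumptions for illustration, not figures from the paper.

```python
# Jevons Paradox, toy numbers: efficiency improves, total usage still rises.
energy_per_inference = 1.0    # arbitrary units, before the efficiency gain
daily_inferences = 1_000_000  # demand before the efficiency gain

baseline = energy_per_inference * daily_inferences

# Efficiency doubles: each inference now needs half the energy...
improved_energy = energy_per_inference / 2
# ...but the falling cost stimulates demand; assume usage triples.
new_demand = daily_inferences * 3

after = improved_energy * new_demand
print(f"Before: {baseline:,.0f} units/day; after: {after:,.0f} units/day")
# The 'more efficient' system burns 1.5x the energy of the old one.
```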
The paper discusses the environmental effects of CO2 emissions, concluding that the cost of a single AI inference is growing faster than compute is improving. And given that companies are scrambling to add "AI" to everything, something that was painfully on display at this year's CES in Las Vegas last week, this could increase the carbon footprint of AI use by an order of magnitude and thus contribute even further to the collapse of the climate. To understand this, it is important to understand that inference is AI's biggest compute cost centre. Google "attributes 60% of its AI-related energy use to inference". Other studies have shown that with a few million daily users of OpenAI's services, the energy used for inference outweighed that of training within a few weeks. It is no wonder most big tech firms have gone quiet on their sustainability targets, with companies like Microsoft announcing that they would "miss" them. "Miss" is an understatement!
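The back-of-envelope logic behind that last claim is easy to reproduce. Every number below is a placeholder of my own choosing, not a figure from the paper, Google, or OpenAI; the sketch only shows how a tiny per-query cost, multiplied by millions of daily queries, overtakes a one-off training cost within weeks.

```python
# Hypothetical figures: when does cumulative inference energy pass training?
training_energy_kwh = 1_000_000  # assumed one-off cost of training a model
energy_per_query_kwh = 0.004     # assumed energy for a single inference
queries_per_day = 15_000_000     # "a few million users" issuing queries

daily_inference_kwh = energy_per_query_kwh * queries_per_day  # 60,000 kWh/day
days_to_overtake = training_energy_kwh / daily_inference_kwh

print(f"Inference energy passes training after ~{days_to_overtake:.0f} days")
# ~17 days here: at this scale, the one-off training cost is dwarfed
# by the recurring cost of serving queries within a few weeks.
```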
The paper discusses quality, which I'll let you read for yourself, as it, too, is pretty much universally ignored by the marketing materials and by the salesmen hyping up the "benefits" of AI, salesmen who have their interests at heart, not ours. Suffice it to say that bigger models produce more errors, at a cost in compute, energy, CO2, and so on.
Lastly, the paper discusses another angle often lost in the discussion about AI. Scale, i.e., bigger-is-better, is really a means to build a moat around these businesses. GPT-type models are, for the most part, pretty simple, and there are enough examples and open-source projects that it is straightforward to build one yourself (given you possess the right technical skills). In other words, you cannot patent them and protect them using IP laws. So, how do you make it harder for anyone else to enter the game? Have you seen the cost of setting up and running a datacenter? That should tell you all you need to know. To give you an idea, Nvidia's H100 GPU costs about $40,000, and you need a lot of them; Meta is estimated to have spent $18 billion on GPUs in 2024 alone, which at that price works out to hundreds of thousands of cards. This essentially eliminates all but the biggest of budgets, thus protecting the established players, not to mention circular investment deals like those between Microsoft and OpenAI.
The broader issue is that scientific investigation and innovation can then only be done with the blessing of these companies, giving them unilateral control to approve, deny, or stop research if a project doesn't align with their views. Given the recent far-right shift in Meta's politics, I'm sure this will end well. Nvidia's aforementioned AI workstation is a tiny step away from this, but it is not enough.
As Caribbean residents, we are perpetually exposed to the ravages of the climate. I seriously question whether these tools are useful enough to warrant their ubiquitous usage in the region. I'll leave that up to you and your conscience to determine.
Go and read the paper; it is just shy of 10 pages long, and the language is not so technical that non-techies can't understand it.
Reading
Here is a quick summary of articles I've read recently. These are not endorsements of their content. I sometimes vehemently disagree with their premise but feel it is important to read as wide a variety of views as possible.
Mark Zuckerberg’s commitment to free speech is as deep as Exxon’s commitment to clean energy
Nuff said.
IGF 2024 in Riyadh: AI, WSIS+20 and the Global South
A roundup from CircleID on the IGF in Riyadh.
"The Caribbean is a microcosm of Big Tech's digital colonialism. Small and medium-sized emerging countries are profitable to exploit"
Consulting firm Strand Consult discusses what we all know already. Where are they from again?
Thanks for reading. Please share with anyone you think might like to read. Have a great week.