OpenAI’s release this week of its new large language models – dubbed “Strawberry”, “Q-Star” or “o1”, depending on who you ask and how hip you are among the AI geeks – marked a massive shift in AI’s abilities. And unusually for the tech world, the shift isn’t about making things faster: it has made AI significantly slower. And better.
The way AI has worked up to this point could be described as “thinking fast”, as defined by Daniel Kahneman in his popular science book, Thinking, Fast and Slow (2011). The fast thinking, which he calls System 1, is that immediate reaction you have to a simple situation. For instance, complete the phrase: “Take it slow and smell the…” Your fast system should have filled in the blank without straining your neurons too much.
The popular large language models would also not have struggled with that question, coming up with “roses” pretty darn fast. (Conveniently, this also demonstrates exactly how a large language model works – it simply predicts the most likely next word based on what’s gone before.)
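To make that concrete, here’s a toy version of the idea in Python – emphatically not how GPT works internally (real models run neural networks over tokens, not word-frequency tables), but the core mechanic of “predict the most likely next word” in miniature:

```python
from collections import Counter, defaultdict

# A toy next-word predictor: count which word follows which in some
# training text, then always predict the most frequent follower.
training_text = (
    "take it slow and smell the roses . "
    "stop and smell the roses . "
    "wake up and smell the coffee ."
)

follows = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def predict_next(word: str) -> str:
    # The most common word seen after `word` in the training text.
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "roses" (seen twice, vs "coffee" once)
```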
Slow thinking, Kahneman’s System 2, is reserved for planning, deep thinking, and analysing: pretty much everything that isn’t a shoot-from-the-hip reaction. For instance, “Plan an article about OpenAI’s new models” would result in System 2, or slow thinking. It immediately raises more questions: who is my audience? How technical are they? Will people worry that this is going to take their jobs? Is there a witty quote about that?
Historically, AIs have been incredibly bad at this. That’s been a good thing for the job security of people like me who want to write articles or code applications, but a bad thing for people like me who use AI in applications but can’t get it to do what we want it to do.
Here’s a real-world example: I’ve been building a headline suggester for Daily Maverick. Through a great deal of data analysis of the top 10 000 articles on DM, I discovered that the optimal headline length is 14 words, longer than the accepted best practice of 12 words. So, I’ve told my AI to limit itself to the 12- to 15-word mark for headlines. In return, I get headlines of anywhere from four to 20 words. It tends towards what I’m looking for, but there are no guarantees.
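(The standard workaround, for those building on these models, is to check the output yourself and ask again. Here’s a minimal sketch of that pattern – not the actual Daily Maverick tool; the model name, prompt and suggest_headline function are illustrative, and it assumes the official openai Python package with an API key in the environment.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_headline(article_text: str, attempts: int = 5) -> str:
    """Ask the model for a 12- to 15-word headline, retrying until it complies."""
    prompt = (
        "Suggest a headline of between 12 and 15 words for this article:\n\n"
        + article_text
    )
    for _ in range(attempts):
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative; any chat model works here
            messages=[{"role": "user", "content": prompt}],
        )
        headline = response.choices[0].message.content.strip()
        if 12 <= len(headline.split()) <= 15:  # enforce the limit ourselves
            return headline
    raise RuntimeError("No headline of the right length after several tries")
```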
Paradoxical question
The problem is demonstrated by one of the most famous tests for AIs: asking it, “How many words are in your answer?” GPT-4 would typically reply: “There are six words in this answer.” (I won’t make you count: there are seven.) For the AI, it’s a paradoxical question. It’s terrible at counting in the best of circumstances, and counting the words in an answer while generating that answer is impossible: it doesn’t know the total until it has finished producing word after word, and by the time it reaches the end, the number (which sits in the middle of the sentence) has already been written.
So, after all that Intro to AI 101, how is this new Strawberry AI from OpenAI different? I’m sure you’ve figured out by now that it breaks out of the fast-thinking paradigm and introduces slow thinking. Yes, we’ve given the AIs the ability to plan ahead. (I’m sure this won’t end in post-apocalyptic tears.) OpenAI has mostly hidden exactly how it’s doing it, but we can infer quite a bit from how the two new AI models (slugged o1-preview and o1-mini) work.
Immediately you’ll notice that it doesn’t stream out a response like a typewriter anymore. So, we know it’s not simply thinking about what the next word will be. (Well, it probably is, but as part of a multi-stage process.)
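That typewriter effect, for the technically curious, is token-by-token streaming at the API level. A minimal sketch of what it looks like with the previous generation, assuming the official openai Python package (the new o1 models noticeably don’t respond this way):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# GPT-4o streams its answer back one chunk at a time, typewriter-style,
# because each next token is sent as soon as it is predicted.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Take it slow and smell the..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```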
OpenAI has also revealed some of the thinking process, where you can see how it breaks a problem apart. It would recognise “How many words are in your answer?” as a paradox, come up with a plan to solve it (write out the answer, but leave a placeholder for the number), and then implement that plan.
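We can’t see o1’s actual internals, but the placeholder plan it describes is easy to demonstrate in plain Python. Note how counting the placeholder as one word makes the filled-in sentence self-consistent:

```python
# The placeholder trick: write the answer with a slot for the number,
# count the words (the placeholder occupies the slot the number will
# fill, so it counts as one word), then substitute the count back in.
template = "There are {n} words in this answer."

word_count = len(template.split())  # 7: the placeholder counts as one word
answer = template.format(n=word_count)

print(answer)  # -> "There are 7 words in this answer." (correct!)
```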
We can also see how much work it’s doing in two ways: how long it takes, and how many tokens it’s using. A token is roughly (but not always) a word to an AI. More tokens mean more text to process and generate, which means more work. Asking “How many Rs in Strawberry?” took the previous GPT-4o model 31 tokens to answer – incorrectly, as two. The new model takes 430 tokens to answer the same question, even though we don’t see most of those tokens. Unlike the previous generation, it also gets it right.
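You can count tokens yourself with tiktoken, OpenAI’s open-source tokeniser. This only measures the question itself – the 31 and 430 figures above include the model’s output (and, for o1, its hidden reasoning tokens), and o1’s exact tokeniser isn’t public – so treat it as illustrative:

```python
import tiktoken  # pip install tiktoken

# How many tokens does the question itself occupy for GPT-4o?
enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("How many Rs in Strawberry?")

print(tokens)       # the raw token IDs
print(len(tokens))  # only a handful -- most of the 31/430 is the answer
```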
In general, based on the few tests I’ve done, it takes over 10 times as much work to get an answer (the Strawberry question alone used roughly 14 times the tokens), which translates to significantly more time to answer, more tokens and, we can assume, greater power consumption.
How much more power? AI is not a green technology. OpenAI’s co-founder and CEO, Sam Altman, has invested US$375-million into a nuclear fusion start-up; Amazon is investing in nuclear; and Google and Microsoft are working together to find and build new energy sources. Thanks to AI, IT infrastructure’s power usage is expected to triple by 2030.
Teaching AI to “think slow” will come with a massive cost, which is likely to slow its adoption in the short term. When I paid $0.50 to find out how many Rs were in Strawberry, it certainly put the brakes on my personal adoption. But the costs will come down while the speed goes up, and many of the jobs previously safe from AI creep will face the same fate as those of graphic artists, content farms and bad romance novelists. If you haven’t yet, it’s time to embrace our AI overlords. As economist Richard Baldwin memorably put it last year: “AI won’t take your job, it’s somebody using AI that will take your job.”
- Jason Norwood-Young is a technologist currently working on applying AI and machine learning for the benefit of the media industry. He also works in the open data, big data, data visualisation and privacy fields