
The Evolution of AI Reasoning

In this episode of “Waking Up With AI,” Katherine Forrest and Anna Gressel explore the transformation of LLM capabilities over the past few years. Are these models merely engaging in word prediction, or can they actually “reason”?

Katherine Forrest: Good morning, and welcome to another episode of “Waking Up With AI,” a Paul, Weiss podcast. I'm Katherine Forrest.

Anna Gressel: And I'm Anna Gressel.

Katherine Forrest: And Anna, I'm going to show you — our audience can't see it because they only can listen to us — but I'm going to show you my moose mug, which will tell you where I'm drinking because I'm drinking coffee out of this moose mug. And yeah, it's good.

Anna Gressel: I love it.

Katherine Forrest: But we're not in the same time zone, not because of my moose, but because you have winged your way, is that what it is? Winged your way? It's not “wunged” your way, but it's winged your way, over to Europe again. So where are you?

Anna Gressel: That is true. So I was just in France, and I was speaking at a big conference for lawyers in Paris. And actually, Katherine, you would have loved it. We had a completely fascinating conversation with a bunch of experts on the coming age of AI agents and all of the things that corporations should be thinking about in terms of liability issues and defensibility issues. And it was actually a great event.

Katherine Forrest: Well, that's fantastic. And I probably really, really, really would have loved it, talking about agentic AI, AI agents and all that. But I have been in Maine with my moose mug working on the book that we talked about actually in our last podcast. I'd done an article on it that we used in our last podcast to do the podcast within the podcast. The book is called “Of Another Mind,” and that's what I'm here for.

Anna Gressel: I mean, I'm sure we're going to talk about this book, I hope so, on the podcast in future episodes.

Katherine Forrest: Yeah, well, no, I think we're going to talk about it really extensively at different times, and I am totally immersed in it right now. So I thought that we could maybe have a conversation about part of it.

Anna Gressel: Yeah, I think that would be great. And let's use it to set the stage for some of those later conversations we're going to have about a topic that's come up repeatedly since you and I have been working together: whether today, like today today, AI is actually just engaging in word prediction or, to the contrary, doing anything we could recognize as reasoning.

Katherine Forrest: Okay. I am so into this conversation. Let's go ahead and jump in.

Anna Gressel: Awesome. So I want to start in 2021 because a paper came out that was quite famous that year that referred to a lot of different concepts related to AI training and set the stage for a conversation that folks have been having for several years about the capabilities of large language models. That's a paper many of you will have heard of called “On the Dangers of Stochastic Parrots” by Emily Bender, Timnit Gebru, Angelina McMillan-Major and Shmargaret Shmitchell.

Katherine Forrest: I know that paper really well because it's one that has kept coming up, really over and over again, in the last couple of years.

Anna Gressel: Yeah, and the paper talks about a number of things, but one of them is that no actual understanding is taking place within LLMs. And it describes LLMs as essentially cheating their way through tests by manipulating linguistic form really well.

And the authors talk about misdirected research efforts around LLMs trying to test for language understanding. And their proposition is basically that LLMs do not, in fact, have access to the meaning behind the words themselves.

Katherine Forrest: Right. And that's where the phrase “stochastic parrot” comes in, because it's suggesting that the model is just really taking language and applying probabilistic methods, you know, an algorithm, to different kinds of data sets and data points, and determining which word is most likely to be responsive to a question based upon the words in the question, without attributing any meaning to the question. It's just analyzing, really, that these words are next to these other words, and therefore I'm going to grab this word, which seems to be related. But it does it in such a sophisticated way that it's just a form of parroting, if you will, language back in response to a query without any real understanding. That's the theory.

Anna Gressel: Yeah. And a lot of speakers on panels, look, I'll say myself included at different points, have talked about LLMs based on transformer architecture as basically next word prediction engines. So they're models that just predict the next word in a sentence. And Katherine, I know you and I have talked a lot, a lot about that framing and how we feel about it.
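
As a rough illustration of the “next word prediction” framing described above, here is a minimal Python sketch of a toy predictor that only counts which word follows which. It is an assumption made for illustration, not how any production LLM works: real transformer models learn neural representations over tokens rather than raw word counts, and the function names (train_bigram_counts, predict_next) and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each word is immediately followed by each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; "." is treated as just another word.
corpus = (
    "the moose drinks coffee . "
    "the moose drinks water . "
    "the moose drinks coffee ."
)
model = train_bigram_counts(corpus)
print(predict_next(model, "drinks"))
```

In this toy corpus, “drinks” is followed by “coffee” twice and “water” once, so the sketch picks “coffee” purely from co-occurrence counts, with no representation of what a moose or coffee is. That is the kind of behavior the “stochastic parrot” label is pointing at.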

Katherine Forrest: Right. And I want to start our conversation today because we're going to move away from stochastic parrots. At least that's my argument, is that LLMs today are far more than stochastic parrots. You see, I have to drink more coffee from my moose mug in order to really get this right.

But I want to start by saying that when the paper was written, it was 2021. And so when we talk about this paper and why we think things are different, we're not criticizing the paper. What we're really doing is talking about the velocity of change in LLMs since 2021. So LLMs today really have different capabilities than they did then, by a lot. And so, we can give some examples about why that's so. And I have one example particularly in mind.

Anna Gressel: Are you heading for that o1 system card? I know it's like your favorite thing to talk about these days.

Katherine Forrest: It is. It is. I'm really predictable. I love the o1 system card. Everybody should read the o1 system card. In fact, if you're interested in what a model does, you should always take a look at the system card in general, which is often released by the model developers to discuss a variety of information about the model.

But what I was going to say about this was that at the time of the release of the o1 model, when the system card came out, there was also another accompanying release by the developer, which in this case was OpenAI for the o1 model. And it published an announcement on its website, which you can still find today on the website as a sort of, I wouldn't call it a marketing announcement so much as an informational announcement. But it's a very interesting announcement that is proudly titled “Learning to Reason with LLMs,” and the announcement says that “we're introducing OpenAI o1, a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers, it can produce a long internal chain of thought before responding to the user.”

Anna Gressel: Yeah, and both the announcement and the system card also talk about some of the incredible capabilities that the o1 model has.

Katherine Forrest: Right. And for purposes of our discussion on whether or not LLMs can “reason,” as we've been sort of posing the question today, it's really useful, I think, to analyze some of the capabilities of this model because certain capabilities have been harder for LLMs to achieve over time. By the way, there are some other highly capable models, but we're using o1 as an example of a model that demonstrates these particular capabilities, that demonstrates something that's beyond just word prediction. For instance, mathematical abilities, chemistry abilities, physics capabilities, and these are not necessarily text-based. And they can often require multimodal capabilities to perceive what the issue is that's in front of you, the mathematical problem, the equation in physics, et cetera, and to be able to interpret those graphs and formulas.

Anna Gressel: Yeah, and one of the things that's interesting about this model and some of the other models that use similar technology is that it embeds what's called chain of thought reasoning.

Katherine Forrest: Right. And chain of thought is basically having the model explain in words, in step-by-step form how it arrives at a particular answer. And it explains its literal “chain of thought.” And this is a really terrific ability that also allows us to understand how the model is reasoning and that it is reasoning, and it's truly doing more than just predicting the next word.

So back to the stochastic parrot, it seems clear to me that the o1 model is certainly doing a lot more than parroting back words because you can see the chain of thought reasoning that is behind the answers.

Anna Gressel: Yes, that's a good point, Katherine. And it's clear that there's at least some understanding of, or facility with, the underlying concepts that's becoming integrated in these models in some way or another. And that's not just language representing those concepts. And to me, I think that's particularly true with multimodal models. And some of those are becoming sophisticated enough to, for example, simulate things about the world, like fluid dynamics or other physical systems. That's super interesting. And it's part of what is making those models really interesting as well for industrial use, like in the fields of industrial design or manufacturing. Those models are getting more sophisticated every day. And Meta, for example, has released a model that does text-to-3D generation and something called texture refinement. It's super cool, but it's completely different than next word prediction.

Katherine Forrest: Right. And we know from prior episodes on our podcast, these MLLMs, these multimodal LLMs, take in a variety of types of data.

Anna Gressel: Yep, not just text data.

Katherine Forrest: Right. And they're doing more than just analyzing semantic structure of language or the relationships between words. They're analyzing patterns of images that can be patterns of, for instance, how far apart the average person in my family's nose is from his or her eyes or what kind of timbre they have in their voice as their vocal cords vibrate, lots of different capabilities.

Anna Gressel: Totally. Or the textures of specific physical systems, right? What we were just talking about. And companies are working on doing this so their models can create content in different formats or content that works in different environments. So they could create a deepfake that might be entirely made up, but really, it's something that's truly photorealistic. Or even a model of hardware that has complete fidelity to its real-world analog.

Katherine Forrest: Right, and it's hard to say that this is just parroting something back. It really is more than that. It sort of needs to be more than that if you look at the capabilities and you look at what the model has to do in order to provide the results that it provides.

Anna Gressel: I completely agree. It's such an interesting question. And it seems like we could go on and on about this. But it might be time to send you back to writing your book because I know you're trying to finish it.

Katherine Forrest: I am. I am. And today, my task is to talk about what the constituent preconditions are for AI consciousness. And so, I'm excited about this little piece of it. So, let's sign off for today, folks. That's all we've got time for. I'm Katherine Forrest.

Anna Gressel: I'm Anna Gressel. And catch you next time on “Waking Up With AI.”

© 2025 Paul, Weiss, Rifkind, Wharton & Garrison LLP
