
AI Research Reimagined

In this episode of “Waking Up With AI,” Katherine Forrest and Anna Gressel examine the integration of end-to-end reasoning and agentic AI capabilities, with new developments from OpenAI, DeepMind and other leading AI labs. Katherine also shares her firsthand experience with OpenAI’s new deep research capability, which is transforming academic applications of AI.


Katherine Forrest: Good morning, everyone, and welcome to another episode of “Waking Up With AI,” a Paul, Weiss podcast. I'm Katherine Forrest, and I'm thrilled to be here with you.

Anna Gressel: I know, reunited at last, Katherine.

Katherine Forrest: Anna Gressel. Here she is back from wherever she's been.

Anna Gressel: Mm-hmm. I know.

Katherine Forrest: An undisclosed location.

Anna Gressel: I know. We're both back from our exciting secret travels. They're not so secret. For mine, they're like splashed all over social media, but it's super nice to be back together and talking about really an interesting topic today.

Katherine Forrest: Very, very interesting time right now in the world of technical developments. And in fact, there was one that occurred just before this episode. So we'll be talking a little bit about that, but we've been talking for a while on this podcast about changes on the horizon for how we interact with AI and what AI can do for us. In some of our prior podcasts, we've talked about whether or not AI thinks. We've also talked about the concept of chain-of-thought and how that can actually improve the reasoning abilities of AI. We've talked about AI agents and the actions they can take autonomously on a user's behalf. And we're going to combine some of those things today.

Anna Gressel: Yeah, I think some of that horizon scanning is actually what we love doing the most, partially why we indulge ourselves with it on the podcast, because we both really, really enjoy it. And so today we'll be talking about some recent developments from, as Katherine mentioned, Anthropic, but also OpenAI, DeepMind, DeepSeek around…

Katherine Forrest: I hadn't actually mentioned, oh did I mention the Anthropic one?

Anna Gressel: You did mention it briefly.

Katherine Forrest: I said there was a tech development.

Anna Gressel: Oh, did you not mention it was Anthropic? Well, spoiler, spoiler.

Katherine Forrest: Okay, go for it.

Anna Gressel: Yeah, spoiler, it was Anthropic, I think, just today. We haven't even had a chance to dig into what they've done today. This is how fast everything moves. You know, we're recording one minute and there's a development like five minutes before we get on and it's hard to have time. But we're going to be talking a little bit about these end-to-end reasoning models that have been released over the past months, and several of the top AI labs have started releasing AI agents. We've talked about that. Katherine, why don't we dig in a little bit into these end-to-end reasoning kind of chain-of-thought research models.

Katherine Forrest: Yeah, and so we've got one now from OpenAI, DeepMind and DeepSeek. And then of course, the one that you spilled the beans about just before we went on air: Anthropic has now released its own reasoning model. For many observers, it may have appeared that these two fairly novel capabilities, reasoning and action-taking, the reasoning being chain-of-thought and the action-taking being agentic, were developing in their own sort of channels, right? In their own ways. But common sense tells us, and just the velocity of change tells us, that things in this AI world tend to come together. When you think about reasoning as an extension of action, or action as allowing for additional reasoning, they come together. So AI researchers have taken these two separate computer science problems, if you will, and have now really worked on different ways to put end-to-end reasoning together with agentic AI capabilities for autonomous reasoning.

Anna Gressel: Yeah, that's exactly right. I mean, we saw research that was really focused on reasoning, on how to teach a model to improve its chain of thought. We've talked about this before, but that concept is basically the scratch pad. I mean, it's a literal scratch pad the model uses to take notes to help it reach an answer, and then on scaling and improving that method. And that work had seemed kind of separate at first from agentic AI research, which focused on how to architect an agent to do the things you wanted it to do by simulating planning and memory, multimodal understanding and tool use, all of which are incredibly important for agentic development.

Katherine Forrest: That's not to say that these two fields of end-to-end reasoning and agentic capabilities were really islands unto themselves. But it does mean that they presented distinct challenges to the AI research community.

Anna Gressel: Yeah, and I think that chasm has collapsed in on itself a little, or at least a pretty good bridge was built across it a few weeks ago with OpenAI's newest agentic offering. Not Operator, which came out towards the end of January, but actually a model or product called deep research, which came out in February after DeepSeek set the world abuzz. And by the way, DeepMind also has a tool they call Deep Research, which came out in December. I don't think there's any relationship between the two of them, so like, you know, let's all be careful about the number of “deeps” that we have in all these words. They sound similar, but they're a little bit different.

Katherine Forrest: Right, and deep research being used twice. But for the moment, let's really dig down into what is a phenomenal capability in OpenAI's deep research, and one that you can get in your ChatGPT. A lot of people are familiar with the free ChatGPT or the $19.99 version of ChatGPT. We're going to talk a little bit about the Pro version of ChatGPT, which is $200 a month, and the fact that that Pro version now has a deep research capability. So deep research starts with reasoning, which is obviously an essential component of any deep research process. And what the technology allows you to do is academic-quality research with, effectively, a ChatGPT model.

Anna Gressel: Yeah, and it does that by combining two things: the capabilities of a large language model to interpret and connect material, and then a function that enables deep research on the web. So this is actually like going out and doing live research on the internet. And that's the central task shared between these tools. So that means that the system takes a question from the user, it finds sources relevant to the question by searching the web and it synthesizes the information found across those sources into a research report. So the first step is providing a research query to deep research. And it's really designed to thrive with research queries, not one-offs like who won the Olympics last year. So it's more like questions I would actually give to an associate and ask for something back in a few hours or a few days.

Katherine Forrest: One thing that I want to say to the audience is that I signed up for this OpenAI deep research capability by upgrading my subscription. I happen to have the OpenAI $19.99-a-month plan. I actually have several different models. I have Claude, you know, I've got a whole bunch of different models that I've used from time to time. I have a ChatGPT one, and I upgraded it.

When we're talking about deep research and, as Anna said, the creation of a research report, what we're talking about is the ability to enter a query, which I did. I had just the query bar open. It looks very much like the regular ChatGPT query bar, although you push the button for deep research to get it to do deep research, and you can pick your model. So it's running on the o3 model, or the o3-mini or o3-mini-high, or you could run it on a different one, but o3 is the one that people are really excited about. In fact, this is the first time that you can really use that o3 model.

And so you put the query in and it helps you, first of all, refine the query. I put in my query: I'd like a paper on emergent behaviors in AI models that have been observed over the last couple of years. So it wrote back and said, before I do this, I'd like to understand whether you would like it in an academic style or in just a readable, narrative style, what kind of footnote style you want, and whether you have any other particular instructions. My other instructions were that I wanted it in an academic style, with Chicago Manual of Style footnotes, and that I wanted it completely footnoted and linked. So it said, fine, I can do this, I'll get back to you when it's done and ready for your review. Now what's interesting is it took about 10 minutes.
And what this model is doing is it's running on the o3 large language model, and it's doing this extensive research. I don't know exactly what the corpus is, but it's the internet. Whether they've added additional pieces to it or not, I don't know. But then it came back with an extensive paper, about 10 pages long, single-spaced, that has an abstract, reads very much like an academic research paper, and is incredibly well footnoted: to arXiv articles that it read, to, let me get my iPad up here, quantamagazine.org, to hai.stanford.edu, lots of arXiv articles and lots of Quanta Magazine. And so it came up with this incredibly in-depth article.

Anna Gressel: And I think, you know, pulling on one thread, which I think is really important: when ChatGPT and some of these other chatbots originally came out, the inability to iterate with questions was actually one of the big criticisms of those models, because they were really reliant on a prompt being perfect to get a perfect response. But then you could say, actually, do it differently; actually, I really meant this. You could refine over time. And particularly as the context window got bigger with those models, that meant you could actually go back and forth and it would have memory for that.

Katherine Forrest: Remind our audience what a context window is, Anna.

Anna Gressel: I mean, colloquially, I think the context window is just how much information the model is able to process in answering a particular query. And that can include your past queries and your past back-and-forth questions and answers. So that context window, you know, made it better at iterating with you, but it could never actually go back to you and say, I'm a little confused, can you clarify? Or did you mean this by this term, or that by that term? And that was a limiting factor in how good the outputs were. That was something that I think a lot of people recognized: the inability to be in a two-sided conversation with your chatbot did actually decrease some of the contextual appropriateness of the output, or the ability of the output to be really tailored to what you were actually looking for. So this seems like a pretty big game changer, the fact that it can come back and say, do you want these footnotes? But also, what did you mean by this? Or would you prefer for me to take this as an assumption or that as an assumption? I mean, there are very interesting questions back that I'm seeing in some of the demos of these tools that are almost like what a junior associate would ask when they got an assignment: did you actually mean this, or should this be an assumption of the project? And that, I think, is completely fascinating.

Katherine Forrest: It is really fascinating, and also, it's like having a research assistant for any academic topic that you want in your pocket. The major uses right now, according to some news articles on this, are technology research, medical research and legal research. And by the way, I've not tried it for those areas. I was just doing the one that I did today, so I don't want to vouch for the level of hallucination, you know, the accuracy that this has. And of course, I'm sure every firm now has its own policies on whether or not certain tools can be used, and those always have to be kept in mind. But it's running on the o3 model, which is really the most powerful, or now Grok says it's also the most powerful, but it's certainly one of the top, most powerful models that we're even aware of right now.

Now one other thing I wanted to say was that DeepMind, as we've mentioned before, also has its deep research capability. And that is architected around Google's Gemini 1.5 Pro, which is a different model. So we've got OpenAI's, which runs on o3, and DeepMind's, which runs on Gemini 1.5 Pro. A chief research officer at OpenAI explained in a post that was recently on X, and I'm quoting here, that "The important breakthrough in OpenAI's deep research is that the model is trained to take actions as part of its chain of thought. The problem with agents has always been that they can't take coherent action over long time spans. They get distracted and stop making progress. That's now fixed." So what we've got, and I know I'm going on and on and on, Anna, but you can see I'm so excited about this, is the deep reasoning model now combined with that agentic, autonomous ability for a model to take actions on its own.

Anna Gressel: Yeah, and for folks who listened to our more recent agent episode, we've done a few, we talked about how Anthropic's upgraded Claude 3.5 Sonnet model got distracted and started looking at Yellowstone photos. And this is, in a way, what Katherine's describing: an ability to stay on task. And staying on task is really important for agents, because if they can actually take actions in the real world, it matters that they take the actions you intended them to take. That's a pretty significant comment and a particularly significant breakthrough on the agent side, or development on the agent side. I don't know if breakthrough is quite the right word.

Katherine Forrest: Well, I think, based on what I saw, it was pretty extraordinary. One thing that I pay a lot of attention to is emergent behaviors and the research on emergent behaviors. Now, again, I've not looked at all the footnotes, so I can't vouch for the level of accuracy or any hallucinations. But what I can tell you is that the paper it produced for me was, number one, extraordinarily well-written. It was really, really well-written, but also really well-researched. So it's getting it from somewhere. So we're going to have to check out, and I will check out, the accuracy of the model. And there are some differing viewpoints on its accuracy, but if you look at the overall reviews, there are a number of people saying this is an absolute game changer. What you've got is, as I said before, a research assistant in your pocket, which is extraordinary. Now, you're paying for it, right? You're paying $200 a month, and you only get a hundred of these research queries a month. So you've got to make sure you pick your research projects a little carefully. But it's a pretty powerful tool.

Anna Gressel: Yeah, I think we're going to see it open up a ton of interesting use cases. And it's like, in many ways at every advance we see in the AI space, we then see some of the promise realized on use cases that felt like they were almost there but weren't quite there. So maybe we'll start seeing those jumps as well in the products on the market.

Katherine Forrest: And before we close out for the day, let's just mention that Perplexity has a version of deep research that also appears to be a similar sort of reasoning model. And so there's going to be a lot happening in this area. Again, it's combining agentic with reasoning to allow this sort of iterative process, and for the model to go out and to research and to be able to bring you back a paper. And so it's doing a lot of independent work out there on the internet. So we'll be coming back to this, Anna.

Anna Gressel: Oh, most certainly. Most certainly.

Katherine Forrest: Alright, so that's all we've got time for today. I'm Katherine Forrest.

Anna Gressel: And I am Anna Gressel. Like and subscribe if you're enjoying the podcast.


© 2025 Paul, Weiss, Rifkind, Wharton & Garrison LLP
