Waking Up With AI: Developments in AI Agents

Join Katherine Forrest and Anna Gressel as they delve into the latest advancements in AI policy, regulation, and agentic technology. On this week’s episode they discuss how AI agents are transforming industries and the critical compliance and governance issues that come with these innovations.

Guests & Resources

Katherine Forrest

Partner

Anna Gressel

Partner

Katherine Forrest: Hey folks, good day to everyone, and welcome to another episode of “Waking Up With AI,” a Paul, Weiss podcast. I’m Katherine Forrest.

Anna Gressel: And I’m Anna Gressel.

Katherine Forrest: And I just want to say the reason I say good day is because we're actually recording this, Anna, at two o'clock in the afternoon. So rather than having a cup of coffee next to me, I've got a caffeine-free Diet Coke.

Anna Gressel: I don’t actually have anything next to me.

Katherine Forrest: There you go. That's the truth. All right, so we're recording this just a couple of days post-election in the United States, and it's an important moment here for many reasons, but there are going to be a lot of impacts on AI policy, regulation, potentially the supply chain, things like access to some of the energy sources that are so necessary for AI development, so I think we really could do a whole episode on this.

Anna Gressel: I think that's exactly right, and we probably should do one just looking ahead to 2025 and the policy implications and landscape for AI of an entirely new administration.

Katherine Forrest: Right, I think that maybe we'll put this off for just a couple of weeks and see where the dust sort of lands, but there could be everything from AI-specific issues that are really directed towards AI in some way. And also there may be some changes to some agencies that have had some important policies, procedures, certain regulations that they've implemented, the EEOC being one, the CFPB being another for AI, and so we'll see where a lot of different things land and maybe do that whole episode in just a couple of weeks. So our listeners should stay tuned for that.

Anna Gressel: Completely agree Katherine, and what I’ll say is if folks have questions in the meantime or want to talk through any of this stuff directly, feel free to reach out. Hopefully, our listeners know our e-mail addresses, which are actually available on our website, so you can find us if you want to find us. Don't hesitate to reach out if you want to talk through where the new administration might be headed or the regulatory enforcement landscape. We have a lot of thoughts on that. It's an important topic.

Katherine Forrest: I have a lot to say. I'll save some of it for that special episode we're going to do, but for today, let's turn to a very important topic about developments in the AI agent space. And let's give some attention to that.

Anna Gressel: Yeah, so for folks who've been listeners to the podcast for a while, you know that Katherine and I have been very interested in AI agents.

Katherine Forrest: Well, I actually love the phrase, agentic AI, because I really don't know why they have to call it agentic. You would think they would just call it agent AI, but no, it's agentic. But more than that, we actually had done a whole podcast episode on this back in April of 2024. And we at that time had forecasted that AI agents would be really the hot topic for 2024, and I think they have been.

Anna Gressel: Yeah, I mean, we're here in fall of 2024, and basically what we predicted came to pass. And AI agents are the hot thing this fall. Everyone's talking about them and thinking about them. So we thought we would kind of ground you in where we are today.

Katherine Forrest: Right, so let's use our time then, Anna, to sort of jump right into it and talk to our listeners about some of the exciting, really truly exciting developments in the agent area, and talk about then a few of the interesting legal implications relating to some governance and compliance issues that these agents raise.

Anna Gressel: Yeah, for sure. And compliance and governance issues around agents may not actually be on the radar of many companies today, but they're going to be increasingly important as agents become really commercially viable and deployed on a widespread basis. So we wanted you to hear about those issues here first, and we're happy to talk more in detail about them in future episodes as well.

Katherine Forrest: Right, and just for those of you who haven't actually listened to the April 2024 episode on agents, we'll just do a little definition and sort of setting of the table here and give our listeners a quick reminder of what we mean by “AI agents” or, my favorite phrase, “agentic AI.”

Anna Gressel: Sure. At the outset, I think it's worth noting that those terms cover a wide and very diverse range of technologies. There's not really a consensus definition yet for AI agents or agentic AI.

Katherine Forrest: But nobody's ever satisfied when we say there's no definition, so we have to give them something, Anna. So just think of something. Let's just give them the one that's most used.

Anna Gressel: So I think rather than giving a definition, I'll share some key attributes of AI agents that come from Mike Clark at Meta, and I highly, highly, highly recommend Mike's Substack called “AI Agent Insights.” It's awesome, I think he's on the fourth one right now. They come out all the time and they're super interesting.

Katherine Forrest: Why don't you tell people before you get there, hold on, hold on. What's a Substack? Like you float over the word Substack, like that’s using everyday language. Like everybody talks like at the hot dog stand about like, yeah, my Substack. Yeah, you're Substack. Like what's a Substack?

Anna Gressel: So I assume that people listen to podcasts, maybe subscribe to Substacks, but Substacks are just like short newsletters that people can publish, they can self publish, and you can subscribe to them and get them. So I know Mike, I follow his Substack, and it means I get his cool insights all the time. It's actually awesome. There's an app you can download too.

Katherine Forrest: Alright, so jump right into his Substack.

Anna Gressel: Awesome. So Mike suggests that true agents, he calls them true agents, will be differentiated from more basic automation tools through five different features. First, like true agents can research and gather information to accomplish a goal. Second, they can reason and analyze information to make informed choices. Third, they can make decisions autonomously without human intervention. Fourth, they can take action independently to achieve those goals. And fifth, finally, they can learn from experiences and improve their performance over time. So that's his list of characteristics of true AI agents.

Katherine Forrest: I have to say that is quite a list. And we might be some of the way there with current agentic technology that we're seeing now on the market, but there's a lot of room, I think, still for the technologies to evolve. But we'll talk about some of the technologies we're seeing right now.

Anna Gressel: Yeah, and we're really just in the early days, kind of like the ChatGPT moment for generative AI. I think we're going to see really an acceleration of development in agents in the coming months and years.

Katherine Forrest: All right, so let's talk then a little bit about one of the newest technologies in the agentic area that came out from one of the big AI developers out there, Anthropic.

Anna Gressel: Yeah, so Anthropic recently released a product that they call “Upgraded Claude 3.5 Sonnet,” which they call a computer-use model.

Katherine Forrest: All right, and so I went on to the Anthropic website and I've watched some YouTube videos, and you can't actually play with it yourself as a regular person with 3.5 Sonnet, in terms of some of this stuff, just yet. At least I couldn't get to it, but I was able to watch a lot of what is out there right now on it, and it's really, really, really interesting, and I highly recommend that our listeners go and check it out themselves. But really what this computer use model does is it continuously takes a screenshot of a user's screen, of your screen, of the person who's using the tool, and it then sees the same information that the human does. It then takes over your computer if you ask it to undertake a task, which it will then do autonomously, and it will do it in a way that is analogous, that's similar to the way that you actually interact with a computer. Let me give you just sort of one example and then you, Anna, can probably give an example. But one example is, let's just say that you want the 3.5 Sonnet computer use model to undertake some research for you. You would give it a prompt. It would be able to go to your browser. You could watch it do this. Go to your browser, open up Google, put in arXiv, A-R-X-I-V, enter the prompt that you want within arXiv, find the article or articles that you want, press print, find the right printer that is showing ready on your screen, print it out, and proceed from there. So it's able to do really amazing things.

Anna Gressel: Yeah, and I think it helps to benchmark this as like super different than what a chatbot does, right? So with a chatbot, you're interacting with text and images and you're giving it a prompt, it gives you a response. Chatbots don't really action things in the real world, right? Here, the model actually navigates the user's environment and specifically, you know, according to Anthropic, the model can navigate websites and web applications. It can interact with user interfaces, so that means like moving the mouse and clicking and typing things, and it can actually interpret the visual information from the screenshot and decide how to complete the task.

Katherine Forrest: Right, for instance, one of the tasks that you can see Anthropic's model doing, when you look at some of the information that's publicly available, is actually coding, and it can do life cycle coding where it will design the code, it will then write the code, and it can debug the code. It can actually do exactly what you just said, Anna, which is understand if there's a bug: it can interpret a line of code, find out where the bug is, delete the bug, change the bug, and then go ahead and continue with either performing the operation, finishing up the code segment, et cetera. So it's really pretty extraordinary, and it can also work across different kinds of computer environments. So it doesn't actually have to be programmed for a specific operating system or a specific API, but it can actually look at different computer interfaces and work with different computer interfaces.

Anna Gressel: As Katherine mentioned, this functionality is still in beta, and it’s certainly not flawless. Anthropic itself actually gives a funny example of this. They had the model working on a coding demo, kind of like what Katherine was talking about, and the model spontaneously took a break to search for and look at various pictures of Yellowstone National Park.

Katherine Forrest: Right, I just want to pause on that because this is something that I have at least never encountered before, which is that the model itself, doing something entirely different from having anything to do with Yellowstone National Park, decided somehow it had run across a picture of Yellowstone National Park, it paused, it examined the picture of Yellowstone National Park for, Anna, what reason? Was it enjoying the landscape of Yellowstone National Park? What was it doing? But it took a break, and then when it was ready, it continued. And so, you know, what we've got really is something that is really quite new, but at the very least, with this agentic AI and this computer use capability of the Sonnet 3.5, we've got human level performance on the computer in terms of using the functionality of the interface that you've got with the browser, trying to think of and reason through the various tasks as a human would to try to problem solve. And it's the most interesting kind of agentic use that I've seen so far.

Anna Gressel: Yeah, and I mean, I love the Yellowstone example. It's so relatable. I mean, who has not just like looked out the window or decided it was time to just like see something beautiful on their screensaver? I mean, there are moments like that. But if you were going to replace that behavior with something risky, you can think about all the different risks that agentic technologies could create. So they could go on websites and order illicit substances or put private information into the websites. There's a lot here that needs to be controlled for, and Anthropic itself suggests some best practices in their beta publication.

Katherine Forrest: And there are also a number of other companies that are working on frameworks for evaluating AI agent safety and having exactly these kinds of risks in mind. And in fact, there's a paper, you know, we always sort of recommend papers, some of our favorite papers. And there's one that IBM recently published, which is called “Benchmark for Evaluating Safety and Trustworthiness of Web Agents.” And it actually highlights the importance of ensuring that agents should be controlled, how they can be controlled, following different kinds of organizational policies that conform with user preferences and things like that. So safety is very important, but people are really working on that.

Anna Gressel: Yeah, and that's a really important point. And it brings us back to this compliance and governance theme we mentioned at the outset. And, in brief, I think it's going to be really critical for companies to understand the full implications of AI agents being deployed, either, for example, by their employees as part of an enterprise tool set, or externally by their customers or competitors. So those are agents in the world that are interacting with the company's systems, like their customer support systems or anything that's public facing. So it's going to be critical for companies to think defensively about when they're comfortable with agents interacting with their systems, what limits to put on those agents or on their users, and how to detect when agents are acting outside of their permissible bounds. These are critical, critical issues looking forward.

Katherine Forrest: Yeah, they're really key points because they go to a variety of safety issues, of liability issues, of indemnification issues, for instance. Imagine that you've got an agentic kind of AI and you've brought it in-house and you've licensed it in and it's given a particular task, but you didn't understand that the AI agent would undertake certain steps that are outside of the bounds of what you and your company policy would normally allow. So there are going to be a lot of things to think through as we go through this whole sort of agentic journey.

Anna Gressel: Yeah, and that journey is like just kicking off. I mean, right as Anthropic published its announcement on this, so did Microsoft and Google. But also even outside of the tech space, we had companies like ADNOC and AIQ, which are two UAE-based companies that announced a first-of-its-kind AI agent for the energy sector. I mean, we're really beginning to see this uptick in development and adoption really worldwide.

Katherine Forrest: And I think we should actually maybe spend some time in one of our upcoming podcasts on that first-of-its-kind AI agent that comes out of the UAE because it's going to have some very interesting implications. But we'll cover those developments in future podcasts, and that's all we've got time for today. As you know, we could go on and on about it. And I didn't even mention my favorite thing, which is the o1 system card today, Anna. This is like the first episode where I didn't somehow manage to work the o1 system card into our talk. But I'm Katherine Forrest.

Anna Gressel: I'm Anna Gressel, and if you like the podcast, make sure to share it and also give us a rating on your favorite podcast platform.
