
Do You Know Where Your AI Is?

Join Katherine Forrest and Anna Gressel as they explore the concept of on-device AI. They delve into the benefits and challenges of on-device AI, as well as related topics like on-premises AI and model optimization techniques such as quantization and pruning.


Katherine Forrest: All right. Hello, folks, and welcome to — we always say today's episode, but it's really this week's episode — of “Waking Up With AI,” a Paul, Weiss podcast. I'm Katherine Forrest.

Anna Gressel: And I'm Anna Gressel.

Katherine Forrest: And Anna, our listeners may not know that we're recording this at the very end of 2024, because we're going to be airing it at the beginning of 2025. But as we sometimes do, and we do this at conferences a fair amount at this time of year, we talk about what we think are the critical tech developments that are going to be front and center in the coming year. So before we go any further, let's look into our crystal balls and see what we think we can see about what's going to be happening in 2025. And I will go out on a limb and start. Maybe I've even said it before, so I'm just repeating myself, but it's only because I believe it so much: I think the big story will be AI safety, and safety-based debates about whether or not AI has hit AGI or some portion of an AGI-like metric, and that we're going to be talking about how to make sure we can continue to control highly capable models.

Anna Gressel: Definitely. And my prediction, for what it's worth, is that this is going to be a big, big year for advances in the related concepts of on-device AI and robotics. Maybe we'll cover robotics a little bit more in a separate podcast.

Katherine Forrest: Okay, and so on-device AI is not something that we've talked about before with our audience, and it's a phrase that gets thrown around a lot and I think is often not fully understood. There's sort of an intuitive aspect of on-device AI, but there are also parts of it that are really much more technically nuanced. So let's just dig into it.

Anna Gressel: Yeah, I mean, maybe we should start by breaking down what on-device AI even means. As a general matter, on-device AI refers to the ability to run AI directly on a local device, and sometimes you'll hear the term edge device for the same idea. That would include things like smartphones or tablets, wearables, and even things like automobiles, without the need to send data continuously back and forth to remote servers in some sort of far-off data center.

You know, maybe it doesn't seem that fancy or advanced to some of the people who are thinking about agents today, but the ability to do this is actually an incredibly important step in enabling a whole new class of AI applications, including things like robotics. Remember I said that was a related topic, but also things like augmented reality applications and important technologies like drones or automated vehicles. And part of this is because you don't have to send data back and forth, right? So this can also do things like preserve the privacy of the data collected on the device and minimize some really important dependencies, because otherwise you need an internet connection just to run the application.
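To make that concrete, here is a minimal sketch of on-device inference, assuming the Hugging Face transformers library and a small open model that fits in local memory. The model name is illustrative; any similarly small model would do.

    # On-device inference: the weights are downloaded once, then every
    # query runs entirely on local hardware with no network round trip.
    # Assumes the transformers library; the model name is illustrative.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.2-1B",  # ~1B parameters, small enough for many edge devices
        device_map="auto",                # local GPU/NPU if available, otherwise CPU
    )

    # Every token is computed locally; the prompt never leaves the device.
    result = generator("The key benefit of on-device AI is", max_new_tokens=40)
    print(result[0]["generated_text"])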

Katherine Forrest: Right, but these on-device applications that you've just talked about, where the AI is on the device and you've got these additional privacy protections, for certain devices they can actually send data to the cloud later on, if you've asked them to do so or given them permission to do so.

Anna Gressel: Yeah, that's absolutely true. I mean, we have a wide variety of devices today that already use AI, right? So the concept of on-device AI is not new. And that includes things like, you know, your smartphones and your watches. I mean, there's AI really almost in every device already today. I mean, not to mention our smart cars, right?

Katherine Forrest: Well, like my Tesla, which we've named Annie, because I have a way of naming all my cars.

Anna Gressel: I forgot you name your cars, which I kind of love. And I have forgotten the names of the other ones, but you'll remind me at some point.

But let's just benchmark where we are today against where we used to be. Having AI on devices has actually been pretty easy to do with traditional AI models, because classic machine learning models are relatively lightweight and energy-efficient. And they can run on small devices like phones with, and this is important, conventional chip technologies, just the chips that exist in our phones today. And it's also really easy to have user interfaces for AI applications that exist on phones, right? So take ChatGPT: I have the interface, the app itself, installed on my phone, but the underlying foundation model still resides on the company's cloud platforms and servers. That is, the model itself is not in my phone, even though the app interface is in my phone.

Katherine Forrest: But you still call that on-device AI.

Anna Gressel: I probably wouldn't call it on-device AI to be entirely honest. But it does, some part of the application does live on my device.

Katherine Forrest: Okay, so on-device AI is when the entire AI application is living on your device.

Anna Gressel: Yeah, some material part of it. It might be some of it, it might be all of it. But I wouldn't think just a user interface is enough to qualify something as on-device.

Katherine Forrest: And because ChatGPT needs an internet connection between your smartphone, where that portion of the application lives, and the server when you're sending a particularly complex question and it's thinking before responding, then, as you were saying, Anna, you don't have a fully on-device AI application there.

Anna Gressel: That's exactly right.

Katherine Forrest: Now, how about this: when you type a query into your smartphone, the phone can do a little piece of it right there on the device, but then it sends the complex part of the query out to the server, and that takes time. And whenever something takes time in the computer world, it's often called latency. So you can end up with latency, in terms of the length of time that it takes the query to actually talk to the cloud and then come on back to your phone.

Anna Gressel: Yeah, latency is a really important metric, and I think it's incredibly important for understanding why people are looking at on-device AI. Latency may not matter much when I'm asking an AI application to create an image of the rainforest with a little frog in it, because I think that's fun, or to write a poem for me. But for mission-critical AI applications, and that might include military uses or healthcare uses or mobility uses like drones or automated vehicles, latency can pose major issues for the safety and reliability of the AI system. Basically, the more time there is between the query and the response, the more time there is for the system to not take the right action or potentially even to fail.

And that's why putting generative AI models like LLMs or multimodal models on devices themselves is such an important area of R&D today. But because those models are so large and require significant computing power, putting them into really small devices like wearable glasses or, you know, teeny, little drones is a pretty complex and challenging task.
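Here is a minimal sketch of what that round-trip cost looks like in practice, timing a purely local computation against a network call. The URL is just a placeholder standing in for a real cloud inference API, and the requests library is assumed.

    # Compare the latency of a local call against a network round trip.
    import time
    import requests

    def local_inference(prompt: str) -> str:
        # Stand-in for an on-device model call: no network involved.
        return prompt.upper()

    start = time.perf_counter()
    local_inference("hello")
    local_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    # Placeholder URL; substitute a real inference endpoint.
    requests.post("https://example.com", json={"prompt": "hello"}, timeout=10)
    cloud_ms = (time.perf_counter() - start) * 1000

    # On a typical connection, the network round trip is orders of
    # magnitude slower than the local call.
    print(f"local: {local_ms:.3f} ms, cloud round trip: {cloud_ms:.1f} ms")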

Katherine Forrest: All right, so let's come back to this in a moment. But I want to talk about another phrase that gets thrown around a lot in the AI context, and it's called on-prem, ON-P-R-E-M, which stands for on-premises AI. And this is a different context altogether, but I think it's worth exploring because we're talking about on-device AI. So now we're going to talk about on-premises AI. And that's the local deployment of AI models and related infrastructure on a company's, or within a company's, own physical facility rather than in the cloud. And so let's talk, Anna, a little bit about on-prem AI and any similarities that it may have to on-device AI or differences.

Anna Gressel: I mean, I think there are definitely some similarities. If you have AI running on-prem, so for example, if I were a big company and I had my own servers within the walls of my company, under lock and key, and I actually mean a physical lock and key and not just a digital lock and key. If information's not going out of those servers, that can really help preserve things like privacy and confidentiality. There are also ways that it can help with issues like data localization, if you're in a jurisdiction that requires that sensitive data not be sent to other countries, right? So there are a lot of advantages to on-prem AI, but again, some disadvantages, because you'd have to actually have the full compute power to run a massive model locally, and that's hard. I mean, there's a reason that a lot of the computing is happening on the cloud: that's where the greatest compute power available to companies tends to exist.

Katherine Forrest: Right, and to actually run a lot of AI models on-prem takes an enormous amount of space. It takes a lot of server space, and you've got to be able to fit a lot of compute and memory into those servers. So right now, you've got a number of developers who are working on shrinking the generative AI models that are going to be used on-premises, so that they don't take up as much memory and can take up less room on-premises.

Anna Gressel: Right, exactly. Less room on-premises or less room on-device. And you also have companies working on chip technologies, trying to make better on-device chips that can run even more sophisticated models. So there are a number of chip developments in progress as well. But, Katherine, I think it's worth pausing and taking a moment to talk about the ways that companies make models smaller. And maybe you can indulge me, because I know I want to break out my favorite word that I know you laugh at me about.

Katherine Forrest: No, don't do it. Don't do it, Anna. Don't do it.

Anna Gressel: Yes, I'm going to do it. I'm going to do it. Prepare yourself. So it is quantization, which I just think is such a fun word, but I know it's a little bit niche.

Katherine Forrest: It's like the jargon-iest of the jargon. And you'll talk about quantization in just a moment, but let me just sort of summarize for our audience what we're talking about now. We've talked about on-device AI and how you've got these essentially small AI models running on particular devices. Then you've got a little bit of what I'm going to call a hybrid. I know that's not technical, and a lot of people would reject it, but that's where you've got a piece of an application sitting on a device, and the device actually has to talk to the server in order to complete the task. And then you have on-prem AI. And so with all of these, you've got space constraints. And you've got significant space constraints for either the model or the server, or the model and the server, because they're interrelated.

So now what we're going to do for our audience is talk about the efforts that are being made to shrink all of this down. And that's where that lovely, hideous word quantization comes in. So go for it, Anna. You just dig right into that word, and let's just see if we can make it comprehensible.

Anna Gressel: I mean, I think it's kind of fun. It's like quark. I think I like these words that start with Qs.

Katherine Forrest: Don't even start on a quark.

Anna Gressel: But quantization, I mean, you don't even really need to know the word. Let's just talk about the concept. It basically means optimizing a model to reduce its size and improve its processing speed. And the way quantization does that is actually to reduce the precision of the numbers used in the model's computations, for example going from 32-bit floating-point numbers down to 8-bit integers, while trying to preserve accuracy. So it's like, let's make this smaller by reducing some of the precision involved. It's just one way of making a model more lightweight and easier to run on-device. And you can see that there's a lot of quantization happening in the open source community. So you might go and see a model available on Hugging Face, and quantization is how they make some of those models smaller.
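As a rough illustration of the idea, here is a minimal sketch of post-training dynamic quantization, assuming PyTorch; the tiny model is purely illustrative.

    # Dynamic quantization: convert the Linear layers' 32-bit float weights
    # to 8-bit integers after training, trading a little numeric precision
    # for a roughly 4x smaller model and faster CPU inference.
    import io
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    )

    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    def size_bytes(m: torch.nn.Module) -> int:
        # Serialize the weights in memory to compare sizes.
        buf = io.BytesIO()
        torch.save(m.state_dict(), buf)
        return buf.getbuffer().nbytes

    print(size_bytes(model), "bytes ->", size_bytes(quantized), "bytes")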

The other method of doing this, which is actually fascinating for different reasons, is called pruning. Pruning is about removing unnecessary weights from the model. Actually, this is my neuroscience side coming out: neurons in your brain can actually be pruned, and sometimes that's a process that leaves room for growth. So I think the concept of pruning is cool. The other reason pruning is cool is that there's some research on selectively pruning models to eliminate concepts from them, but that's still in pretty nascent phases. Right now we just talk about pruning as making models smaller. So, where is all of this headed? It's really headed towards the development of what we might call small language models, in contrast to large language models, or small generative AI models. And those are models that are like 1 billion or 3 billion parameters, as opposed to 400 or 500 billion. But they can still do impressive things. And there are some that have been released recently by Meta and ATRC. And it's just the beginning of a larger charge towards these smaller models.
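And here is the same kind of minimal sketch for the simplest flavor of pruning, magnitude pruning, again assuming PyTorch; the single layer is illustrative.

    # Magnitude pruning: zero out the 30% of weights in a layer with the
    # smallest absolute values. Real deployments then store the sparse
    # weights compactly, or skip the zeros, to save space and compute.
    import torch
    import torch.nn.utils.prune as prune

    layer = torch.nn.Linear(512, 512)
    prune.l1_unstructured(layer, name="weight", amount=0.3)  # prune 30% by L1 magnitude
    prune.remove(layer, "weight")  # make the pruning permanent

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"fraction of weights now zero: {sparsity:.0%}")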

Katherine Forrest: So we've been talking about the charge towards these smaller models. Now let's talk about some of the practical benefits of on-device AI and on-prem AI, and their legal implications.

Anna Gressel: Definitely. There are some we've already talked about, including the improved privacy and confidentiality that comes with local data processing, as opposed to data processing that involves the internet.

Katherine Forrest: Right, and there are other privacy considerations, like whether certain kinds of surveillance can be controlled or avoided. You can have additional control over guardrails around the applications. And I said privacy, but I mean not only personal privacy but also confidentiality, competitive confidentiality. Those protections can also come from, or be enhanced by, both on-prem and on-device AI.

Anna Gressel: Yeah, and there are interesting liability questions, in terms of whether small models are going to be as competent or capable as larger models, and who should be liable, if anyone, if the model is no longer reliable because it's too small. I mean, those are just hypothetical questions, but interesting ones to consider. But ultimately, the bottom line for lawyers is, like with any AI technology: it's critical to understand where the model is actually located and how it processes data in order to understand its legal risk profile. So although many of the AI models used by companies rely on cloud deployment today, going forward it's increasingly likely that we're going to see companies experiment with on-device applications, on-premises applications and all different kinds of applications that involve localized deployments as well.

Katherine Forrest: Right, so we've got three different deployment locations, each with its own confidentiality, compliance, privacy and cybersecurity issues: the AI model can be on the device, it can be on your own premises so you can control it, or it can be in the cloud. Now, in the cloud, some would argue, it may be extremely secure, given all of the security protocols that cloud providers are now able to bring to bear. And they're able to create all kinds of secure instances for their clients. But that doesn't mean that certain clients, for certain applications, don't nonetheless want on-prem AI. So we've got those three things all at work. So it's 10 p.m., and do you know where your AI is? Do you remember that commercial? No, or are you too young for that commercial?

Anna Gressel: Oh, I do. No, I'm actually not.

Katherine Forrest: And now you don't even need to ask, is it 10 p.m., and where's your AI? Because all you'll do is hit “find my device” and it'll find it for you. So, all right, it's a very active space, and I'm sure we're going to be returning to this in the future, but that's all we've got time for today. I'm Katherine Forrest.

Anna Gressel: And I'm Anna Gressel. Make sure to like and share and rate the podcast if you're enjoying it.

