Open Source AI
In this episode of “Waking Up With AI,” Katherine and Anna discuss the brief history of open source software, while also exploring the complexities and implications of open source AI models.
Katherine Forrest: Well good morning, everybody and welcome to today's episode of “Waking Up With AI,” a Paul, Weiss podcast. I'm Katherine Forrest.
Anna Gressel: And I am Anna Gressel.
Katherine Forrest: And the first thing I want to say before we get into sort of the business of the day is I'm going to admit that I'm not actually sitting here with a cup of coffee, Anna, because we're recording this at four in the afternoon. And I have this incredibly delightful beverage named Fresca. It’s a high recommend. But, you know, there have been so many developments in AI right now over the past few weeks that it's really hard to even know where to begin apart from the Fresca, of course.
Anna Gressel: That's true. So, we had the release of Meta’s 405b model. Super exciting. It's one of the largest openly released models today. And separately, we've seen some interesting issues in the political arena around deepfakes being deployed around the election.
Katherine Forrest: Right, and we're seeing an unprecedented number of papers that are discussing AI safety. Big, big topic.
Anna Gressel: I know, we could seriously do an episode today and barely keep up with what is happening.
Katherine Forrest: But then we wouldn't have time for our day job, and Anna we're supposed to have a day job of advising clients on all these issues and in fact we do.
Anna Gressel: So, let's turn to one particularly hot topic, open source models. This phrase is used so often, and I think people assume they know what it means or have no idea what it means but really are too embarrassed to say.
Katherine Forrest: Right, and we can help them demystify it.
Anna Gressel: Yep, and we should say at the outset there are a lot of complicated issues with open source releases of models. We don't really have time to get into all of those hard debates today, so we'll just start at the beginning for this episode.
Katherine Forrest: Okay, so starting at the beginning, I want to talk about what the concept of releasing something open source means and where it comes from. First, most software today, as everybody knows, and over the past few decades, has been developed by private companies and released under licenses for a fee. So, you pay a fee for your word processing program or your tax program or this or that kind of software package. All of those software packages that you're paying for with a license fee, we're going to call closed releases, just for purposes of today's talk.
Anna Gressel: Definitely. And that's a really big contrast with what we might call a true release of open source software. And that's open source software where the actual source code is made available to the public. And it means anyone can view that source code, get inside, look at how the software is architected. They can modify it and they can even distribute it themselves with their own modifications.
And so how does this happen? Well, generally, open source software is covered by license terms that are really permissive. And some go as far as providing that the person downloading and using the open source software can pretty much do anything they want with it.
Katherine Forrest: And that's consistent with sort of a philosophical framework that surrounds much of open source software, and there are a couple of concepts that we can mention here. The first one is democratization. Part of the open source philosophy is that when software and its source code are released open source, it allows everyone access to technology, which can then enable widespread research, broad contributions from all kinds of folks and broad utilization.
The second thing that I think I'd mention is actually related to that, which is meritocracy. The concept is that if you've got a lot of different kinds of people who are able to bring their own brain power and technological know-how to an open source program, they can develop it in a way that might be extremely positive, and the best software advances will then win. That's the meritocracy aspect of it.
Anna Gressel: And then there's also transparency. And this is the idea that because the code is generally made open to the public, anyone can see how it works and that access can allow for innovation. It can also increase safety because you have a whole community of people testing models, finding out what their vulnerabilities are, and then proposing fixes. We've seen that operate in practice in the cybersecurity arena. And open source releases have a really interesting history.
Katherine Forrest: That's right. They came out of the early years of active collaboration among academics, many of whom were computer scientists, in the 50s and the 60s. Then in the 70s, we started to see much more active release of proprietary software and restrictions around source code.
Anna Gressel: And in 1983, the GNU Project was launched, and its General Public License, or GPL, followed in 1989. This allowed people to release software open source pursuant to certain terms. And then the 90s saw a real open source movement with the release of the Linux kernel under the GPL. And in the mid-1990s, the Apache Group released its Apache HTTP server, which became the dominant web server on the internet.
Katherine Forrest: Right, and then we had Netscape, which released its browser source code in 1998 and gave rise to Mozilla. And since 2000, there's been a whole lot of activity over the last couple of decades, with the 2008 launch of GitHub as the place where a lot of open source development happens, and Google, which has had a number of really important open source projects, including Android.
Anna Gressel: Today, we see a number of AI models that have been released in what we might call an open manner. And that can range from models licensed under a true open source license to models released under a more bespoke license. So, for example, Meta has consistently released its Llama family of models under a bespoke open source license, including its newest 405b model.
And it's really interesting, Mark Zuckerberg put out a manifesto on the value and benefits of open source releases of AI models. And I'd really urge our listeners to read it because he spells out why open source releases can have so many public benefits.
Katherine Forrest: And in the AI area, there's a recognition that with open releases of models, there's really a spectrum at play. On the one end of the spectrum is releasing the weights of the model. And model weights are the key parameters that a model has learned during the training process. If the model weights are released openly, that means you can theoretically determine the behavior and performance of the model. GPT-2 was not fully open sourced, but its model weights were released, and Google did the same thing with BERT.
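To make the idea of weights concrete, here is a minimal sketch in Python. It is not how any real model mentioned in this episode works: the function `predict`, the three numbers in `released_weights` and the bias value are all hypothetical, chosen only to illustrate that anyone holding the same learned numbers gets the same behavior, and that changing those numbers changes the behavior.

```python
def predict(weights, bias, inputs):
    """Toy one-layer 'model': a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical "released" parameters; a real model has billions of these.
released_weights = [0.5, -2.0, 1.5]
released_bias = 0.25

inputs = [2.0, 1.0, 4.0]

# The original developer and a third party who downloads the weights
# compute exactly the same output from the same numbers.
original_output = predict(released_weights, released_bias, inputs)
downloaded_output = predict(list(released_weights), released_bias, inputs)

# Someone who modifies the weights (for example, by further training)
# gets a model that behaves differently on the same input.
modified_weights = [0.5, -2.0, 2.0]
modified_output = predict(modified_weights, released_bias, inputs)
```

In this toy setting, the weights are the whole model. That is why an open release of weights lets others reproduce, study and adapt a model even when the training data and training code are not released.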
Anna Gressel: When AI model weights are openly distributed, it actually enables some developers to jumpstart the model development process. The concept is that that should enable faster progress in the AI space. We've seen some really interesting things happening in the open source space around AI, Katherine, including techniques like quantization of models, which is kind of a long word for saying that these techniques can enable models to have a lighter footprint, and then they can be run locally on devices.
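For listeners curious about what quantization looks like, the following is a simplified sketch in Python, assuming one common approach (symmetric 8-bit quantization with a single scale factor). The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real toolkits use more elaborate schemes. The idea is to store each weight as a small integer instead of a full floating-point number, trading a little precision for a much smaller memory footprint.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero weight list, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights; each would normally take 4 bytes as a 32-bit float.
weights = [0.42, -1.27, 0.03, 0.88]

quantized, scale = quantize_int8(weights)   # each value now fits in 1 byte
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing one byte per weight instead of four is what gives a quantized model its lighter footprint, and the small rounding error is the price paid for being able to run it on a local device.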
Katherine Forrest: I don't even know what quantization of AI models means, but I'm just going to skip over that for the moment and say that there are also questions that get raised about open source releases.
Anna Gressel: Some people discuss these actively and there's no right answer. There's a very robust dialogue happening right now, both in the U.S. and EU on the right regulatory approach to open source models, given what people perceive to be the risks of an open release. And the European Commission's AI office is going to embark on a series of important dialogues on this issue in the coming year.
Katherine Forrest: And some argue that for open source to really fulfill its promise, there has to be some way to get models trained less expensively.
Anna Gressel: So, Katherine, what's our legal takeaway from all of this?
Katherine Forrest: It's that if you're giving advice on a model, you should understand if it's open sourced or closed release. It has some really important implications.
Anna Gressel: Yep, it can impact questions of responsibility, where different actors are in the AI value chain, and regulations are also starting to treat open source differently. For example, there are important exemptions for open source software in the EU AI Act.
Katherine Forrest: All right. Well, that's just a really quick introduction to open source, and that's all we've got time for today. I'm Katherine Forrest.
Anna Gressel: And I'm Anna Gressel. Thanks for joining us.