Times of London: Wikipedia, AI, & THE EDITORS | An Audiobook Update
My conversation with Alexis Conran on the wireless.
Before we get started, an update: production of the audiobook edition of The Editors is underway. The release date is scheduled for November 26. Produced by Audible, the story will be narrated by two veteran (human) narrators: Mia Hutchinson-Shaw and Tim Lounibos. I’ve loved audiobooks since listening to A Wrinkle in Time as a child on a family road trip, and I’m thrilled to bring the story to life in this new format. Mark your calendars for The Editors audiobook release on Tuesday, November 26!
Last Saturday, I appeared on Alexis Conran’s show on Times Radio in London, where we had a fun and far-reaching conversation about the famous internet encyclopedia. Here’s the transcript, lightly edited and annotated for clarity.
Alexis: Where do you get your information from? (Other than Times Radio, of course.) Is it Google? Is it Wikipedia?
Now, I can hear some of you scoffing, but since its founding, Wikipedia has become a surprisingly important part of the online information ecosystem. A bulwark against viral misinformation. So much so that in 2018, YouTube announced it would use Wikipedia to help fact-check conspiracy videos.
But the future of the site is now under threat with the rise of artificial intelligence. So should we be doing more to protect it? Well, Stephen Harrison is a tech lawyer, a journalist, and the author of a new thriller based on Wikipedia called The Editors.
And he's been thinking hard about the site's reputation in the future. I caught up with Stephen Harrison a little earlier on.
Stephen: The Editors is a suspense novel that's inspired by Wikipedia. When I first started covering Wikipedia as a journalist, I always thought it was ripe for a suspense novel because there are so many people around the world who rely on the information. There are also foreign governments and corporate agents who might try to manipulate the information that's on the site.
Why would they interfere? Because Wikipedia influences the information that appears on Google. It changes the content that is generated by AI.
So on the one hand, we have these manipulators. And then we have the ordinary Wikipedia editors. Frankly, I think a lot of the good editors are heroic. They are vigilantly on guard against misinformation on the site.
Alexis: The book touches on this. What about the trustworthiness of Wikipedia? I mean, we've all spotted errors in there. A lot of people were worried because of its structure and the idea that people can sort of log in and correct things or add things. How do you feel about where Wikipedia is now? As a journalist, would you use it as a source?
Stephen: I wouldn't use it as a source as a journalist, but I think a lot of journalists would be lying if they claimed they never looked at it for initial research. You can get the summary version on Wikipedia and then go to the underlying sources to go deeper.
As far as the general trustworthiness of Wikipedia, I admit this sounds like a lawyer answer, but it depends. Specific Wikipedia articles can be really good. I think articles that get a lot of attention from editors can often present a very good summary. But of course, the pages change over time, and they can move in a direction that is more factual or less so. For the listeners out there, you can get involved if you want to, and start participating in these editorial debates behind the scenes.
Alexis: We’re living in an age and in a current political climate where the issue of trust in what you see online is paramount. And in your book you touch on this, where one of the editors in your Infopendium is editing on behalf of wealthy and powerful clients. I know this has been a challenge for the real Wikipedia, where we had politicians in this country [the U.K.] getting in and editing their own pages. How much of that has Wikipedia dealt with so far? Is this still an issue?
Stephen: I do think it’s an issue. What’s interesting is that in this era of automation and AI, the best tools for identifying these for-profit and highly biased editors are still human ones. People have an intuition when an editor is only editing about such-and-such politician or such-and-such corporate leader, right? [See this article on how AI is currently not as good as humans at detecting AI.]
So people can still detect bias, and these volunteers become very vigilant when they suspect that someone is editing on behalf of a paying client. It gets very tricky, though, when a bad actor is using several sockpuppet accounts. In those cases, they’re trying to make it look like they’re multiple different people. That’s really tricky for Wikipedia editors, and I think it’s something the volunteers have to look out for constantly.
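[For the technically curious: the “one-topic editor” intuition Stephen describes is simple enough to sketch in code. Here’s a minimal illustration that queries Wikipedia’s public MediaWiki API; the contribution_focus helper and the scoring idea are my own invention for this post, not a tool Wikipedia actually runs.]

```python
# A minimal sketch, not a real anti-abuse tool: fetch a user's recent
# contributions from Wikipedia's public MediaWiki API and measure how
# concentrated those edits are on a single page.
import requests
from collections import Counter

API = "https://en.wikipedia.org/w/api.php"

def contribution_focus(username: str, limit: int = 50) -> float:
    """Share of the user's recent edits that touch their single most-edited page."""
    params = {
        "action": "query",
        "list": "usercontribs",  # public log of a user's edits
        "ucuser": username,
        "uclimit": limit,
        "format": "json",
    }
    contribs = requests.get(API, params=params).json()["query"]["usercontribs"]
    if not contribs:
        return 0.0
    counts = Counter(c["title"] for c in contribs)
    return counts.most_common(1)[0][1] / len(contribs)

# A score near 1.0 means nearly every recent edit touches one page --
# a reason for a closer human look, though it proves nothing by itself.
```

Of course, a high score also flags plenty of innocent single-purpose editors, which is exactly why the human judgment Stephen describes still matters.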
Alexis: Pretty much everybody listening will have used Wikipedia at some point, but probably won't know how the information got onto the page. At the moment, what is that process? Are there any checks on what goes on there? How does that work?
Stephen: It’s a site that anybody can edit, but it’s not an anarchy. There are rules and there are policies. If a new contributor comes to the site with a reliable source that supports a certain statement, that statement can then be added to an article. I always tell people that every sentence in a Wikipedia article should have a citation, because it’s not meant to be original writing. A lot of times new contributors are shocked by how quickly they get slapped down for failing to cite a reliable source. So in practice, the site is self-policing.
Alexis: Are there editors that have more power than other editors?
Stephen: What’s interesting to me is that there are only about 1,500 core editors who are very involved with Wikipedia on a regular basis. It reflects other things in society: everyone can join, yes, but very few people choose to get really involved.
There are administrators that are selected by their [peer editors] through a process that is not called an election, but is in practice very similar to an election. Administrators have more powers. They can block a user, or add protection to a page so that it can only be edited by Wikipedia editors with a certain number of edits. So yes, there is a bit of a hierarchy among the editors, but in general most [ordinary] editors can do most things on the website. You can make changes just by signing up, or if you don’t want to sign up, the site will default to using your IP address.
Alexis: You’ve also been talking about AI using Wikipedia as a learning tool. But of course that means all the work those editors have done goes uncredited. How much of an issue do you think that is when it comes to AI? Because AI is being trained by all of us. Every time you're asked to click one of those boxes, “I am not a robot, spot all the traffic lights in this image,” what you're doing is training AI.
Stephen: What I am worried about with AI is that it pushes Wikipedia further back, so that it has less visibility to the average user. When someone goes to ChatGPT for an answer, they don't always realize that it's coming from Wikipedia. And for our AI systems to be good, Wikipedia needs to be constantly updated.
Wikipedia needs to maintain a fresh contributor base and people need to be excited to continue working on the site. The public perception is that AI is doing this remarkable work on its own, but no, it's actually using the free labor of these Wikipedia editors. And I think the site needs that sort of crediting and visibility.
Alexis: That’s a big problem, isn't it, Stephen? I've spoken to authors who know for a fact that their books have been fed into AI without any credit. Is there any way of going back and giving people that credit, or is the genie out of the bottle and it's just way too late?
Stephen: Hm. I worry that we're not going to be able to change the past. What I think ChatGPT and other LLMs have done wrong is failing to provide any provenance or citation. The AI usually fails to point to a source.
I am hopeful that even if it doesn’t become law—even if we can’t get agreement across the different jurisdictions of the world on what the law should be—it still becomes a consumer demand that information coming from AI comes with a source. We users need to push for this.
With authors in particular, it’s not only informational but also creative content. I wouldn’t want my book to be mined for data by ChatGPT, and certainly not without getting a license fee or royalty for it.
Alexis: That doesn’t exist, does it? There’s currently no law prohibiting AI companies from taking your book, The Editors, and feeding it into ChatGPT?
Stephen: We have some important legal cases coming down the pipe. In the U.S., the New York Times is suing OpenAI, saying that its journalism has been used without consent. The comedian Sarah Silverman has sued as well. [Update: A judge has trimmed Silverman’s claims down to direct copyright infringement.]
The core issue is something like this: when AI copies, is it copying the way a human does, taking the core idea or spirit of the work, or is it just copying the structure? I wonder whether we can separate those two things.
I’m hopeful that the courts can use the existing copyright statutes in a way that protects authors and says that AI providers need to compensate authors when they use their work to train their applications.
Alexis: Isn’t it a problem, though, when it comes to trying to put some rules around AI and its development, that you can regulate in the U.S. or in the U.K. or in the EU, but because AI is being developed by countries and private companies worldwide, the argument will be: well, other countries don't have to obey those rules, so their AI is going to develop quicker than ours and perhaps become more advanced. There is a sort of arms race at the moment. Is that the elephant in the room, if you like?
Stephen: I think so. Maybe a good model or comparison might be privacy laws. After Europe passed the GDPR [the General Data Protection Regulation], there was some thinking here in the U.S. that the GDPR framework might be too burdensome, that it might inhibit innovation by tech companies. So here in the U.S., while we don’t even have a comprehensive federal data privacy statute, California has the CCPA [the California Consumer Privacy Act]. It’s thought to be like Europe, but a little less strict. A little more flexible.
So I could see countries doing that. They could say, we’ll see what this other country enacts, and then we’ll implement something that’s a little less burdensome for our tech companies, because we don’t want to inhibit the AI products in our own country. That’ll be interesting to watch.
Thanks for reading. As a reminder, I’d be happy to join your book club by Zoom to discuss The Editors. And you can read & listen to other media coverage of the book here.