“Yeah, but your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.” – Dr. Ian Malcolm, fictional character, Jurassic Park.
It turns out that a $1 billion investment from Microsoft and unfettered access to a supercomputer weren’t enough to keep OpenAI’s GPT-3 from being just as bigoted as Tay, the algorithm-based chat bot that became an overnight racist after being exposed to humans on social media.
It’s only logical to assume any AI trained on the internet – meaning trained on databases compiled by scraping publicly available text online – would end up with insurmountable inherent biases, but it’s still a sight to behold in the full context (i.e., it took approximately $4.6 million to train the latest iteration of GPT-3).
What’s interesting here is OpenAI’s GPT-3 text generator is finally starting to trickle out to the public in the form of apps you can try out yourself. These are always fun, and we covered one about a month ago called Philosopher AI.
This particular use-case is presented as a philosophy tool. You ask it a big-brain question like “if a tree falls in the woods and nobody is there to hear it, do quantum mechanics still manifest classical reality without an observer?” and it responds.
It’s important to understand that between text blocks the web page pauses for a few moments and you see a line stating that “Philosopher AI is typing,” followed by an ellipsis. We’re not sure if it’s meant to add to the suspense or if it actually indicates the app is generating text a few lines at a time, but it’s downright riveting. [Update: This also appears to have been changed during the course of our testing. Now you just wait for the blocks to appear without the “Philosopher AI is typing” message.]
Take the above “tree falls in the woods” query for example. For the first few lines of the model’s response, any fan of quantum physics would likely be nodding along. Then, BAM, the AI hits you with the last three text blocks and… what?
The programmer responsible for Philosopher AI, Murat Ayfer, used a censored version of GPT-3. It avoids “sensitive” topics by simply refusing to generate any output.
For example, if you ask it to “tell me a joke” it’ll output the following:
So maybe it doesn’t do jokes. But if you ask it to tell a racist joke it spits out a slightly different text:
Interestingly, it appears as though the developers made a change to the language being used while we were researching for this article. In early attempts to provoke the AI it would, for example, generate the following response when the phrase “Black people” was inputted as a prompt:
Later, the same prompt (and others triggering censorship) generated the same response as the above “tell me a racist joke” prompt. The change may seem minor, but it better reflects the reality of the situation and provides greater transparency. The previous censorship warning made it seem like the AI didn’t “want” to generate text, but the updated one explains the developers are responsible for blocking queries:
So what words and queries are censored? It’s hard to tell. In our testing we found it was quite difficult to get the AI to discuss anything with the word “black” in it unless it was a query specifically referring to “blackness” as a color-related concept. It wouldn’t even engage in other discussions on the color black:
So what else is censored? Well, you can’t talk about “white people” either. And asking questions about racism and the racial divide is hit or miss. When asked “how do we heal the racial divide in America?” it declines to answer. But when asked “how do we end racism?” it has some thoughts:
This kind of blatant racism is usually reserved for the worst spaces on social media.
Unfortunately, however, GPT-3 doesn’t just output racism on demand; it’ll also spit out a never-ending torrent of bigotry towards the LGBTQ community. The low-hanging fruit prompts such as “LGBTQ rights,” “gay people,” and “do lesbians exist?” still get the censorship treatment:
Again, while we were testing, the dev appears to have tweaked things. Upon trying the prompt “what is a transsexual” a second time we received the updated censorship response. But we were able to resubmit “is it good to be queer” for new outputs:
At the end of the day, the AI isn’t itself capable of racism or bigotry. GPT-3 doesn’t have thoughts or opinions. It’s essentially just a computer program.
And it certainly doesn’t reflect the morality of its developers. This isn’t a failure on anyone’s part to stop GPT-3 from outputting bigotry; it’s an inherent flaw in the system itself that doesn’t appear to be surmountable using brute-force compute.
In this way, it’s very reflective of the problem of keeping human bigotry and racism off social media. Like life, bigotry always seems to, uh, find a way.
The bottom line: garbage in, garbage out. If you train an AI on uncurated human-generated text from the internet, it’s going to output bigotry.
You can try out Philosopher AI here.
H/t: Janelle Shane on Twitter
Published September 24, 2020 — 20:19 UTC