The word from Gizmodo, via the Guardian, is that an Elon Musk-backed AI company claims it made a text generator that's too dangerous to release.
Elon Musk has been clear that he believes artificial intelligence is the “biggest existential threat” to humanity. Musk is one of the primary funders of OpenAI, and though he has taken a backseat role at the organization, its researchers appear to share his concerns about opening a Pandora’s box of trouble. This week, OpenAI shared a paper covering their latest work on text generation technology, but they’re deviating from their standard practice of releasing the full research to the public out of fear that it could be abused by bad actors. Rather than releasing the fully trained model, it’s releasing a smaller model for researchers to experiment with. The researchers used 40GB of data pulled from 8 million web pages to train the GPT-2 software.
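Here is the opening of OpenAI's sample story. The first paragraph is the human-written prompt; GPT-2 wrote the rest.
"In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."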
"The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science."
"Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved."
"Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow."
Amusing or Dangerous?
I have to say that is quite amusing. But is it dangerous?
The OpenAI researchers found that GPT-2 performed very well when it was given tasks that it wasn’t necessarily designed for, like translation and summarization. After analyzing a short story about an Olympic race, the software was able to correctly answer basic questions like “What was the length of the race?” and “Where did the race begin?”
These excellent results have freaked the researchers out. One concern they have is that the technology would be used to turbo-charge fake news operations. The Guardian published a fake news article written by the software along with its coverage of the research. The article is readable and contains fake quotes that are on topic and realistic. The grammar is better than a lot of what you’d see from fake news content mills. Other concerns that the researchers listed as potentially abusive included automating phishing emails, impersonating others online, and self-generating harassment.
Guardian Feeds GPT2 One Sentence on Brexit
The creators of a revolutionary AI system that can write news stories and works of fiction – dubbed “deepfakes for text” – have taken the unusual step of not releasing their research publicly, for fear of potential misuse.
OpenAI, a nonprofit research company backed by Elon Musk, Reid Hoffman, Sam Altman, and others, says its new AI model, called GPT2, is so good and the risk of malicious use so high that it is breaking from its normal practice of releasing the full research to the public in order to allow more time to discuss the ramifications of the technological breakthrough.
When used to simply generate new text, GPT2 is capable of writing plausible passages that match what it is given in both style and subject. It rarely shows any of the quirks that mark out previous AI systems, such as forgetting what it is writing about midway through a paragraph, or mangling the syntax of long sentences.
Feed it the opening line of George Orwell’s Nineteen Eighty-Four – “It was a bright cold day in April, and the clocks were striking thirteen” – and the system recognises the vaguely futuristic tone and the novelistic style, and continues with:
“I was in my car on my way to a new job in Seattle. I put the gas in, put the key in, and then I let it run. I just imagined what the day would be like. A hundred years from now. In 2045, I was a teacher in some school in a poor part of rural China. I started with Chinese history and history of science.”
Feed it the first few paragraphs of a Guardian story about Brexit, and its output is plausible newspaper prose, replete with “quotes” from Jeremy Corbyn, mentions of the Irish border, and answers from the prime minister’s spokesman.
One such, completely artificial, paragraph reads: “Asked to clarify the reports, a spokesman for May said: ‘The PM has made it absolutely clear her intention is to leave the EU as quickly as is possible and that will be under her negotiating mandate as confirmed in the Queen’s speech last week.’”
Fake Product Reviews
OpenAI made one version of GPT2 with a few modest tweaks that can be used to generate infinite positive – or negative – reviews of products. Spam and fake news are two other obvious potential downsides, as is the AI’s unfiltered nature. As it is trained on the internet, it is not hard to encourage it to generate bigoted text, conspiracy theories and so on.
Mimic Trump Mode
The researchers also created a "Mimic Trump" mode that looks at the patterns of individuals and can generate Tweets on any subject.
To make Trump-generated Tweets more realistic, GPT2 was programmed to misspell words and make new words up. GPT2 could then Tweet the results.
The researchers commented, "Trump would love this."
Yet another model was programmed to take all of the negative stories about Tesla and debunk them.
By the way, THIS IS FAKE NEWS. I JUST MADE THIS UP. So don't quote anything in the preceding four paragraphs.
However, everything I said above is certainly doable.
Musk's OpenAI Discusses "Better Language Models and Their Implications"
GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient (unlabeled) data and compute.
There is much more in the article, including many more paragraphs of the unicorn story.
Clearly it can generate science fiction stories for kids.
Dangerous or Not?
Mike "Mish" Shedlock