AI proves ‘too good’ at writing fake news, held back by researchers
OpenAI has created a machine learning model, GPT-2, that can produce natural-looking language largely indistinguishable from that of a human writer while remaining largely “unsupervised” – it needs only a short text prompt to establish the subject and context of the task.
We've trained an unsupervised language model that can generate coherent paragraphs and perform rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training: https://t.co/sY30aQM7hU pic.twitter.com/360bGgoea3 — OpenAI (@OpenAI) February 14, 2019
The team has made some strides toward this lofty goal, but has also somewhat inadvertently conceded that, once perfected, the system could mass-produce fake news on an unprecedented scale – a fake-news super-weapon for the information warfare era, if you will.
“We have observed various failure modes,” the team noted, “such as repetitive text, world modelling failures (e.g. the model sometimes writes about fires happening under water), and unnatural topic switching.”
Here's a short story i generated using OpenAI's GPT-2 tool (prompt in bold) pic.twitter.com/DGIVwGuAUV — will knight (@willknight) February 14, 2019
With topics familiar to the system – those with a large online footprint and plenty of sources, e.g. news about Ariana Grande or Hillary Clinton – GPT-2 can generate a “reasonable sample” roughly 50 percent of the time.
“Overall, we find that it takes a few tries to get a good sample,” says David Luan, vice president of engineering at OpenAI.
GPT-2 boasts 1.5 billion parameters and was trained on a far larger dataset than its nearest competitors. To establish “quality” sources of content, the system drew on some eight million pages posted to the link-sharing site Reddit: for a link to qualify for inclusion, it needed a “karma” score of three or higher, meaning that at least three human users deemed the link worthy of viewing.
“This can be thought of as a heuristic indicator for whether other users found the link interesting, educational or just funny,” the team writes.
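The karma-based filter described above can be sketched in a few lines. This is a minimal illustration of the heuristic, not OpenAI's actual data pipeline; the function name, threshold constant, and sample URLs are all hypothetical.

```python
# Minimal sketch (not OpenAI's actual code) of the karma-filtering
# heuristic: keep only Reddit links whose karma score is 3 or higher.

KARMA_THRESHOLD = 3  # at least three users found the link worthwhile


def filter_quality_links(links):
    """links: iterable of (url, karma) pairs; returns qualifying URLs."""
    return [url for url, karma in links if karma >= KARMA_THRESHOLD]


# Hypothetical example data, for illustration only.
sample = [
    ("https://example.com/long-read", 57),
    ("https://example.com/spam", 1),
    ("https://example.com/explainer", 3),
]
print(filter_quality_links(sample))
# -> ['https://example.com/long-read', 'https://example.com/explainer']
```

In the real WebText collection process this score acted as a cheap proxy for human quality judgment, standing in for expensive manual curation of the eight million source pages.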
Quotes and attributions are entirely fabricated by GPT-2, but the story, constructed word by word, is coherent and based entirely on pre-existing online content while avoiding direct plagiarism. Critics have already highlighted that the paper published alongside OpenAI's announcement has not been peer-reviewed.
Debate is already raging online about the moral and ethical implications of such technology and its potential impact on the online information ecosystem, as well as on political processes in the wider, physical world.
Following the publication of OpenAI's GPT-2 yesterday, there's been some great discussion in the AI community over the rights/wrongs of their publication approach. Good thread on it (and thread within this thread) here https://t.co/34qi5HWza4 — James Vincent (@jjvincent) February 15, 2019
Not releasing the model on the claims of dual use is wrong. Fanning the flames is Senator @BenSasse 's bill "Malicious Deep Fake Prohibition Act of 2018". Why wrong, what's the bill and lessons from security community to inform this discussion. THREAD cc: @jackclarkSF https://t.co/mL9Qf5rsqb — Ram Shankar (@ram_ssk) February 14, 2019
“[We] think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems,” OpenAI said.
Google parent company Alphabet has adopted a similar practice of not divulging its latest AI research openly to the public for fear that it may be weaponized.