Ditch humans or cooperate? Google’s DeepMind tests ultimate AI choice with game theory
DeepMind’s latest research is focused on the dichotomy between cooperation and competition, specifically among reward-optimized agents (human or synthetic), in highly variable environments.
While far from deciding humanity’s fate, the findings so far give an indication of the extent to which man and machine may cooperate in the near future, on everything from transportation systems to economics.
The team is trying to expand the comfort zone of existing AI agents in a variety of ways, most recently through two distinct game types that draw heavily on elements from game theory.
In the first game, the two agents compete to gather as many apples as possible, a straightforward premise centered on scarcity and cooperation. The more plentiful the apples, the more likely the players were to cooperate or, at least, leave each other alone.
However, there is a twist: both players are armed with a ray gun and can stun the other player at any time, immobilizing them for a brief period and allowing the aggressor to gather resources unimpeded. This is classified as a ‘complex behavior’ within the game, as it requires more computing power, thought, or effort to carry out than a singular directive such as collecting apples.
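The Gathering dynamic described above can be sketched as a tiny turn-based loop. This is an illustrative toy, not DeepMind’s actual environment: the stun duration, action names, and apple pool are all assumptions made for the example.

```python
from dataclasses import dataclass

STUN_STEPS = 5  # hypothetical: a zapped agent skips this many turns


@dataclass
class Agent:
    apples: int = 0
    stunned_for: int = 0


def step(actor: Agent, target: Agent, action: str, apples_available: int) -> int:
    """Resolve one turn for `actor`; return how many apples remain."""
    if actor.stunned_for > 0:            # a stunned agent can do nothing this turn
        actor.stunned_for -= 1
        return apples_available
    if action == "zap":                   # fire the beam: opponent is frozen
        target.stunned_for = STUN_STEPS
    elif action == "collect" and apples_available > 0:
        actor.apples += 1
        apples_available -= 1
    return apples_available


# When apples are scarce, zapping first lets the aggressor gather unopposed.
a, b = Agent(), Agent()
pool = 3
pool = step(a, b, "zap", pool)            # a stuns b
for _ in range(3):
    pool = step(b, a, "collect", pool)    # b is frozen and collects nothing
    pool = step(a, b, "collect", pool)    # a collects unimpeded
```

The toy makes the incentive visible: zapping costs the aggressor one turn but, under scarcity, buys several uncontested collection turns.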
The DeepMind team found that the greater the level of intelligence applied (or the larger the neural network supporting the software agent), the more aggressive the software agents became.
The second game, the Wolfpack game, involves hunting for prey for a reward. The twist here is that other wolves in the surrounding area also receive a reward for a successful hunt. The more wolves within the designated area, the greater the reward each wolf receives.
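The Wolfpack reward rule described above can be sketched as follows. The capture radius and base reward are illustrative assumptions, not DeepMind’s actual parameters; the point is only that each wolf’s payoff grows with the number of wolves near the prey, which makes cooperation pay.

```python
import math

CAPTURE_RADIUS = 2.0  # hypothetical "designated area" around the prey
BASE_REWARD = 1.0     # hypothetical per-wolf reward unit


def wolfpack_rewards(wolves, prey):
    """Assign each wolf a reward that scales with the pack size near the prey.

    Every wolf within CAPTURE_RADIUS of the prey at capture time is rewarded,
    and the reward grows with the number of such wolves, so hunting together
    pays more per wolf than hunting alone.
    """
    near = [w for w in wolves if math.dist(w, prey) <= CAPTURE_RADIUS]
    reward = BASE_REWARD * len(near)  # bigger pack -> bigger reward for each
    return {i: (reward if w in near else 0.0) for i, w in enumerate(wolves)}
```

For example, with two wolves near the prey and one far away, `wolfpack_rewards([(0, 0), (1, 1), (9, 9)], prey=(0, 1))` gives the two nearby wolves 2.0 each and the distant wolf nothing, whereas a lone hunter would earn only 1.0.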
This game rewarded cooperation (the complex behavior in this instance) far more than the apples game, regardless of how intelligent the participants were.
The researchers believe there is a propensity towards the more complex behavior in each game, especially as agents become more intelligent: aiming at and zapping an opponent in the apples game, and cooperating for a greater reward in the Wolfpack game.
DeepMind’s Joel Leibo emphasized that in the current round of experiments, none of the software agents had a functioning short-term memory, and thus could not make inferences about other subjects’ behavior based on past experience.
“Going forward it would be interesting to equip agents with the ability to reason about other agents’ beliefs and goals,” he said.