OpenAI, a leading AI lab, has unveiled its latest project: an AI that can play hide-and-seek. It's the newest example of how, with current machine learning techniques, a simple setup can produce shockingly sophisticated results.
The AI agents play a simplified version of the game, in which the "seekers" earn points whenever a "hider" is in their field of view. The "hiders" get a head start at the beginning of each round to set up a hiding place, and they earn points whenever they've successfully stayed hidden; both sides can move objects around the playing area (blocks, walls, and ramps) to gain an advantage.
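The scoring rules above amount to a simple per-timestep reward signal. Here is a minimal, hypothetical sketch of what such a scheme might look like; the function name, signature, and point values are illustrative assumptions, not OpenAI's actual code:

```python
def team_rewards(any_hider_seen: bool, prep_phase: bool) -> tuple:
    """Return (hider_reward, seeker_reward) for one timestep.

    Hypothetical reward scheme based on the rules described above:
    during the preparation phase, hiders set up freely and no one is
    scored. Afterward, seekers score whenever any hider is visible;
    hiders score whenever all of them remain hidden.
    """
    if prep_phase:
        return (0.0, 0.0)       # head start: no scoring yet
    if any_hider_seen:
        return (-1.0, 1.0)      # seekers earn points, hiders lose them
    return (1.0, -1.0)          # all hiders successfully hidden
```

Each team is simply trying to maximize the sum of these per-step rewards over the course of a round; everything else the agents learned, from blocking doors to surfing on boxes, emerged from that objective.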
The results from this simple setup were quite spectacular. Over the course of 481 million games of hide-and-seek, the AI developed strategies and counterstrategies, and the agents went from running around at random to coordinating with their teammates to execute complex plans. (Along the way, they also found surprising ways to break the game's physics; more on that below.)
It's the latest example of how much can be accomplished with a simple AI technique known as reinforcement learning, in which AI systems get "rewards" for desired behavior and are left to figure out, over millions of games, how to maximize those rewards.
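The core idea fits in a few lines of code. The sketch below is a toy example of tabular Q-learning, one of the simplest reinforcement learning algorithms, on a five-state corridor where the agent is rewarded only for reaching the rightmost state. It is an illustration of the "rewards for desired behavior" idea, not OpenAI's actual training setup, which uses far larger neural-network policies:

```python
import random

# Toy corridor world: states 0..4, reward 1.0 only for reaching state 4.
# Tabular Q-learning: learn the value of each (state, action) pair by
# trial and error, then act greedily on those values.
N_STATES = 5
ACTIONS = (-1, +1)                        # move left or move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration

def greedy(s):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

random.seed(0)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Explore occasionally; otherwise exploit current estimates.
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: nudge the estimate toward the reward plus
        # the discounted value of the best action in the next state.
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Extract the learned policy: which way to move from each non-terminal state.
policy = {s: greedy(s) for s in range(N_STATES - 1)}
print(policy)
```

Nothing in the code says "go right"; the agent discovers that policy purely because rightward moves eventually lead to reward. OpenAI's hide-and-seek agents work on the same principle, just with a vastly richer environment and learned neural-network policies instead of a lookup table.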
Reinforcement learning is straightforward, but the strategic behavior it produces isn't simple at all. Researchers have used reinforcement learning, among other techniques, to build AI systems that can play complex wartime strategy games, and some researchers believe that highly sophisticated systems could be built with reinforcement learning alone. This simple game of hide-and-seek is a great demonstration of reinforcement learning in action, and of how simple instructions can produce shockingly intelligent behavior. AI capabilities are continuing to march forward, for better or for worse.
That's the hazardous flip side of techniques like reinforcement learning: they're powerful enough to produce sophisticated behavior from a simple starting point, but that same behavior can be unexpected, and sometimes undesired.