Watching someone play, whether on a live stream or in an uploaded video, can teach you about that particular game: you can pick up a trick or learn the fundamentals of its gameplay. But as with any game, you also need to know how it is controlled, how actions map onto the buttons of a gamepad or keyboard, and how the screen responds to your inputs.
If you already know a game's mechanics, you learn more from watching it than someone who has never tried it on a console or PC, since that other person first needs to get to grips with the physical controls. This is elementary, but imagine that you are an Artificial Intelligence expert and you need to teach not a player but an AI, and to do it with training that is more visual than hands-on. How would you go about it?
The AI that plays Minecraft
Several neural networks have conquered various types of games in recent years through what is called reinforcement learning: DeepMind's AlphaZero, which took on chess, Go and shogi, and its successor MuZero, which added the ability to handle Atari games.
“There are millions of hours of gameplay on the Net; the trouble is that these videos only provide a record of what happened, not precisely how it was achieved.” This is the challenge faced by engineers at OpenAI, a company specialized in Artificial Intelligence, in their project ‘Learning to play Minecraft with VPT (Video Pre-Training)’. They set out to train a neural network on a huge unlabeled video dataset of real players’ Minecraft gameplay, while “we use only a small amount of labeled contractor data.”
The challenge was for their AI to learn a game more complex than those mentioned, Minecraft, relying on more than language alone: a more visual model that takes advantage of the many hours of Minecraft gameplay available on the Internet.
According to OpenAI engineers, “our model can learn to make diamond tools, a task that typically takes competent humans over 20 minutes (24,000 actions). Our model uses the native human interface of keystrokes and mouse movements, which makes it quite general, and represents a step towards general agents using computers.”
The idea of training the AI with far more gameplay video than hands-on play begins by gathering a small dataset from hired players, “in which we record not only their video, but also the actions they performed, which in our case are keystrokes and mouse movements. With this data we train an inverse dynamics model (IDM), which predicts the action being performed at each step of the video.”
Importantly, the IDM can use both past and future information to guess the action at each step. This task is much simpler, and therefore requires much less data, than the behavioral cloning task of predicting actions from past video frames alone, which requires inferring what the person wants to do and how to do it. Next, “we can use the trained IDM to label a much larger dataset of online videos and learn to act via behavioral cloning.”
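The asymmetry can be sketched in a few lines of Python. The window sizes and frame labels below are hypothetical, but they show why labeling the action at step t is easier for a model that also sees what happened afterwards:

```python
# Minimal sketch (hypothetical window sizes): an IDM may condition on frames
# both before and after step t; a causal behavioral-cloning policy only on
# frames up to t.

def idm_context(frames, t, before=2, after=2):
    """Frames an IDM can look at to label the action taken at step t."""
    lo = max(0, t - before)
    hi = min(len(frames), t + after + 1)
    return frames[lo:hi]            # past AND future frames

def bc_context(frames, t, before=2):
    """Frames a causal policy can look at when acting at step t."""
    lo = max(0, t - before)
    return frames[lo:t + 1]         # past frames only

frames = [f"frame_{i}" for i in range(6)]
print(idm_context(frames, 3))   # ['frame_1', ..., 'frame_5']
print(bc_context(frames, 3))    # ['frame_1', 'frame_2', 'frame_3']
```

Seeing the frames that follow an action makes the labeling problem closer to recognition than to intention-guessing, which is why the IDM needs far less labeled data.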
The researchers attached labels to frames of the game video for actions such as ‘inventory’, which opens the player’s collection of items with the ‘E’ key, and ‘sneak’, which moves the player carefully in the current direction with the SHIFT key. These actions are recorded as JSON text strings at each moment of the game and stored alongside the video frames.
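A single per-step record might look like the following. The field names and key identifiers here are illustrative assumptions, not OpenAI's actual schema; the point is only that each frame gets a machine-readable action string:

```python
import json

# Hypothetical shape of one per-frame action record (field names are
# invented for illustration, not taken from the VPT dataset).
record = {
    "tick": 1042,                                # frame index in the video
    "keyboard": {"keys": ["key.keyboard.e"]},    # 'E' opens the inventory
    "mouse": {"dx": 0.0, "dy": -1.5, "buttons": []},
}

line = json.dumps(record)        # one JSON string per game step
restored = json.loads(line)
print(restored["keyboard"]["keys"])   # ['key.keyboard.e']
```

Storing one such string per frame keeps the video and the action stream aligned, which is exactly what the IDM needs as supervision.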
The game frames with their labeled actions were used to train the IDM’s neural network, which learns which actions go with which frames. The IDM combines several kinds of networks: a 3D convolutional neural network and a ResNet to analyze video frames, and several Transformer-style layers (sequence networks based on self-attention) to relate frames across time and predict the action at each step.
The IDM’s trained capacity is then applied to a much larger set of video streams: a total of 70,000 hours of untagged Minecraft footage collected from the web, to which it attaches “pseudo-labels”. In other words, the IDM, plus the modest contractors’ fees, is a way to crank out a huge labeled training set.
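The pseudo-labeling step can be summarized in a short sketch. Here `trained_idm` is a stand-in for the real network, replaced by a trivial rule so the pipeline shape is clear; the action names and window size are invented for illustration:

```python
# Sketch of pseudo-labeling: a trained IDM turns unlabeled frame sequences
# into (frame, action) pairs that a behavioral-cloning model can train on.

def trained_idm(context_frames):
    # Placeholder for the real IDM neural network: returns a dummy action
    # from the surrounding context so the data flow is visible.
    return "attack" if len(context_frames) % 2 else "jump"

def pseudo_label(video, window=2):
    """Label every frame of an unlabeled video using the IDM."""
    labeled = []
    for t, frame in enumerate(video):
        # The IDM sees frames before AND after t (non-causal context).
        context = video[max(0, t - window): t + window + 1]
        labeled.append((frame, trained_idm(context)))
    return labeled

video = [f"frame_{i}" for i in range(4)]
dataset = pseudo_label(video)     # ready for behavioral cloning
```

Run over 70,000 hours of footage, this loop converts cheap unlabeled video into the large supervised dataset the VPT foundation model is trained on.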
Trained on those 70,000 hours of IDM-labeled online video, the behavioral cloning model (the “VPT foundation model”) performs tasks in Minecraft that “are nearly impossible to accomplish with reinforcement learning from scratch.” The AI learned to cut down trees to collect logs, turn those logs into planks, and then turn those planks into a crafting table, a sequence that takes a human Minecraft expert approximately 50 seconds, or 1,000 consecutive game actions.
In addition, the model performs other complex skills that humans often use in the game, such as swimming, hunting animals for food, and eating that food. It also learned “pillar jumping”, a common Minecraft behavior of rising upward by repeatedly jumping and placing a block beneath oneself. And its creators managed to get the AI to perform all the steps required to craft diamond tools, which took it more than 20 minutes and 24,000 actions.
Why train it with Minecraft
Why use Minecraft and not another game? OpenAI chose to validate its learning method in Minecraft because it is “one of the most played video games in the world and therefore has a large amount of freely available video data”, and because it is “an open-ended game with a wide variety of things to do, similar to real-world applications such as computer use”.
Unlike previous Minecraft work that uses simplified action spaces to make exploration easier, their AI uses the much more broadly applicable, but also much more difficult, native human interface: a 20 Hz frame rate with mouse and keyboard.
Building the neural network, called VPT, proceeded in two stages. The first required human players, contractors who gathered 4,500 hours of play; the researchers later discovered they actually only needed about 2,000 hours.
Baker and his team describe the process:
“We had open applications for one day, and then we randomly selected 10 applicants for the first round of contractors. Later, when we needed more data and some contractors asked to have their contracts terminated, we added more applicants from the original pool, as well as references from contractors who were already working.
Contractors were paid $20 (€19) per hour (minus Upwork platform fees and applicable taxes). All results presented in this paper are based on about 4,500 hours of data (including data recorded to collect human gameplay statistics that was not used for training), which cost us about $90,000 (€86,300). Throughout the project we collected some data that we did not use, due to recorder failures and to some ideas we ultimately did not pursue.
In total, we spent about $160,000 (€153,410) in compensation to contractors throughout the project. However, as discussed in section 4.6, we could probably obtain most of our results with an IDM trained using only $2,000 (€1,917) of data; that is, the basic VPT model, the BC fine-tuned to the earlygame_keyword dataset, and the RL fine-tuning results.
Collecting the contractor_house dataset cost about $8,000 (€7,670). Since we used the IDM trained on about 2,000 hours of contractor data, the actual cost of contractor data for those results was about $40,000 (€38,352).”