Artificial Intelligence (AI) never ceases to amaze us. This time, OpenAI researchers have published a research paper and blog post describing how they trained an AI to play Minecraft, reaching human-level performance on many tasks.
“With fine tuning, our model can learn to make diamond tools, a task that typically takes competent humans over 20 minutes (24,000 actions). Our model uses the native human interface of keystrokes and mouse movements, which makes it quite general, and represents a step towards general agents using computers,” the OpenAI folks say.
To do so, they used the Video PreTraining (VPT) method, which starts by collecting a small dataset from contractors “where we record not only their video, but also the actions they took, which in our case are keystrokes and mouse movements. With this data we train an inverse dynamics model (IDM), which predicts the action that is being performed in each step of the video. It is important to note that the IDM can use past and future information to guess the action at each step. This task is much easier and therefore requires much less data than the behavioral cloning task of predicting actions given only past video frames, which requires inferring what the person wants to do and how to achieve it. We can then use the trained IDM to label a much larger dataset of online videos and learn to act through behavioral cloning.”
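To make the three steps in that description concrete, here is a minimal toy sketch in Python. It is not OpenAI's code and has nothing to do with real Minecraft frames: the "video" is just a list of positions on a number line, the "models" are lookup tables, and every name in it (`GOAL`, `expert_action`, `record`, `train_idm`, `pseudo_label`, `train_policy`) is hypothetical. What it does illustrate is the key asymmetry: the IDM is allowed to look at the next frame, while the cloned policy only sees the current one.

```python
from collections import Counter, defaultdict

GOAL = 5  # hypothetical target position the toy "player" walks toward


def expert_action(pos):
    """Toy demonstrator: step toward GOAL, then stand still."""
    if pos < GOAL:
        return 1
    if pos > GOAL:
        return -1
    return 0


def record(start, steps):
    """Return (frames, actions) for one demonstration; frames are positions."""
    frames, actions = [start], []
    for _ in range(steps):
        action = expert_action(frames[-1])
        actions.append(action)
        frames.append(frames[-1] + action)
    return frames, actions


def train_idm(labeled):
    """Inverse dynamics model: uses frame t AND frame t+1 (the future)."""
    counts = defaultdict(Counter)
    for frames, actions in labeled:
        for t, action in enumerate(actions):
            counts[frames[t + 1] - frames[t]][action] += 1
    return {change: c.most_common(1)[0][0] for change, c in counts.items()}


def pseudo_label(idm, frames):
    """Label an action-free video by asking the IDM about each frame pair."""
    return [idm[frames[t + 1] - frames[t]] for t in range(len(frames) - 1)]


def train_policy(videos, labels):
    """Behavioral cloning: predict the action from the current frame only."""
    counts = defaultdict(Counter)
    for frames, actions in zip(videos, labels):
        for t, action in enumerate(actions):
            counts[frames[t]][action] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


# 1) Small labeled set (stand-in for the contractor recordings).
labeled = [record(3, 4), record(7, 4)]
idm = train_idm(labeled)

# 2) Large unlabeled set: frames only (stand-in for online videos).
videos = [record(start, 8)[0] for start in range(11)]
labels = [pseudo_label(idm, frames) for frames in videos]

# 3) Behavioral cloning on the pseudo-labeled videos.
policy = train_policy(videos, labels)
print(policy[0], policy[5], policy[10])  # prints: 1 0 -1
```

Two labeled demonstrations are enough for the toy IDM because it only has to explain what changed between two frames. The policy, by contrast, has to learn what to do at every position, which is why it benefits from the larger pseudo-labeled set.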
“For many tasks, our models exhibit human-level performance, and we are the first to report computer agents that can make diamond tools, which can take competent humans more than 20 minutes (24,000 environmental actions) of gameplay to accomplish,” OpenAI wrote in its research paper detailing the results.
We trained a neural network to competently play Minecraft by pre-training on a large unlabeled video dataset of human Minecraft play and a small amount of labeled contractor data. https://t.co/a2pyBqvLvg pic.twitter.com/XbqtwQSTwU
— OpenAI (@OpenAI) June 23, 2022
“Trained on 70,000 hours of IDM-labeled online video, our behavioral cloning model (the ‘VPT base model’) performs tasks in Minecraft that are nearly impossible to accomplish with reinforcement learning from scratch. It learns to chop down trees to collect logs, craft those logs into planks, and then craft those planks into a crafting table; this sequence takes a competent human in Minecraft approximately 50 seconds or 1,000 consecutive game actions,” concludes OpenAI.
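As a quick sanity check, the two time and action figures quoted by OpenAI can be compared directly. The short Python sketch below uses only the numbers quoted in this article; the helper name `actions_per_second` is made up for illustration, and since the diamond-tool figure is given as "over 20 minutes", the result is an approximation.

```python
def actions_per_second(actions, seconds):
    """Average number of game actions per second of play."""
    return actions / seconds


# Diamond tools: about 24,000 actions in about 20 minutes.
diamond_rate = actions_per_second(24_000, 20 * 60)

# Crafting table: about 1,000 actions in about 50 seconds.
table_rate = actions_per_second(1_000, 50)

print(diamond_rate, table_rate)  # prints: 20.0 20.0
```

Both figures work out to roughly 20 actions per second, so the two quotes are consistent with each other.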