A blue virtual being, made of neural networks, plays with colorful blocks inside a glowing sphere

The invention of a single data set changed the history of artificial intelligence. In 2009, the computer scientist Fei-Fei Li created ImageNet, a data set of millions of labeled images that could be used to train machine-learning models; by 2015, machines trained on it had become better than humans at recognizing objects. Li then started looking for another of the "North Stars," as she calls them, that would give artificial intelligence a different push.

She found inspiration in the Cambrian explosion, some 540 million years ago, when many animal species appeared for the first time. An influential theory holds that the burst of new species was driven in part by the emergence of eyes that could see the world for the first time. Vision in animals never occurs by itself, Li said, but is instead embedded in a holistic body that needs to move, navigate, survive, manipulate and change. "I pivoted towards a more active vision for artificial intelligence because of that," she said.

Li now focuses on agents that don't simply accept static images from a data set but can move around and interact with their environments in simulations of three-dimensional virtual worlds.

This is the goal of a new field called embodied artificial intelligence, and Li is not the only one embracing it. It overlaps with reinforcement learning, which has long trained interactive agents to learn using long-term rewards as incentive, and with robotics, since robots can be the physical equivalent of embodied AI agents in the real world. Li and others believe embodied AI could power a major shift from machines learning straightforward abilities, like recognizing images, to learning how to perform complex humanlike tasks with multiple steps, such as making an omelet.
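
At its core, that incentive structure is a simple loop. Here is a minimal sketch in Python; the toy environment and the random action choice are made-up stand-ins for a real simulator and a learned policy.

```python
# Minimal sketch of the reinforcement learning loop: the agent acts,
# the environment responds, and a scalar reward is the only feedback.
import random

ACTIONS = ["left", "right"]

def step(state: int, action: str) -> tuple[int, float, bool]:
    """Toy environment: start at 0, reach state 5 for a reward of +1."""
    state += 1 if action == "right" else -1
    done = state == 5
    return state, (1.0 if done else 0.0), done

state, total_reward = 0, 0.0
for t in range(100):
    action = random.choice(ACTIONS)        # a real agent learns this choice
    state, reward, done = step(state, action)
    total_reward += reward                 # long-term reward drives learning
    if done:
        break
print(f"finished after {t + 1} steps with total reward {total_reward}")
```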

"We get more ambitious, and we want to build an intelligent agent," said Jitendra Malik, a computer scientist at the University of California, Berkeley.

Fei-Fei Li, creator of the ImageNet data set, has since developed a set of virtual activities for training embodied agents.

Courtesy of the Stanford Institute for Human-Centered Artificial Intelligence

Embodied artificial intelligence work encompasses any agent that can probe and change its own environment. In modern realistic simulations, an agent may have a virtual body, or it may sense the world through a moving camera perspective that can still interact with its surroundings. The meaning of embodiment, Li said, is not the body itself but the ability to interact and do things with your environment.

Interactivity gives agents a whole new way of learning about the world. Observing a possible relationship between two objects is one thing; being the one to experiment and cause that relationship yourself is quite another. The thinking is that this deeper understanding will lead to greater intelligence. And with a suite of new virtual worlds up and running, embodied agents have begun to deliver on the potential, making significant progress in their new environments.

There is no proof of intelligence that did not learn through interacting with the world, according to Viviane Clay, a researcher at the University of Osnabrück in Germany.

Toward a Perfect Simulation

It was only in the past five years or so that researchers could create realistic virtual worlds for agents to explore, a capability driven by graphics improvements in the movie and video game industries. The first virtual worlds where AI agents could make themselves at home arrived in 2017, when the Allen Institute for Artificial Intelligence built a simulation known as AI2-THOR that lets agents wander through naturalistic kitchens, bathrooms, living rooms and bedrooms. Agents can study three-dimensional views that shift as they move, and take a closer look when they decide to.
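
AI2-THOR is driven from Python. Below is a minimal sketch of the interaction loop, assuming the open-source ai2thor package is installed; scene names and metadata fields vary between versions.

```python
# Sketch of an agent stepping through an AI2-THOR kitchen scene.
# Assumes `pip install ai2thor`; exact fields differ by version.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")    # a kitchen scene
for action in ["MoveAhead", "RotateRight", "MoveAhead"]:
    event = controller.step(action=action)     # advance the simulation
    frame = event.frame                        # RGB view from the agent
    position = event.metadata["agent"]["position"]
    print(action, position)
controller.stop()
```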

Crucially, the new worlds let agents reason about changes over time. "You have control over the temporally coherent stream of information in embodied artificial intelligence," said Manolis Savva, a computer graphics researcher at Simon Fraser University.

These simulations are now good enough to train agents on entirely new tasks. Rather than just recognizing an object, an agent can interact with it, pick it up and navigate around it, seemingly small steps that are essential for any agent to understand its environment. And in 2020, virtual agents went beyond vision, gaining the ability to hear the sounds virtual objects make, another channel for learning how they work.

The work isn't finished, though: even the best simulator is far less realistic than the real world. To help close the gap, Daniel Yamins, a computer scientist at Stanford University, co-developed ThreeDWorld with colleagues at MIT and IBM, a simulator that puts a strong focus on mimicking real-life physics in virtual worlds.

That is difficult to do, Savva said, and remains a large research challenge.

Still, the simulations are already good enough for agents to begin learning in new ways.

Comparing Neural Networks

One easy way to measure embodied AI's progress is to compare the performance of embodied agents with that of counterparts trained on static image tasks. The early results suggest that embodied agents learn differently from their predecessors.

In one recent paper, researchers found that an embodied agent was more accurate at detecting objects than the traditional approach. "It took the object detection community more than three years to get to this level of improvement," said Roozbeh Mottaghi, a co-author of the paper. "We gained a lot of improvement simply by interacting with the world."

Other papers have shown that object detection improves further when algorithms are allowed to explore a virtual space just once, or when they can move around to gather multiple views of objects.

Researchers are also finding that embodied and traditional methods of learning differ at a deeper level. Consider the neural network, the essential ingredient behind both kinds of learning, loosely modeled on the networks of neurons in human brains. Two papers, one led by Clay and another by Grace Lindsay, an incoming professor at New York University, showed that the neural networks in embodied agents had fewer neurons that activated in response to visual information, meaning each individual neuron was more selective about when it fired. Nonembodied networks required many more neurons to be active most of the time. Lindsay's group even compared the embodied and nonembodied networks with activity in a living brain, a mouse's visual cortex, and found that the embodied versions were the closest match.
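
As a rough illustration of this kind of comparison, the sketch below computes the fraction of a network layer's units that respond to a batch of stimuli; the random arrays are stand-ins for recorded activations, and the papers' actual analyses are far more involved.

```python
# Toy sketch: quantify how "active" a layer's responses are.
import numpy as np

def active_fraction(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of unit responses above threshold, averaged over stimuli.

    activations: shape (n_stimuli, n_units), e.g. one layer's responses
    to a batch of images.
    """
    return float((activations > threshold).mean())

rng = np.random.default_rng(0)
embodied = rng.normal(-1.0, 1.0, size=(512, 256))     # sparser: most units quiet
nonembodied = rng.normal(0.5, 1.0, size=(512, 256))   # denser: most units firing
print(f"embodied:    {active_fraction(embodied):.2f} of units active")
print(f"nonembodied: {active_fraction(nonembodied):.2f} of units active")
```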

Lindsay is quick to point out that the embodied versions aren't necessarily better, just different. Unlike the object detection comparisons, Clay's and Lindsay's work compares the same underlying neural network on completely different tasks, so the networks may simply need to work differently to accomplish their different goals.

While comparing embodied neural networks to nonembodied ones is one measure of progress, researchers aren't really interested in improving embodied agents' performance on current tasks; the goal is to learn more complicated, humanlike tasks. That's where researchers are most excited to see signs of progress, particularly in navigation, where an agent must remember the long-term goal of its destination while forging a plan to get there without getting lost.

A team led by Dhruv Batra, a research director at Meta AI and a computer scientist at the Georgia Institute of Technology, rapidly improved performance on a specific navigation task called point-goal navigation, in which an agent is dropped into a brand-new environment and must travel to target coordinates specified relative to its starting position, without a map. By giving the agents a GPS signal and a compass, Batra said, the team achieved greater than 99% accuracy on a standard data set. They have since extended the result to a more realistic scenario in which the agent has no GPS or compass and must estimate its position on the fly; the agent still reached 94% accuracy.
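
The "GPS and compass" amounts to handing the agent a continually updated distance and heading to the goal. Here is a sketch of that geometry, with hypothetical function and variable names; real benchmark suites define such sensors in more detail.

```python
# Hypothetical sketch of the GPS + compass observation in point-goal
# navigation: the goal is expressed in the agent's egocentric frame,
# recomputed at every step from the agent's current pose.
import numpy as np

def pointgoal_observation(agent_xy, agent_heading, goal_xy):
    """Return (distance, angle) to goal relative to the agent.

    agent_xy, goal_xy: 2D world coordinates.
    agent_heading: agent orientation in radians (world frame).
    """
    offset = np.asarray(goal_xy, dtype=float) - np.asarray(agent_xy, dtype=float)
    distance = np.linalg.norm(offset)
    world_angle = np.arctan2(offset[1], offset[0])
    relative_angle = (world_angle - agent_heading + np.pi) % (2 * np.pi) - np.pi
    return distance, relative_angle

# Example: goal 5 m "north" of the start, agent facing "east".
print(pointgoal_observation((0.0, 0.0), 0.0, (0.0, 5.0)))  # (5.0, ~pi/2)
```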

This is great progress, Mottaghi said, but it doesn't mean navigation is solved. In part, that's because many other navigation tasks that use more complex language instructions, like "Go past the kitchen to retrieve the glasses on the nightstand in the bedroom," remain at only 30% to 40% accuracy.

And navigation is one of the simplest tasks in embodied artificial intelligence, since the agent moves through an environment without manipulating anything in it. AI agents are far from mastering objects: there are many ways an agent can go wrong when interacting with new ones. Most researchers get around this by choosing tasks with only a few steps, but most humanlike activities, like baking or doing the dishes, require long sequences of actions. To get there, a bigger push is needed.

Here again, Li may be at the forefront, having created a data set that she hopes will do for embodied AI what ImageNet did for computer vision. Her team has released BEHAVIOR, a standardized simulation data set of 100 humanlike activities for agents to complete, which can be tested in any virtual world. By creating metrics that compare agents doing these tasks with real videos of humans doing the same tasks, Li's benchmark will let the community better evaluate the progress of virtual AI agents.

The ultimate purpose of these simulations, though, is to train agents for the real world.

She believes that simulation is one of the most exciting areas of robotics research.

The New Robotic Frontier

Robots, which inhabit physical rather than virtual bodies, are the most extreme form of embodied artificial intelligence agent. But many researchers are now finding that even these agents can benefit from training in virtual worlds.

State-of-the-art approaches to training robots, like reinforcement learning, typically need millions of iterations to learn something meaningful. As a result, training real robots on difficult tasks can take a very long time.

Train them in virtual worlds instead, and thousands of agents can learn at once, in thousands of slightly different rooms. Virtual training is also a much safer option for the robot.
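
Modern tooling makes that parallelism routine. Below is a minimal sketch using Gymnasium's vectorized environments, with a stock control task standing in for a real robotics simulator.

```python
# Sketch of training-time parallelism: many copies of an environment
# stepped in lockstep. Assumes `pip install gymnasium`.
import gymnasium as gym

num_envs = 8                                 # thousands, in a big training run
envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(num_envs)]
)
observations, infos = envs.reset(seed=0)
for _ in range(10):
    actions = envs.action_space.sample()     # one action per environment copy
    observations, rewards, terminated, truncated, infos = envs.step(actions)
envs.close()
```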

Researchers at OpenAI showed that skills can transfer from simulation to the real world when they trained a robotic hand, entirely in simulation, to manipulate a cube. More recent successes have let flying drones learn to avoid midair collisions, put self-driving cars on urban streets on two different continents, and sent a four-legged, doglike robot on an hour-long hike through the Swiss Alps.
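
A key technique behind such transfers, and one that OpenAI's robotic-hand work leaned on, is domain randomization: re-sampling the simulator's physical parameters every episode so the policy cannot overfit to any single virtual reality. Here is a hypothetical sketch, with all names illustrative.

```python
# Sketch of domain randomization: each training episode runs in a
# freshly sampled simulated "reality". The simulator hooks are
# hypothetical placeholders.
import random
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    friction: float
    object_mass: float     # kilograms
    motor_gain: float
    sensor_noise: float

def sample_params() -> PhysicsParams:
    """Draw a fresh set of physical parameters for one episode."""
    return PhysicsParams(
        friction=random.uniform(0.5, 1.5),
        object_mass=random.uniform(0.05, 0.3),
        motor_gain=random.uniform(0.8, 1.2),
        sensor_noise=random.uniform(0.0, 0.02),
    )

for episode in range(3):
    params = sample_params()
    # env = make_simulated_env(params)   # hypothetical simulator hook
    # run_episode(policy, env)           # train as usual in this variant
    print(episode, params)
```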

One day, humans may even enter these virtual spaces themselves, through virtual reality headsets. A key goal of robotics research is to build machines that are helpful to humans in the real world, but before that can happen, robots must first learn how to interact with humans.

Using virtual reality to bring humans into these simulations, where they can interact with the robots directly, is going to be a very powerful tool.

Whether they exist in simulations or the real world, embodied artificial intelligence agents are increasingly learning the same way we do, and the field is moving at a rapid pace on all fronts.

"There is a convergence of deep learning, robotic learning, vision and even language," Li said. "Through this moonshot, or North Star, we are going to learn the technology of intelligence that can lead to major breakthroughs."