What Does It Mean to Align AI With Human Values?

Mike Letterman

Dec 13, 2022, 3:42 PM
2 Views

I learned to program on an older machine. There was a built-in command for the operating system. The machine would try to figure out what I meant to do if I typed a command and got an error. It worked for a small portion of the time.

Humans are prone to giving machines ambiguous instructions and we want them to do what we say, not necessarily what we say.

Sometimes computers misconstrue what we want them to do. While investigating an image classification program, one machine learning researcher discovered that it was based on how long it took to access the image file, rather than the image itself. The Roomba was connected to a neural network that rewarded speed but punished the vacuum cleaner when it bumped into furniture, because another person wanted his vacuum to stop bumping into furniture. The machine always drove backwards.

There is a dark side to these anecdotes. They think that the inability of machines to discern what we really want them to do is a risk. They believe that we need to find ways to align the systems with the people they are supposed to serve.

The process of discovery is explored in this column. Melanie Mitchell is a professor at the Santa Fe Institute and the author of Artificial Intelligence: A Guide for Thinking Humans.

You can see all the columns that have been quantized.

Nick Bostrom's book Superintelligence argued that the rising intelligence of computers could pose a threat to the future of humanity. Bostrom never defined intelligence, but he adopted a definition from the researcher Stuart Russell, who said that an entity is considered to be intelligent if it chooses actions that are expected to achieve its objectives.

Bostrom looked at two theses to figure out the risks of artificial intelligence. Bostrom said that intelligence and final goals are axes along which possible agents can vary. More or less any level of intelligence can be combined with a final goal. The second thesis states that an intelligent agent will act in ways that promote its own survival, self- improvement and acquisition of resources if these make it more likely to achieve its final goal. He assumed that researchers would create an artificial intelligence that surpasses the cognitive performance of humans in virtually all areas of interest.

If we don't align superintelligent artificial intelligences with our values and desires, this will spell doom for humanity. Bostrom used a thought experiment to show the danger of giving a super intelligent artificial intelligence the goal of maximizing the production of paper clips. The artificial intelligence system will use its brilliance and creativity to increase its own power and control, eventually acquiring all the world's resources to make more paper clips, according to Bostrom's theses. Paper clip production will be maximized as humanity dies out.

If you believe that intelligence is defined by the ability to achieve goals, and that a superintelligent agent would use its superintelligence to do anything to achieve that goal, then you will arrive at the same conclusion.

In science fiction, humans are threatened by machines who misinterpret human desires. There is a segment of the research community that is very concerned about this scenario playing out in the real world. Dozens of institutes have already spent hundreds of millions of dollars on the problem and research is underway at universities around the world.

Job loss, bias, privacy violations, and misinformation spread are some of the immediate risks posed by non-super intelligent artificial intelligence. There isn't much overlap between the communities that worry about short-term risks and those that worry about long-term risks. There is something of an artificial intelligence culture war, with one side more concerned about the current risks than the potential catastrophic risks, and the other side less concerned about the current problems.

Outside of these specific communities, it looks like a religion, with revered leaders, unquestioned doctrine and devoted disciples fighting a potentially all-powerful enemy. Scott said that there are now two branches of the faith:Orthodox andReform. He writes that he is worried about "misaligned artificial intelligence that deceives humans while it works to destroy them." He writes that Reformai-riskers entertain that possibility, but they worry more about the dangers of powerful artificial intelligence that is weaponized by humans.

Many researchers are involved in alignment-based projects, ranging from attempts to impart principles of moral philosophy to machines. These efforts have not been very useful in getting machines to reason about real world situations. The many obstacles preventing machines from learning human preferences and values are noted by many writers. We don't know who our machines should try to learn.

inverse reinforcement learning is thought to be the most promising path forward by many in the alignment community. Proponents of alignment believe that it can lead to paper clip maximizer scenarios if the machine is not given an objective. The machine is supposed to observe the behavior of humans and figure out their goals and values. In the past few years, researchers have used IRL to train machines to play video games by observing humans and to teach them how to do backflips by giving them feedback from humans.

It is not clear if machines can teach the more subtle and abstract ideas of human values. Brian Christian, author of a popular science book about artificial intelligence alignment, is optimistic that it is possible to replace the concept of backflip with an even more vague concept. Or something similar. Is it good behavior?

The challenge is underestimated. Good behavior and kindness are more complex and context- dependent than anything IRL has been able to achieve so far. Consider the idea of "truthfulness", a value we want in our artificial intelligence systems. Today's large language models can't distinguish truth from lies. To protect privacy, to avoid insulting others, or to keep someone safe are some of the things we may want our artificial intelligence assistants to avoid.

Other ethical concepts are not easy to understand. It should be clear that an essential first step toward teaching machines ethical concepts is to enable machines to grasp humanlike concepts in the first place.

I see a bigger problem with the science behind the idea of alignment. A machine that surpasses humans in all cognitive tasks but still lacks humanlike common sense and is oddly mechanical in nature is what most discussions imagine. In keeping with Bostrom's thesis, the machine has achieved superintelligence without having any of its own goals or values.

Intelligence might work this way. Neuroscience and psychology do not support this possibility. Intelligence is connected to our goals and values, as well as our sense of self and our social and cultural environment. Many failed predictions have been made because of the intuition that a kind of pure intelligence could be separate from other factors. It seems like a generally intelligent artificial intelligence system would have to develop, like ours, due to its own social and cultural upbringing.

Russell argues in his book that the right time to worry about a potentially serious problem for humanity depends on how long it will take to prepare and implement a solution. Without a better understanding of what intelligence is and how it is related to other aspects of our lives, we can't find a solution. It will take a broad, scientifically based theory of intelligence to properly define and solve the alignment problem.

What Does It Mean to Align AI With Human Values?

Mike Letterman

Tech

Buzz

Travel

Health

Related News

Patrick Mahomes, Kansas City Chiefs prevail against Buffalo Bills, win wild AFC divisional game in overtime

Popular on TVN

Stay In Touch

Follow Us