Computer Scientists Find a Key Research Algorithm's Limits

A crucial algorithm known as gradient descent is used throughout applied research. The procedure finds the largest or smallest values of a mathematical function, a process known as optimizing the function. It can be used to determine everything from the most profitable way to manufacture a product to the best way of assigning shifts to employees.
However, despite its widespread use, researchers still don't know exactly which situations the algorithm handles most effectively. New research explains that gradient descent, at heart, tackles a fundamentally difficult computational problem. The finding limits the performance that researchers can expect from the technique in particular applications.

Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation. It covers research developments in mathematics and the physical and life sciences and aims to increase public understanding of science.

The work was coauthored by Paul Goldberg and Alexandros Hollender of the University of Oxford and John Fearnley and Rahul Savani of the University of Liverpool. The result received a Best Paper Award in June 2021 at the annual Symposium on Theory of Computing.

A function can be viewed as a landscape, where the elevation of the land equals the value of the function (the "profit") at that particular location. Gradient descent searches for a local minimum of the function. It works by finding the direction of steepest ascent at the current location and then moving the opposite way, downhill. The slope of the landscape is called the gradient, which is where the name gradient descent comes from.
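The downhill walk described above can be sketched in a few lines of Python. This is a minimal one-dimensional illustration, not the formulation studied in the research: the landscape f(x) = (x - 3)^2 + 1, the step size, the stopping tolerance, and the names `gradient_descent` and `example_slope` are all assumptions chosen for the example.

```python
def gradient_descent(slope_at, start, step_size=0.1, tolerance=1e-8, max_steps=10_000):
    """Walk downhill from `start` until the landscape is nearly flat."""
    x = start
    for _ in range(max_steps):
        slope = slope_at(x)
        if abs(slope) < tolerance:
            break  # nearly flat ground: treat this as a local minimum
        # The slope points toward steepest ascent, so step the opposite way.
        x = x - step_size * slope
    return x


# Hypothetical landscape f(x) = (x - 3)^2 + 1, whose slope is 2 * (x - 3).
def example_slope(x):
    return 2.0 * (x - 3.0)


minimum_x = gradient_descent(example_slope, start=0.0)
```

On this simple bowl-shaped landscape the walk settles close to x = 3, the bottom of the bowl. On a landscape with several valleys, the same procedure would stop in whichever valley it happens to reach first.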

Although gradient descent is an essential tool of modern applied research, there are many problems on which it performs poorly. Until this work, there was no comprehensive understanding of what causes gradient descent to fail.

Costis Daskalakis of the Massachusetts Institute of Technology said that much of the work on gradient descent had not engaged with complexity theory.

Computational complexity is the study of the resources, such as computation time, required to solve different computing problems or to verify their solutions. Researchers sort problems into classes, and all problems within the same class share some basic computational characteristics.

As an example, take a town where there are more people than houses and everyone lives in a house. You are given a phone book with the names and addresses of everyone in town, and you have to find two residents who live in the same house. You know an answer exists, because there are more people than houses, but finding it might take some searching, especially if the two don't share a last name.
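The phone book search can be sketched as follows. The residents, addresses, and the function name `find_housemates` are hypothetical. The example is only meant to show the two properties that matter here: an answer is guaranteed to exist when there are more people than houses, and a proposed answer is easy to check.

```python
def find_housemates(phonebook):
    """Return two residents who share an address, or None if nobody does."""
    first_seen = {}  # address -> first resident found at that address
    for name, address in phonebook:
        if address in first_seen:
            return first_seen[address], name
        first_seen[address] = name
    return None


# Hypothetical town: four residents but only three houses.
phonebook = [
    ("Ada Park", "1 Elm St"),
    ("Ben Ortiz", "2 Elm St"),
    ("Cy Lund", "3 Elm St"),
    ("Di Moss", "2 Elm St"),
]
pair = find_housemates(phonebook)
```

Checking the returned pair takes just two lookups, however long the phone book is.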

This question belongs to a complexity class called TFNP, short for "total function nondeterministic polynomial." The class contains all the computational problems that are guaranteed to have solutions and whose solutions can be checked for correctness quickly. The researchers focused on the intersection of two subsets of problems within TFNP.

The first subset is PLS (polynomial local search). This group of problems involves finding the minimum or maximum value of a function within a specific region. These problems are guaranteed to have answers that can be found through relatively straightforward reasoning.

One problem that falls into PLS is planning a route that visits a fixed number of cities with the shortest possible travel distance, given that you can only change the trip by switching the order of a pair of consecutive cities in the tour. It is easy to calculate the length of any proposed route, and because there is a limit on the ways you can modify it, it is easy to see which changes shorten the trip. You are guaranteed to eventually reach a route that no acceptable move can improve: a local minimum.
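The route-planning example can be sketched as a simple local search. This is an illustration under stated assumptions rather than a definitive method: the only allowed move is swapping two cities that are consecutive in the visiting order, the tour returns to its starting city, and the four cities and the names `tour_length` and `local_search` are hypothetical.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def local_search(tour, coords):
    """Keep swapping consecutive cities while some swap shortens the tour."""
    tour = list(tour)
    best = tour_length(tour, coords)
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            tour[i], tour[i + 1] = tour[i + 1], tour[i]  # try the swap
            length = tour_length(tour, coords)
            if length < best - 1e-12:
                best = length
                improved = True
            else:
                tour[i], tour[i + 1] = tour[i + 1], tour[i]  # undo the swap
    return tour, best


# Hypothetical cities at the corners of a unit square.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
final_tour, final_length = local_search(["A", "C", "B", "D"], coords)
```

Every accepted swap strictly shortens the tour and there are only finitely many orderings, so the loop must stop. The route it stops at is a local minimum with respect to these swaps, which need not be the shortest route overall.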

The second subset is PPAD (polynomial parity arguments on directed graphs). These problems have solutions that emerge from a more complicated process, known as Brouwer's fixed-point theorem. According to the theorem, for any continuous function that maps a suitable region, such as a disk or a ball, into itself, there is at least one point that the function leaves unchanged, known as a fixed point. This holds true in everyday life. If you stir a glass of water, the theorem guarantees that at least one particle of water ends up in exactly the same place where it began.
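In one dimension, the fixed point that the theorem guarantees can be located by repeatedly halving an interval. The sketch below assumes a continuous function that maps the interval [0, 1] into itself. The cosine function is just a convenient example, and `find_fixed_point` is a hypothetical name. It illustrates the guarantee only, and says nothing about the higher-dimensional searches that PPAD is concerned with.

```python
import math


def find_fixed_point(f, low=0.0, high=1.0, tolerance=1e-10):
    """Bisect on f(x) - x for a continuous f that maps [low, high] into itself."""
    # Because f stays inside the interval, f(low) >= low and f(high) <= high,
    # so f(x) - x changes sign (or is zero) somewhere in between.
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if f(mid) - mid >= 0:
            low = mid   # a fixed point lies at or to the right of mid
        else:
            high = mid  # a fixed point lies to the left of mid
    return (low + high) / 2.0


# cos maps [0, 1] into [cos(1), 1], which sits inside [0, 1].
fixed = find_fixed_point(math.cos)
```

The returned value is a point that cosine leaves (almost exactly) where it started, which can be confirmed by comparing `math.cos(fixed)` with `fixed`.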