The work that made cloud infrastructure at Amazon possible was done by a man who may not be a household name, but who is behind a few of them in computer science: Leslie Lamport. He also brought more attention to a handful of problems by giving them distinctive names, including the bakery algorithm and the Byzantine Generals Problem. This is no accident. The computer scientist is thoughtful about how people use and think about software. He won the A.M. Turing Award for his work on distributed systems, where multiple components on different networks coordinate to achieve a common objective. Internet searches, cloud computing and artificial intelligence all involve coordinating a lot of machines, and this kind of coordination creates more ways for things to go wrong. As Lamport once put it, a distributed system is one in which the failure of a computer you didn't know existed can render your own computer useless.
Concurrent systems, in which multiple computing operations happen during overlapping slices of time, are one of the biggest sources of such problems. Lamport introduced the idea of causality in a 1978 paper to address them. Two observers may disagree on the order of events, but if one event causes another, that eliminates the ambiguity. Sending or receiving a message can establish causality between processes. Logical clocks, built on this idea, provide a standard way to reason about concurrent systems.
With this tool in hand, computer scientists wondered how they could make these systems of connected computers even bigger. Lamport's answer was Paxos, an elegant consensus algorithm that allows multiple computers to execute complex tasks. Modern computing could not exist without Paxos.
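The causality idea described above can be made concrete with a small sketch. The Python below is an illustration only, not code from Lamport's paper: the `Process` class, its method names and the `demo` function are hypothetical names chosen for this example. Each process keeps a counter, bumps it on every event, and on receiving a message jumps past the timestamp the message carries, so a send always gets a smaller timestamp than the matching receive.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A process carrying a logical clock (hypothetical helper for this sketch)."""
    name: str
    clock: int = 0

    def local_event(self) -> int:
        # Any local step advances the clock by one.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is an event too; the message carries the new timestamp.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp: int) -> int:
        # Jump past both our own clock and the sender's timestamp,
        # so the receive is ordered after the send that caused it.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


def demo() -> dict:
    a, b = Process("A"), Process("B")
    b.local_event()                    # B's clock: 1
    b.local_event()                    # B's clock: 2
    sent_at = a.send()                 # A's clock: 1
    received_at = b.receive(sent_at)   # B's clock: max(2, 1) + 1 = 3
    return {"sent_at": sent_at, "received_at": received_at}
```

Calling `demo()` shows the point of the scheme: B had already counted two events of its own, yet the receive is still stamped later than the send, because the message established a causal link between the two processes.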
In the early 1980s, while he was developing the field, he also created a document preparation system called LaTeX, which provides sophisticated ways to typeset complex formulas and format scientific documents. LaTeX has become the standard for formatting papers in math and computer science.
Since the 1990s, Lamport has focused on formal verification, the use of mathematical proof to verify the correctness of software and hardware systems. He created a specification language called TLA+ (for "temporal logic of actions"). A software specification is like a recipe for a program: It describes how software should behave on a high level. It isn't always necessary, since coding a simple program is like boiling an egg. The coding equivalent of a nine-course banquet requires more precision. You need to combine the components of each dish in a precise way, then serve them to everyone in the correct order. This requires exact recipes and instructions, written in unambiguous and succinct language, and descriptions written in English prose could leave room for misinterpretation. TLA+ uses the precise language of mathematics to prevent bugs and avoid design flaws. A model checker is a program that can check whether a recipe makes sense and works the way the chef wants it to.
Chefs would never cater a banquet without first knowing that their recipes will work, whereas programmers often cobble together a system before writing a proper specification. Quanta spoke with Lamport about what is wrong with computer science education and how using TLA+ can help programmers build better systems. The interview was edited for clarity.
I had a hunch that what the people were trying to accomplish was impossible. I set out to prove it, and instead I came up with an algorithm that they should have been using.
What was wrong with their original algorithm?
They had a bunch of code. Most programmers don't think in terms of algorithms. If you just code a concurrent system, there is no way that your program is not going to be full of bugs.
Initially, the paper that introduced Paxos was not read very much. Why was it that way?
I like explaining things with stories, so I made up characters with names written in Greek letters. There was a cheese inspector in the paper. I was unaware that nonmathematicians get very freaked out by Greek letters. The readers couldn't deal with it, so the paper was not read as it should have been. At first, then, the story didn't work so well. In the long run it did, though, because people call this family of consensus algorithms Paxos instead of viewstamped replication, which was another name for the same algorithm, from Barbara Liskov.
After working on distributed systems for so many years, what got you into TLA+?
People in the '70s were proving properties of programs in terms of programming languages. Then people realized that they should first say what the program is supposed to accomplish. I realized in the early 1980s that one way to write higher-level specifications for concurrent systems was to use abstract algorithms. With TLA+, I was able to express them in a completely rigorous fashion. Everything clicked. If you really want to do things right, you need to write your algorithm in terms of mathematics.
Model checking is a method for exhaustively testing all the executions of a small model of the system. It shows the correctness of that model. Model checking tests for correctness, whereas coding just produces code; it doesn't test anything.
It sounds like model checking is related to another method of program verification: interactive theorem proving using tools such as Coq. How are they different?
Before model checking, the only way to be sure that your algorithm worked was to write a proof. Model checking checks all executions of a small instance of the algorithm. If you're lucky, you can check large enough instances that you have enough confidence in the algorithm. A proof, by contrast, can establish the correctness of a system of any size. Coq was designed to capture the reasoning that mathematicians do. It was used to prove the four-color theorem. A mathematical statement proved this way is almost certainly true. TLA+ is designed for engineers who want to prove properties of their systems. After 15 years of writing proofs, I had learned what you needed to do in order to prove the correctness of a concurrent algorithm. The logic allowed all of it to be made formal. The complete language is based on that.
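To make "checking all executions of a small instance" concrete, here is a minimal sketch in Python. It is not TLA+ and not a real model checker; the names `check`, `next_states`, `atomic_next_states` and `invariant` are hypothetical, and the toy system (two processes that each read a shared counter and then write back the value plus one) is an assumption chosen for illustration. The search visits every reachable state and returns the first sequence of states that breaks the invariant.

```python
from collections import deque

# A state is (program counters, private copies, shared counter).
# Program counter 0 = about to read, 1 = about to write, 2 = done.
INITIAL = ((0, 0), (0, 0), 0)


def next_states(state):
    """Every state reachable in one step when read and write are separate steps."""
    pcs, tmps, counter = state
    for i in (0, 1):
        if pcs[i] == 0:    # read the shared counter into a private copy
            yield (pcs[:i] + (1,) + pcs[i + 1:],
                   tmps[:i] + (counter,) + tmps[i + 1:],
                   counter)
        elif pcs[i] == 1:  # write back the private copy plus one
            yield (pcs[:i] + (2,) + pcs[i + 1:], tmps, tmps[i] + 1)


def atomic_next_states(state):
    """Same toy system, but each increment happens in a single step."""
    pcs, tmps, counter = state
    for i in (0, 1):
        if pcs[i] != 2:
            yield (pcs[:i] + (2,) + pcs[i + 1:], tmps, counter + 1)


def invariant(state):
    """Once both processes are done, the counter must equal 2."""
    pcs, _, counter = state
    return pcs != (2, 2) or counter == 2


def check(initial, step, holds):
    """Breadth-first search over all reachable states.

    Returns a shortest trace ending in a state that violates `holds`,
    or None if every reachable state satisfies it.
    """
    seen = {initial}
    queue = deque([(initial, (initial,))])
    while queue:
        state, trace = queue.popleft()
        if not holds(state):
            return trace
        for nxt in step(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + (nxt,)))
    return None


counterexample = check(INITIAL, next_states, invariant)
atomic_result = check(INITIAL, atomic_next_states, invariant)
```

Here `counterexample` is a trace in which both processes read 0 before either writes, so the final counter is 1 instead of 2, while `atomic_result` is `None` because no interleaving of the atomic version breaks the invariant. As the interview notes, this only establishes correctness for the small instance that was explored; covering a system of any size takes a proof.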
I'm doing what I can. But programmers and computer scientists are terrified of math, so it's a tough sell. Every project has to be done quickly. There's never time to do it right, but there's always time to do it over. And you're adding a new step in the development process, which is also a hard sell.
Is it always worth that upfront effort?
Most of the code written by programmers across the world doesn't require very precise statements about what it is supposed to do. But there are things that need to be correct. When people build a chip, they want it to work right. When people build a cloud infrastructure, they don't want bugs that will lose people's data. For the kind of application where precision is important, you need to be very rigorous. And if there is concurrency involved, you need something like TLA+.
The importance of thinking and writing before you code is not taught in undergraduate computer science courses. There is no communication between the people who teach programming and the people who teach program verification. The fault lies on both sides of that divide. The people who teach programming don't know how to do verification. The people who teach verification don't know how to use it in practice. TLA+ isn't going to find a lot of users until that divide is bridged. I would like to get the people who teach concurrent programming to understand that they need it. Maybe there is some hope.
I get the sense that you aren’t too happy with computer science education these days. Is it because it doesn’t put enough emphasis on mathematics?
Yes, on mathematical thinking.
How would you structure an undergraduate curriculum, then?
I don't know how to teach it to them. I know what people should have learned. They should not be afraid of math. They don't know how to use simple math that they have probably taken a course in. They don't know what it is. They learn enough to pass the exam, but then forget about it.
Mathematicians often say they see beauty in math. You started out in that field, so do you see beauty in algorithms?
I don't think about aesthetic things. I don't use the same words to express my feelings as other people do. "Beautiful" isn't something I would say about an algorithm. But I value simplicity very much.
One last thing, about another side project of yours with a sizable impact: LaTeX. I’d like to finally clear something up with the creator. Is it pronounced LAH-tekh or LAY-tekh?
Any way you want. I don't recommend spending a lot of time thinking about it.