The standard tools of modern computing are the chunks of code that allow programs to sort, filter and combine data. Like tiny gears inside a watch, these algorithms execute well-defined tasks.

They are ubiquitous, and they have been painstakingly refined over time. When programmers need to sort a list, they reach for a standard method that has been in use for decades.

Now researchers are turning machine learning, a branch of artificial intelligence, on traditional algorithms themselves. Because machine learning tools can glean insights about the data those algorithms handle, they have rejuvenated research into basic algorithms.


Traditional algorithms and machine learning may be vastly different ways of computing, but researchers are finding that machine-learned predictions can make classic algorithms faster and more accurate.

The recent explosion of interest in this approach began with a paper by Tim Kraska, a computer scientist at MIT, and a team of researchers from Google. The authors suggested that machine learning could be used to improve the traditional Bloom filter, which solves a straightforward but daunting problem: how to quickly check whether an item appears on a very long list.

Say you run your company's IT department and you need to check whether your employees are visiting websites that pose a security risk. Naively, you might think you have to check every site they visit against a blacklist of known bad sites. But if the list is huge, there is no way to check every site against it in the tiny window of time before a page loads.

The Bloom filter solves this problem, letting you quickly and accurately check whether any particular site is on the blacklist. It does so by, in essence, compressing the huge list into a much smaller structure that offers some specific guarantees.

Bloom filters never produce false negatives: if a site is on the blacklist, the filter is guaranteed to flag it, so anything the filter waves through is definitely safe. They can, however, produce false positives, wrongly flagging safe sites, so your employees may be blocked from some sites they should have access to. That is because Bloom filters trade a little accuracy for an enormous amount of data compression.
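The mechanics behind these guarantees can be sketched in a few lines of Python. This is a minimal illustration, not production code: the bit-array size, the number of hash functions and the example URLs are arbitrary choices made for the sketch. Adding an item sets a handful of bits chosen by hashing; a lookup reports "probably present" only if all of those bits are set, which is why a "no" is always definitive.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: a bit array plus k salted hash functions.
    Lookups can yield false positives but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting a cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False -> definitely absent; True -> probably present.
        return all(self.bits[pos] for pos in self._positions(item))

blacklist = BloomFilter()
for url in ["malware.example", "phish.example"]:
    blacklist.add(url)
```

Note the compression at work: however long the blacklist grows, the filter's memory footprint stays fixed at `num_bits`, and only the false-positive rate rises.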

To a Bloom filter, though, every website is equally suspicious until it is confirmed not to be on the list. In reality, some websites are more likely than others to end up on a blacklist, because of details like their domain or the words in their URL. People grasp this intuitively, which is why you probably read a URL to make sure it looks safe before you click on it.

Kraska's team developed an algorithm that can apply this kind of logic: the learned Bloom filter. It pairs a small Bloom filter with a recurrent neural network (RNN), a machine learning model that learns what malicious URLs look like after being exposed to hundreds of thousands of them.

When the learned Bloom filter checks a website, the RNN goes first, using its training to judge whether the site is on the blacklist. If the RNN says the site is on the list, the learned Bloom filter rejects it. But if the RNN says the site isn't on the list, the small Bloom filter gets a turn, accurately but unthinkingly searching its compressed set of websites.

By putting the Bloom filter at the end of the process and giving it the final say, the researchers ensured that learned Bloom filters can still guarantee no false negatives. Because the small Bloom filter acts only as a backup, it keeps the structure's false positives to a minimum, so a website that a single large Bloom filter might have wrongly blocked can now get through. In effect, Kraska and his team found a way to take advantage of two proven but traditionally separate ways of approaching the same problem, achieving faster, more accurate results.
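The structure described above can be sketched as follows. Everything here is a simplification made for illustration: a hypothetical keyword heuristic, `model_score`, stands in for the trained RNN, and the filter sizes and example URLs are invented. The key idea survives the simplification: the backup filter stores only the blacklisted URLs the model misses, which is what restores the no-false-negatives guarantee.

```python
import hashlib

NUM_BITS, NUM_HASHES = 512, 3

def _positions(item):
    # Salted-hash trick for deriving Bloom filter bit positions.
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % NUM_BITS
            for i in range(NUM_HASHES)]

def model_score(url):
    # Hypothetical stand-in for the trained RNN: flags URLs containing
    # keywords it has "learned" are suspicious.
    suspicious = ("casino", "free-prize", "login-verify")
    return 1.0 if any(word in url for word in suspicious) else 0.0

class LearnedBloomFilter:
    def __init__(self, urls_on_blacklist, threshold=0.5):
        self.threshold = threshold
        self.backup_bits = [False] * NUM_BITS
        # The backup Bloom filter holds only the blacklisted URLs the
        # model fails to flag, so no bad site can slip through.
        for url in urls_on_blacklist:
            if model_score(url) < threshold:
                for pos in _positions(url):
                    self.backup_bits[pos] = True

    def might_contain(self, url):
        if model_score(url) >= self.threshold:
            return True  # the model flags it: reject outright
        # The model waved it through: the small backup filter has the final say.
        return all(self.backup_bits[pos] for pos in _positions(url))

f = LearnedBloomFilter(["evil-casino.example", "weird-threat.example"])
```

In this sketch "evil-casino.example" is caught by the model while "weird-threat.example", which the keyword heuristic misses, is caught by the backup filter, so both blacklisted sites are always flagged.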

Machine learning and traditional algorithms are vastly different ways of computing, and using predictions is a way to bridge the two, said Piotr Indyk, a computer scientist at the Massachusetts Institute of Technology.

Kraska's team showed that the new approach worked, but they didn't explain why. Michael Mitzenmacher, an expert on Bloom filters at Harvard University, found Kraska's paper innovative and exciting, but he wanted to understand what was going on beneath the surface. "What exactly does that mean?" he asked.

In 2019, Mitzenmacher provided a theory that explained exactly how learned Bloom filters work. Kraska and his team had shown that the approach could work in one case; Mitzenmacher proved that it can always work.

Mitzenmacher also improved learned Bloom filters. He showed that adding another standard Bloom filter to the process, this time before the RNN, can pre-filter out negative cases and lighten the model's load. He then used the theory he had developed to prove that the change was an improvement.
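This arrangement, sometimes described as sandwiching the model between two Bloom filters, can be sketched as below. As before, this is an illustrative toy, not Mitzenmacher's implementation: `model_score` is a hypothetical stand-in for the RNN, and the sizes and URLs are invented. The new piece is the initial filter, which covers the whole blacklist, so a "no" from it is definitive and most safe URLs never reach the model at all.

```python
import hashlib

NUM_BITS, NUM_HASHES = 512, 3

def _positions(item):
    # Salted-hash trick for deriving Bloom filter bit positions.
    return [int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16) % NUM_BITS
            for i in range(NUM_HASHES)]

def make_filter(items):
    bits = [False] * NUM_BITS
    for item in items:
        for pos in _positions(item):
            bits[pos] = True
    return bits

def probably_in(bits, item):
    return all(bits[pos] for pos in _positions(item))

def model_score(url):
    # Hypothetical stand-in for the trained RNN.
    return 1.0 if "casino" in url else 0.0

class SandwichedBloomFilter:
    """Pre-filter -> model -> backup filter, in that order."""

    def __init__(self, urls_on_blacklist, threshold=0.5):
        self.threshold = threshold
        # Initial filter covers the whole blacklist: a "no" here is
        # definitive, so the model is spared most negative cases.
        self.initial = make_filter(urls_on_blacklist)
        # Backup filter covers only the URLs the model would miss,
        # preserving the no-false-negatives guarantee.
        self.backup = make_filter(
            [u for u in urls_on_blacklist if model_score(u) < threshold])

    def might_contain(self, url):
        if not probably_in(self.initial, url):
            return False   # pre-filtered: definitely not on the blacklist
        if model_score(url) >= self.threshold:
            return True    # the model flags it
        return probably_in(self.backup, url)  # backup gets the final say

f = SandwichedBloomFilter(["bad-casino.example", "sneaky-threat.example"])
```

The payoff of the extra layer is efficiency: in the two-stage design every safe URL costs a model evaluation, while here most safe URLs are dismissed by a few cheap hash lookups.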

Even in these early days, the field's innovative ideas have led to rigorous mathematical results, which in turn are leading to more new ideas. In the past few years, researchers have shown how to incorporate predictions into scheduling algorithms and chip designs.

Making algorithms more efficient by designing them for typical uses, rather than for the hardest possible inputs, is an approach to computer science that is growing in popularity.

Computer scientists have traditionally designed their algorithms around the most difficult scenario: one devised by an adversary trying to defeat them. Imagine trying to check the safety of a website whose URL and page title have been deliberately crafted to mislead, confusing enough to trip up even the most sophisticated programs.

But most of the websites employees actually visit aren't generated by adversaries, Indyk pointed out. By setting aside the worst-case scenarios, researchers can design algorithms tailored to the situations they are likely to encounter. While databases currently treat all data equally, for example, predictions could lead to databases that structure their data storage based on its contents and uses.

And this may be only the beginning, since programs that use machine learning typically do so in a limited way: most of the new structures incorporate just a single machine learning element. Kraska imagines an entire system built up from several separate pieces, each relying on its own predictions, with their interactions regulated by prediction-enhanced components.

Taking advantage of that, Kraska said, will have an impact on many different areas.