There is an old joke that physicists love to tell: everything has already been discovered and reported in a Russian journal in the 1960s; we just don't know about it. The joke is hyperbolic, but it captures the current state of affairs well. The volume of knowledge is enormous and growing rapidly: in 2021, arXiv alone is expected to host 190,000 papers, and that is just a small subset of the scientific literature produced this year.
We don't know about much of it because no one can read the whole literature in their field, including journal articles, research papers, lab notes and slides, white papers, technical notes, reports, and PhD theses. In fact, the answers to many questions may already be hidden somewhere in this mountain of papers. Important discoveries may have been forgotten or overlooked, and connections between them remain hidden.
Artificial intelligence is one possible solution. Algorithms can already analyze text without human supervision and find relationships between words that help uncover knowledge. But we could achieve far more if we moved away from writing scientific articles in a style that has hardly changed in the last hundred years.
Text mining has its limitations. Not every paper is accessible to it, and mining the ones that are can raise legal issues. AI does not truly understand concepts or the relationships between them, and it can be sensitive to biases in the data set, such as the choice of papers it analyses. Scientific papers are hard to understand for AI, and even for a non-expert human reader, partly because the same term can mean different things in different disciplines. As research grows more interdisciplinary, it also becomes harder to pin down a topic precisely with keywords and find all the relevant papers. Even the most brilliant minds have difficulty connecting and (re)discovering related concepts.
This is why AI cannot yet be trusted on its own: after text mining, its outputs still have to be double-checked by humans. That is a tedious task, and it defeats the purpose of using AI. To solve this problem, we need to make science papers not only machine-readable but machine-understandable, by (re)writing them in a special type of programming language. In other words, we need to teach science to machines in a language they understand.
Writing scientific knowledge in a programming-like language will be hard work, but the effort can be sustained: new concepts will be added to an existing library of science that machines already understand. As machines learn more scientific facts, they can help scientists streamline their logical arguments; spot inconsistencies, duplication, and plagiarism; and highlight connections. AI that understands physical laws is far more powerful than AI that relies on data alone, and science-savvy machines will be better equipped to assist in future discoveries. A machine with a deep understanding of science could complement scientists rather than replace them.
Mathematicians have already started this translation. By writing theorems and proofs in languages such as Lean, they teach mathematics to computers. Lean is a programming language and proof assistant that lets one introduce mathematical concepts as objects. Building on the objects it already knows, Lean checks whether a proof of a statement actually holds. This allows mathematicians to verify their proofs and find places where their logic is not sufficiently rigorous. The more mathematics Lean knows, the more it can do. The Xena Project at Imperial College London aims to encode the entire undergraduate mathematics curriculum in Lean. One day, proof assistants could help mathematicians in their research by checking their reasoning and searching the vast body of mathematical knowledge they hold.
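To give a flavor of what "teaching mathematics to computers" looks like, here is a minimal sketch in Lean 4, assuming only core Lean with no extra libraries. The definition `IsEven` and the theorem names are hypothetical, chosen for illustration; they are not taken from the Xena Project or any existing library. A concept is introduced as an object, statements about it are proved, and Lean accepts the file only if every step of every proof checks.

```lean
-- Introduce a concept as an object (hypothetical definition for illustration):
-- a natural number is even if it is twice some natural number.
def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k

-- A concrete claim, proved by exhibiting the witness k = 2; Lean checks 4 = 2 * 2.
theorem four_isEven : IsEven 4 := ⟨2, rfl⟩

-- A false claim is rejected: uncommenting this line makes Lean report an error,
-- because 3 = 2 * 1 does not hold.
-- theorem three_isEven : IsEven 3 := ⟨1, rfl⟩

-- A general statement that builds on the definition above.
theorem even_add_even {m n : Nat} (hm : IsEven m) (hn : IsEven n) :
    IsEven (m + n) := by
  unfold IsEven at hm hn ⊢
  cases hm with
  | intro a ha =>              -- ha : m = 2 * a
    cases hn with
    | intro b hb =>            -- hb : n = 2 * b
      -- witness a + b; `omega` closes m + n = 2 * (a + b) using ha and hb
      exact ⟨a + b, by omega⟩
```

Note that Lean does not decide on its own whether an arbitrary statement is true; it checks that the proof supplied for it is valid. Once `even_add_even` is accepted, later results can reuse it, which is how a library of machine-checked knowledge grows.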