A combination of Gaussian processes with a particle or light beam passing through. This image illustrates the inner workings and functionality of gpCAM, a program developed by Berkeley Lab researchers to enable autonomous scientific discovery. Credit: Marcus Noack, Berkeley Lab

Experimental facilities around the globe face a common challenge: their instruments are becoming more powerful, which increases the volume and complexity of the scientific data they collect. New algorithms are needed to make the most of these capabilities and to let scientists answer more complex questions. The ALS-U project at Lawrence Berkeley National Laboratory (Berkeley Lab), an upgrade of the Advanced Light Source facility, will deliver 100 times more soft X-ray radiation and add superfast detectors, allowing a significant increase in data-collection speeds.

To cope, researchers need new ways to reduce the amount of data required for scientific discovery. An emerging field called autonomous discovery is promising: it allows multi-dimensional parameter spaces to be explored faster, more efficiently and with minimal human intervention.

"More experimental fields are using this new optimal, autonomous data acquisition because it's always about approximating a function, given noisy data," said Marcus Noack, a researcher in the Center for Advanced Mathematics for Energy Research Applications (CAMERA) at Berkeley Lab. He is also the lead author of a paper on Gaussian processes for data acquisition that was published in Nature Reviews Physics on July 28. The paper is the result of a multiyear, multinational effort led by CAMERA to bring new autonomous discovery techniques to a wide scientific community.

Stochastic processes take the lead

In recent years, autonomous discovery methods have improved.
In particular, stochastic processes, such as Gaussian process regression (GPR), have emerged as the preferred method for steering many classes of experiments. GPR's success in steering experiments is due to its probabilistic nature, which allows decisions to be made based on the uncertainty of the current model. This approach is at the core of the gpCAM software developed by CAMERA.

Noack said that stochastic processes, in contrast to deep learning, can be used to make decisions on small datasets and provide uncertainty estimates that can optimize the learning process.

Although CAMERA's initial research focused primarily on synchrotron beamline experiments, a growing number of scientists from other disciplines are beginning to see the benefits of incorporating autonomous discovery techniques into their project workflows. A workshop on autonomous discovery in science and engineering, sponsored by CAMERA, was held in April. It was attended by hundreds of scientists from around the globe, a reflection of the growing interest in this field.

Martin Böhm, an instrument scientist at Institut Laue-Langevin in Grenoble, France, and a co-author of the Nature Reviews Physics paper, said that although the work is still in its early stages, much progress has been made over the past year. The approach offers new ways to run spectrometry experiments and lets the instruments do the work, which saves time for users. He added that there are many other potential applications in physics, math and chemistry.

Multiple uses emerging

John Thomas, a postdoctoral researcher in Berkeley Lab's Molecular Foundry, is using photo-coupled scanning probe microscopy to study the material properties of thin-film semiconducting systems.
He has also been using gpCAM as a tool to improve these efforts.

Thomas said that nanoscale applications that use artificial intelligence and machine-learning algorithms, specifically for scanning probe systems, have been a focus of the Weber-Bargioni group [at the Foundry] for a while. "We were interested in Gaussian processes for autonomous discovery during the summer of 2020."

Recently, the group completed an application that uses gpCAM through a Python-to-LabVIEW interface. With some user input, gpCAM drives a probe across a two-dimensional semiconductive material to collect hyperspectral data. The images obtained are a convolution of electronic and topographic information, while point spectroscopy extracts the local electronic structure.

Thomas said that driving scanning probe tools autonomously, without constant human intervention, can maximize tool performance for scientists and engineers by continuing experiments outside of business hours or by providing routes for simultaneous tasks in a workflow. The tool can also be set to run on autopilot while the user makes efficient use of the time freed up. He added that the group can now use Gaussian processes to map and identify defects in 2D heterostructures with sub-ångström resolution.

Aaron Michelson, a graduate researcher in the Oleg Gang group at Columbia University, studies DNA origami-based self-assembly and is just starting to use gpCAM. His team is using it to investigate the thermal annealing history of DNA origami superlattices at the nanoscale. In another instance, it is being used to mine large datasets from 2D X-ray microscopy research.

He said that DNA nanotechnology is often limited in its ability to sample large parameter spaces for synthesis, which requires either collecting a large amount of data or finding a more efficient way to experiment. Autonomous discovery can be incorporated directly into large-scale data mining and into guiding new experiments.
This allows researchers to stop taking random samples and instead take control of their decisions.

Noack's leadership and work have brought together an extensive, interdisciplinary co-design community. James Sethian, CAMERA director and a co-author of the Nature Reviews Physics paper, said that this kind of scientific community building is at the core of what CAMERA tries to do.

More information: Marcus M. Noack et al, Gaussian processes for autonomous data acquisition at large-scale synchrotron and neutron facilities, Nature Reviews Physics (2021). DOI: 10.1038/s42254-021-00345-y
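
The uncertainty-driven steering described under "Stochastic processes take the lead" can be illustrated with a small, self-contained sketch. It is not gpCAM's API or the authors' implementation: the squared-exponential kernel, the length scale, the noise level, the "measure where the posterior variance is largest" rule and the `instrument` function are all assumptions chosen only to show how a Gaussian process model's uncertainty can pick the next measurement.

```python
import math

def rbf_kernel(x1, x2, length_scale=0.2, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve low @ y = b."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def solve_upper_from_lower(low, y):
    """Back substitution: solve transpose(low) @ x = y."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at each query point."""
    k = [[rbf_kernel(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    low = cholesky(k)
    alpha = solve_upper_from_lower(low, solve_lower(low, y_train))
    means, variances = [], []
    for xq in x_query:
        k_star = [rbf_kernel(xq, a) for a in x_train]
        v = solve_lower(low, k_star)
        means.append(sum(ks * al for ks, al in zip(k_star, alpha)))
        # Clamp tiny negative values caused by floating-point round-off.
        variances.append(max(rbf_kernel(xq, xq) - sum(vi * vi for vi in v), 0.0))
    return means, variances

def suggest_next(x_train, y_train, candidates):
    """Pick the candidate where the current model is most uncertain."""
    _, variances = gp_posterior(x_train, y_train, candidates)
    best = max(range(len(candidates)), key=lambda i: variances[i])
    return candidates[best]

def instrument(x):
    """Hypothetical stand-in for a real measurement."""
    return math.sin(6.0 * x)

# Autonomous loop: measure, update the model, let its uncertainty choose the next point.
candidates = [i / 50 for i in range(51)]
x_train = [0.0, 1.0]
y_train = [instrument(x) for x in x_train]
for _ in range(8):
    x_next = suggest_next(x_train, y_train, candidates)
    x_train.append(x_next)
    y_train.append(instrument(x_next))
```

In this sketch each new measurement lands where the model knows least, so a small number of points covers the parameter range without a human choosing them. A production tool would typically also fit the kernel hyperparameters to the data and could use other decision rules.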