Executive Summary
DE-RISKING THE EXPLORATION OF NEW CHEMISTRIES
2 novel thermoelectrics predicted, synthesized and tested
MODEL REUSED ON THOUSANDS OF PROJECTS
More than 2,500 researchers have used the openly available online model to make more than 10,000 predictions
RAPID ASSESSMENT TOOL
Machine learning can readily estimate the performance of millions of materials

The Challenge
- Make a machine learning model that accurately predicts the properties of thermoelectric materials
- Use the model to find new thermoelectrics that would not have been obvious with intuition alone
- Make the model accessible openly online so that other researchers can use it on their projects
Researchers from the Universities of Cambridge, Utah, and Alberta, alongside the co-founders of Citrine, accepted this challenge. Further details can be found in this APL Materials paper:
https://aip.scitation.org/doi/full/10.1063/1.4952607
The Approach
First, the researchers collected a data set of the electrical and thermal conductivities, Seebeck coefficients and band gaps for as many currently known thermoelectrics as practically possible. The team used a mix of experimental data and electronic structure data calculated from first principles.
Next, the researchers developed a set of descriptors, or features, to characterize each material. These descriptors are the inputs to the machine learning model. Feature development and selection is an opportunity to inject domain knowledge into the modelling process, because the descriptors can encode "known physics" about materials.
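As an illustration of what a composition-based descriptor can look like, the sketch below turns a chemical formula into a few composition-weighted elemental statistics. The specific descriptors (mean atomic number, mean and range of Pauling electronegativity) and the names parse_formula and featurize are assumptions chosen for illustration. They are not the feature set used in the paper.

```python
import re

# Small lookup of elemental properties: atomic number (Z) and Pauling
# electronegativity (chi). Only the elements needed here are included.
ELEMENTS = {
    "Er": {"Z": 68, "chi": 1.24},
    "Gd": {"Z": 64, "chi": 1.20},
    "Co": {"Z": 27, "chi": 1.88},
    "Bi": {"Z": 83, "chi": 2.02},
}


def parse_formula(formula):
    """Split a simple formula such as 'Er12Co5Bi' into element counts."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*(?:\.\d+)?)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(formula):
    """Return illustrative composition-weighted descriptors (assumed, not the paper's)."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    chis = [ELEMENTS[el]["chi"] for el in fractions]
    return {
        "mean_Z": sum(f * ELEMENTS[el]["Z"] for el, f in fractions.items()),
        "mean_chi": sum(f * ELEMENTS[el]["chi"] for el, f in fractions.items()),
        "range_chi": max(chis) - min(chis),
    }


for formula in ("Er12Co5Bi", "Gd12Co5Bi"):
    print(formula, featurize(formula))
```

Descriptors of this kind depend only on composition, so they can be computed for a candidate material before anything is known about its crystal structure.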
The model was then trained and cross-validated on the data set to estimate the distribution of its prediction errors. A calculated non-zero band gap was used as a constraint to rule out metals.
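The cross-validation and band gap steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: the data are synthetic, a simple nearest-neighbour regressor stands in for the actual model (which is not specified here), and the candidate names and band gap values are hypothetical.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in model: average the targets of the k nearest training points."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_residuals(X, y, n_folds=5):
    """Predict every sample from a model that never saw it; return the errors."""
    residuals = []
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            residuals.append(knn_predict(train_X, train_y, X[i]) - y[i])
    return residuals


def drop_metals(candidates):
    """Keep only candidates whose calculated band gap is non-zero."""
    return [c for c in candidates if c["band_gap_eV"] > 0.0]


# Synthetic data: two made-up descriptors and a made-up target property.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [2.0 * a - b + rng.gauss(0.0, 0.05) for a, b in X]

residuals = cross_val_residuals(X, y)
print("mean absolute error:", statistics.mean(abs(r) for r in residuals))
print("residual std. dev.: ", statistics.stdev(residuals))

# Hypothetical candidates with hypothetical calculated band gaps.
candidates = [
    {"name": "candidate_A", "band_gap_eV": 0.0},
    {"name": "candidate_B", "band_gap_eV": 0.3},
    {"name": "candidate_C", "band_gap_eV": 0.8},
]
print([c["name"] for c in drop_metals(candidates)])
```

Collecting the held-out residuals, rather than a single score, is what gives an error distribution: it shows how far a prediction for an unseen material is likely to be from the measured value.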
The Results
To validate the predictive power of the model, two materials, Er12Co5Bi and Gd12Co5Bi, were selected for synthesis and testing. They were chosen because they are chemically distinct from known thermoelectrics and because the experimental team believed they would be easy to synthesize.
Experimental measurements confirmed the predicted properties and showed that these materials have crystal structures similar to those of other known thermoelectrics. This is a remarkable result, as crystal structure was not an input to the model.
These materials are scientifically interesting not only because of their chemical distinctiveness, but also because their thermal conductivity rises with temperature, which is a highly unusual property. They can now be further optimized through doping and microstructure engineering.
Reuse
The model, publicly available as a web app, has been used by more than 2,500 researchers to make more than 10,000 predictions since the APL Materials paper was published. Researchers all over the world continue to reuse it to guide and accelerate their work.
http://thermoelectrics.citrination.com
What next?
Materials informatics is transforming the process by which new materials and chemicals are developed, as part of the larger digital transformation in manufacturing. The application of data-driven methods to materials R&D has huge potential benefits. Machine learning coupled with a smart data infrastructure can help companies develop better materials faster, reduce production and R&D costs, and capitalize on domain knowledge across the enterprise.
References
1: Michael W. Gaultois, Anton O. Oliynyk, Arthur Mar, Taylor D. Sparks, Gregory J. Mulholland, and Bryce Meredig, "Perspective: Web-based machine learning models for real-time screening of thermoelectric materials properties," APL Materials 4, 053213 (2016)