We evaluate the performance of four machine learning methods for modeling and predicting FCC solute diffusion barriers. More than 200 FCC solute diffusion barriers from previous density functional theory (DFT) calculations serve as the dataset for training the four methods: linear regression (LR), decision tree (DT), Gaussian kernel ridge regression (GKRR), and artificial neural network (ANN). For each method, we separately optimize the set of key physical descriptors used to model the diffusion barriers. We also assess each method's ability to extrapolate to new hosts for which only limited data are known. GKRR and ANN perform best, showing cross-validation errors of 0.15 eV and predicting impurity diffusion barriers in new hosts to within 0.2 eV when given only 5 data points from the host. We demonstrate the success of a combined DFT + data mining approach to materials science challenges and predict the diffusion barriers of all available impurities across all FCC hosts.
H Wu, A Lorenson, B Anderson, L Witteman, H Wu, B Meredig, and D Morgan. Robust FCC solute diffusion predictions from ab-initio machine learning methods. Computational Materials Science, June 2017
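To make the cross-validation error quoted in the abstract concrete, the following is a minimal sketch of Gaussian kernel ridge regression scored by k-fold cross-validation. It is not the authors' implementation. The two-component descriptors, the target values, and the hyperparameters `sigma` and `lam` are all synthetic or arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_gkrr(X, y, sigma, lam):
    """Fit on mean-centered targets: solve (K + lam*I) alpha = y - mean(y)."""
    n = len(X)
    y_mean = sum(y) / n
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [v - y_mean for v in y])
    return alpha, y_mean


def predict_gkrr(X_train, alpha, y_mean, x, sigma):
    """Predict the target for one descriptor vector x."""
    return y_mean + sum(a * gaussian_kernel(xt, x, sigma)
                        for a, xt in zip(alpha, X_train))


def kfold_rmse(X, y, k, sigma, lam, seed=0):
    """Root-mean-square error over all held-out points in k-fold CV."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        alpha, y_mean = fit_gkrr(X_tr, y_tr, sigma, lam)
        for i in fold:
            pred = predict_gkrr(X_tr, alpha, y_mean, X[i], sigma)
            sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / len(X))


# Synthetic stand-in data (NOT real DFT barriers): two made-up descriptors
# per sample and a smooth target with a little noise.
rng = random.Random(42)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
y = [1.0 + 0.5 * math.sin(3 * a) + 0.3 * b * b + rng.gauss(0, 0.02)
     for a, b in X]

# sigma and lam are arbitrary here; in practice they would be tuned.
cv_rmse = kfold_rmse(X, y, k=5, sigma=0.5, lam=1e-3)
print(f"5-fold CV RMSE on synthetic data: {cv_rmse:.3f}")
```

The same `kfold_rmse` loop could score any of the other regressors by swapping the fit and predict functions. A rough analogue of the new-host test would hold out all samples from one host, add only a few of them back to the training set, and score the remainder.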