Scientific Papers

Citrine Authors


J Ling, M Hutchinson, E Antono, B DeCost, E Holm, B Meredig. Building Data-driven Models with Microstructural Images: Generalization and Interpretability. Materials Discovery, 2018.

J Hill, A Mannodi-Kanakkithodi, R Ramprasad, B Meredig. Materials Data Infrastructure and Materials Informatics. In: Shin D., Saal J. (eds) Computational Materials System Design. Springer, Cham (2018).


M Hutchinson, E Antono, B Gibbons, S Paradiso, J Ling, and B Meredig. Overcoming data scarcity with transfer learning. arXiv preprint arXiv:1711.05099, 2017.

J Ling, M Hutchinson, E Antono, S Paradiso, and B Meredig. High-Dimensional Materials and Process Optimization using Data-driven Experimental Design with Well-Calibrated Uncertainty Estimates. arXiv preprint arXiv:1704.07423, 2017.

H Wu, A Lorenson, B Anderson, L Witteman, H Wu, B Meredig, and D Morgan. Robust FCC solute diffusion predictions from ab-initio machine learning methods. Computational Materials Science, June 2017.

B Meredig. Industrial materials informatics: Analyzing large-scale data to solve applied problems in R&D, manufacturing, and supply chain. Current Opinion in Solid State and Materials Science, June 2017.

J Ling and A Kurzawski. Data-driven Adaptive Physics Modeling for Turbulence Simulations. 23rd AIAA Computational Fluid Dynamics Conference, AIAA AVIATION Forum, 2017.

JL Wu, JX Wang, H Xiao, and J Ling. A Priori Assessment of Prediction Confidence for Data-Driven Turbulence Modeling. Flow, Turbulence and Combustion, March 2017.


Michel, K., & Meredig, B. (2016). Beyond bulk single crystals: A data format for all materials structure–property–processing relationships. MRS Bulletin, 41(08), 617-623. [invited paper]

O’Mara, J., Meredig, B., & Michel, K. (2016). Materials Data Infrastructure: A Case Study of the Citrination Platform to Examine Data Import, Storage, and Access. JOM, 68(8), 2031-2034. [invited paper]

Gaultois, M. W., Oliynyk, A. O., Mar, A., Sparks, T. D., Mulholland, G. J., & Meredig, B. (2016). Web-based machine learning models for real-time screening of thermoelectric materials properties. APL Materials, 4(5), 053213. [Featured Article for the Materials Genome special topic issue]

Hill, J., Mulholland, G., Persson, K., Seshadri, R., Wolverton, C., & Meredig, B. (2016). Materials science with large-scale data and informatics: Unlocking new opportunities. MRS Bulletin, 41(05), 399-409. [Technical Feature]

Mulholland, G. J., & Paradiso, S. P. Materials informatics across the product lifecycle: Selection, manufacturing, and certification. APL Materials. [invited paper]

Chen, W., Pohls, J. H., Hautier, G., Broberg, D., Bajaj, S., Aydemir, U., Gibbs, Z.M., Zhu, H., Asta, M., Snyder, G.J., Meredig, B., White, M.A., Persson, K., & Jain, A. (2016). Understanding Thermoelectric Properties from High-Throughput Calculations: Trends, Insights, and Comparisons with Experiment. Journal of Materials Chemistry C.

Sparks, T. D., Gaultois, M. W., Oliynyk, A., Brgoch, J., & Meredig, B. (2016). Data mining our way to the next generation of thermoelectrics. Scripta Materialia, 111, 10-15. [invited paper]

Meredig, B. Industrial materials informatics: analyzing large-scale data to solve applied problems in R&D, manufacturing, and supply chain. Current Opinion in Solid State and Materials Science (COSSMS). [invited paper, submitted]

Papers Mentioning Citrine


Bunn, J. K., Hu, J., & Hattrick-Simpers, J. R. (2016). Semi-Supervised Approach to Phase Identification from Combinatorial Sample Diffraction Patterns. JOM, 68(8), 2116-2125.

In the current implementation, SS-AutoPhase (semi-supervised AutoPhase) was used to phase map 278 diffractograms from a FeGaPd “open-data” combinatorial thin-film library. [Citrine Informatics, Fe-Ga-Pd, Citrination]

Kalidindi, S. R., Brough, D. B., Li, S., Cecen, A., Blekh, A. L., Congo, F. Y. P., & Campbell, C. (2016). Role of materials data science and informatics in accelerated materials innovation. MRS Bulletin, 41(08), 596-602.

Building on these efforts, new tools are being developed to improve data curation, such as the Materials Data Curation System,[70] Materials Commons,[71] and the Citrine platform.[55] Chance and Paul[72] outline how to connect the wide variety of data sets and tools using a semantic web infrastructure.

Blaiszik, B., Chard, K., Pruyne, J., Ananthakrishnan, R., Tuecke, S., & Foster, I. (2016). The Materials Data Facility: Data Services to Advance Materials Science Research. JOM, 68(8), 2045-2052.

To this end, a variety of materials-related databases and data repositories have been established, for example, the Materials Project,[4] the Open Quantum Materials Database (OQMD),[5] the NIST Materials Data Repository,[6,7] NREL MatDB,[8] NIMS MatNavi,[9] Automatic-FLOW for Materials Discovery,[10] Novel Materials Discovery (NoMaD) repository,[11] Computational Materials Data Network,[12] Citrine Informatics’ Citrination platform,[13] and AiiDA.[1]

Jain, A., Hautier, G., Ong, S. P., & Persson, K. (2016). New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships. Journal of Materials Research, 31(08), 977-994.

However, most materials property information remains scattered across multiple resources […] We note that Citrine Informatics is one commercial entity that is attempting to centralize information collected from diverse sources (both experimental and computational).

Such analyses are culminating in more general “recommender” systems that can suggest new compounds based on observed data. For example, Citrine Informatics has built a recommendation system for suggesting new and unconventional thermoelectric materials.[117]

McDowell, D. L., & Kalidindi, S. R. (2016). The materials innovation ecosystem: A key enabler for the Materials Genome Initiative. MRS Bulletin, 41(04), 326-337.

Materials data and model repositories include those at NIST,[66] Citrine Informatics’ system,[67] NanoHuB,[68] and the National Data Service’s Materials Data Facility.

Jain, A., Persson, K. A., & Ceder, G. (2016). Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases. APL Materials, 4(5), 053102.

One organization that has made significant progress in establishing a centralized data resource for materials scientists is Citrine Informatics, a company that specializes in applying data mining to materials discovery and optimization.

Zhao, H., Li, X., Zhang, Y., Schadler, L. S., Chen, W., & Brinson, L. C. (2016). Perspective: NanoMine: A material genome approach for polymer nanocomposites analysis and design. APL Materials, 4(5), 053204.

In the private sector, Citrine Informatics has been providing a cloud-based platform with a fast expanding material database containing datasets from multiple sources as well as data-driven material design tools.

Seshadri, R., & Sparks, T. D. (2016). Perspective: Interactive material property databases through aggregation of literature data. APL Materials, 4(5), 053206.

Some of these challenges have solutions on the horizon of which the authors are aware of. For example, Citrine Informatics has undertaken the task of developing an open access database that will provide data infrastructure for all materials properties, both calculated and measured experimentally.

Guevarra, D., Shinde, A., Suram, S. K., Sharp, I. D., Toma, F. M., Haber, J. A., & Gregoire, J. M. (2016). Development of solar fuels photoanodes through combinatorial integration of Ni–La–Co–Ce oxide catalysts on BiVO4. Energy & Environmental Science.

The authors thank Citrine Informatics for data hosting. The raw potentiostat and spectrometer data for the anodic sweep of the CV for each photoanode are available at


Kalidindi, S. R., & De Graef, M. (2015). Materials data science: current status and future outlook. Annual Review of Materials Research, 45, 171-193.

One approach for the creation of materials databases is described in detail in Reference 43. Examples of such databases include Citrine Informatics (44) for physical properties of nearly 30,000 chemical compounds, the Clean Energy Project database (45) for electronic properties of organic compounds used in plastic solar cells, The Materials Project (46) at MIT/LBNL and the Automatic-FLOW for Materials Discovery project (47) at Duke University for large-scale data from electronic structure computations of compounds, CALPHAD (48) for computationally derived thermodynamic properties of various thermodynamic phases, and the Open Quantum Materials Database (49) at Northwestern University for density functional theory–calculated thermodynamic and structural properties of nearly 300,000 compounds.

White, A. (2015). Federal agencies announce materials data challenge. MRS Bulletin, 40(11), 906-907.

In the spirit of open data, Citrine Informatics, a materials data analytics platform, is providing Challenge solvers with access to its database containing almost 3 million materials-property pairs aggregated from a variety of sources. “It became clear that there aren’t that many publicly available data sources from which teams could draw,” says Greg Mulholland, one of Citrine’s founders. “We saw this as an opportunity to be a provider of that programmatic, structured data.”

Bhat, T. N., Bartolo, L. M., Kattner, U. R., Campbell, C. E., & Elliott, J. T. (2015). Strategy for Extensible, Evolving Terminology for the Materials Genome Initiative Efforts. JOM, 67(8), 1866-1875.

Within the materials community, there are several data repository efforts including the NIST Materials Data Repository, Automatic Flow for Materials Discovery (AFLOW) through the AFLOW Consortium, Citrine Informatics, Materials Project, National Center for Supercomputing Applications (NCSA) Materials Data Facility as part of its National Data Facility, and the University of Michigan Materials Common as part of its PRISM Center (Predictive Integrated Structural Materials Science).