Understanding corrosion is important for material selection in product design. As new alloy types such as high entropy alloys (HEAs) are developed, new methods are needed to predict localized corrosion. Machine learning (ML) is well suited to this because it can cover a wide range of alloy types and use a broad base of inputs such as environmental factors, processing conditions, microstructure, and simulation data (e.g., thermodynamic and kinetic quantities). But a machine learning model needs data on which to be trained.

Background on corrosion resistance

In metal alloys, corrosion resistance typically comes from a protective so-called passive oxide film, just a few atoms thick, that forms on the surface of the alloy in response to electrochemical reactions between the metal surface and the aqueous environment. For corrosion to occur, chemical species in the environment have to break through this passive film, and that usually results in localized corrosion, inhomogeneously distributed over the metal surface.

Machine learning for localized corrosion

Traditionally, empirical relations such as the pitting resistance equivalent number (PREN) [1] are used to predict localized corrosion, but these relations are restricted to specific alloy types and very limited ranges of composition, and can only report relative corrosion resistance. They also ignore additional information such as environmental conditions or microstructure. While such methods are useful within specific material classes, they fail when material design is extended to new alloy types such as high entropy alloys (HEAs).
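To make the limitation concrete, here is a minimal sketch of a PREN calculation. It assumes one commonly quoted form for stainless steels, PREN = %Cr + 3.3 × %Mo + 16 × %N in weight percent; the nitrogen coefficient varies between sources, so it is exposed as a parameter. The function name, the alloy labels, and the compositions are illustrative placeholders, not values from the dataset.

```python
def pren(cr_wt, mo_wt=0.0, n_wt=0.0, n_factor=16.0):
    """Pitting resistance equivalent number from weight-percent Cr, Mo and N.

    Assumes the commonly quoted form %Cr + 3.3*%Mo + n_factor*%N.
    The nitrogen factor differs between sources, so it is a parameter.
    """
    return cr_wt + 3.3 * mo_wt + n_factor * n_wt


# Illustrative compositions in weight percent (placeholders, not measured data).
alloys = {
    "alloy_A": {"cr_wt": 18.0, "mo_wt": 0.0, "n_wt": 0.05},
    "alloy_B": {"cr_wt": 17.0, "mo_wt": 2.2, "n_wt": 0.05},
}

# PREN only supports a relative ranking: a higher value suggests better pitting resistance.
ranking = sorted(alloys, key=lambda name: pren(**alloys[name]), reverse=True)
for name in ranking:
    print(f"{name}: PREN = {pren(**alloys[name]):.2f}")
```

Note that the function takes only three element contents as input. Temperature, pH, microstructure, and any element outside Cr, Mo and N have no way to enter the calculation, which is why a relation like this does not transfer to other alloy classes.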

With machine learning algorithms, predictive models can cover a wider range of compositions, for example from Fe-based stainless steels to Al alloys, without having to build separate models for each class of material. ML algorithms can also use a more diverse set of inputs, such as environmental factors (e.g., temperature and pH), processing conditions (e.g., annealing temperature), microstructure (e.g., grain size, matrix composition), and simulation data (e.g., thermodynamic and kinetic quantities), all of which affect alloy corrosion resistance.
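As a rough illustration of how one model can span several alloy classes, the sketch below flattens a record into a fixed-length feature vector. Everything in it is hypothetical: the AlloyRecord fields, the ELEMENTS list, and the numeric values are made up for the example and are not the schema of any particular dataset or platform.

```python
from dataclasses import dataclass

# Hypothetical element vocabulary shared by all alloy classes.
ELEMENTS = ["Fe", "Cr", "Ni", "Mo", "Al", "Mg"]


@dataclass
class AlloyRecord:
    composition: dict      # element symbol -> weight fraction
    temperature_c: float   # test environment temperature
    ph: float              # test environment pH
    anneal_temp_c: float   # processing condition
    grain_size_um: float   # microstructure descriptor


def to_features(rec):
    """Return a fixed-length feature list; absent elements become 0.0."""
    comp = [rec.composition.get(el, 0.0) for el in ELEMENTS]
    return comp + [rec.temperature_c, rec.ph, rec.anneal_temp_c, rec.grain_size_um]


# Made-up example records for two different material classes.
steel = AlloyRecord({"Fe": 0.70, "Cr": 0.18, "Ni": 0.10, "Mo": 0.02}, 25.0, 7.0, 1050.0, 30.0)
aluminium = AlloyRecord({"Al": 0.97, "Mg": 0.03}, 25.0, 5.5, 415.0, 50.0)

X = [to_features(steel), to_features(aluminium)]
```

Because both records map onto the same columns, a single regression or classification model could in principle be trained on X together with a measured corrosion metric, rather than one model per material class.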

However, such machine learning models can only be created if relevant corrosion resistance data is available. Most publicly available datasets [2, 3] are focused on either measurements of uniform corrosion (which occurs on a much slower timescale than localized corrosion in corrosion-resistant alloys) or measurements on specific materials, both of which have limited use for building an ML model to predict localized corrosion for a diverse set of alloy compositions.

What has the Citrine team done?

With funding from the US Department of Energy, the Citrine External Research Department, in collaboration with researchers at Ohio State University and the University of Virginia, collected and published a dataset of measurements on corrosion-resistant alloys related to localized corrosion resistance. For the first time, a compositionally broad database of localized corrosion performance metrics has been collected and distributed in a form suitable for machine learning.

A schematic overview of the dataset. The data were collected from 85 publications and cover materials in 4 material classes. There are 6 datasets reporting a total of 8 different corrosion metrics, with 1274 total records. Figure and caption licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). Original source: “Electrochemical metrics for corrosion-resistant alloys,” by C. Nyby et al., Scientific Data 8, 58 (2021).

A t-SNE plot of the material compositions which shows general qualitative relationships between individual alloy compositions in a section of the dataset. The clusters of data represent different material classes, while the overall spatial distribution indicates the diversity of compositions present in the dataset. 

Why was Citrine in a good position to do this?

The Citrine team has deep expertise in handling materials data and is often called on to help companies ingest complex, domain-specific data into our platform. By coordinating efforts across three collaborating institutions, the External Research Department ensured that all data and relevant metadata (such as processing history and testing environment) were accurately recorded and stored.

How will this work be used?

The dataset is now publicly available for researchers and can be used to build new statistical models covering a large range of compositions, enabling more rapid design and assessment of novel corrosion-resistant alloys.


  1. Lorenz, K. & Medawar, G. Über das Korrosionsverhalten austenitischer Chrom-Nickel(Molybdän-)Stähle mit und ohne Stickstoffzusatz unter besonderer Berücksichtigung ihrer Beanspruchbarkeit in chloridhaltigen Lösungen. Thyssen Forschung 1, 97–108 (1969).
  2. Ricker, R. CORR-DATA. https://doi.org/10.18434/M3TH4R (1997).
  3. National Association of Corrosion Engineers. Corrosion survey database (COR SUR). https://search.library.wisc.edu/catalog/9910350218802121 (2002).
