What We've Learned

Artificial intelligence (AI) is a powerful tool with enormous potential business benefits. Applying it in the materials and chemicals industries, however, means overcoming a distinctive set of challenges.


1. Small Data

“Big data” is a phrase commonly associated with AI and machine learning. Companies with millions of customers or thousands of sensors can quickly and cheaply amass billions of data points. In materials and chemicals, a single data point can cost months of work and tens of thousands of dollars. A machine learning approach for materials therefore has to be tailored to small, sparse data sets.

[Figure: Big data vs. small data]

2. Diverse Data Sources

Materials and chemicals data comes from many different sources (test data, simulation data, reference data, supplier data sheets) and in many different formats (microstructure images, processing instructions, chemical formulas, X-ray diffraction data, and so on).

To use this diverse data, materials and chemicals companies need two things:

  1. An easy way to get legacy data into a centralized database
  2. A way to structure the data in a common format that has the right mix of flexibility and standardization

[Figure: Data ingestion]

3. Converting Information into Machine-Readable Data

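Materials data, such as X-ray diffraction patterns and microstructure images, is far more complex than simple text, numbers, and dates. Even a chemical formula is not useful to a model as a string of letters and numbers; what matters for analysis is what those symbols represent, such as which elements are present, in what proportions, and with what properties. That information has to be converted into machine-readable descriptors before a model can learn from it.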
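As a rough illustration of turning a formula into machine-readable data, the sketch below parses a simple formula string into element counts and derives a few numeric descriptors (composition fractions, molar mass, and a composition-weighted average atomic mass). It is a minimal sketch under stated assumptions, not Citrine's descriptor library: `parse_formula` and `featurize` are hypothetical names, the atomic-mass table covers only a handful of elements, and formulas with parentheses, hydrates, or charges are not supported.

```python
import re
from collections import Counter

# Standard atomic weights (g/mol) for a handful of elements. This small table is
# illustrative only; a real descriptor library would cover the periodic table
# and many more elemental properties.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Al": 26.982, "Fe": 55.845}

# One element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'Fe2O3' into element counts.

    Hypothetical helper: it does not handle parentheses, hydrates, or charges.
    """
    counts = Counter()
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, amount = match.groups()
        counts[element] += float(amount) if amount else 1.0
        pos = match.end()
    return counts


def featurize(formula: str) -> dict:
    """Turn a formula string into numeric descriptors a model can learn from."""
    counts = parse_formula(formula)
    total_atoms = sum(counts.values())
    fractions = {el: n / total_atoms for el, n in counts.items()}
    descriptors = {
        "n_elements": len(counts),
        "molar_mass": sum(n * ATOMIC_MASS[el] for el, n in counts.items()),
        # Composition-weighted average of an elemental property.
        "mean_atomic_mass": sum(f * ATOMIC_MASS[el] for el, f in fractions.items()),
    }
    descriptors.update({f"frac_{el}": f for el, f in fractions.items()})
    return descriptors


print(featurize("Fe2O3"))
```

Two compounds that look similar as strings can end up far apart in this descriptor space, and vice versa, which is the point: the model sees chemistry rather than characters.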

[Figure: Materials descriptor library schematic]

4. The Prediction Task Is More Complex Than Classifying Cats and Dogs

Many typical AI applications involve pattern recognition: matching new cases against common, well-represented ones. Given a large set of already-classified animal photographs, can a model accurately predict whether the next photo shows a dog or a cat? R&D scientists, by contrast, are not looking for more examples of existing common cases; they want to explore higher-performing materials that push beyond the limits of existing structures and properties. That means making predictions outside the range of the data a model was trained on, which is a much harder task.

5. AI for Materials Needs to Understand Physics

Typical machine learning applications do not have to account for the laws of physics and chemistry. In the materials and chemicals space, those laws must be obeyed! Understanding the rules that govern the space you work in is an asset: it can narrow down the candidates you explore, or improve a machine learning model's accuracy when known relationships between parameters and results are built into it, leading to better predictions and faster results.
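One simple way a physical rule can narrow down candidates is to filter out compositions that violate it before any model or experiment is spent on them. The sketch below uses charge neutrality of ionic compounds as that rule. It is a minimal illustration, not a production screen: the oxidation-state table assigns a single assumed state per element (real elements such as iron have several), and the candidate list is hypothetical.

```python
from typing import Mapping

# Assumed single oxidation state per element. This is a simplification:
# iron, for example, is also commonly +2, and many elements have several states.
OXIDATION_STATE = {"Li": 1, "Mg": 2, "Al": 3, "Fe": 3, "O": -2, "F": -1}


def is_charge_neutral(composition: Mapping[str, int]) -> bool:
    """Return True if the assumed ionic charges in the composition sum to zero."""
    return sum(OXIDATION_STATE[el] * n for el, n in composition.items()) == 0


# Hypothetical candidate compositions produced by some upstream generator.
candidates = [
    {"Fe": 2, "O": 3},  # Fe2O3: 2(+3) + 3(-2) = 0
    {"Fe": 1, "O": 2},  # FeO2:  (+3) + 2(-2) = -1 under the assumed states
    {"Al": 2, "O": 3},  # Al2O3: 2(+3) + 3(-2) = 0
    {"Mg": 1, "F": 2},  # MgF2:  (+2) + 2(-1) = 0
    {"Li": 1, "O": 1},  # LiO:   (+1) + (-2) = -1
]

# Only compositions that pass the physical check go on to the (expensive)
# model or experiment.
plausible = [c for c in candidates if is_charge_neutral(c)]
print(plausible)  # [{'Fe': 2, 'O': 3}, {'Al': 2, 'O': 3}, {'Mg': 1, 'F': 2}]
```

A hard filter like this is the bluntest way to use domain knowledge; the same rule could instead be supplied to a model as a feature or a constraint.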

6. Failed Data Is Rare

Sample bias in datasets is common across all AI applications, including materials and chemicals, and one form of it particularly affects materials and chemistry. Machine learning models need training data that spans a range of outcomes, with failures as well as successes; without failures in the data, a model will never predict a failure. Yet scientific publications tend to favor successful results, and data from previous failed experiments is often not available.
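The effect of missing failures can be seen with even the simplest classifier. The sketch below uses a 1-nearest-neighbor classifier, chosen only because it is short; any standard classifier can only output labels it saw during training. The experiments, features, and values are made up for illustration, and the features are left unscaled to keep the example small.

```python
from math import dist


def nearest_neighbor_predict(train, query):
    """1-nearest-neighbor classifier: return the label of the closest training point."""
    _, label = min(train, key=lambda row: dist(row[0], query))
    return label


# Hypothetical experiments: (temperature in degC, additive fraction) -> outcome.
successes_only = [
    ((450.0, 0.10), "pass"),
    ((475.0, 0.12), "pass"),
    ((500.0, 0.15), "pass"),
]
with_failures = successes_only + [
    ((600.0, 0.30), "fail"),
    ((620.0, 0.35), "fail"),
]

query = (610.0, 0.32)  # a new, untested recipe far from the successes
print(nearest_neighbor_predict(successes_only, query))  # pass: "fail" was never seen
print(nearest_neighbor_predict(with_failures, query))   # fail
```

Trained on successes alone, the model confidently predicts "pass" for a recipe that sits right next to recorded failures it was never shown.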

7. Uncertainty Is Critical

In many commercial applications of machine learning, such as consumer preferences or business intelligence, the uncertainty in an AI prediction has little consequence. A retailer, for example, would not want to write “we are 60% certain that you’d like to buy these shoes” in a marketing newsletter; doing so might even hurt sales. In the materials industry, however, product developers need to know the uncertainty in model predictions so they can decide confidently which experiment to perform next, because each experiment often requires a large investment of time, money, or resources.
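One common, simple way to attach an uncertainty to a prediction is a bootstrap ensemble: fit the same model to many resampled copies of the data and report the spread of their predictions. The sketch below does this for a straight-line fit on a small, hypothetical data set. It is an illustration of the general idea, not necessarily the method Citrine uses; `fit_line` and `bootstrap_predict` are illustrative names, and real materials models would use richer features and models.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_predict(xs, ys, x_new, n_models=200, seed=0):
    """Mean and standard deviation of predictions from a bootstrap ensemble."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # a slope needs at least two distinct x values
            continue
        a, b = fit_line(bx, [ys[i] for i in idx])
        preds.append(a + b * x_new)
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical small data set: a process parameter vs. a measured property.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

for x_new in (3.5, 15.0):  # inside vs. far outside the measured range
    mean, sd = bootstrap_predict(xs, ys, x_new)
    print(f"x = {x_new}: prediction {mean:.2f} +/- {sd:.2f}")
```

The spread is noticeably wider for the prediction far outside the measured range, which is exactly the signal a product developer needs before committing to an expensive experiment there.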

[Figure: Probability chart]

8. AI Models Need to Be Understood by Domain Experts and Your R&D Team, Not Just Data Scientists

As R&D is digitized in the materials and chemicals industry, it is important for scientists to be able to scrutinize and sense-check models. Additionally, new scientific insights or IP created through an ML-driven approach need to be shared among researchers to facilitate knowledge transfer and upskill the team. For these reasons, typical “black box” machine learning software is not fit for purpose.

9. Scalability

Many materials and chemicals companies now have a centralized data science team that can structure data, train machine learning models, and communicate results to the product developers on a project.

However, we’ve seen companies get into trouble as they attempt to scale this effort to multiple projects or multiple business units. When scaling a materials informatics effort, companies have to consider database and code maintenance, version control, model deployment, security and access control, data management, continuous integration/continuous deployment (CI/CD) pipelines, model and data reusability, and infrastructure configuration, all of which require a significant investment in software engineering.

10. Security: Protecting Your IP

[Figure: Segregation and encryption of customer data]

Data security is of utmost importance in the materials and chemicals industry. Unique formulation recipes, processing steps, and test and characterization data are what give materials and chemicals companies their competitive advantage. Digitizing this data opens up new avenues for data loss or unauthorized access to IP, so materials and chemicals companies should carefully consider how they store, share, and manage their materials data when exploring AI applications.


More detail on the challenges in AI for materials and how Citrine overcomes them is available in our white paper: Challenges in Machine Learning For Materials.