To kick off our blog, we’ll start with the fundamentals. Materials informatics is the application of data science, machine learning, and scientific domain knowledge to gain novel scientific, processing, or business insights.

Let’s break down the three components of a successful materials informatics initiative.

Smart Data Infrastructure

A smart data infrastructure is crucial because AI models need trusted, structured data to work. It must be smart in three ways:

  1. It needs easy but standardized ways of taking in data
  2. It needs to be able to take in unstructured data (such as micrographs) and convert it into structured data points (such as particle size) that domain experts have determined are important
  3. It needs to enable researchers to easily access and interpret the stored data
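
To make the second point concrete, here is a minimal sketch of turning a micrograph into structured particle-size records. It assumes the image has already been segmented into a binary mask (1 = particle, 0 = background), which in practice is the hard part. The names `ParticleRecord`, `extract_particles`, and `um_per_px` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ParticleRecord:
    """One structured data point derived from an image (hypothetical schema)."""
    sample_id: str
    area_px: int
    equivalent_diameter_um: float


def extract_particles(sample_id, mask, um_per_px):
    """Find 4-connected regions of 1s in a binary mask and size each one."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    records = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill to count the pixels belonging to this particle.
            stack, area = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Diameter of a circle with the same area, converted to micrometers.
            diameter = 2.0 * math.sqrt(area / math.pi) * um_per_px
            records.append(ParticleRecord(sample_id, area, diameter))
    return records


# Toy mask standing in for a segmented micrograph (assumed scale: 0.5 um per pixel).
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
particles = extract_particles("sample-001", toy_mask, um_per_px=0.5)
```

Each record can then be stored and queried like any other measurement, which is what makes the image data usable by a model.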

Once the data is in the system, there are different ways it can be used. Straightforward searches for particular data points, or comparisons of materials’ properties, are part of materials informatics. But you can also use the data in more sophisticated ways to answer complex questions and gain new scientific or business insights.

Machine Learning in Materials

Machine learning algorithms can be used to predict how different inputs (e.g., chemical composition, processing conditions, ingredients) lead to different outputs (e.g., properties, performance). They take in initial test-sample data and form a hypothesis about how strongly factors such as composition and processing conditions affect the resulting material properties. A machine learning model is only as accurate as the data on which it is trained, so with sparse data the first model might not be very accurate. As long as you know the model’s uncertainty, you can still use it to suggest candidates for the next round of testing, and the additional data will improve the model.

This sequential learning process has been shown to be more efficient in optimizing properties than trial and error. 


Reference: Julia Ling, Maxwell Hutchinson, Erin Antono, Sean Paradiso, Bryce Meredig. High-Dimensional Materials and Process Optimization Using Data-Driven Experimental Design with Well-Calibrated Uncertainty Estimates. Integrating Materials and Manufacturing Innovation, 6(3), 207–21 (2017).
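
The sequential learning loop described above can be sketched in a few lines. This is not the method from the paper cited above: it swaps in a deliberately simple nearest-neighbor surrogate, uses distance to the nearest tested setting as a crude stand-in for model uncertainty, and replaces the lab with a hypothetical `run_experiment` function. All names, the integer process setting, and the `kappa` weighting are assumptions made for illustration.

```python
def run_experiment(x):
    """Hypothetical stand-in for a lab measurement; its peak at x = 7 is hidden from the model."""
    return 50 - (x - 7) ** 2


def predict(x, measured):
    """Return (estimate, uncertainty) for an untested setting x.

    Estimate: mean of the two nearest measured values.
    Uncertainty: distance to the nearest measured setting (a crude proxy).
    """
    nearest = sorted(measured, key=lambda m: abs(m[0] - x))[:2]
    estimate = sum(y for _, y in nearest) / len(nearest)
    uncertainty = abs(nearest[0][0] - x)
    return estimate, uncertainty


def sequential_learning(candidates, measured, rounds, kappa=2.0):
    """Repeatedly test the candidate with the best optimistic score."""
    measured = list(measured)
    for _ in range(rounds):
        tested = {x for x, _ in measured}
        untested = [x for x in candidates if x not in tested]
        if not untested:
            break

        def score(x):
            # Favor settings that look promising or are poorly explored.
            estimate, uncertainty = predict(x, measured)
            return estimate + kappa * uncertainty

        next_x = max(untested, key=score)
        measured.append((next_x, run_experiment(next_x)))  # new data for the next round
    return measured


candidates = list(range(11))  # e.g. eleven discrete process settings
initial = [(0, run_experiment(0)), (10, run_experiment(10))]  # sparse starting data
history = sequential_learning(candidates, initial, rounds=4)
best_setting, best_value = max(history, key=lambda m: m[1])
```

In this toy setup the loop measures four of the nine untested settings, and the best result in `history` is the hidden peak at x = 7. A real workflow would use a trained model with calibrated uncertainty estimates in place of `predict`.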

Domain Knowledge

Integrating domain expertise into a materials informatics workflow:

  • Captures important scientific quantities and relationships as input features 
  • Limits model predictions to only physically possible candidates
  • Incorporates known physical relationships into the modeling process, so that the model does not have to re-learn known physics and chemistry
  • Helps scientists interpret results and determine critical next steps in new product development
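
As a small illustration of the first two points, the sketch below derives a physically meaningful input feature from a composition and discards candidates that cannot exist. It assumes compositions are given as atomic fractions; the function names and the choice of mean atomic mass as the feature are hypothetical examples, not a prescribed feature set.

```python
# Standard atomic weights in g/mol for a few elements.
ATOMIC_MASS = {"Fe": 55.845, "Ni": 58.693, "Cr": 51.996}


def is_physically_possible(composition, tol=1e-6):
    """A composition is valid if every element is known, no fraction is negative,
    and the atomic fractions sum to 1."""
    return (
        all(el in ATOMIC_MASS for el in composition)
        and all(frac >= 0 for frac in composition.values())
        and abs(sum(composition.values()) - 1.0) <= tol
    )


def mean_atomic_mass(composition):
    """Domain-derived feature: fraction-weighted average atomic mass (g/mol)."""
    return sum(frac * ATOMIC_MASS[el] for el, frac in composition.items())


def featurize_feasible(candidates):
    """Keep only feasible candidates and attach the derived feature to each."""
    return [
        {"composition": comp, "mean_atomic_mass": mean_atomic_mass(comp)}
        for comp in candidates
        if is_physically_possible(comp)
    ]


candidates = [
    {"Fe": 0.7, "Ni": 0.3},
    {"Fe": 1.2, "Ni": -0.2},  # negative fraction: not a real material
    {"Fe": 0.5, "Cr": 0.3},   # fractions do not sum to 1
]
features = featurize_feasible(candidates)
```

Only the first candidate survives the filter, so a model downstream never has to score the impossible ones.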

What does Materials Informatics do?

This combination of data infrastructure, expert domain knowledge, and AI can do more than just optimize the properties of a new material. It can also:

  • Optimize the processes used to make the material
  • Quickly understand how different formulations can achieve the same properties
  • Rationalize ingredients used across a portfolio
  • Inform product development investment decisions by surfacing likely high-value R&D projects
Capture Knowledge

An additional benefit of adopting materials informatics is capturing valuable knowledge, previously held in lab notebooks and in the minds of scientists and product developers, and sharing it across the organization. Important knowledge and scientific insights aren’t lost when senior employees retire or move to a new organization, or when students or researchers graduate or pursue a new opportunity. Materials informatics ensures that new projects and decisions stand on the shoulders of giants.