
Citrine | DataManager

A toolkit for transferring your company’s proprietary data into a form your teams can use as input to Citrine VirtualLab, the generative AI formula optimizer.

What it is:
Citrine DataManager (CDM) is a set of software tools and processes that enable companies to ingest and normalize proprietary chemical formulations and material specifications so they can be analyzed by Citrine’s generative AI tools.
Why this matters to our customers:
By automating data ingestion and using a data model explicitly designed for generative AI, CDM lets clients use all the data they have, not just the data that is convenient to record.

Structuring data so AI can be effective
CDM structures your data so that AI algorithms can succeed. Those algorithms use machine learning (ML) and simulation techniques to reduce the number of experiments needed to reach desired outcomes.

Customer Responsiveness
When you’re faced with new customer requirements, you need to quickly find out which materials could be most easily and cheaply adapted to meet the demand. CDM enables you to do this by preparing your data for AI.

Cost Reduction
CDM arranges data for algorithmic use so you can systematically consider far more parameters than before. This can drive down operating expenses by rationalizing ingredients for bulk purchases, using cheaper alternatives to achieve the same result, or changing process settings to use less energy or increase consistency.

Meeting Changing Regulations, Becoming More Sustainable
Whether you face evolving regulations around conflict minerals, restricted substances, or emissions and waste, CDM’s structured data management helps you spot problematic inputs. It also powers AI models that can optimize performance criteria while reducing dependence on specified materials, improving recyclability, or lowering the carbon footprint of production.

What it replaces:
Many companies still store chemical and formulation data in a wide variety of ways, often differing across teams and business units. Some have moved to digital systems such as electronic lab notebooks (ELNs) and laboratory information management systems (LIMS) to aggregate research data. All too often, though, these systems become a place “where data goes to die” because no one sees a benefit from them. Citrine unlocks the hidden value in this data.
How it works:
Start by defining a project’s scope and identifying the relevant data you already have to support it. With CDM, you’ll include information about how your materials were made, their observed properties, and their performance in different environments. Data can be ingested quickly from CSV and other spreadsheet formats, or directly through our API.

With Citrine, you need surprisingly little data to use AI to design new candidate materials or chemicals. You may be tempted to hold off on AI until your data reaches some arbitrary standard of ‘data cleanliness’, but that is not necessary to start seeing benefits from Citrine’s AI platform.
Data ingestion
CDM includes a flexible Python interface for automating data ingestion. Researchers can inspect the ingested data to spot outliers, then clean or correct it where needed.
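As a rough illustration of the inspection step described above, the sketch below loads a small formulation table from CSV text and flags values outside the interquartile-range fences. It uses only the Python standard library and is not CDM’s Python interface; the column names (`formulation_id`, `viscosity_cP`), the sample values, and the 1.5 × IQR rule are assumptions chosen for the example.

```python
import csv
import io
import statistics

# Hypothetical CSV export; the column names are illustrative, not a CDM schema.
CSV_TEXT = """formulation_id,viscosity_cP
F-001,120
F-002,118
F-003,125
F-004,121
F-005,940
F-006,119
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def flag_iqr_outliers(rows: list[dict[str, str]], column: str, k: float = 1.5) -> list[str]:
    """Return IDs whose value in `column` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [float(r[column]) for r in rows]
    # Quartiles computed with the 'inclusive' method from the standard library.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r["formulation_id"] for r, v in zip(rows, values) if not low <= v <= high]


rows = load_rows(CSV_TEXT)
print(flag_iqr_outliers(rows, "viscosity_cP"))  # ['F-005']
```

A flagged row is a prompt for a researcher to check the record (a unit slip, for example), not a reason to delete it automatically.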

Simpler systems such as ELNs and LIMS typically warehouse raw data files, so scientists must extract the files and run them through proprietary software before the data can be used. CDM streamlines this process with a library of “ingester” agents that import data quickly from common instrument and structure files, such as X-ray diffraction files (.xrd) and crystallographic information files (.cif). The ingested data is more valuable because it can be easily analyzed and compared.
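To make the idea of an “ingester” concrete, here is a toy example, not one of Citrine’s ingesters, that pulls unit-cell parameters out of a CIF file. The `_cell_length_*` and `_cell_angle_*` tags are standard CIF data names; everything else (the function name, the output keys, the sample text) is hypothetical. It only handles simple single-line items and ignores loops and multi-line text fields, so a dedicated CIF parser is the better choice for real files.

```python
import re

# A tiny CIF excerpt (silicon-like cell); real files contain far more.
CIF_TEXT = """data_example
_cell_length_a    5.4310(2)
_cell_length_b    5.4310(2)
_cell_length_c    5.4310(2)
_cell_angle_alpha 90
_cell_angle_beta  90
_cell_angle_gamma 90
_symmetry_space_group_name_H-M 'F d -3 m'
"""

CELL_TAGS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
)

# A number, optionally followed by a standard uncertainty in parentheses, e.g. 5.4310(2).
NUMBER = re.compile(r"^([-+]?\d+(?:\.\d*)?)(?:\(\d+\))?$")


def ingest_cell(cif_text: str) -> dict[str, float]:
    """Extract unit-cell parameters from simple single-line CIF items."""
    cell: dict[str, float] = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in CELL_TAGS:
            match = NUMBER.match(parts[1])
            if match:
                # Keep the numeric value and drop the uncertainty suffix.
                cell[parts[0].removeprefix("_cell_")] = float(match.group(1))
    return cell


print(ingest_cell(CIF_TEXT))
```

Once values like these are parsed into named numeric fields, they can be compared across samples instead of sitting inside an unopened file.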

CDM’s descriptor libraries automatically convert information such as chemical formulas into relevant numerical features (descriptors) that models can use.
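As a minimal sketch of what turning a chemical formula into descriptors can mean, the code below parses a flat formula and computes atomic fractions, molar mass, and mean atomic mass. It is not CDM’s descriptor library. The function names are hypothetical, the parser assumes formulas without parentheses or hydrate notation, and the mass table covers only the few elements used here.

```python
import re

# Standard atomic weights (g/mol) for the few elements used in this example only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Ti": 47.867, "Fe": 55.845}

# An element symbol followed by an optional (possibly fractional) count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula: str) -> dict[str, float]:
    """Count atoms per element in a flat formula such as 'Fe2O3' (no parentheses)."""
    counts: dict[str, float] = {}
    for element, amount in TOKEN.findall(formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_descriptors(formula: str) -> dict[str, float]:
    """Turn a formula into simple numeric features a model can consume."""
    counts = parse_formula(formula)
    total_atoms = sum(counts.values())
    features = {f"frac_{el}": n / total_atoms for el, n in counts.items()}
    features["molar_mass"] = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    features["mean_atomic_mass"] = features["molar_mass"] / total_atoms
    return features


print(composition_descriptors("Fe2O3"))
```

Features like these give a model something numeric to compare across formulations, which a raw formula string does not.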

Our open-source materials data model, GEMD (Graphical Expression of Materials Data), is at the core of CDM. We developed GEMD to work across many different materials and chemical classes. It can systematically specify every step in a material’s processing history, because factors such as processing conditions and batch measurements affect a material’s final properties.

Each step from procurement to final product is visualized in the material history, and investigators can click any step to see its details. As in this example, each stage is recorded, including both the specified processing parameter (e.g., 200 °C) and the parameter actually measured in that run (e.g., 199 °C). The color-coded graphical user interface is easy to use, and users can also add, review, and revise data through the Python client.
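The specified-versus-measured distinction above can be sketched with plain Python dataclasses. This only mimics the idea of keeping a spec (what the recipe called for) separate from a run (what happened in a given batch). The class names, the “cure” step, and the batch ID are hypothetical and are deliberately not the open-source GEMD package’s API; see the GEMD documentation for its actual object model.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterValue:
    """A named value with units, e.g. a temperature setpoint."""
    name: str
    value: float
    units: str


@dataclass
class ProcessStepSpec:
    """What the recipe says should happen (hypothetical stand-in for a spec)."""
    name: str
    parameters: list[ParameterValue] = field(default_factory=list)


@dataclass
class ProcessStepRun:
    """What actually happened in one batch, linked back to its spec."""
    spec: ProcessStepSpec
    batch_id: str
    measured: list[ParameterValue] = field(default_factory=list)

    def deviations(self) -> dict[str, float]:
        """Measured minus specified value for each parameter recorded in both, same units."""
        specified = {p.name: p for p in self.spec.parameters}
        out: dict[str, float] = {}
        for m in self.measured:
            s = specified.get(m.name)
            if s is not None and s.units == m.units:
                out[m.name] = m.value - s.value
        return out


cure_spec = ProcessStepSpec("cure", [ParameterValue("temperature", 200.0, "degC")])
cure_run = ProcessStepRun(cure_spec, "batch-17", [ParameterValue("temperature", 199.0, "degC")])
print(cure_run.deviations())  # {'temperature': -1.0}
```

Keeping both values, rather than overwriting the setpoint with the measurement, is what lets batch-to-batch variation be analyzed later.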

Learn more:

White Paper


Amplify Your Data Science Team’s Impact

See how a third-party Materials Informatics platform can free up your data science team to do what it does best.



Digitalization and Materials Informatics

Discover the critical success factors for digitalization in materials and chemicals.

White Paper


How an AI-Enabled Data Infrastructure Hits the Bottom Line

Explore how the Citrine Platform can systematically boost ROI for new product development.
