What We've Learned

Author: Hannah Melia

Although many companies are already reaping the rewards of using AI, others are being held back by myths around the need for large datasets. While representative data can propel an AI project forward quickly, you don’t need to wait for large, structured datasets to start your AI journey.

AI can make you more competitive

Our customers are seeing an 80% reduction in the time to formulate adhesives, record-breaking yields in fine chemicals, and 23% reductions in ingredient costs in CPG. Most of the companies we speak to have, or want to have, a digital transformation strategy, and they know that they need to start using AI to stay competitive.

Why do some companies put it off?

The return on investment is compelling. However, just like me putting my diet off until next week, some companies are waiting. Perhaps because they want to be fast followers, letting the early adopters sort out the kinks; perhaps because they are a little overwhelmed by buzzwords surrounding AI; perhaps because they have been told they need a lot of data to have success.

If you start now, you won’t be an early adopter

The word is out: AI works, and AI works in your area of product development. You can read our case studies to find out more. That means that many companies are already using AI to make their products better and cheaper, faster. Citrine was founded in 2013, and we’ve worked through about 100 engagements and learned from that experience. That is why our platform is easy to use and requires no code.

Change Management

Adopting AI changes people’s day-to-day working styles and, as such, presents a challenge. However, for many companies it is a small part of an overall digital transformation and an excellent way to show the value of good data management. By doing high-value AI projects you can show your team the value of their data and motivate them to participate in wider digitization programs. Our team has experience working with companies at all stages of digital transformation.

Are you embarrassed by your data?

Some of our customers start with no data, others with 20-30 data points, and others with large historic data repositories, which can be paper-based, stored in on-prem databases, or in the cloud. Oftentimes, even if there is a lot of data, it is messy, siloed, and not well documented. Over the last 10 years we have seen it all.

Luckily, our team has the right experience and skillset to help you on your AI journey, wherever you start.

No Data

Sometimes, by necessity or by choice, our customers start a project without having run any experiments in its exact subject area. In this case, we work with them to design an initial set of experiments. This is similar to a Design of Experiments matrix, but stripped down to cover the search space in the fewest experiments possible. The aim is to prime the AI model so that it can guide future experiments. Our AI models can do two different things: they can tell us where a prediction is highly uncertain, so a data point is needed there, and they can tell us which experiments are most likely to hit targets. Sequential Learning uses these two pieces of information to get closer and closer to the objectives of the project. In each round, a group of 5 or so experiments is suggested and run, the results are entered, and the AI model is retrained and used to suggest the next set of experiments.
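For the data scientists reading, the loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the Citrine Platform's actual algorithm: the `run_experiment` function stands in for real lab work, the surrogate model is a simple bootstrap ensemble chosen for brevity, and every name and parameter here (including `kappa`, which weights uncertainty against predicted performance) is a hypothetical choice made for the example.

```python
import random
import statistics


def run_experiment(x):
    """Hypothetical stand-in for a lab experiment (higher result is better)."""
    return -(x - 0.7) ** 2


def predict(data, x, rng, n_models=30):
    """Toy surrogate: bootstrap ensemble of inverse-distance-weighted averages.

    Returns (mean, std). The spread across the ensemble serves as a rough
    uncertainty estimate for the prediction at x.
    """
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        weights = [1.0 / ((x - xi) ** 2 + 1e-6) for xi, _ in sample]
        total = sum(w * yi for w, (_, yi) in zip(weights, sample))
        preds.append(total / sum(weights))
    return statistics.fmean(preds), statistics.pstdev(preds)


def suggest_batch(data, candidates, rng, batch_size=5, kappa=1.0):
    """Rank candidates by predicted value plus kappa times uncertainty."""
    scored = []
    for x in candidates:
        mean, std = predict(data, x, rng)
        scored.append((mean + kappa * std, x))
    scored.sort(reverse=True)
    return [x for _, x in scored[:batch_size]]


def sequential_learning(n_rounds=4, batch_size=5, seed=0):
    rng = random.Random(seed)
    candidates = [i / 100 for i in range(101)]  # the search space
    # Stripped-down initial design: a few evenly spaced experiments.
    data = [(candidates[i], run_experiment(candidates[i]))
            for i in (0, 25, 50, 75, 100)]
    for _ in range(n_rounds):
        tested = {x for x, _ in data}
        untested = [x for x in candidates if x not in tested]
        batch = suggest_batch(data, untested, rng, batch_size)
        # "Run" the suggested experiments and feed the results back in.
        data.extend((x, run_experiment(x)) for x in batch)
    return data


if __name__ == "__main__":
    results = sequential_learning()
    best_x, best_y = max(results, key=lambda pair: pair[1])
    print(f"{len(results)} experiments run, best setting so far: x={best_x}")
```

The score in `suggest_batch` combines the two pieces of information in one number: a candidate ranks highly either because the model expects it to perform well or because the model is unsure about it.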

Case Study
In our case study, learn how KCARBON went, in 5 months, from a cold start (zero data) to a model that suggests successful recipes for additives for carbon fiber.

Some Data – Different sources

Some customers have data on an analogous product, or experiments on the right product where only some of the individual properties are recorded. In this case, the platform will use transfer learning and our unique GEMD data model to harmonize data from different sources and use the available data to train the model.

White Paper
Check out our white paper on overcoming data scarcity with transfer learning.

30 Relevant Data Points

Exactly how much data you need to create a predictive AI model depends on how many properties you are trying to optimize and how representative the data is. A cluster of a thousand data points outside your search space is not going to help. 30 well-spaced, relevant data points can be enough to start the sequential learning process.
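One rough way to check whether existing data is relevant and well spaced is to count how many points fall inside the search space and how evenly they cover it. The sketch below assumes a made-up search space with two inputs, `temperature` and `loading`; the function name, the bounds, and the bin count are all hypothetical choices for illustration, not part of the platform.

```python
import random


def coverage_report(points, bounds, bins=5):
    """Summarize how well a dataset covers a search space.

    points: list of dicts mapping input name to value.
    bounds: dict mapping input name to a (low, high) range of interest.
    Returns the total number of points, the number inside the search space,
    and, for each input, the fraction of equal-width bins that contain at
    least one inside point.
    """
    inside = [
        p for p in points
        if all(low <= p[name] <= high for name, (low, high) in bounds.items())
    ]
    report = {"total": len(points), "inside": len(inside)}
    for name, (low, high) in bounds.items():
        occupied = set()
        for p in inside:
            index = int((p[name] - low) / (high - low) * bins)
            occupied.add(min(index, bins - 1))  # put the upper edge in the last bin
        report[name] = len(occupied) / bins
    return report


# Hypothetical search space for the example.
bounds = {"temperature": (100.0, 200.0), "loading": (0.0, 10.0)}

# A thousand points clustered well outside the temperature range of interest.
rng = random.Random(1)
clustered = [
    {"temperature": rng.gauss(300.0, 5.0), "loading": rng.uniform(0.0, 10.0)}
    for _ in range(1000)
]

# Thirty points spread across the whole search space.
spaced = [
    {"temperature": 100.0 + 100.0 * i / 29, "loading": 10.0 * ((i * 7) % 30) / 29}
    for i in range(30)
]

if __name__ == "__main__":
    print(coverage_report(clustered, bounds))
    print(coverage_report(spaced, bounds))
```

In this example the large clustered dataset contributes no points inside the search space, while the 30 spread-out points touch every bin of both inputs.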

Case Study
In our case study with SLAC National Accelerator Laboratory, learn how the first 16 data points generated by the system were used to train the AI model.

How Can the Citrine Platform Work with so Few Data Points?

Data Ninjas

Our team can get whatever data you have into the platform. Our Professional Services Team can scan paper forms, perform data entry, create data pipelines, and whip spreadsheets into shape. They have experience integrating different LIMS, ERP, and ELN systems, and our graphical data model never says no to more data.

Data Strategy

Our experienced team also performs data readiness projects, assessing the different sources of data, understanding your business and where value comes from, and plotting out a strategy of how to get which data into the platform in what order. By starting AI before all your data is “Ready” you gain the advantage of understanding which data you actually need to drive business value without boiling the ocean.

Leveraging Your Team’s Knowledge

Because the platform is a no-code, graphical AI tool, everyone can use it. That means that your expert product developers can easily add their knowledge to the AI model to give it a leg up. They can point it in the right direction by narrowing the search space to feasible products, and they can ensure that all the important properties are taken into account when setting objectives.

Uncertainty Quantification

This is the geeky bit. Over 10 years, our world-renowned machine learning experts have been working away to make sure we can work with small data. If you are a data scientist you can read about it below. If not, just be happy that it works, and you don’t have to fiddle with it to make it work!

Read about uncertainty quantification

Want to learn more? Contact our team for a demonstration of the Citrine Platform and see how it can work for your business.