Don’t Wait for “AI-ready” Data

We’ve all heard it:

“You need lots of data before you can do AI.”

It’s become accepted wisdom in MBA programs, management playbooks, and R&D teams. But in materials and chemicals, it often puts the cart before the horse.

The Reality: Data Is Expensive and Scarce

In many industries, data is abundant. In ours, it isn’t.

Each datapoint can cost hundreds or even thousands of dollars to generate. Experiments take time. Equipment is limited. And when you’re working at the cutting edge, the data you actually need often doesn’t exist yet.

This creates a paradox:

You’re told you need large datasets to use AI
But generating those datasets is slow, expensive, and uncertain

The result is that teams delay AI adoption while waiting for their data to become “AI-ready.”

In practice, that often means waiting indefinitely.

The Default Response: Data First, Value Later

Faced with this challenge, many organizations launch large-scale data initiatives.

They aim to:

Aggregate historical data

Standardize units and definitions

Build centralized infrastructure

These efforts are well-intentioned, but they tend to follow a familiar pattern:

Months spent debating data definitions

Years spent building systems

Significant investment in infrastructure

And at the end, there’s no guarantee that:

The data is relevant to the problems you want to solve

The structure supports AI workflows

The effort delivers measurable value

In some cases, teams end up with well-organized data that still doesn’t help solve the problems they care about.

A Better Approach: Start with AI

There is a more effective way to approach this problem.

Instead of waiting for perfect data, start with AI that is designed for the reality of materials R&D:

Works with small, imperfect datasets

Handles missing, noisy, and inconsistent data

Automatically harmonizes and normalizes data behind the scenes

But more importantly:

AI built for R&D doesn’t just use data; it helps you decide which data to generate next.

Let the Model Tell You What Data It Needs

In traditional R&D, experimentation is often guided by intuition and trial and error.

With AI, that changes.

An AI platform built for this kind of work can:

Identify which experiments are most likely to succeed

Highlight where the model is uncertain

Recommend experiments that will reduce that uncertainty most efficiently

In other words:

The model tells you which data it needs.

Instead of trying to build a large dataset upfront, you:

Start with what you have
Use AI to guide the next best experiments
Generate only the data that actually improves outcomes

This is a fundamentally more efficient way to work.
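
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular platform: it swaps in a simple bootstrap ensemble of nearest-neighbor predictors, treats disagreement between ensemble members as a stand-in for model uncertainty, and ranks untested candidates by predicted value plus an uncertainty bonus. The function names, the parameters (n_models, kappa), and all numbers in the example are hypothetical.

```python
import math
import random
import statistics


def knn_predict(train, x, k=2):
    """Average the measured values of the k tested points closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return statistics.fmean([y for _, y in nearest])


def suggest_next(train, candidates, n_models=50, kappa=1.0, seed=0):
    """Rank untested candidates by predicted value plus an uncertainty bonus.

    train:      list of (inputs, measured_value) pairs already in hand
    candidates: list of input tuples that have not been tested yet
    kappa:      how strongly to favor uncertain regions (assumed knob)
    """
    rng = random.Random(seed)
    # Bootstrap ensemble: each member sees a resampled copy of the data.
    resamples = [rng.choices(train, k=len(train)) for _ in range(n_models)]
    ranked = []
    for x in candidates:
        preds = [knn_predict(sample, x) for sample in resamples]
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)  # member disagreement ~ uncertainty
        ranked.append(
            {"x": x, "mean": mean, "spread": spread, "score": mean + kappa * spread}
        )
    # Highest score first: promising and/or poorly understood candidates.
    return sorted(ranked, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: (additive fraction, cure temperature scaled 0-1) -> strength
    measured = [
        ((0.10, 0.20), 3.1),
        ((0.30, 0.40), 4.0),
        ((0.50, 0.30), 4.6),
        ((0.70, 0.80), 3.8),
        ((0.20, 0.90), 2.9),
    ]
    untested = [(0.40, 0.35), (0.90, 0.10), (0.60, 0.60)]
    for row in suggest_next(measured, untested):
        print(row["x"], round(row["mean"], 2), round(row["spread"], 2))
```

With kappa set to 0 the ranking simply chases the best predicted value; raising it shifts effort toward regions where the small dataset says little. After each new measurement is appended to the training data, the ranking is recomputed, which is how the dataset grows as a byproduct of the work.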

Prove Value First

With the right tools in place, teams can run focused, high-impact projects:

Optimize a formulation

Improve a key property

Reduce experimental cycles

These projects are not just about prediction; they are about learning faster.

By prioritizing the most informative experiments, teams:

Reduce wasted lab work

Reach target performance faster

Build useful data as a byproduct of progress

Let Use Cases Shape Your Data Strategy

When AI is applied early, something important happens:

You begin to understand your data through the lens of real use cases.

Instead of asking:

“What data should we collect?”

You start asking:

“What data actually drives outcomes?”

This shift is critical.

It grounds your data strategy in:

Real workflows

Real decisions

Real impact

Not assumptions or generic best practices.

Driving Adoption from Within

There’s another benefit that’s often overlooked: people buy in.

When scientists and engineers see AI delivering results:

They understand why structured data matters

They become more engaged in improving data quality

They contribute to building reusable datasets

Data quality improves not because it was mandated but because its value is clear.

AI as the Starting Point, Not the End

AI is often positioned as the final step in a digital transformation journey.

In materials and chemicals, it should be the opposite.

AI is not the end of your data journey.
It’s the starting point.

Start with tools that can work with the data you have.
Use them to deliver value.
Let that value guide how your data evolves.

Don’t Put the Cart Before the Horse

The idea that you need perfect, large-scale datasets before using AI is not just impractical in our industry; it’s counterproductive.

Instead:

Start small
Prove value
Build momentum
Let data follow

Because in materials development, the fastest way to better data is to start using the data you already have.
