We’ve all heard it:
“You need lots of data before you can do AI.”
It’s become accepted wisdom in MBA programs, management playbooks and R&D teams. But in materials and chemicals, it often puts the cart before the horse.
The Reality: Data Is Expensive and Scarce
In many industries, data is abundant. In ours, it isn’t.
Each datapoint can cost hundreds or even thousands of dollars to generate. Experiments take time. Equipment is limited. And when you’re working at the cutting edge, the data you actually need often doesn’t exist yet.
This creates a paradox:
- You’re told you need large datasets to use AI
- But generating those datasets is slow, expensive, and uncertain
The result is that teams delay AI adoption while waiting to get data “AI-ready”.
In practice, that often means waiting indefinitely.
The Default Response: Data First, Value Later
Faced with this challenge, many organizations launch large-scale data initiatives.
They aim to:
- Aggregate historical data
- Standardize units and definitions
- Build centralized infrastructure
These efforts are well-intentioned, but they tend to follow a familiar pattern:
- Months spent debating data definitions
- Years spent building systems
- Significant investment in infrastructure
And at the end, there’s no guarantee that:
- The data is relevant to the problems you want to solve
- The structure supports AI workflows
- The effort delivers measurable value
In some cases, teams end up with well-organized data that still doesn’t move the needle.
A Better Approach: Start with AI
There is a more effective way to approach this problem.
Instead of waiting for perfect data, start with AI that is designed for the reality of materials R&D:
- Works with small, imperfect datasets
- Handles missing, noisy, and inconsistent data
- Automatically harmonizes and normalizes data behind the scenes
But more importantly:
Modern AI doesn’t just use data; it helps you decide what data to generate next.
Let the Model Tell You What Data It Needs
In traditional R&D, experimentation is often guided by intuition and trial-and-error.
With AI, that changes.
A model built for this kind of work can:
- Identify which experiments are most likely to succeed
- Highlight where its predictions are uncertain
- Recommend experiments that will reduce that uncertainty most efficiently
In other words:
The model tells you which data it needs.
Instead of trying to build a large dataset upfront, you:
- Start with what you have
- Use AI to guide the next best experiments
- Generate only the data that actually improves outcomes
This is a fundamentally more efficient way to work.
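To make the loop concrete, here is a minimal sketch of one way a model can rank candidate experiments. It is not a description of any particular product: it assumes a simple Gaussian process surrogate written in plain NumPy, a single input variable, and an upper-confidence-bound rule for the recommendation. The data values, the function names, and the `length_scale`, `noise`, and `kappa` settings are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_cand, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_cand."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ct = rbf_kernel(x_cand, x_train)
    mean = k_ct @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1 for this kernel; subtract what the data explains.
    explained = np.sum(k_ct * np.linalg.solve(k_tt, k_ct.T).T, axis=1)
    var = 1.0 - explained
    return mean, np.sqrt(np.clip(var, 0.0, None))

def suggest_next(x_train, y_train, x_cand, kappa=2.0):
    """Rank candidates three ways: by prediction, by uncertainty, and by both."""
    mean, std = gp_posterior(x_train, y_train, x_cand)
    return {
        # Highest predicted property value.
        "most_promising": float(x_cand[np.argmax(mean)]),
        # Where the model knows the least.
        "most_uncertain": float(x_cand[np.argmax(std)]),
        # Upper confidence bound: kappa sets the explore/exploit balance.
        "recommended": float(x_cand[np.argmax(mean + kappa * std)]),
    }

# Hypothetical starting data: three experiments only.
# x = additive fraction (0 to 1), y = measured property, already scaled
# so that a zero-mean model is a reasonable assumption.
x_train = np.array([0.1, 0.2, 0.3])
y_train = np.array([0.2, 0.5, 0.6])

# Candidate experiments the lab could run next.
x_cand = np.linspace(0.0, 1.0, 21)

picks = suggest_next(x_train, y_train, x_cand)
print(picks)
```

With only three measurements, the model is confident near the compositions already tested and uncertain far from them, so the uncertainty-driven pick lands in unexplored territory. Lowering `kappa` shifts the recommendation toward candidates the model already expects to perform well; raising it favors candidates that would teach the model the most.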
Prove Value First
With the right tools in place, teams can run focused, high-impact projects:
- Optimize a formulation
- Improve a key property
- Reduce experimental cycles
These projects are not just about prediction; they are about learning faster.
By prioritizing the most informative experiments, teams:
- Reduce wasted lab work
- Reach target performance faster
- Build useful data as a byproduct of progress
Let Use Cases Shape Your Data Strategy
When AI is applied early, something important happens:
You begin to understand your data through the lens of real use cases.
Instead of asking:
“What data should we collect?”
You start asking:
“What data actually drives outcomes?”
This shift is critical.
It grounds your data strategy in:
- Real workflows
- Real decisions
- Real impact
Not assumptions or generic best practices.
Driving Adoption from Within
There’s another benefit that’s often overlooked: people buy in.
When scientists and engineers see AI delivering results:
- They understand why structured data matters
- They become more engaged in improving data quality
- They contribute to building reusable datasets
Data quality improves not because it was mandated, but because its value is plain to the people producing it.
AI as the Starting Point, Not the End
AI is often positioned as the final step in a digital transformation journey.
In materials and chemicals, it should be the opposite.
AI is not the end of your data journey.
It’s the starting point.
- Start with tools that can work with the data you have.
- Use them to deliver value.
- Let that value guide how your data evolves.
Don’t Put the Cart Before the Horse
The idea that you need perfect, large-scale datasets before using AI is not just impractical in our industry—it’s counterproductive.
Instead:
- Start small
- Prove value
- Build momentum
- Let data follow
Because in materials product development, the fastest way to better data… is to start using the data you already have.