Analytics Platform: Build or Buy?

As more materials and manufacturing companies consider acquiring data analytics capabilities, a question we often hear from customers is, “How is Citrine’s analytics platform better than what we could build in-house?” We believe that partnering with Citrine enables our customers to focus intensively on their core competencies (e.g., metallurgy, chemistry, process engineering, and manufacturing) without simultaneously trying to develop advanced data infrastructure and analytics software that lie outside their traditional areas of strength. These areas are precisely Citrine’s core competencies, so we think customers are best served by leveraging our strength in the software domain to amplify their own in the physical domain. To clarify the advantages of adopting the Citrine materials analytics solution over building a custom in-house solution from the ground up, we will walk through a condensed version of our data analytics pipeline.

The Data Analytics Pipeline

Step 1: Training data 

The most important prerequisite for harnessing analytics to drive R&D and manufacturing is a very large, systematically organized and structured training data set. These training data are used to inform machine learning models that then can make predictions of materials behavior.

The data relevant to our customers’ problems of interest are likely scattered across published literature, siloed internal databases, and even Excel spreadsheets and PowerPoint decks. Building an in-house solution would require combining all of these sources into a single database format, performing significant manual data entry, and building an effective interface to enable broad access to the data. Our highly scalable Citrination platform is already the world’s largest structured and open collection of materials information, accessible via graphical interface or API, and we use proprietary extraction technology to rapidly aggregate new data from papers and patents specific to our customers’ areas of interest.
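To give a sense of what “combining all the sources into a single database format” involves in-house, here is a minimal Python sketch. It is an illustration only and not how Citrination works; the record schema, the column names ("Alloy", "UTS (ksi)", "name", "uts_mpa"), and the two source layouts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

KSI_TO_MPA = 6.894757  # 1 ksi expressed in MPa


@dataclass(frozen=True)
class PropertyRecord:
    """One common schema that every source is converted into (hypothetical)."""
    material: str
    property_name: str
    value: float
    units: str
    source: str


def from_spreadsheet_row(row: Dict[str, str]) -> PropertyRecord:
    # Assumed spreadsheet layout: text cells, strength reported in ksi.
    return PropertyRecord(
        material=row["Alloy"].strip(),
        property_name="ultimate tensile strength",
        value=float(row["UTS (ksi)"]) * KSI_TO_MPA,
        units="MPa",
        source="spreadsheet",
    )


def from_database_row(row: Dict[str, object]) -> PropertyRecord:
    # Assumed internal database export: strength already stored in MPa.
    return PropertyRecord(
        material=str(row["name"]).strip(),
        property_name="ultimate tensile strength",
        value=float(row["uts_mpa"]),
        units="MPa",
        source="database",
    )


def combine(spreadsheet_rows: Iterable[Dict[str, str]],
            database_rows: Iterable[Dict[str, object]]) -> List[PropertyRecord]:
    """Merge both sources into one list with consistent names and units."""
    records = [from_spreadsheet_row(r) for r in spreadsheet_rows]
    records += [from_database_row(r) for r in database_rows]
    return records
```

Every additional source (a literature table, a slide deck, a second lab’s spreadsheet) needs its own converter of this kind, plus unit handling and cleanup, which is where much of the manual effort goes.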

Step 2: Feature Selection

Applying machine learning to materials involves asking algorithms to correlate things we know about materials (i.e., features) with things we’d like to know about materials (i.e., their properties under conditions of interest). This raises the question: which features should be used to maximize the predictive power of analytics?

Designing feature sets from scratch and thoroughly statistically validating them would likely require reading a large body of literature on feature design in order to come up to speed. Citrine has developed a large, proprietary library of features that we have proven out and validated across a wide variety of materials classes and problems. Further, we are constantly scanning the literature to incorporate the best-in-class “open-source” features as they become available.
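As a simple illustration of what a “feature” can look like, the Python sketch below turns an alloy composition into a few numeric descriptors. It is a toy example under stated assumptions, not a sample of Citrine’s feature library: the function name `featurize`, the choice of descriptors, and the three-element lookup tables (rounded atomic masses and Pauling electronegativities) are ours for illustration.

```python
from typing import Dict

# Small illustrative lookup tables (rounded values); a real feature
# library would cover the full periodic table and many more properties.
ATOMIC_MASS = {"Fe": 55.845, "Cr": 51.996, "Ni": 58.693}
ELECTRONEGATIVITY = {"Fe": 1.83, "Cr": 1.66, "Ni": 1.91}


def featurize(composition: Dict[str, float]) -> Dict[str, float]:
    """Map element amounts (any consistent units) to composition-based features."""
    total = sum(composition.values())
    fractions = {el: amount / total for el, amount in composition.items()}

    # Fraction-weighted averages of elemental properties.
    mean_mass = sum(f * ATOMIC_MASS[el] for el, f in fractions.items())
    mean_en = sum(f * ELECTRONEGATIVITY[el] for el, f in fractions.items())

    # Spread of electronegativity across the elements present.
    en_values = [ELECTRONEGATIVITY[el] for el in fractions]
    en_range = max(en_values) - min(en_values)

    return {
        "mean_atomic_mass": mean_mass,
        "mean_electronegativity": mean_en,
        "electronegativity_range": en_range,
    }


features = featurize({"Fe": 70, "Cr": 20, "Ni": 10})
```

Whether descriptors like these actually carry predictive signal for a given property is exactly what has to be validated statistically, one materials class and one problem at a time.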

Step 3: Software package and algorithm selection

Citrine’s platform is an alternative to making your own set of choices of analytics package(s) and machine learning approach(es).

You have two important choices to make, each involving a series of tradeoffs. The first is which analytics or machine learning package to use. None of the available tools is tailored for materials or manufacturing data, they range in price from free to extremely costly, and each is likely to require surmounting a substantial learning curve. The second is which machine learning algorithm to use for your problem; you will need to test and validate each method separately to determine which is best for each case. Citrine’s platform, however, is a plug-and-play solution: we take care of algorithm selection and validation for you. You can have confidence that you are taking advantage of the very best that the machine learning community has to offer.
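To make “testing and validating each method separately” concrete, here is a minimal Python sketch that compares two candidate models with k-fold cross-validation and keeps the one with the lower held-out error. It is a generic textbook procedure, not a description of Citrine’s selection logic; the two toy models, the function names, and the synthetic data are assumptions made for the example.

```python
from math import sqrt
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cross_val_rmse(fit, xs, ys, k=5):
    """Root-mean-square error over k held-out folds (interleaved split)."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)  # trained without the held-out points
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return sqrt(mean(squared_errors))


def select_model(candidates, xs, ys, k=5):
    """Return the name of the candidate with the lowest cross-validated RMSE."""
    scores = {name: cross_val_rmse(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: a linear trend with a small alternating perturbation.
xs = list(range(10))
ys = [2 * x + 1 + (0.1 if x % 2 else -0.1) for x in xs]
best, scores = select_model({"mean": fit_mean, "linear": fit_line}, xs, ys)
```

In practice the candidate list is much longer and each method has its own settings to tune, so this comparison has to be repeated for every new problem.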

Step 4: Model validation

Like all statistical and modeling techniques, machine learning is prone to the deadly “garbage in, garbage out” phenomenon. In other words, it’s relatively easy to coerce statistical software packages to spit out numbers, but far more difficult to assess their quality. Once you have a model, how do you know that it is meaningful and that you can trust it?

As with any technical discipline, mastering the intricacies of machine learning takes months (optimistically) to years (realistically) of experience. The choices in this case are either to embark on a costly and time-consuming journey of self-education, or to hire expensive data scientists. Citrine’s platform implements best practices across the board in terms of model validation. With our rigorous validation metrics, you’ll know exactly how accurate you can expect our models to be in situations you care about.

Step 5: Model deployment

Assuming you have created models that you trust, how do your scientists access these models?

To deploy models in-house, without a substantial amount of engineering, you will be limited to having one or two in-the-know scientists manually running models they have built locally on a single computer, which is a major bottleneck to access. Citrine’s platform exposes machine learning models with easy-to-use web interfaces, accessible by anyone with security credentials on any device with a web browser. Our goal is to reduce the barrier to wide deployment of models to zero.

Step 6: Model improvement

You won’t want your models to remain static. As you gain access to new data and information, you will want to rebuild your models to take advantage of them.

Generally, build-it-yourself modeling involves going back to Step 1 when new data are acquired. Citrine’s platform enables users to submit new data to the models; the models then re-train in the background with no additional effort on the part of users. Citrine also makes it easy to see various predictions that were made with different model versions.

Step 7: Model extensions and wrappers

Developing an accurate, predictive model of materials behavior is a major triumph. Unfortunately, it is probably not your end goal. Instead, you may be asking questions such as, “How could I modify my current chemistry to improve a target property?” or “Are there possible solutions well outside of our current search space that we’re just not considering?” In these cases, you will also need to introduce local optimization, global optimization, and multivariate constraint modeling approaches.

In-house, you will need to write additional code to wrap optimizers around your existing models. Trust us, none of your materials engineers wants to write and bug-test routines like gradient descent from scratch, or make Numerical Recipes in C their newest bedtime reading selection. Citrine’s platform has local optimization (tweaking existing formulations to reach a local optimum), global optimization (true green-field searching), and multiple constraint optimization (lining up many properties in target windows) all built in, off the shelf.
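For a sense of what “wrapping optimizers around your existing models” means, here is a minimal Python sketch that combines a global random search with a local hill-climbing refinement, both restricted to formulations whose predicted properties fall inside target windows. It is a deliberately simple illustration, not Citrine’s optimizer: `predict` is a made-up stand-in for a trained model, and the variable names, bounds, and window values are hypothetical.

```python
import random

# Allowed range for each formulation variable (hypothetical).
BOUNDS = {"a": (0.0, 1.0), "b": (0.0, 1.0)}


def predict(x):
    """Stand-in for trained property models; a real model would go here."""
    strength = 100 - 200 * (x["a"] - 0.3) ** 2 - 100 * (x["b"] - 0.6) ** 2
    cost = 10 * x["a"] + 5 * x["b"]
    return {"strength": strength, "cost": cost}


def feasible(props, windows):
    """True if every constrained property lies inside its target window."""
    return all(lo <= props[name] <= hi for name, (lo, hi) in windows.items())


def global_search(windows, target, n_samples=2000, seed=0):
    """Sample the whole design space and keep the best feasible point.

    Returns None if no sampled point satisfies the windows.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        x = {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}
        props = predict(x)
        if feasible(props, windows) and (
            best is None or props[target] > predict(best)[target]
        ):
            best = x
    return best


def local_search(start, windows, target, step=0.05, min_step=1e-4):
    """Hill-climb from an existing formulation, halving the step when stuck."""
    best = dict(start)
    while step > min_step:
        improved = False
        for name, (lo, hi) in BOUNDS.items():
            for delta in (-step, step):
                trial = dict(best)
                trial[name] = min(hi, max(lo, best[name] + delta))
                props = predict(trial)
                if feasible(props, windows) and props[target] > predict(best)[target]:
                    best, improved = trial, True
        if not improved:
            step /= 2
    return best


windows = {"cost": (0.0, 5.0)}  # keep predicted cost inside this window
start = global_search(windows, "strength")
best = local_search(start, windows, "strength")
```

Even this toy version leaves open questions that an in-house team would have to answer for itself, such as how many samples are enough, how to handle many competing properties, and how to account for uncertainty in the model’s predictions.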

Step 8: Software maintenance

While software has become dramatically more sophisticated over time, it is not yet smart enough to heal itself. Things break, bugs emerge, and users demand new features.

Unfortunately, in-house software development isn’t a one-time expense; it’s an unpredictable mortgage. You’ll have to account for new feature development, ongoing maintenance costs, possible downtime, and loss of productivity should anything go wrong. Citrine stands behind our platform 100%. We will do whatever it takes to make sure your team is delighted with our software all the time, whether that means giving tutorials, answering support emails or calls, fixing bugs, or building new features. We are only successful if you are successful.

Partner with Citrine

Citrine brings the immense power of large-scale analytics to bear on materials and chemicals data, building these analytics on top of the world’s largest and most sophisticated materials data infrastructure. We believe that a data-driven paradigm is the future of materials and manufacturing, and our customers agree. You have an important choice: build an analytics solution in-house, or engage with the recognized leaders in materials data analytics at Citrine to put our groundbreaking platform to work in your organization. Working with Citrine will save substantial time and money and deliver best-in-class insights to your organization quickly.

To start a proof-of-concept with us or to learn more, email info@citrine.io and connect with our team.