This post demonstrates how models trained on Citrination can be applied to optimal experimental design. Using machine learning to identify the experiments most likely to improve the objective reduces the number of experiments needed to find high-ZT materials by a factor of three.
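One common way to rank experiments by their likelihood of improving an objective is the probability-of-improvement criterion. The Python sketch below illustrates that idea only; it is not necessarily the method used on Citrination. It assumes each candidate comes with a predicted mean and an uncertainty, and that the prediction is roughly Gaussian. The candidate names and numbers are invented for illustration.

```python
import math


def probability_of_improvement(mean, std, best_so_far):
    """Chance that a Gaussian prediction N(mean, std^2) exceeds best_so_far."""
    if std <= 0:
        # No uncertainty: the outcome is decided by the mean alone.
        return 1.0 if mean > best_so_far else 0.0
    z = (mean - best_so_far) / std
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pick_next_experiment(candidates, best_so_far):
    """Return the candidate with the highest probability of improvement."""
    return max(
        candidates,
        key=lambda c: probability_of_improvement(c["mean"], c["std"], best_so_far),
    )


# Hypothetical model predictions for three untested materials (made-up values).
candidates = [
    {"name": "candidate_A", "mean": 0.9, "std": 0.05},
    {"name": "candidate_B", "mean": 0.8, "std": 0.40},
    {"name": "candidate_C", "mean": 1.1, "std": 0.10},
]
best_so_far = 1.0  # best objective value measured so far (assumed)

next_experiment = pick_next_experiment(candidates, best_so_far)
print(next_experiment["name"])
```

In a full loop, the chosen experiment would be run, its result added to the training data, and the model retrained before picking again.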
This is a special time of year for me: the beginning of a new baseball season, and the hope against hope that the Chicago Cubs can finally win a World Series after a 107-year championship drought (here’s a realistic view of what that would look like).
While my research career and work at Citrine focus on materials informatics, I also do sports analytics as a hobby. Baseball stands out among American professional sports for being particularly data-obsessed, and the book and movie Moneyball have elevated baseball analytics to a pop culture phenomenon. Billy Beane, general manager of the Oakland Athletics, famously used advanced data analytics to gain a competitive edge against perennial titans such as the New York Yankees, despite having one of the smallest payrolls in baseball.
We founded Citrine because we want to help customers unlock a Moneyball edge in materials and manufacturing. Just as the Oakland Athletics became four times more efficient (in terms of payroll dollars per win) than the Boston Red Sox by harnessing the power of data, materials and manufacturing companies can make R&D and production dramatically more efficient by analyzing large-scale data about the materials and chemicals they use.
At Citrine Informatics, we believe that everyone should have free access to the materials-related data they need. Our team is working hard to make materials data open, accessible, and useful, fostering a data-themed community of researchers and furthering materials science in the process.
For a materials science student with a background in computer science, a place to combine the two fields is hard to find. Citrine has given me that rare chance to apply machine learning and data mining techniques to questions in materials science. The projects that I’ve worked on have paired materials intuition with an understanding of data science and machine learning to build models for various material properties. This goes hand in hand with learning how to visualize and communicate machine learning results to people in the materials science field who may be unfamiliar with machine learning concepts. In addition to various modeling projects, I’ve also been able to contribute to the continued development of our in-house machine learning infrastructure. Working as an intern in a small, fast-moving team has been especially rewarding because the projects I’ve worked on actually see the light of day. Work that I’ve done has been delivered to customers and demoed to investors and potential customers. It has been an honor and a rush to work at Citrine as we try to fundamentally change how data is used in the field of materials science.
As more materials and manufacturing companies consider acquiring data analytics capabilities, a question we often hear from customers is, “How is Citrine’s analytics platform better than what we could build in-house?” We believe that partnering with Citrine enables our customers to focus intensively on their core competencies (e.g., metallurgy, chemistry, process engineering, and manufacturing) without simultaneously trying to develop advanced data infrastructure and analytics software systems that lie outside their traditional areas of strength. These areas are precisely Citrine’s core competencies, so we think customers are best served by leveraging our strength in the software domain to amplify their own in the physical domain. To clarify the advantages of adopting the Citrine materials analytics solution over building a custom in-house solution from the ground up, we will walk through a condensed version of our data analytics pipeline.
Heroku offers many features that make writing and deploying web applications nearly painless. However, its networking options make it moderately difficult to connect securely to resources in EC2. Thankfully, with a little bit of elbow grease you can use stunnel to create a secure network tunnel directly to a machine inside your VPC. This blog post will walk you through all of the necessary steps.
An extract from Jean-Claude Bradley’s Open Melting Point Dataset has been added to Citrination. This data set contains over 3000 curated and validated melting points compiled from a variety of reliable sources.
Citrination helps to make valuable open access data sets, like this one, more accessible and useful.
This blog post will walk you through the steps you will need to follow in order to create a Materials Information File (MIF). The MIF is a flexible, JSON-based schema that has been developed to impose structure on materials data. More information on this file format can be found here.
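To give a feel for what a JSON-based materials record looks like, here is a minimal Python sketch that builds a small nested record and serializes it. The key names (`material`, `chemicalFormula`, `measurements`, and so on) are hypothetical placeholders chosen for illustration; they are not taken from the MIF specification, so consult the format documentation for the real field names.

```python
import json

# Hypothetical record layout: these keys are illustrative placeholders,
# NOT the official MIF field names.
record = {
    "material": {"chemicalFormula": "Al"},
    "measurements": [
        # Melting point of aluminum, in kelvin.
        {"property": "Melting point", "value": 933.47, "units": "K"},
    ],
}


def to_json_text(rec):
    """Serialize a record to human-readable JSON text."""
    return json.dumps(rec, indent=2, sort_keys=True)


record_text = to_json_text(record)
print(record_text)
```

Because the output is plain JSON, any language with a JSON parser can read the record back without custom tooling.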
Citrination continues to grow, and this week we highlight one of our new datasets that contains some base properties of elements commonly found in bulk metallic glasses. This dataset includes elastic constants, hardness, melting temperature and more.
We are always adding more data, so check back frequently!
I’ve spent the summer working at Citrine, fresh out of an undergraduate degree at Stanford where I studied both Materials Science and Computer Science. Though I thoroughly enjoyed studying both fields, I found limited opportunities to apply the two together until beginning work here. While companies in entertainment and shopping have reaped the benefits of massive data sets, many fields in the scientific community, notably materials science, have remained largely separate from data science even as they amass huge quantities of data. Working with materials data at Citrine has made me reflect on how differently data scientists and materials scientists can perceive data, and on how insights from data science can benefit materials research.