In this episode, Dr. Meredig and Dr. Warren discuss:
- How a shared improv background has surprisingly made a positive impact on their scientific careers and technical communication skills.
- The history of the Materials Genome Initiative, its role in the materials innovation ecosystem, and its future outlook.
- Successful models of collaboration between policymakers, national labs, academic research groups, and for-profit companies driving innovation in materials research and development.
- The role the National Institute of Standards and Technology (NIST) plays as a convener and community builder in the field of data-driven materials science.
- The importance of high-quality curated data in a materials R&D ecosystem.
“What people really want to see is the advantage of having a data management plan, and that’s where we’re seeing a change. Now, all you have to do is open a journal to see a lot of high-quality articles on data-driven methods and machine learning for materials discovery, and that’s an easier case to make to the community.” — Dr. Jim Warren
Dr. Jim A. Warren is the Technical Program Director for Materials Genomics in the Material Measurement Laboratory of NIST, where he has been a scientist since 1992. He came to NIST after receiving his PhD in Theoretical Physics from UC Santa Barbara and his BA (also in Physics) from Dartmouth College. In 1995, Jim co-founded the NIST Center for Theoretical and Computational Materials Science.
Dr. Warren is currently focused on the Materials Genome Initiative, a multi-agency initiative designed to create a new era of policy, resources, and infrastructure that supports U.S. institutions in the effort to discover, design, develop, and deploy advanced materials twice as fast, at a fraction of the cost.
Bryce Meredig: Welcome to DataLab, the materials informatics podcast with Bryce Meredig, Chief Science Officer at Citrine Informatics. In this episode of DataLab, we talk with Dr. Jim Warren, the Director of the Materials Genome Initiative at the National Institute of Standards and Technology, or NIST. We talk with Jim about the history of the Materials Genome Initiative, and lessons he has learned while leading related projects at NIST. We also discuss Jim’s perspective on materials data, the materials science community, and data infrastructure.
Bryce Meredig: Our guest on the podcast is Dr. Jim Warren from the US National Institute of Standards and Technology, or NIST. Jim is the Technical Program Director for Materials Genomics at the Material Measurement Laboratory of NIST. He’s been at NIST for a long time, since 1992. He joined after finishing his PhD in physics at UC Santa Barbara; he earned his AB in physics from Dartmouth. Jim is also the co-founder of the NIST Center for Theoretical and Computational Materials Science. More recently, Jim has been very focused on the Materials Genome Initiative, or MGI, which is a multi-agency initiative designed to create a new era of policy, resources, and infrastructure that supports US institutions in the effort to discover, design, develop, and deploy advanced materials twice as fast at a fraction of the cost.
Bryce Meredig: Jim, thank you so much for being with us and being willing to be our first podcast guest.
Jim Warren: It’s absolutely my pleasure. That was a great introduction and a great actual description of the initiative. It’s almost like I wrote it myself.
Bryce Meredig: It probably sounded very familiar to you. Jim, before we dive into the more technical meat of the discussion today, I actually wanted to bring up an unusual connection that we have in common, something we’ve chatted about over our many interactions at conferences and workshops, which is the fact that we both have training in improvisational theater, or improv. I know you’ve performed improv in a number of different groups at different points of your life. Could you share a little bit about that story?
Jim Warren: Wow, okay. We’re getting off into the deep and dark past. But I was in an improv group in college, and it was transformative for my career, because it essentially eliminated all anxiety and fear of being on stage and being on the receiving end of a question like the one you just asked and not knowing exactly what to say. It’s been an extremely potent way of managing the flow of information and the interactions with colleagues. I recommend it as a skill for anybody who’s interested in communication.
Bryce Meredig: Yeah, I completely agree. I know that at Northwestern where I did my doctoral work, there was a lot of interest in bringing in improv training for scientists and engineers specifically for some of the reasons you mentioned, because communication is such an important part of the job and the career. Once you’ve done some of the ridiculous things on stage that you end up doing in improv, it makes giving a talk at a conference seem comparatively lower pressure, I would say. So, I’ve had a very similar experience.
Jim Warren: Yeah. Of course there are risks because sometimes your brain runs down avenues that it probably shouldn’t.
Bryce Meredig: That’s true. You’re always taught not to self-censor in improv, but you have to do a little bit of that sometimes at a conference, of course. Jim, let’s talk about MGI, the Materials Genome Initiative, a little bit. I think this is something that a lot of people who are going to listen to this podcast have heard of but may not know a lot of the details about. I think there’s no one better than you to help educate people on what the MGI is about and why it’s important. Could you describe in a little more detail what it is and what you think its role is in the materials innovation ecosystem?
Jim Warren: Sure. Your brief description was very accurate but very tightly packed, so let’s walk through it. As you said, it’s an initiative. In essence, an initiative is nothing more than a statement by the executive (in this case, it was President Obama who rolled it out, and it continues under the current administration) that the agencies should work together so that they’re more than the sum of their individual missions in achieving a specific objective. In this particular case, the objective is essentially accelerating the discovery, design, development, and deployment of new materials for insertion into manufactured products.
Jim Warren: So, the agencies work together and try to find the best ways to enable this, and we really focus on developing this so-called materials innovation infrastructure, which is just the elements of science and engineering, that is, computation, experiment, and data management. The idea here is that we want to get those as tightly integrated as possible, and that’s really what the infrastructure is about. Each of the agencies is pursuing that in its own way, whether it’s the NSF funding research along these lines through special programs like its DMREF effort, or NIST focusing largely on the data problem: the quality of the data, how you manage the data, where you put the data, questions like that. Then each of the Defense Department agencies that’s involved will have its own specific defense mission, and all of the ancillary infrastructure to support those missions.
Bryce Meredig: Jim, you mentioned that one of the key goals of the MGI is for the participating agencies to collaborate, or establish synergy, so that they are more than the sum of their parts. What successes, or even challenges, have you seen in collaborations across these large US government organizations?
Jim Warren: Right. One of the real measures of our success is our ability to actually do that collaboration. There are a couple of really nice examples. For the last number of years, I’m not exactly sure, maybe five, the NSF and DOE have held what they call a joint PI meeting. There are quite a few PIs funded by both NSF and Basic Energy Sciences, which is part of DOE, so instead of each agency having its own meeting, there’s now a single meeting that brings both sides together. So, you get that crossover in the funding. That’s a simple example, and it seems obvious, but these are big agencies, and making these things work sometimes really takes conversations that occur at a very high level.
Jim Warren: At a more economic level, we’re now very excited that the NSF fairly recently put out a supplementary call: they took MGI-style programs they were already funding and offered people the opportunity to apply for extra money to work on data problems, for example. They could work with NIST or with any of a number of other entities to address those data problems. There you see actual money coming out of NSF to work with other agencies. Of course, that’s the ultimate commitment by an agency to making something happen across multiple agencies. We’re very excited by that.
Bryce Meredig: From the perspective of the materials research enterprise as a whole, how would you describe the impact so far by the MGI and the potential impact or the importance it could have in the future?
Jim Warren: That’s a tough question, of course, because now we’re asking, “What’s happening? What’s good? Where are the impacts now?” A lot of the impacts are tricky to measure because they have to do essentially with the psyche, if you will, of the community. We spent a lot of time basically trying to tell people that a paradigm shift was in the offing, and I think we were very correct on that point, but nonetheless, it’s very difficult to sell a paradigm shift, as it were. People just have to experience the change themselves and start to see the landscape changing around them. We moved the funding needle a bit, people started to realize that they’ve got to reformat their thinking, and now we’re starting to see, I think, a huge amount of change in behavior.
Jim Warren: Then as you couple this, and I’m sure we’ll get into it a bit more, with the advent of machine learning and AI-style approaches, which was happening in parallel, there are now these enormous pulls to think in an MGI style right from the very beginning. We’re starting to see that the infrastructures that have been built by the government, by companies, or by academia are really starting to be leveraged, and people are really starting to achieve the goals of the MGI. Right? Design new materials. There are, of course, lots of small examples where people have made new materials.
Bryce Meredig: You mentioned an interesting challenge that exists with the MGI, which is this notion of culture change. Certainly that’s something that we’ve seen at Citrine and that’s something that organizations wrestle with. How have you and NIST been approaching the question of culture change around MGI style thinking and materials data?
Jim Warren: Arguably one of the most important pieces of my job is basically selling these ideas. So, a lot of the culture change is effected initially just by selling a vision, saying this is obvious in some sense: the reasons we’re not doing it are not technical; we’re not doing it because we don’t do it. It’s a sort of chicken-and-egg problem. There are a couple of elements to that culture change. To make culture change happen, in my opinion, you need two things. Right? You need to change the rules and you need to change the tools.
Jim Warren: The government, of course, is very good at rule making. Very few people like it when you change the rules, though. It’s usually considered a burden, but we have seen some of that. There’s a lot of pressure coming from various quarters to make your data more open. If you’re receiving government funding, there’s an expectation that the resulting publications will hopefully be more open access than they currently are. Depending on the field you’re in, you’re seeing a lot of push in that space. In particular, there’s a push toward making sure that the data that supports a publication is openly available. You can imagine government mandates in that space to do precisely that. In the case of the federal agencies, there were directives that came down at the beginning of the decade, and that certainly continue to this day, to essentially open up as much government data as is practical, including scientific data, so there’s another kind of rule change there.
Jim Warren: At the same time, what we really need is for people to have tools available that allow them to take advantage of this data availability, and then to build platforms where people can use that data to do good science. Once the tools get built up and the infrastructure is sufficient, I think that’s when you see the paradigm shift. That’s when you see the culture shift, because no longer are you hearing, “This is a good idea, you could do cool stuff.” What you’re seeing is your colleagues doing cool stuff, and you’re like, “I want to do that too.”
Bryce Meredig: The tried and true idea of social proof. I think we know that works well in science. I’d be curious to hear more about what you mean in this context when you talk about tools and platforms. I think especially platform is kind of an abstract idea for a lot of folks, especially if they’re not computationalists.
Jim Warren: Sure.
Bryce Meredig: What are some of the things that you’re thinking of when you talk about that?
Jim Warren: Well, so this is… and maybe we’ll come around and we’ll talk about the Citrine platform as well, which is a fine example of what I’m talking about, but everybody, or most people, are quite familiar with your Facebook, your Instagram, or whatever. You’re looking at a platform, a website, from people’s point of view, where they can go and do something that they want to do, whether it’s set up a party for their friends or chat about the recent news or whatever it is. But from Facebook’s point of view, of course they’re trying to do something completely different, which is gather data and market things to you. In the same way, you can imagine platforms where people can go, and they say, “I want to put my data here so I can do cool science.” But at the same time, the second that you put the data there, you have then added to the aggregate knowledge, and then, in a lot of cases, that data will be open access. It will now have been well curated, because that will be a condition for putting it on that platform.
Jim Warren: So, you can start to see all of these side benefits once you put this “tool” in place. That then drives a different kind of behavior on the part of the researcher, because they want to achieve a research goal. But at the same time, the community starts to benefit immediately or much more quickly. There could be embargoes and things like that, but in general, they’ll be able to benefit from this amassing of information, or at least the protocols for making that information available.
Bryce Meredig: I think that’s really one of the holy grails: if we have this infrastructure and these platforms in place, does that enable us to do science we couldn’t have done otherwise? Would you say there are any good examples of that, or sort of emerging examples?
Jim Warren: Absolutely. Right now they’re more boutique. I’ve been looking for these examples forever, but you can take this nice example from another platform, the NIST-funded Materials Data Facility based at the University of Chicago and Argonne, where they have used tools to aggregate multiple data sets from different researchers. Some of the data came from a NIST site and some came from Rampi Ramprasad’s site. Rampi used to be at UConn; now he’s at Georgia Tech. They took those data sets and essentially improved on the answer that could be achieved from any one of the separate computations. So, you merge the data sets, you get a higher-quality answer.
Jim Warren: In a sense, that’s trivial, because that’s the way science is supposed to work. Right? You take the prior work; you don’t redo all the prior work, you add to it and improve the answer. So, trivial in that sense, but from a technical point of view, it’s hard. Right? What we’re really trying to do is solve a difficult technical problem around differing data formats: making sure you understand the data, interpret the data, and then get these platforms to start to talk to each other. That’s where the work is, and it’s what a lot of us are thinking about: how to get these things integrated.
Bryce Meredig: Yeah. To draw an analogy to another field, I’ve always been struck by how common it is for researchers in medicine to do these so-called meta-studies or meta-analyses, where the entire research project is to collect the results of, let’s say, 500 different studies and perform a meta-analysis across them. I think there’s enormous room to be doing that in materials science and chemistry, and it’s just not the norm today. But I think it could be with some of these MGI tools and the ecosystem that’s emerging.
Jim Warren: Absolutely, totally agree. It is interesting, I guess we do do it on a smaller scale. My guess is in the case of medicine, it’s a question of labor and money. Everything seems to have an extra zero in that space.
Bryce Meredig: Yeah, that’s true. It’s often said that we don’t have an equivalent of the NIH on the materials side; instead, it’s a combination of many different funding agencies, which of course you already mentioned. I want to go back to what you said a moment ago about building infrastructure, specifically interoperable infrastructure, and getting the platforms and tools that are emerging to talk to each other. How would you evaluate the status of that kind of work today, and what should we be working on together in the future?
Jim Warren: Okay. That’s still what I would gently call piecemeal. Not everything seamlessly talks to everything else, and I don’t see that happening any time in the near future. At the same time, the way I think about it is that you pick a domain, maybe just certain kinds of microstructure and characterization (it doesn’t really matter what the problem is), and you go in and see if you can attack a subspace within the materials domain, which is enormous. Right? There are so many different kinds of processes, whether it’s metallurgical or casting or some kind of ceramics manufacturing, or whatever it is, so we need to pick a specific subset to solve these problems.
Jim Warren: Then you have to ask the question: what infrastructures exist, and how am I going to integrate them? Often the answer is to take a bunch of technical developers of these platforms and lock them in a room for a few days. They can usually sort out the technical issues and hopefully evolve a few de facto standards, if you will, for how people can describe their information so that it’s more easily sucked into these platforms and more useful to everybody in general. We do have some success stories there. There are some nice collaborations with Citrine, with the Materials Data Facility, with the Air Force Research Laboratory’s efforts, and with a few of the other major platforms.
Jim Warren: Of course, density functional theory, first-principles computations in materials, is, I think, a really large success: the data was fairly well characterized and understood, so they could build these sorts of interoperable platforms.
Bryce Meredig: Yeah, I think that’s certainly been the trend I’ve seen as well in the infrastructure and platform community, of which Citrine, NIST, and the Materials Data Facility are all members. I’d like to think we are starting to close ranks and make some of those connections happen, because, as an infrastructure provider, one of the things you’re always concerned about is whether end users, the materials researchers we’re working with, feel that they have to make mutually exclusive choices about the tools and platforms they’re using. The more they feel that they’re getting pigeonholed into one path or one set of standards, the less likely they are to come to the table. I think that’s a case where it’s on us in the platform community to make it easy for them.
Jim Warren: I agree. The risk you describe for anybody committing to a platform, particularly if the curation costs are at all high, is something we definitely want to mitigate. Yes, we should close ranks. A lot of NIST’s mission is this kind of convening: getting people into a room and trying to decide what we can collaborate on and what is part of your business model and you don’t want to collaborate on, which is fine too, and then figuring out where the sweet spot is for as many of us as possible. If this looks scary and disruptive to your business model, let’s see if we can figure out a way for you to modify it, because you probably have some enormous assets of significant value that you can bring to the table. We’re trying to help people with that as well, because any time things change, it’s a little bit scary.
Bryce Meredig: You mentioned curation, the notion of the effort and expense involved in curating materials information and materials data, and of course this is an area where NIST has a global reputation as a standards organization and specifically as a source of high-quality materials data. I’d be curious to hear your thoughts on the role and importance of high-quality, known-pedigree materials data in materials research and in the MGI specifically.
Jim Warren: Wow. That’s a great question. I would argue that certainly when we started the MGI we thought that was what we were going to focus on. We were saying, “Okay, data quality. That’s what we’re good at. NIST is all about adding that extra decimal point onto the most accurate measurements, and we like to provision that accurate data.” Certainly that’s not going to change, but we also realized at the same time that we have to do a whole lot better job on the data management and data exchange problem, and we really had no idea just how hairy it was until we started to get involved in it. So, you’ve got to do a great job managing the data, qualifying the data, describing all of the various properties of the data, where it came from, provenance information, stuff like that. Then, to answer your second question, there’s what we are going to do about the quality problem, which of course applies to a lot of this data.
Jim Warren: It’s going to be a question of how people evaluate its quality. So, one of the things NIST will continue to do is try to find ways to provision as much reference data as it can, so that people can trust in some kind of ground truth for whatever it is they’re trying to do. We will be working with the broader community to develop these kinds of reference data sets. We’ll also be helping people develop best practices, and hopefully disseminating methodologies, because NIST is not, in general, interested in providing a seal of approval for data unless NIST was involved in generating that data itself.
Jim Warren: There are obvious reasons for that. We’re a government agency; we don’t want to get into the business of ranking stuff that’s coming basically from our own customers. So, we want to stay as neutral as possible and, at the same time, provide a service. We want to give people the ability to make judgments for themselves. Of course, the more metadata there is associated with the data, the better a job anybody can do in assessing the quality of that information. Another big piece of this is uncertainty quantification, something NIST has long cared about. Again, the data will probably be judged as higher quality if the uncertainty, sensitivity analysis, and other quantitative measures of that kind are also well characterized.
Bryce Meredig: I think the best mental model I’ve come across for the differing levels of context and quality in materials data comes from NIST. It’s that pyramid visualization I think I’ve seen on slides from you and others. Down at the bottom, the largest quantity is raw data generated out of experiments: unknown quality, unknown provenance, and clearly a lot of additional post-processing and analysis that needs to happen before you can extract value from the data. At the top you have highly curated, reference-quality data sets like the NIST Standard Reference Data sets. Then there are many levels in between. I think the MGI is causing more people to realize how important it is to have that whole spectrum in mind and to have a plan for how data move from the bottom of that pyramid toward the top.
Bryce Meredig: I think you mentioned something important, which is education: this is not a familiar mode of thinking for a lot of materials scientists. Have you seen people start to come around on the importance of data curation and data management? These things often don’t get front-of-mind attention from funding agencies and so forth, but we’re seeing them become really, really important and critical for success in MGI and materials informatics.
Jim Warren: Certainly it’s on people’s minds. I think it started, as I was saying a few minutes ago, with the realization that data management plans were becoming more prominent, so people did need to think about it or they wouldn’t get funded. But honestly, the peer review system wasn’t weighting those plans very heavily when deciding what got funded, so the emphasis was not quite where I would consider it ideal. At the same time, as we were saying before, having rules in place alone doesn’t really make it work. What people really want is to see the advantage of using the data, and there I think we are starting to see a change. All one needs to do is open up a journal. There are a lot of articles now on machine learning for materials discovery in very high-profile journals, all the big names: Nature, Science, Proceedings of the National Academy of Sciences, whatever it is. I think that’s an easier case to make than almost anything else.
Jim Warren: So, the real problem people have is how to get started. There are a lot of aspects to this problem now. It’s not a solved one; it’s not like you can just go download a solution. You’ve got to kind of know what you’re doing, and the landscape is rich. We keep holding workshops, and we’re now approaching another one where we’re going to try to deal with just some microscopy issues and see if we can do a better job in that space. It’s going to be a long and interesting process until it starts to become self-sustaining and the community itself helps solve this problem, instead of this small team of us trying to do it ourselves.
Bryce Meredig: Right. I think one thing we’ve learned as a community is that you can’t boil the ocean. Because materials science is so diverse in its subdomains and the types of data being generated, it seems that we have to tackle the problem subcommunity by subcommunity. For example, as you just mentioned, the microscopists can get together and start to solve some of these problems. Those of us on the infrastructure side can do it. You brought up the density functional theory community. I think we’re starting to see this happen organically, but we’ve had to learn through experience that you can’t solve the problem for all of materials science top-down or in one stroke.
Jim Warren: We knew that in the beginning, right? Always it was look at the scope of the problem and go, “This is big.” Our benchmark community was the astronomy community where they’ve done just such a spectacular job, but the problem is arguably less complicated, it’s just light of various frequencies that they have to capture from a bunch of different kinds of devices and organized by a point in the sky. Here we’ve got endless processing techniques, a whole bunch of different measurement techniques, so it’s just a lot more complicated. As I said before, let’s go find a spot and solve it there and then see if we can integrate. That’s certainly what our approach has been.
Bryce Meredig: You mentioned a minute ago the very quickly rising popularity of and interest in machine learning and AI in materials science specifically. One point of confusion I’ve seen on the part of the community is how machine learning and AI relate to the Materials Genome Initiative. To what extent do these overlap? To what extent is one a subset of the other? How would you characterize that?
Jim Warren: I would say that they’re essentially one and the same. I would make two points, one of which is sort of self-aggrandizing: I started talking about this in the context of MGI back in 2013, when we knew that the third element of NIST’s mission would be enabling the infrastructure for data-driven materials science. I wasn’t even using the words AI and machine learning at that time, because they really weren’t part of the popular lexicon. Honestly, I thought it was still a good distance away. I’ve been pleasantly proven wrong on that point: these algorithms are really coming up very quickly and have certainly taken over the popular consciousness in the last few years, so the hype is a bit overwhelming. But at the same time, it’s a blessing, because it’s been this extreme driver. As the papers start to come out, people are suddenly looking around going, “Wow, I’ve got to get on this bandwagon. There’s some really amazing research going on.” That’s sort of step one.
Jim Warren: Then the second thing is that the MGI was really, in no small part, about how to do a better job of integrating computational modeling into the traditional materials science regimen, as tightly as possible. The traditional models we tended to think about were physics-based phenomenological or quantum-based models of nature that you could use to make predictions, which would then allow you to accelerate materials design. But AI is nothing more than a tool for generating models. There are some different risks associated with developing those models, but nonetheless, it’s a model-generating machine. So, in that sense, it’s just another piece of the tool set that enables the acceleration of materials discovery, and it’s a completely natural fit within the Materials Genome Initiative.
Bryce Meredig: Well, we are actually running out of time here, but one last question I wanted to ask you, since in the eyes of many you are the representative of the Materials Genome Initiative at NIST, especially outwardly facing: if an organization, whether an industrial company or a laboratory group, wants to learn more about MGI, about the importance of data in materials research, or about data management, do you have any recommendations for where they can get started? You mentioned there’s a lot out there, and that could probably be overwhelming for people.
Jim Warren: That’s true. First of all, anybody is welcome to send me an email, quite frankly. I’m on the internet, firstname.lastname@example.org, and I’ll try to help. We also have a very low-traffic mailing list that anybody who is interested can sign up for, mostly big announcements for the Materials Genome Initiative; I think we probably send out three or four a year, at most. That can be found at mgi.gov. NIST also has its own MGI space at mgi.nist.gov, with a lot of information about our programs. So, I think those are the major NIST and government resources, but drop me a line. I’d be happy to chat or get the conversation going.
Bryce Meredig: Great, Jim. Thank you so much for taking the time to join us and be part of this experiment: our first podcast, hopefully the first of many. We really appreciate your taking time out of a busy schedule to join us.
Jim Warren: Well, I really enjoyed it, and I appreciate the conversation.
Bryce Meredig: Thanks, Jim.
Bryce Meredig: Thanks for listening to DataLab. If you have questions or an idea for an episode, contact our team at email@example.com.