Create a MIF: Materials Information File

This post walks you through the steps needed to create a Materials Information File (MIF). The MIF is a flexible, JSON-based schema developed to impose structure on materials data. More information on this file format can be found here.

Setting up

Citrine provides a Python toolkit for working with MIF files called mifkit (source code and installation instructions are available here). We use mifkit throughout this post, along with Python’s built-in csv module for parsing CSV files.

The Data

In this post, we will convert the following table of bulk and shear moduli to the MIF schema. The table also provides information about the materials themselves and the conditions at which the measurements were taken.

Before we start, we will export this spreadsheet to a CSV, which would look like this:

Let’s get started writing a script that will convert this data to a MIF. As we are working with information about a material-measurement pair, we will use the Sample object, part of the MIF’s core schema, to store this data.

1. Import your modules

Create a Python file and import mifkit and any additional modules you will need to parse the data.


# -*- coding: utf-8 -*-
# the line above declares the encoding of this script; we set it here as the data contains non-ascii characters
from mifkit import mif
from mifkit.objects import * #using import * imports all possible MIF objects
import csv

2. Open your data file

Open the data file and parse its contents using the csv module. As the first two rows in the sample file are headers, we call the next() function twice to skip over them before iterating through the data rows.


#on Python 3, replace the "rU" mode with newline="" (the "U" flag was removed in Python 3.11)
with open("input_table.csv", "rU") as f: #this opens our data file 'input_table.csv' in universal newlines read mode
    reader = csv.reader(f) #parse the data using the csv module
    next(reader) #skip row one
    next(reader) #skip row two
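
To see what the two next() calls do without needing the input file, here is a small sketch that parses an in-memory CSV instead. The column order follows the indices used later in this post (structure, formula, bulk modulus, shear modulus, DOI), but the header text and every data value below are invented placeholders, not values from the table above. The snippet assumes Python 3.

```python
import csv
import io

# Placeholder CSV text: two header rows followed by two data rows.
# Every value here is invented for illustration only.
csv_text = (
    "Placeholder title row,,,,\n"
    "Structure,Formula,K0 (GPa),G0 (GPa),DOI\n"
    "Placeholder structure A,AB,100,50,10.0000/placeholder-1\n"
    "Placeholder structure B,CD,200,80,10.0000/placeholder-2\n"
)

reader = csv.reader(io.StringIO(csv_text))
title_row = next(reader)     # first header row, consumed and returned
column_names = next(reader)  # second header row, consumed and returned

data_rows = list(reader)     # iteration resumes at the first data row
print(column_names)
print(len(data_rows))
```

Each call to next() both returns a row and advances the reader, which is why the loop in the next step starts at the first data row.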

3. Create a list to store information from each row and loop through each row of the CSV after the headers


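samples = [] #one Sample per data row will be collected here
for row in reader: #this loop belongs inside the with block from step 2; steps 4 to 8 form its body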

4. Store the reference information

Reference objects store information about the source of the data. There are a number of fields that you can use to store this information, the most common being doi, title, and url (all of which are strings). If possible, use the doi field since this is a unique identifier that can be used to unambiguously look up sources.

reference = Reference()
reference.doi = row[4] #row[4] references the fifth column of the current row and the string from this cell will be stored in the doi field

5. Store the material information

Material objects store information about the material which is under investigation. Available fields are chemical_formula, common_name, and condition. In this example, we have the chemical formula as well as several conditions (structure, crystallinity, and crystal system) of the material.

First, store the chemical formula in the relevant field:

material = Material()
material.chemical_formula = row[1] #row[1] references the second column of the current row

The material condition field stores Value objects, so we’ll create one object for each of the conditions:

structure = Value()
structure.name = "Structure"
structure.scalar = row[0] #the structure from the first column is stored as a scalar

crystallinity = Value()
crystallinity.name = "Crystallinity"
crystallinity.scalar = "Single Crystal" #This is the same for every row and is only provided in the heading so it can be hard coded

crystal_system = Value()
crystal_system.name = "Crystal System"
crystal_system.scalar = "Cubic" #This is the same for every row and is only provided in the heading so it can be hard coded

We store each of these conditions in a list:


material.condition = [structure, crystallinity, crystal_system] #store a list of value objects
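
Creating each Value takes the same three or four lines, so you may prefer to wrap them in a small helper. The sketch below is one way to do that. The make_value function is a hypothetical helper, not part of mifkit, and the Value class defined here is only a local stand-in so the snippet runs on its own; with mifkit installed you would use the imported Value instead, assuming it accepts attribute assignment as shown in this post.

```python
class Value(object):
    """Local stand-in for mifkit's Value, used only so this sketch runs alone."""

    def __init__(self):
        self.name = None
        self.scalar = None
        self.units = None


def make_value(name, scalar, units=None):
    """Hypothetical helper: build a Value with a name, scalar, and optional units."""
    value = Value()
    value.name = name
    value.scalar = scalar
    if units is not None:
        value.units = units
    return value


row = ["Placeholder structure", "AB"]  # invented row, not real data
conditions = [
    make_value("Structure", row[0]),
    make_value("Crystallinity", "Single Crystal"),
    make_value("Crystal System", "Cubic"),
]
print([c.name for c in conditions])
```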

6. Store the measurement information

The Measurement object is used to store information about a measurement and the conditions under which it was taken. In this example, we have two measurements: shear modulus (G0) and bulk modulus (K0). We also know that the measurements were taken at standard conditions; this information will be saved as conditions of each measurement.

First, store the information about the measurement conditions; these apply to both measurements.


temperature = Value()
temperature.name = "Temperature"
temperature.scalar = "Standard" #The header states that the measurements were taken at standard conditions so this can be hard coded for each row

pressure = Value()
pressure.name = "Pressure"
pressure.scalar = "Standard" #The header states that the measurements were taken at standard conditions so this can be hard coded for each row

Next, create Value objects to store the information about the measurement properties, bulk and shear modulus.


bulk_modulus = Value()
bulk_modulus.name = "Bulk Modulus K$_0$" #Citrination uses LaTeX notation to represent symbols, superscripts and subscripts
bulk_modulus.scalar = row[2] #bulk modulus is given in the third column of each row
bulk_modulus.units = "GPa" #units are given in the heading and can be hard coded

shear_modulus = Value()
shear_modulus.name = "Shear Modulus G$_0$" #Citrination uses LaTeX notation to represent symbols, superscripts and subscripts
shear_modulus.scalar = row[3] #shear modulus is given in the fourth column of each row
shear_modulus.units = "GPa" #units are given in the heading and can be hard coded
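
The csv module returns every cell as a string, so row[2] and row[3] are stored as text in the snippet above. If you would rather work with numbers, a conversion along the following lines could be applied before assigning to scalar. The helper name to_number is hypothetical, and whether numeric scalars suit your dataset is an assumption you should check against the MIF schema.

```python
def to_number(cell):
    """Return the cell as a float when possible, otherwise the stripped text."""
    text = cell.strip()
    try:
        return float(text)
    except ValueError:
        return text  # keep non-numeric entries such as "n/a" unchanged


print(to_number(" 160.5 "))
print(to_number("n/a"))
```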

Then, we will create a Measurement object for each measurement and store the property and condition information in the relevant fields.


bulk_modulus_measurement = Measurement()
bulk_modulus_measurement.property = bulk_modulus
bulk_modulus_measurement.condition = [temperature, pressure]

shear_modulus_measurement = Measurement()
shear_modulus_measurement.property = shear_modulus
shear_modulus_measurement.condition = [temperature, pressure]

We will also indicate that the data is experimental. The data_type field can only store the string “Experimental” or the string “Computational”.


shear_modulus_measurement.data_type = "Experimental"
bulk_modulus_measurement.data_type = "Experimental"
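
Because only two strings are allowed here, a typo such as a lowercase "experimental" is easy to miss. If the data type comes from your input file rather than being hard coded, a small check like the one sketched below can catch that early. The function name check_data_type is hypothetical, and this check is our own addition rather than something mifkit is known to do for you.

```python
ALLOWED_DATA_TYPES = ("Experimental", "Computational")


def check_data_type(data_type):
    """Return data_type unchanged, or raise ValueError if it is not allowed."""
    if data_type not in ALLOWED_DATA_TYPES:
        raise ValueError(
            "data_type must be one of %s, got %r" % (ALLOWED_DATA_TYPES, data_type)
        )
    return data_type


print(check_data_type("Experimental"))
```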

7. Combine the row’s information into a sample

Once you have stored all the information from a given row, you will need to combine it into a Sample object by storing the information in the relevant fields.


sample = Sample()
sample.reference = reference
sample.material = material
sample.measurement = [bulk_modulus_measurement, shear_modulus_measurement] 

8. Store the sample in the samples list

For each row in the CSV file, append the Sample object for that row to the samples list.


samples.append(sample)
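
Steps 4 to 8 all run once per row, so in the finished script they sit inside the for loop, which in turn sits inside the with block from step 2. The sketch below shows only that nesting. Plain dictionaries stand in for the mifkit objects, an in-memory CSV of invented values stands in for input_table.csv, and the constant names and dictionary keys are illustrative assumptions. It assumes Python 3.

```python
import csv
import io

# Column positions used throughout this post, named for readability.
STRUCTURE, FORMULA, BULK, SHEAR, DOI = range(5)

# Invented placeholder data: two header rows, then one data row.
csv_text = (
    "header row one,,,,\n"
    "header row two,,,,\n"
    "Placeholder structure,AB,100,50,10.0000/placeholder\n"
)

samples = []
with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    next(reader)  # skip row one
    next(reader)  # skip row two
    for row in reader:
        # Steps 4 to 7: collect the row's information (dicts as stand-ins).
        sample = {
            "reference": {"doi": row[DOI]},
            "material": {
                "chemical_formula": row[FORMULA],
                "condition": [{"name": "Structure", "scalar": row[STRUCTURE]}],
            },
            "measurement": [
                {"property": {"name": "Bulk Modulus", "scalar": row[BULK]}},
                {"property": {"name": "Shear Modulus", "scalar": row[SHEAR]}},
            ],
        }
        # Step 8: one sample per row.
        samples.append(sample)

print(len(samples))
```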

9. Dump the samples to a JSON file using the mif.dump method

mif.dump works in much the same way as the json.dump function in Python’s json module, and it can accept all the same arguments as json.dump.


with open("output.json", "w") as output_file: #create an output file
    mif.dump(samples, output_file, indent=4) #dump the sample list to JSON and include an indent of 4 so that the file can be reviewed more easily
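
Since mif.dump is described as accepting the same arguments as json.dump, it may help to see what two of those arguments do on their own. The sketch below uses only the standard json module on a plain dictionary. It calls json.dumps, which takes the same keyword arguments as json.dump but returns a string instead of writing to a file. The dictionary is an invented stand-in, and the real output of mif.dump may be laid out differently.

```python
import json

# Invented stand-in for one sample; not real mif.dump output.
sample = {
    "material": {"chemical_formula": "AB"},
    "reference": {"doi": "10.0000/placeholder"},
}

compact = json.dumps([sample], sort_keys=True)           # single line
pretty = json.dumps([sample], indent=4, sort_keys=True)  # one key per line

print(compact)
print(pretty)
```

Both strings decode to the same data; indent only changes the whitespace, which is why it is safe to use for files you want to review by eye.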

10. Run your Python script

Your script is now complete! You can run it and view the results in the file output.json.

Create your own MIF and contribute to Citrination

Now that you are familiar with mifkit, feel free to share some of your data on Citrination! By uploading data to this platform, you will be contributing to a growing dataset which is making materials data more open, accessible, and useful.

To contribute data, go to https://citrination.com/data_uploads/new.

If you have any questions or suggestions regarding this post, please contact us. Come back soon for our post on using our CSV template to structure your data.

The full script can be downloaded from here.