PyMKS Package Paper


  • Current practices for developing materials data analytics toolsets are highly localized within a few individual groups, resulting in major inefficiencies (unnecessary duplication of code, inadequate verification and validation of multiple instantiations of code, not engaging the right talent for the right task, etc.).
  • Community development and sharing of code repositories have been successful in certain science communities. Advantages of this approach include increased e-teaming and e-collaboration, vastly improved code hygiene, promotion of open science, rapid verification and validation, and a dramatic increase in overall productivity. All of these are key ingredients for realizing a major scale-up in innovation in science and technology.

- Community development of a materials data analytics toolset can significantly change the landscape of materials innovation through the emergence and adoption of the cross-disciplinary field of Materials Informatics, and can help realize the vision outlined in the Materials Genome Initiative (MGI) and Integrated Computational Materials Engineering (ICME). This is probably the only practical way to get materials scientists and computer scientists to establish meaningful, mutually beneficial, and highly productive collaborations.


  • The Materials Project is currently powered by pymatgen (Ong 2013) - Integrated Python tools for first-principles calculations (DFT and MD) of thermodynamic and electrical quantities, with applications in functional materials.
  • Open Quantum Materials Database (OQMD) (Saal 2013) - Open source data repository for phase diagrams and electronic ground states computed using DFT. OQMD also offers visualization tools and a Python API.
  • OpenKIM for interatomic potentials
  • Citrine ?

All of the above are essentially data/information repositories. Maybe we can break this down as Data/Information Repositories, Code Repositories, and e-collaboration platforms. Perhaps do not talk too much about the first and last, but just mention some names and provide some links, and focus more of your review and paper on the second one?


- QuesTek ? - I do not think they do anything in the "open" arena

In terms of code repositories - I think there are also codes dedicated to classes of problems. Would LAMMPS fall into this category? How about SPPARKS and MOOSE? DREAM-3D claims to be a code repository as well. In Europe there is DAMASK from Max Planck in Düsseldorf.

Data Storage

  • NIST Data Gateway - Over 100 free and paid queryable online materials databases. Examples of the data include atomic structure, thermodynamics, kinetics, fundamental physical constants, x-ray spectroscopy, and more.
  • NIST DSpace -
  • NIST Materials Data Curation System (MDCS) - General online database for materials data.
  • Computational Materials Data Network (CMD Network) - Founded by ASM International, CMD Network is a data-focused project that facilitates data warehousing, sharing, and collaboration.
  • Granta ?
  • MatWeb - Database containing materials properties for over 100,000 materials.
  • Materials Atlas (link not working) - Database storing 3D results from experiments and simulations.

Structure Tools

  • DREAM-3D - Software tool for synthetic microstructure generation, image processing, and mesh creation for finite element analysis.
  • PyMKS aims to seed and nurture an emergent user group in materials data analytics for establishing homogenization and localization linkages by leveraging open source scientific and machine learning packages in Python. The approach used to develop PyMKS, as well as several examples, is presented. This paper is a call to others interested in participating in this open science activity.